Python regex: getting everything (except \n) between two characters in a multiline string

- September 15, 2012

I have a file with this input:

    >x0
    cuugacgauca
    cgcaucg
    >x55
    uacggcgg
    uucagc
    aucg
    >x300
    aaacccgggg

and I need the concatenation of the lines between the '>' header lines:

    cuugacgaucacgcaucg
    uacggcgguucagcaucg
    aaacccgggg

My attempt was to use "re.match(r'^>.*\n(.*)>.*', a, re.DOTALL)" and then delete the '\n' from each match, but the regex is not returning anything. What is wrong?

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. - Jamie Zawinski

That being said, why not use more understandable string processing?

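    tmp = []   # lines of the sequence currently being read
    seqs = []  # finished, concatenated sequences
    with open('txtfile') as f:
        for line in f:
            if line.startswith('>'):
                # a header line closes the previous sequence
                seqs.append(''.join(tmp))
                tmp = []
            else:
                tmp.append(line.strip())
        else:
            # runs once the loop is exhausted: drop the empty entry
            # created by the first header, then flush the last sequence
            seqs.pop(0)
            seqs.append(''.join(tmp))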

Alternatively, if you do want to use a regex, try first stripping the newlines and then splitting on the >x[digit] patterns:

re.split(r'>x\d+', re.sub(r'\n', '', data))
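To make the one-liner concrete, here is a small sketch that runs it on the sample input from the question. The helper name split_sequences and the inline sample string are only for illustration. Because the text starts with a header, re.split returns an empty string as its first element, so the sketch filters empty pieces out.

```python
import re

# Sample input copied from the question, one record per '>' header.
data = """>x0
cuugacgauca
cgcaucg
>x55
uacggcgg
uucagc
aucg
>x300
aaacccgggg
"""


def split_sequences(text):
    """Strip newlines, split on the >x<digits> headers, drop empty pieces."""
    flat = re.sub(r'\n', '', text)
    parts = re.split(r'>x\d+', flat)
    # The text begins with a header, so parts[0] is '' and is filtered out.
    return [p for p in parts if p]


sequences = split_sequences(data)
print('\n'.join(sequences))
```

Note that this relies on every header having the form >x followed by digits, as in the sample; other header formats would need a different pattern.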

But this has the downside that the entire text file has to be loaded into the variable data, which is not attractive for a large file (and large files are quite common in bioinformatics). The approach given first is then more interesting memory-wise, because you can process each finished DNA/RNA sequence in turn.
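One way to get that "process each sequence in turn" behaviour is to turn the line-by-line loop into a generator. This is only a sketch: the name iter_sequences is made up here, and io.StringIO stands in for a real file so the example runs on its own.

```python
import io


def iter_sequences(lines):
    """Yield one concatenated sequence per '>' header, reading line by line."""
    chunks = []
    seen_header = False
    for line in lines:
        if line.startswith('>'):
            # A new header closes the previous record, if there was one.
            if seen_header:
                yield ''.join(chunks)
            chunks = []
            seen_header = True
        else:
            chunks.append(line.strip())
    # Flush the last record once the input is exhausted.
    if seen_header:
        yield ''.join(chunks)


# io.StringIO stands in for open('txtfile'); any iterable of lines works.
sample = io.StringIO(
    ">x0\ncuugacgauca\ncgcaucg\n"
    ">x55\nuacggcgg\nuucagc\naucg\n"
    ">x300\naaacccgggg\n"
)
for seq in iter_sequences(sample):
    print(seq)
```

Only the lines of the current record are held in memory, so each sequence can be handled and discarded before the next one is read.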
