Thursday 5 May 2011

Sequence Analysis

There is an R package for biological sequence analysis called seqinr.


> install.packages("seqinr")
> library("seqinr")
> mature <- read.fasta(file="mature.fa") # This reads in the database of miRNA fragments
> length(mature) # How many entries are there in the database
> mirna <- mature[[1]] # Load the first sequence into mirna
> table(mirna)
mirna
a g t
5 8 9
> count(mirna, 1) # Counts occurrences of each base; unlike table(), bases that never occur (here c) are listed with 0

a c g t
5 0 8 9

> count(mirna, 2) # Counts the overlapping two-letter words (dinucleotides)

aa ac ag at ca cc cg ct ga gc gg gt ta tc tg tt 
 0  0  4  1  0  0  0  0  1  0  2  5  4  0  2  2 
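
To make it clearer what a word count like the one above is tallying, here is a rough base R sketch of counting overlapping words by hand. The helper count_words and the toy sequence are made up for illustration; this is not how seqinr implements count(), just the same idea under the assumption that windows overlap and advance one base at a time.

```r
# Toy sequence, made up for illustration (not taken from mature.fa)
toy <- c("a", "g", "g", "t", "a", "g")

# Hypothetical helper: count overlapping words of length k over the acgt alphabet
count_words <- function(bases, k, alphabet = c("a", "c", "g", "t")) {
  n <- length(bases) - k + 1  # number of overlapping windows
  # Paste k consecutive bases together to form each word
  words <- vapply(seq_len(n),
                  function(i) paste(bases[i:(i + k - 1)], collapse = ""),
                  character(1))
  # Every possible word of length k, sorted as aa, ac, ag, ...
  grid <- expand.grid(rep(list(alphabet), k), stringsAsFactors = FALSE)
  all_words <- sort(do.call(paste0, unname(as.list(grid))))
  # Tally the observed words, keeping zero counts for words never seen
  counts <- tabulate(match(words, all_words), nbins = length(all_words))
  names(counts) <- all_words
  counts
}

toy_counts <- count_words(toy, 2)
print(toy_counts)
```

A sequence of n bases gives n - k + 1 overlapping words of length k, so the counts for the six-base toy sequence sum to 5.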

You can keep going up to longer and longer words, but for a sequence this short anything beyond a word size of 6 is going to be a waste of computing time: the table has 4^k cells for words of length k, and almost all of them will be zero.
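
A quick back-of-the-envelope check of that sparsity, in base R. The only input is the sequence length of 22 bases (5 + 8 + 9 from the table(mirna) output above); the variable names are my own, and max_fill is only an upper bound on the fraction of non-zero cells, since repeated words fill even fewer.

```r
n_bases <- 22                   # 5 + 8 + 9 bases, from the table(mirna) output
k <- 1:8                        # word sizes to compare
possible <- 4^k                 # distinct words of length k over a, c, g, t
windows <- n_bases - k + 1      # overlapping words actually read from the sequence
# Upper bound on the fraction of table cells that can be non-zero
max_fill <- pmin(windows, possible) / possible
sparsity <- data.frame(k, possible, windows, max_fill)
print(sparsity)
```

At k = 6 there are 4096 possible words but only 17 windows in the sequence, so well under 1% of the table can be non-zero.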
