Learning Bioinformatics: 2. Using a BLAST search to edit a trace file

BLAST stands for 'Basic Local Alignment Search Tool'. It is a free online program that finds regions of similarity between your sequence and the sequences in a database, which helps to identify homologous genes in different organisms. Several BLAST programs are available online, for example nucleotide BLAST, protein BLAST, blastx, tblastx and tblastn (http://blast.ncbi.nlm.nih.gov/Blast.cgi).

We will only discuss nucleotide BLAST here.

1. If you submit a sample for sequencing, the operator will give you a trace file such as 'abc.ab1'; '.ab1' is the file extension of a trace file. Open the .ab1 file with the Chromas software and it will look like this:

'.ab1' file

2. In the chromatogram, the peaks at the beginning and at the end are almost always messy. This is normal for all kinds of sequences. So remove the nucleotides (nt) from the first and last parts of the sequence until you reach clear peaks. The number of nt removed may be 20-30 or more, depending on the quality of the sequence.

How to:

- Select the first nt from which the following peaks are good, then click Edit/Set left cut-off.

- Select the last nt with good peaks at the other end in the same way, then click Edit/Set right cut-off.

- Now click Edit/Delete cut-off sequences.

The low-quality ends of the sequence have now been removed.
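The same idea can be sketched in code for readers who want to see the logic behind the cut-offs. This is only an illustration, not what Chromas does internally: it assumes you have a per-base quality score for every nt (the post itself judges quality by eye from the peaks), and the function name, the threshold of 20 and the window size are all hypothetical choices.

```python
def trim_low_quality_ends(bases, quals, min_qual=20, window=10):
    """Return (left, right) so that bases[left:right] is the region to keep.

    Assumption: quals[i] is a quality score for bases[i]; higher is better.
    A window counts as 'good' only if every score in it is >= min_qual.
    """
    if len(bases) != len(quals):
        raise ValueError("bases and quals must have the same length")
    n = len(bases)
    if n < window:
        return 0, 0  # too short to judge, keep nothing

    def window_ok(start):
        return min(quals[start:start + window]) >= min_qual

    # Left cut-off: start of the first good window, scanning from the left.
    left = next((i for i in range(n - window + 1) if window_ok(i)), None)
    if left is None:
        return 0, 0  # no good window anywhere
    # Right cut-off: end of the last good window, scanning from the right.
    right = next(i + window for i in range(n - window, -1, -1) if window_ok(i))
    return left, right


# Made-up example: 5 poor nt, 20 good nt, 5 poor nt.
bases = "N" * 5 + "ACGT" * 5 + "N" * 5
quals = [5] * 5 + [40] * 20 + [5] * 5
left, right = trim_low_quality_ends(bases, quals, min_qual=20, window=5)
print(left, right, bases[left:right])
```

The two returned numbers play the role of the left and right cut-offs set by hand above. Checking the trimmed ends against the peaks by eye is still advisable.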

3. Click File/BLAST search. A dialog box will open; just click OK.

4. Another dialog box will then open. Set all the parameters as in this picture:

5. Click ‘View report’ and the result page will look like this:

6. The red bars in the picture show the similarity between your sequence and related sequences from the GenBank database. The first one is the most similar, then the second one, and so on. (I will discuss the other parameters part by part later.) Don’t close the Chromas window.

7. Now the editing part begins.

The ‘Query’ sequence is your sequence and the others are the most similar sequences, listed in order of decreasing similarity. A dot means that the nt at that position is the same as in the query. Now you have to check your sequence.

8. Go to the Chromas window and check the nts that differ. For example, you have ‘A’ in the query sequence but the similar sequences suggest ‘T’. Look at the peak for that nt in your sequence. If you see a clear ‘A’ peak (green), your nt seems fine and you don’t have to edit it. But if you see a good ‘T’ peak (red) rather than an ‘A’ peak (green), delete the ‘A’ and type a lowercase ‘t’ in its place.

9. ‘N’ means the sequencer was unable to determine the actual nt. By looking at the peak and comparing it with the similar sequences, you can put the correct nt there.

10. ‘\’ marks a position where the sequencer has probably missed one or more nts. Insert the nt(s) at that position if you think the suggestion from the similar sequences is correct.

11. Edit all the questionable nt positions in this way.
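To make the checking in steps 7 to 11 concrete, here is a small sketch that lists the positions worth looking at in the Chromas window. It assumes you have copied the query row and a few hit rows out of the report as plain strings of equal length, with dots for identical nts and no gaps; the function names are made up for this example. It only flags candidates. Whether to edit a position still depends on the peak.

```python
def expand_dots(query_row, hit_row):
    """Replace each '.' in a hit row with the query nt in the same column."""
    return "".join(q if h == "." else h for q, h in zip(query_row, hit_row))


def positions_to_check(query_row, hit_rows):
    """Return (position, query_nt, hit_nts) for every column to inspect.

    A column is flagged when the query nt is 'N' or when at least one hit
    differs from the query. Positions are counted from 1 within the row.
    """
    for row in hit_rows:
        if len(row) != len(query_row):
            raise ValueError("all rows must have the same length as the query")
    expanded = [expand_dots(query_row, row) for row in hit_rows]
    flagged = []
    for col, q in enumerate(query_row):
        hit_nts = "".join(row[col] for row in expanded)
        if q.upper() == "N" or any(h.upper() != q.upper() for h in hit_nts):
            flagged.append((col + 1, q, hit_nts))
    return flagged


# Made-up rows in the style of the report: dots mean 'same as query'.
query = "ACGTNACGTA"
hits = ["....G..A..",
        "....G....."]
for position, query_nt, hit_nts in positions_to_check(query, hits):
    print(position, query_nt, hit_nts)
```

In this invented example, position 5 is flagged because the query has ‘N’, and position 8 because one of the two hits suggests ‘A’ where the query has ‘G’. Both would then be checked against the peaks as described above.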

12. Edit both sequences (the one from the forward primer and the one from the reverse primer) in this way.

13. At the end, save the file in ‘.scf’ format.

The editing looks a little tough at first, but it is easy. It will become the easiest part once you get used to working with sequences regularly.

Thanks.

August 25, 2011
