CLC Bioinformatics Cell - hardware acceleration on your desktop

Do you know how many search results you are missing when using BLAST? Up to 50% of the hits are not found using BLAST. If you want 100% of the answers, and still want fast searches, you can improve the quality of your research by using the CLC Bioinformatics Cell.

The Cell includes the fastest Smith-Waterman implementation ever made on standard hardware: nucleotide searches are accelerated up to 110 times and protein searches up to 50 times on new computers.

3 Cases

Below are three examples of situations where correct results are essential and high-quality, high-sensitivity Smith-Waterman searches should be used instead of BLAST.

Gene "knock-down" using siRNA

siRNAs are small (~20 nt) stretches of double-stranded RNA which can cause the degradation of mRNA molecules with complementary sequence motifs. siRNAs are effective and popular tools for studying "knock-down" of selected genes and are intensively studied for their use as therapeutic agents.

In both applications, however, it is important to ensure that strong cross-hybridization to non-target genes does not occur, as this can lead to inaccurate results or harmful side effects.

Thus, the design phase of an siRNA includes a sequence analysis to verify whether candidate oligos can cross-hybridize with known sequences.

BLAST is commonly used for this analysis but is a poor choice, because it seeds alignments from short exact-matching words. Because siRNAs are so short, only a few mismatches or gaps can prevent any alignment seed from being recognized within the siRNA molecule, and potential cross-matches can therefore be missed in the BLAST search.
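The seeding problem can be illustrated with a minimal Python sketch. It is not BLAST itself: the function name has_exact_seed, the word size of 11, and both sequences are assumptions made up for illustration. It only checks whether a query and a target share any exact word of a given length, which is the precondition for a seed-based search to start an alignment.

```python
def has_exact_seed(query: str, target: str, word_size: int = 11) -> bool:
    """Return True if query and target share at least one exact word of word_size letters."""
    # Collect every word of length word_size that occurs in the query.
    words = {query[i:i + word_size] for i in range(len(query) - word_size + 1)}
    # Slide over the target and look for any of those words.
    return any(target[j:j + word_size] in words
               for j in range(len(target) - word_size + 1))


# Hypothetical 20-nt oligo and a hypothetical off-target site with 2 mismatches
# (18 of 20 positions identical), embedded in short flanking sequence.
query = "ACGTTGCAAGCTTAGGCTAA"
off_target = "TTTT" + "ACGTTGTAAGCTTCGGCTAA" + "GGGG"

print(has_exact_seed(query, off_target))               # assumed word size of 11
print(has_exact_seed(query, off_target, word_size=6))  # a much shorter word
```

In this example the two mismatches split the site into identical runs of only 6 letters, so no word of length 11 is shared and a seed-based search with that word size would never start an alignment here, even though 90% of the positions match.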

In comparison, the Smith-Waterman algorithm does not depend on alignment seeding and is guaranteed to find the optimal local alignment. It is thus a superior tool for recognizing these important cross-matches.
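For readers who want to see the principle, here is a minimal, unaccelerated Python sketch of Smith-Waterman scoring. It is not the Cell's implementation: the scoring values (match +2, mismatch -1, linear gap penalty -2) and the example sequences are assumptions chosen for illustration, and real searches typically use substitution matrices and affine gap penalties.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0, which is what makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Hypothetical 20-nt oligo, and a hypothetical off-target site with 2 mismatches.
query = "ACGTTGCAAGCTTAGGCTAA"
off_target = "TTTT" + "ACGTTGTAAGCTTCGGCTAA" + "GGGG"

print(smith_waterman_score(query, query))       # perfect match: 40
print(smith_waterman_score(query, off_target))  # 18 matches, 2 mismatches: 34
```

Every cell of the matrix is evaluated, so the near-perfect off-target site is scored whether or not it contains a long exact word. The cost is a running time proportional to the product of the two sequence lengths, which is why acceleration matters for database-scale searches.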

Protein characterization in general

"To predict functions of a possible protein product of any new or an uncharacterized DNA sequence, it is important first to detect all significant similarities between the encoded amino acid sequence and any accumulated protein sequence data. We have implemented a set of queries and database sequences and proceeded to test and compare various similarity search methods and their parameterizations. We demonstrate here that the Smith-Waterman (S-W) dynamic programming method and the optimized version of FASTA are significantly better able to distinguish true similarities from statistical noise than is the popular database search tool BLAST"1

1 E.G. Shpaer, M. Robinson, D. Yee, J.D. Candlin, R. Mines, and T. Hunkapiller, "Sensitivity and Selectivity in Protein Similarity Searches: A Comparison of Smith-Waterman in Hardware to BLAST and FASTA", Genomics 38, 179-191 (1996).

Studies of protein functions - mosaic protein

Protein function is often studied by homology comparisons to known proteins available from various public databases. Protein function is highly correlated with the protein domains present, so members of a protein family often carry the same domains.

Global alignment methods will not be able to detect similar protein domains in two evolutionarily related proteins. For example, a global alignment method will not be able to detect the relationship between the calcium-binding domains in calmodulin and calpain, so local alignment methods are the obvious choice. BLAST, a heuristic method, has become the de facto standard for local alignments because it is fast and fairly accurate.

Nevertheless, very precise methods like Smith-Waterman are preferred because the finding of the best local alignment is guaranteed.

In contrast, BLAST initially indexes the searched sequence and searches for small exact matches between the query sequence and the database. After an initial match is found, the alignment is extended in both directions.

This is not the optimal approach when searching for mosaic proteins, where BLAST searches are likely to miss some of the available homologous proteins.

Smith-Waterman searches are thus likely to generate far more hit sequences.

 

 

 

With the Cell, you can speed up a Smith-Waterman search that previously took 2 hours to around 1 minute. The Cell thus removes the argument for using BLAST.