Thursday, December 6, 2012

Illumina reads and Phred+33


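Illumina reads (FASTQ format, four lines per read: header, sequence, "+" separator, quality string):
@ + [Read Name] + space + [Read Number in Pair (1/2)]:[Filtered (Y/N)]:[Control Number]:[Index Sequence]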
@HWI-ST688:211:C0F02ACXX:6:1101:12568:67545 2:N:0:AGTCAA
GGGAGGAAGGTGCAGGTCCCTCTGCCCTTTCTGCCAAGGTGCAGAATAGCGCCCGGGCGTGTGTTTTGGCTCCAGAGCAGTTCCACGTGGAGCAACTTCGT
+
BCCFFFFFHHFHHJJJHIJJJJJJJJJIJJJJJJJJIJJ?FGHJGIGIHIGGHIIJHHD>@;AACDDDD:ACDDDDDDDDCDDDEDDBDDBDDBDDDDD@#
@HWI-ST688:211:C0F02ACXX:6:1101:12568:67545 1:N:0:AGTCAA
CCTCCTCACAGATCAAGTACACAACACACACACACACACACACACACACACACGAAGTTGCTCCACGTGGAACTGCTCTGGAACCAAAACACACGCCCGGG
+
CC@FFFFFHHHHHJJJJGIJJJJJIJJJJJIJJJJJJJJJJJIJJJJJJJJIHFEFF?>ACACDCDD?ABDDDCC>ACDCDC(9<ABBBDDDBB>>BB<55
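To make the four-line record layout concrete, here is a minimal Python sketch that reads FASTQ records and splits the header into the read name and the read number in the pair (1 or 2). It assumes well-formed Illumina 1.8+ style input: exactly four lines per record (no wrapped sequence lines) and a header with a space followed by the colon-separated fields. The names `FastqRecord`, `split_header` and `parse_fastq` are hypothetical, chosen for illustration, and the demo record is made up.

```python
import io
from typing import Iterator, NamedTuple, TextIO, Tuple


class FastqRecord(NamedTuple):
    name: str      # read name: text between '@' and the first space
    pair: int      # 1 or 2: which read of the pair this is
    sequence: str  # base calls
    quality: str   # Phred+33 quality string, one character per base


def split_header(header: str) -> Tuple[str, int]:
    """Split '@NAME 1:N:0:INDEX' into ('NAME', 1)."""
    if not header.startswith("@"):
        raise ValueError(f"not a FASTQ header: {header!r}")
    name, _, comment = header[1:].partition(" ")
    # The first colon-separated field after the space is the read number (1/2).
    pair = int(comment.split(":", 1)[0])
    return name, pair


def parse_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with exactly four lines per record."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of input (or a trailing blank line)
        sequence = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not plus.startswith("+"):
            raise ValueError(f"missing '+' separator after {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        name, pair = split_header(header)
        yield FastqRecord(name, pair, sequence, quality)


if __name__ == "__main__":
    # A made-up two-record example with short sequences for readability.
    demo = io.StringIO(
        "@READ_A 1:N:0:AGTCAA\nACGT\n+\nBCCF\n"
        "@READ_A 2:N:0:AGTCAA\nTTGA\n+\nJJ#H\n"
    )
    for rec in parse_fastq(demo):
        print(rec.name, rec.pair, rec.sequence, rec.quality)
```

Applied to the first header above, `split_header` returns the name `HWI-ST688:211:C0F02ACXX:6:1101:12568:67545` and pair number 2; the two records shown share that name and differ only in the pair number.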

[From http://en.wikipedia.org/wiki/FASTQ_format]

--------------------------------
Quality Scores
Quality scores measure the probability that a base is called incorrectly. With SBS (sequencing by synthesis) technology, each base in a read is assigned a quality score by a phred-like algorithm [1,2], similar to the one originally developed for Sanger sequencing experiments. The quality score of a given base, Q, is defined by the equation
$$Q = -10 \log_{10} e$$
where e is the estimated probability of the base call being wrong. Thus, a higher quality score indicates a smaller probability of error. In the table below, a quality score of 20 represents an error rate of 1 in 100, with a corresponding call accuracy of 99%.
The relationship between quality score and base call accuracy:
Quality Score | Probability of Incorrect Base Call | Inferred Base Call Accuracy
10 (Q10) | 1 in 10 | 90%
20 (Q20) | 1 in 100 | 99%
30 (Q30) | 1 in 1000 | 99.9%
[From http://www.illumina.com/truseq/quality_101/quality_scores.ilmn]
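The equation and the table can be checked numerically. The sketch below simply inverts Q = -10 log10(e) in both directions; it assumes nothing beyond that definition, and the function names `phred_to_error` and `error_to_phred` are hypothetical.

```python
import math


def phred_to_error(q: float) -> float:
    """Error probability e for a Phred quality Q, from Q = -10 * log10(e)."""
    return 10 ** (-q / 10)


def error_to_phred(e: float) -> float:
    """Phred quality Q for an error probability e, with 0 < e <= 1."""
    if not 0 < e <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(e)


if __name__ == "__main__":
    # Recompute the Q10 / Q20 / Q30 rows of the quality score table.
    for q in (10, 20, 30):
        e = phred_to_error(q)
        print(f"Q{q}: 1 in {round(1 / e)}, accuracy {100 * (1 - e):.1f}%")
```

Every 10-point increase in Q divides the error probability by 10, which is why each row of the table adds one more "9" to the accuracy.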

Phred+33 encoding:
How are qualities encoded as characters?
Q = ord(q) - 33
q = chr(Q + 33)
Q: integer quality score
q: its single-character (ASCII) representation
[From http://faculty.washington.edu/jht/GS373_2010/lectures/G373_Shendure_Wk9_Monday_lec24.pdf]
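The two formulas translate directly into Python, where `ord` and `chr` are built-in functions. This is a minimal sketch for whole quality strings; the function names `decode_phred33` and `encode_phred33` are hypothetical, and the 0 to 93 limit follows from Phred+33 using the printable ASCII characters '!' (33) through '~' (126).

```python
from typing import List


def decode_phred33(quality: str) -> List[int]:
    """Convert a Phred+33 quality string to integer scores: Q = ord(q) - 33."""
    scores = [ord(ch) - 33 for ch in quality]
    if any(q < 0 for q in scores):
        raise ValueError("character below '!' (ASCII 33) in quality string")
    return scores


def encode_phred33(scores: List[int]) -> str:
    """Convert integer scores back to a Phred+33 string: q = chr(Q + 33)."""
    if any(not 0 <= q <= 93 for q in scores):
        raise ValueError("Phred+33 can only represent Q from 0 to 93")
    return "".join(chr(q + 33) for q in scores)


if __name__ == "__main__":
    # First ten quality characters of the first read shown above.
    print(decode_phred33("BCCFFFFFHH"))
```

For example, `decode_phred33("BCCFFFFFHH")` gives `[33, 34, 34, 37, 37, 37, 37, 37, 39, 39]`; in the quality lines above, `J` decodes to Q41 and the trailing `#` of the first read to Q2.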
