Test cases used to evaluate the SPCR program
(1) Predict 16S rRNA genes from bacterial genomes
Primer sequences:
Primer1: GAGAGTTTGATCCTGGCTCAG
Primer2: CTACGGCTACCTTGTTACGA
Parameters:
I up = 0.85, I dn = 0.85, P a = 0.85, L max = 3200 bp, L min = 200 bp
Templates:
The complete genome sequences of 59 eubacteria were used as templates; they are listed in the table below:
Bacteria Name | Accession Number
Agrobacterium tumefaciens str. C58 (U. Washington) | NC_003304, NC_003305
Aquifex aeolicus | NC_000918
Bacillus subtilis | NC_000964
Bacteroides thetaiotaomicron VPI-5482 | NC_004663
BBUR Borrelia burgdorferi | NC_001318
Bifidobacterium longum NCC2705 | NC_004307
Bordetella pertussis | NC_002929
Bradyrhizobium japonicum | NC_004463
Brucella melitensis | NC_003317, NC_003318
Buchnera aphidicola str. APS (Acyrthosiphon pisum) | NC_002528
Candidatus Blochmannia floridanus | NC_005061
Caulobacter crescentus CB15 | NC_002696
Chlamydia muridarum | NC_002620
Chlamydophila pneumoniae TW-183 | NC_005043
Chlorobium tepidum TLS | NC_002932
Chromobacterium violaceum ATCC 12472 | NC_005085
Clostridium tetani E88 | NC_004557
Corynebacterium efficiens YS-314 | NC_004369
Coxiella burnetii RSA 493 | NC_002971
Deinococcus radiodurans | NC_001263, NC_001264
Escherichia coli K12 | NC_000913
Fusobacterium nucleatum subsp. nucleatum ATCC 25586 | NC_003454
Haemophilus ducreyi 35000HP | NC_002940
Helicobacter hepaticus ATCC 51449 | NC_004917
Lactobacillus plantarum WCFS1 | NC_004567
Lactococcus lactis subsp. lactis | NC_002662
Leptospira interrogans serovar lai str. 56601 | NC_004342, NC_004343
Listeria innocua Clip11262 | NC_003212
Mycobacterium tuberculosis H37Rv | NC_000962
Mycoplasma gallisepticum R | NC_004829
Neisseria meningitidis serogroup B strain MC58 | NC_003112
Nitrosomonas europaea ATCC 19718 | NC_004757
Nostoc sp. PCC 7120 | NC_003272
Oceanobacillus iheyensis HTE831 | NC_004193
Pirellula sp. | NC_005027
Porphyromonas gingivalis W83 | NC_002950
Prochlorococcus marinus str. MIT 9313 | NC_005071
Pseudomonas aeruginosa PAO1 | NC_002516
Ralstonia solanacearum | NC_003295
Rickettsia conorii Malish 7 | NC_003103
Salmonella enterica subsp. enterica serovar Typhi | NC_003198
Shewanella oneidensis MR-1 | NC_004347
Shigella flexneri 2a str. 2457T | NC_004741
Sinorhizobium meliloti 1021 | NC_003047
Staphylococcus aureus subsp. aureus MW2 | NC_003923
Streptococcus agalactiae | NC_004116
Streptomyces avermitilis MA-4680 | NC_003155
Synechococcus sp. WH 8102 | NC_005070
Thermoanaerobacter tengcongensis strain MB4T | NC_003869
Thermosynechococcus elongatus BP-1 | NC_004113
Thermotoga maritima | NC_000853
Treponema pallidum | NC_000919
Tropheryma whipplei TW08/27 | NC_004551
Ureaplasma urealyticum | NC_002162
Vibrio cholerae | NC_002505, NC_002506
Wigglesworthia glossinidia endosymbiont of Glossina brevipalpis | NC_004344
Xanthomonas axonopodis pv. citri str. 306 | NC_003919
Xylella fastidiosa Temecula1 | NC_004556
Yersinia pestis strain CO92 | NC_003143
Results:
For 52 of the 59 genomes, the number, location and direction of the predicted 16S rRNA genes are identical to the GenBank annotation. The other 7 genomes show differences, which are listed below.
Num | Bacteria Name | SPCR total | SPCR plus | SPCR minus | GenBank total | GenBank plus | GenBank minus
A | BBUR Borrelia burgdorferi | 1 | 0 | 1 | 1 | 1 | 0
B | Clostridium tetani E88 | 6 | 2 | 4 | 5 | 2 | 3
C | Lactococcus lactis | 12 | 2 | 10 | 6 | 1 | 5
D | Salmonella enterica subsp. Typhi | 8 | 2 | 6 | 7 | 2 | 5
E | Ureaplasma urealyticum | 4 | 4 | 0 | 2 | 2 | 0
F | Vibrio cholerae | 8 | 5 | 3 | 8 | 7 | 1
G | Yersinia pestis strain CO92 | 5 | 2 | 3 | 6 | 2 | 4
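The seven discrepancies are not all of the same kind: some genomes differ in the total copy number, others only in strand assignment. The following is a minimal sketch that sorts them into those groups, using the plus/minus counts copied from the table above. The category labels and the `classify` helper are illustrative assumptions, not part of SPCR.

```python
# (name, SPCR plus, SPCR minus, GenBank plus, GenBank minus), copied from the table
ROWS = [
    ("Borrelia burgdorferi", 0, 1, 1, 0),
    ("Clostridium tetani E88", 2, 4, 2, 3),
    ("Lactococcus lactis", 2, 10, 1, 5),
    ("Salmonella enterica subsp. Typhi", 2, 6, 2, 5),
    ("Ureaplasma urealyticum", 4, 0, 2, 0),
    ("Vibrio cholerae", 5, 3, 7, 1),
    ("Yersinia pestis strain CO92", 2, 3, 2, 4),
]


def classify(sp_plus, sp_minus, gb_plus, gb_minus):
    """Label a discrepancy by comparing total copy numbers (hypothetical labels)."""
    diff = (sp_plus + sp_minus) - (gb_plus + gb_minus)
    if diff > 0:
        return "more copies predicted than annotated"
    if diff < 0:
        return "fewer copies predicted than annotated"
    return "same total, strand assignment differs"


summary = {name: classify(*counts) for name, *counts in ROWS}

for name, label in summary.items():
    print(f"{name}: {label}")
```

On these counts, two genomes differ only in strand assignment, four have more predicted copies than annotated, and one has fewer.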
Verification:
We checked the 7 discrepant results by sequence alignment and BLAST, and identified the cause of each difference.
(2) Predict the SOX gene family from the Homo sapiens genome
Primer sequences:
Primer1: CCMATGAAYGCSTTYATSGTSTGG
Primer2: GGYYKRTAYTTRTARTYSGG
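Both SOX primers are degenerate: letters such as M, Y, S, K and R are standard IUPAC ambiguity codes that each stand for more than one base. The sketch below counts how many concrete sequences each primer represents and builds a regular expression for exact matching. It only illustrates the notation; SPCR scores partial similarity through its I up / I dn parameters, which this exact-match helper does not attempt to reproduce. The function names are hypothetical.

```python
import re
from math import prod

# Standard IUPAC nucleotide ambiguity codes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct non-degenerate sequences the primer encodes."""
    return prod(len(IUPAC[base]) for base in primer.upper())


def to_regex(primer):
    """Regular expression matching every sequence the primer encodes."""
    parts = []
    for base in primer.upper():
        options = IUPAC[base]
        parts.append(options if len(options) == 1 else f"[{options}]")
    return "".join(parts)


PRIMER1 = "CCMATGAAYGCSTTYATSGTSTGG"
PRIMER2 = "GGYYKRTAYTTRTARTYSGG"

print(degeneracy(PRIMER1))  # 64
print(degeneracy(PRIMER2))  # 512
print(to_regex(PRIMER1))
```

Primer1 has six two-fold degenerate positions and Primer2 has nine, which is where the counts of 64 and 512 come from.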
Parameters:
I up = 0.9, I dn = 0.9, P a = 0.9, L max = 800, L min = 80
Templates:
Homo sapiens whole genome (build 25)
Results:
No. | P a | Length | Chromosome | I up | I dn | SOX genes
Spcr_seq1 | 0.971468 | 205 | 13 | 1.000000 | 0.971468 | SOX21
Spcr_seq2 | 0.947803 | 205 | 17 | 0.975639 | 0.971468 | SOX20
Spcr_seq3 | 0.925834 | 205 | 13 | 1.000000 | 0.925834 | SOX1
Spcr_seq4 | 0.931180 | 205 | 20 | 0.958528 | 0.971468 | SOX22
Spcr_seq5 | 0.925834 | 401 | 20 | 1.000000 | 0.925834 | SOX18
Spcr_seq6 | 0.983530 | 205 | 3 | 0.983530 | 1.000000 | SOX14
Spcr_seq7 | 0.971468 | 205 | 6 | 1.000000 | 0.971468 | SOX4
Spcr_seq8 | 0.912196 | 203 | 8 | 0.940659 | 0.969741 | SOX29
Spcr_seq9 | 0.954924 | 205 | X | 0.983530 | 0.970915 | SOX3
Spcr_seq10 | 0.952519 | 205 | Y | 0.983530 | 0.968469 | SRY
Verification:
Sequence alignment against the SOX gene family and BLAST analysis show that every SPCR product has a counterpart in the SOX gene family. Because the human genome appears to contain more than 10 SOX genes, we lowered the thresholds to I up = 0.85, I dn = 0.85, P a = 0.85 (L max = 800, L min = 80), but this yielded only 2 additional SOX homologs together with 29 unrelated products.
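The trade-off between thresholds and the number of products can be illustrated on the ten reported SOX products. The sketch below assumes the three scores act as simple lower bounds and that the length must fall within [L min, L max]; how SPCR applies the parameters internally is not described here, so `keep` is a hypothetical helper. Only the products reported at 0.9 are available, so the example tightens the thresholds rather than loosening them.

```python
from typing import NamedTuple


class Product(NamedTuple):
    gene: str
    p_a: float
    length: int
    i_up: float
    i_dn: float


# Values copied from the SOX results table
PRODUCTS = [
    Product("SOX21", 0.971468, 205, 1.000000, 0.971468),
    Product("SOX20", 0.947803, 205, 0.975639, 0.971468),
    Product("SOX1", 0.925834, 205, 1.000000, 0.925834),
    Product("SOX22", 0.931180, 205, 0.958528, 0.971468),
    Product("SOX18", 0.925834, 401, 1.000000, 0.925834),
    Product("SOX14", 0.983530, 205, 0.983530, 1.000000),
    Product("SOX4", 0.971468, 205, 1.000000, 0.971468),
    Product("SOX29", 0.912196, 203, 0.940659, 0.969741),
    Product("SOX3", 0.954924, 205, 0.983530, 0.970915),
    Product("SRY", 0.952519, 205, 0.983530, 0.968469),
]


def keep(products, threshold, l_min=80, l_max=800):
    """Genes whose three scores all reach `threshold` and whose length is in range.

    Assumption: one shared lower bound for I up, I dn and P a, as in the
    parameter sets quoted in this section.
    """
    return [
        p.gene
        for p in products
        if min(p.i_up, p.i_dn, p.p_a) >= threshold and l_min <= p.length <= l_max
    ]


at_090 = keep(PRODUCTS, 0.90)
at_095 = keep(PRODUCTS, 0.95)
print(len(at_090), at_090)
print(len(at_095), at_095)
```

Under these assumptions all ten products pass at 0.90 and five remain at 0.95, which mirrors in the opposite direction the growth in candidates the authors saw when lowering the thresholds to 0.85.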
(3) Predict ARR5, ARR7 and GEN12, GEN13 gene family (VPCR)
We repeated the experiments from the VPCR paper with our SPCR program; the results are much better than those of VPCR.
Primer sequences:
ARR5:
ARR5a = GTTGATTCTCTCTATCTCTCTCACG
ARR5b = CACACCACCATTTTACATATCTC
ARR7:
ARR7a = GTTGGTGAGGTCATGAGGATGGAGATTC
ARR7b = GTTTTGCTAAGGTCTTGGCCTCTATACAT
GEN12:
GEN1 = CATGTTCTTGCYGTYGATGAYAGT
GEN2 = CCAGTCATKCCAGGCATWSAG
GEN13:
GEN1 = CATGTTCTTGCYGTYGATGAYAGT
GEN3 = ATAARAAATCYTCAGCWCCTTC
Parameters:
ARR5 and ARR7: I up = 0.90, I dn = 0.90, P a = 0.90, L max = 3200 bp, L min = 200 bp
GEN12 and GEN13: I up = 0.85, I dn = 0.85, P a = 0.85, L max = 3200 bp, L min = 200 bp
Templates:
Arabidopsis thaliana genome sequences
Results:
Prediction of ARR5 and ARR7
Gene | P a | Length | Chromosome | Start | End | I up | I dn
ARR5 | 1.0 | 1929 | 3 | 17635890 | 17637819 | 1.0 | 1.0
ARR7 | 1.0 | 1187 | 1 | 6577884 | 6579071 | 1.0 | 1.0
Prediction of GEN12
P a | Product Length | Chromosome | Start | End | I up | I dn
0.972001 | 405 | 3 | 17637290 | 17637695 | 0.972001 | 1.000000
0.939634 | 380 | 1 | 3443279 | 3443659 | 1.000000 | 0.939634
0.929476 | 367 | 5 | 24547541 | 24547908 | 0.953077 | 0.975237
0.918687 | 440 | 2 | 17170848 | 17171288 | 0.963917 | 0.953077
0.892865 | 554 | 1 | 6578451 | 6579005 | 0.892865 | 1.000000
0.863344 | 392 | 2 | 16918887 | 16919279 | 0.925680 | 0.932659
Prediction of GEN13
P a | Product Length | Chromosome | Start | End | I up | I dn
1.000000 | 1410 | 3 | 17636285 | 17637695 | 1.000000 | 1.000000
0.933932 | 872 | 1 | 6578133 | 6579005 | 0.933932 | 1.000000
0.920168 | 1186 | 5 | 24546722 | 24547908 | 0.943533 | 0.975237
0.912305 | 689 | 1 | 3442970 | 3443659 | 0.970915 | 0.939634
0.896893 | 833 | 1 | 27343239 | 27344072 | 0.896893 | 1.000000
0.889511 | 828 | 3 | 20987119 | 20987947 | 0.963917 | 0.922808
0.889511 | 936 | 2 | 17170848 | 17171784 | 0.963917 | 0.922808
Verification:
SPCR predicted the ARR5 and ARR7 genes exactly from the Arabidopsis thaliana genome sequences. For GEN12 and GEN13, SPCR predicted 6 and 7 products, respectively, and BLAST analysis shows that all of them are ARR family genes. This result is much better than that of VPCR: we could find almost no significant relationship between the VPCR predictions and the real-world PCR control, whereas the SPCR results match the real-world PCR almost perfectly.
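Part of this relationship can be read directly from the coordinates: some GEN12 and GEN13 products lie entirely inside the ARR5 and ARR7 loci predicted with the specific primers. The sketch below checks this by simple interval containment on the tabulated start and end positions. `locus_of` is a hypothetical helper, and a result of `None` only means the product lies outside these two loci; according to the BLAST analysis reported here, those products correspond to other ARR family genes.

```python
# Loci predicted with the specific primers: gene -> (chromosome, start, end)
ARR_LOCI = {
    "ARR5": ("3", 17635890, 17637819),
    "ARR7": ("1", 6577884, 6579071),
}

# (chromosome, start, end), copied from the GEN12 and GEN13 tables
GEN12 = [
    ("3", 17637290, 17637695), ("1", 3443279, 3443659),
    ("5", 24547541, 24547908), ("2", 17170848, 17171288),
    ("1", 6578451, 6579005), ("2", 16918887, 16919279),
]
GEN13 = [
    ("3", 17636285, 17637695), ("1", 6578133, 6579005),
    ("5", 24546722, 24547908), ("1", 3442970, 3443659),
    ("1", 27343239, 27344072), ("3", 20987119, 20987947),
    ("2", 17170848, 17171784),
]


def locus_of(chrom, start, end):
    """Name of the ARR locus that fully contains the product, or None."""
    for gene, (c, s, e) in ARR_LOCI.items():
        if chrom == c and s <= start and end <= e:
            return gene
    return None


gen12_hits = [locus_of(*p) for p in GEN12]
gen13_hits = [locus_of(*p) for p in GEN13]
print(gen12_hits)
print(gen13_hits)
```

In both sets, one product falls inside the ARR5 locus and one inside the ARR7 locus, so the degenerate GEN primers recover the same two genes that the specific primers target.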
(4) Predict products of ERIC-PCR from E. coli genome
Primer sequences:
Primer 1 (ERIC2): AAGTAAGTGACTGGGGTGAGCG
Primer 2 (ERIC1R): ATGTAAGCTCCTGGGGATTCAC
Parameters:
I up = 0.80, I dn = 0.80, P a = 0.70, L max = 3200 bp, L min = 200 bp
Templates:
E. coli MG1655 genome sequence
Results:
Prediction of E. coli ERIC-PCR
P a | Product Length | Direction | Start | End | I up | I dn
0.795308 | 3184 | 1->2 | 2651306 | 2654490 | 0.850454 | 0.935157
0.706402 | 3088 | 1->1 | 1276797 | 1279885 | 0.830617 | 0.850454
0.702913 | 2955 | 2->1 | 1894652 | 1897607 | 0.817093 | 0.86026
0.788584 | 2934 | 1->1 | 127817 | 130751 | 0.973009 | 0.810459
0.778256 | 2789 | 1->2 | 250611 | 253400 | 0.800251 | 0.972514
0.703985 | 2719 | 1->2 | 3724115 | 3726834 | 0.840574 | 0.837505
0.829637 | 2700 | 2->1 | 1275937 | 1278637 | 0.877381 | 0.945583
0.703092 | 2674 | 1->2 | 3779473 | 3782147 | 0.810459 | 0.867523
0.711631 | 2535 | 1->1 | 2789722 | 2792257 | 0.800251 | 0.889259
0.784583 | 2244 | 2->2 | 251156 | 253400 | 0.806758 | 0.972514
0.709959 | 2069 | 1->2 | 4175218 | 4177287 | 0.800251 | 0.887171
0.774671 | 2029 | 1->2 | 190675 | 192704 | 0.936337 | 0.827341
0.806758 | 1935 | 2->2 | 802608 | 804543 | 1.0 | 0.806758
0.706402 | 1935 | 1->1 | 285167 | 287102 | 0.850454 | 0.830617
0.827499 | 1903 | 1->1 | 435533 | 437436 | 0.850454 | 0.973009
0.785417 | 1840 | 1->1 | 1276797 | 1278637 | 0.830617 | 0.945583
0.703092 | 1740 | 2->1 | 2138318 | 2140058 | 0.867523 | 0.810459
0.741863 | 1476 | 1->1 | 2654491 | 2655967 | 0.927038 | 0.800251
0.772629 | 1330 | 2->1 | 1277307 | 1278637 | 0.817093 | 0.945583
0.711631 | 1322 | 1->1 | 2790935 | 2792257 | 0.800251 | 0.889259
0.71487 | 1078 | 1->1 | 3137589 | 3138667 | 0.840574 | 0.850454
0.810459 | 922 | 1->1 | 595343 | 596265 | 0.810459 | 1.0
0.701414 | 555 | 2->2 | 3072576 | 3073131 | 0.837505 | 0.837505
0.747895 | 487 | 1->2 | 2654491 | 2654978 | 0.927038 | 0.806758
0.8149 | 464 | 1->2 | 127817 | 128281 | 0.973009 | 0.837505
0.701874 | 260 | 2->1 | 699036 | 699296 | 0.806758 | 0.869994
0.749305 | 220 | 1->1 | 4580750 | 4580970 | 0.936337 | 0.800251
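Two regularities in the tabulated values are worth noting when reading this table: the product length equals End minus Start, and P a agrees with the product of I up and I dn to within rounding. The second is an observation about the numbers printed here, not a definition of P a given in this section, so treat it as an assumption. A minimal sketch that checks both on the first five rows:

```python
# (P a, length, start, end, I up, I dn), first five rows of the ERIC-PCR table
ROWS = [
    (0.795308, 3184, 2651306, 2654490, 0.850454, 0.935157),
    (0.706402, 3088, 1276797, 1279885, 0.830617, 0.850454),
    (0.702913, 2955, 1894652, 1897607, 0.817093, 0.86026),
    (0.788584, 2934, 127817, 130751, 0.973009, 0.810459),
    (0.778256, 2789, 250611, 253400, 0.800251, 0.972514),
]

TOLERANCE = 1e-5  # allows for the six-decimal rounding of the printed scores


def row_is_consistent(p_a, length, start, end, i_up, i_dn):
    """True if length == end - start and P a is close to I up * I dn."""
    return length == end - start and abs(p_a - i_up * i_dn) < TOLERANCE


checks = [row_is_consistent(*row) for row in ROWS]
print(checks)
```

If the product relationship holds in general, the parameter set used here (I up = 0.80, I dn = 0.80, P a = 0.70) would make the P a threshold the binding one for products whose two primer scores are both near 0.8.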
Verification:
ERIC-PCR is a form of random-amplification PCR. The process is very complex at the molecular level, so it is hard to find a suitable mathematical model for it and difficult to simulate it computationally. Nevertheless, counterparts of most of the significant bands in the real ERIC-PCR pattern can be found in the SPCR prediction pattern.
(5) Predict PCR products from coronavirus genomes
Primer sequences:
Primer pair 1:
upstream: IN-2(+): 5'-GGGTTGGGACTATCCTAAGTGTGA-3'
downstream: IN-4(-): 5'-TAACACACAACNCCATCATCA-3'
Primer pair 2:
upstream: 2Bp: 5'-ACTCARWTRAATYTNAAATAYGC-3'
downstream: 4Bm: 5'-TCACAYTTWGGATARTCCCA-3'
Parameters:
I up = 0.90, I dn = 0.90, P a = 0.90, L max = 3200 bp, L min = 200 bp
Templates:
7 complete coronavirus genomes
1. Porcine epidemic diarrhea virus (28,033 bp)
2. Human coronavirus 229E (27,317 bp)
3. Transmissible gastroenteritis virus (28,586 bp)
4. Murine hepatitis virus (31,100 bp)
5. Bovine coronavirus (31,276 bp)
6. Avian infectious bronchitis virus (27,608 bp)
7. SARS coronavirus (29,751 bp)
Results:
Number | Accession Number | Length (bp) | SPCR product of primer pair 1 (IN-2(+)/IN-4(-)) | SPCR product of primer pair 2 (2Bp/4Bm)
1 | NC_003436 | 28,033 | 14424 (453 bp) | 14197 (251 bp)
2 | AF304460 | 27,317 | 14324 (453 bp) | 13915 (251 bp)
3 | AJ271965 | 28,586 | 14142 (453 bp) | 14921 (251 bp)
4 | AF220295 | 31,100 | 15148 (453 bp) | 14097 (251 bp)
5 | AF201929 | 31,276 | 15246 (453 bp) | 14987 (251 bp)
6 | M95169 | 27,608 | 14179 (453 bp) | 13952 (251 bp)
7 | AY274119 | 29,751 | 15214 (453 bp) | 15019 (251 bp)
SPCR-predicted PCR products for the 7 complete coronavirus genomes.
The product length is 453 bp with primer pair 1 and 251 bp with primer pair 2.
Templates:
14 other coronavirus species containing polymerase gene
1. Murine-Cov open reading frame 1a (gene 1), complete cds and open reading frame 1b (gene 1), 3' end.
2. Human coronavirus 229E mRNA for RNA polymerase and proteases.
3. Transmissible gastroenteritis virus (Purdue-115) mRNA for polymerase locus.
4. Mouse hepatitis virus RNA for viral polymerase open reading frame 1b.
5. Bovine coronavirus RNA-directed RNA polymerase (pol) gene
6. Avian infectious bronchitis virus mRNA for chimeric gene.
7. Human coronavirus (strain OC43) RNA-directed RNA polymerase (pol) gene.
8. Turkey coronavirus RNA-directed RNA polymerase (pol) gene.
9. Rat sialodacryoadenitis coronavirus RNA-directed RNA polymerase (pol) gene.
10. Porcine hemagglutinating encephalomyelitis virus RNA-directed RNA polymerase (pol) gene.
11. Feline infectious peritonitis virus RNA-directed RNA polymerase (pol) gene.
12. Porcine transmissible gastroenteritis virus RNA-directed RNA polymerase (pol) gene.
13. Infectious bronchitis virus RNA (defective RNA CD-61).
14. Canine coronavirus RNA-directed RNA polymerase (pol) gene, partial cds.
Results:
Number | Accession Number | Length (bp) | Primer pair 1 (IN-2(+)/IN-4(-)) | Primer pair 2 (2Bp/4Bm)
1 | M55148 | 21798 | 15467 (453 bp) | 15240 (251 bp)
2 | X69721 | 20580 | 14324 (453 bp) | 14097 (251 bp)
3 | Z34093 | 20680 | 14222 (453 bp) | 13995 (251 bp)
4 | X51939 | 8473 | 2108 (453 bp) | 1881 (251 bp)
5 | AF124985 | 922 | 330 (453 bp) | 103 (251 bp)
6 | Z30541 | 9081 | 2890 (453 bp) | 2663 (251 bp)
7 | AF124989 | 922 | 330 (453 bp) | 103 (251 bp)
8 | AF124991 | 919 | 327 (453 bp) | 100 (251 bp)
9 | AF124990 | 922 | 330 (453 bp) | 103 (251 bp)
10 | AF124988 | 922 | 330 (453 bp) | 103 (251 bp)
11 | AF124987 | 922 | 330 (453 bp) | 103 (251 bp)
12 | AF124992 | 922 | 330 (453 bp) | 103 (251 bp)
13 | Z69629 | 6127 | 2890 (453 bp) | 2663 (251 bp)
14 | AF124986 | 922 | 330 (453 bp) | 103 (251 bp)
SPCR-predicted PCR products for the 14 other coronavirus sequences containing the polymerase gene. The product length is 453 bp with primer pair 1 and 251 bp with primer pair 2.
Verification:
The following articles support our results (Ksiazek, Erdman et al. 2003; Stephensen, Casebolt et al. 1999). Some of the SPCR predictions have also been verified in our lab: we amplified IBV and TGEV in a wet lab experiment, and the resulting PCR products conform to the SPCR predictions.