
 

Test cases used to evaluate the SPCR program

 

(1) Predict 16S rRNA genes from bacterial genomes

Primer sequences:

Primer1: GAGAGTTTGATCCTGGCTCAG

Primer2: CTACGGCTACCTTGTTACGA

Parameters:

I_up = 0.85, I_dn = 0.85, P_a = 0.85, L_max = 3200 bp, L_min = 200 bp
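The five parameters act as acceptance criteria for candidate products, but this page does not spell out the exact rule. The sketch below assumes, as our own reading, that a candidate is kept only when I_up, I_dn and P_a each reach their threshold and its length lies within [L_min, L_max]. The `Params` and `Candidate` classes, the `passes` function and all candidate values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Params:
    """SPCR-style thresholds; the defaults are the 16S rRNA settings above."""
    i_up: float = 0.85
    i_dn: float = 0.85
    p_a: float = 0.85
    l_max: int = 3200
    l_min: int = 200


@dataclass
class Candidate:
    """A hypothetical candidate product and its scores."""
    i_up: float  # score of the upstream primer site (assumed meaning)
    i_dn: float  # score of the downstream primer site (assumed meaning)
    p_a: float   # overall product score (assumed meaning)
    length: int  # product length in bp


def passes(c: Candidate, p: Params) -> bool:
    """Assumed rule: every score meets its threshold and L_min <= length <= L_max."""
    return (
        c.i_up >= p.i_up
        and c.i_dn >= p.i_dn
        and c.p_a >= p.p_a
        and p.l_min <= c.length <= p.l_max
    )


if __name__ == "__main__":
    params = Params()
    examples = [
        Candidate(i_up=1.0, i_dn=0.97, p_a=0.97, length=1500),   # all checks pass
        Candidate(i_up=0.95, i_dn=0.80, p_a=0.76, length=2603),  # weak downstream site
        Candidate(i_up=0.99, i_dn=0.99, p_a=0.98, length=150),   # shorter than L_min
    ]
    for c in examples:
        print(c.length, passes(c, params))
```

With the default thresholds only the first candidate is kept; lowering the three score thresholds to 0.75 would also admit the second one.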

Templates:

The complete genome sequences of 59 eubacteria were selected as templates; they are listed in the following table:

| Bacterium | Accession number |
| --- | --- |
| Agrobacterium tumefaciens str. C58 (U. Washington) | NC_003304, NC_003305 |
| Aquifex aeolicus | NC_000918 |
| Bacillus subtilis | NC_000964 |
| Bacteroides thetaiotaomicron VPI-5482 | NC_004663 |
| BBUR Borrelia burgdorferi | NC_001318 |
| Bifidobacterium longum NCC2705 | NC_004307 |
| Bordetella pertussis | NC_002929 |
| Bradyrhizobium japonicum | NC_004463 |
| Brucella melitensis | NC_003317, NC_003318 |
| Buchnera aphidicola str. APS (Acyrthosiphon pisum) | NC_002528 |
| Candidatus Blochmannia floridanus | NC_005061 |
| Caulobacter crescentus CB15 | NC_002696 |
| Chlamydia muridarum | NC_002620 |
| Chlamydophila pneumoniae TW-183 | NC_005043 |
| Chlorobium tepidum TLS | NC_002932 |
| Chromobacterium violaceum ATCC 12472 | NC_005085 |
| Clostridium tetani E88 | NC_004557 |
| Corynebacterium efficiens YS-314 | NC_004369 |
| Coxiella burnetii RSA 493 | NC_002971 |
| Deinococcus radiodurans | NC_001263, NC_001264 |
| Escherichia coli K12 | NC_000913 |
| Fusobacterium nucleatum subsp. nucleatum ATCC 25586 | NC_003454 |
| Haemophilus ducreyi 35000HP | NC_002940 |
| Helicobacter hepaticus ATCC 51449 | NC_004917 |
| Lactobacillus plantarum WCFS1 | NC_004567 |
| Lactococcus lactis subsp. lactis | NC_002662 |
| Leptospira interrogans serovar Lai str. 56601 | NC_004342, NC_004343 |
| Listeria innocua Clip11262 | NC_003212 |
| Mycobacterium tuberculosis H37Rv | NC_000962 |
| Mycoplasma gallisepticum R | NC_004829 |
| Neisseria meningitidis serogroup B strain MC58 | NC_003112 |
| Nitrosomonas europaea ATCC 19718 | NC_004757 |
| Nostoc sp. PCC 7120 | NC_003272 |
| Oceanobacillus iheyensis HTE831 | NC_004193 |
| Pirellula sp. | NC_005027 |
| Porphyromonas gingivalis W83 | NC_002950 |
| Prochlorococcus marinus str. MIT 9313 | NC_005071 |
| Pseudomonas aeruginosa PAO1 | NC_002516 |
| Ralstonia solanacearum | NC_003295 |
| Rickettsia conorii Malish 7 | NC_003103 |
| Salmonella enterica subsp. enterica serovar Typhi | NC_003198 |
| Shewanella oneidensis MR-1 | NC_004347 |
| Shigella flexneri 2a str. 2457T | NC_004741 |
| Sinorhizobium meliloti 1021 | NC_003047 |
| Staphylococcus aureus subsp. aureus MW2 | NC_003923 |
| Streptococcus agalactiae | NC_004116 |
| Streptomyces avermitilis MA-4680 | NC_003155 |
| Synechococcus sp. WH 8102 | NC_005070 |
| Thermoanaerobacter tengcongensis strain MB4T | NC_003869 |
| Thermosynechococcus elongatus BP-1 | NC_004113 |
| Thermotoga maritima | NC_000853 |
| Treponema pallidum | NC_000919 |
| Tropheryma whipplei TW08/27 | NC_004551 |
| Ureaplasma urealyticum | NC_002162 |
| Vibrio cholerae | NC_002505, NC_002506 |
| Wigglesworthia glossinidia endosymbiont of Glossina brevipalpis | NC_004344 |
| Xanthomonas axonopodis pv. citri str. 306 | NC_003919 |
| Xylella fastidiosa Temecula1 | NC_004556 |
| Yersinia pestis strain CO92 | NC_003143 |

Results:

For 52 of the bacteria, the number, location and orientation of the predicted 16S rRNA genes agree exactly with the GenBank annotation. Discrepancies were found in the remaining 7 bacteria; they are listed below, and their causes are given under Verification.

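| No. | Bacterium | SPCR total | SPCR plus | SPCR minus | GenBank total | GenBank plus | GenBank minus |
| --- | --- | --- | --- | --- | --- | --- | --- |
| A | BBUR Borrelia burgdorferi | 1 | 0 | 1 | 1 | 1 | 0 |
| B | Clostridium tetani E88 | 6 | 2 | 4 | 5 | 2 | 3 |
| C | Lactococcus lactis | 12 | 2 | 10 | 6 | 1 | 5 |
| D | Salmonella enterica serovar Typhi | 8 | 2 | 6 | 7 | 2 | 5 |
| E | Ureaplasma urealyticum | 4 | 4 | 0 | 2 | 2 | 0 |
| F | Vibrio cholerae | 8 | 5 | 3 | 8 | 7 | 1 |
| G | Yersinia pestis strain CO92 | 5 | 2 | 3 | 6 | 2 | 4 |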

Verification:

We examined the 7 discrepant results by sequence alignment and BLAST and identified the cause of each discrepancy:

  1. BBUR Borrelia burgdorferi: Alignment and BLAST analysis show that the orientation of its single 16S rRNA gene is mislabeled in GenBank. The gene lies on the minus strand of the genome, not on the plus strand.
  2. Clostridium tetani E88: Alignment analysis shows that one of the 16S rRNA sequences is mislabeled as a 5S rRNA in GenBank.
  3. Lactococcus lactis: The results match GenBank when the parameters are changed to I_up = 0.9, I_dn = 0.9, I_ave = 0.9.
  4. Salmonella enterica serovar Typhi: The results match GenBank when the parameters are changed to I_up = 0.9, I_dn = 0.9, I_ave = 0.9.
  5. Ureaplasma urealyticum: The results match GenBank when the parameters are changed to I_up = 0.86, I_dn = 0.86, I_ave = 0.86.
  6. Vibrio cholerae: Two 16S rRNA genes on the minus strand are mislabeled as plus-strand genes in GenBank.
  7. Yersinia pestis strain CO92: When the thresholds are lowered to 0.78, SPCR predicts 6 products whose genomic locations agree with GenBank. However, the downstream primer of the last predicted product does not dock to the template correctly: its coefficient is only 0.8, and the product length is 2603 bp. This is the only uncertain product in this test, and it could not be resolved.
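For Borrelia burgdorferi, Clostridium tetani and Vibrio cholerae, the explanations above attribute the discrepancy to the GenBank annotation. As a sanity check, the sketch below applies those explanations as strand-count corrections to the GenBank counts and compares the result with the SPCR counts from the table. The `corrected` helper is our own illustration, and the strand of the Clostridium gene mislabeled as 5S is inferred from the counts rather than stated in the text.

```python
# (plus, minus) counts of 16S rRNA genes, copied from the discrepancy table above
SPCR = {"Borrelia burgdorferi": (0, 1), "Clostridium tetani E88": (2, 4), "Vibrio cholerae": (5, 3)}
GENBANK = {"Borrelia burgdorferi": (1, 0), "Clostridium tetani E88": (2, 3), "Vibrio cholerae": (7, 1)}


def corrected(counts, plus_to_minus=0, missing_plus=0, missing_minus=0):
    """Apply annotation fixes to a (plus, minus) pair of gene counts.

    plus_to_minus: genes annotated on the plus strand that actually lie on the minus strand.
    missing_plus / missing_minus: 16S genes absent from the annotation (e.g. labeled as 5S).
    """
    plus, minus = counts
    return (plus - plus_to_minus + missing_plus, minus + plus_to_minus + missing_minus)


FIXES = {
    "Borrelia burgdorferi": corrected(GENBANK["Borrelia burgdorferi"], plus_to_minus=1),
    # Strand of the gene labeled as 5S is inferred from the counts (assumption).
    "Clostridium tetani E88": corrected(GENBANK["Clostridium tetani E88"], missing_minus=1),
    "Vibrio cholerae": corrected(GENBANK["Vibrio cholerae"], plus_to_minus=2),
}

if __name__ == "__main__":
    for name, counts in FIXES.items():
        print(f"{name}: corrected GenBank {counts}, SPCR {SPCR[name]}, match={counts == SPCR[name]}")
```

In all three cases the corrected GenBank counts equal the SPCR counts.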

 

(2) Predict the SOX gene family from the Homo sapiens genome

Primer sequences:

Primer1: CCMATGAAYGCSTTYATSGTSTGG

Primer2: GGYYKRTAYTTRTARTYSGG
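Both SOX primers are degenerate: letters such as M, Y, S, K and R are IUPAC ambiguity codes, each standing for two bases (N stands for all four). The number of concrete sequences a degenerate primer represents is the product of the per-position base counts. The short sketch below computes it and can also enumerate the sequences; the function names are ours.

```python
from itertools import product

# IUPAC nucleotide codes and the bases each one stands for
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct concrete sequences encoded by a degenerate primer."""
    n = 1
    for code in primer.upper():
        n *= len(IUPAC[code])
    return n


def expand(primer: str):
    """Yield every concrete sequence encoded by a degenerate primer."""
    for bases in product(*(IUPAC[c] for c in primer.upper())):
        yield "".join(bases)


if __name__ == "__main__":
    print(degeneracy("CCMATGAAYGCSTTYATSGTSTGG"))  # Primer1: 64
    print(degeneracy("GGYYKRTAYTTRTARTYSGG"))      # Primer2: 512
```

Primer1 has six twofold positions (2^6 = 64) and Primer2 has nine (2^9 = 512).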

Parameters:

I_up = 0.9, I_dn = 0.9, P_a = 0.9, L_max = 800 bp, L_min = 80 bp

Templates:

Homo sapiens whole genome (build 25)

Results:

| No. | P_a | Length (bp) | Chromosome | I_up | I_dn | SOX gene |
| --- | --- | --- | --- | --- | --- | --- |
| Spcr_seq1 | 0.971468 | 205 | 13 | 1.000000 | 0.971468 | SOX21 |
| Spcr_seq2 | 0.947803 | 205 | 17 | 0.975639 | 0.971468 | SOX20 |
| Spcr_seq3 | 0.925834 | 205 | 13 | 1.000000 | 0.925834 | SOX1 |
| Spcr_seq4 | 0.931180 | 205 | 20 | 0.958528 | 0.971468 | SOX22 |
| Spcr_seq5 | 0.925834 | 401 | 20 | 1.000000 | 0.925834 | SOX18 |
| Spcr_seq6 | 0.983530 | 205 | 3 | 0.983530 | 1.000000 | SOX14 |
| Spcr_seq7 | 0.971468 | 205 | 6 | 1.000000 | 0.971468 | SOX4 |
| Spcr_seq8 | 0.912196 | 203 | 8 | 0.940659 | 0.969741 | SOX29 |
| Spcr_seq9 | 0.954924 | 205 | X | 0.983530 | 0.970915 | SOX3 |
| Spcr_seq10 | 0.952519 | 205 | Y | 0.983530 | 0.968469 | SRY |
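In every row of the table above, P_a agrees with the product I_up × I_dn to within rounding (for example, 0.975639 × 0.971468 ≈ 0.947803 for Spcr_seq2), and the same appears to hold in the later result tables. This relationship is inferred from the tabulated values, not taken from a description of SPCR's scoring. The sketch below checks it for all ten rows.

```python
# (name, P_a, I_up, I_dn) copied from the SOX results table above
ROWS = [
    ("Spcr_seq1", 0.971468, 1.000000, 0.971468),
    ("Spcr_seq2", 0.947803, 0.975639, 0.971468),
    ("Spcr_seq3", 0.925834, 1.000000, 0.925834),
    ("Spcr_seq4", 0.931180, 0.958528, 0.971468),
    ("Spcr_seq5", 0.925834, 1.000000, 0.925834),
    ("Spcr_seq6", 0.983530, 0.983530, 1.000000),
    ("Spcr_seq7", 0.971468, 1.000000, 0.971468),
    ("Spcr_seq8", 0.912196, 0.940659, 0.969741),
    ("Spcr_seq9", 0.954924, 0.983530, 0.970915),
    ("Spcr_seq10", 0.952519, 0.983530, 0.968469),
]


def max_deviation(rows):
    """Largest |P_a - I_up * I_dn| over the rows (inferred relationship)."""
    return max(abs(p_a - i_up * i_dn) for _, p_a, i_up, i_dn in rows)


if __name__ == "__main__":
    print(f"largest deviation: {max_deviation(ROWS):.1e}")
```

All deviations are below 1e-5, consistent with the six-decimal rounding of the table.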

Verification:

Sequence alignment against the SOX gene family and BLAST analysis show that every SPCR product has a counterpart in the SOX gene family. Because the human genome appears to contain more than 10 SOX genes, we relaxed the parameters to I_up = 0.85, I_dn = 0.85, P_a = 0.85, L_max = 800 bp, L_min = 80 bp. This yielded only 2 additional SOX homologs, together with 29 unrelated products.

 

(3) Predict ARR5, ARR7 and the GEN12 and GEN13 gene families (VPCR test cases)

We repeated the experiments from the VPCR paper with SPCR; the results are much better than those of VPCR.

Primer sequences:

ARR5:

ARR5a = GTTGATTCTCTCTATCTCTCTCACG

ARR5b = CACACCACCATTTTACATATCTC

ARR7:

ARR7a = GTTGGTGAGGTCATGAGGATGGAGATTC

ARR7b = GTTTTGCTAAGGTCTTGGCCTCTATACAT

GEN12:

GEN1 = CATGTTCTTGCYGTYGATGAYAGT

GEN2 = CCAGTCATKCCAGGCATWSAG

GEN13:

GEN1 = CATGTTCTTGCYGTYGATGAYAGT

GEN3 = ATAARAAATCYTCAGCWCCTTC
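The GEN primers also contain IUPAC ambiguity codes, and a downstream primer anneals to the opposite strand, so a search for primer sites has to consider the reverse complement as well. The sketch below finds perfect, ambiguity-aware matches of a primer on both strands of a template. It is a simplification, not SPCR's method: SPCR scores imperfect sites with similarity coefficients (I_up, I_dn), whereas this sketch accepts only exact matches. The function names and the toy template are hypothetical.

```python
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Complement of every IUPAC code (R<->Y, K<->M, B<->V, D<->H; S, W, N unchanged)
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq: str) -> str:
    """Reverse complement that keeps IUPAC ambiguity codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def site_matches(primer: str, window: str) -> bool:
    """True if each template base is one of the bases allowed by the primer code."""
    return all(base in IUPAC[code] for code, base in zip(primer, window))


def find_sites(primer: str, template: str):
    """Yield (offset, strand) of perfect matches on the plus-strand template.

    "+": the primer sequence itself occurs on the plus strand (it extends rightwards).
    "-": its reverse complement occurs there, i.e. the primer sequence is on the
         minus strand (it extends leftwards).
    """
    k = len(primer)
    fwd, rev = primer.upper(), revcomp(primer)
    template = template.upper()
    for i in range(len(template) - k + 1):
        window = template[i:i + k]
        if site_matches(fwd, window):
            yield i, "+"
        if site_matches(rev, window):
            yield i, "-"


if __name__ == "__main__":
    # Hypothetical toy template: a concrete GEN1 instance on the plus strand and a
    # concrete GEN2 instance on the minus strand, separated by filler bases.
    template = "AAA" + "CATGTTCTTGCCGTTGATGACAGT" + "GGG" + revcomp("CCAGTCATGCCAGGCATACAG") + "TTT"
    print(list(find_sites("CATGTTCTTGCYGTYGATGAYAGT", template)))  # GEN1
    print(list(find_sites("CCAGTCATKCCAGGCATWSAG", template)))     # GEN2
```

In the toy template, GEN1 is found at offset 3 on the plus strand and GEN2 at offset 30 on the minus strand; a real search would replace exact matching with a similarity score and a threshold.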

Parameters:

ARR5 and ARR7: I_up = 0.90, I_dn = 0.90, P_a = 0.90, L_max = 3200 bp, L_min = 200 bp

GEN12 and GEN13: I_up = 0.85, I_dn = 0.85, P_a = 0.85, L_max = 3200 bp, L_min = 200 bp

Templates:

Arabidopsis thaliana genome sequences

Results:

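Prediction of ARR5 and ARR7

| Gene | P_a | Length (bp) | Chromosome | Start | End | I_up | I_dn |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ARR5 | 1.0 | 1929 | 3 | 17635890 | 17637819 | 1.0 | 1.0 |
| ARR7 | 1.0 | 1187 | 1 | 6577884 | 6579071 | 1.0 | 1.0 |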

Prediction of GEN12

| P_a | Product length (bp) | Chromosome | Start | End | I_up | I_dn |
| --- | --- | --- | --- | --- | --- | --- |
| 0.972001 | 405 | 3 | 17637290 | 17637695 | 0.972001 | 1.000000 |
| 0.939634 | 380 | 1 | 3443279 | 3443659 | 1.000000 | 0.939634 |
| 0.929476 | 367 | 5 | 24547541 | 24547908 | 0.953077 | 0.975237 |
| 0.918687 | 440 | 2 | 17170848 | 17171288 | 0.963917 | 0.953077 |
| 0.892865 | 554 | 1 | 6578451 | 6579005 | 0.892865 | 1.000000 |
| 0.863344 | 392 | 2 | 16918887 | 16919279 | 0.925680 | 0.932659 |

Prediction of GEN13

| P_a | Product length (bp) | Chromosome | Start | End | I_up | I_dn |
| --- | --- | --- | --- | --- | --- | --- |
| 1.000000 | 1410 | 3 | 17636285 | 17637695 | 1.000000 | 1.000000 |
| 0.933932 | 872 | 1 | 6578133 | 6579005 | 0.933932 | 1.000000 |
| 0.920168 | 1186 | 5 | 24546722 | 24547908 | 0.943533 | 0.975237 |
| 0.912305 | 689 | 1 | 3442970 | 3443659 | 0.970915 | 0.939634 |
| 0.896893 | 833 | 1 | 27343239 | 27344072 | 0.896893 | 1.000000 |
| 0.889511 | 828 | 3 | 20987119 | 20987947 | 0.963917 | 0.922808 |
| 0.889511 | 936 | 2 | 17170848 | 17171784 | 0.963917 | 0.922808 |

Verification:

SPCR predicted the ARR5 and ARR7 genes exactly from the Arabidopsis thaliana genome sequences. In the GEN12 and GEN13 cases, SPCR predicted 6 products with the GEN12 primers and 7 products with the GEN13 primers, and BLAST analysis shows that all of the predicted GEN12 and GEN13 sequences belong to the ARR gene family. This is much better than VPCR: we could find almost no significant relationship between the VPCR predictions and the real-world PCR controls, whereas the SPCR results match real-world PCR almost perfectly.
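A coordinate check supports part of this result: two of the six GEN12 products and two of the seven GEN13 products lie entirely inside the ARR5 and ARR7 products in the first table. The sketch below tests interval containment with the tabulated coordinates. It does not replace the BLAST analysis, which is what covers the remaining products; the tuple layout and function names are our own.

```python
# (chromosome, start, end) taken from the ARR5/ARR7, GEN12 and GEN13 tables above
ARR = {"ARR5": ("3", 17635890, 17637819), "ARR7": ("1", 6577884, 6579071)}
GEN12 = [
    ("3", 17637290, 17637695), ("1", 3443279, 3443659), ("5", 24547541, 24547908),
    ("2", 17170848, 17171288), ("1", 6578451, 6579005), ("2", 16918887, 16919279),
]
GEN13 = [
    ("3", 17636285, 17637695), ("1", 6578133, 6579005), ("5", 24546722, 24547908),
    ("1", 3442970, 3443659), ("1", 27343239, 27344072), ("3", 20987119, 20987947),
    ("2", 17170848, 17171784),
]


def contains(outer, inner):
    """True if `inner` lies entirely within `outer` on the same chromosome."""
    (c_out, s_out, e_out), (c_in, s_in, e_in) = outer, inner
    return c_out == c_in and s_out <= s_in and e_in <= e_out


def inside_arr(products):
    """Return (product, ARR name) for each product contained in an ARR locus."""
    return [(p, name) for p in products for name, locus in ARR.items() if contains(locus, p)]


if __name__ == "__main__":
    for label, products in (("GEN12", GEN12), ("GEN13", GEN13)):
        for prod, name in inside_arr(products):
            print(label, prod, "within", name)
```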

 

(4) Predict ERIC-PCR products from the E. coli genome

Primer sequences:

Primer 1 (ERIC2): AAGTAAGTGACTGGGGTGAGCG

Primer 2 (ERIC1R): ATGTAAGCTCCTGGGGATTCAC

Parameters:

I_up = 0.80, I_dn = 0.80, P_a = 0.70, L_max = 3200 bp, L_min = 200 bp

Templates:

E. coli MG1655 genome sequence

 

Results:

Prediction of E. coli ERIC-PCR

| P_a | Product length (bp) | Direction | Start | End | I_up | I_dn |
| --- | --- | --- | --- | --- | --- | --- |
| 0.795308 | 3184 | 1->2 | 2651306 | 2654490 | 0.850454 | 0.935157 |
| 0.706402 | 3088 | 1->1 | 1276797 | 1279885 | 0.830617 | 0.850454 |
| 0.702913 | 2955 | 2->1 | 1894652 | 1897607 | 0.817093 | 0.86026 |
| 0.788584 | 2934 | 1->1 | 127817 | 130751 | 0.973009 | 0.810459 |
| 0.778256 | 2789 | 1->2 | 250611 | 253400 | 0.800251 | 0.972514 |
| 0.703985 | 2719 | 1->2 | 3724115 | 3726834 | 0.840574 | 0.837505 |
| 0.829637 | 2700 | 2->1 | 1275937 | 1278637 | 0.877381 | 0.945583 |
| 0.703092 | 2674 | 1->2 | 3779473 | 3782147 | 0.810459 | 0.867523 |
| 0.711631 | 2535 | 1->1 | 2789722 | 2792257 | 0.800251 | 0.889259 |
| 0.784583 | 2244 | 2->2 | 251156 | 253400 | 0.806758 | 0.972514 |
| 0.709959 | 2069 | 1->2 | 4175218 | 4177287 | 0.800251 | 0.887171 |
| 0.774671 | 2029 | 1->2 | 190675 | 192704 | 0.936337 | 0.827341 |
| 0.806758 | 1935 | 2->2 | 802608 | 804543 | 1.0 | 0.806758 |
| 0.706402 | 1935 | 1->1 | 285167 | 287102 | 0.850454 | 0.830617 |
| 0.827499 | 1903 | 1->1 | 435533 | 437436 | 0.850454 | 0.973009 |
| 0.785417 | 1840 | 1->1 | 1276797 | 1278637 | 0.830617 | 0.945583 |
| 0.703092 | 1740 | 2->1 | 2138318 | 2140058 | 0.867523 | 0.810459 |
| 0.741863 | 1476 | 1->1 | 2654491 | 2655967 | 0.927038 | 0.800251 |
| 0.772629 | 1330 | 2->1 | 1277307 | 1278637 | 0.817093 | 0.945583 |
| 0.711631 | 1322 | 1->1 | 2790935 | 2792257 | 0.800251 | 0.889259 |
| 0.71487 | 1078 | 1->1 | 3137589 | 3138667 | 0.840574 | 0.850454 |
| 0.810459 | 922 | 1->1 | 595343 | 596265 | 0.810459 | 1.0 |
| 0.701414 | 555 | 2->2 | 3072576 | 3073131 | 0.837505 | 0.837505 |
| 0.747895 | 487 | 1->2 | 2654491 | 2654978 | 0.927038 | 0.806758 |
| 0.8149 | 464 | 1->2 | 127817 | 128281 | 0.973009 | 0.837505 |
| 0.701874 | 260 | 2->1 | 699036 | 699296 | 0.806758 | 0.869994 |
| 0.749305 | 220 | 1->1 | 4580750 | 4580970 | 0.936337 | 0.800251 |

 

Verification:

ERIC-PCR is a repetitive-sequence-based fingerprinting PCR that, like randomly amplified PCR, produces a complex multi-band pattern. Its molecular mechanism is complex, so it is hard to find a suitable mathematical model for the process and difficult to simulate it computationally. Nevertheless, most of the significant bands in the real ERIC-PCR pattern have counterparts in the SPCR-predicted pattern.
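The Direction column records which primer sits at each end of a product. We read "a->b" as a product that starts at a primer a site whose sequence lies on the plus strand and ends at a primer b site whose sequence lies on the minus strand; "1->1" and "2->2" then describe products flanked by the same primer. This reading is our interpretation, but it is consistent with the table: the three products ending at 1278637 all have directions ending in "1", and the two ending at 253400 both end in "2". The sketch below pairs hypothetical binding sites into products within [L_min, L_max], using the tables' convention that length = End - Start; the `Site` type and the example positions are invented for illustration.

```python
from typing import List, NamedTuple, Tuple


class Site(NamedTuple):
    primer: int  # 1 or 2
    strand: str  # "+" can start a product, "-" can end one
    pos: int     # plus-strand coordinate of the product boundary (assumed convention)


def pair_sites(sites: List[Site], l_min: int = 200, l_max: int = 3200) -> List[Tuple[str, int, int, int]]:
    """Return (direction, start, end, length) for every plus/minus site pair in range."""
    starts = [s for s in sites if s.strand == "+"]
    ends = [s for s in sites if s.strand == "-"]
    products = []
    for a in starts:
        for b in ends:
            length = b.pos - a.pos  # Length = End - Start, as in the result tables
            if l_min <= length <= l_max:
                products.append((f"{a.primer}->{b.primer}", a.pos, b.pos, length))
    return sorted(products, key=lambda p: -p[3])  # longest first, like the table


if __name__ == "__main__":
    sites = [Site(1, "+", 1000), Site(2, "+", 1500), Site(1, "-", 2300), Site(2, "-", 5000)]
    for prod in pair_sites(sites):
        print(prod)
```

Here the primer 1 and primer 2 plus-strand sites both pair with the primer 1 minus-strand site (labels "1->1" and "2->1"), while the primer 2 minus-strand site is too far away (over L_max) to yield a product.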

 

 

(5) Predict PCR products from coronavirus genomes

Primer sequences:

Primer pair 1:

upstream: IN-2(+): 5'-GGGTTGGGACTATCCTAAGTGTGA-3'

downstream: IN-4(-): 5'-TAACACACAACNCCATCATCA-3'

Primer pair 2:

upstream: 2Bp: 5'-ACTCARWTRAATYTNAAATAYGC-3'

downstream: 4Bm: 5'-TCACAYTTWGGATARTCCCA-3'
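The two primer pairs are related: the reverse complement of 4Bm, read with its IUPAC ambiguity codes, is compatible with the last 20 nucleotides of IN-2(+). The pair 2 product should therefore end where the IN-2(+) site ends. The sketch below checks this position by position, treating two codes as compatible when the base sets they denote overlap; the helper names are ours.

```python
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Complement of every IUPAC code (R<->Y, K<->M, B<->V, D<->H; S, W, N unchanged)
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq: str) -> str:
    """Reverse complement that keeps IUPAC ambiguity codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def compatible(a: str, b: str) -> bool:
    """True if two equal-length IUPAC strings can denote a common concrete sequence."""
    return len(a) == len(b) and all(set(IUPAC[x]) & set(IUPAC[y]) for x, y in zip(a, b))


IN2 = "GGGTTGGGACTATCCTAAGTGTGA"  # IN-2(+), 24 nt
BM4 = "TCACAYTTWGGATARTCCCA"      # 4Bm, 20 nt

if __name__ == "__main__":
    rc = revcomp(BM4)
    print(rc)                               # TGGGAYTATCCWAARTGTGA
    print(compatible(rc, IN2[-len(rc):]))   # True
```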

Parameters:

I_up = 0.90, I_dn = 0.90, P_a = 0.90, L_max = 3200 bp, L_min = 200 bp

Templates:

7 complete coronavirus genomes

1. Porcine epidemic diarrhea virus (28,033 bp)

2. Human coronavirus 229E (27,317 bp)

3. Transmissible gastroenteritis virus (28,586 bp)

4. Murine hepatitis virus (31,100 bp)

5. Bovine coronavirus (31,276 bp)

6. Avian infectious bronchitis virus (27,608 bp)

7. SARS coronavirus (29,751 bp)

Results:

| No. | Accession number | Genome length (bp) | SPCR product, primer pair 1 (IN-2(+)/IN-4(-)): position (length) | SPCR product, primer pair 2 (2Bp/4Bm): position (length) |
| --- | --- | --- | --- | --- |
| 1 | NC_003436 | 28,033 | 14424 (453 bp) | 14197 (251 bp) |
| 2 | AF304460 | 27,317 | 14324 (453 bp) | 13915 (251 bp) |
| 3 | AJ271965 | 28,586 | 14142 (453 bp) | 14921 (251 bp) |
| 4 | AF220295 | 31,100 | 15148 (453 bp) | 14097 (251 bp) |
| 5 | AF201929 | 31,276 | 15246 (453 bp) | 14987 (251 bp) |
| 6 | M95169 | 27,608 | 14179 (453 bp) | 13952 (251 bp) |
| 7 | AY274119 | 29,751 | 15214 (453 bp) | 15019 (251 bp) |

 

SPCR-predicted PCR products for the 7 complete coronavirus genomes.

The predicted product length is 453 bp with primer pair 1 and 251 bp with primer pair 2.

Templates:

14 other coronavirus sequences containing the polymerase gene

1. Murine-CoV open reading frame 1a (gene 1), complete cds and open reading frame 1b (gene 1), 3' end.

2. Human coronavirus 229E mRNA for RNA polymerase and proteases.

3. Transmissible gastroenteritis virus (Purdue-115) mRNA for polymerase locus.

4. Mouse hepatitis virus RNA for viral polymerase open reading frame 1b.

5. Bovine coronavirus RNA-directed RNA polymerase (pol) gene

6. Avian infectious bronchitis virus mRNA for chimeric gene.

7. Human coronavirus (strain OC43) RNA-directed RNA polymerase (pol) gene.

8. Turkey coronavirus RNA-directed RNA polymerase (pol) gene.

9. Rat sialodacryoadenitis coronavirus RNA-directed RNA polymerase (pol) gene.

10. Porcine hemagglutinating encephalomyelitis virus RNA-directed RNA polymerase (pol) gene.

11. Feline infectious peritonitis virus RNA-directed RNA polymerase (pol) gene.

12. Porcine transmissible gastroenteritis virus RNA-directed RNA polymerase (pol) gene.

13. Infectious bronchitis virus RNA (defective RNA CD-61).

14. Canine coronavirus RNA-directed RNA polymerase (pol) gene, partial cds.

Results:

| No. | Accession number | Length (bp) | Primer pair 1 (IN-2(+)/IN-4(-)) | Primer pair 2 (2Bp/4Bm) |
| --- | --- | --- | --- | --- |
| 1 | M55148 | 21798 | 15467 (453 bp) | 15240 (251 bp) |
| 2 | X69721 | 20580 | 14324 (453 bp) | 14097 (251 bp) |
| 3 | Z34093 | 20680 | 14222 (453 bp) | 13995 (251 bp) |
| 4 | X51939 | 8473 | 2108 (453 bp) | 1881 (251 bp) |
| 5 | AF124985 | 922 | 330 (453 bp) | 103 (251 bp) |
| 6 | Z30541 | 9081 | 2890 (453 bp) | 2663 (251 bp) |
| 7 | AF124989 | 922 | 330 (453 bp) | 103 (251 bp) |
| 8 | AF124991 | 919 | 327 (453 bp) | 100 (251 bp) |
| 9 | AF124990 | 922 | 330 (453 bp) | 103 (251 bp) |
| 10 | AF124988 | 922 | 330 (453 bp) | 103 (251 bp) |
| 11 | AF124987 | 922 | 330 (453 bp) | 103 (251 bp) |
| 12 | AF124992 | 922 | 330 (453 bp) | 103 (251 bp) |
| 13 | Z69629 | 6127 | 2890 (453 bp) | 2663 (251 bp) |
| 14 | AF124986 | 922 | 330 (453 bp) | 103 (251 bp) |

SPCR-predicted PCR products for the 14 other coronavirus sequences containing the polymerase gene. The predicted product length is 453 bp with primer pair 1 and 251 bp with primer pair 2.
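Because the reverse complement of 4Bm matches the last 20 nt of IN-2(+) once its ambiguity codes are taken into account, the two primer pairs should give overlapping products with a fixed offset. Assuming that each listed position is the product start and that end = start + length (the convention visible in the other result tables), every pair 2 product should end at the end of the 24-nt IN-2(+) site: pair2_start + 251 = pair1_start + 24. The sketch below checks this for every row of the table above; the names are ours.

```python
# (accession, pair 1 position, pair 2 position) copied from the table above
ROWS = [
    ("M55148", 15467, 15240), ("X69721", 14324, 14097), ("Z34093", 14222, 13995),
    ("X51939", 2108, 1881), ("AF124985", 330, 103), ("Z30541", 2890, 2663),
    ("AF124989", 330, 103), ("AF124991", 327, 100), ("AF124990", 330, 103),
    ("AF124988", 330, 103), ("AF124987", 330, 103), ("AF124992", 330, 103),
    ("Z69629", 2890, 2663), ("AF124986", 330, 103),
]
PAIR2_LEN = 251                                # predicted pair 2 product length (bp)
IN2_LEN = len("GGGTTGGGACTATCCTAAGTGTGA")      # IN-2(+) is 24 nt


def offset_ok(pair1_start: int, pair2_start: int) -> bool:
    """Assumed geometry: the pair 2 product ends where the IN-2(+) site ends."""
    return pair2_start + PAIR2_LEN == pair1_start + IN2_LEN


if __name__ == "__main__":
    for acc, s1, s2 in ROWS:
        print(acc, offset_ok(s1, s2))
```

In this table the two positions differ by 227 (= 251 - 24) in every row.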

 

 

Verification:

These results are consistent with published studies (Ksiazek, Erdman et al. 2003; Stephensen, Casebolt et al. 1999). Some of the SPCR predictions have also been verified in our lab: we amplified IBV and TGEV in wet-lab experiments, and the resulting PCR products agreed with the SPCR predictions.