IN SILICO ANALYSIS AND RELATION BETWEEN STAPHYLOCOCCAL ENTEROTOXINS

IZSTO Istituto Zooprofilattico Sperimentale del Piemonte, Liguria e Valle d’Aosta

VI WORKSHOP OF THE NATIONAL REFERENCE LABORATORY (NRL) FOR COAGULASE POSITIVE STAPHYLOCOCCI INCLUDING S. AUREUS, 12-13 December 2013

In silico analysis and relation between staphylococcal enterotoxins (SEs) and hypothetical proteins (HPs)

Guerrino Macori National Reference Laboratory for Coagulase Positive Staphylococci including S.aureus – Torino


Summary
- Definition of bioinformatics
- What is done, units of information, scale overview
- Databases
- Some practices
  • Reverse vaccinology
  • Hypothetical proteins and SEs
- Conclusion

What is bioinformatics/computational biology? A marriage between biology and informatics

VI Workshop NRL – CPS including S. aureus - Torino, 12-13 December 2013

What is done in bioinformatics? R&D on nucleotide and amino acid sequences, protein domains, protein structures and models

Development of new algorithms for large data sets

Development and implementation of tools that enable efficient access and management of different types of information


“A prerequisite to understanding the complete biology of an organism is the determination of its entire genome sequence” Fleischmann et al. 1995

Whole genome sequencing (linear sequence of DNA base units: A, T, C, G). Human genome: 3.12 × 10^9 bp. Whole genomes → exponential growth of data → bioinformatics to organize and collect it (2000-2001)


Post-genomic era

Is the sequence sufficient to understand the biological function of an organism? Bioinformatics is needed to analyze genomic data in a rational manner


What “units of information” do we deal with in bioinformatics?

• DNA • RNA • PROTEIN

• Sequence • Structure • Evolution

• Pathways • Interactions • Mutations

Biological data used: • DNA - Genome • RNA - Transcriptome • PROTEIN - Proteome


What “units of information” do we deal with in bioinformatics? DNA

• Simple sequence analysis
• Database searching
• Pairwise analysis
• Regulatory regions
• Gene finding
• Whole genome annotations
• Comparative genomics (species and strains; e.g. older methods such as PFGE)

DNA sequences

>gi|8886401|gb|AF162269.1|
CCCACTCCTCCATCTCACAAACACTTCTCTATACCCAACAATCCCTTTTACAATCCCTGCTCATTTAGTCAA
AATGGTCAAGATTGCTGCTATCATCCTCCTCATGGGCATTCTCGCCAATGCTGCCGCCATCCCTGTCATT
TCAACACCCAAATTACAGAGCCAACCGGCGAGGGCGACCGTGGGGACGTGGCCGAC
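As an illustration of "simple sequence analysis" on a FASTA record like the one above, the following minimal Python sketch parses FASTA text, computes GC content and builds the reverse complement. The function names are hypothetical and chosen for this example only, and the record contains just the first 30 bases of the sequence shown on the slide.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line.replace(" ", "").upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence (0.0 for an empty one)."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement every base, then reverse the strand."""
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    # First 30 bases of the slide's example record.
    example = ">gi|8886401|gb|AF162269.1|\nCCCACTCCTCCATCTCACAAACACTTCTCT\n"
    for header, seq in parse_fasta(example):
        print(header, len(seq), round(gc_content(seq), 2))
        print(reverse_complement(seq))
```

The same three helpers are enough for many first-pass checks before a sequence is sent to a database search.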


What “units of information” do we deal with in bioinformatics? RNA

• Splice variants
• Tissue-specific expression
• Structure
• Single-gene analysis
• Experimental data on thousands of genes simultaneously (DNA chips, microarrays, expression arrays)


What “units of information” do we deal with in bioinformatics? PROTEIN

• Proteome of an organism
• 2D gels
• Mass spectrometry
• Structure: 2D/3D/4D


Protein analysis: scale overview

Organism                  Genome (Mb)   Genes
E. coli                   4.64          4300
S. cerevisiae             13.5          6000
Drosophila melanogaster   165           13600
Arabidopsis thaliana      119           25500
Homo sapiens              3300          30000-40000
S. aureus                 2.84          2700
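To make the scale comparison concrete, the sketch below computes gene density (genes per Mb) from the figures in the table. It assumes an E. coli genome size of 4.64 Mb and takes the lower bound (30000) of the human gene estimate; both choices are assumptions of this example rather than statements from the slides.

```python
# (genome size in Mb, number of genes); values taken from the table,
# with E. coli assumed to be 4.64 Mb and Homo sapiens set to the
# lower bound of the 30000-40000 range.
GENOMES = {
    "E. coli": (4.64, 4300),
    "S. cerevisiae": (13.5, 6000),
    "Drosophila melanogaster": (165, 13600),
    "Arabidopsis thaliana": (119, 25500),
    "Homo sapiens": (3300, 30000),
    "S. aureus": (2.84, 2700),
}


def gene_density(size_mb, genes):
    """Number of genes per megabase of genome."""
    return genes / size_mb


def rank_by_density(genomes):
    """Organisms sorted from the most to the least gene-dense genome."""
    return sorted(
        ((name, gene_density(size, genes)) for name, (size, genes) in genomes.items()),
        key=lambda item: item[1],
        reverse=True,
    )


if __name__ == "__main__":
    for name, density in rank_by_density(GENOMES):
        print(f"{name:25s} {density:8.1f} genes/Mb")
```

Under these assumptions the two bacterial genomes come out as the most gene-dense and the human genome as the least, which is the point of the scale overview: genome size alone says little about the number of proteins to analyze.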


Protein analysis: scale overview and databases

Transcription and translation

folding


ORF Finder
• Nucleotide sequences → translation (in any frame) → discovery of ORFs (Open Reading Frames)
• ORF: a protein-coding sequence of the right length for an average protein (> 70-100 aa)
• Software scans the genome for hypothetical proteins (HPs): possible but not verified
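The ORF search described above can be sketched in a few lines of Python. This is a simplified illustration and not the NCBI ORF Finder implementation: it assumes the standard genetic code, treats only ATG as a start codon, scans all six reading frames, and reports ORFs of at least `min_aa` residues (the 70 aa default follows the threshold quoted on the slide). All names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(dna, min_aa=70):
    """Return (strand, frame, start, protein) for every ATG...stop ORF.

    `start` is the 0-based index of the ATG on the scanned strand.
    Codons containing unknown bases are translated as 'X'.
    """
    dna = dna.upper()
    orfs = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            start, protein = None, []
            for i in range(frame, len(seq) - 2, 3):
                aa = CODON_TABLE.get(seq[i:i + 3], "X")
                if start is None:
                    if aa == "M":          # open a new ORF at the first ATG
                        start, protein = i, ["M"]
                elif aa == "*":            # stop codon closes the ORF
                    if len(protein) >= min_aa:
                        orfs.append((strand, frame, start, "".join(protein)))
                    start = None
                else:
                    protein.append(aa)
    return orfs


if __name__ == "__main__":
    # Toy sequence; min_aa is lowered so the short ORF is reported.
    print(find_orfs("CCATGAAATTTTAGGG", min_aa=3))
```

Every protein returned this way is only a prediction; until there is experimental evidence of translation it remains a hypothetical protein in the sense used in these slides.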


Protein analysis: scale overview and databases

HPs and Functional SEs domain

Highlight the similarities and differences of functionally important sites

Derive a structural alignment

Detect evolutionary relationships that cannot be perceived from the sequence alone


Protein analysis: databases

Database     URL                     Content
GenBank      www.ncbi.nlm.nih.gov    nucleotide sequences
Ensembl      www.ensembl.org         human/mouse genome (and others)
PubMed       www.ncbi.nlm.nih.gov    literature references
NR           www.ncbi.nlm.nih.gov    protein sequences
SWISS-PROT   www.expasy.ch           protein sequences
InterPro     www.ebi.ac.uk           protein domains
OMIM         www.ncbi.nlm.nih.gov    genetic diseases
Enzymes      www.chem.qmul.ac.uk     enzymes
PDB          www.rcsb.org/pdb        protein structures
KEGG         www.genome.ad.jp        metabolic pathways


NCBI databases


Protein sequence databases
• They hold less data than nucleotide sequence databases
• Protein sequences rarely come from direct protein sequencing
• Most are obtained by translating nucleotide sequences

www.expasy.org


Practice
• Reverse vaccinology
• In silico analysis and relation between staphylococcal enterotoxins and hypothetical toxins: a prediction study for Staphylococcus aureus


Practice: Reverse vaccinology
First genomic approach to the development of a vaccine: reverse vaccinology applied to Neisseria meningitidis

Start from the whole genome sequence → computer prediction → in silico vaccine candidates → express recombinant proteins → immunogenicity testing in animal models → vaccine development → vaccine

1-2 years


1. ORF prediction on the partial genomic sequence (ORF Finder)
2. Homology searches for all the predicted ORFs (PSI-BLAST, FASTA)
   • Hits found (function assigned):
     - Enzyme, cytoplasmic localization
     - Already known Neisseria antigen
     - Homology to bacterial surface-associated proteins
   • No hits found (hypothetical proteins)
3. Localization prediction (PSORT, SignalP, TMPRED)
   • SELECTED: secreted, outer membrane, inner membrane, periplasmic, lipoproteins
   • DISCARDED: cytoplasmic
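The selection logic of this workflow can be sketched as a small filter. The record fields (`homology`, `localization`) and their values are hypothetical placeholders for the output of tools such as PSI-BLAST and PSORT; no real tool API is called. The sketch also assumes that ORFs matching cytoplasmic enzymes or already known antigens are discarded, which the slide layout suggests but does not state explicitly.

```python
# Localizations that the workflow keeps as vaccine candidates.
SELECTED_LOCALIZATIONS = {
    "secreted",
    "outer membrane",
    "inner membrane",
    "periplasmic",
    "lipoprotein",
}


def is_candidate(orf):
    """Decide whether a predicted ORF is kept as an in silico candidate.

    `orf` is a dict with two hypothetical keys:
      'homology'     - None when no hit was found, otherwise one of
                       'enzyme', 'known antigen', 'surface-associated'
      'localization' - predicted localization, used when there is no hit
    """
    hit = orf.get("homology")
    if hit is None:
        # Hypothetical protein: rely on the localization prediction.
        return orf.get("localization") in SELECTED_LOCALIZATIONS
    # Assumption: only surface-associated homologs are retained.
    return hit == "surface-associated"


def select_candidates(orfs):
    """Names of the ORFs that pass the filter, in input order."""
    return [orf["name"] for orf in orfs if is_candidate(orf)]


if __name__ == "__main__":
    predicted = [
        {"name": "orf1", "homology": "enzyme", "localization": "cytoplasmic"},
        {"name": "orf2", "homology": None, "localization": "outer membrane"},
        {"name": "orf3", "homology": None, "localization": "cytoplasmic"},
        {"name": "orf4", "homology": "surface-associated", "localization": None},
    ]
    print(select_candidates(predicted))
```

Only the selected candidates move on to recombinant expression and immunogenicity testing, which is what shortens the early phase of vaccine development.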

Practice in-silico analysis and relation between staphylococcal enterotoxins and hypothetical toxins: a prediction study for Staphylococcus aureus

Background
Staphylococcus aureus carries a large repertoire of virulence factors, including over 40 secreted proteins and enzymes that it uses to establish and maintain infections:
• toxic shock syndrome toxin (TSST)
• Panton-Valentine leukocidin (PVL)
• the exfoliative toxins A and B (ETA and ETB)
• the family of staphylococcal enterotoxins, e.g. A and B (SEA and SEB), associated with food poisoning

S. aureus may produce 21 different SEs (excluding variants)


Background: S. aureus may produce 21 different SEs (nd: not determined)

Toxin   Molecular mass (kDa)   Emetic activity   Crystal structure solved   Gene   Accessory genetic element

Classical staphylococcal enterotoxins (SEs)
SEA     27.1        yes    yes   sea    ΦMu50a
SEB     28.4        yes    yes   seb    pZA10, SaPI3
SEC     27.5–27.6   yes    yes   sec    SaPIn1, SaPIm1, SaPImw2, SaPIbov1
SED     26.9        yes    yes   sed    pIB485-like
SEE     26.4        yes    no    see    Φsab

New types of staphylococcal enterotoxins (SEs)
SEG     27.0        yes    yes   seg    egc1 (vSaβ I); egc2 (vSaβ III); egc3; egc4
SEH     25.1        yes    yes   seh    MGEmw2/mssa476 seh/seo
SEI     24.9        weak   yes   sei    egc1 (vSaβ I); egc2 (vSaβ III); egc3
SER     27.0        yes    no    ser    pIB485-like; pF5
SES     26.2        yes    no    ses    pF5
SET     22.6        weak   no    set    pF5

Staphylococcal enterotoxin-like proteins (SEls)
SElU    27.1        nd     no    selu   egc2 (vSaβ III); egc3
SElV    nd          nd     no    selv   egc4
SElJ    28.5        nd     no    selj   pIB485-like; pF5
SElK    26.0        nd     yes   selk   SaPIbov1, SaPI5
SElL    26.0        no     no    sell   SaPIn1, SaPIm1, SaPImw2, SaPIbov1
SElM    24.8        nd     no    selm   egc1 (vSaβ I); egc2 (vSaβ III)
SElN    26.1        nd     no    seln   egc1 (vSaβ I); egc2 (vSaβ III); egc3; egc4
SElO    26.7        nd     no    selo   egc1 (vSaβ I); egc2 (vSaβ III); egc3; egc4; MGEmw2/mssa476 seh/seo
SElP    27.0        nd     no    selp   ΦN315, ΦMu3A
SElQ    25.0        no     no    selq   SaPI5


Background: “hypothetical proteins”
• A hypothetical protein is a protein that is predicted to be expressed from an Open Reading Frame, but for which there is no experimental evidence of translation
• They make up a substantial fraction of proteomes
• There is so far no classification: they are proteins predicted from nucleic acid sequences that have not been shown to exist by experimental protein chemical evidence

Similarity between 13 well-known deposited S. aureus SEs (SEA, SEB, SEC, SED, SEG, SEI, SEH, SEK, SEL, SEM, SEN, SEO, SEQ) and 50 HPs was assessed with the following tools and databases:

1. ExPASy ProtParam: computation of various physical and chemical parameters for a given protein sequence - http://web.expasy.org/protparam/

2. NCBI Conserved Domains: search for conserved domains within a coding nucleotide sequence - http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi

3. Protein Data Bank (PDB): the PDB archive contains information about experimentally determined structures of proteins, and makes it possible to visualize and align the most similar known structures - http://www.rcsb.org/pdb/home/home.do
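As an illustration of the kind of parameters the first tool in the list reports, the sketch below computes the amino acid composition and an average molecular weight for a protein sequence. It is not the ProtParam implementation: it assumes commonly tabulated average residue masses, adds one water molecule for the free termini, and does not compute the instability index. Function names are hypothetical.

```python
from collections import Counter

# Average residue masses in Da (amino acid minus one water),
# as commonly tabulated; treated here as an assumption of the sketch.
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER_MASS = 18.01528


def molecular_weight(protein):
    """Average molecular weight in Da of an unmodified linear peptide."""
    protein = protein.upper()
    return sum(RESIDUE_MASS[aa] for aa in protein) + WATER_MASS


def composition(protein):
    """Percentage of each amino acid present in the sequence."""
    protein = protein.upper()
    counts = Counter(protein)
    return {aa: 100.0 * n / len(protein) for aa, n in counts.items()}


if __name__ == "__main__":
    peptide = "MKKTAFTLLLFIALTLTTSPLVNG"  # arbitrary toy sequence
    print(round(molecular_weight(peptide) / 1000, 2), "kDa")
    print(composition(peptide))
```

A residue outside the 20 standard amino acids raises a `KeyError`, which is a deliberate choice here: silently skipping unknown residues would give a misleading mass.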


47 of the 50 HPs have at least one conserved domain.

The instability index (I.I.), which provides an estimate of the stability of a protein in a test tube, classified 32 proteins as stable.

Within the stable HPs:

• 6 HPs show conserved-domain homologies with SEs: Staphylococcal/Streptococcal toxin, Oligonucleotide Binding (OB)-fold domain, and Staphylococcal/Streptococcal toxin, β-grasp domain

• 6 HPs are of unknown function and belong to the family of S. aureus uncharacterized proteins: 4 sequences match well-known proteins with a significant E-value (the E-value relates the score of an alignment between a user-supplied sequence and a database sequence to the number of such matches expected by chance)
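The link between an alignment score and its E-value can be illustrated with the relation used in BLAST-style statistics, E = m · n · 2^(-S'), where S' is the bit score, m the query length and n the total database length. The sketch below assumes this relation and ignores edge-effect corrections of the lengths; the numbers in the usage example are arbitrary.

```python
def e_value(bit_score, query_len, db_len):
    """Expected number of chance alignments scoring at least `bit_score`.

    Uses E = m * n * 2**(-S'), with m the query length and n the total
    length of the database, both in residues (no length corrections).
    """
    return query_len * db_len * 2.0 ** (-bit_score)


def is_significant(bit_score, query_len, db_len, threshold=1e-3):
    """True when the E-value falls below a chosen threshold.

    The default threshold is an arbitrary choice for this example.
    """
    return e_value(bit_score, query_len, db_len) < threshold


if __name__ == "__main__":
    # Arbitrary example: a 250-residue query against 10 million residues.
    for score in (20, 40, 60, 80):
        print(score, e_value(score, 250, 10_000_000))
```

The E-value halves with every additional bit of score and grows with database size, so the same alignment becomes less significant when searched against a larger database; smaller E-values indicate more significant matches.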


Background
Experimentally determined structures for the 4 sequences with significant E-value matches (NCBI accession and PDB code are shown):

gi446958341 (1TS2)
gi501167136 (1I4G)
gi446958339 (1Q1L)
gi446958340 (1TS5)

In silico analysis of the functionally important domains and protein families shows that 6 of the 50 HPs are related to the same family as the SEs. This could help identify many of the hypothetical proteins in databases and predict their possible involvement in the mechanisms of foodborne illness.

Two examples: biosequence alignment and algorithmic solutions

But we must always remember that:
• The methods used (algorithms and modeling, for example) find the "best" alignment efficiently, but do not guarantee that the result is biologically true. Always check whether the alignment makes biological sense in terms of function.
• In the course of evolution, the sequence of a protein is less conserved than its secondary, tertiary and quaternary structure. This has two effects:
  - Homologous proteins can have very different sequences and therefore produce alignments with a low similarity score.
  - If the similarity between two protein sequences is high (statistically significant), it is quite reasonable to assume that they are functionally homologous.
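A minimal sketch of one classical algorithmic solution, Needleman-Wunsch global alignment by dynamic programming, shows what "best" means here: the alignment is optimal only with respect to the chosen scoring scheme. The scoring values (match +1, mismatch -1, gap -2) are assumptions of this example; real protein alignments normally use substitution matrices and affine gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of two sequences.

    Returns (score, aligned_a, aligned_b), with '-' marking gaps.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            top.append(a[i - 1]); bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-")
            i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    s, x, y = needleman_wunsch("ACGT", "AGT")
    print(s)
    print(x)
    print(y)
```

Changing the gap penalty or the substitution scores can change which alignment is returned, which is why an optimal score on its own does not establish a biological relationship.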


Thank you for your attention
