– Personal genomics discussion board assignment – DO NOT PLAGIARIZE – MAKE SURE YOU READ THE ARTICLE ATTACHED BELOW IN ORDER TO ANSWER THE QUESTIONS! – DELIVER WORK ON TIME – look at the attached document for instructions
personal_genomics_db.docx

synderome.pdf


Personal genomics discussion board assignment





DO NOT PLAGIARIZE
Your discussion board post for this week should be a minimum of 150 words; 180 max
This can be written in first person (REMEMBER THIS IS JUST A DISCUSSION BOARD)
USE YOUR OWN WORDS
DO NOT COPY THE EXAMPLES, SINCE THOSE ARE POSTS THAT HAVE ALREADY BEEN
POSTED ON THE DISCUSSION BOARD PAGE AND BELONG TO OTHER STUDENTS
Instructions:
READ THE ARTICLE FIRST, THEN:
For the discussion this week, please comment on two things that surprised you about the
Cell article and this project in general. Also, think about and provide your opinion on whether
you would want to participate in this type of work. For example, would you want to give your
blood and other samples on a regular basis, perhaps wearing something similar to a Fitbit
(non-intrusive monitoring device), etc.? What things would you be willing to do, and under
what context? Would you only do it if your information was held private? Would you want to
know all the information? Looking forward to our remaining classes, I want us to explore and
continue this conversation regarding the broader impacts of this field on society.
Resource
Personal Omics Profiling Reveals Dynamic Molecular and Medical Phenotypes
Rui Chen,1,11 George I. Mias,1,11 Jennifer Li-Pook-Than,1,11 Lihua Jiang,1,11 Hugo Y.K. Lam,1,12 Rong Chen,2,12
Elana Miriami,1 Konrad J. Karczewski,1 Manoj Hariharan,1 Frederick E. Dewey,3 Yong Cheng,1 Michael J. Clark,1
Hogune Im,1 Lukas Habegger,6,7 Suganthi Balasubramanian,6,7 Maeve O’Huallachain,1 Joel T. Dudley,2
Sara Hillenmeyer,1 Rajini Haraksingh,1 Donald Sharon,1 Ghia Euskirchen,1 Phil Lacroute,1 Keith Bettinger,1 Alan P. Boyle,1
Maya Kasowski,1 Fabian Grubert,1 Scott Seki,2 Marco Garcia,2 Michelle Whirl-Carrillo,1 Mercedes Gallardo,9,10
Maria A. Blasco,9 Peter L. Greenberg,4 Phyllis Snyder,1 Teri E. Klein,1 Russ B. Altman,1,5 Atul J. Butte,2 Euan A. Ashley,3
Mark Gerstein,6,7,8 Kari C. Nadeau,2 Hua Tang,1 and Michael Snyder1,*
1Department of Genetics, Stanford University School of Medicine
2Division of Systems Medicine and Division of Immunology and Allergy, Department of Pediatrics
3Center for Inherited Cardiovascular Disease, Division of Cardiovascular Medicine
4Division of Hematology, Department of Medicine
5Department of Bioengineering
Stanford University, Stanford, CA 94305, USA
6Program in Computational Biology and Bioinformatics
7Department of Molecular Biophysics and Biochemistry
8Department of Computer Science
Yale University, New Haven, CT 06520, USA
9Telomeres and Telomerase Group, Molecular Oncology Program, Spanish National Cancer Centre (CNIO), Madrid E-28029, Spain
10Life Length, Madrid E-28003, Spain
11These authors contributed equally to this work
12Present address: Personalis, Palo Alto, CA 94301, USA
*Correspondence: mpsnyder@stanford.edu
DOI 10.1016/j.cell.2012.02.009
SUMMARY
Personalized medicine is expected to benefit from combining genomic information with
regular monitoring of physiological states by multiple high-throughput methods. Here, we
present an integrative personal omics profile (iPOP), an analysis that combines genomic,
transcriptomic, proteomic, metabolomic, and autoantibody profiles from a single individual
over a 14 month period. Our iPOP analysis revealed various medical risks, including type 2
diabetes. It also uncovered extensive, dynamic changes in diverse molecular components and
biological pathways across healthy and diseased conditions. Extremely high-coverage genomic
and transcriptomic data, which provide the basis of our iPOP, revealed extensive heteroallelic
changes during healthy and diseased states and an unexpected RNA editing mechanism. This study
demonstrates that longitudinal iPOP can be used to interpret healthy and diseased states by
connecting genomic information with additional dynamic omics activity.
INTRODUCTION
Personalized medicine aims to assess medical risks, monitor,
diagnose and treat patients according to their specific genetic
composition and molecular phenotype. The advent of genome
sequencing and the analysis of physiological states has proven
to be powerful (Cancer Genome Atlas Research Network,
2011). However, its implementation for the analysis of otherwise
healthy individuals for estimation of disease risk and medical
interpretation is less clear. Much of the genome is difficult to
interpret and many complex diseases, such as diabetes, neurological disorders and cancer, likely involve a large number of
different genes and biological pathways (Ashley et al., 2010;
Grayson et al., 2011; Li et al., 2011), as well as environmental
contributors that can be difficult to assess. As such, the combination of genomic information along with a detailed molecular
analysis of samples will be important for predicting, diagnosing
and treating diseases as well as for understanding the onset, progression, and prevalence of disease states (Snyder et al., 2009).
Presently, healthy and diseased states are typically followed
using a limited number of assays that analyze a small number
of markers of distinct types. With the advancement of many
new technologies, it is now possible to analyze upward of 10^5
molecular constituents. For example, DNA microarrays have
allowed the subcategorization of lymphomas and gliomas
Cell 148, 1293–1307, March 16, 2012 ©2012 Elsevier Inc.
(Mischel et al., 2003), and RNA sequencing (RNA-Seq) has
identified breast cancer transcript isoforms (Li et al., 2011; van
der Werf et al., 2007; Wu et al., 2010; Lapuk et al., 2010).
Although transcriptome and RNA splicing profiling are powerful
and convenient, they provide a partial portrait of an organism’s
physiological state. Transcriptomic data, when combined with
genomic, proteomic, and metabolomic data are expected to
provide a much deeper understanding of normal and diseased
states (Snyder et al., 2010). To date, comprehensive integrative
omics profiles have been limited and have not been applied to
the analysis of generally healthy individuals.
To obtain a better understanding of: (1) how to generate an
integrative personal omics profile (iPOP) and examine as many
biological components as possible, (2) how these components
change during healthy and diseased states, and (3) how this
information can be combined with genomic information to
estimate disease risk and gain new insights into diseased states,
we performed extensive omics profiling of blood components
from a generally healthy individual over a 14 month period
(24 months total when including time points with other molecular
analyses). We determined the whole-genome sequence (WGS)
of the subject, and together with transcriptomic, proteomic, metabolomic, and autoantibody profiles, used this information to
generate an iPOP. We analyzed the iPOP of the individual over
the course of healthy states and two viral infections (Figure 1A).
Our results indicate that disease risk can be estimated by
a whole-genome sequence, and that by regularly monitoring health
states with iPOP, disease onset may also be observed. The
wealth of information provided by detailed longitudinal iPOP revealed unexpected molecular complexity, which exhibited
dynamic changes during healthy and diseased states, and
provided insight into multiple biological processes. Detailed
omics profiling coupled with genome sequencing can provide
molecular and physiological information of medical significance.
This approach can be generalized for personalized health monitoring and medicine.
RESULTS
Overview of Personal Omics Profiling
Our overall iPOP strategy was to: (1) determine the genome
sequence at high accuracy and evaluate disease risks, (2)
monitor omics components over time and integrate the relevant
omics information to assess the variation of physiological states,
and (3) examine in detail the expression of personal variants
at the level of RNA and protein to study molecular complexity
and dynamic changes in diseased states.
We performed iPOP on blood components (peripheral blood
mononuclear cells [PBMCs], plasma and sera that are highly
accessible) from a 54-year-old male volunteer over the course
of 14 months (IRB-8629). The samples used for iPOP were taken
over an interval of 401 days (days 0–400). In addition, a complete
medical exam plus laboratory and additional tests were performed before the study officially launched (day −123) and blood
glucose was sampled multiple times after the comprehensive
omics profiling (days 401–602) (Figure 1A). Extensive sampling
was performed during two viral infections that occurred during
this period: a human rhinovirus (HRV) infection beginning on
day 0 and a respiratory syncytial virus (RSV) infection starting
on day 289. A total of 20 time points were extensively analyzed
and a summary of the time course is indicated in Figure 1A.
The different types of analyses performed are summarized in
Figures 1B and 1C. These analyses, performed on PBMCs
and/or serum components, included WGS, complete transcriptome analysis (providing information about the abundance of
alternative spliced isoforms, heteroallelic expression, and RNA
edits, as well as expression of miRNAs at selected time points),
proteomic and metabolomic analyses, and autoantibody
profiles. An integrative analysis of these data highlights dynamic
omics changes and provides rich information about healthy and
diseased phenotypes.
Whole-Genome Sequencing
We first generated a high quality genome sequence of this
individual using a variety of different technologies. Genomic
DNA was subjected to deep WGS using technologies from
Complete Genomics (CG, 35 nt paired end) and Illumina
(100 nt paired end) at 150- and 120-fold total coverage, respectively, exome sequencing using three different technologies to
80- to 100-fold average coverage (see Extended Experimental
Procedures available online) and analysis using genotyping
arrays and RNA sequencing.
The vast majority of genomic sequences (91%) mapped to the
hg19 (GRCh37) reference genome. However, because of the
depth of our sequencing, we were able to identify sequences
not present in the reference sequence. Assembly of the
unmapped Illumina sequencing reads (60,434,531, 9% of the
total) resulted in 1,425 (of 29,751) contigs (spanning 26 Mb) overlapping with RefSeq gene sequences that were not annotated in
the hg19 reference genome. The remaining sequences appeared
unique, including 2,919 exons expressed in the RNA-Seq data
(e.g., Figure S1A). These results confirm that a large number of
undocumented genetic regions exist in individual human
genome sequences and can be identified by very deep
sequencing and de novo assembly (Li et al., 2010).
Our analysis detected many single nucleotide variants (SNVs),
small insertions and deletions (indels) and structural variants
(SVs; large insertions, deletions, and inversions relative to
hg19), (summarized in Table 1 and Experimental Procedures).
134,341 (4.1%) high-confidence SNVs are not present in
dbSNP, indicating that they are very rare or private to the
subject. Only 302 high-confidence indels reside within RefSeq
protein coding exons and exhibit enrichments in multiples of
three nucleotides (p < 0.0001). In addition to indels, 2,566 high-confidence SVs were identified (Experimental Procedures and Table S1) and 8,646 mobile element insertions were identified (Stewart et al., 2011). Analysis of the subject’s mother’s genome by comprehensive genome sequencing (as above) and imputation allowed a maternal/paternal chromosomal phasing of 92.5% of the subject’s SNVs and indels (see Extended Experimental Procedures for details). Of 1,162 compound heterozygous mutations in genes, 139 contain predicted compound heterozygous deleterious and/or nonsense mutations. Phasing enabled the assembly of a personal genome sequence of very high confidence (c.f., Rozowsky et al., 2011).
Figure 1. Summary of Study
(A) Time course summary. The subject was monitored for a total of 726 days, during which there were two infections (red bar, HRV; green bar, RSV). The black bar indicates the period when the subject: (1) increased exercise, (2) ingested 81 mg of acetylsalicylic acid and ibuprofen tablets each day (the latter only during the first 6 weeks of this period), and (3) substantially reduced sugar intake. Blue numbers indicate fasted time points.
(B) iPOP experimental design indicating the tissues and analyses involved in this study.
(C) Circos (Krzywinski et al., 2009) plot summarizing iPOP. From outer to inner rings: chromosome ideogram; genomic data (pale blue ring), structural variants >50 bp (deletions [blue tiles], duplications [red tiles]), indels (green triangles); transcriptomic data (yellow ring), expression ratio of HRV infection to healthy states; proteomic data (light purple ring), ratio of protein levels during HRV infection to healthy states; transcriptomic data (yellow ring), differential heteroallelic expression ratio of alternative allele to reference allele for missense and synonymous variants (purple dots) and candidate RNA missense and synonymous edits (red triangles, purple dots, orange triangles and green dots, respectively).
See also Figure S1.
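The observation that coding indels are enriched in multiples of three nucleotides can be made concrete with a small check. The following is a minimal sketch, not the authors' method: the function names and the example alleles are hypothetical, and it assumes each indel is given as a reference/alternate allele pair.

```python
def indel_length(ref: str, alt: str) -> int:
    """Signed length change: positive for insertions, negative for deletions."""
    return len(alt) - len(ref)


def is_in_frame(ref: str, alt: str) -> bool:
    """True when the indel adds or removes a nonzero multiple of three nucleotides."""
    delta = indel_length(ref, alt)
    return delta != 0 and delta % 3 == 0


def in_frame_fraction(indels) -> float:
    """Fraction of (ref, alt) pairs whose length change preserves the reading frame."""
    indels = list(indels)
    if not indels:
        raise ValueError("need at least one indel")
    return sum(is_in_frame(ref, alt) for ref, alt in indels) / len(indels)


# Hypothetical alleles for illustration only (not taken from the article).
example_indels = [("ATCG", "A"), ("A", "ATTT"), ("AT", "A"), ("A", "AGG")]
print(in_frame_fraction(example_indels))
```

An indel whose length is a multiple of three keeps the downstream reading frame intact, which is one plausible reason such indels would be tolerated more often inside protein-coding exons.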
WGS-Based Disease Risk Evaluation
We identified variants likely to be associated with increased
susceptibility to disease (Dewey et al., 2011). The list of high
confidence SNVs and indels was analyzed for rare alleles (<5% of the major allele frequency in Europeans) and for changes in genes with known Mendelian disease phenotypes (data summarized in Table 2), revealing that 51 and 4 of the rare coding SNV and indels, respectively, in genes present in OMIM are predicted to lead to loss-of-function (Table S2A). This list of genes was further examined for medical relevance (Table S2A; example alleles are summarized in Figure 2A), and 11 were validated by Sanger sequencing. High interest genes include: (1) a mutation (E366K) in the SERPINA1 gene previously known in the subject, (2) a damaging mutation in TERT, associated with acquired aplastic anemia (Yamaguchi et al., 2005), and (3) variants associated with hypertriglyceridemia and diabetes, such as GCKR (homozygous) (Vaxillaire et al., 2008), and KCNJ11 (homozygous) (Hani et al., 1998) and TCF7 (heterozygous) (Erlich et al., 2009).
Table 1. Summary and Breakdown of DNA Variants
Type | Total Variants | Total High Confidence | Heterozygous High Confidence | Homozygous High Confidence
Total SNVs | 3,739,701 | 3,301,521 | 1,971,629 | 1,329,892
Total gene-associated SNVs | 1,312,780 | 1,183,847 | 717,485 | 466,362
Total coding/UTR | 49,017 | 44,542 | 27,383 | 17,159
Missense | 10,592 | 9,683 | 5,944 | 3,739
Nonsense | 83 | 73 | 49 | 24
Synonymous | 11,459 | 10,864 | 6,747 | 4,117
5' UTR | 4,085 | 2,978 | 1,802 | 1,176
3' UTR | 22,798 | 20,944 | 12,841 | 8,103
Intron | 1,263,763 | 1,139,305 | 690,102 | 449,203
Ts/Tv | - | 2.14 | - | -
dbSNP | 3,493,748 | 3,167,180 | - | -
Candidate private SNV | 245,953 | 134,341 | - | -
Indels (−107 to +36 bp) | 1,022,901 | 216,776 | - | -
Coding | 3,263 | 302 | - | -
Structural variants (>50 bp) | 44,781 | 2,566 | - | -
In 1000G project (a) | 4,434 | 1,967 | - | -
High confidence values are from variants identified across multiple platforms (Illumina and CG) and/or Exome and RNA-Seq data. Annotations were
based from variant call formatted (vcf) files for heterozygous calls: 0/1, reference (ref)/alternative (alt); 1/2, alt/alt and homozygous calls; 1/1, alt/alt; 1/,
(alt/alt-incomplete call). Polyphen-2 was used to identify the location of the SNVs.
(a) 1000G (1000 Genomes Project Consortium, 2010).
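The genotype codes mentioned in the Table 1 notes (0/1, 1/2, 1/1, and an incomplete call) lend themselves to a short illustration of how heterozygous and homozygous calls might be separated. This is a minimal sketch, assuming genotypes arrive as VCF-style GT strings; the function name, the category labels, and the example calls are hypothetical and not from the article.

```python
from collections import Counter


def classify_genotype(gt: str) -> str:
    """Label a diploid VCF-style GT string such as '0/1', '1/2', '1/1' or '1/.'."""
    # Treat phased ('|') and unphased ('/') separators the same way.
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2 or "." in alleles or "" in alleles:
        return "incomplete"
    if alleles[0] == alleles[1]:
        return "homozygous_ref" if alleles[0] == "0" else "homozygous_alt"
    return "heterozygous"


# Hypothetical calls for illustration only.
example_calls = ["0/1", "1/2", "1/1", "1/.", "0|1", "0/0"]
counts = Counter(classify_genotype(gt) for gt in example_calls)
print(dict(counts))
```

Counting both 0/1 (ref/alt) and 1/2 (two different alternate alleles) as heterozygous follows the grouping described in the table notes.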
Genetic disease risks were also assessed by the RiskOGram
algorithm, which integrates information from multiple alleles
associated with disease risk (Ashley et al., 2010) (Figure 2B).
This analysis revealed a modest elevated risk for coronary artery
disease and significantly elevated risk levels of basal cell carcinoma (Figure 2B), hypertriglyceridemia, and type 2 diabetes
(T2D) (Figures 2B and 2C).
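The text says only that the RiskOGram integrates information from multiple risk-associated alleles; it does not describe the calculation. One common way to combine such evidence, shown here purely as an assumption, is to multiply a pretest odds by one likelihood ratio per allele and convert back to a probability. The function name and all numbers below are hypothetical, and treating the alleles as independent is a simplification.

```python
from math import prod


def posttest_probability(pretest_probability: float, likelihood_ratios) -> float:
    """Combine a pretest probability with per-allele likelihood ratios.

    Assumes the likelihood ratios are independent, so they can be multiplied.
    """
    if not 0.0 < pretest_probability < 1.0:
        raise ValueError("pretest probability must be strictly between 0 and 1")
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * prod(likelihood_ratios)
    return posttest_odds / (1.0 + posttest_odds)


# Hypothetical inputs: a 20% pretest probability and two risk alleles.
print(posttest_probability(0.20, [1.5, 2.0]))
```

Working in odds rather than probabilities keeps the result between 0 and 1 no matter how many alleles are combined.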
In addition to coding region variants we also analyzed genomic
variants that may affect regulatory elements (transcription
factors [TF]), which had not been attempted previously (Data
S1). A total of 14,922 (of 234,980) SNVs lie in the motifs of 36
TFs known to be associated with the binding data (see Experimental Procedures), indicating that these are likely having a
direct effect on TF binding. Comparison of SNPs that alter
binding patterns of NFkB and Pol II sites (Kasowski et al.,
2010), also revealed a number of other interesting regulatory
variants, some of which are associated with human disease
(e.g., EDIL) (Sun et al., 2010) (Figure S1B).
Medical Phenotypes Monitoring
Based on the above analysis of medically relevant variants and
the RiskOGram, we monitored markers associated with high-risk disease phenotypes and performed additional medically
relevant assays.
Monitoring of glucose levels and HbA1c revealed the onset of
T2D as diagnosed by the subject’s physician (day 369, Figures
2A and 2C). The subject lacked many known factors associated
with diabetes (nonsmoker; BMI = 23.9 and 21.7 on day 0 and day
511, respectively) and glucose levels were normal for the first
part of the study. However, glucose levels elevated shortly after
the RSV infection (after day 301) extending for several months
(Figure 2D). High levels of glucose were further confirmed using
glycated HbA1c measurements at two time points (days 329,
369) during this period (6.4% and 6.7%, respectively). After
a dramatic change in diet, exercise and ingestion of low doses
of acetylsalicylic acid, a gradual decrease in glucose (to
93 mg/dl at day 602) and HbA1c levels to 4.7% was observed.
Insulin resistance was not evident at day 322. The patient was
negative for anti-GAD and anti-islet antibodies, and insulin levels
correlated well with the fasted and nonfasted states (Figure S2C),
consistent with T2D. These results indicate that a genome
sequence can be used to estimate disease risk in a healthy individual, and by monitoring traits associated with that disease,
disease markers can be detected and the phenotype treated.
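To see how the reported HbA1c values line up with the diagnosis, the readings can be compared against screening cutoffs. This is a minimal sketch: the cutoffs of 5.7% and 6.5% are widely used clinical thresholds, but they are an assumption here because the article does not state which criteria the physician applied. The function name and labels are hypothetical.

```python
# Commonly used HbA1c cutoffs in percent; an assumption, not stated in the article.
PREDIABETES_CUTOFF = 5.7
DIABETES_CUTOFF = 6.5


def classify_hba1c(percent: float) -> str:
    """Place an HbA1c reading into a coarse range using the assumed cutoffs."""
    if percent >= DIABETES_CUTOFF:
        return "diabetes range"
    if percent >= PREDIABETES_CUTOFF:
        return "prediabetes range"
    return "normal range"


# HbA1c values reported in the text: days 329 and 369, then the later value
# observed after the change in diet and exercise.
reported = [("day 329", 6.4), ("day 369", 6.7), ("after lifestyle change", 4.7)]
for label, value in reported:
    print(f"{label}: {value}% -> {classify_hba1c(value)}")
```

Under these assumed cutoffs, only the day 369 reading falls in the diabetes range, which matches the day on which the text reports the physician's diagnosis.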
The subject contained a TERT mutation previously associated
with aplastic anemia (Yamaguchi et al., 2005). However, measurements of telomere length suggested little or no decrease in
telomere length and modest increase in numbers of cells with
short telomeres relative to age-matched controls (Figures S2A
and S2B). Importantly, the patient and his 83-year-old mother
share the same mutation but neither exhibits symptoms of aplastic anemia, indicating that this mutation does not always result in
disease and is likely context specific in its effects.
Consistent with the elevated hypertriglyceridemia risk, triglycerides were found to be high (321 mg/dl) at the beginning of the
study. These levels were reduced (81–116 mg/dl) after regularly
taking simvastatin (20 mg/day).
We also examined the variants for their potential effects on
drug response (see Extended Experimental Procedures). Among
the alleles of interest (Figure 2A and Table S2B), two genotypes
affecting the LPIN1 and SLC22A1 genes were associated with
favorable (glucose lowering) responses to two diabetic drugs, rosiglitazone and metformin, respectively.
Table 2. Summary of Disease-Related Rare Variants
Category | Count
Total high confidence rare SNVs | 289,989
Coding | 2,546
Missense | 1,320
Synonymous | 1,214
Nonsense | 11
Nonstop | 1
Damaging or possibly damaging | 233
Putative loss-of-function SNVs (a) | 51
Total high confidence rare indels | 51,248
Coding indels | 61
Frameshift indels | 27
miRNA indels | 3
miRNA target sequence indels | 5
Putative loss-of-function indels (a) | 4
(a) In curated Mendelian disease genes.
We followed the levels of 51 cytokines along with the C-reactive
pro …