Cancer Genome Landscapes (1)

Abstract

Over the past decade, comprehensive sequencing efforts have revealed the genomic landscapes of common forms of human cancer. For most cancer types, this landscape consists of a small number of "mountains" (genes altered in a high percentage of tumors) and a much larger number of "hills" (genes altered infrequently). To date, these studies have revealed ~140 genes that, when altered by intragenic mutations, can promote or "drive" tumorigenesis. A typical tumor contains two to eight of these "driver gene" mutations; the remaining mutations are passengers that confer no selective growth advantage. Driver genes can be classified into 12 signaling pathways that regulate three core cellular processes: cell fate, cell survival, and genome maintenance. A better understanding of these pathways is one of the most pressing needs in basic cancer research. Even now, however, our knowledge of cancer genomes is sufficient to guide the development of more effective approaches for reducing cancer morbidity and mortality.

Ten years ago, the idea that all of the genes altered in cancer could be identified at base-pair resolution would have seemed like science fiction. Today, such genome-wide analysis, through sequencing of the exome (see Box 1, Glossary, for definitions of terms used in this Review) or of the whole genome, is routine.

Box 1

Glossary

Adenoma: A benign tumor composed of epithelial cells.

Alternative lengthening of telomeres (ALT): A process of maintaining telomeres independent of telomerase, the enzyme normally responsible for telomere replication.

Amplification: A genetic alteration producing a large number of copies of a small segment (less than a few megabases) of the genome.

Angiogenesis: The process of forming vascular conduits, including veins, arteries, and lymphatics.

Benign tumor: An abnormal proliferation of cells driven by at least one mutation in an oncogene or tumor suppressor gene. These cells are not invasive (i.e., they cannot penetrate the basement membrane lining them), which distinguishes them from malignant cells.

Carcinoma: A type of malignant tumor composed of epithelial cells.

Clonal mutation: A mutation that exists in the vast majority of the neoplastic cells within a tumor.

Driver gene mutation (driver): A mutation that directly or indirectly confers a selective growth advantage to the cell in which it occurs.

Driver gene: A gene that contains driver gene mutations (Mut-Driver gene) or is expressed aberrantly in a fashion that confers a selective growth advantage (Epi-Driver gene).

Epi-driver gene: A gene that is expressed aberrantly in cancers in a fashion that confers a selective growth advantage.

Epigenetic: Changes in gene expression or cellular phenotype caused by mechanisms other than changes in the DNA sequence.

Exome: The collection of exons in the human genome. Exome sequencing generally refers to the collection of exons that encode proteins.

Gatekeeper: A gene that, when mutated, initiates tumorigenesis. Examples include RB, mutations of which initiate retinoblastomas, and VHL, whose mutations initiate renal cell carcinomas.

Germline genome: An individual's genome, as inherited from their parents.

Germline variants: Variations in sequences observed in different individuals. Two randomly chosen individuals differ by ~20,000 genetic variations distributed throughout the exome.

Human leukocyte antigen (HLA): A protein encoded by genes that determine an individual's capacity to respond to specific antigens or reject transplants from other individuals.

Homozygous deletion: Deletion of both copies of a gene segment (the one inherited from the mother, as well as that inherited from the father).

Indel: A mutation due to small insertion or deletion of one or a few nucleotides.

Karyotype: Display of the chromosomes of a cell on a microscopic slide, used to evaluate changes in chromosome number as well as structural alterations of chromosomes.

Kinase: A protein that catalyzes the addition of phosphate groups to other molecules, such as proteins or lipids. These proteins are essential to nearly all signal transduction pathways.

Liquid tumors: Tumors composed of hematopoietic (blood) cells, such as leukemias. Though lymphomas generally form solid masses in lymph nodes, they are often classified as liquid tumors because of their derivation from hematopoietic cells and ability to travel through lymphatics.

Malignant tumor: An abnormal proliferation of cells driven by mutations in oncogenes or tumor suppressor genes that has already invaded their surrounding stroma. It is impossible to distinguish an isolated benign tumor cell from an isolated malignant tumor cell. This distinction can be made only through examination of tissue architecture.

Metastatic tumor: A malignant tumor that has migrated away from its primary site, such as to draining lymph nodes or another organ.

Methylation: Covalent addition of a methyl group to a protein, DNA, or other molecule.

Missense mutation: A single-nucleotide substitution (e.g., C to T) that results in an amino acid substitution (e.g., histidine to arginine).

Mut-driver gene: A gene that contains driver gene mutations.

Nonsense mutation: A single-nucleotide substitution (e.g., C to T) that results in the production of a stop codon.

Nonsynonymous mutation: A mutation that alters the encoded amino acid sequence of a protein. These include missense, nonsense, splice site, translation start, translation stop, and indel mutations.

Oncogene: A gene that, when activated by mutation, increases the selective growth advantage of the cell in which it resides.

Passenger mutation (passenger): A mutation that has no direct or indirect effect on the selective growth advantage of the cell in which it occurred.

Primary tumor: The original tumor at the site where tumor growth was initiated. This can be defined for solid tumors, but not for liquid tumors.

Promoter: A region within or near the gene that helps regulate its expression.

Rearrangement: A mutation that juxtaposes nucleotides that are normally separated, such as those on two different chromosomes.

Selective growth advantage (s): The difference between birth and death in a cell population. In normal adult cells in the absence of injury, s = 0.000000.

Self-renewing tissues: Tissues whose cells normally repopulate themselves, such as those lining the gastrointestinal or urogenital tracts, as well as blood cells.

Single-base substitution (SBS): A single-nucleotide substitution (e.g., C to T) relative to a reference sequence or, in the case of somatic mutations, relative to the germline genome of the person with a tumor.

Solid tumors: Tumors that form discrete masses, such as carcinomas or sarcomas.

Somatic mutations: Mutations that occur in any non–germ cell of the body after conception, such as those that initiate tumorigenesis.

Splice sites: Small regions of genes that are juxtaposed to the exons and direct exon splicing.

Stem cell: An immortal cell that can repopulate a particular cell type.

Subclonal mutation: A mutation that exists in only a subset of the neoplastic cells within a tumor.

Translocation: A specific type of rearrangement where regions from two nonhomologous chromosomes are joined.

Tumor suppressor gene: A gene that, when inactivated by mutation, increases the selective growth advantage of the cell in which it resides.

Untranslated regions: Regions within the exons at the 5′ and 3′ ends of the gene that do not encode amino acids.

The prototypical exomic studies of cancer evaluated ~20 tumors at a cost of >$100,000 per case (1–3). Today, the cost of this sequencing has been reduced 100-fold, and studies reporting the sequencing of more than 100 tumors of a given type are the norm (table S1A). Although vast amounts of data can now be readily obtained, deciphering this information in meaningful terms is still challenging. Here, we review what has been learned about cancer genomes from these sequencing studies and, more importantly, what this information has taught us about cancer biology and future cancer management strategies.

How Many Genes Are Subtly Mutated in a Typical Human Cancer?

In common solid tumors such as those derived from the colon, breast, brain, or pancreas, an average of 33 to 66 genes display subtle somatic mutations that would be expected to alter their protein products (Fig. 1A). About 95% of these mutations are single-base substitutions (such as C>G), whereas the remainder are deletions or insertions of one or a few bases (such as CTT>CT) (table S1B). Of the base substitutions, 90.7% result in missense changes, 7.6% result in nonsense changes, and 1.7% result in alterations of splice sites or untranslated regions immediately adjacent to the start and stop codons (table S1B).

Fig. 1

Number of somatic mutations in representative human cancers, detected by genome-wide sequencing studies

(A) The genomes of a diverse group of adult (right) and pediatric (left) cancers have been analyzed. Numbers in parentheses indicate the median number of nonsynonymous mutations per tumor. (B) The median number of nonsynonymous mutations per tumor in a variety of tumor types. Horizontal bars indicate the 25 and 75% quartiles. MSI, microsatellite instability; SCLC, small cell lung cancers; NSCLC, non–small cell lung cancers; ESCC, esophageal squamous cell carcinomas; MSS, microsatellite stable; EAC, esophageal adenocarcinomas. The published data on which this figure is based are provided in table S1C.

Certain tumor types display many more or many fewer mutations than average (Fig. 1B). Notable among these outliers are melanomas and lung tumors, which contain ~200 nonsynonymous mutations per tumor (table S1C). These larger numbers reflect the involvement of potent mutagens (ultraviolet light and cigarette smoke, respectively) in the pathogenesis of these tumor types. Accordingly, lung cancers from smokers have 10 times as many somatic mutations as those from nonsmokers (4). Tumors with defects in DNA repair form another group of outliers (5). For example, tumors with mismatch repair defects can harbor thousands of mutations (Fig. 1B), even more than lung tumors or melanomas. Recent studies have shown that high numbers of mutations are also found in tumors with genetic alterations of the proofreading domain of DNA polymerases POLE or POLD1 (6, 7). At the other end of the spectrum, pediatric tumors and leukemias harbor far fewer point mutations: on average, 9.6 per tumor (table S1C). The basis for this observation is considered below.

Mutation Timing

When do these mutations occur? Tumors evolve from benign to malignant lesions by acquiring a series of mutations over time, a process that has been particularly well studied in colorectal tumors (8, 9). The first, or "gatekeeping," mutation provides a selective growth advantage to a normal epithelial cell, allowing it to outgrow the cells that surround it and become a microscopic clone (Fig. 2). Gatekeeping mutations in the colon most often occur in the APC gene (10). The small adenoma that results from this mutation grows slowly, but a second mutation in another gene, such as KRAS, unleashes a second round of clonal growth that allows an expansion of cell number (9). The cells with only the APC mutation may persist, but their cell numbers are small compared with the cells that have mutations in both genes. This process of mutation followed by clonal expansion continues, with mutations in genes such as PIK3CA, SMAD4, and TP53, eventually generating a malignant tumor that can invade through the underlying basement membrane and metastasize to lymph nodes and distant organs such as the liver (11). The mutations that confer a selective growth advantage to the tumor cell are called "driver" mutations. It has been estimated (12) that each driver mutation provides only a small selective growth advantage to the cell, on the order of a 0.4% increase in the difference between cell birth and cell death. Over many years, however, this slight increase, compounded once or twice per week, can result in a large mass, containing billions of cells.
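
The compounding argument in the preceding paragraph can be checked with a short calculation. The sketch below is only an illustration under simplifying assumptions that are not taken from the cited model (12): growth starts from a single cell, is deterministic, and reflects one driver with a constant 0.4% net advantage per generation. The one-billion-cell target is likewise an assumed stand-in for "billions of cells."

```python
import math


def generations_to_reach(target_cells: float, s: float) -> int:
    """Generations needed for one cell to reach target_cells when every
    generation multiplies the population by (1 + s)."""
    return math.ceil(math.log(target_cells) / math.log(1.0 + s))


S = 0.004      # 0.4% net advantage per generation, as quoted in the text
TARGET = 1e9   # assumed threshold standing in for "billions of cells"

n_gen = generations_to_reach(TARGET, S)
for per_week in (1, 2):  # "once or twice per week"
    years = n_gen / per_week / 52.0
    print(f"{per_week} generation(s) per week: {n_gen} generations, ~{years:.0f} years")
```

Under these assumptions the answer is several thousand generations, which corresponds to decades. The sketch ignores the later drivers that the text describes as accumulating, so it should not be read as a timescale estimate for any real tumor.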

Fig. 2

Genetic alterations and the progression of colorectal cancer

The major signaling pathways that drive tumorigenesis are shown at the transitions between each tumor stage. One of several driver genes that encode components of these pathways can be altered in any individual tumor. Patient age indicates the time intervals during which the driver genes are usually mutated. Note that this model may not apply to all tumor types. TGF-β, transforming growth factor–β.

The number of mutations in certain tumors of self-renewing tissues is directly correlated with age (13). When evaluated through linear regression, this correlation implies that more than half of the somatic mutations identified in these tumors occur during the preneoplastic phase; that is, during the growth of normal cells that continuously replenish gastrointestinal and genitourinary epithelium and other tissues. All of these preneoplastic mutations are "passenger" mutations that have no effect on the neoplastic process. This result explains why a colorectal tumor in a 90-year-old patient has nearly twice as many mutations as a morphologically identical colorectal tumor in a 45-year-old patient. This finding also partly explains why advanced brain tumors (glioblastomas) and pancreatic cancers (pancreatic ductal adenocarcinomas) have fewer mutations than colorectal tumors; glial cells of the brain and epithelial cells of the pancreatic ducts do not replicate, unlike the epithelial cells lining the crypts of the colon. Therefore, the gatekeeping mutation in a pancreatic or brain cancer is predicted to occur in a precursor cell that contains many fewer mutations than are present in a colorectal precursor cell. This line of reasoning also helps to explain why pediatric cancers have fewer mutations than adult tumors. Pediatric cancers often occur in non–self-renewing tissues, and those that arise in renewing tissues (such as leukemias) originate from precursor cells that have not renewed themselves as often as in adults. In addition, pediatric tumors, as well as adult leukemias and lymphomas, may require fewer rounds of clonal expansion than adult solid tumors (8, 14). Genome sequencing studies of leukemia patients support the idea that mutations occur as random events in normal precursor cells before these cells acquire an initiating mutation (15).

When during tumorigenesis do the remaining somatic mutations occur? Because mutations in tumors occur at predictable and calculable rates (see below), the number of somatic mutations in tumors provides a clock, much like the clock used in evolutionary biology to determine species divergence time. The number of mutations has been measured in tumors representing progressive stages of colorectal and pancreatic cancers (11, 16). Applying the evolutionary clock model to these data leads to two unambiguous conclusions: First, it takes decades to develop a full-blown, metastatic cancer. Second, virtually all of the mutations in metastatic lesions were already present in a large number of cells in the primary tumors.

The timing of mutations is relevant to our understanding of metastasis, which is responsible for the death of most patients with cancer. The primary tumor can be surgically removed, but the residual metastatic lesions, often undetectable and widespread, remain and eventually enlarge, compromising the function of the lungs, liver, or other organs. From a genetics perspective, it would seem that there must be mutations that convert a primary cancer to a metastatic one, just as there are mutations that convert a normal cell to a benign tumor, or a benign tumor to a malignant one (Fig. 2). Despite intensive effort, however, consistent genetic alterations that distinguish cancers that metastasize from cancers that have not yet metastasized remain to be identified.

One potential explanation invokes mutations or epigenetic changes that are difficult to identify with current technologies (see section on "dark matter" below). Another explanation is that metastatic lesions have not yet been studied in sufficient detail to identify these genetic alterations, particularly if the mutations are heterogeneous in nature. But another possible explanation is that there are no metastasis genes. A malignant primary tumor can take many years to metastasize, but this process is, in principle, explicable by stochastic processes alone (17, 18). Advanced tumors release millions of cells into the circulation each day, but these cells have short half-lives, and only a minuscule fraction establish metastatic lesions (19). Conceivably, these circulating cells may, in a nondeterministic manner, infrequently and randomly lodge in a capillary bed in an organ that provides a favorable microenvironment for growth. The bigger the primary tumor mass, the more likely that this process will occur. In this scenario, the continual evolution of the primary tumor would reflect local selective advantages rather than future selective advantages. The idea that growth at metastatic sites is not dependent on additional genetic alterations is also supported by recent results showing that even normal cells, when placed in suitable environments such as lymph nodes, can grow into organoids, complete with a functioning vasculature (20).

Other Types of Genetic Alterations in Tumors

Though the rate of point mutations in tumors is similar to that of normal cells, the rate of chromosomal changes in cancer is elevated (21). Therefore, most solid tumors display widespread changes in chromosome number (aneuploidy), as well as deletions, inversions, translocations, and other genetic abnormalities. When a large part of a chromosome is duplicated or deleted, it is difficult to identify the specific "target" gene(s) on the chromosome whose gain or loss confers a growth advantage to the tumor cell. Target genes are more easily identified in the case of chromosome translocations, homozygous deletions, and gene amplifications. Translocations generally fuse two genes to create an oncogene (such as BCR-ABL in chronic myelogenous leukemia) but, in a small number of cases, can inactivate a tumor suppressor gene by truncating it or separating it from its promoter. Homozygous deletions often involve just one or a few genes, and the target is always a tumor suppressor gene. Amplifications contain an oncogene whose protein product is abnormally active simply because the tumor cell contains 10 to 100 copies of the gene per cell, compared with the two copies present in normal cells.

Most solid tumors have dozens of translocations; however, as with point mutations, the majority of translocations appear to be passengers rather than drivers. The breakpoints of the translocations are often in "gene deserts" devoid of known genes, and many of the translocations and homozygous deletions are adjacent to fragile sites that are prone to breakage. Cancer cells can, perhaps, survive such chromosome breaks more easily than normal cells because they contain mutations that incapacitate genes like TP53, which would normally respond to DNA damage by triggering cell death. Studies to date indicate that there are roughly 10 times fewer genes affected by chromosomal changes than by point mutations. Figure 3 shows the types and distribution of genetic alterations that affect protein-coding genes in five representative tumor types. Protein-coding genes account for only ~1.5% of the total genome, and the number of alterations in noncoding regions is proportionately higher than the number affecting coding regions. The vast majority of the alterations in noncoding regions are presumably passengers. These noncoding mutations, as well as the numerous epigenetic changes found in cancers, will be discussed later.

Fig. 3

Total alterations affecting protein-coding genes in selected tumors

Average number and types of genomic alterations per tumor, including single-base substitutions (SBS), small insertions and deletions (indels), amplifications, and homozygous deletions, as determined by genome-wide sequencing studies. For colorectal, breast, and pancreatic ductal cancer, and medulloblastomas, translocations are also included. The published data on which this figure is based are provided in table S1D.

Drivers Versus Passenger Mutations

Though it is easy to define a "driver gene mutation" in physiologic terms (as one conferring a selective growth advantage), it is more difficult to identify which somatic mutations are drivers and which are passengers. Moreover, it is important to point out that there is a fundamental difference between a driver gene and a driver gene mutation. A driver gene is one that contains driver gene mutations. But driver genes may also contain passenger gene mutations. For example, APC is a large driver gene, but only those mutations that truncate the encoded protein within its N-terminal 1600 amino acids are driver gene mutations. Missense mutations throughout the gene, as well as protein-truncating mutations in the C-terminal 1200 amino acids, are passenger gene mutations.

Numerous statistical methods to identify driver genes have been described. Some are based on the frequency of mutations in an individual gene compared with the mutation frequency of other genes in the same or related tumors after correction for sequence context and gene size (22, 23). Other methods are based on the predicted effects of mutation on the encoded protein, as inferred from biophysical studies (24–26). All of these methods are useful for prioritizing genes that are most likely to promote a selective growth advantage when mutated. When the number of mutations in a gene is very high, as with TP53 or KRAS, any reasonable statistic will indicate that the gene is extremely likely to be a driver gene. These highly mutated genes have been termed "mountains" (1). Unfortunately, however, genes with more than one, but still relatively few mutations (so-called "hills") numerically dominate cancer genome landscapes (1). In these cases, methods based on mutation frequency and context alone cannot reliably indicate which genes are drivers, because the background rates of mutation vary so much among different patients and regions of the genome. Recent studies of normal cells have indicated that the rate of mutation varies by more than 100-fold within the genome (27). In tumor cells, this variation can be higher and may affect whole regions of the genome in an apparently random fashion (28). Thus, at best, methods based on mutation frequency can only prioritize genes for further analysis but cannot unambiguously identify driver genes that are mutated at relatively low frequencies.
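
To see why a 100-fold uncertainty in background rate undermines frequency-based calls for "hills," consider the following sketch. It uses a plain Poisson tail test as a stand-in; this is not the method of the cited studies (22, 23), and the gene length, cohort size, observed count, and background rates are all hypothetical values chosen for illustration.

```python
import math


def poisson_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    cdf = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)


def expected_mutations(rate_per_base: float, coding_length: int, n_tumors: int) -> float:
    """Expected passenger mutations in one gene across a cohort."""
    return rate_per_base * coding_length * n_tumors


# Hypothetical inputs, for illustration only.
OBSERVED = 5          # mutations seen in the gene across the cohort
CODING_LENGTH = 3000  # coding bases in the gene
N_TUMORS = 100        # tumors sequenced

# Three assumed background rates spanning a 100-fold range.
for rate in (1e-7, 1e-6, 1e-5):
    lam = expected_mutations(rate, CODING_LENGTH, N_TUMORS)
    p = poisson_tail(OBSERVED, lam)
    print(f"background {rate:.0e}/base: expected {lam:.2f}, P(X >= {OBSERVED}) = {p:.3g}")
```

With the same observed count, the gene looks highly significant at the lowest assumed rate and unremarkable at the highest, which is the ambiguity the paragraph describes.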

Further complicating matters, there are two distinct meanings of the term "driver gene" that are used in the cancer literature. The driver-versus-passenger concept was originally used to distinguish mutations that caused a selective growth advantage from those that did not (29). According to this definition, a gene that does not harbor driver gene mutations cannot be a driver gene. But many genes that contain few or no driver gene mutations have been labeled driver genes in the literature. These include genes that are overexpressed, underexpressed, or epigenetically altered in tumors, or those that enhance or inhibit some aspect of tumorigenicity when their expression is experimentally manipulated. Though a subset of these genes may indeed play an important role in the neoplastic process, it is confusing to lump them all together as driver genes.

To reconcile the two connotations of driver genes, we suggest that genes suspected of increasing the selective growth advantage of tumor cells be categorized as either "Mut-driver genes" or "Epi-driver genes." Mut-driver genes contain a sufficient number or type of driver gene mutations to unambiguously distinguish them from other genes. Epi-driver genes are expressed aberrantly in tumors but not frequently mutated; they are altered through changes in DNA methylation or chromatin modification that persist as the tumor cell divides.

A Ratiometric Method to Identify and Classify Mut-Driver Genes

If mutation frequency, corrected for mutation context, gene length, and other parameters, cannot reliably identify modestly mutated driver genes, what can? In our experience, the best way to identify Mut-driver genes is through their pattern of mutation rather than through their mutation frequency. The patterns of mutations in well-studied oncogenes and tumor suppressor genes are highly characteristic and nonrandom. Oncogenes are recurrently mutated at the same amino acid positions, whereas tumor suppressor genes are mutated through protein-truncating alterations throughout their length (Fig. 4 and table S2A).

Fig. 4

Distribution of mutations in two oncogenes (PIK3CA and IDH1) and two tumor suppressor genes (RB1 and VHL)

The distributions of missense mutations (red arrowheads) and truncating mutations (blue arrowheads) in representative oncogenes and tumor suppressor genes are shown. The data were collected from genome-wide studies annotated in the COSMIC database (release version 61). For PIK3CA and IDH1, mutations obtained from the COSMIC database were randomized by the Excel RAND function, and the first 50 are shown. For RB1 and VHL, all mutations recorded in COSMIC are plotted. aa, amino acids.

On the basis of these mutation patterns rather than frequencies, we can determine which of the 18,306 mutated genes containing a total of 404,863 subtle mutations that have been recorded in the Catalogue of Somatic Mutations in Cancer (COSMIC) database (30) are Mut-driver genes and whether they are likely to function as oncogenes or tumor suppressor genes. For a gene to be classified as an oncogene, we simply require that >20% of the recorded mutations in the gene are at recurrent positions and are missense (see legend to table S2A). For a gene to be classified as a tumor suppressor gene, we analogously require that >20% of the recorded mutations in the gene are inactivating. This "20/20 rule" is lenient in that all well-documented cancer genes far surpass these criteria (table S2A).
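
The two thresholds just described can be written as a small classifier. This is a minimal sketch, not the authors' implementation: the precise definitions of "recurrent" and "inactivating" are in the legend to table S2A, which is not reproduced here, so the function takes pre-computed counts as input. The handling of a gene that passes both thresholds, the class name, and the example counts are assumptions made for illustration.

```python
from dataclasses import dataclass

THRESHOLD = 0.20  # the ">20%" cutoff stated in the text


@dataclass
class GeneMutationCounts:
    gene: str
    total: int               # all recorded subtle mutations in the gene
    recurrent_missense: int  # missense mutations at recurrent positions
    inactivating: int        # inactivating (e.g., truncating) mutations


def classify_20_20(c: GeneMutationCounts) -> str:
    """Apply the two >20% criteria to pre-computed mutation counts."""
    if c.total == 0:
        return "unclassified"
    oncogene_like = c.recurrent_missense / c.total > THRESHOLD
    suppressor_like = c.inactivating / c.total > THRESHOLD
    if oncogene_like and suppressor_like:
        return "ambiguous"  # assumption: the tie-break is not given in the text
    if oncogene_like:
        return "oncogene"
    if suppressor_like:
        return "tumor suppressor gene"
    return "unclassified"


# Hypothetical counts, not taken from COSMIC.
examples = [
    GeneMutationCounts("GENE_A", total=100, recurrent_missense=90, inactivating=2),
    GeneMutationCounts("GENE_B", total=100, recurrent_missense=3, inactivating=60),
    GeneMutationCounts("GENE_C", total=100, recurrent_missense=5, inactivating=8),
]
for c in examples:
    print(c.gene, "->", classify_20_20(c))
```

Because the rule depends on fractions rather than raw counts, a gene with few recorded mutations can still be classified, which is what lets the pattern-based approach reach the "hills."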

The following examples illustrate the value of the 20/20 rule. When IDH1 mutations were first identified in brain tumors, their role in tumorigenesis was unknown (2, 31). Initial functional studies suggested that IDH1 was a tumor suppressor gene and that mutations inactivated this gene (32). However, nearly all of the mutations in IDH1 were at the identical amino acid, codon 132 (Fig. 4). As assessed by the 20/20 rule, this distribution unambiguously indicated that IDH1 was an oncogene rather than a tumor suppressor gene, and this conclusion was eventually supported by biochemical experiments (33, 34). Another example is provided by mutations in NOTCH1. In this case, some functional studies suggested that NOTCH1 was an oncogene, whereas others suggested it was a tumor suppressor gene (35, 36). The situation could be clarified through the application of the 20/20 rule to NOTCH1 mutations in cancers. In "liquid tumors" such as lymphomas and leukemias, the mutations were often recurrent and did not truncate the predicted protein (37). In squamous cell carcinomas, the mutations were not recurrent and were usually inactivating (38–40). Thus, the genetic data clearly indicated that NOTCH1 functions differently in different tumor types. The idea that the same gene can function in completely opposite ways in different cell types is important for understanding cell signaling pathways.

How Many Mut-Driver Genes Exist?

Though all 20,000 protein-coding genes have been evaluated in the genome-wide sequencing studies of 3284 tumors, with a total of 294,881 mutations reported, only 125 Mut-driver genes, as defined by the 20/20 rule, have been discovered to date (table S2A). Of these, 71 are tumor suppressor genes and 54 are oncogenes. An important but relatively small fraction (29%) of these genes was discovered to be mutated through unbiased genome-wide sequencing; most of these genes had already been identified by previous, more directed investigations.

How many more Mut-driver genes are yet to be discovered? We believe that a plateau is being reached, because the same Mut-driver genes keep being "rediscovered" in different tumor types. For example, MLL2 and MLL3 mutations were originally discovered in medulloblastomas (41), and these genes were subsequently found to be mutated in non-Hodgkin lymphomas, prostate cancers, breast cancers, and other tumor types (42–45). Similarly, ARID1A mutations were first discovered in clear-cell ovarian cancers (46, 47), and the gene was subsequently shown to be mutated in tumors of several other organs, including those of the stomach and liver (48–50). In recent studies of several types of lung cancer (4, 51, 52), nearly all genes found to be mutated at significant frequencies had already been identified in tumors of other organs. In other words, the number of frequently altered Mut-driver genes (mountains) is nearing saturation. More mountains will undoubtedly be discovered, but these will likely be in uncommon tumor types that have not yet been studied in depth.

The newly discovered Mut-driver genes that have been detected through genome-wide sequencing have often proved illuminating. For example, nearly half of these genes encode proteins that directly regulate chromatin through modification of histones or DNA. Examples include the histones HIST1H3B and H3F3A, as well as the proteins DNMT1 and TET1, which covalently modify DNA, and EZH2, SETD2, and KDM6A, which, in turn, methylate or demethylate histones (53–57). These discoveries have profound implications for understanding the mechanistic basis of the epigenetic changes that are rampant in tumors (58). The discovery of genetic alterations in genes encoding mRNA splicing factors, such as SF3B1 and U2AF1 (59–61), was similarly stunning, as mutations in these genes would be expected to lead to a plethora of nonspecific cellular stresses rather than to promote specific tumor types. Another example is provided by mutations in the cooperating proteins ATRX and DAXX (62). Tumors with mutations in these genes all have a specific type of telomere elongation process termed "ALT" (for "alternative lengthening of telomeres") (63). Though the ALT phenotype had been recognized for more than a decade, its genetic basis was mysterious before the discovery of mutations of these genes and their perfect correlation with the ALT phenotype (64). A final example is provided by IDH1 and IDH2, whose mutations have stimulated the burgeoning field of tumor metabolism (65) and have had fascinating implications for epigenetics (66, 67).

The Mut-driver genes listed in table S2A are affected by subtle mutations: base substitutions, intragenic insertions, or deletions. As noted above, Mut-driver genes can also be altered by less subtle changes, such as translocations, amplifications, and large-scale deletions. As with point mutations, it can be difficult to distinguish Mut-driver genes that are altered by these types of changes from genes that contain only passenger mutations. Genes that are not point-mutated, but are recurrently amplified (e.g., MYC family genes) or homozygously deleted (e.g., MAP2K4) and that meet other criteria (e.g., being the only gene in the amplicon or homozygously deleted region) are listed in table S2B. This adds 13 Mut-driver genes (10 oncogenes that are amplified and 3 tumor suppressor genes that are homozygously deleted) to the 125 driver genes that are affected by subtle mutations, for a total of 138 driver genes discovered to date (table S2).

Translocations provide similar challenges for driver classification. An important discovery related to this point is chromothripsis (68), a rare cataclysmic event involving one or a small number of chromosomes that results in a large number of chromosomal rearrangements. This complicates any inferences about causality, in the same way that mismatch repair deficiency compromises the interpretation of point mutations. However, for completeness, all fusion genes that have been identified in at least three independent tumors are listed in table S3. Virtually all of these genes were discovered through conventional approaches before the advent of genome-wide DNA sequencing studies, with some notable exceptions such as those described in (6) and (69). The great majority of these translocations are found in liquid tumors (leukemias and lymphomas) (table S3C) or mesenchymal tumors (table S3B) and were initially identified through karyotypic analyses. A relatively small number of recurrent fusions, the most important of which include ERG in prostate cancers (70) and ALK in lung cancers (71), have been described in more common tumors (table S3A).

Genes exist that predispose to cancer when inherited in mutant form in the germ line, but are not somatically mutated in cancer to a substantial degree. These genes generally do not confer an increase in selective growth advantage when they are abnormal, but they stimulate tumorigenesis in indirect ways (such as by increasing genetic instability, as discussed later in this Review). For completeness, these genes and the hereditary syndromes for which they are responsible are listed in table S4.

Dark Matter

Classic epidemiologic studies have suggested that solid tumors ordinarily require five to eight "hits," now interpreted as alterations in driver genes, to develop (72). Is this number compatible with the molecular genetic data? In pediatric tumors such as medulloblastomas, the number of driver gene mutations is low (zero to two), as expected from the discussion above (Fig. 5). In common adult tumors, such as pancreatic, colorectal, breast, and brain cancers, the number of mutated driver genes is often three to six, but several tumors have only one or two driver gene mutations (Fig. 5). How can this be explained, given the widely accepted notion that tumor development and progression require multiple, sequential genetic alterations acquired over decades?

Fig. 5

Number and distribution of driver gene mutations in five tumor types

The total number of driver gene mutations [in oncogenes and tumor suppressor genes (TSGs)] is shown, as well as the number of oncogene mutations alone. The driver genes are listed in tables S2A and S2B. Translocations are not included in this figure, because few studies report translocations along with the other types of genetic alterations on a per-case basis. In the tumor types shown here, translocations affecting driver genes occur in less than 10% of samples. The published data on which this figure is based are provided in table S1E.

First, technical issues explain some of the "missing mutations." Genome-wide sequencing is far from perfect, at least with the technologies available today. Some regions of the genome are not well represented because their sequences are difficult to amplify, capture, or unambiguously map to the genome (73–76). Second, there is usually a wide distribution in the number of times that a specific nucleotide in a given gene is observed in the sequence data, so some regions will not be well represented by chance factors alone (77). Finally, primary tumors contain not only neoplastic cells, but also stromal cells that dilute the signal from the mutated base, further reducing the probability of finding a mutation (78).
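To make the effect of coverage and stromal dilution concrete, the following is a minimal sketch, not a method taken from the studies cited here. It assumes a heterozygous mutation in a diploid region, so that the expected mutant allele fraction is half the fraction of neoplastic cells, and it assumes that reads sample alleles independently (a binomial model). The function name `detection_probability` and the calling threshold `min_alt_reads` are hypothetical choices made for illustration; sequencing errors and mapping problems are ignored.

```python
from math import comb


def detection_probability(depth, purity, min_alt_reads=3):
    """Probability of seeing at least `min_alt_reads` mutant reads.

    Assumptions (illustrative only):
      - the mutation is heterozygous in a diploid region, so the expected
        mutant allele fraction is purity / 2;
      - each read samples an allele independently (binomial sampling);
      - a mutation is "called" when at least `min_alt_reads` reads carry it.
    """
    if depth < 0 or not 0.0 <= purity <= 1.0:
        raise ValueError("depth must be >= 0 and purity must lie in [0, 1]")
    allele_fraction = purity / 2.0
    # Probability of observing fewer mutant reads than the threshold.
    p_miss = sum(
        comb(depth, i) * allele_fraction**i * (1.0 - allele_fraction) ** (depth - i)
        for i in range(min_alt_reads)
    )
    return 1.0 - p_miss


if __name__ == "__main__":
    # Compare two read depths across a range of neoplastic cell fractions.
    for purity in (0.2, 0.5, 0.8):
        for depth in (30, 100):
            p = detection_probability(depth, purity)
            print(f"purity={purity:.1f} depth={depth:>3}: P(detect)={p:.3f}")
```

Under these assumptions, lowering either the read depth at a position or the fraction of neoplastic cells in the sample lowers the chance that a real mutation is observed often enough to be called.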

What fraction of mutations are missed by these three technical issues? A recent study of pancreatic cancers is informative in this regard. Biankin et al. used immunohistochemical and genetic analyses to select a set of primary tumor samples enriched in neoplastic cells (79). They used massively parallel sequencing to analyze the exomes of these samples, then compared their mutational data with a set of pancreatic cancer cell lines and xenografts in which mutations had previously been identified, using conventional Sanger sequencing, and confirmed to be present in the primary tumors (3, 16). Only 159 (63%) of the expected 251 driver gene mutations were identified in the primary tumors studied by next-generation sequencing alone, indicating a false-negative rate of 37%. Genome-wide studies in which the proportion of neoplastic cells within tumors is not as carefully evaluated as in (79) will have higher false-negative rates. Moreover, these technical problems are exacerbated in whole-genome studies compared with exomic analyses, because the sequence coverage of the former is often lower than that of the latter (generally 30-fold in whole-genome studies versus more than 100-fold in exomic studies).
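The false-negative rate quoted above follows directly from the two counts given in the text (159 mutations detected out of 251 expected). The short sketch below simply reproduces that arithmetic; the function name `false_negative_rate` is a hypothetical helper, not something from the cited study.

```python
def false_negative_rate(detected, expected):
    """Fraction of expected mutations that were not detected."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    if not 0 <= detected <= expected:
        raise ValueError("detected must lie between 0 and expected")
    return 1.0 - detected / expected


if __name__ == "__main__":
    # Counts taken from the text: 159 detected out of 251 expected.
    rate = false_negative_rate(159, 251)
    print(f"sensitivity: {1.0 - rate:.0%}, false-negative rate: {rate:.0%}")
```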

Conceptual issues also limit the number of detectable drivers. Virtually all studies, either at the whole-genome or whole-exome level, have focused on the coding regions. The reason for this is practical; it is difficult enough to identify driver gene mutations when they qualitatively alter the sequence of the encoded protein. Trying to make sense of intergenic or intronic mutations is much more difficult. Based on analogous studies of the identifiable mutations in patients with monogenic diseases, more than 80% of mutations should be detectable through analysis of the coding regions (80). However, this still leaves some mutations as unidentifiable "dark matter," even in the germline genomes of heritable cases, which are usually easier to interpret than the somatic mutations in cancers. The first examples of light coming to such dark matter have recently been published: Recurrent mutations in the promoter of the TERT gene, encoding the catalytic subunit of telomerase, have been identified and shown to activate its transcription (81, 82).

Mut-driver genes other than those listed in table S2 will undoubtedly be discovered as genome-wide sequencing continues. However, based on the trends noted above, most of the Mut-driver genes will likely be mountains in rare tumor types or small hills in common tumor types; thus, these genes are unlikely to account for the bulk of the presumptive dark matter. Other types of dark matter can be envisioned, however. Copy-number alterations are ubiquitous in cancers, at either the whole-chromosome or subchromosomal levels. These alterations could subtly change the expression of their driver genes. Recent studies have suggested that the loss of one copy of chromosomes containing several tumor suppressor genes, each plausibly connected to neoplasia but not altered by mutation, may confer a selective growth advantage (83, 84).
