posted by Mosaic Data Science
In June of 2000, with much fanfare, the Human Genome Project completed the initial draft of the human genome. President Bill Clinton, with British PM Tony Blair and Francis Collins, then director of the National Human Genome Research Institute, announced that the newly decoded human genome would “revolutionize the diagnosis, prevention and treatment of most, if not all, human diseases.” Collins forecasted a grand vision of “personalized medicine” by 2010. The molecular biology revolution was producing an exponentially growing volume of data and expectations were high. But ten years later, in an article entitled “Revolution Postponed,” Scientific American conceded that the Human Genome Project had “failed so far to produce the medical miracles that scientists had promised.” Much excellent research work had been accomplished, but in the age of Big Data, the Human Genome Project is an example of how complex problems are not always solved merely with more data. Big Data sometimes needs Big Analysis. Consider the recent finding that the common squid, Doryteuthis pealeii, massively reprograms its own genetic data in real time.
The central dogma of molecular biology, first stated 60 years ago by Nobel Laureate Francis Crick, is that information in the cell flows in one direction, moving from genes (DNA) to copies of genes (RNA) and finally to proteins. Like most rules in biology it has been broken many times. One famous exception is reverse transcriptase, a protein machine that inserts a new gene into DNA based on an RNA template. Some types of viruses use this machinery to commandeer the host cell.
Another exception is the so-called alternative splicing of genes. A gene is not simply a single, uninterrupted stretch of DNA nucleotides. Instead, genes typically consist of several separate stretches of DNA, known as exons, with spacer regions in between. After the gene is copied into an RNA molecule, the spacer regions are removed by splicing machinery. One consequence is that a gene can be expressed using different arrangements of its exons, a process known as alternative splicing. Biologists were surprised when the total number of genes in the human genome was found to be fewer than 30,000. But with alternative splicing the number of distinct gene products that can be constructed is much higher. Alternative splicing violates the central dogma because the splicing machinery is made of proteins. Again, proteins are reversing the information flow, this time back to RNA.
Yet another exception to the central dogma is RNA editing. Whereas alternative splicing swaps and rearranges lengthy stretches of information, RNA editing modifies single nucleotides. Again the RNA editing machinery is made of proteins. Interestingly, it modifies the RNA not by removing and inserting nucleotides but simply by removing an amine group (a central nitrogen atom with three groups attached ranging from simple hydrogen atoms to more complicated chemicals) from the nucleotide. The modified nucleotide is then interpreted differently at the ribosome machine, which translates the final RNA string into a protein molecule. For example, when the nucleotide adenosine is deaminated, it instead looks like guanosine to the ribosome.
Consider an example. At the ribosome, the RNA triplet, or codon, AAG (adenosine-adenosine-guanosine) translates into the amino acid lysine according to the universal genetic code (i.e., the DNA code). So when an AAG codon is read, the ribosome fetches a lysine amino acid and attaches it to the protein molecule that is under construction. But if the middle adenosine has been deaminated, then the ribosome instead reads an AGG codon, which translates into an arginine amino acid. So the deamination of this single adenosine nucleotide causes a swap from lysine to arginine in the protein’s amino acid sequence.
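The lysine-to-arginine swap can be sketched in a few lines of Python. This is only an illustration of the example above: the function name `edit_adenosine` and the two-entry lookup table are made up for this post, and the table holds just the two codons discussed, not the full genetic code.

```python
# Tiny slice of the genetic code: only the two codons from the example above.
CODON_TO_AMINO_ACID = {"AAG": "lysine", "AGG": "arginine"}


def edit_adenosine(codon, position):
    """Return the codon as the ribosome would read it after the adenosine
    at `position` (0-based) is deaminated, i.e. with that A read as G.

    Hypothetical helper for illustration only.
    """
    if codon[position] != "A":
        raise ValueError("this sketch only models deamination of adenosine")
    return codon[:position] + "G" + codon[position + 1:]


original = "AAG"
edited = edit_adenosine(original, 1)  # deaminate the middle adenosine

print(original, "->", CODON_TO_AMINO_ACID[original])
print(edited, "->", CODON_TO_AMINO_ACID[edited])
```

Note that the underlying gene is untouched here; only the RNA copy is read differently, which is why the same gene can yield different proteins.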
Whereas alternative splicing is known to be a significant influence in the cell’s information flow, until recently RNA editing has been thought to be a minor player. Very few RNA deamination events were seen in experiments where they could be observed. That all changed in a new international study published in March of this year that reported massive RNA editing in the common squid, Doryteuthis pealeii. The researchers found that the RNA copies of more than half of the squid’s genes undergo RNA editing. That is an astronomical shift from the previous thinking. Overall they found more than eighty thousand edit events, several orders of magnitude more than had been observed in other species.
What is the reason for this RNA editing? One function may be in response to temperature changes in the squid’s environment. Answering this question is complicated by the fact that the RNA editing varies across the different squid tissues. It is interesting that the edits usually do result in a change in the resulting protein amino acid sequence. This is not a given because the genetic code is redundant. There are 64 different codons (4 different nucleotides and three nucleotides per codon, or 4^3) and only 20 different amino acids. Therefore many of the amino acids have multiple codons assigned to them and swapping a nucleotide in a codon can result in no change to the assigned amino acid. Such nucleotide substitutions are referred to as synonymous. The squid’s RNA edits produce fewer synonymous changes than would be expected if the edits were random. Also, there is a very strong correlation with the degree of editing. The more that an RNA transcript is edited, the smaller is the fraction of edits that is synonymous.
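To make the redundancy argument concrete, here is a rough sketch that builds the standard genetic code, confirms the 64-codon and 20-amino-acid counts, and tallies how many of the possible adenosine-to-guanosine edits would be synonymous. It assumes every adenosine in every codon is equally likely to be edited, which is a naive baseline chosen for illustration and not the statistical method used in the squid study. Amino acids are written as one-letter codes, `*` marks a stop codon, and the names `GENETIC_CODE` and `a_to_g_edits` are invented for this sketch.

```python
from itertools import product

# Standard genetic code in the conventional compact layout: codons are
# enumerated with bases in the order U, C, A, G, third position varying fastest.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

GENETIC_CODE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def a_to_g_edits(codon):
    """Yield each codon obtained by reading exactly one A in `codon` as G."""
    for i, base in enumerate(codon):
        if base == "A":
            yield codon[:i] + "G" + codon[i + 1:]


total = 0
synonymous = 0
for codon, amino_acid in GENETIC_CODE.items():
    for edited in a_to_g_edits(codon):
        total += 1
        # An edit is synonymous when the edited codon maps to the same symbol
        # (stop codons are compared like any other symbol in this sketch).
        if GENETIC_CODE[edited] == amino_acid:
            synonymous += 1

amino_acids = set(GENETIC_CODE.values()) - {"*"}  # drop the stop symbol
print(len(GENETIC_CODE), "codons,", len(amino_acids), "amino acids")
print(f"{synonymous} of {total} possible A-to-G edits are synonymous")
```

A real comparison would weight codons by how often they actually occur in the squid's transcripts, so the printed fraction should be read as a toy baseline only.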
The squid’s RNA edits also are concentrated in particular types of proteins. For instance, proteins with nervous system functions are heavily affected by the RNA edits. That’s interesting because when it comes to researching nerve cells, the squid has traditionally been the go-to model. That’s because it can have, relatively speaking, huge nerve cells. Its axons (the long tail of the nerve cell) can be as much as a millimeter across—wide enough to insert tiny electrodes to make electrical measurements as nerve signals traverse the cell.
This finding of massive, real-time and tissue-specific RNA editing in the squid raises many questions, including what it is all for, whether it means that squid research cannot be easily applied to other species, and what level of RNA editing we will find in other species. It also is an example of how some problems require much more than mere data mining. Modeling and analysis are also important steps in understanding the problem.