Let’s start with the basics. For the adequately initiated, the first bit of this post will be a bit of a snoozefest…
Much like most good things in life today, this story too starts with a cell. The only difference: the cell I speak of isn’t a phone, but the microscopic, fundamental working unit of every living system. Our bodies are made up of close to 37.2 trillion cells! (Yeah, that’s 37 followed by 12 zeroes… Just like my bank balance, except for the 37 preceding the 12 zeroes.)
There are various types of cells in our bodies, each serving a specific function. The instructions they need to carry out their activities are contained in the DNA housed in each cell’s nucleus. DNA, or deoxyribonucleic acid, is composed of the same physical and chemical components in all organisms. The DNA sequence is the specific order of the bases (or nucleotides) along the DNA strand, and it spells out the instructions necessary to create a particular organism with its own unique traits.
The complete set of DNA in an organism is referred to as its genome. Genomes vary widely in size between organisms, ranging from as low as about 600,000 base pairs (certain bacteria) to over 3 billion base pairs (humans and mice). Yeah… “genomically” speaking, we are as big as mice! Haha… ah, the wonders of DNA! If you want wacky facts about DNA… read this first, and then come back here.
The genome is packaged into chromosomes (humans have 23 pairs), and each chromosome contains many genes, which are the basic functional and physical units of heredity. The human genome is estimated to have roughly 20,000 to 25,000 of them. A gene is a specific sequence of bases that encodes instructions on how to synthesise a particular protein. The protein-coding portions of genes make up only about 2% of the entire genome. Though it’s fairly intuitive that most of our focus would be on genes (Well… duh!), it would be rather unfair on my part not to give a shout-out to the proteins that these genes code for. In all honesty, it’s the proteins that carry out most life functions and make up the majority of cellular structures. The collection of all the proteins present in a cell at any given point in time is called its proteome. Quite unlike the genome (which, for the most part, is rather static), the proteome is extremely dynamic and changes from minute to minute in response to signals from both inside and outside the cell.
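For the code-inclined, here is a minimal sketch of what “encoding instructions for a protein” means in practice: the cell reads a gene three bases at a time, and each triplet (a codon) stands for one amino acid. The sequence `toy_gene` and the function name `translate` are made up for illustration, the codon table is deliberately partial, and the sketch skips the intermediate RNA step by reading the coding DNA strand directly.

```python
# A deliberately partial slice of the standard genetic code:
# only the codons needed for the toy example below.
CODON_TABLE = {
    "ATG": "Met",   # methionine, also the usual "start here" signal
    "GCC": "Ala",   # alanine
    "AAA": "Lys",   # lysine
    "TGG": "Trp",   # tryptophan
    "TAA": "STOP",
    "TAG": "STOP",
    "TGA": "STOP",
}


def translate(coding_dna):
    """Read a coding DNA string three bases at a time and return
    the list of amino acids, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(coding_dna) - 2, 3):
        codon = coding_dna[i:i + 3]
        # Codons missing from this partial table are marked as unknown.
        amino_acid = CODON_TABLE.get(codon, "???")
        if amino_acid == "STOP":
            break
        protein.append(amino_acid)
    return protein


toy_gene = "ATGGCCAAATGGTAA"  # hypothetical sequence, not a real gene
print(translate(toy_gene))
```

Real genes are far messier (think introns, regulatory regions and all 64 codons), so treat this as a cartoon of the idea rather than a working bioinformatics tool.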
The Human Genome Project (HGP)
The inception, goals and legacy of the HGP
Would you believe me if I told you that the roots of the Human Genome Project can be traced back to an initiative by the US Department of Energy (DOE)? Yeah… you read that right. You see, since the second half of the 1940s, Congress has charged the DOE and its predecessor agencies with developing new energy resources and technologies, and with understanding the potential risks that producing and using those technologies pose to human health and the environment. In 1986, the DOE took a bold step in this direction by initiating the human genome project, stating that a reference human genome would greatly benefit its cause. To this end, the DOE joined hands with the National Institutes of Health (NIH), and together they launched the HGP in 1990. In its early years, the project gained a major partner in the Wellcome Trust (a large private charitable institution based in the UK). Soon, several other nations, including Japan, China, Germany and France, also began contributing to the project, which would take 13 years to complete.
The central goal of the HGP was to generate a high-quality reference sequence of the human genome’s roughly 3.2 billion base pairs and to identify all of its genes. Alongside this objective, the HGP also aimed to sequence the genomes of model organisms to aid interpretation of the human genome, enhance computational resources to support research and commercial applications of the resulting knowledge, understand gene function through genomic comparisons between humans and mice, study the genetic basis of human variation, and train future scientists in the field of genomics.
Amidst a lot of excitement, scientists announced the completion of the first working draft of the human genome in June 2000, and the draft was published in February 2001 in the journals Nature and Science. The successful realisation of the Human Genome Project was marked by the completion of the high-quality reference human genome in 2003, over 2 years ahead of schedule. (Yeah! A government project that ended BEFORE time! Take that, cynics!) What made the occasion even more special was that it coincided with the 50th anniversary of Watson and Crick’s publication on the structure of the DNA molecule, the discovery that launched the era of molecular biology!
Available to researchers worldwide, the reference genome proved to be a spectacular, unprecedented resource that led to a plethora of R&D and practical applications. Since the completion of the HGP, thousands of genomes (human and otherwise) have been sequenced, creating a global genomic database that can give our universe some competition in terms of how rapidly it’s expanding. Building on the knowledge gleaned from these massive sequencing projects, we have identified (and are discovering more with each passing day) important elements in the genome that regulate cellular function and underlie human variation. The biggest challenge now is to enhance our understanding of the various components at work inside each cell (genes, proteins etc.), and of how the interplay between them leads to the creation and sustenance of complex life forms.
What did we learn from the HGP?
In the seminal paper published by the consortium of HGP scientists, you find this one wonderful line. They wrote, and I quote, “…the more we learn about the human genome, the more there is to explore”. We couldn’t agree more.
A few of the highlights of our learnings from the very first publication following the creation of the reference genome are listed below:
- The human genome contains 3.2 billion nucleotide base pairs (A, T, G and C)
- On average, genes consist of about 3,000 base pairs. Having said that, gene length varies tremendously, with one of the largest genes (CNTNAP2, which encodes the Caspr2 protein) spanning more than 2 million base pairs.
- Functions of over 50% of the discovered genes are as yet unknown
- The human genome is highly conserved between people, with every human sharing close to 99.9% of their genome with the rest of the population.
- Only ~2% of the genome encodes instructions for protein synthesis (this protein-coding part of the genome is also called the exome)
- At least 50% of the human genome is made up of non-coding repetitive sequences. While these repeat sequences are thought to have no direct function, they shed light on the structure and dynamics of chromosomes. Compared to other species, humans have a very high percentage of repeat sequences (mustard weed – 11%; worm – 7%; fly – 3%)
- Over 40% of the predicted proteins (in humans) share structural and/or functional similarity with the proteins found in fruit flies and worms.
- The distribution of genes, though random, is uneven: areas of high gene concentration are scattered throughout the genome, with vast expanses of non-coding DNA in between them.
- The largest human chromosome, Chromosome 1, contains the highest number of genes (3,168), while the Y chromosome has the fewest (344)
- Specific gene sequences have been associated with various disorders such as breast cancer, deafness, autism, blindness, muscular dystrophies etc.
- Scientists have identified millions of locations within the genome where single nucleotide differences (SNPs) occur in humans, laying the basis of identifying an individual’s genetic propensity (risk) associated with various diseases, such as diabetes, cancers, cardiovascular diseases, arthritis and neurological disorders, to name a few.
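To put a few of the percentages above into actual base pairs, here is a quick back-of-the-envelope sketch. It uses only the round figures quoted in the list (3.2 billion base pairs, 99.9% shared, ~2% protein-coding, at least 50% repeats); the constant and function names are made up for illustration, and the results are ballpark numbers, not measurements.

```python
GENOME_SIZE_BP = 3_200_000_000   # ~3.2 billion base pairs, as quoted above
SHARED_FRACTION = 0.999          # genome shared between any two people
CODING_FRACTION = 0.02           # protein-coding part (the exome)
REPEAT_FRACTION = 0.50           # lower bound for repetitive sequences


def bases(fraction, genome_size=GENOME_SIZE_BP):
    """Convert a fraction of the genome into a rounded number of base pairs."""
    return round(fraction * genome_size)


differing_bp = bases(1 - SHARED_FRACTION)   # the 0.1% that varies between people
coding_bp = bases(CODING_FRACTION)
repeat_bp = bases(REPEAT_FRACTION)

print(f"Bases that differ between two people: ~{differing_bp:,}")
print(f"Protein-coding bases:                 ~{coding_bp:,}")
print(f"Repetitive bases (at least):          ~{repeat_bp:,}")
```

In other words, that tiny-sounding 0.1% still works out to around 3.2 million positions, which fits with the millions of SNP locations mentioned in the last bullet.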
With the vast expanse of data collected through the HGP came new technologies, new fields of study and endless commercial possibilities. These will be touched upon in the second post in this blog series. Watch this space for more!
About the author
Udbhav Relan is a member of the Mapmygenome gene pool, having joined us after completing his MSc in Biotechnology and Enterprise from the University of Manchester (UK). He loves hitting the gym regularly and going on biking trips. When he isn’t working, he can be found either in a nature reserve, clicking away to his heart’s content, or strumming his guitar, trying to ape his childhood heroes, Metallica. But mostly he can be found in a bar, cheering on Manchester United while chugging down gallons of… green tea (Hey! He’s a fitness freak!)