How do you, as a pioneer, biologist and geneticist, evaluate this new work, which gives access to the first complete sequencing of the human genome?
A world of difference separates what we can do today from what was done 20 years ago and earlier. Twenty years ago, we regarded the missing portion as a detail: it was mostly repetitive DNA, which was not expected to hold big surprises, but that is not entirely true. The scale of those surprises is still difficult to assess, and much of what was found was expected. In any case, these 200 million additional bases will be the subject of numerous comments and will feed all sorts of hypotheses and conjectures. This work, which is a real experimental and computational tour de force, answers some of the questions that motivated the project, but it will raise even more new questions, which is always what makes science interesting.
What technologies made it possible to bring this to light?
We have moved to third-generation sequencing technologies, which can read sequences of considerable length, from 10,000 to 20,000 bases and even more, even if these reads contain many errors. The same sequence can be read many times, so errors, which are mostly random from one read to the next, can be corrected. The corrections rely on purely computational methods that analyze and compare the reads. Once read, the fragments must be put back together. With repeats this is almost impossible unless you can read very large fragments in one piece, which is exactly what these new sequencing methods allow. These methods have also been accompanied by the development of major software for assembling very similar sequences.
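The error-correction idea described above can be sketched as a per-position majority vote. This is only a toy illustration, not the method used by the authors: it assumes the reads are already aligned and of equal length, and the function name `consensus` and the example reads are hypothetical. Real tools must first align the reads and handle insertions and deletions.

```python
from collections import Counter


def consensus(aligned_reads):
    """Return the majority base at each position of pre-aligned reads.

    Assumption: every read covers the same region and has the same length,
    so position i in one read corresponds to position i in all others.
    """
    if not aligned_reads:
        return ""
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be pre-aligned to the same length")

    bases = []
    for column in zip(*aligned_reads):
        # Random errors rarely hit the same position in most reads,
        # so the most frequent base is taken as the corrected call.
        base, _count = Counter(column).most_common(1)[0]
        bases.append(base)
    return "".join(bases)


# Hypothetical reads of the same region, each with at most one random error.
reads = [
    "ACGTTAGC",
    "ACGATAGC",  # error at position 3
    "ACGTTAGC",
    "TCGTTAGC",  # error at position 0
    "ACGTTACC",  # error at position 6
]
corrected = consensus(reads)
print(corrected)
```

Because each error appears in only one of the five reads, the vote at every position recovers the underlying sequence.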
In addition, normal cells contain two copies of the genome (one from the mother, the other from the father), which greatly complicates assembly, especially of repetitive sequences. The authors used cells from a hydatidiform mole, an anomaly that can occur during embryo formation, in which the cells contain the genome of only one parent. As a result, there is no variation coming from the genome of the other parent.
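Why two parental copies complicate matters can be shown with a small sketch. It assumes error-free, pre-aligned reads; the sequences, the function name `ambiguous_columns`, and the 0.8 threshold are all hypothetical choices for illustration. Where the two parental copies differ, the reads split roughly evenly and no single base clearly wins, whereas reads from a single-parent genome agree everywhere.

```python
from collections import Counter


def ambiguous_columns(aligned_reads, min_fraction=0.8):
    """Return positions where the top base has less than min_fraction support.

    Assumption: reads are pre-aligned and of equal length.
    """
    positions = []
    for i, column in enumerate(zip(*aligned_reads)):
        top_count = Counter(column).most_common(1)[0][1]
        if top_count / len(column) < min_fraction:
            positions.append(i)
    return positions


# Hypothetical parental copies that differ at position 4.
maternal = "ACGTTAGC"
paternal = "ACGTCAGC"

# Normal cells: reads come from both copies in roughly equal numbers.
diploid_reads = [maternal] * 3 + [paternal] * 3
# Single-parent source: every read comes from the same copy.
haploid_reads = [maternal] * 6

print(ambiguous_columns(diploid_reads))
print(ambiguous_columns(haploid_reads))
```

In a repetitive region, an assembler cannot easily tell such a split vote between parental copies apart from a difference between two copies of the repeat, which is one reason a single-parent source simplifies the problem.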
Why is it important to decipher the entire human genome?
These repetitive regions, which together represent 8% of the genome, were not known in detail. However, we have long known that there are roughly three types of regions made up of tandem repeats over very long stretches: 1) the ends of chromosomes, the telomeres; 2) the centromeres, which play a decisive role in separating chromosomes during cell division; 3) large regions that carry many copies of the ribosomal RNA (rRNA) genes, which form the basis of ribosomes. Ribosomes are the machinery that makes proteins in cells. It had also been observed that these repeated regions may contain protein-coding genes, but no one knew exactly how many. When you know an island only by its outline and a tree rising above the horizon, you cannot help but go and see what is on it. It is the same here: people wanted to know exactly what this little-known 8% contains. It is first and foremost a matter of curiosity.
Thus, among the 200 million bases sequenced and placed back in the genome, there are several thousand different genes representing about twenty categories. These categories were known, but now we know where all these elements are located. About 150 protein-coding genes have also been discovered. They are usually copies of genes present elsewhere in the genome, but little is known about the expression of these new copies. Since they are redundant and therefore possibly dispensable, they can evolve rapidly and may eventually code for new functions, although this is pure conjecture at this stage.
Another important reason for sequencing all of this was to obtain a new reference. The reference used so far is close to the version published by an international public consortium in 2004, which still included several hundred “holes” of poorly estimated size and which has been updated from time to time. Now we will have a new, much more detailed guide. We are going to change standards. But this standard is one particular sequence.
All other human sequences, whatever their origin, include numerous variations: this is the biodiversity of humanity. At this stage, we cannot say whether it will be important to systematically sequence the complete genomes of individuals. In any case, the second-generation methods used for genome sequencing do provide data, but the results have been difficult to exploit. The new standard will make this easier. However, our impression is that the information contained in these 200 million additional bases is not medically significant at this stage of our knowledge. Very limited areas within the 8% may nonetheless be relevant to very specific questions.
What are the next steps in human genomics?
Many experiments will be devised to better understand the possible role of the unique (non-repeated) sequences included in this 8%. There is also considerable variability in this 8%, even at the quantitative level: 8% is an average, with strong variation between individuals. We will, of course, try to find out whether genetic traits and, in particular, comorbidities may be linked to this variability. There is no shortage of hypotheses. Of course, we will also want to compare different human populations and see what happens in other mammals and in the rest of the living world. Once again, biodiversity will be part of the picture.