James Watson co-authored the research that determined the double-helix structure of DNA, for which he shared the Nobel Prize in Physiology or Medicine in 1962. He said: “Before we thought that the future was in the stars, now we know that the future is in our genes.”
Part of today's medicine shares Watson's vision: part of the future of medicine lies in highly targeted, personalized medicines made possible by research on the human genome.
It is plausible that genomics will be one of the engines of change needed to meet upcoming health care needs, which are driven in part by two factors that no one can control: the Silver Tsunami caused by the Baby Boomers (those born between 1945 and 1965) and the worldwide decline in the birth rate.
It is estimated that by 2050 more than 30% of the population will be over 65 years of age. These are people of retirement age who, because they live longer, will have long-lasting medical and care needs.
The Human Genome Project (HGP) cost $3B, and a genome can now be sequenced for less than $1,000. The HGP consisted of sequencing the genome or, what amounts to the same thing, putting in the correct order all the chemical bases that make up the DNA molecule.
Advances in the state of the art have made sequencing the human genome more affordable, and new technologies give people the possibility of owning their own genome. That genome is highly valuable data, and research centers and laboratories are willing to pay for it.
Conventionally, companies that carry out genomic studies, for example ancestry studies, give their users a report presenting the possible connections between their genome and reference genomes from other parts of the world. The user pays for an ancestry report and that is what they get: everyone is happy.
But the genomic studies company becomes the custodian of the genome itself and can market it to laboratories or research centers without any restrictions, finding there a more valuable source of income than the basic genomic study service it provides to people.
To make users aware that they own their genomic data, we can think of an architecture based on Blockchain technology. It gives the user valuable information from a genomic study that remains useful for their entire life and, at the same time, empowers the owner to monetize their data, that is, to earn passive income from their own genome.
All of this, moreover, complies with personal data protection regulations such as the GDPR in Europe and HIPAA in the United States.
Decentralized platform for the generation, commercialization and analysis of genomic data
A genomic analysis platform can be based on three blocks:
- A bioinformatics platform specialized in genomic data and medical records. It stores large amounts of data and processes it efficiently by executing complex workflows, written in the Common Workflow Language (CWL), to carry out genomic or other types of analyses.
- A private Blockchain that provides security, traceability and continuous availability, as well as the possibility of executing Smart Contracts to automate the compensation system and other basic functions.
- An encryption scheme that protects the data both in storage and in transit.
From the point of view of the entities involved, there are three types of participants:
- Data owners: they can be individuals, institutions or research centers. They store encrypted genomic data in private clouds within the bioinformatics platform. They control access to their data and receive compensation payments.
- Blockchain managers: they manage the encrypted keys, verify the transactions and maintain the traceability of the data and of the workflows executed on it.
- Data buyers: they are researchers or laboratories that want to access genomic data. Once access control has been passed and the data owners have given their consent, they receive a copy of the metadata held in the Blockchain, which they use to access the data stored outside of it.
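The interaction between these participants can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the platform's actual design: the `DatasetRecord` and `AccessRegistry` names, the field names and the `ipfs://` pointer string are all hypothetical, and a real deployment would run this logic in a Smart Contract rather than in an in-memory object.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Metadata for one encrypted dataset (all fields are hypothetical)."""
    dataset_id: str
    owner: str
    storage_pointer: str  # where the encrypted file lives, outside the chain
    approved_buyers: set = field(default_factory=set)


class AccessRegistry:
    """Toy stand-in for the access control run by the Blockchain managers."""

    def __init__(self):
        self._records = {}

    def register(self, record: DatasetRecord) -> None:
        self._records[record.dataset_id] = record

    def grant_consent(self, dataset_id: str, owner: str, buyer: str) -> None:
        record = self._records[dataset_id]
        if record.owner != owner:
            raise PermissionError("only the data owner can grant consent")
        record.approved_buyers.add(buyer)

    def request_metadata(self, dataset_id: str, buyer: str) -> dict:
        record = self._records[dataset_id]
        if buyer not in record.approved_buyers:
            raise PermissionError("the owner has not consented to this buyer")
        # The buyer receives a copy of the metadata, not the data itself.
        return {
            "dataset_id": record.dataset_id,
            "storage_pointer": record.storage_pointer,
        }


registry = AccessRegistry()
registry.register(
    DatasetRecord("exome-001", "alice", "ipfs://<hash-of-encrypted-file>")
)
registry.grant_consent("exome-001", owner="alice", buyer="lab-42")
metadata = registry.request_metadata("exome-001", buyer="lab-42")
```

A buyer who has not received consent gets a `PermissionError` instead of the metadata, which mirrors the idea that the owner stays in control of access.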
How we use the Blockchain
The data generated from the analysis of a DNA sample is obtained from genetic sequencing machines that provide text files in a format called FASTQ.
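To give an idea of what a FASTQ file contains, here is a minimal reader. Each read occupies four lines: a header starting with `@`, the base sequence, a separator starting with `+`, and one quality character per base. The function names are illustrative, and the Phred+33 quality encoding is an assumption (it is the common one, but older files may use a different offset).

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    read_id: str
    sequence: str
    quality: str


def parse_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a four-lines-per-read FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file (a blank line also stops this simple reader)
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def phred_scores(quality: str, offset: int = 33) -> list:
    """Decode quality characters, assuming Phred+33 encoding."""
    return [ord(ch) - offset for ch in quality]


# A made-up single-read example, not real sequencing output.
example = io.StringIO(
    "@read1\n"
    "GATTACA\n"
    "+\n"
    "IIIIII#\n"
)
records = list(parse_fastq(example))
scores = phred_scores(records[0].quality)
```

Real FASTQ files hold millions of such reads and are usually compressed, so production pipelines rely on dedicated tools rather than a hand-written reader like this one.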
By aligning the reads in the FASTQ file against a reference genome, we obtain a type of file called BAM (Binary Alignment Map) together with a BAI file (the index of the BAM file).
Once you have the BAM file of a genome, you can view it in an interactive genome browser. The file for a person's entire exome can occupy around 4 GB.
But how can we put that huge amount of data in a Blockchain if the size of a block, for example on the Bitcoin network, is only a few megabytes? Or if the size of a complete Blockchain may not exceed 300 GB? It can be done with a decentralized storage solution such as IPFS, because the Blockchain is not really used to store the data itself: it is used to store the hash of the data.
The following image makes this clearer:
It is important to note that the data is encrypted both in storage and in transit. Since the IPFS system is public, encryption is needed so that the stored files cannot be read by anyone who retrieves them.
Thus, any actor who wants to access genomic data must know the public key of the user in order to use that data.
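The idea of keeping the data off-chain and only its hash on-chain can be sketched as follows. This is a simplified illustration, not a real IPFS or Blockchain client: `OffChainStore` and `Ledger` are hypothetical in-memory stand-ins, a plain SHA-256 hex digest is used in place of the content identifiers (CIDs) that IPFS actually produces, and the byte string stands for a file that has already been encrypted.

```python
import hashlib
import json


class OffChainStore:
    """In-memory stand-in for a content-addressed store such as IPFS."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self._blobs[digest] = data
        return digest

    def get(self, digest: str) -> bytes:
        data = self._blobs[digest]
        # Re-hashing on read detects any modification of the stored bytes.
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError("stored data does not match its hash")
        return data


class Ledger:
    """Toy hash-linked chain that holds only small metadata records."""

    def __init__(self):
        self.blocks = []

    def append(self, payload: dict) -> dict:
        previous_hash = self.blocks[-1]["hash"] if self.blocks else "0" * 64
        body = {
            "index": len(self.blocks),
            "previous_hash": previous_hash,
            "payload": payload,
        }
        serialized = json.dumps(body, sort_keys=True).encode()
        block = dict(body, hash=hashlib.sha256(serialized).hexdigest())
        self.blocks.append(block)
        return block


# Placeholder bytes standing for an already-encrypted BAM file.
encrypted_bam = b"\x00placeholder-ciphertext\x00" * 1000

store = OffChainStore()
ledger = Ledger()
data_hash = store.put(encrypted_bam)
block = ledger.append(
    {"owner": "alice", "data_hash": data_hash, "size_bytes": len(encrypted_bam)}
)
```

However large the file is, the record written to the chain stays a few hundred bytes, and anyone holding the hash can check that the file retrieved from storage is the one that was registered.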
What Blockchain brings to this use case
It provides us with security, since the user’s health data is protected against unauthorized access: the data can only be accessed by knowing the public key of the actors who want to share it.
It provides us with traceability, since every action is recorded in the Blockchain: the user’s registration on the platform, the user’s informed consent and who accessed which file, among others.
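A small sketch shows why recording actions in a hash-linked log gives traceability. Each entry includes the hash of the previous one, so changing any past entry breaks the chain. The function names, the event fields and the actors are hypothetical, and a real Blockchain adds consensus among nodes, which this single-process example leaves out.

```python
import hashlib
import json


def _digest(entry: dict) -> str:
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()


def append_event(log: list, actor: str, action: str, target: str) -> None:
    """Append an event that is linked to the previous one by its hash."""
    previous = log[-1]["hash"] if log else "0" * 64
    entry = {"actor": actor, "action": action, "target": target, "previous": previous}
    entry["hash"] = _digest(entry)  # digest is computed before "hash" is added
    log.append(entry)


def verify_chain(log: list) -> bool:
    """Return True only if no entry was altered, removed or reordered."""
    previous = "0" * 64
    for entry in log:
        body = {key: value for key, value in entry.items() if key != "hash"}
        if entry["previous"] != previous or _digest(body) != entry["hash"]:
            return False
        previous = entry["hash"]
    return True


audit_log = []
append_event(audit_log, "alice", "register", "platform")
append_event(audit_log, "alice", "informed_consent", "exome-001")
append_event(audit_log, "lab-42", "access", "exome-001")
log_is_intact = verify_chain(audit_log)
```

If someone later edits the middle entry, for example to hide who gave consent, `verify_chain` returns `False` because the stored hash no longer matches the entry's content.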
And above all, it allows us to have a system of incentives or payments implemented through a Smart Contract that automatically decides who receives payment for the use of their data and how much data buyers have to pay to access it.
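The settlement rule that such a Smart Contract might apply can be illustrated in plain Python. This is a sketch under assumptions that do not come from the platform itself: the flat price per file, the `platform_fee_percent` parameter and the `settle_access` name are all hypothetical, and amounts are integer token units to avoid rounding problems.

```python
from collections import Counter


def settle_access(price_per_file, accessed_files, file_owner, platform_fee_percent=10):
    """Return (buyer_total, payouts) for one purchase, in integer token units.

    The flat price and the platform fee are illustrative assumptions.
    """
    payouts = Counter()
    buyer_total = 0
    for file_id in accessed_files:
        owner = file_owner[file_id]
        fee = price_per_file * platform_fee_percent // 100
        payouts[owner] += price_per_file - fee
        payouts["platform"] += fee
        buyer_total += price_per_file
    return buyer_total, dict(payouts)


file_owner = {"exome-001": "alice", "exome-002": "bob", "exome-003": "alice"}
total, payouts = settle_access(
    100, ["exome-001", "exome-003", "exome-002"], file_owner
)
# total is 300; alice receives 180, bob 90 and the platform 30
```

Because the rule is deterministic and every payout adds up to what the buyer paid, the same calculation can be checked by any participant, which is the property a Smart Contract is meant to guarantee.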