For Genetic Privacy


Personal genetic information is getting easier to access and easier to decode, opening massive opportunities to cure and prevent diseases but also raising privacy concerns.


| Otho Mantegazza


On May 9th 2018, US Senator Marco Rubio and Representative Chris Smith wrote a letter to the US Commerce Secretary Wilbur Ross inquiring about sales of “technology for repression and control” by a US company to the Chinese government, which is under scrutiny for its actions in the Xinjiang region.

The main technology under scrutiny was for “advanced biometrics such as DNA sequencing”. A spotlight was placed on Thermo Fisher Scientific for selling DNA sequencers to the “Chinese Ministry of Public Security and its Public Security bureaus across China”. Take note: it wasn’t the Ministry of Health or the Ministry of Agriculture that was purchasing huge numbers of DNA sequencers, but the Ministry of Public Security. This technology (as a consequence?) is considered surveillance equipment.

What can we learn from the authorities’ interest in DNA sequencing?

Warnings

Earlier, in December 2017, Human Rights Watch (HRW) issued a detailed warning about authorities in the Xinjiang region collecting advanced biometrics such as “DNA samples, fingerprints and iris scans” for “all residents in the region between the age of 12 and 65”. These data were collected “through a free annual physical exams program called Physicals for All”.

As detailed in the HRW article, the “Physicals for All” program was presented as a health check and as a program for “Scientific decision making”, accompanied by statements from participants on “how they received treatments for previously undiagnosed illnesses, in some cases saving their lives”.

The “Physicals for All” program was mandatory. Special recommendations were issued to make sure that every individual in every household was reached, and data collection was supervised by authorities. Data collection was quite thorough: “Staff should verify the name, gender, ethnicity, date of birth, ID number, education level, employer information, and other information of the individual. Multidimensional biometric data includes: images, fingerprints, iris scans, blood types, DNA samples” (DNA was collected as blood cards).

HRW also reports that Thermo Fisher Scientific showed little to no interest in collaborating with them to figure out what authorities were doing with its technology. As seen above, members of Congress had to intercede to stop this trade in surveillance equipment.

Eventually, as reported by the New York Times and by The Guardian, this market came to an end, but maybe not without damage done.

This could be a recent and quite substantial violation of DNA and genetic privacy.

Genetic Information, Genetic Data

In genetics, DNA is like an algorithm. Your DNA is a long code, a long series of characters, a long book that tells many things about you and basically holds the instructions on how to build and run your body. Many things are clearly encoded in your DNA, for example (forgive me the banality) the color of your hair. But not everything is determined by DNA. Again, forgive me the banality of the example: my DNA might say that my hair should be black, but it probably won’t say anything about me (hypothetically) waking up one morning and dyeing my hair blue.

Your DNA contains the basic information on how to build your body. What you do with it is hopefully up to you.

To access the information stored in our DNA we must do two things: first, read it and second, make sense of it.

DNA sequencing

To access the information stored in your DNA, first you have to read it, or better, sequence it.

What do you do when you sequence DNA?

First you have to extract it from the cells (DNA is a molecule), and then you use some technique to read its code, that is, to read its sequence and store it somewhere else. For example, a human DNA sequence is a “book” of around 3 (x2) billion characters written in a 4-character alphabet. This book is stored inside our cells as a molecule. Sequencing it means that you use a technology to get that book, read every sequence of characters that composes it and save it in an easily accessible format (i.e. a computer file). Reading (or sequencing) DNA does not imply in any way that you make sense of it or of the meaning of the sequence of characters that you are reading. But it’s a step toward it.
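To get a feeling for what "a book of 3 billion characters in a 4-character alphabet" means as a computer file, here is a minimal sketch that packs each character into 2 bits. The function names and the packing layout are my own assumptions for illustration; real sequencing file formats are different and also store quality scores and metadata.

```python
# Hypothetical illustration: a 4-character alphabet needs only 2 bits per character.
ENCODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
DECODE = "ACGT"


def pack(sequence: str) -> bytes:
    """Pack a DNA string into bytes, four characters per byte."""
    out = bytearray()
    for i in range(0, len(sequence), 4):
        chunk = sequence[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | ENCODE[base]
        # Left-align a final chunk shorter than four characters.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the first `length` characters from packed bytes."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(DECODE[(byte >> shift) & 0b11])
    return "".join(bases[:length])


def packed_size_megabytes(n_bases: int) -> float:
    """Size of n_bases characters at 2 bits each, in megabytes (10^6 bytes)."""
    return n_bases * 2 / 8 / 1_000_000


if __name__ == "__main__":
    packed = pack("GATTACA")
    print(unpack(packed, 7))
    print(packed_size_megabytes(3_000_000_000), "MB for one 3-billion-character copy")
```

Under these assumptions, one copy of the 3 billion characters fits in about 750 megabytes, less than a typical movie file.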

DNA sequencing is a technology that has improved spectacularly in the last 25 years (check the review above), with costs going down, throughput going up and the size of the equipment shrinking. It has become a common approach: “If you have a molecular biology problem, you turn it into a sequencing problem and you use the power of sequencing technology to solve it”.

DNA sequencing produces tons of data, but luckily we live in the era of data and we know how to deal with them.

Analyzing a DNA sequence

Reading a DNA sequence does not imply that you understand what information it stores, the same way as reading a book does not imply that you understand what’s written in it, or at least not all of it.

Understanding what information is stored in DNA is a long ongoing research process. A process similar in some way to decoding a secret code, and in some other way to reverse engineering.

The DNA sequence of all humans is very similar. As a rule of thumb, the more similar two organisms are, the more similar their DNA sequences are: while a plant and a human, a yeast and a human, or a bacterium and a human will have pretty different DNA sequences, a human and another animal, and even more so two humans, will have highly similar sequences. Among organisms, the alphabet is always the same, the grammar might change slightly, and the content might vary widely.

Since the DNA sequence of all humans is similar, understanding what information is stored in it is a long process of association. One can identify the small differences in DNA sequence (polymorphisms) and try to associate something with them: some physical aspect, some illness, some peculiar behavior(?). The differences sometimes come down to changes in a single character (nucleotide) out of billions, and trying to associate something with them takes a lot of observations and statistics.
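The idea of spotting single-character differences can be sketched in a few lines. This is a toy example under a strong assumption: the two sequences are already aligned and have the same length. Real analyses first have to align the sequencing reads to a reference and also deal with insertions and deletions. The function names and the example sequences are made up.

```python
def find_variants(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (position, reference_character, sample_character) for each mismatch.

    Assumes both sequences are already aligned and of equal length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (position, ref, alt)
        for position, (ref, alt) in enumerate(zip(reference, sample))
        if ref != alt
    ]


def identity(reference: str, sample: str) -> float:
    """Fraction of positions at which the two aligned sequences agree."""
    if not reference:
        raise ValueError("sequences must not be empty")
    return 1 - len(find_variants(reference, sample)) / len(reference)


if __name__ == "__main__":
    reference = "GATTACA"  # made-up sequences
    sample = "GACTACA"
    print(find_variants(reference, sample))
    print(identity(reference, sample))
```

The list of positions where the sample differs from the reference is the raw material for the association work: each difference is a candidate that still has to be linked, statistically, to some trait.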

What DNA tells about you

We don’t know exactly, or better, we know a lot about it, and we are still missing much.

But once somebody gets hold of a sample of your DNA, or of a file that stores (even parts of) your DNA sequence, what can they tell about you?

Well, first of all, your DNA identifies you uniquely. Even if DNA sequences are very similar among people, and even more similar among relatives, parts of a DNA sequence are enough to identify you uniquely and practically without margin of doubt. Even without knowing the meaning of the information stored in those sequences, one can use them as a fingerprint to identify you by comparison. And your DNA sequence cannot be easily altered (though forensic DNA can).
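Identification by comparison needs no understanding of what the sequence means. The sketch below illustrates this with invented data: each profile is a dictionary from a marker name to the pair of characters observed there, and the marker names, person labels and function names are all hypothetical. Real forensic identification relies on standardized marker panels and on statistical match probabilities, which this toy example does not attempt.

```python
def match_fraction(profile_a: dict[str, str], profile_b: dict[str, str]) -> float:
    """Fraction of shared markers at which two profiles carry the same pair."""
    shared = profile_a.keys() & profile_b.keys()
    if not shared:
        return 0.0
    # The order of the two characters carries no meaning, so "AG" equals "GA".
    agreeing = sum(
        1 for marker in shared
        if sorted(profile_a[marker]) == sorted(profile_b[marker])
    )
    return agreeing / len(shared)


def best_match(query: dict[str, str], database: dict[str, dict[str, str]]):
    """Name of the database profile most similar to the query, or None if empty."""
    return max(
        database,
        key=lambda name: match_fraction(query, database[name]),
        default=None,
    )


if __name__ == "__main__":
    # Invented profiles on hypothetical markers.
    database = {
        "person_x": {"marker_1": "AA", "marker_2": "CT", "marker_3": "GG"},
        "person_y": {"marker_1": "AG", "marker_2": "CC", "marker_3": "GT"},
    }
    unknown_sample = {"marker_1": "GA", "marker_2": "CC", "marker_3": "TG"}
    print(best_match(unknown_sample, database))
```

With only three markers many people would match by chance; the identifying power comes from comparing a large enough number of positions that vary between individuals.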

Then, since DNA is by definition “hereditary material”, your DNA can be used to identify your parents and a big part of your family history, together with your ancestry. For example, 23andMe can identify 3rd degree cousins with a likelihood of 90% (besides telling you a lot about where your ancestors came from).

Last, the most complicated, is what can be told about you from a sample of your DNA, once we have attributed a meaning to the unique sequence changes (polymorphisms) that make up your DNA. Again, we don’t know exactly, but probably a lot. For example, things that can be guessed are:

SNPedia, an open online wiki, collects information on all the polymorphisms that are publicly available. At this moment, if you search for “behavior” you get 314 items as a result, and if you search for “intelligence” you get 169.

Don’t get me wrong: the knowledge and technologies about human genetics that we are accumulating and developing are great. They can help us treat diseases better, debunk myths about race and gender, improve our lifestyle and solve crimes. But all these technologies raise great ethical concerns that we must face and take into account.

Genotype to phenotype in the age of Big Data

As mentioned above, decoding information in DNA is long and meticulous work of reverse engineering and of association, trying to connect your genotype (your unique DNA sequence) to your phenotype (you).
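One of the simplest forms this association work can take is counting: how often does a trait appear among people who carry a given polymorphism, compared with people who do not? The sketch below computes an odds ratio from such counts. The numbers are invented for illustration, the function name is my own, and a real study would also need significance testing, correction for testing many polymorphisms at once and control for confounders.

```python
def odds_ratio(
    carriers_with_trait: int,
    carriers_without_trait: int,
    noncarriers_with_trait: int,
    noncarriers_without_trait: int,
) -> float:
    """Odds of the trait among carriers divided by the odds among non-carriers."""
    denominator = carriers_without_trait * noncarriers_with_trait
    if denominator == 0:
        raise ValueError("odds ratio is undefined when a denominator count is zero")
    return (carriers_with_trait * noncarriers_without_trait) / denominator


if __name__ == "__main__":
    # Invented counts: 100 carriers and 100 non-carriers of a polymorphism.
    ratio = odds_ratio(
        carriers_with_trait=30,
        carriers_without_trait=70,
        noncarriers_with_trait=10,
        noncarriers_without_trait=90,
    )
    print(round(ratio, 2))
```

A ratio well above 1 suggests that the trait is more common among carriers, but it says nothing by itself about cause, and with small samples it can easily arise by chance.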

Genotypes are becoming easier and easier to get: genotyping services such as 23andMe and Ancestry offer a pretty comprehensive analysis of your genotype for a hundred dollars, from a sample of your saliva. Hospitals routinely do it from blood samples. Genotypes obtained from forensic evidence or even from old blood cards are also becoming more detailed and reliable.

Though, to decode this information you must rely on previous studies, and DNA sequences will hardly generate any new knowledge on their own if they are not connected to the phenotype and behavior of their owner. But with social networks and web trackers we are leaving an easily searchable and indexed track of our physical appearance, our thoughts and our behavior. This hints at risks that should enter the public debate right now.

European Privacy Law

The European Union takes privacy rather seriously. In 2009 it started rewriting the data protection rules and directives that were in force at the time, in an effort to improve them and to unify the laws of member states. After consultations and revisions, the General Data Protection Regulation (GDPR) was published in May 2016, and it has applied since May 2018.

This legislation provides a strong level of data privacy to people residing in EU territories, especially concerning healthcare and biometric data.

Mahsa Shabani and Pascal Borry, in this article for the European Journal of Human Genetics, describe the effect of the GDPR on storing and processing genetic and healthcare data. The article details how genetic data are now treated as sensitive data, in the same way as data on political, religious and philosophical opinions and data about health and sex life, and how access to them should be strictly consensual and regulated.

The article also details how the GDPR grants some special permissions to use this kind of data for scientific research of public interest, with safeguards. But it leaves some grey areas in the definition of safeguards and of public interest, where states can draft their own guidelines.

Besides those grey areas, the GDPR is perceived as a strict privacy regulation. While I tend to agree that “strict rules make bad rules”, privacy (especially regarding genetics, since we have seen how sensitive and tightly personal those data are) is such a delicate matter that I really welcome tight rules on it.

The GDPR has been criticized for being too zealous (and also vague) compared to HIPAA, the US law on the matter. This might favor big companies that have the money to take risks, prepare and adapt. Though one could argue that you don’t favor small companies by loosening privacy standards, but rather by giving them legislative and financial support.

The grey area around the safeguards that are needed to process genetic data for research of public interest, and, I think, the definition of public interest itself, are a cause for concern. As stated by Ciara Staunton et al., “There is little insight or guidance contained within the GDPR as to the appropriate safeguards that must be in place, which is alarming considering the potential scope of the derogations”. It will be interesting to see how this area develops.

Public Debate and Conclusions

In this article I’ve been rather critical of new DNA sequencing technologies and of the new possibilities for analyzing those data. Actually, I’ve always been excited about them. Working in the field of genetics, I could use some of them first hand, and the possibilities that they open are great and impressive.

Though, I think that real progress goes through a critical attitude, through awareness and through dialogue. Sometimes we must slow down technological advance and move forward at a pace at which controversies can be faced and social issues can be addressed. So, let’s ask ourselves: what are the risks, who could be harmed, and how can we prevent it?

The case of the Xinjiang region was a great warning. We don’t know what those DNA blood cards are being used for by authorities in the Xinjiang region. We don’t know if they are just a means of identification, like fingerprints, or if more extensive analysis is being done on that DNA, revealing secrets and personal information hidden in the DNA of the people who have been sampled. But this case wasn’t the first warning, and many similar cases, maybe of smaller size, might be scattered around the world.

Let’s ask ourselves: if genetic data were not protected and access to them not tightly regulated, what could go wrong? We might not want a trained AI (which, additionally, could sometimes be biased and poorly designed) to try to predict the IQ and behavior of citizens; we must prevent people from being discriminated against at work, in healthcare systems or in education on the basis of their heritage or genes; and we might not want authorities to take genetics into account when dealing with citizens. But unfortunately these are issues that we might have to face, or at least be prepared for.

Many people are already doing it. For example, the EU has produced tight privacy laws, and bioethics is a very active field of research and communication. Moreover, genotyping companies such as 23andMe have behaved in a rigorously lawful way when it comes to privacy. But is it the same everywhere around the world? Are we doing enough? I feel that these are important issues, and trying to raise awareness about them might be worth it.

We are taking giant steps with genetics and to make best use of it, we must first face concerns and be prepared for its risks.
