THE WHITE HOUSE
Office of the Press Secretary
PRESS BRIEFING BY DR. NEAL LANE, ASSISTANT TO THE PRESIDENT FOR SCIENCE AND TECHNOLOGY, DR. FRANCIS COLLINS, DIRECTOR OF THE NATIONAL HUMAN GENOME RESEARCH INSTITUTE, DR. CRAIG VENTER, PRESIDENT AND CHIEF SCIENTIFIC OFFICER, CELERA GENOMICS CORPORATION, AND DR. ARI PATRINOS, ASSOCIATE DIRECTOR FOR BIOLOGICAL AND ENVIRONMENTAL RESEARCH, DEPARTMENT OF ENERGY, ON THE COMPLETION OF THE FIRST SURVEY OF THE ENTIRE HUMAN GENOME
The James S. Brady Briefing Room
11:08 A.M. EDT
MR. LOCKHART: I think we've established why we here in the White House Press Secretary's office were not entrusted with the Human Genome Project. (Laughter.)
We're very honored today to have a very distinguished group to come to brief you. I think you've all seen the event. But Dr. Neal Lane, the President's Science Advisor, will open this with a brief statement; followed by Dr. Francis Collins, the Director of the NIH, and Dr. Craig Venter, the CEO of Celera.
DR. LANE: Thank you. Thank you, Joe. Good morning everyone.
You have just heard President Clinton and Prime Minister Blair congratulating all the members of the scientific teams of the Human Genome partnership, the public effort involving the United States and the United Kingdom and several other countries, on having reached an important milestone in sequencing the human genome; as well as Craig Venter, President of Celera, and his team, who have completed their first assembly of the human genome.
So it is an extremely exciting day. It is a forward-looking time because of the enormous opportunities for the use of this scientific information to benefit all peoples of the world.
I would now like to ask Francis Collins to make a brief statement and then Dr. Craig Venter and then we'll take your questions.
DR. COLLINS: Well thank you, Neal. This is a happy day for science, and I think for the public, both here and around the world. I have the honor of serving as the project manager, I guess is the right word, of the International Human Sequencing Consortium, which has been laboring to try to develop methodologies and then apply them for sequencing the 3 billion letters of the human DNA code. We can now say it's more like 3.15 billion letters, because we have a better handle on it.
That involves investigators, not only in the United States, but also in the United Kingdom, in France, in Germany, in China and in Japan. And that has been a particularly gratifying aspect of this. Because this is, after all, our shared inheritance and it's nice that we're working on it together around the world.
What we are announcing today is that we have reached a milestone that we promised to get to just about now; that is, covering the genome in what we call a working draft of the human sequence. That is not to say that we have it all finished and zipped up and every last letter precisely identified. That will take a number of additional steps and probably the better part of the next couple of years to achieve.
But if you're sitting somewhere in the genome right now there is a very good chance you're in our database. And if you look to one side or the other of any particular letter in the DNA code you will find that you're sitting on an uninterrupted stretch of sequence that runs about 200,000 letters in length, and most of the sequence is there. So for the scientist who's working, trying to unravel a mystery of some sort -- and many of these are mysteries about disease -- this database is now in a form that makes it possible to answer many of those questions very quickly.
Back in the 1980s, I had the experience of trying to track down the cystic fibrosis gene. It took us about 10 years of very hard work to finally succeed at that endeavor. And there were probably 100 investigators involved and millions of dollars were spent on this enterprise. I can tell you that with the database that's now available as of today, an average post-doc working in a lab would be able to accomplish that probably in a matter of a couple of weeks. So it is profoundly gratifying to see this come along in this fashion.
Finally, I just would like to say how nice it is to share the podium today with Dr. Venter. I want to recognize his wonderful willingness to come forward in the way that has led to today, with the plans here for a simultaneous announcement of these milestones. The work that his company has done is really quite remarkable. I think it's a wonderful example of the way in which the academic community and the biotech community and the pharmaceutical industry in this country and around the world are really laboring together here to try to achieve what we all hope for, which is an alleviation of suffering and a cure for disease.
I'd also like to thank the other person standing up here, Ari Patrinos, of the Department of Energy, for the important role he's played in leading the Genome Project, and the catalytic efforts that he has played in getting today to happen.
So thank you very much. I'll turn it to Craig.
DR. VENTER: Thank you, Francis, for your very nice comments. In a few hours -- or at 12:30 p.m., Francis and I will be making detailed announcements at a press conference across town, where we will be describing much more detailed information about the scientific accomplishments of the two different programs.
Celera, 18 miles from here in Rockville, Maryland, started sequencing the genome in September, just nine months ago. We announced a while ago that we had finished the sequencing phase, and today we're announcing that we've actually now assembled all that data into the linear sequence of the human chromosomes.
This is an exciting stage. It's far from the end-stage, as Francis said. In fact, annotating this, characterizing the genes, characterizing the information, while that's, in reality, going to take most of this century, we plan to make a very significant start on that between now and later this year, when Francis and I agreed to have the two teams try to simultaneously publish the results of the different efforts. At that stage, they'll be really able to be compared in detail. The scientists will be able to really go through the information in dramatic fashion.
Like Francis, I spent a decade looking for one gene. That gene cost hundreds of millions of dollars to actually find and sequence, and it was a combined effort of NIH funding and work funded by MIRC. That same discovery today would take 15 seconds by scientists using the Celera database. And pharmaceutical companies, biotech companies and university researchers are making those discoveries probably as we speak -- unless they're still watching television.
I'm pleased that Francis worked with me, certainly with the help of Ari Patrinos, to have this event be the focus, and shifting the focus to the importance of this work to all of us and to humanity. And if we're going to be the custodians of the genetic information and be trusted to analyze it and interpret it appropriately, we felt it was important for us to rise above the squabbles that you've read about, to act more at the level appropriate with this situation. And I thank Francis for his effort in that regard.
DR. LANE: I should have introduced Ari Patrinos, who runs the Department of Energy's human sequencing research efforts. Department of Energy has been very important from the outset in the concept of the Human Genome Project.
Do you want to say a word? Good, then we're ready for your questions.
Q Dr. Venter, can you tell us what the thought processes were that made you -- and the timing of when you decided to make this a joint announcement?
DR. VENTER: Well, it's been something that's been under works for a very long period of time, but really became much more actively involved when Dr. Patrinos arranged a secret meeting between himself, Francis Collins and myself that turned into a long series of meetings. And I think it's something that we all had hoped would happen. It took the individual efforts of all of us to really make it happen.
Q When was that?
DR. PATRINOS: It started on May 7th, and was followed by three other meetings. The last one was just last week.
DR. COLLINS: And all of those were in Ari's house, and he served beer and pizza, which was an important part of the good outcome here. (Laughter.) Ari, I think, deserves a great deal of credit for being a catalyst. When I called him up in late April and said, can we try this, he was quick to say, yes, let's give it a shot, and put together that first discussion. And things went very well. And thank you, Ari.
Q Where and when are you guys going to publish, and what are you guys going to do with the accompanying data?
DR. VENTER: It hasn't been absolutely decided either where or when. We expect it to be later this year. We're still working on it, by assessing the state of data interpretation and writing of manuscripts in both camps, and then we'll try to collectively decide on a time for a submission.
There are several scientific journals that have been wooing us, let's say -- (laughter) -- and I don't think we've made an absolute final decision where that will be.
DR. COLLINS: There's a prodigious amount of work involved in doing the analysis of these 3.1 billion letters of the DNA code, and that is very vigorously underway right now in a public project by a team of investigators that have been meeting by conference call and a variety of other mechanisms. And we aim to try to write really good papers here, not just say, oh, we did it. But also, what did we find here? In the first pass through the human genome what can you learn about what genes are there? And maybe what's not there, as well.
So the intention is to be sure that these are papers that will stand the test of time. And we look forward to the opportunity to do this simultaneously with what Celera is doing.
Q What about the data that will accompany it, where is that going to be deposited?
DR. COLLINS: Craig should speak for Celera. In the public project, as you know, all of the sequence data is deposited onto the Internet every 24 hours. And the analysis of that sequence data will also be appearing very quickly on the Internet, even in advance of publication. But the papers, of course, themselves, will stand on their own because of the additional higher-level analysis that they will include.
DR. VENTER: Celera's data is available right now to the academic and pharmaceutical and biotech worlds, but it's through subscription at the moment. In the fall, when we actually publish our scientific analysis of the genome, that data will be available to academic scientists via our Internet site, Celera.com.
Q As this progresses, how is Celera going to make money on public data?
DR. VENTER: The question is how is Celera going to make money on the public data. Hopefully it won't. Celera has independently sequenced the genome. We decided as a corporation that it was such a significant event that when we were finished with sequencing the genetic code of our species we would make that data freely available to scientists around the world.
We've indicated that the effort to make a financial return for our investors will be from understanding that information. We're right now helping some of the biggest and best pharmaceutical and biotech companies and academic institutions interpret the human genetic code. A key part of this is we will have the mouse genome sequenced by the end of this year, and that will be very key for a layering on top of the human genetic code to in fact interpret it.
But what our previous work has shown, with the close to 24 genomes that we've done both at TIGR and at Celera, is that having one genetic code is important, but it's not all that useful. And it's only through comparative genomics -- having both human and mouse, dog, chimpanzee, rat, other species to layer on top of the human -- that we will be able to truly begin to interpret the genetic code.
DR. COLLINS: I want to completely agree with the conclusion that the human sequence, without comparisons to draw to it, is going to be very difficult to understand. And, in fact, the public project is also engaged in beginning the process of sequencing other complex genomes, including the rat, and a fish called the zebra fish, and also the mouse, but a different strain than what Celera is doing. And I think Craig and I would agree that that's a good thing, that these are complementary efforts, and that you learn a lot from whatever sequencing of this sort you do.
I would also strongly want to point out that even with those sequencing efforts coming into fruition, we will need a lot of other tools to understand how the genome works. Methods of studying not just one gene at a time, but the whole genome, in terms of its function. And that's a major goal of the Genome Project in the coming years.
Q I'd like to ask if you all are -- you mentioned doing a joint conference to annotate. Is this going to be like the Drosophila Conference that went on -- the jamboree?
DR. VENTER: No, in fact, I think what the President said is we would -- after publication of our different versions of the genome, we would have a joint scientific conference to compare the results; but, more importantly, I think, to analyze the methods -- which are very different for the two different genome projects -- to understand the best methods for people to go forward.
DR. COLLINS: Yes, I think that's going to be really interesting. In fact, for the mouse, we're actually beginning to do that sequencing by a combination of the whole genome shotgun effort that Celera has pioneered, and the map-based effort, which the public project has been using for humans. Maybe for the mouse, we'll try a combination. But being able to sit down together, and really look at the ins and outs and the details of what kind of sequencing came out of these approaches after the time of publication, is going to be incredibly interesting, and I would think, a lot of fun.
Q Dr. Venter, can you talk about how many patents you've applied for so far on the information that you've derived, and how many you think you'll have applied for before you finally make the data public in full?
DR. VENTER: As of recently, I think, we're up to about two dozen unique gene patents that Celera has filed for. The number is changing constantly as discoveries are made with Celera and its pharmaceutical partners: Pfizer, Novartis, Pharmacia, Amgen, Takeda in Japan. We're only filing patents on genes that our pharmaceutical partners tell us are essential for their programs to develop new therapeutics, develop new treatments.
So Celera is not following the route of some of these other biotech companies that are just randomly patenting sequences that they download nightly from the public effort on a speculative basis. We think it's only important to patent things -- and I compliment the patent commissioner. A recent report was issued doing what, I think, both Francis and I would agree that we are pleased to see, which is raising the bar, requiring much more information on gene patents than just simply downloading data off the Internet and doing a quick computer search. So I think we're definitely working in the right direction.
Q If I could follow up on that a little bit. I think one of the big sticking points between the two approaches is that the consortium was looking at every single thing in the genome, even nothing -- spaces -- it's interesting -- whereas, Celera was looking for things that were patentable, proprietary. To what extent have you been able to reconcile your different world view there?
DR. COLLINS: I think that's actually an incorrect view. Both methods were aimed to try to look at the entire genome, because we imagine that all of it is interesting, and we would be kind of foolish to pretend that we were smart enough to know what wasn't interesting at this point, so let's just look at all of it.
And I think, actually, Craig's view and mine on the appropriateness of patents are much closer together than most reports would have suggested. And I, too, want to compliment the Patent Commissioner for looking at this issue very carefully over the course of the last few months and setting some new utility guidelines that are, I think, quite reassuring in terms of making sure we end up with an outcome where the patent system is used to provide an incentive for research and not a disincentive.
Q Dr. Collins, could you explain the difference between what you've done and what Celera has done?
DR. COLLINS: Well, how deeply do you want to get into science here, because the answer is going to be of that sort. We will talk about it at 12:30 p.m. But very quickly, the way in which the public project has sequenced the human genome is to first break it down into pieces that are roughly 150,000 letters in length.
Those are relatively straightforward to generate, but fairly challenging to figure out where they go. We have spent a lot of our effort, particularly over the last year, assembling those pieces into large, contiguous fragments of DNA across chromosomes. And we have 97 percent of the genome now covered with those mapped pieces of DNA. We use those pieces, those 150,000 letters long, as our substrate for doing the sequencing. So we know how to put that back together.
That is not nearly as challenging a computer problem as the method that Celera has been using, which is quite innovative and it requires, obviously, a lot of computational effort.
So we take those pieces, sequence them one by one, and then reassemble the whole thing back on to the chromosomes, which has been going on particularly vigorously in the last month; and deduce what the original sequence must have been by that method.
Celera takes an approach where they skip over the step of having these 150,000-letter-long pieces and go straight to the sequencing process and then use a computer to reassemble the whole thing at the end of the effort, which obviously has some advantages, because you don't have to spend all the time and effort on the mapping phase, although the assembly process -- I imagine Craig might want to comment on this -- is pretty challenging. And there are some uncertainties about the degree to which repeated sequence in the genome may give you headaches of various strength. But maybe you should add to that.
DR. VENTER: There are a couple of other important differences with the Celera approach for the whole genome shotgun -- we take all the DNA out of the cells of individuals. So we actually have genomes that actually represent individuals' entire genetic repertoire. Whereas, some of the BAC libraries have come from -- I don't know what the total number is -- but a variety of different individuals. And I think this workshop that we were talking about earlier could be actually very instructive in terms of seeing if the two different approaches give the same view of the human genetic code. And I think that's going to be very instructive for all of us.
The calculation that we've done on assembling the genome is certainly, I think, calculations that are larger usually come out of the Department of Energy with some of the supercomputer processing there. I think we did 5 million, trillion calculations to assemble the human genome. It took 20,000 CPU hours on one of the largest supercomputers in history. But it does reassemble the entire genetic code of individuals. And we did this with a fruit fly. Scientists now studying it have reported that there is less than one error in one million base pairs with it, so the method is clearly accurate. But it could give a different answer than these different cloning methods. And I think it's going to be a very instructive -- probably not for the rest of the world, but certainly for the scientists involved to compare the details of those.
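Editorial illustration (not part of the briefing): the reassembly step that both answers above describe, putting overlapping sequence reads back into one linear sequence, can be sketched with a toy greedy overlap merge. This is only a minimal sketch under stated assumptions: the function names (`shred`, `overlap`, `greedy_assemble`), the 20-letter example string, the read length and the minimum overlap are all hypothetical, reads are assumed error-free and taken from one strand, and this is not Celera's assembler or the public project's pipeline.

```python
def shred(genome: str, read_len: int, step: int) -> list[str]:
    """Cut a sequence into fixed-length, evenly spaced overlapping reads (toy model)."""
    return [genome[i:i + read_len] for i in range(0, len(genome) - read_len + 1, step)]


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 if the longest such overlap is shorter than `min_len`.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the largest overlap.

    Returns the remaining contigs; a single contig means everything joined up.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that are wholly contained in another read.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while len(contigs) > 1:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing left that overlaps; leave the gaps as separate contigs
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


if __name__ == "__main__":
    toy_genome = "ATGGCGTGCAATCCGTAGGA"  # hypothetical 20-letter example
    toy_reads = shred(toy_genome, read_len=8, step=4)
    print(greedy_assemble(toy_reads))
```

Even in this small example one read ends with the same three letters that begin an unrelated read, a miniature version of the repeated-sequence problem mentioned above; the greedy merge only gets the right answer here because the true overlaps are longer than the spurious one.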
Q Dr. Collins, when you called Dr. Patrinos and talked about meeting with Dr. Venter, what led you to believe he would be willing to talk with you about a joint -- at that particular time.
DR. COLLINS: Well, I have to say, although it may not be a popular statement in this room, that the focus on the race and the personality issues that have been so prominently featured in press stories about the genome have in many ways done a disservice to the situation. I don't think the level of animosity or hostility was anything approaching the way it was described in some of the pieces that Craig and I have had to read.
This is, after all, a noble enterprise. Sequencing our genomes should not be something that is tarnished in some way by what appears to be a cat fight amongst people who are involved in the enterprise. I think both of us have felt disheartened by the way in which that so dominated the public image of what was going on with the genome project.
With both of these projects obviously proceeding extremely well, there is a great opportunity here for sharing information after publication; because of the complementary nature of the scientific strategies it seemed absolutely the right moment to sit down together and try to figure out a model for cooperation and coordination so that the public would be the greatest beneficiary and we could put behind us this chapter, which I hope history will not be very interested in, which has gone on for too many months and has really done a disservice to the hard labors of thousands of people around the world who have been trying to make this happen for the benefit of mankind.
MR. SIEWERT: We have another event so we'll take one more question.
Q Dr. Venter, you have mentioned that Celera is working with Pfizer, Amgen and other companies. Could you just, for a lay person, explain what kind of work you are doing that comes out of the human genome research? And also to kind of both of you, the President, in the event, said that genetic science will realize the treatment and prevention of almost all human diseases. When do you think that those big kind of breakthroughs might start coming?
DR. VENTER: Let me try and answer the second question first. What this information will do is cause a catalytic change in how researchers do their work. Instead of funding the kind of programs that I spent 10 years doing, Dr. Collins spent 10 years doing, as you heard, those can be reduced to between 15 seconds and two weeks. So the challenge now is for the scientific community to reassess how we fund science, and what we're funding, to make sure that this information now gets used as the beginning of this new science.
There will be discoveries made across the board. But it's impossible to predict which diseases, at this point, will see the breakthroughs first. What we know, from scientists studying the intricacies of the genetic code and the genes that all of us are discovering will be the new starting point for going much faster. And those discoveries will build on each other.
So we certainly hope to begin to see things just in the next few years. But some diseases, and probably the ones we care about the most, could take longer. They could be much more refractory because we have to understand how the 50,000 or so genes work together to actually form life. And that's never been possible to even contemplate before without having the genetic code.
DR. COLLINS: Can I add to that? The reason I got interested in genomics to begin with was, as a physician, this enormous frustration in not understanding diseases well enough to be able to offer very much.
And when you look at what we currently know about things like diabetes and heart disease and Alzheimer's disease, it's not nearly sufficient to enable us to be able to design the strategies that we all hope for that will really cure these illnesses. I would be willing to make a prediction that within 10 years, we will have the potential of offering any of you the opportunity to find out what particular genetic conditions you may be at increased risk for, based upon the discovery of genes involved in common illnesses like diabetes, hypertension, heart disease, and so on.
In many instances, that kind of predictive information could be quite useful to you, provided we put in the appropriate protections so that people don't use it against you. Because it would allow you to practice individualized, preventive medicine, focusing on the things that are most important for your health.
Over the longer term, perhaps in another 15 or 20 years, you will see a complete transformation in therapeutic medicine, because every pharmaceutical company is investing, and every biotech company is also contributing to the development of new targets for drug therapy, based upon the genome. And the therapies that we use 15 or 20 years from now will be directed much more precisely towards the molecular problem in things like cancer, or mental illness, than anything that we currently have available.
So count on this happening. We've got to be patient -- well, maybe we shouldn't be patient. We should be impatient, but I do think we have to expect this is going to take a lot of hard work, a lot of good research, a lot of funding for both the public and private sectors, a lot of partnerships in ways that we have to be very creative about. But the vision is a very exciting one.
DR. VENTER: Let me just briefly answer the first part of your question, which is how do our pharmaceutical partners and subscribers use this information. They're all linked in through virtual private networks, through very high-speed lines that are in basically every time zone around the world. They do searches on the data based on a daily, sometimes minute-by-minute basis. They've already made some tremendous discoveries in each of their own disease areas. And they're using some of the genes right now to move forward; drug design and drug targeting.
Dr. Les Hudson from Pharmacia, one of the top pharmaceutical companies in the world, he's the head of research there, he is one of the earliest subscribers to the Celera database. He will be at the 12:30 p.m. press briefing. And he said he would be available for answering questions about how the pharmaceutical industry is now using this data.
But it's impossible to gauge every one of the possible ways that they use it. That's why we just make the information available; some very dramatic research tools where people can interpret the data and make some very key discoveries. And every one of them have made some pretty exciting discoveries that they will be announcing on their own time.
END 11:38 A.M. EDT