Walkabout with us! ;P

05 April 2010

What Del is working on in Oz (part 2)

Welcome back everyone and thanks for reading. So let's start by summarizing what we talked about in the last post. We answered the question "What is a biocatalyst?" It is an enzyme: a protein that makes chemical reactions in your body happen at a useful rate. We also learned that proteins are long chains of amino acids that are attached end to end and then fold up into their final shape. Here is a little picture I made summarizing this:

Now, on to the next question: "Why do I want to design biocatalysts?" I mentioned in the last post that designing enzymes is very challenging, so why would I want to spend my time on it? Well, part of the fun is in the challenge. The hope is that whether I succeed or fail, I will learn something that is potentially useful for myself and the world of enzyme design. Specifically, people want to design enzymes when there is no existing natural enzyme to efficiently carry out a certain task. In my case, I am designing an enzyme which will (hopefully) break down pesticides in such a way as to make them much less harmful to humans and natural wildlife, but still effective against the pest insects that destroy crops. This type of problem is an example of bio-remediation: the use of microorganisms to break down things that are harmful to the environment. So if I could design an enzyme that is useful for breaking down pesticides, we could put a copy of that gene (the sequence of DNA with instructions on how to make my enzyme) into bacteria and have them make my enzyme and use it to detoxify pesticides. This specific goal is the motivation for my post-doctoral research, but it also points to the kind of potential such designed enzymes can have. If we get really good at designing enzymes, we may be able to create enzymes for all sorts of tasks: making compounds that would be useful to mankind (biofuels, synthetic polymers), breaking down our garbage, and even fighting disease (enzyme therapy)! These things are a long way off, but imagine how much good could come of it.

On to the third question: "What is computational modeling?" I assure you it is not laptops walking down runways with fashion photographers shooting pictures (although some may argue that is exactly what Apple's product launches are becoming...). In general, computational modeling involves using a computer to represent the physical properties of an object or process. There are computer models for all sorts of things: how aerodynamic your car is, how a rocket will behave when leaving orbit, how the climate changes, and in my case, how proteins do the things they do. The movie of a protein folding that you saw in my last post is an example of computer modeling (or computer simulation, I like that term better). Computer simulation is frequently used to study molecules (and proteins in particular). This is because computers can do certain things easily that are much more difficult to do in reality. One example is mutating a protein. It is trivial for a computer to make a number of mutations (changing one amino acid into another), while doing this in the real world requires a lot of expensive lab equipment. Another thing computers can do very well (when it comes to looking at proteins) is watch the movement of specific atoms on very short timescales. Small, rapid changes in a protein's shape can be modeled very easily in a computer, whereas observing them in reality requires some very fancy equipment (e.g., temperature-jump spectroscopy).

In addition to these sorts of things, there is one place where computers really shine: doing repetitive tasks very efficiently. Humans, on the other hand, aren't so great at this. If an experiment needs to be performed 100 or 1,000 or more times, it is difficult to find a young scientist who can do it in a timely and reproducible manner. Usually robots and special techniques are employed, which, as you can imagine, costs lots of money. Computers, on the other hand, specialize in these repetitive tasks. Computer scientists call such tasks "embarrassingly parallel" or "trivially parallel" (I prefer the phrase "pleasingly parallel", but that's just me). To put this in real-world terms, ditch digging is an embarrassingly parallel problem. If you need to dig a ditch that is 100 feet long, it may take one guy 24 hours to do it. But what if you get 2 guys (who both work at the same pace)... it now takes 12 hours. By this logic, it is very conceivable that you could get 10 guys and have the ditch dug in about 2 and a half hours. Of course there is a limit to this, after which you can't parallelize the problem any more (you can't dig the ditch in one second with 86,400 people).
A number of problems related to protein folding, protein-drug interactions (yes, most drugs work by interacting with proteins), and protein design are parallelizable just like our ditch digging. For example, to actually determine how a protein folds, we usually need to do thousands of simulations, which can, for the most part, all be done independently of each other. At Stanford, the lab I worked in solved this problem by creating Folding@Home: a screen saver that people download and run on their home computers. All over the world, when people with the screen saver aren't using their computers, those computers perform protein folding simulations. This way we can perform thousands of simulations all at once and learn something about how a protein folds. Enzyme design is another problem that has a pleasingly parallel component to it. To design an enzyme we need to make a number of small changes to the shape of the protein and see if each one makes our model better or worse. (Technically, we can do independent mutations and conformational (rotamer) searches to see how each affects the protein's stability.) Also, in enzyme design we frequently have a number of different models we want to try out, and because we are testing them in computers, we can try them all at the same time.
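For the programmers in the audience, here is a tiny sketch of what "pleasingly parallel" looks like in code. It is written in Python, and `run_one_simulation` is just a made-up stand-in for a real folding calculation, not any actual Folding@Home code:

```python
# Toy illustration of a pleasingly parallel workload: many independent runs
# that can all happen at the same time. run_one_simulation is a hypothetical
# stand-in for a real folding or design calculation.
import random
from multiprocessing import Pool

def run_one_simulation(seed):
    """Pretend simulation: return a made-up 'folding time' for one run."""
    rng = random.Random(seed)
    return rng.expovariate(1.0)  # toy folding time, arbitrary units

if __name__ == "__main__":
    seeds = range(1000)  # 1000 independent runs, like 1000 ditch diggers
    with Pool() as pool:  # spread the runs across all available CPU cores
        folding_times = pool.map(run_one_simulation, seeds)
    print("average folding time:", sum(folding_times) / len(folding_times))
```

Each run knows nothing about the others, which is exactly why we can hand them out to thousands of home computers at once.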

Now on to the last question: "What is directed evolution?" Directed evolution is essentially making evolution do what you tell it, or evolving things in the laboratory for a specific purpose. To understand this I'll briefly explain how evolution works. You already know that your genes (your DNA) encode the instructions to make all of the proteins in your body. What I haven't told you is that over time, genes can build up mutations, wherein something (radiation, chemicals, viruses, or errors in natural replication) makes a small change to your DNA (mutagenesis). This results in a change to your protein. Sometimes these changes don't really show up or have any obvious effect at all (silent mutations). Other times they can have a drastic effect on an organism (either good or bad). If a bunch of organisms have to compete for limited resources, nature will select for mutations that make an organism more fit to obtain those resources. That organism in turn has a higher rate of survival and passes its genes on to its offspring. Of course, because all organisms are evolving and their survival affects each other, this results in a complex and constantly changing evolutionary landscape. I've drawn a little picture below of this concept using bacteria as an example (which also illustrates how natural selection leads to antibiotic-resistant bacteria).

An interesting example of evolution can be found in the case of sickle-cell anemia in sub-Saharan Africa. One small change of a single amino acid in hemoglobin causes it to lose much of its ability to do its job (carrying oxygen around your body; it's also what makes your blood red). You would think that because of this, people with sickle-cell anemia would all die and the trait would disappear. It turns out that genetics is a little more complicated than that. Because we get one set of genes from our mom and one set of genes from our dad, a person can end up with both a normal hemoglobin gene and a sickle-cell hemoglobin gene (this is called a heterozygote, and such a person can be considered a 'carrier' of the gene). Now here's where it gets really cool. It turns out that people with one bad gene and one good gene are resistant to malaria (a deadly blood parasite transmitted by mosquito bites). So people with two good genes (dominant homozygotes) have a higher chance of dying from malaria. People with two bad genes (recessive homozygotes) have a higher chance of dying from sickle-cell anemia. But people with one good gene and one bad gene are resistant to malaria yet still have enough good hemoglobin to survive sickle-cell disease. Thus you have an evolutionary niche where the mutant gene has a good chance of survival. Here is a picture I put together illustrating this example:
Now in the case of directed evolution, we take a simple organism like bacteria or yeast and force it to mutate. Then we set things up so the only bacteria or yeast that survive are the ones that are good at doing the job we are interested in (say, digesting oil to clean up an oil spill, or detoxifying pesticides to make them less harmful to local wildlife). We keep repeating this process of forcing random mutations and stacking the deck for the survival of our specialized bacteria. After a number of iterations we end up with bacteria that have been evolved in the laboratory for the purpose of doing a specific job. This is directed evolution. We hope to combine directed evolution with computer modeling to make a special kind of bacteria that can detoxify pesticides. What we will do is use our computer simulations to predict what protein (sequence of amino acids) will make a good enzyme. We then put a gene for this protein into our bacteria and force them to evolve. Hopefully they will evolve in such a way as to get better at digesting pesticides. After we evolve our bacteria, we take a look at their enzymes and see how evolution changed them from our original predictions. Hopefully, repeating this process will do two things:
  1. allow us to create an enzyme that is really good at breaking down pesticides
  2. teach us something about how good we are at designing enzymes and, hopefully, point us toward ways to improve
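For anyone who thinks in code, here is a very rough, entirely hypothetical sketch of that design-evolve-compare loop in Python. Every function body is a toy placeholder standing in for real modeling or lab work, not an actual tool we use:

```python
# Cartoon of the design-then-evolve cycle described above. The "lab" and
# "computer" steps are toy placeholders, not real programs.
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def propose_mutations_in_silico(sequence):
    """Stand-in for computational design: change one random position."""
    i = random.randrange(len(sequence))
    return sequence[:i] + random.choice(AMINO_ACIDS) + sequence[i + 1:]

def evolve_in_lab(sequence):
    """Stand-in for directed evolution: pretend selection tweaks the design."""
    return propose_mutations_in_silico(sequence)

def design_and_evolve(starting_sequence, rounds=5):
    sequence = starting_sequence
    for _ in range(rounds):
        designed = propose_mutations_in_silico(sequence)  # computer prediction
        evolved = evolve_in_lab(designed)                  # directed evolution
        # In the real project we would compare 'designed' and 'evolved' here
        # and use the differences to improve the next round of predictions.
        sequence = evolved
    return sequence

print(design_and_evolve("MKTAYIAKQR"))
```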
So there you have it: "Designing biocatalysts through combined computational modeling and directed evolution". My job in Australia is to try to design enzymes that are useful in bio-remediation. I'll be using my expertise in computer simulation of proteins, combined with the expertise of my boss and our collaborators, to make the proteins I suggest and optimize them through evolution. Pretty cool, huh? I'm very excited about this project. It is challenging, but it also represents an important step towards using our understanding of biology, chemistry, and physics to change our world for the better. Who knows, it may work out really well and I could learn so much that I decide to start a small biotechnology company specializing in enzyme design! That would definitely be cool. Well, anyway, thanks for reading and taking an interest in my work!

9 comments:

  1. I like this post and the graphics are awesome!!! No, I didn't do them! Del is naturally good at using Illustrator on his own, I'm just a coach on occasion. Thanks Baby, this is great!

  2. Something I've always been curious about with computer modeling: what is the rate of accuracy? Or how is it measured?

    I'm sure the answer is different depending on what is being "modeled". For example, I would expect that aerospace applications are closer to WYSIWYG than biological ones, because those simulations work with essentially set-in-stone parameters. Gravity is always X, air always behaves Y at 32 deg C, etc. Biological factors, on the other hand, are less predictable.

    The question for me is this: how often do your simulations say "if you filter for X trait, you will get a Y enzyme that should do Z towards breaking down pesticides," and then you do X? First, what percentage of the time does Y happen? And then, what I would think is the fun part: if Y does result from X, how often does Y do what your models predict it will?

    Do we know all the ways that a protein can fold? Are we ever surprised by a new fold? Air flowing over a wing with 1 sq ft of surface area at 30 mph, etc., will produce 10 lbs of lift, ALWAYS (numbers have been made up and simplified for my tiny brain). Are there similar truths about protein folding?

  3. Wow, those are great questions! We'll have to make sure the Doctor answers those for you because I am curious about them myself now.

  4. Awesome questions, Don! Let me do my best to answer them (and expand a little on the topic of molecular simulation in general). We do have things we know at the microscopic level (Planck's constant, Boltzmann's constant, the charge of an electron, the mass of a hydrogen atom, etc.), but the issue of parameterization does become problematic. For example, the simulation of protein folding you saw in the previous post is actually performed numerically the same way as a ballistics simulation (integrating Newton's second law of motion to form a trajectory). The problem is: how good is the approximation of a molecule as a classical mechanical object? To treat a protein as a classical object, we treat each atom as a point mass and each chemical bond as a spring. This turns out to be okay for some things, but not so much for others. Molecules also have quantum mechanical properties that we need to account for; these basically govern how each atom attracts or repels every other atom. It turns out that parameterizing this part is very difficult.
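    To make that a bit more concrete, here is a bare-bones toy (in Python, with completely made-up numbers rather than real force-field parameters) of what "atoms as point masses, bonds as springs, integrated with Newton's second law" looks like:

    ```python
    # One "bond" treated classically: two point masses joined by a spring,
    # stepped forward in time with the velocity Verlet scheme (F = m*a).
    # All numbers are arbitrary toy values, not real force-field parameters.
    mass = 1.0         # mass of each "atom" (arbitrary units)
    k = 100.0          # spring (bond) stiffness
    r0 = 1.0           # resting bond length
    dt = 0.001         # time step

    x1, x2 = 0.0, 1.2  # starting positions (bond slightly stretched)
    v1, v2 = 0.0, 0.0  # starting velocities

    def bond_force(x1, x2):
        """Hooke's law: force on atom 1; atom 2 feels the equal and opposite force."""
        return k * ((x2 - x1) - r0)

    for step in range(10000):
        f = bond_force(x1, x2)
        a1, a2 = f / mass, -f / mass          # Newton's second law
        x1 += v1 * dt + 0.5 * a1 * dt * dt    # update positions
        x2 += v2 * dt + 0.5 * a2 * dt * dt
        f_new = bond_force(x1, x2)
        a1_new, a2_new = f_new / mass, -f_new / mass
        v1 += 0.5 * (a1 + a1_new) * dt        # then update velocities
        v2 += 0.5 * (a2 + a2_new) * dt

    print("final bond length:", x2 - x1)      # oscillates around r0
    ```

    A real simulation does exactly this kind of bookkeeping, just for tens of thousands of atoms and with many more kinds of forces.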

    In order to determine our parameters we frequently do one of two things: make experimental measurements, or do more detailed calculations on a smaller (simpler) problem and try to transfer what we see to our larger models. So we make measurements of things like vibrational frequencies (to determine how bonds stretch and angles bend) and do expensive quantum mechanics calculations on small systems (like single amino acids), and infer our classical parameters from these smaller experiments/simulations. Now the first obvious question is: why not just do the quantum mechanics calculation for the whole protein? It is so expensive that it is essentially impossible (the calculation would take millions of years). So we use the expensive quantum method on a small model to get parameters for the less expensive classical method on a big model.
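    As a toy example of what "infer classical parameters from measurements" means: for a harmonic spring, the vibrational frequency and the force constant are related by k = mu * omega^2, so a measured stretching frequency gives you the spring stiffness. The 3000 cm^-1 value below is just a typical textbook C-H stretch, not a parameter taken from any real force field:

    ```python
    # Turn a measured vibrational frequency into a classical bond "spring"
    # constant via the harmonic-oscillator relation k = mu * omega^2.
    # The 3000 cm^-1 C-H stretch is an illustrative textbook value only.
    import math

    c = 2.998e10          # speed of light, cm/s
    amu = 1.6605e-27      # atomic mass unit, kg

    m_C, m_H = 12.011 * amu, 1.008 * amu
    mu = (m_C * m_H) / (m_C + m_H)           # reduced mass of a C-H pair, kg

    wavenumber = 3000.0                      # C-H stretch frequency, cm^-1
    omega = 2.0 * math.pi * c * wavenumber   # angular frequency, rad/s

    k = mu * omega**2                        # force constant, N/m
    print(f"C-H spring constant ~ {k:.0f} N/m")  # roughly 500 N/m
    ```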

    The next obvious question would be: how do we know that our parameters derived from small models are transferable to the big system? To test this we need an experimental result to compare against. In the world of protein folding, this is usually the rate of folding (but it can be a number of other things as well). We tend to trust parameters that can predict folding rates accurately. We hope that a model of a protein whose parameters come from things we trust (vibrational frequencies, quantum mechanics, etc.) and that can recapitulate experimental observables (folding rates, for example) will be informative in telling us something that experiments cannot. This is shown in the protein folding video in the last post: we get good agreement with what can be measured by experiment (folding rates, stability of the folded structure) and learn something new (the mechanism, i.e., the order of events in folding).
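    Here is a toy example of that kind of comparison: pretend we ran many simulations, recorded what fraction had folded at each time, and want to pull out a folding rate to put next to an experimental number. All of the data below are fabricated purely for illustration:

    ```python
    # Extract a folding rate from (made-up) simulation data so it can be compared
    # with experiment, assuming simple single-exponential folding: f(t) = 1 - exp(-k*t).
    import math

    times = [0.0, 1.0, 2.0, 5.0, 10.0, 20.0]                 # microseconds (made up)
    fraction_folded = [0.00, 0.18, 0.33, 0.63, 0.86, 0.98]   # made-up observations

    # Fit k by regressing -ln(1 - f) against t (forcing the line through the origin).
    pairs = [(t, -math.log(1.0 - f))
             for t, f in zip(times, fraction_folded) if 0.0 < f < 1.0]
    k_sim = sum(t * y for t, y in pairs) / sum(t * t for t, _ in pairs)

    k_exp = 0.20  # hypothetical experimentally measured rate, per microsecond
    print(f"simulated rate: {k_sim:.2f} /us   experimental rate: {k_exp:.2f} /us")
    ```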

    --continued in next comment, I'm apparently too long-winded for Blogger!--

  5. I know at this point I may have significantly diverged from your question, so I'll try to rein in my crazy ramblings. The challenge of computer modeling is picking (or creating) the best model to answer the specific question. The cool thing about my work in enzyme design is (as you noticed) that I get to have this cross talk with experimentalists to validate and improve my models.

    For example, I predict that protein sequence X will create an enzyme that is good at detoxifying pesticides. What if enzyme X is good at something else? What if nature changes protein X to protein Y, which is better at detoxifying pesticides? If so, how is Y different from X? All the fun is in these questions, as you rightfully point out.

    Making accurate predictions, though, is definitely the goal. Predicting things like protein structure and function, protein folding, enzyme catalysis, and drug efficacy is among the great unsolved problems in science today. The methods that are the most successful are those that balance computational cost with detail, but there is still a long way to go. In some ways it is like primitive man trying to create the first machines: as much as we know about biology, I feel that our understanding of it is still very primitive. Our nascent ability to predict things like potential drugs, enzymes, and protein structures shows that we are on the right track. We have had successes and we are very proud of them, but we have a long, long way to go before we can predict whether a chemical will make a good drug as easily as we can predict the lift on a wing at a given temperature, speed, and wing shape. Hope this helps, and let me know if you have any other questions (or if I missed anything in answering your current ones). Also, thanks for reading and commenting! I really appreciate the interest!

  6. So essentially (in hollyism terms), you really don't know all the ways a protein can fold, you can occasionally be surprised by a result in folding, some proteins can react in ways not predicted, and there is no real way to know all the answers until more and more and more research is done? But am I to understand that this is really the core of it, and the part one would derive the most fun from in this job?

    I'm hoping I didn't miss any key answers here, so I'm asking in the most simplistic way to ensure that I've understood everything. Thanks Baby!

  7. Is there a repository of knowledge somewhere that a person can go to to find the results of previous quantum mechanics calculations for a particular protein? Is this information shared between researchers, kind of like the genome project? I would suspect that if you did use someone else's research as a shortcut, you would have to go back and check their work if you found something significant? Then again, if you predict something with modeling using someone else's results, and it proves out in the real world, you can ignore the possibility of bogus data.
    (I guess I was sort of typing to myself on that last point.:) )

  8. So, there are numerous efforts in the scientific community to continually refine the parameters for various models (protein folding, structure prediction, enzyme design, etc.). When someone publishes a new model, it is scrutinized, reviewed by our peers, and double-checked (peer review). When we use it, we cite their publication in our publication. This process allows continuous development and evaluation of models and methods. The common practice is to use the work of others as a shortcut in this field (and in all of science, actually). For example, there is a large effort to come up with a general set of parameters for folding all different proteins (called a force field). The parameters used to fold the protein in that video are publicly available and were contributed to by a number of collaborators. I would even go so far as to say that scientists want you to use their work as a shortcut. The more people build off of your research, the more they are obliged to give you credit for your work. The number of papers you publish and the number of times those papers are cited is a type of scientific currency (and is a measure of your work's significance as well as your success in science). In the case of something like a parameter set for protein folding, every time it is used to investigate how a different protein folds, your work will be cited. This also leads to continuous scrutiny of your model.

    As for bogus data: people do make mistakes in the rush to publish, and if those mistakes are significant enough the paper will be retracted. If someone falsifies data, on the other hand, it will eventually be found out, and that will be career suicide for that unsavory scientist. Thanks again for your interest, cous, it is really great getting to talk to you about science!!!
