Grand challenges in bioinformatics
(BRIC-NEWS Vol3 No.40, Nov. 20, 1999)
This article is a translation of a piece by Dr. Minoru Kanehisa of Japan, who runs KEGG, published in the journal Bioinformatics (Vol. 14 no. 4, 1998, 309).
Predicting the three-dimensional structure of a protein from its amino acid sequence is one of the greatest challenges in computational molecular biology. Because a protein's three-dimensional structure is determined by thermodynamic stability, all the information needed to determine that structure can be assumed to be contained in the sequence. In other words, given a particular environment, a protein folds spontaneously to suit that environment. This is called Anfinsen's thermodynamic principle.
This principle has held up well for a few selected proteins under in vitro experimental conditions. Recent research, however, shows that protein folding in vivo is far more complex: it is a dynamic process involving other molecules such as chaperones. The environment around a protein is likewise understood not as a simple thermodynamic environment but as the sum of many kinds of interactions with other molecules. The protein folding problem in the natural state will therefore not be solved unless these specific interactions between molecules are taken into account. The same difficulty already arose in predicting the secondary structure of proteins: however good the algorithm, a prediction program is limited as long as it considers only short-range interactions. The same limitation will apply to programs that predict three-dimensional structure.
As we enter the era of whole-genome sequencing, we face a new grand problem that may be called the organism reconstruction problem. Given a complete genome sequence, the challenge is to predict computationally the entire course of development from a single cell to a mature organism, together with the functions that this development requires. Here, as with the protein folding problem, the traditional view is that the genome is the blueprint of an organism and contains all the information needed to build it. In other words, a clone carrying all the information of a particular organism can be produced by replacing the original nucleus using the methods of molecular biology. This might be called Dolly's cloning principle.
If this assumption is correct, we should one day be able to predict the function of every gene from sequence information. Because each gene has its function only in relation to its environment, the assumption implies that not only the function of a gene but also its environment can be predicted from the sequence. Thus, for example, bioinformatics studies of the genome sequence should be able to predict all the molecular structures and reaction mechanisms that derive from a single germ cell. We would then be saying that the form and function of an organism are represented by the nucleus.
According to an alternative view proposed more recently, however, the genome is only one part of what makes up an organism, and the true blueprint is the entire cell, organized as a network of molecular interactions. On this view, the organism as a whole cannot be understood from the genome sequence alone, without additional information such as the interactions between molecules and the timing and location of gene expression. In fact, hypothetical proteins still account for one-third to one-half of the genes in every genome sequenced so far, and obtaining information about their function requires experiments that reveal gene-gene interactions through disruption experiments or protein-protein interactions through the yeast two-hybrid system.
With the rapid growth of sequence information, bioinformatics is emerging quickly as a new discipline, and by developing new databases and computational technologies it plays a major role in uncovering the biological information held in sequences. In the coming era of systematic functional analysis, bioinformatics is expected to go beyond simply storing and processing information and to provide a complete catalogue of the interactions between molecules. With information of this more advanced kind, the grand problems that bioinformatics now faces are expected to be resolved one day, though perhaps not in the form in which they were first posed.
<Original text>
The protein folding problem has been one of the grand challenges in computational molecular biology. The problem is to predict the native three-dimensional structure of a protein from its amino acid sequence. It is widely believed that the amino acid sequence contains all the necessary information to make up the correct three-dimensional structure, since the protein folding is apparently thermodynamically determined; namely, given a proper environment, a protein would fold up spontaneously. This is called Anfinsen's thermodynamic principle.
While this principle is well established in selected proteins under in vitro experimental conditions, protein folding in vivo is a more complex and dynamic process involving a number of other molecules such as chaperones. The environment has to be considered as a collection of various interactions with molecules rather than a smooth thermodynamic environment. It is not unreasonable to expect that the protein folding problem cannot be solved for the majority of proteins in nature without considering specific molecular interactions. This is reminiscent of the problem of secondary structure prediction in proteins. However good the algorithms developed for secondary structure prediction are, the success rate will be limited as long as only the short-range interactions are considered. Similarly, however good the algorithms developed for the three-dimensional structure prediction are, the success rate will be limited as long as only the information of a single molecule is examined.
In the era of whole-genome sequencing, we are faced with another grand challenge problem, which may be called the organism reconstruction problem. Given a complete genome sequence, the problem is to predict computationally the development of the adult from a single cell and its continual function as a biological organism. Here again, a traditional view is that the genome is a blueprint of life containing all the necessary information that would make up an organism. A clone can be made by replacing the nucleus, which is the localized area containing all genetic information. Thus, this might be called Dolly's cloning principle.
According to this genetic determinism principle, we should eventually be able to predict the function of every gene in the genome by its sequence information alone. Implicitly, this assumes that the environment of each gene is also computable from the complete genome sequence because the function of a molecule can only become meaningful in relation to its environment. Therefore, the entire molecular architectures and molecular reaction pathways in a germ cell, for example, may be computable from the genomic sequence. We thus end up asserting that the form and function of an organism are represented in the nucleus.
In an alternative view, the genome is simply a warehouse of parts, or building blocks of life, and a real blueprint of life is written in the entire cell, perhaps as a network of molecular interactions. Whichever view one takes, it is impossible in practice to make sense fully out of the sequence data without additional information, including time and localization of expression and, especially, the information on molecular interactions. In fact, in order to obtain any functional clue of hypothetical proteins that still form one-third to one-half of the genes in every genome that has been sequenced, new systematic experiments are being designed to observe, for example, gene-gene interactions by disruption experiments and protein-protein interactions by yeast two-hybrid system experiments.
Bioinformatics has emerged as a major discipline due to the rapid increase in sequence information, developing new databases and computational technologies that help us to understand the biological meaning encoded in the sequence data. In a post-genomic era of systematic functional analysis, the basis of bioinformatics is not only the complete catalogue of building blocks, but also the complete catalogue of their interactions. With this new level of information, the grand challenge problems in bioinformatics, both old and new, and both structural and functional, may one day be elucidated, although not in the manner in which they were originally formulated.