Multiple Alignments

 

        Select MULTIPLE ALIGNMENT MODE, then use the ALIGNMENT menu to carry out multiple alignments.

        Multiple alignments are carried out in 3 stages:

1) all sequences are compared to each other (pairwise alignments);

2) a dendrogram (like a phylogenetic tree) is constructed, describing the approximate groupings of the sequences by similarity (stored in a file);

3) the final multiple alignment is carried out, using the dendrogram as a guide.

 

        These 3 stages are carried out automatically when the DO COMPLETE ALIGNMENT option is selected. The first two stages (pairwise alignments; guide tree) can be skipped by using an existing guide tree file (DO ALIGNMENT FROM GUIDE TREE); alternatively, the guide tree alone can be produced, with no multiple alignment (PRODUCE GUIDE TREE ONLY).

 

REALIGN SELECTED SEQUENCES is used to realign sequences that are badly aligned. Sequences can be selected by their names - see Editing Alignments for more details. The unselected sequences are held fixed, and a profile is built from them alone. Each of the selected sequences is then realigned to this profile in turn. The realigned sequences are displayed as one group at the end of the alignment.

 

REALIGN SELECTED SEQUENCE RANGE is used to realign a small region of the alignment. A residue range is selected by clicking in the sequence display area, and a multiple alignment is then performed on that range. Finally, the new alignment is pasted back into the original full sequence alignment.

 

By default, gap penalties are applied at each end of the subrange in order to penalise terminal gaps. If the REALIGN SEGMENT END GAP PENALTIES option is switched off, gaps can be introduced at the ends of the residue range at no cost.

 

 

ALIGNMENT PARAMETERS displays a sub-menu with the following options.

RESET NEW GAPS BEFORE ALIGNMENT removes any new gaps that were inserted into the sequences during multiple alignment, for use when you want to change the parameters and align again. It only takes effect just before a second multiple alignment is run. Phylogenetic trees can be made after alignment whether or not this option is on. If it is switched off, the new gaps are kept even when a second multiple alignment is run. This allows the alignment to be iterated gradually. Sometimes the alignment is improved by a second or third pass.

 

RESET ALL GAPS BEFORE ALIGNMENT removes all gaps in the sequences, including those that were read in from the sequence input file. It only takes effect just before a second multiple alignment is run. Phylogenetic trees can be made after alignment whether or not this option is on. If it is switched off, the gaps are kept even when a second multiple alignment is run. This allows the alignment to be iterated gradually. Sometimes the alignment is improved by a second or third pass.

 

PAIRWISE ALIGNMENT PARAMETERS control the speed and sensitivity of the initial alignments.

 

MULTIPLE ALIGNMENT PARAMETERS control the gaps in the final multiple alignments.

 

PROTEIN GAP PARAMETERS applies only to protein sequences and opens a temporary window in which the various parameters can be viewed and set. (SECONDARY STRUCTURE PARAMETERS, used only in Profile Alignment Mode, lets you change various parameters that are used only with gap penalty masks.)

 

SAVE LOG FILE writes the alignment calculation scores to a file. The log file has the same name as the input sequence file, with the extension .log.

 

OUTPUT FORMAT OPTIONS

        You can choose from 5 different alignment formats (CLUSTAL, GCG, NBRF/PIR, PHYLIP and GDE). More than one can be selected (all 5 if you wish).

 

CLUSTAL format output is a self-explanatory alignment format. It shows the sequences aligned in blocks. It can be read in again at a later date to (for example) calculate a phylogenetic tree or add in new sequences by profile alignment.

 

GCG output can be used by any of the GCG programs that can work on multiple alignments (e.g. PRETTY, PROFILEMAKE, PLOTALIGN). It is the same as the GCG .msf format files (multiple sequence file); new in version 7 of GCG.

 

PHYLIP format output can be used for input to the PHYLIP package of Joe Felsenstein. This is a very widely used package for doing every imaginable form of phylogenetic analysis (MUCH more than the modest introduction offered by this program).

 

NBRF/PIR: this is the same as the standard PIR format with ONE ADDITION. Gap characters "-" are used to indicate the positions of gaps in the multiple alignment. These files can be re-used as input in any part of Clustal that allows sequences (or alignments or profiles) to be read in.

 

GDE:  this format is used by the GDE package of Steven Smith and is understood by SEQLAB in GCG 9 or later.

 

GDE OUTPUT CASE: sequences in GDE format may be written in either upper or lower case.

 

CLUSTALW SEQUENCE NUMBERS: residue numbers are added to the end of the alignment lines in clustalw format.

 

OUTPUT ORDER is used to control the order of the sequences in the output alignments. By default, it uses the order in which the sequences were aligned (from the guide tree/dendrogram), which automatically groups closely related sequences together. It can be switched to match the original input order.

 

PARAMETER OUTPUT: this option saves all parameter settings to a parameter file (suffix .par) during alignment. The file can later be used to rerun ClustalW with the same parameters.

 

 

ALIGNMENT PARAMETERS

PAIRWISE ALIGNMENT PARAMETERS

        A distance is calculated between every pair of sequences, and these distances are used to construct the phylogenetic tree that guides the final multiple alignment. The scores are calculated from separate pairwise alignments. These can be calculated using 2 methods: dynamic programming (slow but accurate) or the method of Wilbur and Lipman (extremely fast but approximate).

        You can choose between the 2 methods using the PAIRWISE ALIGNMENTS option. The slow/accurate method is fast enough for short sequences but will be very slow for many (e.g. >100) long (e.g. >1000 residue) sequences.

 

SLOW-ACCURATE alignment parameters: these parameters have no effect on the speed of the alignments. They are used to produce the initial alignments, which are then rescored to give percent identity scores. These % scores are the ones displayed on the screen. The scores are converted to distances for the trees.

 

Gap Open Penalty:   the penalty for opening a gap in the alignment.

 

Gap Extension Penalty: the penalty for extending a gap by 1 residue.

 

Protein Weight Matrix: the scoring table that describes the similarity of each amino acid to every other.

 

Load protein matrix: allows a comparison table to be read in from a file.

 

DNA weight matrix: the scores assigned to matches and mismatches (including IUB ambiguity codes).

 

Load DNA matrix: allows a comparison table to be read in from a file.

 

For details of the matrix input format, see the MATRIX option under Multiple alignment parameters below.
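The rescoring step described for the slow-accurate method (pairwise alignment, then percent identity, then distance) can be illustrated with a minimal Python sketch. The function names are hypothetical, and both the rule of skipping gapped columns and the conversion distance = 1 - identity/100 are assumptions made for illustration; the program's exact formula may differ.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither sequence has a gap.

    Assumption: gapped columns are skipped; other conventions exist.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # ignore columns containing a gap
        compared += 1
        if x.upper() == y.upper():
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def identity_to_distance(pid: float) -> float:
    """Convert a percent identity score to a distance in the range 0..1 (assumed rule)."""
    return 1.0 - pid / 100.0


pid = percent_identity("ACGT-A", "ACCTTA")
print(pid, identity_to_distance(pid))
```

In this toy pair, five columns are compared and four are identical, so the score is 80% and the distance is about 0.2.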

 

FAST-APPROXIMATE alignment parameters: these similarity scores are calculated from fast, approximate, global alignments, which are controlled by 4 parameters. 2 techniques are used to make these alignments very fast: 1) only exactly matching fragments (k-tuples) are considered; 2) only the 'best' diagonals (the ones with most k-tuple matches) are used.

 

GAP PENALTY:  this is the penalty for each gap in the fast alignments. It has little effect on speed or sensitivity except at extreme values.

 

K-TUPLE SIZE:  this is the size of the exactly matching fragments that are used. Increase for speed (max= 2 for proteins; 4 for DNA); decrease for sensitivity. For longer sequences (e.g. >1000 residues) you may wish to increase the default.

 

TOP DIAGONALS: the number of k-tuple matches on each diagonal (in an imaginary dot-matrix plot) is calculated. Only the best ones (with most matches) are used in the alignment. This parameter specifies how many. Decrease for speed; increase for sensitivity.

 

WINDOW SIZE:  this is the number of diagonals around each of the 'best' diagonals that will be used. Decrease for speed; increase for sensitivity.
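The interplay of K-TUPLE SIZE, TOP DIAGONALS and WINDOW SIZE can be illustrated with a small Python sketch. It is not the program's implementation: the function names are hypothetical, and the way the window is applied (the same number of diagonals on each side of a best diagonal) is an assumption.

```python
from collections import Counter


def ktuple_diagonal_counts(seq_a: str, seq_b: str, k: int) -> Counter:
    """Count exact k-tuple matches on each diagonal (i - j) of an imaginary dot plot."""
    positions = {}
    for j in range(len(seq_b) - k + 1):
        positions.setdefault(seq_b[j:j + k], []).append(j)
    counts = Counter()
    for i in range(len(seq_a) - k + 1):
        for j in positions.get(seq_a[i:i + k], ()):
            counts[i - j] += 1
    return counts


def select_diagonals(counts: Counter, top_diagonals: int, window: int) -> set:
    """Keep the best diagonals plus `window` neighbours on each side (assumed rule)."""
    best = [diag for diag, _ in counts.most_common(top_diagonals)]
    selected = set()
    for diag in best:
        selected.update(range(diag - window, diag + window + 1))
    return selected


counts = ktuple_diagonal_counts("ACGTACGT", "TTACGT", k=2)
print(counts)
print(select_diagonals(counts, top_diagonals=1, window=1))
```

A larger k produces fewer candidate matches (faster, less sensitive), while larger TOP DIAGONALS and WINDOW SIZE values widen the region that a subsequent alignment step would have to examine.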

 

MULTIPLE ALIGNMENT PARAMETERS

        These parameters control the final multiple alignment. This is the core of the program and the details are complicated. To fully understand the use of the parameters and the scoring system, refer to the documentation.

        Each step in the final multiple alignment consists of aligning two alignments or sequences. This is done progressively, following the branching order in the GUIDE TREE. The basic parameters that control this are the gap penalties and the scores for various identical/non-identical residues.

 

The GAP OPENING and EXTENSION PENALTIES can be set here. These control the cost of opening each new gap and the cost of every item in a gap. Increasing the gap opening penalty makes gaps less frequent; increasing the gap extension penalty makes gaps shorter. Terminal gaps are not penalised.
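As an illustration of how the two penalties combine, here is a minimal Python sketch of an affine gap cost: one opening charge per gap plus one extension charge per gapped position. The function name and the numeric values are hypothetical and are not stated program defaults; the position-specific adjustments described under PROTEIN GAP PARAMETERS are ignored here.

```python
def gap_cost(length: int, gap_open: float = 10.0, gap_extension: float = 0.2,
             terminal: bool = False) -> float:
    """Affine cost of one gap of `length` positions.

    Assumptions: the values 10.0 and 0.2 are illustrative only, and
    terminal gaps cost nothing, as the text above states.
    """
    if length <= 0 or terminal:
        return 0.0
    return gap_open + gap_extension * length


print(gap_cost(1), gap_cost(5), gap_cost(5, terminal=True))
```

With these illustrative values, one gap of 5 positions is much cheaper than five separate gaps of 1 position, which is why raising the opening penalty reduces the number of gaps while raising the extension penalty shortens them.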

 

The DELAY DIVERGENT SEQUENCES switch delays the alignment of the most distantly related sequences until after the most closely related sequences have been aligned. The setting gives the percent identity level required to delay the addition of a sequence; sequences that are less identical than this level to the other sequences will be aligned later.
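One possible reading of this rule is sketched below in Python: a sequence is postponed when its best percent identity to every other sequence falls below the threshold. The function name, the input layout (a square matrix of percent identities) and this exact interpretation are assumptions for illustration only.

```python
def split_by_divergence(names, pid, threshold):
    """Split sequences into (aligned first, delayed) by best identity to any other sequence.

    names: list of sequence names.
    pid: square list of lists, pid[i][j] = percent identity between i and j.
    threshold: percent identity level below which a sequence is delayed (assumed rule).
    """
    early, delayed = [], []
    for i, name in enumerate(names):
        others = [pid[i][j] for j in range(len(names)) if j != i]
        best = max(others, default=100.0)  # a lone sequence is never delayed
        if best < threshold:
            delayed.append(name)
        else:
            early.append(name)
    return early, delayed


names = ["seqA", "seqB", "seqC"]
pid = [[100, 80, 20],
       [80, 100, 25],
       [20, 25, 100]]
print(split_by_divergence(names, pid, threshold=30))
```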

 

The TRANSITION WEIGHT gives transitions (A<-->G or C<-->T, i.e. purine-purine or pyrimidine-pyrimidine substitutions) a weight between 0 and 1; a weight of 0 means that transitions are scored as mismatches, while a weight of 1 gives transitions the match score. For distantly related DNA sequences the weight should be near 0; for closely related sequences it can be useful to assign a higher score. The default is 0.5.
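The effect of the weight can be sketched in Python as follows. The text only fixes the two end points (weight 0 gives the mismatch score, weight 1 gives the match score); the linear interpolation between them, the function name and the match/mismatch values of 1.0 and 0.0 are assumptions for illustration.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def dna_pair_score(x: str, y: str, match: float = 1.0, mismatch: float = 0.0,
                   transition_weight: float = 0.5) -> float:
    """Score one aligned pair of unambiguous bases (A, C, G, T)."""
    x, y = x.upper(), y.upper()
    if x == y:
        return match
    is_transition = ({x, y} <= PURINES) or ({x, y} <= PYRIMIDINES)
    if is_transition:
        # Assumed: linear interpolation between mismatch and match scores.
        return mismatch + transition_weight * (match - mismatch)
    return mismatch  # transversion


print(dna_pair_score("A", "G"), dna_pair_score("A", "C"), dna_pair_score("A", "A"))
```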

 

The PROTEIN WEIGHT MATRIX option allows you to choose a series of weight matrices. For protein alignments, a weight matrix must be used to determine the similarity of non-identical amino acids. For example, Tyr aligned with Phe is judged to be 'better' than Tyr aligned with Pro.

        Three 'in-built' series of weight matrices are offered. Each consists of several matrices that work differently at different evolutionary distances (To see the exact details, read the documentation). Put simply, several matrices are stored in memory, spanning the full range of amino acid distance (from almost identical sequences to highly divergent ones). For very similar sequences, it is best to use a strict weight matrix that gives a high score only to identities and the most favoured conservative substitutions. For more divergent sequences, it is appropriate to use "softer" matrices that also give a high score to many other frequent substitutions.

 

        1) BLOSUM (Henikoff). These matrices appear to be the best available for carrying out data base similarity (homology) searches. The matrices currently used are Blosum 80, 62, 45 and 30. BLOSUM was the default in earlier Clustal X versions.

        2) PAM (Dayhoff). These have been very widely used since the 1970s. The matrices used here are PAM 20, 60, 120 and 350.

        3) GONNET. These matrices were derived using almost the same procedure as the Dayhoff ones (above), but they are much more up to date and are based on a larger data set. They appear to be more sensitive than the Dayhoff series. The matrices used here are GONNET 80, 120, 160, 250 and 350. This series is the default in Clustal X version 1.8.

        An identity matrix is also supplied, which gives a score of 10 to two identical amino acids and a score of zero otherwise. This matrix is not very useful.

        Load protein matrix: allows a comparison matrix to be read in from a file. This can be either a single matrix or a series of matrices (see below for format).

 

 

The DNA WEIGHT MATRIX option allows you to select a single matrix (not a series) used for aligning nucleic acid sequences. Two hard-coded matrices are available:

        1) IUB. This is the default scoring matrix used by BESTFIT for the comparison of nucleic acid sequences. X's and N's are treated as matches to any IUB ambiguity symbol. All matches score 1.9; all mismatches for IUB symbols score 0.

        2) CLUSTALW(1.6). A previous system used by ClustalW, in which matches score 1.0 and mismatches score 0. All matches for IUB symbols also score 0.

        Load DNA matrix: allows a nucleic acid comparison matrix to be read in from a file (just one matrix, not a series).

 

 

SINGLE MATRIX INPUT FORMAT

        The format used for a single matrix is the same as that of the BLAST program. The scores in the new weight matrix should be similarities. Negative as well as positive values can be used; the matrix is automatically adjusted to all-positive scores unless the NEGATIVE MATRIX option is selected. Any line beginning with a # character is assumed to be a comment. The first non-comment line should contain a list of amino acids in any order, using the 1 letter code, followed by a * character. This should be followed by a square matrix of scores, with one row and one column for each amino acid. The last row and column of the matrix (corresponding to the * character) contain the minimum score over the whole matrix.
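A minimal Python sketch of a reader for this format follows. It is an illustration only, not the program's parser: the function name is hypothetical, and it assumes that each score row may optionally begin with its own residue letter (as BLAST matrix files do) and that rows appear in the same order as the header symbols.

```python
def parse_blast_matrix(text: str):
    """Parse a BLAST-style square score matrix into (symbols, scores).

    scores maps (row_symbol, column_symbol) -> float.
    """
    symbols = None
    rows = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if symbols is None:
            symbols = fields  # first non-comment line: residue list ending in *
            continue
        if len(fields) == len(symbols) + 1:
            fields = fields[1:]  # drop the optional leading row label
        if len(fields) != len(symbols):
            raise ValueError("row length does not match header: " + line)
        rows.append([float(value) for value in fields])
    if symbols is None or len(rows) != len(symbols):
        raise ValueError("matrix is missing or not square")
    scores = {(a, b): rows[i][j]
              for i, a in enumerate(symbols)
              for j, b in enumerate(symbols)}
    return symbols, scores


toy = """# toy two-residue matrix, for illustration only
   A  C  *
A  4 -1 -1
C -1  9 -1
* -1 -1 -1
"""
symbols, scores = parse_blast_matrix(toy)
print(symbols, scores[("A", "C")])
```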

 

MATRIX SERIES INPUT FORMAT

        ClustalX uses different matrices depending on the mean percent identity of the sequences to be aligned. You can specify a series of matrices, and the range of percent identity for each matrix, in a matrix series file. The file is automatically recognised by the word CLUSTAL_SERIES at the beginning of the file. Each matrix in the series is then specified on one line, which starts with the word MATRIX. This is followed by the lower and upper limits of the sequence percent identities for which the matrix should be applied. The final entry on the matrix line is the filename of a Blast format matrix file (see above for details of the single matrix file format).

 

Example.

 

CLUSTAL_SERIES

 

MATRIX 81 100 /us1/user/julie/matrices/blosum80

MATRIX 61 80 /us1/user/julie/matrices/blosum62

MATRIX 31 60 /us1/user/julie/matrices/blosum45

MATRIX 0 30 /us1/user/julie/matrices/blosum30
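The example above can be read and queried with a short Python sketch. The function names are hypothetical, and the lookup rule (inclusive lower and upper limits, first matching line wins) is an assumption based on the description of the format.

```python
def parse_matrix_series(text: str):
    """Parse a CLUSTAL_SERIES file into a list of (low, high, filename) tuples."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or lines[0] != "CLUSTAL_SERIES":
        raise ValueError("file does not start with CLUSTAL_SERIES")
    series = []
    for line in lines[1:]:
        fields = line.split(None, 3)  # keyword, low, high, filename
        if len(fields) != 4 or fields[0] != "MATRIX":
            raise ValueError("unexpected line: " + line)
        series.append((int(fields[1]), int(fields[2]), fields[3]))
    return series


def matrix_for_identity(series, percent_identity):
    """Return the matrix filename whose range contains percent_identity, or None."""
    for low, high, filename in series:
        if low <= percent_identity <= high:
            return filename
    return None


example = """CLUSTAL_SERIES

MATRIX 81 100 /us1/user/julie/matrices/blosum80
MATRIX 61 80 /us1/user/julie/matrices/blosum62
MATRIX 31 60 /us1/user/julie/matrices/blosum45
MATRIX 0 30 /us1/user/julie/matrices/blosum30
"""
series = parse_matrix_series(example)
print(matrix_for_identity(series, 70))
```

Because the limits in the example are whole numbers, a non-integer identity such as 60.5 falls between two ranges in this sketch and yields None; how the program itself treats such values is not stated here.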

 

 

PROTEIN GAP PARAMETERS

 

RESIDUE SPECIFIC PENALTIES are amino acid specific gap penalties that reduce or increase the gap opening penalties at each position in the alignment or sequence. See the documentation for details. For example, positions that are rich in glycine are more likely to have an adjacent gap than positions that are rich in valine.

 

HYDROPHILIC GAP PENALTIES are used to increase the chances of a gap within a run (5 or more residues) of hydrophilic amino acids; these are likely to be loop or random coil regions, where gaps are more common. The residues that are considered to be hydrophilic are entered in HYDROPHILIC RESIDUES.
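Locating such runs can be sketched in Python as below. The function name is hypothetical, and the residue set "GPSNDQEKR" is used here only as an assumed example value for HYDROPHILIC RESIDUES; how the program then lowers the gap penalty inside a run is not shown.

```python
def hydrophilic_runs(sequence: str, hydrophilic: str = "GPSNDQEKR",
                     min_length: int = 5):
    """Return (start, end) index pairs (end exclusive) of hydrophilic runs.

    Assumption: the default residue set is an example value, not a
    confirmed program default.
    """
    runs = []
    start = None
    for i, residue in enumerate(sequence.upper()):
        if residue in hydrophilic:
            if start is None:
                start = i  # a new run begins here
        else:
            if start is not None and i - start >= min_length:
                runs.append((start, i))
            start = None
    if start is not None and len(sequence) - start >= min_length:
        runs.append((start, len(sequence)))  # run reaching the end of the sequence
    return runs


print(hydrophilic_runs("MVLGSNDKEAAW"))
```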

 

GAP SEPARATION DISTANCE tries to decrease the chances of gaps being too close to each other. Gaps that are less than this distance apart are penalised more than other gaps. This does not prevent close gaps; it only makes them less frequent, promoting a block-like appearance of the alignment.

 

END GAP SEPARATION treats end gaps just like internal gaps for the purpose of avoiding gaps that are too close (set by GAP SEPARATION DISTANCE above). If this is switched off, end gaps are ignored for this purpose. This is useful when aligning fragments, where the end gaps are not biologically meaningful.