Appendix: Main English Terms in Bioinformatics and Their Definitions, 《生物信息学札记》 (Bioinformatics Notes)

樊龙江 (Fan Longjiang)

Abstract Syntax Notation (ASN.1) (the internal format used by many NCBI programs, such as Cn3D for displaying protein three-dimensional structures)
A language that is used to describe structured data types formally. Within bioinformatics, it has been used by the National Center for Biotechnology Information to encode sequences, maps, taxonomic information, molecular structures, and bibliographic information in such a way that it can be easily accessed and exchanged by computer software.

Accession number (记录号)
A unique identifier that is assigned to a single database entry for a DNA or protein sequence.

Affine gap penalty (a strategy for setting gap penalties)
A gap penalty score that is a linear function of gap length, consisting of a gap opening penalty and a gap extension penalty multiplied by the length of the gap. Using this penalty scheme greatly enhances the performance of dynamic programming methods for sequence alignment. See also Gap penalty.

Algorithm (算法)
A systematic procedure for solving a problem in a finite number of steps, typically involving a repetition of operations. Once specified, an algorithm can be written in a computer language and run as a program.

Alignment (联配/比对)
Refers to the procedure of comparing two or more sequences by looking for a series of individual characters or character patterns that are in the same order in the sequences. Of the two types of alignment, local and global, a local alignment is generally the most useful. See also Local alignment and Global alignment.

Alignment score (联配/比对值)
A computed score based on the number of matches, substitutions, insertions, and deletions (gaps) within an alignment. Scores for matches and substitutions are derived from a scoring matrix such as the BLOSUM and PAM matrices for proteins, and affine gap penalties suitable for the matrix are chosen. Alignment scores are in log odds units, often bit units (log to the base 2). Higher scores denote better alignments. See also Similarity score, Distance in sequence analysis.

Alphabet (字母表)
The total number of symbols in a sequence: 4 for DNA sequences and 20 for protein sequences.

Annotation (注释)
The prediction of genes in a genome, including the location of protein-encoding genes, the sequence of the encoded proteins, any significant matches to other proteins of known function, and the location of RNA-encoding genes. Predictions are based on gene models; e.g., hidden Markov models of introns and exons in protein-encoding genes, and models of secondary structure in RNA.

Anonymous FTP (匿名FTP)
When an FTP service allows anyone to log in, it is said to provide anonymous FTP service. A user can log in to an anonymous FTP server by typing "anonymous" as the user name and his or her e-mail address as a password. Most Web browsers now negotiate anonymous FTP logon without asking the user for a user name and password. See also FTP.

ASCII
The American Standard Code for Information Interchange (ASCII) encodes unaccented letters a-z, A-Z, the numbers 0-9, most punctuation marks, space, and a set of control characters such as carriage return and tab. ASCII specifies 128 characters that are mapped to the values 0-127. ASCII files are commonly called plain text, meaning that they only encode text without extra markup.

BAC clone (细菌人工染色体克隆)
Bacterial artificial chromosome vector carrying a genomic DNA insert, typically 100-200 kb. Most of the large-insert clones sequenced in the project were BAC clones.

Back-propagation (反向传输)
When training feed-forward neural networks, a back-propagation algorithm can be used to modify the network weights. After each training input pattern is fed through the network, the network's output is compared with the desired output and the amount of error is calculated. This error is back-propagated through the network by using an error function to correct the network weights. See also Feed-forward neural network.

Baum-Welch algorithm (Baum-Welch算法)
An expectation maximization algorithm that is used to train hidden Markov models.

Bayes' rule (贝叶斯法则)
Forms the basis of conditional probability by calculating the likelihood of an event occurring based on the history of the event and relevant background information. In terms of two parameters A and B, the theorem is stated in an equation: the conditional probability of A, given B, P(A|B), is equal to the probability of A, P(A), times the conditional probability of B, given A, P(B|A), divided by the probability of B, P(B). P(A) is the historical or prior distribution value of A, P(B|A) is a new prediction for B for a particular value of A, and P(B) is the sum of the newly predicted values for B. P(A|B) is a posterior probability, representing a new prediction for A given the prior knowledge of A and the newly discovered relationships between A and B.

Bayesian analysis (贝叶斯分析)
A statistical procedure used to estimate parameters of an underlying distribution based on an observed distribution. See also Bayes' rule.

Biochips (生物芯片)
Miniaturized arrays of large numbers of molecular substrates, often oligonucleotides, in a defined pattern. They are also called DNA microarrays and microchips.

Bioinformatics (生物信息学)
The merger of biotechnology and information technology with the goal of revealing new insights and principles in biology. / The discipline of obtaining information about genomic or protein sequence data. This may involve similarity searches of databases, comparing your unidentified sequence to the sequences in a database, or making predictions about the sequence based on current knowledge of similar sequences. Databases are frequently made publicly available through the Internet, or locally at your institution.

Bit score (二进制值/Bit值)
The value S' is derived from the raw alignment score S in which the statistical properties of the scoring system used have been taken into account. Because bit scores have been normalized with respect to the scoring system, they can be used to compare alignment scores from different searches.

Bit units
From information theory, a bit denotes the amount of information required to distinguish between two equally likely possibilities. The number of bits of information, N, required to convey a message that has M possibilities is log2 M = N bits.

BLAST (基本局部联配搜索工具; a major database search program)
Basic Local Alignment Search Tool. A set of programs used to perform fast similarity searches. Nucleotide sequences can be compared with nucleotide sequences in a database using BLASTN, for example. Complex statistics are applied to judge the significance of each match. Reported sequences may be homologous to, or related to, the query sequence. The BLASTP program is used to search a protein database for a match against a query protein sequence. There are several other flavours of BLAST. BLAST2 is a newer release of BLAST that allows for insertions or deletions in the sequences being aligned. Gapped alignments may be more biologically significant.

Block (a block of conserved regions in a protein family)
Conserved ungapped patterns approximately 3-60 amino acids in length in a set of related proteins.

BLOSUM matrices (模块替换矩阵; a major family of substitution matrices)
An alternative to
PAM tables, BLOSUM tables were derived using local multiple alignments of more distantly related sequences than were used for the PAM matrix. These are used to assess the similarity of sequences when performing alignments.

Boltzmann distribution (Boltzmann分布)
Describes the number of molecules that have energies above a certain level, based on the Boltzmann gas constant and the absolute temperature.

Boltzmann probability function (Boltzmann概率函数)
See Boltzmann distribution.

Bootstrap analysis
A method for testing how well a particular data set fits a model. For example, the validity of the branch arrangement in a predicted phylogenetic tree can be tested by resampling columns in a multiple sequence alignment to create many new alignments. The appearance of a particular branch in trees generated from these resampled sequences can then be measured. Alternatively, a sequence may be left out of an analysis to determine how much the sequence influences the results of an analysis.

Branch length (分支长度)
In sequence analysis, the number of sequence changes along a particular branch of a phylogenetic tree.

CDS or cds (编码序列)
Coding sequence.

Chebyshev's inequality
The probability that a random variable deviates from its mean by at least k standard deviations is less than or equal to 1/k^2, the square of 1 over the number of standard deviations from the mean.

Clone (克隆)
Population of identical cells or molecules (e.g., DNA), derived from a single ancestor.

Cloning vector (克隆载体)
A molecule that carries a foreign gene into a host, and allows/facilitates the multiplication of that gene in a host. When sequencing a gene that has been cloned using a cloning vector (rather than by PCR), care should be taken not to include the cloning vector sequence when performing similarity searches. Plasmids, cosmids, phagemids, YACs and PACs are example types of cloning vectors.

Cluster analysis (聚类分析)
A method for grouping together a set of objects that are most similar from a larger group of related objects. The relationships are based on some criterion of similarity or difference. For sequences, a similarity or distance score or a statistical evaluation of those scores is used.

Cobbler
A single sequence that represents the most conserved regions in a multiple sequence alignment. The BLOCKS server uses the cobbler sequence to perform a database similarity search as a way to reach sequences that are more divergent than would be found using the single sequences in the alignment for searches.

Coding system (neural networks)
Regarding neural networks, a coding system needs to be designed for representing input and output. The level of success found when training the model will be partially dependent on the quality of the coding system chosen.

Codon usage
Analysis of the codons used in a particular gene or organism.

COG (直系同源簇)
Clusters of orthologous groups in a set of groups of related sequences in microorganisms and yeast (S. cerevisiae). These groups are found by whole proteome comparisons and include orthologs and paralogs. See also Orthologs and Paralogs.

Comparative genomics (比较基因组学)
A comparison of gene numbers, gene locations, and biological functions of genes in the genomes of diverse organisms, one objective being to identify groups of genes that play a unique biological role in a particular organism.

Complexity (of an algorithm) (算法的复杂性)
Describes the number of steps required by the algorithm to solve a problem as a function of the amount of data; for example, the length of sequences to be aligned.

Conditional probability (条件概率)
The probability of a particular result (or of a particular value of a variable) given one or more events or conditions (or values of other variables).

Conservation (保守)
Changes at a specific position of an amino acid or (less commonly, DNA) sequence that preserve the physico-chemical properties of the original residue.

Consensus (一致序列)
A single sequence that represents, at each subsequent position, the variation found within corresponding columns of a multiple sequence alignment.

Context-free grammars
A recursive set of production rules for generating patterns of strings. These consist of a set of terminal characters that are used to create strings, a set of nonterminal symbols that correspond to rules and act as placeholders for patterns that can be generated using terminal characters, a set of rules for replacing nonterminal symbols with terminal characters, and a start symbol.

Contig (序列重叠群/拼接序列)
A set of clones that can be assembled into a linear order. A DNA sequence that overlaps with another contig. The full set of overlapping sequences (contigs) can be put together to obtain the sequence for a long region of DNA that cannot be sequenced in one run in a sequencing assay. Important in genetic mapping at the molecular level.

CORBA (a set of common standards, drawn up by the Object Management Group, that unify OOP objects with network interfaces across computers, operating systems, programming languages, and networks)
The Common Object Request Broker Architecture (CORBA) is an open industry standard for working with distributed objects, developed by the Object Management Group. CORBA allows the interconnection of objects and applications regardless of computer language, machine architecture, or geographic location of the computers.

Correlation coefficient (相关系数)
A numerical measure, falling between -1 and 1, of the degree of the linear relationship between two variables. A positive value indicates a direct relationship, a negative value indicates an inverse relationship, and the distance of the value away from zero indicates the strength of the relationship. A value near zero indicates no linear relationship between the variables.

Covariation (in sequences) (共变)
Coincident change at two or more sequence positions in related sequences that may influence the secondary structures of RNA or protein molecules.

Coverage (or depth) (覆盖率/厚度)
The average number of times a nucleotide is represented by a high-quality base in a collection of random raw sequence. Operationally, a 'high-quality base' is defined as one with an accuracy of at least 99% (corresponding to a PHRED score of at least 20).

Database (数据库)
A computerized storehouse of data that provides a standardized way for locating, adding, removing, and changing data. See also Object-oriented database, Relational database.

Dendrogram
A form of a tree that lists the compared objects (e.g., sequences or genes in a microarray analysis) in a vertical order and joins related ones by levels of branches extending to one side of the list.

Depth (厚度)
See Coverage.

Dirichlet mixtures
Defined as the conjugate prior of a multinomial distribution. One use is for predicting the expected pattern of amino acid variation found in the match state of a hidden Markov model (representing one column of a multiple sequence alignment of proteins), based on prior distributions found in conserved protein domains (blocks).

Distance in sequence analysis (序列距离)
The number of observed changes in an optimal alignment of two sequences, usually not counting gaps.

DNA sequencing (DNA测序)
The experimental process of determining the nucleotide sequence of a region of DNA. This is done by labelling each nucleotide (
A, C, G or T) with either a radioactive or fluorescent marker which identifies it. There are several methods of applying this technology, each with its own advantages and disadvantages. For more information, refer to a current textbook. High-throughput laboratories frequently use automated sequencers, which are capable of rapidly reading large numbers of templates. Sometimes, the sequences may be generated more quickly than they can be characterised.

Domain (功能域)
A discrete portion of a protein assumed to fold independently of the rest of the protein and possessing its own function.

Dot matrix (点标矩阵图)
Dot matrix diagrams provide a graphical method for comparing two sequences. One sequence is written horizontally across the top of the graph and the other along the left-hand side. Dots are placed within the graph at the intersection of the same letter appearing in both sequences. A series of diagonal lines in the graph indicates regions of alignment. The matrix may be filtered to reveal the most-alike regions by scoring a minimal threshold number of matches within a sequence window.

Draft genome sequence (基因组序列草图)
The sequence produced by combining the information from the individual sequenced clones (by creating merged sequence contigs and then employing linking information to create scaffolds) and positioning the sequence along the physical map of the chromosomes.

DUST (a low-complexity region filtering program)
A program for filtering low-complexity regions from nucleic acid sequences.

Dynamic programming (动态规划法)
A dynamic programming algorithm solves a problem by combining solutions to sub-problems that are computed once and saved in a table or matrix. Dynamic programming is typically used when a problem has many possible solutions and an optimal one needs to be found. This algorithm is used for producing sequence alignments, given a scoring system for sequence comparisons.

EMBL (欧洲分子生物学实验室; the EMBL database is one of the major public nucleotide sequence databases)
European Molecular Biology Laboratories. Maintain the EMBL database, one of the major public sequence databases.

EMBnet (欧洲分子生物学网络)
European Molecular Biology Network. It was established in 1988, and provides services including local molecular databases and software for molecular biologists in Europe. There are several large outposts of EMBnet, including ExPASy.

Entropy (熵)
From information theory, a measure of the unpredictable nature of a set of possible elements. The higher the level of variation within the set, the higher the entropy.

Erdos and Renyi law
In a toss of a "fair" coin, the number of heads in a row that can be expected is the logarithm of the number of tosses to the base 2. The law may be generalized for more than two possible outcomes by changing the base of the logarithm to the number of outcomes. This law was used to analyze the number of matches and mismatches that can be expected between random sequences as a basis for scoring the statistical significance of a sequence alignment.

EST (abbreviation of expressed sequence tag)
See Expressed Sequence Tag.

Expect value (E) (E值)
E value. The number of different alignments with scores equivalent to or better than S that are expected to occur in a database search by chance. The lower the E value, the more significant the score. In a database similarity search, the probability that an alignment score as good as the one found between a query sequence and a database sequence would be found in as many comparisons between random sequences as was done to find the matching sequence. In other types of sequence analysis, E has a similar meaning.

Expectation maximization (sequence analysis)
An algorithm for locating similar sequence patterns in a set of sequences. A guessed alignment of the sequences is first used to generate an expected scoring matrix representing the distribution of sequence characters in each column of the alignment, this pattern is matched to each sequence, and the scoring matrix values are then updated to maximize the alignment of the matrix to the sequences. The procedure is repeated until there is no further improvement.

Exon (外显子)
Coding region of DNA. See CDS.

Expressed Sequence Tag (EST) (表达序列标签)
Randomly selected, partial cDNA sequence; represents its corresponding mRNA. dbEST is a large database of ESTs at GenBank, NCBI.

FASTA (a major database search program)
The first widely used algorithm for database similarity searching. The program looks for optimal local alignments by scanning the sequence for small matches called "words". Initially, the scores of segments in which there are multiple word hits are calculated ("init1"). Later the scores of several segments may be summed to generate an "initn" score. An optimized alignment that includes gaps is shown in the output as "opt". The sensitivity and speed of the search are inversely related and controlled by the "k-tup" variable, which specifies the size of a "word". (Pearson and Lipman)

Extreme value distribution (极值分布)
Some measurements are found to follow a distribution that has a long tail which decays at high values much more slowly than that found in a normal distribution. This slow-falling type is called the extreme value distribution. The alignment scores between unrelated or random sequences are an example. These scores can reach very high values, particularly when a large number of comparisons are made, as in a database similarity search. The probability of a particular score may be accurately predicted by the extreme value distribution, which follows a double negative exponential function after Gumbel.

False negative (假阴性)
A negative data point collected in a data set that was incorrectly reported due to a failure of the test. If the test had correctly measured the data point, the data would have been recorded as positive.
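The slow-decaying tail described under Extreme value distribution can be illustrated with a small sketch. It assumes the usual Gumbel form P(S >= x) = 1 - exp(-exp(-lam * (x - u))); the location `u` and scale `lam` below are hypothetical values picked for illustration, not parameters fitted to any real scoring system. A normal tail with the same mean and standard deviation is computed alongside for comparison.

```python
import math

def gumbel_tail(x, u, lam):
    """P(S >= x) for an extreme value (Gumbel) distribution with location u and scale lam."""
    return 1.0 - math.exp(-math.exp(-lam * (x - u)))

def normal_tail(x, mean, sd):
    """P(S >= x) for a normal distribution, via the complementary error function."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2.0)))

# Hypothetical parameters, chosen only for illustration.
U, LAM = 25.0, 0.27
# Mean and standard deviation of a Gumbel distribution with these parameters.
GUMBEL_MEAN = U + 0.5772156649 / LAM
GUMBEL_SD = math.pi / (LAM * math.sqrt(6.0))

# Compare the two tails at a few increasingly high scores.
for score in (40, 50, 60):
    print(score, gumbel_tail(score, U, LAM), normal_tail(score, GUMBEL_MEAN, GUMBEL_SD))
```

At high scores the extreme value tail stays orders of magnitude above the matched normal tail, which is why a normal approximation would overstate the significance of a high alignment score.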
False positive (假阳性)
A positive data point collected in a data set that was incorrectly reported due to a failure of the test. If the test had correctly measured the data point, the data would have been recorded as negative.

Feed-forward neural network (前馈神经网络)
Organizes nodes into sequential layers in which the nodes in each layer are fully connected with the nodes in the next layer, except for the final output layer. Input is fed from the input layer through the layers in sequence in a "feed-forward" direction, resulting in output at the final layer. See also Neural network.

Filtering (window size)
During pair-wise sequence alignment using the dot matrix method, random matches can be filtered out by using a sliding window to compare the two sequences. Rather than comparing a single sequence position at a time, a window of adjacent positions in the two sequences is compared and a dot, indicating a match, is generated only if a certain minimal number of matches occur.

Filtering (过滤)
Also known as Masking. The process of hiding regions of (nucleic acid or amino acid) sequence having characteristics that frequently lead to spurious high scores. See SEG and DUST.

Finished sequence (完成序列)
Complete sequence of a clone or genome, with an accuracy of at least 99.99% and no gaps.

Fourier analysis
Studies the approximations and decomposition of functions using trigonometric polynomials.

Format (file) (格式)
Different programs require that information be specified to them in a formal manner, using particular keywords and ordering. This specification is a file format.

Forward-backward algorithm
Used to train a hidden Markov model by aligning the model with training sequences. The algorithm then refines the model to reduce the error when fitted to the given data using a gradient descent approach.

FTP (File Transfer Protocol) (文件传输协议)
Allows a person to transfer files from one computer to another across a network using an FTP-capable client program. The FTP client program can communicate with machines that run an FTP server. The server, in turn, will make a specific portion of its file system available for FTP access, providing that the client is able to supply a recognized user name and password to the server.

Full shotgun clone (鸟枪法克隆)
A large-insert clone for which full shotgun sequence has been produced.
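The sliding-window idea in the Filtering (window size) entry above can be sketched in a few lines. This is a minimal illustration rather than any particular program's implementation: the function names, the toy sequences, and the choice to mark a dot at the start of each qualifying window are all assumptions made for the example.

```python
def dot_matrix(seq_a, seq_b, window=1, min_matches=1):
    """Return the set of (i, j) positions where a dot is drawn.

    A dot is placed at (i, j) when the window of `window` positions starting
    at seq_a[i] and seq_b[j] contains at least `min_matches` identical letters.
    With window=1 and min_matches=1 this is the unfiltered dot matrix.
    """
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            if matches >= min_matches:
                dots.add((i, j))
    return dots

def render(seq_a, seq_b, dots):
    """Draw seq_a across the top and seq_b down the left-hand side."""
    lines = ["  " + seq_a]
    for j, letter in enumerate(seq_b):
        row = "".join("*" if (i, j) in dots else "." for i in range(len(seq_a)))
        lines.append(letter + " " + row)
    return "\n".join(lines)

# Hypothetical toy sequences, for illustration only.
A = "GATTACAGATT"
B = "TTACAGA"
raw = dot_matrix(A, B)                                # every single-letter match
filtered = dot_matrix(A, B, window=4, min_matches=4)  # only 4-of-4 windows
print(render(A, B, raw))
print()
print(render(A, B, filtered))
```

With the window filter applied, the scattered single-letter matches drop out and only the diagonal run shared by the two sequences remains.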
Functional genomics (功能基因组学)
Assessment of the function of genes identified by between-genome comparisons. The function of a newly identified gene is tested by introducing mutations into the gene and then examining the resultant mutant organism for an altered phenotype.

Gap (空位/间隙/缺口)
A space introduced into an alignment to compensate for insertions and deletions in one sequence relative to another. To prevent the accumulation of too many gaps in an alignment, introduction of a gap causes the deduction of a fixed amount (the gap score) from the alignment score. Extension of the gap to encompass additional nucleotides or amino acids is also penalized in the scoring of an alignment.

Gap penalty (空位罚分)
A numeric score used in sequence alignment programs to penalize the presence of gaps within an alignment. The value of a gap penalty affects how often gaps appear in alignments produced by the algorithm. Most alignment programs suggest gap penalties that are appropriate for particular scoring matrices.

Genetic algorithm (遗传算法)
A kind of search algorithm that was inspired by the principles of evolution. A population of initial solutions is encoded and the algorithm searches through these by applying a pre-defined fitness measurement to each solution, selecting those with the highest fitness for reproduction. New solutions can be generated during this phase by crossover and mutation operations, defined in the encoded solutions.

Genetic map (遗传图谱)
A genome map in which polymorphic loci are positioned relative to one another on the basis of the frequency with which they recombine during meiosis. The unit of distance is centimorgans (cM), denoting a 1% chance of recombination.

Genome (基因组)
The genetic material of an organism, contained in one haploid set of chromosomes.

Gibbs sampling method
An algorithm for finding conserved patterns within a set of related sequences. A guessed alignment of all but one sequence is made and used to generate a scoring matrix that represents the alignment. The matrix is then matched to the left-out sequence, and a probable location of the corresponding pattern is found. This prediction is then input into a new alignment and another scoring matrix is produced and tested on a new left-out sequence. The process is repeated until there is no further improvement in the matrix.

Global alignment (整体联配)
Attempts to match as many characters as possible, from end to end, in a set of two or
more sequences.

Gopher (a document publishing system that allows text files to be retrieved and displayed)

Graph theory (图论)
A branch of mathematics which deals with problems that involve a graph or network structure. A graph is defined by a set of nodes (or points) and a set of arcs (lines or edges) joining the nodes. In sequence and genome analysis, graph theory is used for sequence alignments and clustering alike genes.

GSS (基因综述序列)
Genome survey sequence.

GUI (图形用户界面)
Graphical user interface.

H (相对熵值)
H is the relative entropy of the target and background residue frequencies (Karlin and Altschul, 1990). H can be thought of as a measure of the average information (in bits) available per position that distinguishes an alignment from chance. At high values of H, short alignments can be distinguished from chance, whereas at lower H values, a longer alignment may be necessary (Altschul, 1991).

Half-bits
Some scoring matrices are in half-bit units. These units are logarithms to the base 2 of odds scores times 2.

Heuristic (启发式方法)
A procedure that progresses along empirical lines by using rules of thumb to reach a solution. The solution is not guaranteed to be optimal.

Hexadecimal system (16进制系统)
The base 16 counting system that uses the digits 0-9 followed by the letters A-F.

HGMP (人类基因组图谱计划)
Human Genome Mapping Project.

Hidden Markov Model (HMM) (隐马尔可夫模型)
In sequence analysis, an HMM is usually a probabilistic model of a multiple sequence alignment, but can also be a model of periodic patterns in a single sequence, representing, for example, patterns found in the exons of a gene. In a model of multiple sequence alignments, each column of symbols in the alignment is represented by a frequency distribution of the symbols called a state, and insertions and deletions by other states. One then moves through the model along a particular path from state to state trying to match a given sequence. The next matching symbol is chosen from each state, recording its probability (frequency) and also the probability of going to that particular state from a previous one (the transition probability). State and transition probabilities are then multiplied to obtain a probability of the given sequence. Generally speaking, an HMM is a statistical model for an ordered sequence of symbols, acting as a stochastic state machine that generates a symbol each time a transition is made from one state to the next. Transitions between states are specified by transition probabilities.

Hidden layer (隐藏层)
An inner layer within a neural network that receives its input and sends its output to other layers within the network. One function of the hidden layer is to detect covariation within the input data, such as patterns of amino acid covariation that are associated with a particular type of secondary structure in proteins.

Hierarchical clustering (分级聚类)
The clustering or grouping of objects based on some single criterion of similarity or difference. An example is the clustering of genes in a microarray experiment based on the correlation between their expression patterns. The distance method used in phylogenetic analysis is another example.

Hill climbing
A nonoptimal search algorithm that selects the singular best possible solution at a given state or step. The solution may result in a locally best solution that is not a globally best solution.

Homology (同源性)
A similar component in two organisms (e.g., genes with strongly similar sequences) that can be attributed to a common ancestor of the two organisms during evolution.

Horizontal transfer (水平转移)
The transfer of genetic material between two distinct species that do not ordinarily exchange genetic material. The transferred DNA becomes established in the recipient genome and can be detected by a novel phylogenetic history and codon content compared to the rest of the genome.

HSP (高比值片段对)
High-scoring segment pair. Local alignments with no gaps that achieve one of the top alignment scores in a given search.

HTGS/HTG (高通量基因组序列)
High-throughput genome sequences.

HTML (超文本标识语言)
The Hyper-Text Markup Language (HTML) provides a structural description of a document using a specified tag set. HTML currently serves as the lingua franca for describing hypertext Web page documents.

Hyperplane
A generalization of the two-dimensional plane to N dimensions.

Hypercube
A generalization of the three-dimensional cube to N dimensions.

Identity (相同性/相同率)
The extent to which two (nucleotide or amino acid) sequences are invariant.

Indel (abbreviation for insertion or deletion)
An insertion or deletion in a sequence alignment.

Information content (of a scoring matrix)
A representation of the degree of sequence conservation in a column of a scoring matrix representing an alignment of related sequences. It is also the number of questions that must be asked to match the column to a position in a test sequence. For bases, the maximum possible number is 2, and for proteins, 4.32 (logarithm to the base 2 of the number of possible sequence characters).

Information theory (信息理论)
A branch of mathematics that measures information in terms of bits, the minimal amount of complexity needed to encode a given piece of information.

Input layer (输入层)
The initial layer in a feed-forward neural network. This layer encodes input information that will be fed through the neural network model.

Interface definition language
Used to define an interface to an object model in a programming-language-neutral form, where an interface is an abstraction of a service defined only by the operations that can be performed on it.

Internet (因特网)
Network infrastructure, consisting of cables interconnected by routers, that provides global connectivity for computers and networks of computers. A second sense of the word is the computer resources available over this network.

Interpolated Markov model
A type of Markov model of sequences that examines sequences for patterns of variable length in order to discriminate best between genes and non-gene sequences.

Intranet (内部网)

Intron (内含子)
Non-coding region of DNA.

Iterative (反复的/迭代的)
A sequence of operations in a procedure that is performed repeatedly.

Java (a programming language developed by Sun Microsystems)
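The maxima quoted under Information content (2 bits for bases, 4.32 bits for proteins) can be checked with a short sketch. It assumes one common way of computing the quantity, namely log2 of the alphabet size minus the Shannon entropy of the observed column frequencies, with a uniform background and no small-sample correction; the function name and the example columns are hypothetical.

```python
import math
from collections import Counter

def column_information(column, alphabet_size):
    """Information content (bits) of one alignment column.

    Computed as log2(alphabet_size) minus the Shannon entropy of the
    observed character frequencies. This assumes a uniform background
    and applies no small-sample correction.
    """
    counts = Counter(column)
    total = sum(counts.values())
    entropy = -sum(
        (n / total) * math.log2(n / total) for n in counts.values()
    )
    return math.log2(alphabet_size) - entropy

# Hypothetical alignment columns, for illustration only.
print(column_information("AAAAAAAA", 4))    # fully conserved DNA column
print(column_information("ACGTACGT", 4))    # all four bases equally frequent
print(column_information("LLLLLLLL", 20))   # fully conserved protein column
```

A fully conserved column reaches the maximum for its alphabet, while a column in which every character is equally frequent carries no information.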
K (a statistical parameter of the BLAST program)
A statistical parameter used in calculating BLAST scores that can be thought of as a natural scale for search space size. The value K is used in converting a raw score (S) to a bit score (S').

K-tuple (字/字长)
Identical short stretches of sequences, also called words.

lambda (λ; a statistical parameter of the BLAST program)
A statistical parameter used in calculating BLAST scores that can be thought of as a natural scale for the scoring system. The value lambda is used in converting a raw score (S) to a bit score (S').

LAN (局域网)
Local area network.

Likelihood (似然性)
The hypothetical probability that an event which has already occurred would yield a specific outcome. Unlike probability, which refers to future events, likelihood refers to past events.

Linear discriminant analysis
An analysis in which a straight line is located on a graph between two sets of data points in a location that best separates the data points into two groups.

Local alignment (局部联配)
Attempts to align regions of sequences with the highest density of matches. In doing so, one or more islands of subalignments are created in the aligned sequences.

Log odds score (概率对数值)
The logarithm of an odds score. See also Odds score.

Low Complexity Region (LCR) (低复杂性区段)
Regions of biased composition including homopolymeric runs, short-period repeats, and more subtle overrepresentation of one or a few residues. The SEG program is used to mask or filter LCRs in amino acid queries. The DUST program is used to mask or filter LCRs in nucleic acid queries.

Machine learning (机器学习)
The training of a computational model of a process or classification scheme to distinguish between alternative possibilities.

Markov chain (马尔可夫链)
Describes a process that can be in one of a number of states at any given time. The Markov chain is defined by probabilities for each transition occurring; that is, probabilities of the occurrence of state s_j given that the current state is s_i. Substitutions in nucleic acid and protein sequences are generally assumed to follow a Markov chain in that each site changes independently of the previous history of the site. With this model, the number and types of substitutions observed over a relatively short period of evolutionary time can be extrapolated to longer periods of time. In performing sequence alignments and calculating the statistical significance of alignment scores, sequences are assumed to be Markov chains in which the choice of one sequence position is not influenced by another.

Masking (过滤)
Also known as Filtering. The removal of repeated or low-complexity regions from a sequence in order to improve the sensitivity of sequence similarity searches performed with that sequence.

Maximum likelihood (phylogeny, alignment) (最大似然法)
The most likely outcome (tree or alignment), given a probabilistic model of evolutionary change in DNA sequences.

Maximum parsimony (最大简约法)
The minimum number of evolutionary steps required to generate the observed variation in a set of sequences, as found by comparison of the number of steps in all possible phylogenetic trees.

Method of moments
The mean or expected value of a variable is the first moment of the values of the variable, defined as that number from which the sum of deviations to all values is zero. The variance (the square of the standard deviation) is the second moment of the values about the mean, and so on.

Minimum spanning tree
Given a set of related objects classified by some similarity or difference score, the minimum spanning tree joins the most-alike objects on adjacent outer branches of a tree and then sequentially joins less-alike objects by more inward branches. The tree branch lengths are calculated by the same neighbor-joining algorithm that is used to build phylogenetic trees of sequences from a distance matrix. The sum of the resulting branch lengths between each pair of objects will be approximately that found by the classification scheme.

MMDB (分子建模数据库)
Molecular Modelling Database. A taxonomy-assigned database of PDB (see PDB) files, and related information.

Molecular clock hypothesis (分子钟假设)
The hypothesis that sequences change at the same rate in the branches of an evolutionary tree.

Monte Carlo (蒙特卡罗法)
A method that samples possible solutions to a complex problem as a way to estimate a more general solution.

Motif (模序)
A short conserved region in a protein sequence. Motifs are frequently highly conserved parts of domains.

Multiple sequence alignment (多序列联配)
An alignment of three or more sequences with gaps inserted in the sequences such that residues with common structural positions and/or ancestral residues are aligned in the same column. ClustalW is one of the most widely used multiple sequence alignment programs.

Mutation data matrix (突变数据矩阵; i.e., the PAM matrix)
A scoring matrix compiled from the observation of point mutations between aligned sequences. Also refers to a Dayhoff PAM matrix in which the scores are given as log odds scores.

N50 length (N50长度; i.e., the largest contig length covering 50% of all nucleotides)
A measure of the contig length (or scaffold length) containing a 'typical' nucleotide. Specifically, it is the maximum length L such that 50% of all nucleotides lie in contigs (or scaffolds) of size at least L.

Nats (natural logarithm)
A number expressed in units of the natural logarithm.

NCBI (美国国家生物技术信息中心)
National Center for Biotechnology Information (USA). Created by the United States Congress in 1988 to develop information systems to support the biological community.

Needleman-Wunsch algorithm (Needleman-Wunsch算法)
Uses dynamic programming to find global alignments between sequences.

Neighbor-joining method (邻接法)
Clusters together alike pairs within a group of related objects (e.g., genes with similar sequences) to create a tree whose branches reflect the degrees of difference among the objects.

Neural network (神经网络)
From artificial intelligence algorithms, techniques that involve a set of many simple units that hold symbolic data, which are interconnected by a network of links associated with numeric weights. Units operate only on their symbolic data and on the inputs that they receive through their connections. Most neural networks use a training algorithm (see Back-propagation) to adjust connection weights, allowing the network to learn associations between various input and output patterns. See also Feed-forward neural network.

NIH (美国国家卫生研究院)
National Institutes of Health (USA).

Noise (噪音)
In sequence analysis, a small amount of randomly generated variation in sequences that is added to a model of the sequences; e.g., a hidden Markov model or scoring matrix, in order to avoid the model overfitting the sequences. See also Overfitting.

Normal distribution (正态分布)
The distribution found for many types of data such as body weight, size, and exam scores. The distribution is a bell-shaped curve that is described by a mean and a standard deviation. Local sequence alignment scores between unrelated or random sequences do not follow this distribution but instead the extreme value distribution, which has a much extended tail for higher scores. See also Extreme value distribution.

Object Management Group (OMG) (国际对象管理协作组)
A not-for-profit corporation that was formed to promote component-based software by introducing standardized object software. The OMG establishes industry guidelines and detailed object management specifications in order to provide a common framework for application development. Within OMG is a Life Sciences Research group, a consortium representing companies, academic institutions, software vendors, and hardware vendors who are working together to improve communication and interoperability among computational resources in life sciences research. See CORBA.

Object-oriented database (面向对象数据库)
Unlike relational databases (see entry), which use a tabular structure, object-oriented databases attempt to model the structure of a given data set as closely as possible. In doing so, object-oriented databases tend to reduce the appearance of duplicated data and the complexity of query structure often found in relational databases.

Odds score (概率/几率值)
The ratio of the likelihoods of two events or outcomes. In sequence alignments and scoring matrices, the odds score for matching two sequence characters is the ratio of the frequency with which the characters are aligned in related sequences divided by the frequency with which those same two characters align by chance alone, given the frequency of occurrence of each in the sequences. Odds scores for a set of individually aligned positions are obtained by multiplying the odds scores for each position. Odds scores are often converted to logarithms to create log odds scores that can be added to obtain the log odds score of a sequence alignment.

OMIM (a database of human genetic diseases)
Online Mendelian Inheritance in Man. Database of genetic diseases with references to molecular medicine, cell biology, biochemistry and clinical details of the diseases.

Optimal alignment (最佳联配)
The highest-scoring alignment found by an algorithm capable of producing multiple solutions. This is the best possible alignment that can be found, given any parameters supplied by the user to the sequence alignment program.

ORF (开放阅读框)
Open Reading Frame. A series of codons (base triplets) which can be translated into a protein. There are six potential reading frames of an unidentified sequence; BLASTX (see BLAST) translates a nucleotide query sequence in all six reading frames and then attempts to align the translations to sequences in a protein database. The most likely reading frame can be identified using on-line software (e.g., ORF Finder).

Orthologous (直系同源)
Homologous sequences in different species that arose from a common ancestral gene during speciation; may or may not be responsible for a similar function. A pair of genes found in two species are orthologous when the encoded proteins are 60-80% identical in an alignment. The proteins almost certainly have the same three-dimensional structure, domain structure, and biological function, and the encoding genes have originated from a common ancestor gene at an earlier evolutionary time. Two orthologs I and II in genomes A and B, respectively, may be identified when the complete genomes of two species are available: (1) in a database similarity search of all of the proteome of B using I as a query, II is the best hit found, and
(2) I is the best hit when II is used as a query of the proteome of A. The best hit is the database sequence with the lowest expect value (E). Orthology is also predicted by a very close phylogenetic relationship between sequences or by a cluster analysis. Compare to Paralogs. See also Cluster analysis.

Output layer
The final layer of a neural network in which signals from lower levels in the network are input into output states where they are weighted and summed to give an output signal. For example, the output signal might be the prediction of one type of protein secondary structure for the central amino acid in a sequence window.

Overfitting
Can occur when using a learning algorithm to train a model such as a neural network or hidden Markov model. Overfitting refers to the model becoming too highly representative of the training data and thus no longer representative of the overall range of data that is supposed to be modeled.

P value (probability value)
The probability of an alignment occurring by chance with the score in question or better. The P value is calculated by relating the observed alignment score, S, to the expected distribution of HSP scores from comparisons of random sequences of the same length and composition as the query to the database. The most highly significant P values will be those close to 0. P values and E values are different ways of representing the significance of the alignment.

Pair-wise sequence alignment
An alignment performed between two sequences.

PAM (percent accepted mutation, or percentage of observed mutations; can serve as a unit of evolutionary time)
Percent Accepted Mutation. A unit introduced by Dayhoff et al. to quantify the amount of evolutionary change in a protein sequence. One PAM unit is the amount of evolution that will change, on average, 1% of the amino acids in a protein sequence. A PAM(x) substitution matrix is a look-up table in which scores for each amino acid substitution have been calculated based on the frequency of that substitution in closely related proteins that have experienced a certain amount (x) of evolutionary divergence.

Paralogous
Homologous sequences within a single species that arose by gene duplication. Genes that are related through gene duplication events. These events may lead to the production of a family of related proteins with similar biological functions within a species. Paralogous gene families within a species are identified by using an individual protein as a query in a database similarity search of the entire proteome of an organism. The process is repeated for the entire proteome, and the resulting sets of related proteins are then searched for clusters that are most likely to have a conserved domain structure and should represent a paralogous gene family.

Parametric sequence alignment
An algorithm that finds a range of possible alignments based on varying the parameters of the scoring system for matches, mismatches, and gap penalties. An example is the Bayes block aligner.

PDB (one of the main protein structure databases)
Brookhaven Protein Data Bank. A database and format of files which describe the 3D structure of a protein or nucleic acid, as determined by X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy. The
molecules described by the files are usually viewed locally by dedicated software, but can sometimes be visualised on the World Wide Web.

Pearson correlation coefficient
A measure of the correlation between two variables that reflects the degree to which the two variables are related. For example, the coefficient is used as a measure of similarity of gene expression in a microarray experiment. See also Correlation coefficient.

Percent identity
The percentage of the columns in an alignment of two sequences that contain identical amino acids. Columns in the alignment that include gaps are not scored in the calculation.

Percent similarity
The percentage of the columns in an alignment of two sequences that contain either identical amino acids or amino acids that are frequently found substituted for each other in sequences of related proteins (conservative substitutions). These substitutions may be found in an amino acid substitution matrix such as the Dayhoff PAM and Henikoff BLOSUM matrices. Columns in the alignment that include gaps are not scored in the calculation.

Perceptron (a pattern-recognition machine modeled on the human optic nerve control system)
A neural network in which input and output states are directly connected without intervening hidden layers.

PHRED (a widely used program for analyzing raw sequence that calls each base and scores its quality)
A widely used computer program that analyses raw sequence to produce a 'base call' with an associated 'quality score' for each position in the sequence. A PHRED quality score of X corresponds to an error probability of approximately 10^(-X/10). Thus, a PHRED quality score of 30 corresponds to 99.9% accuracy for the base call in the raw read.

PHRAP (a widely used program for assembling raw sequence)
A widely used computer program that assembles raw sequence into sequence contigs and assigns to each position in the sequence an associated 'quality score', on the basis of the PHRED scores of the raw sequence reads. A PHRAP quality score of X corresponds to an error probability of approximately 10^(-X/10). Thus, a PHRAP quality score of 30 corresponds to 99.9% accuracy for a base in the assembled sequence.

Phylogenetic studies

PIR (one of the main protein sequence databases, translated from GenBank)
Protein Information Resource. A database of translated GenBank nucleotide sequences. PIR is a redundant (see Redundancy) protein sequence database. The database is divided into four categories: PIR1 - Classified and annotated. PIR2 - Annotated. PIR3 - Unverified. PIR4 - Unencoded or untranslated.

Poisson distribution
Used to predict the occurrence of infrequent events over a long period of time or when there are a large number of trials. In sequence analysis, it is used to calculate the chance that one pair of a large number of pairs of unrelated sequences may give a high local alignment score.

Position-specific scoring matrix (PSSM) (used by search programs such as PSI-BLAST)
The PSSM gives the log odds score for finding a particular matching amino acid in a target sequence. It represents the variation found in the columns of an alignment of a set of related sequences. Each subsequent matrix column corresponds to the next column in the alignment, and each row corresponds to a particular sequence character (one of four bases in DNA sequences or 20 amino acids in protein sequences). Matrix values are log odds scores obtained by dividing the counts of the residue in the alignment by the expected number of counts based on sequence composition, and converting the ratio to a log score. The matrix is moved along sequences to find similar regions by adding the matching log odds scores and looking for high values. There is no allowance for gaps. Also called a weight matrix or scoring matrix.

Posterior (Bayesian analysis)
A conditional probability based on prior knowledge and newly evaluated relationships among variables using Bayes' rule. See also Bayes' rule.

Prior (Bayesian analysis)
The expected distribution of a variable based on previous data.

Profile
A matrix representation of a conserved region in a multiple sequence alignment that allows for gaps in the alignment. The rows include scores for matching sequential columns of the alignment to a test sequence. The columns include
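
The position-specific scoring matrix entry describes two steps: turning column counts into log odds scores, then sliding the matrix along a sequence and summing the matching scores. A minimal Python sketch of that idea follows. The function names `build_pssm` and `scan`, the pseudocount scheme, the toy alignment, and the uniform background frequencies are all illustrative assumptions, not part of the glossary or of any particular program.

```python
import math

def build_pssm(aligned, background, pseudocount=1.0):
    """Build a gap-free PSSM as a list of per-column dicts (residue -> log2 odds).

    `aligned` is a list of equal-length sequences; `background` maps each
    residue to its expected frequency. The pseudocount (an assumption of this
    sketch) keeps unseen residues from producing log(0).
    """
    n_seqs = len(aligned)
    pssm = []
    for col in range(len(aligned[0])):
        column = [seq[col] for seq in aligned]
        scores = {}
        for residue, freq in background.items():
            # Observed count, smoothed toward the background composition.
            observed = column.count(residue) + pseudocount * freq
            # Count expected from sequence composition alone.
            expected = (n_seqs + pseudocount) * freq
            scores[residue] = math.log2(observed / expected)
        pssm.append(scores)
    return pssm

def scan(pssm, sequence):
    """Slide the matrix along `sequence`; return (best_start, best_score)."""
    width = len(pssm)
    best_start, best_score = -1, float("-inf")
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Log odds scores are additive across positions; no gaps are allowed.
        score = sum(pssm[i][ch] for i, ch in enumerate(window))
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score

# Toy DNA example with a uniform background (illustrative data only).
alignment = ["ACGT", "ACGT", "ACGA", "ACGT"]
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
matrix = build_pssm(alignment, uniform)
hit_start, hit_score = scan(matrix, "TTACGTTT")
```

Because the scores are logarithms, summing them over a window is equivalent to multiplying the per-position odds scores, which is the relationship stated in the Odds score entry.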

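
The PHRED and PHRAP entries relate a quality score X to an error probability of roughly 10^(-X/10). The short Python sketch below works through that arithmetic in both directions; the function names are hypothetical and are not taken from PHRED or PHRAP themselves.

```python
import math

def phred_to_error_probability(quality):
    """Error probability implied by a quality score: 10^(-Q/10)."""
    return 10 ** (-quality / 10)

def error_probability_to_phred(p_error):
    """Inverse mapping: Q = -10 * log10(P_error)."""
    return -10 * math.log10(p_error)

# A quality score of 30 implies about 1 error in 1000 base calls,
# i.e. roughly 99.9% accuracy, matching the example in the entries.
p30 = phred_to_error_probability(30)
accuracy30 = 1 - p30
```

Each increase of 10 in the score divides the implied error probability by 10, so a score of 20 corresponds to about 99% accuracy and a score of 40 to about 99.99%.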