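Academic search engine for biology and chemistry
http://scholar.chemgogo.com/
chemgogo: specialized academic search
chemgogo is an online academic search service with broad coverage that aims to give researchers in China current, authoritative information, for example: biomedicine, research funding, national policy, research news, conference information, academic navigation, science and technology news, video courses from leading universities, pharmaceutical intermediates, chemistry papers, biology papers, SCI papers, biochemical reagents, reference standards, impact factors, chemical dictionary, periodic table, biochemistry, biochemical companies, Chinese-language MSDS, bioinformatics, hazardous goods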
biopython NCBIWWW.qblast
http://www.biopython.org/DIST/docs/tutorial/Tutorial.html
6.2 Running BLAST over the Internet
We use the function qblast() in the Bio.Blast.NCBIWWW module to call the online version of BLAST. This has three non-optional arguments:
* The first argument is the blast program to use for the search, as a lower case string. The options and descriptions of the programs are available at http://www.ncbi.nlm.nih.gov/BLAST/blast_program.html. Currently qblast only works with blastn, blastp, blastx, tblastn and tblastx.
* The second argument specifies the databases to search against. Again, the options for this are available on the NCBI web pages at http://www.ncbi.nlm.nih.gov/BLAST/blast_databases.html.
* The third argument is a string containing your query sequence. This can either be the sequence itself, the sequence in fasta format, or an identifier like a GI number.
The qblast function also takes a number of other optional arguments, which are basically analogous to the different parameters you can set on the BLAST web page. We’ll just highlight a few of them here:
* The qblast function can return the BLAST results in various formats, which you can choose with the optional format_type keyword: “HTML”, “Text”, “ASN.1”, or “XML”. The default is “XML”, as that is the format expected by the parser, described in section 6.4 below.
* The argument expect sets the expectation or e-value threshold.
For more about the optional BLAST arguments, we refer you to the NCBI’s own documentation, or that built into Biopython:
>>> from Bio.Blast import NCBIWWW
>>> help(NCBIWWW.qblast)
For example, if you have a nucleotide sequence you want to search against the non-redundant database using BLASTN, and you know the GI number of your query sequence, you can use:
>>> from Bio.Blast import NCBIWWW
>>> result_handle = NCBIWWW.qblast("blastn", "nr", "8332116")
Alternatively, if we have our query sequence already in a FASTA formatted file, we just need to open the file and read in this record as a string, and use that as the query argument:
>>> from Bio.Blast import NCBIWWW
>>> fasta_string = open("m_cold.fasta").read()
>>> result_handle = NCBIWWW.qblast("blastn", "nr", fasta_string)
We could also have read in the FASTA file as a SeqRecord and then supplied just the sequence itself:
>>> from Bio.Blast import NCBIWWW
>>> from Bio import SeqIO
>>> record = SeqIO.read(open("m_cold.fasta"), format="fasta")
>>> result_handle = NCBIWWW.qblast("blastn", "nr", record.seq)
Supplying just the sequence means that BLAST will assign an identifier for your sequence automatically. You might prefer to use the SeqRecord object’s format method to make a fasta string (which will include the existing identifier):
>>> from Bio.Blast import NCBIWWW
>>> from Bio import SeqIO
>>> record = SeqIO.read(open("m_cold.fasta"), format="fasta")
>>> result_handle = NCBIWWW.qblast("blastn", "nr", record.format("fasta"))
This approach makes more sense if you have your sequence(s) in a non-FASTA file format which you can extract using Bio.SeqIO (see Chapter 4).
Whatever arguments you give the qblast() function, you should get back your results in a handle object (by default in XML format). The next step would be to parse the XML output into python objects representing the search results (Section 6.4), but you might want to save a local copy of the output file first.
6.3 Saving BLAST output
Before parsing the results, it is often useful to save them into a file so that you can use them later without having to go back and re-run the BLAST search. I find this especially useful when debugging my code that extracts info from the BLAST files, but it could also be useful just for making backups of things you’ve done.
If you don’t want to save the BLAST output, you can skip to section 6.4. If you do, read on.
We need to be a bit careful since we can use result_handle.read() to read the BLAST output only once – calling result_handle.read() again returns an empty string. First, we use read() and store all of the information from the handle into a string:
>>> blast_results = result_handle.read()
Next, we save this string in a file:
>>> save_file = open("my_blast.xml", "w")
>>> save_file.write(blast_results)
>>> save_file.close()
After doing this, the results are in the file my_blast.xml and the variable blast_results contains the BLAST results in a string form. However, the parse function of the BLAST parser (described in 6.4) takes a file-handle-like object, not a plain string. To get a handle, there are two things you can do:
* Use the Python standard library module cStringIO. The following code will turn the plain string into a handle, which we can feed directly into the BLAST parser:
>>> import cStringIO
>>> result_handle = cStringIO.StringIO(blast_results)
* Open the saved file for reading:
>>> result_handle = open("my_blast.xml")
Now that we’ve got the BLAST results back into a handle again, we are ready to do something with them, so this leads us right into the parsing section.
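The read-once behaviour of a handle, and the trick of turning a saved string back into a handle, can be tried out with the standard library alone. The following is a minimal sketch for Python 3, where io.StringIO plays the role that cStringIO plays above; the XML text is a made-up placeholder, not real BLAST output.

```python
import io

# Placeholder text standing in for BLAST XML output (hypothetical content).
blast_results = "<BlastOutput>placeholder</BlastOutput>"

# A handle can be read only once: a second read() returns an empty string.
result_handle = io.StringIO(blast_results)
first_read = result_handle.read()
second_read = result_handle.read()

# Wrapping the saved string in a new StringIO gives a fresh handle
# that a parser could consume from the start.
fresh_handle = io.StringIO(first_read)
reread = fresh_handle.read()

print(repr(second_read))
print(reread == blast_results)
```

Keeping the string around (or the file on disk) is what makes it possible to parse the same results more than once.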
6.4 Parsing BLAST output
As mentioned above, BLAST can generate output in various formats, such as XML, HTML, and plain text. Originally, Biopython had a parser for BLAST plain text and HTML output, as these were the only output formats supported by BLAST. Unfortunately, the BLAST output in these formats kept changing, each time breaking the Biopython parsers. As keeping up with changes in BLAST became a hopeless endeavor, especially with users running different BLAST versions, we now recommend parsing the output in XML format, which can be generated by recent versions of BLAST. Not only is the XML output more stable than the plain text and HTML output, it is also much easier to parse automatically, making Biopython a whole lot more stable.
Though deprecated, the parsers for BLAST output in plain text or HTML output are still available in Biopython (see Section 6.6). Use them at your own risk: they may or may not work, depending on which BLAST version you’re using.
You can get BLAST output in XML format in various ways. For the parser, it doesn’t matter how the output was generated, as long as it is in the XML format.
* You can use Biopython to run BLAST locally, as described in section 6.1.
* You can use Biopython to run BLAST over the internet, as described in section 6.2.
* You can do the BLAST search yourself on the NCBI site through your web browser, and then save the results. You need to choose XML as the format in which to receive the results, and save the final BLAST page you get (you know, the one with all of the interesting results!) to a file.
* You can also run BLAST locally without using Biopython, and save the output in a file. Again, you need to choose XML as the format in which to receive the results.
The important point is that you do not have to use Biopython scripts to fetch the data in order to be able to parse it.
Doing things in one of these ways, you then need to get a handle to the results. In Python, a handle is just a nice general way of describing input to any info source so that the info can be retrieved using read() and readline() functions. This is the type of input the BLAST parser (and most other Biopython parsers) takes.
If you followed the code above for interacting with BLAST through a script, then you already have result_handle, the handle to the BLAST results. For example, using a GI number to do an online search:
>>> from Bio.Blast import NCBIWWW
>>> result_handle = NCBIWWW.qblast("blastn", "nr", "8332116")
If instead you ran BLAST some other way, and have the BLAST output (in XML format) in the file my_blast.xml, all you need to do is to open the file for reading:
>>> result_handle = open("my_blast.xml")
Now that we’ve got a handle, we are ready to parse the output. The code to parse it is really quite small:
>>> from Bio.Blast import NCBIXML
>>> blast_records = NCBIXML.parse(result_handle)
To understand what NCBIXML.parse returns, there are two things that you need to keep in mind:
* The BLAST output may contain the output of more than one BLAST search. This will for example be the case if you ran BLAST locally on a Fasta file containing more than one sequence. For each sequence, the BLAST parser will return one BLAST record.
* The BLAST output may therefore be huge.
To be able to handle these situations, NCBIXML.parse() returns an iterator (like Bio.SeqIO.parse(), see Chapter 4). In plain English, an iterator allows you to step through the BLAST output, retrieving BLAST records one by one for each BLAST search:
>>> blast_record = blast_records.next()
# ... do something with blast_record
>>> blast_record = blast_records.next()
# ... do something with blast_record
>>> blast_record = blast_records.next()
# ... do something with blast_record
>>> blast_record = blast_records.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
# No further records
Or, you can use a for-loop:
>>> for blast_record in blast_records:
...     # Do something with blast_record
Note though that you can step through the BLAST records only once. Usually, from each BLAST record you would save the information that you are interested in. If you want to save all returned BLAST records, you can convert the iterator into a list:
>>> blast_records = list(blast_records)
Now you can access each BLAST record in the list with an index as usual. If your BLAST file is huge though, you may run into memory problems trying to save them all in a list.
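The once-only nature of an iterator is easy to see with a plain Python generator. This is a small sketch in Python 3 syntax (next(records) rather than records.next()); parse_records and the three strings are hypothetical stand-ins for the parser and its BLAST records.

```python
def parse_records(lines):
    """Yield one record at a time, the way a parser iterator does."""
    for line in lines:
        yield line.strip()

# Hypothetical stand-ins for three BLAST records.
raw = ["record 1\n", "record 2\n", "record 3\n"]

records = parse_records(raw)
first = next(records)        # step through one record at a time
remaining = list(records)    # collect whatever is left into a list
leftover = list(records)     # the iterator is now exhausted

print(first)
print(remaining)
print(leftover)
```

Only the list keeps the records available for repeated access, which is also why it costs memory for large files.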
Usually, you’ll be running one BLAST search at a time. Then, all you need to do is to pick up the first (and only) BLAST record in blast_records:
>>> blast_record = blast_records.next()
I guess by now you’re wondering what is in a BLAST record.
6.5 The BLAST record class
A BLAST Record contains everything you might ever want to extract from the BLAST output. Right now we’ll just show an example of how to get some info out of the BLAST report, but if you want something in particular that is not described here, look at the info on the record class in detail, and take a gander into the code or automatically generated documentation – the docstrings have lots of good info about what is stored in each piece of information.
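To continue with our example, let’s just print out some summary info about all hits in our blast report with an E-value below a particular threshold. The following code does this:
>>> E_VALUE_THRESH = 0.04
>>> for alignment in blast_record.alignments:
...     for hsp in alignment.hsps:
...         if hsp.expect < E_VALUE_THRESH:
...             print '****Alignment****'
...             print 'sequence:', alignment.title
...             print 'length:', alignment.length
...             print 'e value:', hsp.expect
...             print hsp.query[0:75] + '...'
...             print hsp.match[0:75] + '...'
...             print hsp.sbjct[0:75] + '...'
This will print out summary reports like the following:
****Alignment****
sequence: >gb|AF283004.1|AF283004 Arabidopsis thaliana cold acclimation protein WCOR413-like protein
alpha form mRNA, complete cds
length: 783
e value: 0.034
tacttgttgatattggatcgaacaaactggagaaccaacatgctcacgtcacttttagtcccttacatattcctc...
||||||||| | ||||||||||| || |||| || || |||||||| |||||| | | |||||||| ||| ||...
tacttgttggtgttggatcgaaccaattggaagacgaatatgctcacatcacttctcattccttacatcttcttc...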
Basically, you can do anything you want to with the info in the BLAST report once you have parsed it. This will, of course, depend on what you want to use it for, but hopefully this helps you get started on doing what you need to do!
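For readers who want to try the threshold filter without a BLAST run at hand, here is a rough sketch that applies the same idea with Python's standard xml.etree.ElementTree instead of NCBIXML. The element names (Hit, Hit_def, Hit_len, Hsp, Hsp_evalue) follow the NCBI BLAST XML layout; the fragment itself is hand-written and heavily shortened, and both hits and their values are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hand-written, shortened fragment in the style of BLAST XML output.
# Hit names and values are hypothetical.
SAMPLE_XML = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_hits>
        <Hit>
          <Hit_def>hypothetical hit A</Hit_def>
          <Hit_len>783</Hit_len>
          <Hit_hsps>
            <Hsp>
              <Hsp_evalue>0.034</Hsp_evalue>
            </Hsp>
          </Hit_hsps>
        </Hit>
        <Hit>
          <Hit_def>hypothetical hit B</Hit_def>
          <Hit_len>500</Hit_len>
          <Hit_hsps>
            <Hsp>
              <Hsp_evalue>2.5</Hsp_evalue>
            </Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""

E_VALUE_THRESH = 0.04

def significant_hsps(xml_text, threshold):
    """Return (title, length, e-value) for every HSP below the threshold."""
    root = ET.fromstring(xml_text)
    kept = []
    for hit in root.iter("Hit"):
        title = hit.findtext("Hit_def")
        length = int(hit.findtext("Hit_len"))
        for hsp in hit.iter("Hsp"):
            evalue = float(hsp.findtext("Hsp_evalue"))
            if evalue < threshold:
                kept.append((title, length, evalue))
    return kept

for title, length, evalue in significant_hsps(SAMPLE_XML, E_VALUE_THRESH):
    print("sequence:", title)
    print("length:", length)
    print("e value:", evalue)
```

With real data the Biopython record objects are more convenient, since they already expose alignments and HSPs as attributes.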
An important consideration for extracting information from a BLAST report is the type of objects that the information is stored in. In Biopython, the parsers return Record objects, either Blast or PSIBlast depending on what you are parsing. These objects are defined in Bio.Blast.Record and are quite complete.
Here are my attempts at UML class diagrams for the Blast and PSIBlast record classes. If you are good at UML and see mistakes/improvements that can be made, please let me know. The Blast class diagram is shown in Figure 6.5.
The PSIBlast record object is similar, but has support for the rounds that are used in the iteration steps of PSIBlast. The class diagram for PSIBlast is shown in Figure 6.5.
6.6 Deprecated BLAST parsers
Older versions of Biopython had parsers for BLAST output in plain text or HTML format. Over the years, we discovered that it is very hard to maintain these parsers in working order. Basically, any small change to the BLAST output in newly released BLAST versions tends to cause the plain text and HTML parsers to break. We therefore recommend parsing BLAST output in XML format, as described in section 6.4.
The HTML parser in Bio.Blast.NCBIWWW has been officially deprecated and will issue warnings if you try to use it. We plan to remove it completely in a few releases’ time.
Our plain text BLAST parser works a bit better, but use it at your own risk. It may or may not work, depending on which BLAST versions or programs you’re using.
6.6.1 Parsing plain-text BLAST output
The plain text BLAST parser is located in Bio.Blast.NCBIStandalone.
As with the XML parser, we need to have a handle object that we can pass to the parser. The handle must implement the readline() method and do this properly. The common ways to get such a handle are to either use the provided blastall or blastpgp functions to run the local blast, or to run a local blast via the command line, and then do something like the following:
>>> result_handle = open("my_file_of_blast_output.txt")
Well, now that we’ve got a handle (which we’ll call result_handle), we are ready to parse it. This can be done with the following code:
>>> from Bio.Blast import NCBIStandalone
>>> blast_parser = NCBIStandalone.BlastParser()
>>> blast_record = blast_parser.parse(result_handle)
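This will parse the BLAST report into a Blast Record class (either a Blast or a PSIBlast record, depending on what you are parsing) so that you can extract the information from it. In our case, let’s just print out a quick summary of all of the alignments with an E-value below some threshold.
>>> E_VALUE_THRESH = 0.04
>>> for alignment in blast_record.alignments:
...     for hsp in alignment.hsps:
...         if hsp.expect < E_VALUE_THRESH:
...             print '****Alignment****'
...             print 'sequence:', alignment.title
...             print 'length:', alignment.length
...             print 'e value:', hsp.expect
...             print hsp.query[0:75] + '...'
...             print hsp.match[0:75] + '...'
...             print hsp.sbjct[0:75] + '...'
6.6.2 Parsing a file full of BLAST runs
To parse a file containing many BLAST reports we use the blast iterator. To set up an iterator, we first set up a parser that turns each report into a Blast Record object:
>>> from Bio.Blast import NCBIStandalone
>>> blast_parser = NCBIStandalone.BlastParser()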
Then we will assume we have a handle to a bunch of blast records, which we’ll call result_handle. Getting a handle is described in full detail above in the blast parsing sections.
Now that we’ve got a parser and a handle, we are ready to set up the iterator with the following command:
>>> blast_iterator = NCBIStandalone.Iterator(result_handle, blast_parser)
The second option, the parser, is optional. If we don’t supply a parser, then the iterator will just return the raw BLAST reports one at a time.
Now that we’ve got an iterator, we start retrieving blast records (generated by our parser) using next():
>>> blast_record = blast_iterator.next()
Each call to next will return a new record that we can deal with. Now we can iterate through these records and generate our old favorite, a nice little blast report:
>>> for blast_record in blast_iterator:
...     E_VALUE_THRESH = 0.04
...     for alignment in blast_record.alignments:
...         for hsp in alignment.hsps:
...             if hsp.expect < E_VALUE_THRESH:
...                 print '****Alignment****'
...                 print 'sequence:', alignment.title
...                 print 'length:', alignment.length
...                 print 'e value:', hsp.expect
...                 if len(hsp.query) > 75:
...                     dots = '...'
...                 else:
...                     dots = ''
...                 print hsp.query[0:75] + dots
...                 print hsp.match[0:75] + dots
...                 print hsp.sbjct[0:75] + dots
The iterator allows you to deal with huge blast records without any memory problems, since things are read in one at a time. I have parsed tremendously huge files without any problems using this.
6.6.3 Finding a bad record somewhere in a huge file
One really ugly problem that happens to me is that I’ll be parsing a huge blast file for a while, and the parser will bomb out with a ValueError. This is a serious problem, since you can’t tell if the ValueError is due to a parser problem, or a problem with the BLAST. To make it even worse, you have no idea where the parse failed, so you can’t just ignore the error, since this could be ignoring an important data point.
We used to have to make a little script to get around this problem, but the Bio.Blast module now includes a BlastErrorParser which really helps make this easier. The BlastErrorParser works very similar to the regular BlastParser, but it adds an extra layer of work by catching ValueErrors that are generated by the parser, and attempting to diagnose the errors.
Let’s take a look at using this parser – first we define the file we are going to parse and the file to write the problem reports to:
>>> import os
>>> blast_file = os.path.join(os.getcwd(), "blast_out", "big_blast.out")
>>> error_file = os.path.join(os.getcwd(), "blast_out", "big_blast.problems")
Now we want to get a BlastErrorParser:
>>> from Bio.Blast import NCBIStandalone
>>> error_handle = open(error_file, "w")
>>> blast_error_parser = NCBIStandalone.BlastErrorParser(error_handle)
Notice that the parser takes an optional handle argument. If a handle is passed, then the parser will write any blast records which generate a ValueError to this handle. Otherwise, these records will not be recorded.
Now we can use the BlastErrorParser just like a regular blast parser. Specifically, we might want to make an iterator that goes through our blast records one at a time and parses them with the error parser:
>>> result_handle = open(blast_file)
>>> iterator = NCBIStandalone.Iterator(result_handle, blast_error_parser)
We can read these records one at a time, but now we can catch and deal with errors that are due to problems with Blast (and not with the parser itself):
>>> try:
...     next_record = iterator.next()
... except NCBIStandalone.LowQualityBlastError, info:
...     print "LowQualityBlastError detected in id %s" % info[1]
The .next() method is normally called indirectly via a for-loop. Right now the BlastErrorParser can generate the following errors:
* ValueError – This is the same error generated by the regular BlastParser, and is due to the parser not being able to parse a specific file. This is normally either due to a bug in the parser, or some kind of discrepancy between the version of BLAST you are using and the versions the parser is able to handle.
* LowQualityBlastError – When BLASTing a sequence that is of really bad quality (for example, a short sequence that is basically a stretch of one nucleotide), it seems that Blast ends up masking out the entire sequence and ending up with nothing to parse. In this case it will produce a truncated report that causes the parser to generate a ValueError. LowQualityBlastError is reported in these cases. This error returns an info item with the following information:
o item[0] – The error message
o item[1] – The id of the input record that caused the error. This is really useful if you want to record all of the records that are causing problems.
As mentioned, with each error generated, the BlastErrorParser will write the offending record to the specified error_handle. You can then go ahead and look at these and deal with them as you see fit. Either you will be able to debug the parser with a single blast report, or you will find problems in your blast runs. Either way, it will definitely be a useful experience!
Hopefully the BlastErrorParser will make it much easier to debug and deal with large Blast files.
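The general pattern behind an error-collecting parser (catch the ValueError, write the offending record to a separate handle, carry on) can be sketched in a few lines. This is a generic Python 3 illustration, not Biopython's implementation; parse_report and its "must start with BLAST" rule are hypothetical.

```python
import io

def parse_report(text):
    """Toy parser (hypothetical): a valid report starts with 'BLAST'."""
    if not text.startswith("BLAST"):
        raise ValueError("unrecognised report")
    return text.split()[1]

def parse_all(reports, error_handle):
    """Parse every report; write the ones that fail to error_handle."""
    parsed = []
    for report in reports:
        try:
            parsed.append(parse_report(report))
        except ValueError:
            # Keep the offending record so it can be inspected later.
            error_handle.write(report + "\n")
    return parsed

error_handle = io.StringIO()
results = parse_all(["BLAST query1", "garbage", "BLAST query2"], error_handle)

print(results)
print(error_handle.getvalue())
```

Writing the failures out instead of silently skipping them is what lets you tell a parser bug from a bad BLAST run afterwards.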
6.7 Dealing with PSI-BLAST
You can run the standalone version of PSI-BLAST (the command line tool blastpgp) using the blastpgp function in the Bio.Blast.NCBIStandalone module. At the time of writing, the NCBI do not appear to support tools running a PSI-BLAST search via the internet.
Note that the Bio.Blast.NCBIXML parser can read the XML output from current versions of PSI-BLAST, but information like which sequences in each iteration are new or reused isn’t present in the XML file. If you care about this information you may have more joy with the plain text output and the PSIBlastParser in Bio.Blast.NCBIStandalone.
6.8 Dealing with RPS-BLAST
You can run the standalone version of RPS-BLAST (the command line tool rpsblast) using the rpsblast function in the Bio.Blast.NCBIStandalone module. At the time of writing, the NCBI do not appear to support tools running an RPS-BLAST search via the internet.
You can use the Bio.Blast.NCBIXML parser to read the XML output from current versions of RPS-BLAST.
Python BLAST scripts (blastp first):
from Bio.Blast import NCBIWWW
from Bio import SeqIO
record = SeqIO.read(open("protein.txt"), format="fasta")
result_handle = NCBIWWW.qblast("blastp", "refseq_protein", record.seq, format_type='Text')
blast_results = result_handle.read()
save_file = open("protein1ok.text", "w")
save_file.write(blast_results)
save_file.close()
blastn:
from Bio.Blast import NCBIWWW
from Bio import SeqIO
record = SeqIO.read(open("cdna.txt"), format="fasta")
result_handle = NCBIWWW.qblast("blastn", "nr", record.seq)
blast_results = result_handle.read()
save_file = open("protein1.xml", "w")
save_file.write(blast_results)
save_file.close()
Protein production and purification database
http://targetdb.pdb.org/TargetDB/downloads/downloads.html
http://pepcdb.pdb.org/protocolStat.html
http://www.nigms.nih.gov/Initiatives/PSI/
http://www.spineurope.org/page.php?page=home
Bioinformatics and Functional Genomics
http://www.ls.manchester.ac.uk/research/themes/bioinformatics/
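Many bioinformatics tools
Multiple sequence alignment and phylogenetic tree construction
I. Multiple sequence alignment of known DNA sequences with CLUSTALX.
Steps:
1. Prepare a file test.seq (or .txt) containing 8 DNA sequences in FASTA format.
2. Double-click to start CLUSTALX, choose FILE, then LOAD SEQUENCE, and open test.seq (or .txt).
3. Choose ALIGNMENT and, with the default alignment parameters, click Do complete Alignment. In the window that appears, click ALIGN to run the alignment. Two files are written (the default output format is Clustal): the alignment file test.aln and the guide tree file test.dnd.
4. Choose FILE, then Save sequence as, and select PHYLIP in the format box; the file is saved as test.phy in the PHYLIP program directory. Click OK.
5. Copy test.phy from the PHYLIP program directory into the EXE folder. Opened in Notepad, part of test.phy looks like this:
In the figure, 8 and 50 mean 8 sequences with 50 bases each.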
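The two numbers at the top of a PHYLIP file (number of sequences, then number of sites) can be checked quickly before running the PHYLIP programs. A minimal sketch, assuming the header is the first line and the two counts are separated by whitespace; the sample text is made up and much shorter than a real test.phy.

```python
def read_phylip_header(text):
    """Return (number of sequences, number of sites) from the first line."""
    first_line = text.splitlines()[0]
    n_seqs, n_sites = first_line.split()[:2]
    return int(n_seqs), int(n_sites)

# Made-up stand-in for the beginning of test.phy.
sample = "     8     50\nseq1      ACGTACGTAC\n"

print(read_phylip_header(sample))
```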
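II. Inferring the phylogenetic tree with PHYLIP.
1. Open the EXE folder, start SEQBOOT, type the file name test.phy, and press Enter.
In the figure, D, J, R, I, O, 1 and 2 are the available options; typing one of these letters changes the corresponding setting. Option D needs no change. Option J offers three choices: Bootstrap, Jackknife and Permute. As mentioned above, the tree is assessed by bootstrapping. Bootstrapping draws sites (bases or amino acids) at random from the full set of sequence positions and fills the remaining positions by further random draws, that is, it samples with replacement until a new sequence of the original length is assembled. In this way one sequence can become many sequences, and one multiple-sequence set can become many sets. With a given algorithm (maximum parsimony, maximum likelihood, UPGMA or neighbor-joining), each set yields one tree. Comparing the resulting trees under the majority rule gives the most "realistic" tree. Jackknife is another way of sampling sites at random. It differs from Bootstrap in that the remaining half is not filled in, so the new sequence is only half as long. Permute is yet another sampling method whose purpose differs from Bootstrap and Jackknife; it is not covered here. Option R lets the user enter the number of replicates. A replicate is one multiple-sequence set generated by Bootstrap. The number of replicates can be chosen according to the number of sequences; here we use 200. Type Y to confirm the parameters, then enter an odd number (for example 3) at the prompt Random number seed (must be odd) ?. Once the settings are in place, press Enter; the program runs and writes a file named outfile to the EXE folder. Opened in Notepad, outfile looks like this:
This file contains the 200 replicates.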
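To make the two resampling schemes concrete, here is a small Python sketch: bootstrap draws alignment columns with replacement until the original length is reached, while jackknife keeps a random half of the columns. It illustrates the idea only and is not SEQBOOT's actual code; the three short sequences and the function names are invented.

```python
import random

def bootstrap_alignment(seqs, rng):
    """Resample alignment columns with replacement, keeping the length."""
    n_sites = len(next(iter(seqs.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in columns) for name, seq in seqs.items()}

def jackknife_alignment(seqs, rng):
    """Keep a random half of the columns, without replacement."""
    n_sites = len(next(iter(seqs.values())))
    columns = sorted(rng.sample(range(n_sites), n_sites // 2))
    return {name: "".join(seq[i] for i in columns) for name, seq in seqs.items()}

# Invented toy alignment: three sequences of ten sites each.
alignment = {"seq1": "ACGTACGTAC", "seq2": "ACGTTCGTAA", "seq3": "ACGAACGTTC"}

rng = random.Random(3)  # odd seed, as in the SEQBOOT prompt
replicates = [bootstrap_alignment(alignment, rng) for _ in range(200)]
half = jackknife_alignment(alignment, rng)

print(len(replicates), "bootstrap replicates")
print(replicates[0])
print(half)
```

The same column indices are applied to every sequence, so each replicate is still a valid alignment.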
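2. Rename outfile to infile. Start DNADIST. Option M takes the number of replicates set above: type M, then D to choose data sets, then enter 200.
After setting the options, type Y to confirm. The program runs and writes outfile to the EXE folder; part of it looks like this:
Rename outfile to infile; to avoid a clash with the earlier infile, first rename the earlier file to infile1.
3. In the EXE folder, choose the algorithm that infers a tree from the distance matrix: start NEIGHBOR. Type M to change the setting, type D to choose data sets, enter 200, then enter the odd seed 3.
Type Y to confirm. The program runs and writes two outputs, outfile and outtree, to the EXE folder. outtree is a tree file that can be opened with TreeView and similar programs. outfile is a report of the analysis, including the trees and other results, and can be opened directly in Notepad. Part of it looks like this:
4. Rename outtree to intree, start DRAWTREE, and enter the file name font1 as the parameter. Type Y to confirm. The program runs and shows the Tree Preview window.
5. Start DRAWGRAM and enter the file name font1 as the parameter. Type Y to confirm. The program runs and shows the Tree Preview window.
6. Rename outfile in the EXE folder to outfile1 so that it is not overwritten by the new outfile. Start CONSENSE. Type Y to confirm the settings. New outfile and outtree files are written to the EXE folder. Opened in Notepad, outfile looks like this:
7. Rename intree in the EXE folder to intree1 and rename outtree to intree. Start DRAWTREE and enter the file name font1 as the parameter. Type Y to confirm. The program runs and shows the Tree Preview window.
8. Start DRAWGRAM and enter the file name font1 as the parameter. Type Y to confirm. The program runs and shows the Tree Preview window.
clustalw command line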
clustalw -INFILE=myfile.seqs -GAPOPEN=2 -GAPEXT=4 -OUTPUTTREE=nj
clustalw -INFILE=2.txt -OUTPUTTREE=nj -TYPE=PROTEIN
DATA (sequences)
-INFILE=file.ext :input sequences.
-PROFILE1=file.ext and -PROFILE2=file.ext :profiles (old alignment).
VERBS (do things)
-OPTIONS :list the command line parameters
-HELP or -CHECK :outline the command line params.
-ALIGN :do full multiple alignment
-TREE :calculate NJ tree.
-BOOTSTRAP(=n) :bootstrap a NJ tree (n= number of bootstraps; def. = 1000).
-CONVERT :output the input sequences in a different file format.
PARAMETERS (set things)
***General settings:****
-INTERACTIVE :read command line, then enter normal interactive menus
-QUICKTREE :use FAST algorithm for the alignment guide tree
-NEGATIVE :protein alignment with negative values in matrix
-OUTFILE= :sequence alignment file name
-OUTPUT= :GCG, GDE, PHYLIP or PIR
-OUTORDER= :INPUT or ALIGNED
-CASE :LOWER or UPPER (for GDE output only)
-SEQNOS= :OFF or ON (for Clustal output only)
***Fast Pairwise Alignments:***
-KTUPLE=n :word size -TOPDIAGS=n :number of best diags.
-WINDOW=n :window around best diags. -PAIRGAP=n :gap penalty
-SCORE :PERCENT or ABSOLUTE
***Slow Pairwise Alignments:***
-PWMATRIX= :Protein weight matrix=BLOSUM, PAM, GONNET, ID or filename
-PWDNAMATRIX= :DNA weight matrix=IUB, CLUSTALW or filename
-PWGAPOPEN=f :gap opening penalty -PWGAPEXT=f :gap extension penalty
***Multiple Alignments:***
-NEWTREE= :file for new guide tree
-USETREE= :file for old guide tree
-MATRIX= :Protein weight matrix=BLOSUM, PAM, GONNET, ID or filename
-DNAMATRIX= :DNA weight matrix=IUB, CLUSTALW or filename
-GAPOPEN=f :gap opening penalty -GAPEXT=f :gap extension penalty
-ENDGAPS :no end gap separation pen. -GAPDIST=n :gap separation pen. range
-NOPGAP :residue-specific gaps off -NOHGAP :hydrophilic gaps off
-HGAPRESIDUES= :list hydrophilic res. -MAXDIV=n :% ident. for delay
-TYPE= :PROTEIN or DNA -TRANSWEIGHT :transitions weighted.
***Profile Alignments:***
-PROFILE :Merge two alignments by profile alignment
-NEWTREE1= :file for new guide tree for profile1
-NEWTREE2= :file for new guide tree for profile2
-USETREE1= :file for old guide tree for profile1
-USETREE2= :file for old guide tree for profile2
***Sequence to Profile Alignments:***
-SEQUENCES :Sequentially add profile2 sequences to profile1 alignment
-NEWTREE= :file for new guide tree
-USETREE= :file for old guide tree
***Structure Alignments:***
-NOSECSTR1 :do not use secondary structure-gap penalty mask for profile 1
-NOSECSTR2 :do not use secondary structure-gap penalty mask for profile 2
-SECSTROUT= :STRUCTURE or MASK or BOTH or NONE output in alignment file
-HELIXGAP=n :gap penalty for helix core residues
-STRANDGAP=n :gap penalty for strand core residues
-LOOPGAP=n :gap penalty for loop regions
-TERMINALGAP=n :gap penalty for structure termini
-HELIXENDIN=n :number of residues inside helix to be treated as terminal
-HELIXENDOUT=n :number of residues outside helix to be treated as terminal
-STRANDENDIN=n :number of residues inside strand to be treated as terminal
-STRANDENDOUT=n:number of residues outside strand to be treated as terminal
***Trees:***
-OUTPUTTREE=nj OR phylip OR dist
-SEED=n :seed number for bootstraps.
-KIMURA :use Kimura’s correction. -TOSSGAPS :ignore positions with gaps.
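I. Obtaining sequences
Typically you start from a sequence obtained by sequencing (known or unknown), use NCBI BLAST to retrieve a set of highly similar sequences, and download them in FASTA format. Edit the sequence names with BIOEDIT or similar software. Note that PHYLIP runs under DOS and names cannot exceed 10 characters; anything longer is automatically truncated to the first 10.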
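Because anything past the tenth character is cut off, two different names can collapse into the same one. A quick check before running PHYLIP might look like the sketch below; phylip_names and the example names are hypothetical.

```python
from collections import Counter

def phylip_names(names, width=10):
    """Truncate names to `width` characters and report any clashes."""
    short = [name[:width] for name in names]
    clashes = sorted(n for n, count in Counter(short).items() if count > 1)
    return short, clashes

# Hypothetical sequence names as they might come out of a BLAST download.
names = ["Arabidopsis_thaliana_1", "Arabidopsis_thaliana_2", "Oryza"]
short, clashes = phylip_names(names)

print(short)
print(clashes)
```

If clashes is not empty, rename the affected sequences before exporting to PHY format.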
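II. Multiple sequence alignment
This is currently usually done with CLUSTAL X; make sure to choose PHY as the output format. The generated guide tree file (DND file) can be opened and edited directly in TREEVIEW. It looks similar to the final phylogenetic tree, but note that it is not a true phylogenetic tree.
III. Building the tree
1. Building an N-J tree
Run SEQBOOT.EXE, DNADIST.EXE, NEIGHBOR.EXE and CONSENSE.EXE from the PHYLIP package in turn. The steps are:
(1) Start seqboot.exe
Input file name: enter the PHY file (*.phy) generated with CLUSTAL X.
R is the number of bootstrap replicates, usually 1000 (call the value you enter M; the M value in the next two steps, DNADIST.EXE and NEIGHBOR.EXE, is then also 1000)
odd number: (4N+1) (e.g. 1, 5, 9 ...)
When the settings are done, type y
This gives outfile (in the phylip folder)
Rename it to 2
(2) Start Dnadist.EXE
Enter 2
Change the M setting: press M, then D, then enter 1000 (the M value)
y
This gives outfile (in the phylip folder)
Rename it to 3
(3) Start Neighbor.EXE
Enter 3
M=1000 (the M value)
Press Y
This gives outfile and outtree (in the phylip folder)
Rename outtree to 4 and outfile to 402
(4) Start consense.exe
Enter 4
y
This gives outfile and outtree (in the phylip folder)
outfile can be renamed to a *.txt file and read in Notepad.
IV. Editing and reading the tree
outtree can be renamed to a *.tre file and opened in TreeView by double-clicking; alternatively, leave the extension unchanged and open it directly with TreeView, PHYLODRAW, NJPLOT or similar software. TREEVIEW can display bootstrap values. With many sequences (more than 60), the tree shows obvious overlap when opened directly; it can be viewed in print preview or exported as an EMF or WMF image, but with many sequences the bootstrap values are placed untidily and overlap the sequence names.
PHYLODRAW has stronger editing features and lets you freely adjust the lengths of the X and Y axes. It exports BMP and PS. Its drawbacks are that it cannot display bootstrap values directly, even when opening NEX files exported from TREEVIEW, and that the exported BMP is incomplete, much like a screenshot; I stitched the pieces together in PHOTOSHOP and added the bootstrap values, annotation symbols and so on. Reportedly you can also open the PS file in Notepad, change the font size in it, and then convert the PS file to PDF with Adobe Distiller, which solves the problem. If there is still overlap, change the font size in the PS file again until it fits.
NJPLOT can display bootstrap values and branch lengths, but cannot adjust the lengths of the X and Y axes.
To build MP or ML trees, replace the Dnadist and Neighbor steps with Dnapars (MP) or Dnaml (ML); the other steps are the same. Reportedly the ML method is very slow with many sequences; I have not tried it, because I have many sequences.
A tree can also be generated with the Bootstrap N-J Tree function in CLUSTAL X: in the TREE menu's OUTPUT FORMAT OPTIONS, set BOOTSTRAP LABELS ON to NODE. In TreeView, open the Tree menu and tick Show internal edge labels; opening the generated file then displays the bootstrap values.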
The following single command is enough to generate the tree:
clustalw -INFILE=5.txt -TYPE=PROTEIN -SEED=1000 -BOOTSTRAP=1000 -BOOTLABELS=node -OUTPUTTREE=phylip
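When the same clustalw call has to be repeated for several input files, it can help to assemble the argument list in a script. The sketch below only builds and prints the command; the function name clustalw_command is made up, the option names are the ones used in the command line above, and actually running it assumes a clustalw executable on the PATH.

```python
def clustalw_command(infile, seq_type="PROTEIN", bootstrap=1000, seed=1000):
    """Build the argument list for a bootstrapped N-J tree run."""
    return [
        "clustalw",
        "-INFILE=" + infile,
        "-TYPE=" + seq_type,
        "-SEED=" + str(seed),
        "-BOOTSTRAP=" + str(bootstrap),
        "-BOOTLABELS=node",
        "-OUTPUTTREE=phylip",
    ]

cmd = clustalw_command("5.txt")
print(" ".join(cmd))

# To run it for real (assumes clustalw is installed and on the PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with unusual file names.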
Protease cleavage site prediction
PeptideCutter predicts potential cleavage sites cleaved by proteases or chemicals in a given protein sequence. PeptideCutter returns the query sequence with the possible cleavage sites mapped on it and/or a table of cleavage site positions.
http://www.expasy.ch/tools/peptidecutter/
http://cmgm.stanford.edu/WWW/www_predict.html
A good biology blog
http://ustcers.com/blogs/shininglake/pages/181.aspx
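A0 Structural genomics
A1 Biology websites
A2 Biology forums
A3 Biological databases
A4 Online tools
A5 Biotech companies
A6 Journals
A7 Professional personal homepages
B Frequently used sites
C1 Literature, history and philosophy
Setting up a local Gene Ontology MySQL database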
http://www.geneontology.org/GO.downloads.database.shtml
http://archive.geneontology.org/latest-full/
go_<YYYYMM>-<DATASET><TYPE>
YYYYMM is the release date (the release export usually follows some
time after the monthly release, due to time taken to build)
the DATASET is one of:
-------
* termdb – a database containing just the information on the
GO terms and relationships. These are the table that are populated:
term GO controlled vocab terms
term2term relationships between GO terms
term_definition definitions of terms
dbxref external database identifier entities
term_dbxref links from terms to other databases
term_synonym synonyms for terms
graph_path transitive closure (all paths) in graph
* assocdb – a database containing both the GO vocabulary and
associations between GO terms and gene products. This database
subsumes termdb. These are the extra tables that are populated:
gene_product gene or protein or entity annotated
association link between gene product and GO term
evidence evidence type and reference for an assoc
gene_product_count recursive product counts per GO term
*seqdb – a database containing GO terms, gene products and the
sequences associated with these gene products. This db subsumes the
two above. It populates these additional tables:
seq biological sequence
gene_product_seq link between a product and a sequence
seq_dbxref external database links for a sequence
NOTE: there are other unpopulated tables – we may or may not decide to
populate these at some point in the future.
NOTE: The production version of seqdb with the full database has been
suspended until further notice.
*seqdblite – this is the same as seqdb, except all IEA associations
have been removed. The IEA associations provide relatively little
value compared to the curated associations, and they slow querying
down immensely. This is the distribution that AmiGO runs off of. We
are working on optimisations to allow AmiGO to run off of the full
seqdb release.
the TYPE is either
----
.rdf-xml – RDF XML export of the database. this comes as one single
file. Note there is no RDF XML export of seqdb, as we do not include
sequences in the xml yet. We do not include IEA evidence associations
in the xml. We may decide to split this xml file into multiple files
at a later date.
.obo-xml – OBO XML Export. Currently ontology only
.owl – OWL Export. Currently ontology only
.tables – this is a directory containing the MySQL dump, see below
.sql – SQL CREATE TABLE and INSERT statements for building a local
instance of the database. equivalent to the .tables TYPE (but slower
to load)
Installing termdb, the database containing just the information on the GO terms and relationships:
Download the termdb files (go_200902-termdb-tables is the MySQL dump; it is the only file needed).
1.
The database export was prepared from a mysql db – you should have no
problem importing it:
tar -zxvf go-YYYYMM-TYPE-tables.gz
cd
echo "create database mygo" | mysql -u root -p
cat *.sql | mysql mygo -u root -p
mysqlimport -L mygo *.txt -u root -p
Note: if you are using Windows, you may see warning messages when
loading some tables; to avoid this, load tables this way:
mysql> load data infile
“c:\\download\\GO\\july-release\\go_200307-assocdb-table
s\\association.txt” into table association lines terminated by ‘\r\n’;
This can be avoided if you disable “TAR file smart CR/LF conversion”
when using Winzip (thanks to Henrik Edgren for the tip).
We are unable to support Windows users – please refer to your MySQL
documentation; if you experience other problems, you may wish to try
posting a question to the go-database mail list to see if other
Windows users have any advice.
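If the archive has already been unpacked with CR/LF conversion switched on, another option is to undo the conversion before running mysqlimport. This is a rough Python sketch, not something from the GO README: `normalize_line_endings` is a hypothetical helper, and it assumes that no field in the dump contains a legitimate carriage return followed by a newline.

```python
from pathlib import Path


def normalize_line_endings(path: Path) -> int:
    """Rewrite CRLF line endings as LF in place; return how many were replaced."""
    data = path.read_bytes()
    replaced = data.count(b"\r\n")
    if replaced:
        path.write_bytes(data.replace(b"\r\n", b"\n"))
    return replaced


def normalize_dump_directory(directory: Path) -> dict:
    """Apply normalize_line_endings to every .txt table file in a dump directory."""
    return {
        txt.name: normalize_line_endings(txt)
        for txt in sorted(directory.glob("*.txt"))
    }
```

After this, the plain `mysqlimport -L mygo *.txt` command should see the line terminators it expects. Keep a copy of the original files until the import has been checked.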
2. Once the import has finished, the term table contains data:
use strict;
use warnings;
use GO::AppHandle;
my $dbname = "mygo";
my $dbhost = "localhost";
# connect to a database on a specific host
my $apph = GO::AppHandle->connect(-dbname=>$dbname, -dbhost=>$dbhost, -dbuser=>"root", -dbauth=>"password");
# EXAMPLE 1
# fetching a GO term from the datasource
my $term = $apph->get_term({acc=>"GO:0003677"});
printf
"GO term; name=%s GO ID=%s\n",
$term->name(), $term->public_acc();
This works.
More example queries:
http://wiki.geneontology.org/index.php/Example_Queries
http://cpansearch.perl.org/src/CMUNGALL/go-db-perl-0.01/doc/go-db-perl-doc.html
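The same lookup can also be done with plain SQL against the term table, without the Perl API. Below is a minimal Python sketch under stated assumptions: it relies on the term table having `acc` and `name` columns, and it uses an in-memory SQLite database with one hand-inserted row as a stand-in for the real MySQL instance. `get_term_name` is a hypothetical helper, not part of any GO library.

```python
import sqlite3
from typing import Optional


def get_term_name(conn, acc: str) -> Optional[str]:
    """Return the name of the GO term with the given accession, or None."""
    # sqlite3 uses "?" placeholders; MySQL drivers typically use "%s" instead.
    row = conn.execute("SELECT name FROM term WHERE acc = ?", (acc,)).fetchone()
    return row[0] if row else None


# Stand-in for the imported mygo database: only the columns used here,
# filled with a single example row rather than the real dump.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE term (id INTEGER PRIMARY KEY, name TEXT, term_type TEXT, acc TEXT)"
)
conn.execute(
    "INSERT INTO term (name, term_type, acc) VALUES (?, ?, ?)",
    ("DNA binding", "molecular_function", "GO:0003677"),
)

if __name__ == "__main__":
    print("GO term; name=%s GO ID=%s" % (get_term_name(conn, "GO:0003677"), "GO:0003677"))
```

Against the real database, the connection would come from a MySQL driver, and the query would stay the same apart from the placeholder style.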
bioinformatics database2
101 MA Adult Mouse Anatomical Dictionary; part of Gene Expression Database Identifier http://www.informatics.jax.org/ http://www.informatics.jax.org/searches/AMA.cgi?id=MA:[example_id] http://www.informatics.jax.org/searches/AMA.cgi?id=MA:0000003 \N
102 MaizeGDB MaizeGDB MaizeGDB Object ID Number http://www.maizegdb.org http://www.maizegdb.org/cgi-bin/id_search.cgi?id=[example_id] http://www.maizegdb.org/cgi-bin/id_search.cgi?id=881225 \N
103 MaizeGDB_Locus MaizeGDB Maize gene name http://www.maizegdb.org http://www.maizegdb.org/cgi-bin/displaylocusresults.cgi?term=[example_id] http://www.maizegdb.org/cgi-bin/displaylocusresults.cgi?term=ZmPK1 \N
104 MEDLINE The Medline literature database Identifier \N \N \N \N
105 MEROPS MEROPS – the Peptidase Database Identifier http://merops.sanger.ac.uk/ http://merops.sanger.ac.uk/cgi-bin/pepsum?mid=[example_id] http://merops.sanger.ac.uk/cgi-bin/pepsum?mid=A08.001 \N
106 MEROPS_fam MEROPS: The Peptidase Database Peptidase family identifier http://merops.sanger.ac.uk/ http://merops.sanger.ac.uk/cgi-bin/famsum?family=[example_id] http://merops.sanger.ac.uk/cgi-bin/famsum?family=m18 \N
107 MeSH Medical Subject Headings MeSH heading http://www.nlm.nih.gov/mesh/2005/MBrowser.html http://www.nlm.nih.gov/cgi/mesh/2005/MB_cgi?mode=&term=[example_id] http://www.nlm.nih.gov/cgi/mesh/2005/MB_cgi?mode=&term=mitosis \N
108 MetaCyc The Metabolic Encyclopedia of metabolic and other pathways Identifier (pathway or reaction) http://metacyc.org/ http://biocyc.org/META/NEW-IMAGE?type=NIL&object=[example_id] http://biocyc.org/META/NEW-IMAGE?type=NIL&object=GLUTDEG-PWY \N
109 MGD Mouse Genome Database Gene symbol http://www.informatics.jax.org/ \N \N \N
110 MGI Mouse Genome Informatics Accession number http://www.informatics.jax.org/ http://www.informatics.jax.org/searches/accession_report.cgi?id=MGI:[example_id] http://www.informatics.jax.org/searches/accession_report.cgi?id=MGI:80863 \N
111 MIPS_funcat MIPS Functional Catalogue Identifier http://mips.gsf.de/proj/funcatDB/ http://mips.gsf.de/cgi-bin/proj/funcatDB/search_advanced.pl?action=2&wert=[example_id] http://mips.gsf.de/cgi-bin/proj/funcatDB/search_advanced.pl?action=2&wert=11.02 \N
112 MO The MGED Ontology ontology term http://mged.sourceforge.net/ontologies/MGEDontology.php http://mged.sourceforge.net/ontologies/MGEDontology.php#[example_id] http://mged.sourceforge.net/ontologies/MGEDontology.php#Action \N
113 MultiFun MultiFun, a cellfunction assignment schema \N http://genprotec.mbl.edu/files/Multifun.html \N \N \N
114 NASC_code Nottingham Arabidopsis Stock Centre Seeds Database NASC code Identifier http://arabidopsis.info http://seeds.nottingham.ac.uk/NASC/stockatidb.lasso?code=[example_id] http://seeds.nottingham.ac.uk/NASC/stockatidb.lasso?code=N3371 \N
115 NC-IUBMB Nomenclature Committee of the International Union of Biochemistry and Molecular Biology \N http://www.chem.qmw.ac.uk/iubmb/ \N \N \N
116 NCBI National Center for Biotechnology Information, Bethesda Prefix http://www.ncbi.nlm.nih.gov/ \N \N \N
117 NCBI_Gene NCBI Gene Identifier http://www.ncbi.nlm.nih.gov/ http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Retrieve&db=gene&list_uids=[example_id] http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Retrieve&db=gene&list_uids=4771 \N
118 NCBI_gi NCBI databases Identifier http://www.ncbi.nlm.nih.gov/ http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=[example_id] http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=113194944 \N
119 NCBI_GP NCBI GenPept Protein identifier http://www.ncbi.nlm.nih.gov/ http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&val=[example_id] http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&val=EAL72968 \N
120 NCBI_NM NCBI RefSeq mRNA identifier http://www.ncbi.nlm.nih.gov/ \N \N \N
121 NCBI_NP NCBI RefSeq Protein identifier http://www.ncbi.nlm.nih.gov/ \N \N \N
122 NMPDR National Microbial Pathogen Data Resource Identifier http://www.nmpdr.org http://www.nmpdr.org/linkin.cgi?id=[example_id] http://www.nmpdr.org/linkin.cgi?id=fig|306254.1.peg.183 \N
123 OMIM Mendelian Inheritance in Man Identifier http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM http://www.ncbi.nlm.nih.gov/entrez/dispomim.cgi?id=[example_id] http://www.ncbi.nlm.nih.gov/htbin-post/entrez/dispomim.cgi?id=190198 \N
124 PAMGO Plant-Associated Microbe Gene Ontology Interest Group \N http://pamgo.vbi.vt.edu/ \N \N \N
125 PAMGO_GAT Genome Annotation Tool (Agrobacterium tumefaciens C58); PAMGO Interest Group Gene http://agro.vbi.vt.edu/public/ http://agro.vbi.vt.edu/public/servlet/GeneEdit?&Search=Search&level=2&genename=[example_id] http://agro.vbi.vt.edu/public/servlet/GeneEdit?&Search=Search&level=2&genename=atu0001 \N
126 PAMGO_MGG Magnaporthe grisea Database at North Carolina State University; member of PAMGO Interest Group Locus http://scotland.fgl.ncsu.edu/smeng/GoAnnotationMagnaporthegrisea.html http://scotland.fgl.ncsu.edu/cgi-bin/adHocQuery.cgi?adHocQuery_dbName=smeng_goannotation&Action=Data&QueryName=Functional+Categorization+of+MGG+GO+Annotation&P_DBObjectSymbol=&P_EvidenceCode=&P_Aspect=&P_DBObjectSynonym=&P_KeyWord=[example_id] http://scotland.fgl.ncsu.edu/cgi-bin/adHocQuery.cgi?adHocQuery_dbName=smeng_goannotation&Action=Data&QueryName=Functional+Categorization+of+MGG+GO+Annotation&P_DBObjectSymbol=&P_EvidenceCode=&P_Aspect=&P_DBObjectSynonym=&P_KeyWord=MGG_05132 \N
127 PAMGO_VMD Virginia Bioinformatics Institute Microbial Database; member of PAMGO Interest Group Gene identifier http://phytophthora.vbi.vt.edu http://vmd.vbi.vt.edu/cgi-bin/browse/go_detail.cgi?gene_id=[example_id] http://vmd.vbi.vt.edu/cgi-bin/browse/go_detail.cgi?gene_id=109198 \N
128 PANTHER Protein ANalysis THrough Evolutionary Relationships Classification System \N http://www.pantherdb.org/ \N \N \N
129 PATO Phenotypic quality ontology Identifier http://www.bioontology.org/wiki/index.php/PATO:Main_Page \N \N \N
130 PATRIC PathoSystems Resource Integration Center at the Virginia Bioinformatics Institute Feature identifier http://patric.vbi.vt.edu http://patric.vbi.vt.edu/gene/overview.php?fid=[example_id] http://patric.vbi.vt.edu/gene/overview.php?fid=cds.000002.436951 \N
131 PDB Protein Data Bank Identifier http://www.rcsb.org/pdb/ http://www.rcsb.org/pdb/cgi/explore.cgi?pdbId=[example_id] http://www.rcsb.org/pdb/cgi/explore.cgi?pdbId=1A4U \N
132 Pfam Pfam: Protein families database of alignments and HMMs Accession number http://www.sanger.ac.uk/Software/Pfam/ http://www.sanger.ac.uk/cgi-bin/Pfam/getacc?[example_id] http://www.sanger.ac.uk/cgi-bin/Pfam/getacc?PF00046 \N
133 PfamB Pfam-B supplement to Pfam Accession number http://www.sanger.ac.uk/Software/Pfam/ \N \N \N
134 PharmGKB_PA The Pharmacogenetics and Pharmacogenomics Knowledge Base \N http://www.pharmgkb.org http://www.pharmgkb.org/do/serve?objId=[example_id] http://www.pharmgkb.org/do/serve?objId=PA267 \N
135 PharmGKB_PGKB The Pharmacogenetics and Pharmacogenomics Knowledge Base \N http://www.pharmgkb.org http://www.pharmgkb.org/do/serve?objId=[example_id] http://www.pharmgkb.org/do/serve?objId=PA267 \N
136 PINC Proteome Inc.; represents GO annotations created in 2001 for NCBI and extracted into GOA from EntrezGene \N http://www.proteome.com/ \N \N \N
137 PIR Protein Information Resource Accession number http://pir.georgetown.edu/ http://pir.georgetown.edu/cgi-bin/pirwww/nbrfget?uid=[example_id] http://pir.georgetown.edu/cgi-bin/pirwww/nbrfget?uid=I49499 \N
138 PIRSF PIR Superfamily Classification System Identifier http://pir.georgetown.edu/pirsf/ http://pir.georgetown.edu/cgi-bin/ipcSF?id=[example_id] http://pir.georgetown.edu/cgi-bin/ipcSF?id=SF002327 \N
139 PMCID Pubmed Central Identifier http://www.pubmedcentral.nih.gov/ http://www.ncbi.nlm.nih.gov/sites/entrez?db=pmc&cmd=search&term=[example_id] http://www.ncbi.nlm.nih.gov/sites/entrez?db=pmc&cmd=search&term=PMC201377 \N
140 PMID PubMed Identifier http://www.ncbi.nlm.nih.gov/PubMed/ http://www.ncbi.nlm.nih.gov/pubmed/[example_id] http://www.ncbi.nlm.nih.gov/pubmed/4208797 \N
141 PO Plant Ontology Consortium Database Identifier http://www.plantontology.org/ http://www.plantontology.org/amigo/go.cgi?action=query&view=query&search_constraint=terms&query=PO:[example_id] http://www.plantontology.org/amigo/go.cgi?action=query&view=query&search_constraint=terms&query=PO:0009004 \N
142 POC Plant Ontology Consortium \N \N \N \N \N
143 Pompep Schizosaccharomyces pombe protein data Gene/protein identifier ftp://ftp.sanger.ac.uk/pub/yeast/pombe/Protein_data/ \N \N \N
144 PPI The Pseudomonas syringae community annotation project \N http://genome.pseudomonas-syringae.org/ \N \N \N
145 PRINTS PRINTS compendium of protein fingerprints Accession http://www.bioinf.manchester.ac.uk/dbbrowser/PRINTS/ http://www.bioinf.manchester.ac.uk/cgi-bin/dbbrowser/sprint/searchprintss.cgi?display_opts=Prints&category=None&queryform=false&regexpr=off&prints_accn=[example_id] http://www.bioinf.manchester.ac.uk/cgi-bin/dbbrowser/sprint/searchprintss.cgi?display_opts=Prints&category=None&queryform=false&regexpr=off&prints_accn=PR00025 \N
146 ProDom ProDom protein domain families automatically generated from Swiss-Prot and TrEMBL Accession http://prodes.toulouse.inra.fr/prodom/current/html/home.php http://prodes.toulouse.inra.fr/prodom/current/cgi-bin/request.pl?question=DBEN&query=[example_id] http://prodes.toulouse.inra.fr/prodom/current/cgi-bin/request.pl?question=DBEN&query=PD000001 \N
147 Prosite Prosite. Database of protein families and domains Accession number http://www.expasy.ch/prosite/ http://www.expasy.ch/cgi-bin/prosite-search-ac?[example_id] http://www.expasy.ch/cgi-bin/prosite-search-ac?PS00365 \N
148 protein_id The protein identifier shared by DDBJ/EMBL-bank/GenBank nucleotide sequence databases Identifier \N \N \N \N
149 PROW Protein Reviews on the Web \N http://www.ncbi.nlm.nih.gov/prow/ \N \N \N
150 PseudoCAP Pseudomonas Genome Project Identifier http://v2.pseudomonas.com/ http://v2.pseudomonas.com/getAnnotation.do?locusID=[example_id] http://v2.pseudomonas.com/getAnnotation.do?locusID=PA4756 \N
151 PSI-MI Proteomic Standard Initiative for Molecular Interaction Interaction identifier http://psidev.sourceforge.net/mi/xml/doc/user/index.html \N \N \N
152 PSI-MOD Proteomics Standards Initiative protein modification ontology Protein modification identifier http://psidev.sourceforge.net/mod/ http://www.ebi.ac.uk/ontology-lookup/?termId=MOD:[example_id] http://www.ebi.ac.uk/ontology-lookup/?termId=MOD:00219 \N
153 PSORT PSORT protein subcellular localization databases and prediction tools for bacteria \N http://www.psort.org/ \N \N \N
154 pTARGET pTARGET Prediction server for protein subcellular localization \N http://bioinformatics.albany.edu/~ptarget/ \N \N \N
155 PubChem_BioAssay NCBI PubChem database of bioassay records Identifier http://pubchem.ncbi.nlm.nih.gov/ http://pubchem.ncbi.nlm.nih.gov/assay/assay.cgi?aid=[example_id] http://pubchem.ncbi.nlm.nih.gov/assay/assay.cgi?aid=177 \N
156 PubChem_Compound NCBI PubChem database of chemical structures Identifier http://pubchem.ncbi.nlm.nih.gov/ http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search&DB=pccompound&term=[example_id] http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search&DB=pccompound&term=2244 \N
157 PubChem_Substance NCBI PubChem database of chemical substances Identifier http://pubchem.ncbi.nlm.nih.gov/ http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search&DB=pcsubstance&term=[example_id] http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search&DB=pcsubstance&term=4594 \N
158 Reactome Reactome – a curated knowledgebase of biological pathways Identifier http://www.reactome.org/ http://www.reactome.org/cgi-bin/eventbrowser_st_id?ST_ID=[example_id] http://www.reactome.org/cgi-bin/eventbrowser_st_id?ST_ID=REACT_1240.1 \N
159 REBASE REBASE, The Restriction Enzyme Database Restriction enzyme name http://rebase.neb.com/rebase/rebase.html http://rebase.neb.com/rebase/enz/[example_id].html http://rebase.neb.com/rebase/enz/EcoRI.html \N
160 RefSeq RefSeq Identifier http://www.ncbi.nlm.nih.gov/RefSeq/ http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=[example_id] http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=XP_001068954 \N
161 RefSeq_NA RefSeq (Nucleic Acid) Identifier http://www.ncbi.nlm.nih.gov/RefSeq/ http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=[example_id] http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=NC_000913 \N
162 RefSeq_Prot RefSeq (Protein) Identifier http://www.ncbi.nlm.nih.gov/RefSeq/ http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=[example_id] http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=YP_498627 \N
163 RESID RESID Database of Protein Modifications Identifier ftp://ftp.ncifcrf.gov/pub/users/residues/ \N \N \N
164 RGD Rat Genome Database Accession Number http://rgd.mcw.edu/ http://rgd.mcw.edu/generalSearch/RgdSearch.jsp?quickSearch=1&searchKeyword=[example_id] http://rgd.mcw.edu/generalSearch/RgdSearch.jsp?quickSearch=1&searchKeyword=2004 \N
165 RI Roslin Institute \N http://www.roslin.ac.uk/ \N \N \N
166 RNAmods The RNA Modification Database Identifier http://medlib.med.utah.edu/RNAmods/ http://medlib.med.utah.edu/cgi-bin/rnashow.cgi?[example_id] http://medlib.med.utah.edu/cgi-bin/rnashow.cgi?037 \N
167 Sanger The Wellcome Trust Sanger Institute \N http://www.sanger.ac.uk/ \N \N \N
168 SEED The SEED; The Project to Annotate the First 1000 Sequenced Genomes, Develop Detailed Metabolic Reconstructions, and Construct the Corresponding Stoichiometric Matrices Identifier http://www.theseed.org http://www.theseed.org/linkin.cgi?id=[example_id] http://www.theseed.org/linkin.cgi?id=fig|83331.1.peg.1 \N
169 SGD Saccharomyces Genome Database Identifier for SGD Loci http://www.yeastgenome.org/ http://db.yeastgenome.org/cgi-bin/locus.pl?dbid=[example_id] http://db.yeastgenome.org/cgi-bin/locus.pl?dbid=S000006169 \N
170 SGD_LOCUS Saccharomyces Genome Database Gene name (gene symbol in mammalian nomenclature) http://www.yeastgenome.org/ http://db.yeastgenome.org/cgi-bin/locus.pl?locus=[example_id] http://db.yeastgenome.org/cgi-bin/locus.pl?locus=GAL4 \N
171 SGD_REF Saccharomyces Genome Database Literature Reference Identifier http://www.yeastgenome.org/ http://db.yeastgenome.org/cgi-bin/reference/reference.pl?dbid=[example_id] http://db.yeastgenome.org/cgi-bin/reference/reference.pl?dbid=S000049602 \N
172 SGN Sol Genomics Network Gene identifier http://www.sgn.cornell.edu/ http://www.sgn.cornell.edu/phenome/locus_display.pl?locus_id=[example_id] http://www.sgn.cornell.edu/phenome/locus_display.pl?locus_id=4476 \N
173 SGN_ref Sol Genomics Network Reference identifier http://www.sgn.cornell.edu/ http://www.sgn.cornell.edu/chado/publication.pl?pub_id=[example_id] http://www.sgn.cornell.edu/chado/publication.pl?pub_id=861 \N
174 SMART Simple Modular Architecture Research Tool Accession http://smart.embl-heidelberg.de/ http://smart.embl-heidelberg.de/smart/do_annotation.pl?BLAST=DUMMY&DOMAIN=[example_id] http://smart.embl-heidelberg.de/smart/do_annotation.pl?BLAST=DUMMY&DOMAIN=SM00005 \N
175 SMD Stanford Microarray Database \N http://genome-www.stanford.edu/microarray \N \N \N
176 SO Sequence Ontology Identifier http://sequenceontology.org/ http://song.sourceforge.net/SOterm_tables.html#[example_id] http://song.sourceforge.net/SOterm_tables.html#SO:0000195 \N
177 SP_KW UniProt Knowledgebase keywords Identifier http://www.uniprot.org/keywords/ http://www.uniprot.org/keywords/[example_id] http://www.uniprot.org/keywords/KW-0812 \N
178 SP_SL UniProt Subcellular Location vocabulary Identifier http://beta.uniprot.org/docs/subcell \N \N \N
179 SUBTILIST Bacillus subtilis Genome Sequence Project Accession number http://genolist.pasteur.fr/SubtiList/ \N \N \N
180 SUBTILISTG Bacillus subtilis Genome Sequence Project Gene symbol http://genolist.pasteur.fr/SubtiList/ \N \N \N
181 SUPERFAMILY SUPERFAMILY. A database of structural and functional protein annotations for completely sequenced genomes \N http://supfam.mrc-lmb.cam.ac.uk/SUPERFAMILY/ \N \N \N
182 Swiss-Prot UniProtKB/Swiss-Prot, a curated protein sequence database which provides a high level of annotation and a minimal level of redundancy Accession number http://www.uniprot.org http://www.ebi.uniprot.org/entry/[example_id] http://www.ebi.uniprot.org/entry/P51587 \N
183 TAIR The Arabidopsis Information Resource Accession number http://www.arabidopsis.org/ http://arabidopsis.org/servlets/TairObject?accession=[example_id] http://arabidopsis.org/servlets/TairObject?accession=gene:2062713 \N
184 taxon NCBI Taxonomy Identifier http://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html/ http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=[example_id] http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=3702 \N
185 TC The Transport Protein Database Identifier http://www.tcdb.org/ http://www.tcdb.org/tcdb/index.php?tc=[example_id] http://www.tcdb.org/tcdb/index.php?tc=9.A.4.1.1 \N
186 TGD Tetrahymena Genome Database \N http://www.ciliate.org/ \N \N \N
187 TGD_LOCUS Tetrahymena Genome Database Gene name (gene symbol in mammalian nomenclature) http://www.ciliate.org/ http://db.ciliate.org/cgi-bin/locus.pl?locus=[example_id] http://db.ciliate.org/cgi-bin/locus.pl?locus=PDD1 \N
188 TGD_REF Tetrahymena Genome Database Literature Reference Identifier http://www.ciliate.org/ http://db.ciliate.org/cgi-bin/reference/reference.pl?dbid=[example_id] http://db.ciliate.org/cgi-bin/reference/reference.pl?dbid=T000005818 \N
189 TRAIT TRAnscript Integrated Table, an integrated database of transcripts expressed in human skeletal muscle \N http://muscle.cribi.unipd.it/ \N \N \N
190 TRANSFAC TRANSFAC database of eukaryotic transcription factors \N http://www.gene-regulation.com/pub/databases.html#transfac \N \N \N
191 TrEMBL UniProtKB-TrEMBL, a computer-annotated protein sequence database supplementing UniProtKB and containing the translations of all coding sequences (CDS) present in the EMBL Nucleotide Sequence Database but not yet integrated in UniProtKB/Swiss-Prot Accession number http://www.uniprot.org http://www.ebi.uniprot.org/entry/[example_id] http://www.ebi.uniprot.org/entry/O31124 \N
192 UM-BBD The University of Minnesota Biocatalysis/Biodegradation Database Prefix http://umbbd.msi.umn.edu/ \N \N \N
193 UM-BBD_enzymeID The University of Minnesota Biocatalysis/Biodegradation Database Enzyme identifier http://umbbd.msi.umn.edu/ http://umbbd.msi.umn.edu/servlets/pageservlet?ptype=ep&enzymeID=[example_id] http://umbbd.msi.umn.edu/servlets/pageservlet?ptype=ep&enzymeID=e0230 \N
194 UM-BBD_pathwayID The University of Minnesota Biocatalysis/Biodegradation Database Pathway identifier http://umbbd.msi.umn.edu/ \N http://umbbd.msi.umn.edu/acr/acr_map.html \N
195 UM-BBD_reactionID The University of Minnesota Biocatalysis/Biodegradation Database Reaction identifier http://umbbd.msi.umn.edu/ http://umbbd.msi.umn.edu/servlets/pageservlet?ptype=r&reacID=[example_id] http://umbbd.msi.umn.edu/servlets/pageservlet?ptype=r&reacID=r0129 \N
196 UM-BBD_ruleID The University of Minnesota Biocatalysis/Biodegradation Database Rule identifier http://umbbd.msi.umn.edu/ http://umbbd.msi.umn.edu/servlets/rule.jsp?rule=[example_id] http://umbbd.msi.umn.edu/servlets/rule.jsp?rule=bt0330 \N
197 UniParc UniProt Archive; a non-redundant archive of protein sequences extracted from Swiss-Prot, TrEMBL, PIR-PSD, EMBL, Ensembl, IPI, PDB, RefSeq, FlyBase, WormBase, European Patent Office, United States Patent and Trademark Office, and Japanese Patent Office Accession number http://www.ebi.ac.uk/uniparc/ http://www.ebi.ac.uk/cgi-bin/dbfetch?db=uniparc&id=[example_id] http://www.ebi.ac.uk/cgi-bin/dbfetch?db=uniparc&id=UPI000000000A \N
198 UniProtKB The Universal Protein Knowledgebase, a central repository of protein sequence and function created by joining the information contained in Swiss-Prot, TrEMBL, and PIR Accession number http://www.uniprot.org http://www.ebi.uniprot.org/entry/[example_id] http://www.ebi.uniprot.org/entry/P51587 \N
199 VBRC Viral Bioinformatics Resource Center Identifier http://vbrc.org http://vbrc.org/query.asp?web_id=VBRC:[example_id] http://vbrc.org/query.asp?web_id=VBRC:F35742 \N
200 VEGA The Vertebrate Genome Annotation database Identifier http://vega.sanger.ac.uk/index.html http://vega.sanger.ac.uk/perl/searchview?species=all&idx=All&q=[example_id] http://vega.sanger.ac.uk/perl/searchview?species=all&idx=All&q=OTTHUMP00000000661 \N
201 VIDA Virus Database at University College London \N http://www.biochem.ucl.ac.uk/bsm/virus_database/VIDA.html \N \N \N
202 VMD Virginia Bioinformatics Institute Microbial Database Gene identifier http://phytophthora.vbi.vt.edu http://vmd.vbi.vt.edu/cgi-bin/browse/browserDetail_new.cgi?gene_id=[example_id] http://vmd.vbi.vt.edu/cgi-bin/browse/browserDetail_new.cgi?gene_id=109198 \N
203 WB WormBase, database of nematode biology Gene identifier http://www.wormbase.org/ http://www.wormbase.org/db/gene/gene?name=[example_id] http://www.wormbase.org/db/get?class=Gene;name=WBGene00003001 \N
204 WB_REF WormBase, database of nematode biology Literature Reference Identifier http://www.wormbase.org/ http://www.wormbase.org/db/misc/paper?name=[example_id] http://www.wormbase.org/db/misc/paper?name=WBPaper00004823 \N
205 WP Wormpep, database of proteins of C. elegans Identifier http://www.wormbase.org/ http://www.wormbase.org/db/get?class=Protein;name=WP:[example_id] http://www.wormbase.org/db/get?class=Protein;name=WP:CE15104 \N
206 YeastFunc Yeast Function \N http://func.med.harvard.edu/yeast/ \N \N \N
207 ZFIN The Zebrafish Information Network Accession ID http://zfin.org/ http://zfin.org/cgi-bin/ZFIN_jump?record=[example_id] http://zfin.org/cgi-bin/ZFIN_jump?record=ZDB-GENE-990415-103 \N
208 FlyBase \N \N \N \N \N \N
209 Roslin_Institute \N \N \N \N \N \N
210 UniProt \N \N \N \N \N \N
211 TIGR \N \N \N \N \N \N
212 WormBase \N \N \N \N \N \N
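Each row above pairs a URL template containing the placeholder [example_id] with a worked example URL, and \N marks an empty (NULL) field. Turning a template and an identifier into a link can be sketched in Python as follows. `build_url` is a hypothetical helper, and treating \N as "no template" is an assumption based on how MySQL dumps write NULL.

```python
from typing import Optional

PLACEHOLDER = "[example_id]"
NULL_MARKER = r"\N"  # how NULL appears in the dumped rows


def build_url(url_syntax: Optional[str], identifier: str) -> Optional[str]:
    """Fill the [example_id] placeholder of a URL template with an identifier.

    Returns None when the row has no template (NULL) or the template
    does not contain the placeholder.
    """
    if url_syntax is None or url_syntax == NULL_MARKER:
        return None
    if PLACEHOLDER not in url_syntax:
        return None
    return url_syntax.replace(PLACEHOLDER, identifier)


if __name__ == "__main__":
    # Template and identifier taken from the PMID row of the table.
    print(build_url("http://www.ncbi.nlm.nih.gov/pubmed/[example_id]", "4208797"))
```

Identifiers are inserted verbatim here. Values containing characters such as `|` (see the NMPDR and SEED rows) may need URL-encoding, depending on the target site.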
bioinformatics database1
1 AgBase AgBase resource for functional analysis of agricultural plant and animal gene products \N http://www.agbase.msstate.edu/ http://www.agbase.msstate.edu/cgi-bin/getEntry.pl?db_pick=[ChickGO/MaizeGO]&uid=[example_id] \N \N
2 AGI_LocusCode Arabidopsis Genome Initiative (TAIR, TIGR, MIPS) Locus identifier http://www.arabidopsis.org http://arabidopsis.org/servlets/TairObject?type=locus&name=[example_id] http://arabidopsis.org/servlets/TairObject?type=locus&name=At2g17950 \N
3 AGRICOLA_ID AGRICultural OnLine Access AGRICOLA call number http://agricola.nal.usda.gov/ \N \N \N
4 AGRICOLA_IND AGRICultural OnLine Access AGRICOLA IND number http://agricola.nal.usda.gov/ \N \N \N
5 ApiDB_PlasmoDB PlasmoDB Plasmodium Genome Resource PlasmoDB Gene ID http://plasmodb.org/ http://www.plasmodb.org/gene/[example_id] http://www.plasmodb.org/gene/PF11_0344 \N
6 AraCyc AraCyc metabolic pathway database for Arabidopsis thaliana Identifier http://www.arabidopsis.org/biocyc/index.jsp http://www.arabidopsis.org:1555/ARA/NEW-IMAGE?type=NIL&object=[example_id] http://www.arabidopsis.org:1555/ARA/NEW-IMAGE?type=NIL&object=PWYQT-62 \N
7 ASAP A Systematic Annotation Package for Community Analysis of Genomes Feature identifier https://asap.ahabs.wisc.edu/annotation/php/ASAP1.htm https://asap.ahabs.wisc.edu/annotation/php/feature_info.php?FeatureID=[example_id] https://asap.ahabs.wisc.edu/annotation/php/feature_info.php?FeatureID=ABE-0000008 \N
8 BHF-UCL Cardiovascular Gene Ontology Annotation Initiative; supported by the British Heart Foundation (BHF) at University College London (UCL) \N http://www.cardiovasculargeneontology.com \N \N \N
9 BIOMD BioModels Database Accession http://www.ebi.ac.uk/biomodels/ http://www.ebi.ac.uk/compneur-srv/biomodels-main/publ-model.do?mid=[example_id] http://www.ebi.ac.uk/compneur-srv/biomodels-main/publ-model.do?mid=BIOMD0000000045 \N
10 bioPIXIE_MEFIT biological Process Inference from eXperimental Interaction Evidence/Microarray Experiment Functional Integration Technology \N http://avis.princeton.edu/mefit/ \N \N \N
11 BIOSIS BIOSIS previews Identifier http://www.biosis.org/ \N \N \N
12 BRENDA BRENDA, The Comprehensive Enzyme Information System EC enzyme identifier http://www.brenda.uni-koeln.de/ http://www.brenda.uni-koeln.de/php/result_flat.php4?ecno=[example_id] http://www.brenda.uni-koeln.de/php/result_flat.php4?ecno=4.2.1.3 \N
13 Broad Broad Institute \N http://www.broad.mit.edu/ \N \N \N
14 Broad_MGG Magnaporthe grisea Database at the Broad Institute Locus http://www.broad.mit.edu/annotation/genome/magnaporthe_grisea/Home.html http://www.broad.mit.edu/annotation/genome/magnaporthe_grisea/GeneLocus.html?sp=S[example_id] http://www.broad.mit.edu/annotation/genome/magnaporthe_grisea/GeneLocus.html?sp=SMGG_05132 \N
15 CASGEN Catalog of Fishes genus database Identifier http://research.calacademy.org/research/ichthyology/catalog/fishcatsearch.html http://research.calacademy.org/research/ichthyology/catalog/getname.asp?rank=Genus&id=[example_id] http://research.calacademy.org/research/ichthyology/catalog/getname.asp?rank=Genus&id=1040 \N
16 CASSPC Catalog of Fishes species database Identifier http://research.calacademy.org/research/ichthyology/catalog/fishcatsearch.html http://research.calacademy.org/research/ichthyology/catalog/getname.asp?rank=Species&id=[example_id] http://research.calacademy.org/research/ichthyology/catalog/getname.asp?rank=Species&id=1979 \N
17 CBS Center for Biological Sequence Analysis prediction tool http://www.cbs.dtu.dk/ \N http://www.cbs.dtu.dk/services/[example_id]/ \N
18 CDD Conserved Domain Database at NCBI Identifier http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=cdd http://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi?uid=[example_id] http://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi?uid=34222 \N
19 CGD Candida Genome Database Identifier for CGD Loci http://www.candidagenome.org/ http://www.candidagenome.org/cgi-bin/locus.pl?dbid=[example_id] http://www.candidagenome.org/cgi-bin/locus.pl?dbid=CAL0005516 \N
20 CGD_LOCUS Candida Genome Database Gene name (gene symbol in mammalian nomenclature) http://www.candidagenome.org/ http://www.candidagenome.org/cgi-bin/locus.pl?locus=[example_id] http://www.candidagenome.org/cgi-bin/locus.pl?locus=HWP1 \N
21 CGD_REF Candida Genome Database Literature Reference Identifier http://www.candidagenome.org/ http://www.candidagenome.org/cgi-bin/reference/reference.pl?refNo=[example_id] http://www.candidagenome.org/cgi-bin/reference/reference.pl?refNo=1490 \N
22 CGEN Compugen Gene Ontology Gene Association Data Identifier http://www.cgen.com/ \N \N \N
23 CGSC CGSC: E.coli Genetic Stock Center Gene symbol http://cgsc.biology.yale.edu/ \N http://cgsc.biology.yale.edu/Site.php?ID=315 \N
24 ChEBI Chemical Entities of Biological Interest Identifier http://www.ebi.ac.uk/chebi/ http://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:[example_id] http://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:17234 \N
25 CL Cell Type Ontology Identifier https://lists.sourceforge.net/lists/listinfo/obo-cell-type \N \N \N
26 COG NCBI Clusters of Orthologous Groups \N http://www.ncbi.nlm.nih.gov/COG/ \N \N \N
27 COG_Cluster NCBI COG cluster Identifier http://www.ncbi.nlm.nih.gov/COG/ http://www.ncbi.nlm.nih.gov/COG/new/release/cow.cgi?cog=[example_id] http://www.ncbi.nlm.nih.gov/COG/new/release/cow.cgi?cog=COG0001 \N
28 COG_Function NCBI COG function Identifier http://www.ncbi.nlm.nih.gov/COG/ http://www.ncbi.nlm.nih.gov/COG/grace/shokog.cgi?fun=[example_id] http://www.ncbi.nlm.nih.gov/COG/grace/shokog.cgi?fun=H \N
29 COG_Pathway NCBI COG pathway Identifier http://www.ncbi.nlm.nih.gov/COG/ http://www.ncbi.nlm.nih.gov/COG/new/release/coglist.cgi?pathw=[example_id] http://www.ncbi.nlm.nih.gov/COG/new/release/coglist.cgi?pathw=14 \N
30 CORUM CORUM – the Comprehensive Resource of Mammalian protein complexes Identifier http://mips.gsf.de/genre/proj/corum/ http://mips.gsf.de/genre/proj/corum/complexdetails.html?id=[example_id] http://mips.gsf.de/genre/proj/corum/complexdetails.html?id=837 \N
31 dictyBase dictyBase Identifier http://dictybase.org http://dictybase.org/db/cgi-bin/gene_page.pl?dictybaseid=[example_id] http://dictybase.org/db/cgi-bin/gene_page.pl?dictybaseid=DDB0001836 \N
32 dictyBase_gene_name dictyBase Gene name http://dictybase.org http://dictybase.org/db/cgi-bin/gene_page.pl?gene_name=[example_id] http://dictybase.org/db/cgi-bin/gene_page.pl?gene_name=mlcE \N
33 dictyBase_REF dictyBase literature references Literature Reference Identifier http://dictybase.org http://dictybase.org/db/cgi-bin/dictyBase/reference/reference.pl?refNo=[example_id] http://dictybase.org/db/cgi-bin/dictyBase/reference/reference.pl?refNo=10157 \N
34 DOI Digital Object Identifier Identifier http://dx.doi.org/ http://dx.doi.org/[example_id] http://dx.doi.org/DOI:10.1016/S0963-9969(99)00021-6 \N
35 EC The Enzyme Commission \N http://ww.expasy.org/enzyme/ http://www.expasy.org/enzyme/[example_id] http://www.expasy.org/enzyme/1.4.3.6 \N
36 EchoBASE EchoBASE post-genomic database for Escherichia coli Identifier http://www.ecoli-york.org/ http://www.biolws1.york.ac.uk/echobase/Gene.cfm?recordID=[example_id] http://www.biolws1.york.ac.uk/echobase/Gene.cfm?recordID=EB0231 \N
37 ECK The EcoGene Database of Escherichia coli Sequence and Function ECK accession number (E. coli K-12 gene identifier) http://www.ecogene.org/ http://www.ecogene.org/geneInfo.php?eck_id=[example_id] http://www.ecogene.org/geneInfo.php?eck_id=ECK3746 \N
38 EcoCyc The Encyclopedia of E. coli metabolism Pathway identifier http://ecocyc.org/ http://biocyc.org/ECOLI/NEW-IMAGE?type=PATHWAY&object=[example_id] http://biocyc.org/ECOLI/NEW-IMAGE?type=PATHWAY&object=P2-PWY \N
39 EcoCyc_REF The Encyclopedia of E. coli metabolism Reference identifier http://ecocyc.org/ http://biocyc.org/ECOLI/reference.html?type=CITATION-FRAME&object=[example_id] http://biocyc.org/ECOLI/reference.html?type=CITATION-FRAME&object=COLISALII \N
40 ECOGENE The EcoGene Database of Escherichia coli Sequence and Function EcoGene Accession Number http://www.ecogene.org/ http://www.ecogene.org/geneInfo.php?eg_id=[example_id] http://www.ecogene.org/geneInfo.php?eg_id=EG10818 \N
41 ECOGENE_G The EcoGene Database of Escherichia coli Sequence and Function EcoGene Primary Gene Name http://www.ecogene.org/ \N \N \N
42 EcoliWiki EcoliWiki, EcoliHub’s subsystem for community annotation of E. coli K-12 \N http://ecoliwiki.net/ \N \N \N
43 EMBL International Nucleotide Sequence Database Collaboration, comprising EMBL-EBI International Nucleotide Sequence Data Library (EMBL-Bank), DNA DataBank of Japan (DDBJ), and NCBI GenBank Sequence accession number http://www.ebi.ac.uk/embl/ http://www.ebi.ac.uk/cgi-bin/emblfetch?style=html&Submit=Go&id=[example_id] http://www.ebi.ac.uk/cgi-bin/emblfetch?style=html&Submit=Go&id=AA816246 \N
44 DDBJ International Nucleotide Sequence Database Collaboration, comprising EMBL-EBI International Nucleotide Sequence Data Library (EMBL-Bank), DNA DataBank of Japan (DDBJ), and NCBI GenBank Sequence accession number http://www.ddbj.nig.ac.jp/ http://arsa.ddbj.nig.ac.jp/arsa/ddbjSplSearch?KeyWord=[example_id] http://arsa.ddbj.nig.ac.jp/arsa/ddbjSplSearch?KeyWord=AA816246 \N
45 GenBank International Nucleotide Sequence Database Collaboration, comprising EMBL-EBI International Nucleotide Sequence Data Library (EMBL-Bank), DNA DataBank of Japan (DDBJ), and NCBI GenBank Sequence accession number http://www.ncbi.nlm.nih.gov/Genbank/ http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=[example_id] http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=AA816246 \N
46 ENSEMBL Database of automatically annotated genomic data Identifier http://www.ensembl.org/ http://www.ensembl.org/perl/protview?peptide=[example_id] http://www.ensembl.org/perl/protview?peptide=ENSP00000265949 \N
47 ENZYME The Swiss Institute of Bioinformatics database of Enzymes Identifier http://www.expasy.ch/ http://www.expasy.ch/cgi-bin/nicezyme.pl?[example_id] http://www.expasy.ch/cgi-bin/nicezyme.pl?1.1.1.1 \N
48 FB FlyBase Gene identifier http://flybase.org/ http://flybase.org/reports/[example_id].html http://flybase.org/reports/FBgn0000024.html \N
49 GDB Human Genome Database Accession number http://www.gdb.org/ http://www.gdb.org/gdb-bin/genera/accno?accessionNum=GDB:[example_id] http://www.gdb.org/gdb-bin/genera/accno?accessionNum=GDB:306600 \N
50 Gene3D Domain Architecture Classification \N http://gene3d.biochem.ucl.ac.uk/Gene3D/ \N \N \N
51 GeneDB_Gmorsitans GeneDB_Gmorsitans Gene identifier http://www.genedb.org/genedb/glossina/ http://www.genedb.org/genedb/Search?organism=glossina&name=[example_id] http://www.genedb.org/genedb/Search?organism=glossina&name=Gmm-0142 \N
52 GeneDB_Lmajor GeneDB_Lmajor Gene identifier http://www.genedb.org/genedb/leish/ http://www.genedb.org/genedb/Search?organism=leish&name=[example_id] http://www.genedb.org/genedb/Search?organism=leish&name=LM5.32 \N
53 GeneDB_Pfalciparum GeneDB_Pfalciparum Gene identifier http://www.genedb.org/genedb/malaria/ http://www.genedb.org/genedb/Search?organism=malaria&name=[example_id] http://www.genedb.org/genedb/Search?organism=malaria&name=PFD0755c \N
54 GeneDB_Spombe GeneDB_Spombe Gene identifier http://www.genedb.org/genedb/pombe/ http://www.genedb.org/genedb/Search?organism=pombe&name=[example_id] http://www.genedb.org/genedb/Search?organism=pombe&name=SPAC890.04C \N
55 GeneDB_Tbrucei GeneDB_Tbrucei Gene identifier http://www.genedb.org/genedb/tryp/ http://www.genedb.org/genedb/Search?organism=tryp&name=[example_id] http://www.genedb.org/genedb/Search?organism=tryp&name=Tb927.1.5250 \N
56 GenProtEC GenProtEC E. coli genome and proteome database \N http://genprotec.mbl.edu/ \N \N \N
57 GermOnline GermOnline \N http://www.germonline.org/ \N \N \N
58 GO Gene Ontology Database Identifier http://amigo.geneontology.org/ http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:[example_id] http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0004352 \N
59 GO_REF Gene Ontology Database references Accession (for reference) http://www.geneontology.org/ http://www.geneontology.org/cgi-bin/references.cgi#GO_REF:[example_id] http://www.geneontology.org/cgi-bin/references.cgi#GO_REF:0000001 \N
60 GOA GO Annotation at EBI \N http://www.ebi.ac.uk/goa/ \N \N \N
61 GOC Gene Ontology Consortium \N http://www.geneontology.org/ \N \N \N
62 GR Gramene: A Comparative Mapping Resource for Grains Identifier (any) http://www.gramene.org/ http://www.gramene.org/db/searches/browser?search_type=All&RGN=on&query=[example_id] http://www.gramene.org/db/searches/browser?search_type=All&RGN=on&query=sd1 \N
63 GR_GENE Gramene: A Comparative Mapping Resource for Grains Gene identifier http://www.gramene.org/ http://www.gramene.org/db/genes/search_gene?acc=[example_id] http://www.gramene.org/db/genes/search_gene?acc=GR:0060198 \N
64 GR_PROTEIN Gramene: A Comparative Mapping Resource for Grains Protein identifier http://www.gramene.org/ http://www.gramene.org/db/protein/protein_search?acc=[example_id] http://www.gramene.org/db/protein/protein_search?acc=Q6VSV0 \N
65 GR_QTL Gramene: A Comparative Mapping Resource for Grains QTL identifier http://www.gramene.org/ http://www.gramene.org/db/qtl/qtl_display?qtl_accession_id=[example_id] http://www.gramene.org/db/qtl/qtl_display?qtl_accession_id=CQU7 \N
66 GR_REF Gramene: A Comparative Mapping Resource for Grains Reference http://www.gramene.org/ http://www.gramene.org/db/literature/pub_search?ref_id=[example_id] http://www.gramene.org/db/literature/pub_search?ref_id=659 \N
67 H-invDB H-invitational Database \N http://www.h-invitational.jp/ \N \N \N
68 H-invDB_cDNA H-invitational Database Accession http://www.h-invitational.jp/ http://www.h-invitational.jp/hinv/spsoup/transcript_view?acc_id=[example_id] http://www.h-invitational.jp/hinv/spsoup/transcript_view?acc_id=AK093149 \N
69 H-invDB_locus H-invitational Database Cluster identifier http://www.h-invitational.jp/ http://www.h-invitational.jp/hinv/spsoup/locus_view?hix_id=[example_id] http://www.h-invitational.jp/hinv/spsoup/locus_view?hix_id=HIX0014446 \N
70 HAMAP High-quality Automated and Manual Annotation of microbial Proteomes Identifier http://us.expasy.org/sprot/hamap/ http://us.expasy.org/unirules/[example_id] http://us.expasy.org/unirules/MF_00031 \N
71 HGNC HUGO Gene Nomenclature Committee Identifier http://www.genenames.org/ http://www.genenames.org/data/hgnc_data.php?hgnc_id=HGNC:[example_id] http://www.genenames.org/data/hgnc_data.php?hgnc_id=HGNC:29 \N
72 HGNC_gene HUGO Gene Nomenclature Committee Gene symbol http://www.genenames.org/ http://www.genenames.org/data/hgnc_data.php?app_sym=[example_id] http://www.genenames.org/data/hgnc_data.php?app_sym=ABCA1 \N
73 HPA Human Protein Atlas tissue profile information Identifier http://www.proteinatlas.org/ http://www.proteinatlas.org/tissue_profile.php?antibody_id=[example_id] http://www.proteinatlas.org/tissue_profile.php?antibody_id=HPA000237 \N
74 HPA_antibody Human Protein Atlas antibody information Identifier http://www.proteinatlas.org/ http://www.proteinatlas.org/antibody_info.php?antibody_id=[example_id] http://www.proteinatlas.org/antibody_info.php?antibody_id=HPA000237 \N
75 HUGO Human Genome Organisation \N http://www.hugo-international.org/ \N \N \N
76 IMG Integrated Microbial Genomes; JGI web site for genome annotation Identifier http://img.jgi.doe.gov http://img.jgi.doe.gov/cgi-bin/pub/main.cgi?section=GeneDetail&page=geneDetail&gene_oid=[example_id] http://img.jgi.doe.gov/cgi-bin/pub/main.cgi?section=GeneDetail&page=geneDetail&gene_oid=640008772 \N
77 IMGT_HLA Immunogenetics database, human MHC \N http://www.ebi.ac.uk/imgt/hla \N \N \N
78 IMGT_LIGM Immunogenetics database, immunoglobulins and T-cell receptors \N http://imgt.cines.fr \N \N \N
79 IntAct IntAct protein interaction database Accession http://www.ebi.ac.uk/intact/ http://www.ebi.ac.uk/intact/search/do/search?searchString=[example_id] http://www.ebi.ac.uk/intact/search/do/search?searchString=EBI-17086 \N
80 InterPro The InterPro database of protein domains and motifs Identifier http://www.ebi.ac.uk/interpro/ http://www.ebi.ac.uk/interpro/DisplayIproEntry?ac=[example_id] http://www.ebi.ac.uk/interpro/DisplayIproEntry?ac=IPR000001 \N
81 IPI International Protein Index Identifier http://www.ebi.ac.uk/IPI/IPIhelp.html \N \N \N
82 ISBN International Standard Book Number Identifier http://isbntools.com/ http://my.linkbaton.com/get?lbCC=q&nC=q&genre=book&item=[example_id] http://my.linkbaton.com/get?lbCC=q&nC=q&genre=book&item=0781702534 \N
83 ISSN International Standard Serial Number Identifier http://www.issn.org/ \N \N \N
84 IUPHAR The International Union of Pharmacology \N http://www.iuphar.org/ \N \N \N
85 IUPHAR_GPCR The International Union of Pharmacology G-protein-coupled receptor family identifier http://www.iuphar.org/ http://www.iuphar-db.org/GPCR/ChapterMenuForward?chapterID=[example_id] http://www.iuphar-db.org/GPCR/ChapterMenuForward?chapterID=1279 \N
86 IUPHAR_RECEPTOR The International Union of Pharmacology Receptor identifier http://www.iuphar.org/ http://www.iuphar-db.org/GPCR/ReceptorDisplayForward?receptorID=[example_id] http://www.iuphar-db.org/GPCR/ReceptorDisplayForward?receptorID=2205 \N
87 JCVI The J. Craig Venter Institute \N http://www.jcvi.org/ \N \N \N
88 JCVI_Ath1 The J. Craig Venter Institute, Arabidopsis thaliana database Accession http://www.tigr.org/tdb/e2k1/ath1/ath1.shtml http://www.tigr.org/tigr-scripts/euk_manatee/shared/ORF_infopage.cgi?db=ath1&orf=[example_id] http://www.tigr.org/tigr-scripts/euk_manatee/shared/ORF_infopage.cgi?db=ath1&orf=At3g01440 \N
89 JCVI_CMR The J. Craig Venter Institute, Comprehensive Microbial Resource Locus http://cmr.jcvi.org/ http://cmr.jcvi.org/cgi-bin/CMR/shared/GenePage.cgi?locus=[example_id] http://cmr.jcvi.org/cgi-bin/CMR/shared/GenePage.cgi?locus=VCA0557 \N
90 JCVI_EGAD The J. Craig Venter Institute, EGAD database Accession http://cmr.jcvi.org/ http://cmr.jcvi.org/cgi-bin/CMR/EgadSearch.cgi?search_string=[example_id] http://cmr.jcvi.org/cgi-bin/CMR/EgadSearch.cgi?search_string=74462 \N
91 JCVI_GenProp The J. Craig Venter Institute, Genome Properties Accession http://cmr.jcvi.org/ http://cmr.jcvi.org/cgi-bin/CMR/shared/GenomePropDefinition.cgi?prop_acc=[example_id] http://cmr.jcvi.org/cgi-bin/CMR/shared/GenomePropDefinition.cgi?prop_acc=GenProp0120 \N
92 JCVI_Pfa1 The J. Craig Venter Institute, Plasmodium falciparum database Accession http://www.tigr.org/tdb/e2k1/pfa1/pfa1.shtml http://www.tigr.org/tigr-scripts/euk_manatee/shared/ORF_infopage.cgi?db=pfa1&orf=[example_id] http://www.tigr.org/tigr-scripts/euk_manatee/shared/ORF_infopage.cgi?db=pfa1&orf=PFB0010w \N
93 JCVI_REF The J. Craig Venter Institute Reference locator http://cmr.jcvi.org/ \N http://cmr.jcvi.org/CMR/AnnotationSops.shtml \N
94 JCVI_Tba1 The J. Craig Venter Institute, Trypanosoma brucei database Accession http://www.tigr.org/tdb/e2k1/tba1/ http://www.tigr.org/tigr-scripts/euk_manatee/shared/ORF_infopage.cgi?db=tba1&orf=[example_id] http://www.tigr.org/tigr-scripts/euk_manatee/shared/ORF_infopage.cgi?db=tba1&orf=25N14.10 \N
95 JCVI_TIGRFAMS The J. Craig Venter Institute, TIGRFAMs HMM collection Accession http://cmr.jcvi.org/ http://cmr.jcvi.org/cgi-bin/CMR/HmmReport.cgi?hmm_acc=[example_id] http://cmr.jcvi.org/cgi-bin/CMR/HmmReport.cgi?hmm_acc=TIGR00254 \N
96 KEGG Kyoto Encyclopedia of Genes and Genomes \N http://www.genome.ad.jp/kegg/ \N \N \N
97 KEGG_PATHWAY KEGG Pathways Database Pathway http://www.genome.ad.jp/kegg/docs/upd_pathway.html http://www.genome.ad.jp/dbget-bin/www_bget?path:[example_id] http://www.genome.ad.jp/dbget-bin/www_bget?path:ot00020 \N
98 KEGG_LIGAND KEGG LIGAND Database Compound http://www.genome.ad.jp/kegg/docs/upd_ligand.html http://www.genome.ad.jp/dbget-bin/www_bget?cpd:[example_id] http://www.genome.ad.jp/dbget-bin/www_bget?cpd:C00577 \N
99 LIFEdb LIFEdb, a database for the integration and dissemination of functional data cDNA clone identifier http://www.lifedb.de/ http://www.dkfz.de/LIFEdb/LIFEdb.aspx?ID=[example_id] http://www.dkfz.de/LIFEdb/LIFEdb.aspx?ID=DKFZp564O1716 \N
100 LOCSVMpsi LOCSVMPSI: subcellular localization for eukaryotic proteins based on SVM and PSI-BLAST \N http://bioinformatics.ustc.edu.cn/locsvmpsi/locsvmpsi.php \N \N \N
