busco-usage

BUSCO installation

  1. BUSCO depends on the following software:

    • python3;

    • hmmer;

    • NCBI BLAST+;

    • Augustus 3.0.x (genome only);

      • boost

      • zlib

      • bamtools (cmake is needed to compile it)

    • EMBOSS tools 6.x.x
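
Before installing anything, it can help to see which of these dependencies are already on PATH. The sketch below is a minimal check. The executable names (hmmsearch for hmmer, tblastn and makeblastdb for BLAST+, transeq for EMBOSS) are assumptions about which binaries each package provides, and `missing_tools` is a hypothetical helper name, not part of BUSCO.

```shell
# Print every command from the argument list that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        # command -v succeeds only if the shell can resolve the name
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Assumed executable names for the dependencies listed above.
missing_tools python3 hmmsearch tblastn makeblastdb augustus transeq
```

Anything printed still needs to be installed or added to PATH; no output means every name was found.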

Install the dependencies and set the environment variables:

python3

wget https://www.python.org/ftp/python/3.5.2/Python-3.5.2.tgz
tar zxvf Python-3.5.2.tgz
cd Python-3.5.2
./configure --prefix=$SFTW/app/python35
make -j8 && make install
echo 'export PATH=$SFTW/app/python35/bin:$PATH' >> ~/.zshrc && source ~/.zshrc  # addpath
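
The "addpath" step above (append a bin directory to PATH in ~/.zshrc) comes up again for most of the tools in these notes. A minimal sketch of a helper for it follows; `addpath` here is a hypothetical function written for illustration, not an existing command, and it assumes the rc file is a plain list of export lines.

```shell
# Append "export PATH=<dir>:$PATH" to an rc file unless that exact line is already there.
# Usage: addpath DIR [RCFILE]   (RCFILE defaults to ~/.zshrc)
addpath() {
    dir=$1
    rcfile=${2:-$HOME/.zshrc}
    line="export PATH=${dir}:\$PATH"
    # -x matches the whole line, -F treats the pattern as a fixed string
    grep -qxF "$line" "$rcfile" 2>/dev/null || echo "$line" >> "$rcfile"
}
```

It would be called as `addpath "$SFTW/app/hmmer/bin"` followed by `source ~/.zshrc`. Because the directory is expanded at call time, the rc file receives the absolute path, and running the helper twice does not duplicate the line.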

hmmer:

wget http://eddylab.org/software/hmmer3/3.1b2/hmmer-3.1b2.tar.gz
tar zxvf hmmer-3.1b2.tar.gz
cd hmmer-3.1b2
./configure --prefix=$SFTW/app/hmmer
make -j8 && make install
echo 'export PATH=$SFTW/app/hmmer/bin:$PATH' >> ~/.zshrc && source ~/.zshrc  # addpath

boost:

wget https://sourceforge.net/projects/boost/files/boost/1.61.0/boost_1_61_0.tar.gz
tar zxvf boost_1_61_0.tar.gz
cd boost_1_61_0
./bootstrap.sh --prefix=$SFTW/app/boost
./b2
./bjam install
# Add these environment variables to ~/.zshrc:
export BOOST_INCLUDE=$SFTW/app/boost/include
export BOOST_LIB=$SFTW/app/boost/lib

zlib:

wget http://zlib.net/zlib-1.2.8.tar.gz
tar zxvf zlib-1.2.8.tar.gz
cd zlib-1.2.8
./configure --prefix=/lustre/home/jiangff/software/app/zlib
make -j8 && make install
cp -r /lustre/home/jiangff/software/app/zlib/* $SFTW
# The include and lib directories directly under $SFTW are already in the environment variables, so the files are simply copied there

cmake:

wget https://cmake.org/files/v3.6/cmake-3.6.1.tar.gz --no-check-certificate
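tar zxvf cmake-3.6.1.tar.gz
cd cmake-3.6.1
./configure --prefix=$SFTW/app/cmake
make -j8 && make install
echo 'export PATH=$SFTW/app/cmake/bin:$PATH' >> ~/.zshrc && source ~/.zshrc  # addpath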

bamtools:

wget https://codeload.github.com/pezmaster31/bamtools/zip/master
mv master bamtools.zip
unzip bamtools.zip
cd bamtools-master
mkdir build
cd build
cmake ..
make -j8
cd ..
cp -r bin/ include/ lib/ $SFTW

Augustus:

wget http://bioinf.uni-greifswald.de/augustus/binaries/augustus.current.tar.gz
tar zxvf augustus.current.tar.gz
cd augustus-3.2.2
make -j8
cd ..
mv augustus-3.2.2/ /lustre/home/jiangff/software/app/augustus
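echo 'export PATH=$SFTW/app/augustus/bin:$SFTW/app/augustus/scripts:$PATH' >> ~/.zshrc  # addpath: bin and scripts
# Add to ~/.zshrc:
export AUGUSTUS_CONFIG_PATH=$SFTW/app/augustus/config  # required; it does not work without this!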
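
Since a missing AUGUSTUS_CONFIG_PATH breaks the run, a quick sanity check before starting BUSCO may save time. The sketch below tests that the variable points at a directory containing `species/`. The writability test rests on the assumption that the genome mode trains Augustus and writes new parameters there, and `check_augustus_config` is a hypothetical helper name.

```shell
# Return 0 if the given directory looks like a usable Augustus config directory.
# Defaults to $AUGUSTUS_CONFIG_PATH when no argument is given.
check_augustus_config() {
    cfg=${1:-${AUGUSTUS_CONFIG_PATH:-}}
    if [ -z "$cfg" ]; then
        echo "AUGUSTUS_CONFIG_PATH is not set" >&2; return 1
    fi
    if [ ! -d "$cfg/species" ]; then
        echo "no species/ directory under $cfg" >&2; return 1
    fi
    # Assumption: Augustus training needs to write new species parameters here.
    if [ ! -w "$cfg/species" ]; then
        echo "$cfg/species is not writable" >&2; return 1
    fi
    echo "Augustus config OK: $cfg"
}
```

Calling `check_augustus_config` with no argument checks the current environment; passing a path checks that directory instead.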

EMBOSS tools:

wget http://debian.rub.de/ubuntu/pool/universe/e/emboss/emboss_6.6.0.orig.tar.gz
tar zxvf emboss_6.6.0.orig.tar.gz  # then cd into the unpacked directory
./configure --prefix...
make && make install
# addpath: add the bin directory under the chosen prefix to PATH

BUSCO

wget http://busco.ezlab.org/files/BUSCO_v1.2.tar.gz
tar zxvf BUSCO_v1.2.tar.gz
# Ready to use once unpacked
# Move the unpacked directory to $SFTW/app/BUSCO
# Add to ~/.zshrc:
alias BUSCO="python /lustre/home/jiangff/software/app/BUSCO/BUSCO_v1.2.py"

# Download a lineage dataset (the eukaryota set as an example):
    wget http://busco.ezlab.org/files/eukaryota_buscos.tar.gz
    tar zxvf eukaryota_buscos.tar.gz
    mv eukaryota $SFTW/app/BUSCO

BUSCO usage

  1. Genome assembly assessment:

python BUSCO_v1.1b.py -o NAME -in ASSEMBLY -l LINEAGE -m genome

#NAME      name to use for the run and all temporary files
#ASSEMBLY  genome assembly file in fasta format
#LINEAGE   path to the BUSCO lineage data

  2. Gene set assessment:

python BUSCO_v1.1b.py -o NAME -in GENE_SET -l LINEAGE -m OGS

#NAME       name to use for the run and temporary files
#GENE_SET   gene set protein sequence file in fasta format
#LINEAGE    path to the BUSCO lineage data (i.e. /path/to/lineage)

  3. Transcriptome assessment:

python BUSCO_v1.1b.py -o NAME -in TRANSCRIPTOME -l LINEAGE -m trans

#NAME           name to use for the run and temporary files
#TRANSCRIPTOME  transcript set sequence file in fasta format
#LINEAGE        path to the BUSCO lineage data (i.e. /path/to/lineage)
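
The three commands above differ only in the input file and the `-m` value, so they can be wrapped in one small function. This is a dry-run sketch that only prints the command line: `busco_cmd` is a hypothetical helper, the sample name and file names are placeholders, and the accepted mode strings are taken from the commands above.

```shell
# Print the BUSCO command line for one run after validating the mode.
# Usage: busco_cmd NAME INPUT LINEAGE MODE
busco_cmd() {
    name=$1 input=$2 lineage=$3 mode=$4
    case "$mode" in
        genome|OGS|trans) ;;   # modes used in the commands above
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    echo "python BUSCO_v1.1b.py -o $name -in $input -l $lineage -m $mode"
}

# Placeholder arguments; prints the genome-mode command without running it.
busco_cmd sample1 assembly.fasta /path/to/lineage genome
```

Printing rather than executing makes it easy to inspect the command first; piping the output to `sh` would run it.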

BUSCO options

python BUSCO_v1.1b.py -in INPUT -o OUTPUT -l LINEAGE -m MODE

[Options]

1- Mandatory arguments

-o name
    Name used for naming output files
-in input_file
    Genome assembly / gene set / transcriptome in fasta format
-l lineage
    Path to the BUSCO lineage data to be used
    Example: -l /path/to/metazoa
-m mode
    Mode of analysis
    Valid options: genome, ogs, trans
    Default: genome

2- Optional arguments

-h, --help
    Print help
-c integer
    Number of CPU threads to be used
    Default: 1
-sp species
    Select from the pre-computed Augustus metaparameters
    Selecting a closely-related species usually produces better results
    Valid options: see Augustus help for list of options
    Default: generic
-e evalue
    Use a custom blast e-value cutoff
    Default: 0.01
-f
    Force overwriting of results files from a previous run with the same name
--flank N
    Custom flanking genomic regions in base pairs (bp)
    Used when extending selected candidate regions before gene prediction
    Default: Automatically calculated flank sizes based on genome size
--long
    Performs full optimization for Augustus gene finding training
    Default: Off

BUSCO Output

Successful execution of the BUSCO assessment pipeline will create a directory named name_OUTPUT where ‘name’ is your assigned name for the assessment run. The directory will contain several files and directories:

Files

short_summary_
    Contains summary results in BUSCO notation and a brief breakdown of the metrics
full_table_
    Complete results in tabular format with coordinates, scores and lengths of BUSCO matches
training_set_
    Set of complete BUSCO matches used for training Augustus
    Only created during genome assessment
_tblastn
    Results in tabular format of tBLASTn searches with BUSCO consensus sequences

Directories

augustus_
    Augustus-predicted genes
    Only created during genome assessment
augustus_proteins
    Corresponding Augustus-predicted proteins
    Only created during genome assessment
selected
    Complete BUSCO matches, used for training Augustus
gb
    Complete BUSCO matches, GenBank format
gffs
    Complete BUSCO matches, GFF format
hmmer_output
    Tabular format HMMER output of searches with BUSCO HMMs

Example

Assessing genome assembly quality with BUSCO

Requested the plant lineage data from felipe.simao@unige.ch: no answer.

Using the eukaryota lineage instead:

#BUSCO: aliased to "python XXX/XXX/BUSCO/BUSCO_v1.2.py"

BUSCO -o Rad -in AAA.fasta -l $SFTW/app/BUSCO/eukaryota -m genome

Result:

Summarized benchmarks in BUSCO notation:
        C:0%[D:0%],F:0%,M:0%,n:429

        190     Complete BUSCOs
        139     Complete and single-copy BUSCOs
        51      Complete and duplicated BUSCOs
        32      Fragmented BUSCOs
        207     Missing BUSCOs
        429     Total BUSCO groups searched
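
The notation line in this output reports 0% everywhere even though the counts below it are non-zero, so the percentages are worth recomputing from the counts. The sketch below does that with awk; `busco_notation` is a hypothetical helper, and one decimal place is an arbitrary choice.

```shell
# Recompute the BUSCO notation percentages from raw counts.
# Usage: busco_notation COMPLETE DUPLICATED FRAGMENTED MISSING TOTAL
busco_notation() {
    awk -v c="$1" -v d="$2" -v f="$3" -v m="$4" -v n="$5" 'BEGIN {
        printf "C:%.1f%%[D:%.1f%%],F:%.1f%%,M:%.1f%%,n:%d\n",
               100*c/n, 100*d/n, 100*f/n, 100*m/n, n
    }'
}

# Counts taken from the result above.
busco_notation 190 51 32 207 429
# prints C:44.3%[D:11.9%],F:7.5%,M:48.3%,n:429
```

With these counts, about 44% of the 429 eukaryote BUSCO groups are complete and about 48% are missing.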