
Installing and Using Sprai, a PacBio Data Assembly Tool

Date: 2018-10-19

1. Introduction to Sprai

Sprai (single-pass read accuracy improver) is a tool to correct sequencing errors in single-pass reads for de novo assembly. It is originally designed for correcting sequencing errors in single-molecule DNA sequencing reads, especially in Continuous Long Reads (CLRs) generated by PacBio RS sequencers. The goal of Sprai is not maximizing the accuracy of error-corrected reads; instead, Sprai aims at maximizing the continuity (i.e., N50 contig length) of assembled contigs after error correction.

Official site: http://zombie.cb.k.u-tokyo.ac.jp/sprai/README.html#introduction

2. Installation

2.1 Software requirements

1. python 2.6 or newer

2. BLAST+ 2.2.27 or newer

3. Celera Assembler ver. 8.1 or newer (if you assemble reads after error-correction)
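Before installing, it can help to see which of these prerequisites are already on the PATH. The following is a minimal sketch and not part of Sprai: the helper name check_tool is made up for illustration, and the executable names python, blastn, and runCA are assumptions about how these packages are usually installed (runCA is the Celera Assembler driver and will be reported missing until section 2.2.1 is done).

```shell
#!/bin/sh
# Report whether a prerequisite executable is on the PATH.
# check_tool is a hypothetical helper, not something Sprai provides.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: missing"
    fi
}

# Executable names are assumptions; adjust them to your installation.
for tool in python blastn runCA; do
    check_tool "$tool"
done

# Versions still have to be compared by hand against the list above, e.g.:
#   python --version
#   blastn -version
```

The script only reports presence; it does not compare version numbers against the minimums listed above.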

2.2 Installation steps

2.2.1 Installing Celera Assembler (CA):

CA download: https://sourceforge.net/projects/wgs-assembler/

bzip2 -dc wgs-8.3rc2.tar.bz2 | tar -xf -

cd wgs-8.3rc2

cd kmer

make install

cd ../src

make

cd ../..

2.2.2 Installing List-MoreUtils-0.415.tar.gz (run these commands inside the extracted source directory):

perl Makefile.PL

make

make install

2.2.3 Installing Exporter-Tiny-0.042.tar.gz (install this module first, and only then the Statistics-Descriptive-3.0612.tar.gz module below; otherwise the installation fails)

tar -xzvf Exporter-Tiny-0.042.tar.gz

cd Exporter-Tiny-0.042/

perl Makefile.PL

make

make install

2.2.4 Installing Statistics-Descriptive-3.0612.tar.gz

tar -xzvf Statistics-Descriptive-3.0612.tar.gz

cd Statistics-Descriptive-3.0612/

perl Build.PL

./Build

./Build test

./Build install
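Once the three Perl distributions are installed, a quick load test shows whether the perl on your PATH can actually find them. This is a sketch under assumptions: check_perl_module is a hypothetical helper name, and the module names are the usual Perl package names for the three tarballs above.

```shell
#!/bin/sh
# Report whether a Perl module can be loaded by the perl on the PATH.
# check_perl_module is a hypothetical helper name.
check_perl_module() {
    if perl -M"$1" -e 1 >/dev/null 2>&1; then
        echo "$1: ok"
    else
        echo "$1: missing"
    fi
}

# Package names assumed to correspond to the tarballs installed above.
for module in Exporter::Tiny List::MoreUtils Statistics::Descriptive; do
    check_perl_module "$module"
done
```

A module reported as missing usually means it was installed for a different perl, or into a directory that is not in @INC.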

2.2.5 Installing Sprai:

Sprai download: http://zombie.cb.k.u-tokyo.ac.jp/sprai/Download.html

tar -xzvf sprai-0.9.9.17.tar.gz

cd sprai-0.9.9.17/

./waf configure

./waf build

./waf install

3. Usage

3.1 The input must be subreads in FASTQ format. If your files are in .bas.h5 format, convert them with bash5tools.py, which is available from the PacBio GitHub (pbh5tools). Usage:

bash5tools.py --outFilePrefix example_output --readType subreads --outType fastq --minReadScore 0.75 example.bas.h5

If you have several subreads files, merge them all into a single FASTQ file and use that as the input. Note that the input FASTQ file must not be compressed.
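The merge itself is plain concatenation. The sketch below builds two tiny stand-in files in a scratch directory so it can be run safely; the file names run1.fastq and run2.fastq and their contents are hypothetical, and all.fq is simply the name used in the example ec.spec.

```shell
#!/bin/sh
set -eu
# Work in a scratch directory so the demo does not touch real data.
workdir=$(mktemp -d)
cd "$workdir"

# Two tiny stand-in subreads files (hypothetical names and contents).
printf '@read1\nACGT\n+\nIIII\n' > run1.fastq
printf '@read2\nGGCCA\n+\nIIIII\n' > run2.fastq
gzip run2.fastq                      # pretend one file arrived compressed

# Concatenate plain files as-is and decompress .gz files on the fly,
# so that the merged all.fq is a single uncompressed FASTQ.
: > all.fq
for f in run*.fastq run*.fastq.gz; do
    [ -e "$f" ] || continue          # skip a pattern that matched nothing
    case "$f" in
        *.gz) gzip -dc "$f" >> all.fq ;;
        *)    cat "$f" >> all.fq ;;
    esac
done

# Sanity check, assuming 4 lines per FASTQ record.
echo "records in all.fq: $(( $(wc -l < all.fq) / 4 ))"
```

For real data, replace the two printf lines and the gzip line with your own files and adjust the glob patterns.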

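3.2 Create a working directory (mkdir tmp; cd tmp) and copy the pbasm.spec and ec.spec files from the Sprai directory into it.

3.3 Edit the configuration files

1) ec.spec is Sprai's configuration file. Edit it to match your data: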

#>- params -<#
input_fastq all.fq
estimated_genome_size 50000
estimated_depth 100
partition 12
evalue 1e-50
trim 42
ca_path /path/to/your/wgs/Linux-amd64/bin/
word_size 18

Parameter descriptions:

input_fastq is your input file name.

estimated_genome_size is the number of nucleotides of your target. If you do not know it, set a large number, for example 1e+12.

estimated_depth is the depth of coverage of input_fastq of your target. If you do not know it, set 0.

partition is the number of processors Sprai uses.

evalue is used by blastn.

trim is the number of nucleotides Sprai cuts from both sides of alignments.

ca_path is the path where your wgs-assembler (Celera Assembler) is installed.

word_size is used by blastn.
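If you know the genome size but not the depth, estimated_depth can be approximated as total bases in the input divided by the genome size. The following is a rough sketch, assuming an unwrapped FASTQ with exactly four lines per record; the tiny all.fq and the genome size of 5 are made-up values chosen only so the arithmetic is easy to follow.

```shell
#!/bin/sh
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Tiny stand-in for all.fq (hypothetical contents): 2 reads, 4 + 6 bases.
printf '@r1\nACGT\n+\nIIII\n@r2\nACGTAC\n+\nIIIIII\n' > all.fq

genome_size=5        # illustrative value for estimated_genome_size

# Sum the lengths of the sequence lines (line 2 of every 4-line record).
total_bases=$(awk 'NR % 4 == 2 { n += length($0) } END { print n + 0 }' all.fq)

# Depth of coverage = total bases / genome size.
depth=$(awk -v b="$total_bases" -v g="$genome_size" 'BEGIN { printf "%.1f", b / g }')

echo "total_bases=$total_bases estimated_depth=$depth"
```

On real data, point the first awk command at your merged FASTQ and set genome_size to the value you put in ec.spec.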

2) pbasm.spec is the configuration file for Celera Assembler. It is not needed if you only run error correction. It sets parameters used during assembly, including the number of CPUs to use.

3.4 Running Sprai:

1) Error correction and assembly

ezez_vx1.pl ec.spec pbasm.spec > log.txt 2>&1 &

2) Error correction only

ezez_vx1.pl ec.spec -ec_only > log 2>&1 &

or

ezez_vx1.pl ec.spec > log 2>&1 &

Either command works.

3) Assembly only

ca_ikki_v5.pl pbasm.spec estimated_genome_size \
  -d directory in which fin.idfq.gzs exist \
  -ca_path /path/to/your/wgs/Linux-amd64/bin \
  -sprai_path the path to get_top_20x_fa.pl installed

3.5 Output files

1) Step one, error correction, creates a directory named result_yyyymmdd_hhmmss. The corrected reads are written to c01.fin.idfq.gz.

2) Step two, assembly, writes the contigs to ./CA/9-terminator/asm.ctg.fasta

3) Assembly statistics are in the file CA/do_*_c01.fin.top20x.log
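Since Sprai optimizes for contig continuity, a common first check is the N50 of the assembled contigs. The sketch below is an independent illustration, not a Sprai utility: it picks the newest result directory by name and computes N50 with awk. The directories and the three-contig FASTA are stand-ins created for the demo, and where the CA directory sits relative to the result directory is an assumption to adjust.

```shell
#!/bin/sh
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in directories and contigs (hypothetical; a real run creates these).
mkdir result_20181018_093000 result_20181019_101500
mkdir -p CA/9-terminator
contigs=CA/9-terminator/asm.ctg.fasta   # relative location is an assumption
printf '>ctg1\nAAAAAAAAAA\n>ctg2\nCCCCCC\nCC\n>ctg3\nGG\n' > "$contigs"

# The timestamp format sorts lexicographically in chronological order,
# so the last name after sorting is the most recent run.
latest=$(ls -d result_* | sort | tail -n 1)
echo "latest result directory: $latest"

# N50: sort contig lengths in descending order and report the length at
# which the running total first reaches half of the assembly size.
n50=$(awk '/^>/ { if (len) print len; len = 0; next }
           { len += length($0) }
           END { if (len) print len }' "$contigs" |
      sort -rn |
      awk '{ lens[NR] = $1; total += $1 }
           END { for (i = 1; i <= NR; i++) {
                     run += lens[i]
                     if (run * 2 >= total) { print lens[i]; exit }
                 } }')
echo "contig N50: $n50"
```

The first awk pass handles sequences wrapped over several lines, so it works on FASTA files regardless of line width.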

4. Problems encountered during installation

4.1 The /usr/bin/time command cannot be found

Solution:

a. Edit the Sprai source code and replace /usr/bin/time with time
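One way to apply this replacement without editing files by hand is grep plus sed. This is a sketch on a stand-in file in a scratch directory; which Sprai files contain the hard-coded path is not stated here, so the search step is an assumption about how you would find them, and example_step.sh is a hypothetical name.

```shell
#!/bin/sh
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for a script that hard-codes the path (hypothetical file name).
printf '#!/bin/sh\n/usr/bin/time -p sleep 0\n' > example_step.sh

# Collect the matching files first, in a list kept outside the search tree,
# so that backups created below are not picked up by the same search.
list=$(mktemp)
grep -rl '/usr/bin/time' . > "$list" || true

# Rewrite each file in place; -i.bak keeps a backup copy next to it.
while IFS= read -r f; do
    sed -i.bak 's#/usr/bin/time#time#g' "$f"
done < "$list"
rm -f "$list"

cat example_step.sh
```

On a real installation, run the grep and sed steps from the directory that holds the Sprai scripts, and keep the .bak files until a test run succeeds.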

4.2 Sprai reports "set Illegal option -o pipefail" at run time

Solution:

Check what sh points to. If it is not /bin/bash, apply the change in step 2.

1) $ ls -al /bin/sh

2) Change the /bin/sh symlink so that it points to /bin/bash:

$ sudo ln -fs /bin/bash /bin/sh
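To confirm whether a given shell accepts the option, you can test it directly instead of inferring it from the symlink. A minimal sketch follows; supports_pipefail is a hypothetical helper name, and the two paths checked are the ones discussed above.

```shell
#!/bin/sh
# Return success if the given shell accepts `set -o pipefail`.
# supports_pipefail is a hypothetical helper name.
supports_pipefail() {
    "$1" -c 'set -o pipefail' >/dev/null 2>&1
}

for shell in /bin/sh /bin/bash; do
    if supports_pipefail "$shell"; then
        echo "$shell: accepts pipefail"
    else
        echo "$shell: rejects pipefail (or is not installed)"
    fi
done
```

Running it before and after relinking /bin/sh shows whether the change took effect.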