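I. Introduction to CIRI:
CIRI identifies circRNAs from the reads that span the circRNA junction (the back-splice site); such reads align to the genome in a very distinctive way.
CIRI recognizes circRNAs using three alignment models. A read that spans the junction is called a junction read.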
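A)
The circRNA is formed by circularization of three exons. Because of the limited read length, the junction read covers only part of the first exon and part of the last exon, and these two segments align to the genome in the reverse order relative to their order in the read.
B)
The circRNA is formed by circularization of three exons. Because one of the exons at the junction is very short, the junction read covers not only parts of the first and last exons but also part of a middle exon.
C)
The circRNA is formed by circularization of a single exon. The junction read covers the entire exon and then reads part of its sequence a second time.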
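The first model can be sketched in a few lines of Python. This is only an illustration, not CIRI's implementation: the `Segment` record and the function name are hypothetical, each junction read is assumed to have already been split by the aligner into two local alignments on the same strand, and read offsets are assumed to be given in the orientation reported in the SAM record. The default span limit is borrowed from the `-S` default in the option list.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Segment:
    """One aligned piece of a read (hypothetical record, not a CIRI type)."""
    chrom: str
    strand: str
    read_start: int    # 0-based offset of the segment within the read
    genome_start: int  # 1-based leftmost genomic coordinate
    length: int        # number of aligned bases

    @property
    def genome_end(self) -> int:
        return self.genome_start + self.length - 1


def back_splice_candidate(
    a: Segment, b: Segment, max_span: int = 200000
) -> Optional[Tuple[str, int, int]]:
    """Return (chrom, start, end) if the two segments map in reversed order."""
    if a.chrom != b.chrom or a.strand != b.strand:
        return None
    # Order the segments by their position in the read.
    first, second = sorted((a, b), key=lambda seg: seg.read_start)
    # Collinear order looks like ordinary linear splicing, not a back-splice.
    if first.genome_start <= second.genome_start:
        return None
    # The later part of the read maps upstream: that is the circRNA start.
    start, end = second.genome_start, first.genome_end
    if end - start + 1 > max_span:
        return None
    return (a.chrom, start, end)
```

A read whose first 60 bases map to the end of the last exon and whose remaining 40 bases map to the start of the first exon would yield a candidate; a read whose segments appear in the same order on the genome as in the read would not.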
D)
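To further reduce the false-positive rate, CIRI filters the results using the following three rules:
1) The two reads of a read pair must be consistent with the paired-end mapping (PEM) signal. Taking the diagram above as an example: read1 is a junction read derived from two exons, and its alignment determines the position of the circRNA on the genome. If the circRNA has been identified correctly, read2 must fall within that region.
The results are then filtered further according to how the two reads align;
2) The junction of the detected circRNA must be flanked by AG-GT splicing signals;
3) Candidates are filtered by alignment quality and read count. Quality: the higher the mapping quality, the more reliable the identified circRNA. Count: the more junction reads detected for a given circRNA, the more reliable that circRNA is;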
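The three rules can be sketched as small predicates. This is a simplified illustration under stated assumptions, not CIRI's actual logic: all function names are hypothetical, coordinates are 1-based and inclusive, `chrom_seq` is the plus-strand chromosome sequence, and the thresholds are borrowed from the defaults shown in the option list (mapping quality 10, more than 2 junction reads). For a circRNA on the minus strand, the splice signals appear on the plus strand as their reverse complements (AC upstream, CT downstream).

```python
from typing import Sequence


def mate_supports_circ(circ_start: int, circ_end: int,
                       mate_start: int, mate_end: int) -> bool:
    """Rule 1 (PEM): the mate of a junction read lies inside the circRNA."""
    return circ_start <= mate_start and mate_end <= circ_end


def has_splice_signal(chrom_seq: str, circ_start: int, circ_end: int,
                      strand: str = "+") -> bool:
    """Rule 2: AG just upstream of the start and GT just downstream of the end."""
    if circ_start < 3 or circ_end + 2 > len(chrom_seq):
        return False  # not enough flanking sequence to inspect
    upstream = chrom_seq[circ_start - 3:circ_start - 1].upper()
    downstream = chrom_seq[circ_end:circ_end + 2].upper()
    if strand == "+":
        return upstream == "AG" and downstream == "GT"
    # Minus strand: reverse complements of GT and AG as seen on the plus strand.
    return upstream == "AC" and downstream == "CT"


def passes_support(junction_mapqs: Sequence[int],
                   min_mapq: int = 10, min_reads: int = 2) -> bool:
    """Rule 3: more than min_reads junction reads with adequate mapping quality."""
    good = [q for q in junction_mapqs if q >= min_mapq]
    return len(good) > min_reads
```

A candidate would be kept only when all three predicates hold for its supporting reads.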
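The models in the figure above only illustrate the detection of exonic circRNAs. Non-exonic circRNAs (intronic and intergenic circRNAs) are detected on a similar principle: taking into account the read length and the lengths of the two sequences flanking the junction, CIRI proposes several possible alignment models and then searches for junction reads that match those models in order to predict circRNAs;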
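Validating circRNA results:
Take the predicted circRNA chr2: 58,311,224|58,316,858 as an example. It spans 5634 bp on the genome, and its junction joins exon 6 and exon 10 of the VRK2 gene.
In theory, the circRNA sequence is the concatenation of all of its exons, giving a spliced length of 407 bp.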
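The two lengths quoted above can be checked with a short sketch. The function names are hypothetical, and the exon intervals in the usage note are toy values, not the real VRK2 exon coordinates. The 5634 bp figure equals end minus start; counting both endpoints inclusively would give 5635.

```python
from typing import Iterable, Tuple


def parse_circ_id(circ_id: str) -> Tuple[str, int, int]:
    """Split an ID such as 'chr2: 58,311,224|58,316,858' into its parts."""
    chrom, coords = circ_id.split(":", 1)
    start_text, end_text = coords.split("|")
    start = int(start_text.replace(",", "").strip())
    end = int(end_text.replace(",", "").strip())
    return chrom.strip(), start, end


def spliced_length(exons: Iterable[Tuple[int, int]]) -> int:
    """Sum the lengths of 1-based, inclusive exon intervals."""
    return sum(end - start + 1 for start, end in exons)


chrom, start, end = parse_circ_id("chr2: 58,311,224|58,316,858")
genomic_span = end - start  # 5634, as quoted in the text
```

With toy exons such as `[(101, 200), (301, 350)]`, `spliced_length` returns 150; supplying the actual exon coordinates from an annotation file would give the spliced length of the real circRNA.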
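To validate this circRNA, primers are designed from the sequences on both sides of the junction, the circRNA fragment is amplified, and the product is run on a gel to determine its length.
The black band in the figure is the amplification product. Its length is determined from the PAGE result, and Sanger sequencing is then used to determine the exact sequence.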
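As a rough aid for the primer-design step, the sketch below builds the sequence across the junction and estimates the expected product length for a divergent primer pair. It is an illustration only: the function names and all primer positions are hypothetical, and the circRNA sequence is assumed to be given linearized from the back-splice start, so the junction joins its last base to its first base.

```python
def junction_template(circ_seq: str, flank: int) -> str:
    """Return `flank` bases from each side of the back-splice junction."""
    if not 0 < flank <= len(circ_seq):
        raise ValueError("flank must be between 1 and the circRNA length")
    # End of the last exon followed by the start of the first exon.
    return circ_seq[-flank:] + circ_seq[:flank]


def divergent_amplicon_length(circ_len: int, forward_start: int,
                              reverse_end: int) -> int:
    """Expected product length for primers facing away from each other.

    Positions are 0-based on the linearized sequence. The product runs from
    forward_start to the end, crosses the junction, and stops at reverse_end.
    """
    if not 0 <= reverse_end < forward_start < circ_len:
        raise ValueError("expected 0 <= reverse_end < forward_start < circ_len")
    return (circ_len - forward_start) + (reverse_end + 1)
```

For a 407 bp circRNA with a forward primer starting at position 350 and a reverse primer whose 5' end lies at position 49 (both hypothetical), the expected product is 107 bp, which is the kind of value one would compare against the band on the gel.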
Reference:
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0571-3#Sec18
II. Installing CIRI
2.1 Download:
https://sourceforge.net/projects/ciri/files/latest/download
2.2 Installation:
Simply extract the archive.
III. Using CIRI
Usage: perl CIRI.pl -I in.sam -O output.ciri -F ref.fa (-R ref_dir/)
Arguments:
-I, --in
input SAM file name (required; generated by BWA-MEM)
-O, --out
output circRNA list name (required)
-F, --ref_file
FASTA file of all reference sequences. Please make sure this file is
the same one provided to BWA-MEM. Either this argument or
-R/--ref-dir is required.
-R, --ref_dir
directory of reference sequence(s). Please make sure fasta files in
this directory are from the FASTA file(s) provided to BWA-MEM. Either
this argument or -F/--ref-file is required.
-A, --anno
input GTF/GFF3 formatted annotation file name (optional)
-G, --log
output log file name (optional)
-H, --help
show this help information
-S, --max_span
max spanning distance of circRNAs (default: 200000)
-high, --high_strigency
use high strigency: only output circRNAs supported by more than 2
distinct PCC signals (default)
-low, --low_strigency
use low strigency: only output circRNAs supported by more than 2
junction reads
-0, --no_strigency
output all circRNAs regardless junction read or PCC signal counts
-U, --mapq_uni
set threshold for mapping quality of each segment of junction reads
(default: 10; should be within [0,30])
-E, --rel_exp
set threshold for relative expression calculated based on counts of
junction reads and non-junction reads (optional: e.g. 0.1)
-M, --chrM
tell CIRI2 the ID of mitochondrion in reference file(s) (default:
chrM)
-T, --thread_num
set number of threads for parallel running (default: 1)
-Q, --quiet
keep quiet when running
-D, --output_all
keep the temporary files after running (more disk space would be
needed)
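The options above can be assembled programmatically. The helper below is a hypothetical convenience wrapper, not part of CIRI: it uses only flags listed in the help text, keeps the tool's own spelling of those flags, and makes the conservative choice of requiring exactly one of `-F` or `-R`.

```python
from typing import List, Optional

# Flags for the three mutually exclusive stringency modes listed above.
STRINGENCY_FLAGS = {"high": "-high", "low": "-low", "none": "-0"}


def build_ciri2_args(sam: str, out: str,
                     ref_file: Optional[str] = None,
                     ref_dir: Optional[str] = None,
                     anno: Optional[str] = None,
                     stringency: str = "high",
                     mapq_uni: int = 10,
                     max_span: int = 200000,
                     threads: int = 1,
                     script: str = "CIRI2.pl") -> List[str]:
    """Return an argument list suitable for subprocess.run."""
    if (ref_file is None) == (ref_dir is None):
        raise ValueError("give exactly one of ref_file (-F) or ref_dir (-R)")
    if not 0 <= mapq_uni <= 30:
        raise ValueError("mapq_uni should be within [0, 30]")
    if stringency not in STRINGENCY_FLAGS:
        raise ValueError("stringency must be 'high', 'low' or 'none'")
    args = ["perl", script, "-I", sam, "-O", out]
    args += ["-F", ref_file] if ref_file is not None else ["-R", ref_dir]
    if anno is not None:
        args += ["-A", anno]  # -A takes an annotation file name
    args += ["-S", str(max_span), "-U", str(mapq_uni), "-T", str(threads)]
    args.append(STRINGENCY_FLAGS[stringency])
    return args
```

For example, `build_ciri2_args("sample.sam", "test.ciri", ref_file="chr1.fa", stringency="none")` produces a list beginning with `perl CIRI2.pl -I sample.sam -O test.ciri -F chr1.fa` and ending with `-0`.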
IV. Detection workflow
1. Align the reads with BWA-MEM.
2. Run CIRI2 for detection, with a command such as: perl CIRI2.pl -I sample.sam -O test.ciri -F chr1.fa -D -Q -0 -S 200000 -A
CIRI requires a relatively large amount of memory while running.
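The two steps can be chained in a small driver script. This is a minimal sketch under several assumptions: `bwa` and `perl` are on the PATH, `CIRI2.pl` is in the working directory, the reference has already been indexed with `bwa index`, and all file names are placeholders. The same FASTA file is passed to both tools, as the help text requires. `bwa mem` writes SAM to standard output, so the first step redirects it to a file.

```python
import subprocess
from typing import List, Optional, Tuple

# Each step is (argv, file that receives stdout, or None to leave stdout alone).
Step = Tuple[List[str], Optional[str]]


def plan_pipeline(ref_fa: str, reads1: str, reads2: str, sam: str, out: str,
                  threads: int = 1,
                  annotation: Optional[str] = None) -> List[Step]:
    """Build the BWA-MEM and CIRI2 commands without running them."""
    align = ["bwa", "mem", "-t", str(threads), ref_fa, reads1, reads2]
    detect = ["perl", "CIRI2.pl", "-I", sam, "-O", out,
              "-F", ref_fa, "-T", str(threads)]
    if annotation is not None:
        detect += ["-A", annotation]
    return [(align, sam), (detect, None)]


def run_pipeline(steps: List[Step]) -> None:
    """Run each step in order and stop at the first failure."""
    for argv, stdout_path in steps:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w") as handle:
                subprocess.run(argv, check=True, stdout=handle)
```

Calling `plan_pipeline("chr1.fa", "r1.fq", "r2.fq", "sample.sam", "test.ciri")` and inspecting the result before passing it to `run_pipeline` is a cheap way to confirm the commands, which matters here because the CIRI step is memory-intensive and a failed run is costly.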