mapcomp

mapcomp通过比较SNP序列在参考近缘种基因组的位置,判断两个图谱中标记的对应关系。

Citation:

Sutherland B J G, Gosselin T, Normandeau E, et al. Salmonid chromosome evolution as revealed by a novel method for comparing RADseq linkage maps[J]. bioRxiv, 2016: 039164.

下载安装

在 Github 上: HERE

mapcomp是一系列shell和py脚本,直接下载可用,但是需要依赖以下软件的安装:

Linux or MacOS
Python 2.7
numpy (Python library)
bwa
samtools
The R statistical language

基本用法

注意: 所有指令需要在mapcomp 的根目录下运行, 即"mapcomp"可执行文件所在目录下.

(因为有些shell脚本默认的识别路径是当前目录的某些文件夹)

将参考基因组复制到mapcomp安装目录中"./02_data/genome/"目录下并重命名为"genome.fasta",建立bwa索引。
```
 cp XXX.fasta 02_data/genome/genome.fasta
 bwa index 02_data/genome/genome.fasta
```
准备SNP标记和连锁群的数据,格式为6列csv格式表格,如下:

SpeciesName,LG,Position,Zeroes,markerName,markerSequence

注意:
- There is no header line in the .csv file
- There are 6 columns of information
- The different columns are separated by a comma (,)
- The fourth column is filled with zeroes (0)
- You need more than one map in the .csv file
- You should avoid special characters, including underscores (_) in the marker names.
("./01scripts/utilityscripts/findminmaxtotalpositions.py"这个脚本好像是根据""分割数据,所以尽量不要出现"")
- You must use the period (.) as the decimal separator (no comma (,))
一个范例csv如下:

hsapiens,1,0.58,0,marker0001,CGGCACCTCCACTGCGGCACGAAGAGTTAGGCCCCGTGCTTTGCGG

hsapiens,1,5.74,0,marker0002,CGGCACCTCCACTGCGGCACGAAGAGTTAGGCCCCGTGCTTTGCGG ... hsapiens,1,122.39,0,marker0227,CGGCACCTCCACTGCGGCACGAAGAGTTAGGCCCCGTGCTTTGCGG

用".csv"文件生成"marker.fasta"

 ./01_scripts/00_prepare_input_fasta_file_from_csv.sh <your_file.csv>

运行 MapComp
```
 ./mapcomp
```
"mapcomp中可以更改"./01scripts/04pairmarkers.py"的参数"maxdist"(默认为10000000), 参数意义是什么?

个人认为是认为两个标记配对的最大碱基距离,默认的1E7已经很大了,越小,配对的标记数越少.考虑到参考基因组的scfd长度,这个参数改动一个数量级基本没有影响,可以不纠结它(个人意见)
查看结果

"03mapped/wantedloci.info": the details needed in marker pairs between species.

(can be useful to obtain exact locations of marker mapping on the reference genome)

"04_figures": plots comparing between linkage maps
"05_results": a summary of results

示例结果如下: