Maizego Summer Tutorial






Lecture 1

Brief Introduction to Bioinformatics Armory


Know how, and why


  • Primer Design
  • Target Length
  • Reaction cycles
  • RT-PCR; Q-PCR ...



  • e-value cutoff ...
  • matrix;
  • gap open/extension;
  • reward/penalty;
  • Pre-built tasks: megablast, blast?-fast ...


👨‍💻 :

~~ Bioinformatics is my lifework, what should I learn?

Bioinformatics ≈ Programming + Biology (omics) + Maths

# For beginners:
1. Linux & shell (Bash)
2. At least one general-purpose language:
    python, [perl + R], julia, rust, java, c++ ...
3. Data analysis skills:
    string manipulation, table manipulation, visualisation ...
4. Basic knowledge of omics data analysis:
    DNA-seq; RNA-seq; Protein; Methylation; ChIP-seq; ATAC-seq; HiC; Single-Cell ...
5. Basic knowledge of statistics:
    Distributions; Common hypothesis tests; Regression; Clustering; Correlation ...
6. Awareness of common databases:
    NCBI, Ensembl, UCSC, UniProt, CoGe, MaizeGDB ...


👨‍🔬 :

~~ I occasionally use Bioinformatics tools, what about me?




  • Linux & shell

  • Highly recommended: one high-level language

  • Get the big picture of the omics/sequencing technologies

  • The ability to i) search for tools; ii) read the README

  • How to ask 👨‍💻 questions so that they can understand you (without losing patience)


🤷‍♂️ :

~~ Meh, computers do not breed crops.




  • Programming trains your ability to think abstractly


  • Be friendly to 👨‍💻, because we are usually emotional (for lack of sleep and hair)



With great ignorance, comes great "power"



How to be an "occasionally-bioinfo-tools-using-guy"

1. Basic Computer Concepts (Lecture 01)

2. Basic Linux Commands (Lecture 01)

3. How to use the HPC (Lecture 02)

4. How to make explicit and reusable pipelines (Lecture 03)


Basic Computer Concepts

  • [CPU], [RAM], [DISK], [I/O]

    • CPU: the brain
    • RAM: short-term memory, volatile (contents are lost when the power is off)
    • Disk: long-term storage, persistent
  • CPU: Multi-Cores
  • DISK -> RAM -> Cache -> CPU
  • I/O: keyboard, mouse; terminal, printer ...
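
To see these components on a real machine, a few standard commands are enough. This is a minimal sketch, assuming a Linux system with GNU coreutils (`nproc`, `df`) and the `/proc` filesystem; the exact numbers depend on the machine you run it on.

```shell
#!/usr/bin/env bash
# CPU: number of processing units available to this shell
cores=$(nproc)
echo "CPU cores: ${cores}"

# RAM: total and currently available memory, as reported by the kernel
grep -E '^(MemTotal|MemAvailable):' /proc/meminfo

# Disk: usage of the filesystem that holds the current directory
df -h .
```

On a shared server these values describe the whole node, not what you are allowed to use.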


Linux and Shell (Bash)

Why Linux

  • Linux is everywhere
  • low-cost
  • stable and efficient
  • open-source

Common Distributions

  • Debian/Ubuntu; RHEL/CentOS; ...

Shell

  • Shell: a scripting language for controlling the system => a translator between you and the OS
  • Easy and efficient to use (👨‍🔬: and can meet 90% of my needs, woohoo~!)
  • Bash: GNU Bourne-Again Shell, the most commonly used shell


Basic usage of Linux and Shell (Bash)

Already well documented in the NCPGR HPC Wiki (accessible via the HZAU VPN only)


Basic usage: Overview

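# System basics: environment variables, terminal configuration ...

# Common Bash commands
chmod, setfacl, ls, ln, rm, cp, mv, touch, mkdir, mktemp, cd, find ... # file and directory operations
vim, cat, less, more, head, tail, wc, sort, uniq, paste, cut ... # viewing and editing content
sed, awk, grep # the "three swordsmen" of Linux text processing
tar, unzip, gzip, bzip2 ... # archiving / compression
stdin, stdout, stderr, redirection, pipe, mkfifo, ... # I/O and pipes
top/htop, ps, kill, free, df, screen, nohup &, ... # job and resource management
alias, xargs, parallel ... # productivity tools
wget, curl, rsync, scp, sftp, lftp ... # networking, synchronisation

# Bash scripting
{...}, "", '', seq, $(), `` # expansion, quoting
a="123.txt"; $a; ${a%.txt}; # variable assignment and reference, string manipulation
if; [ ]; [[ ]]; -a / -o / && / ||; while; for; break; until; case ... # conditionals, flow control
function # functions: variable scope, return values
read, readline, echo, printf # input and output

# Wildcards (globbing) and regular expressions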
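
As an illustration of how pipes and redirection from the list above fit together, here is a small sketch. The file names and the chromosome labels are made up for the demonstration; only standard tools (`sort`, `uniq`, `head`, `awk`, `ls`) are used.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)    # throwaway directory for the demonstration

# Redirection: > writes stdout to a file
printf 'chr1\nchr2\nchr1\nchr3\nchr1\n' > "$tmp/chrom.txt"

# Pipe: each command reads the previous command's stdout.
# sort | uniq -c counts duplicates; sort -rn puts the largest count first.
top=$(sort "$tmp/chrom.txt" | uniq -c | sort -rn | head -n 1 | awk '{print $2}')
echo "most frequent: $top"

# stdout and stderr are separate streams: > captures stdout, 2> captures stderr.
# ls fails here on purpose, so its message goes to err.log and out.log stays empty.
ls "$tmp/no_such_file" > "$tmp/out.log" 2> "$tmp/err.log" || true
echo "stderr captured: $(wc -c < "$tmp/err.log") bytes"
```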


Basic usage: a test

Q1. Get all integers from [0,100] that end with 7

Q2. Create 100 files with names of test001.txt, test002.txt, ... test100.txt

Q3. Rename files from Q2 to maize001.tmp, maize002.tmp ... maize100.tmp

Q4. $A_5^2$ -> [(1,2),(2,1),...(4,5),(5,4)]
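
Try the questions yourself first. One possible set of answers is sketched below; it is only one of many ways to solve them, and it works in a temporary directory so that the 100 files do not clutter anything. Q4 lists the pairs in a different order from the one shown in the question.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Q1: integers in [0,100] that end with 7
q1=$(seq 0 100 | grep '7$')
echo "$q1"

# Q2: test001.txt ... test100.txt (seq -w pads numbers to equal width)
for i in $(seq -w 1 100); do
    touch "test${i}.txt"
done

# Q3: rename testNNN.txt to maizeNNN.tmp using parameter expansion
for f in test*.txt; do
    num=${f#test}       # strip the "test" prefix: 001.txt
    num=${num%.txt}     # strip the ".txt" suffix: 001
    mv "$f" "maize${num}.tmp"
done

# Q4: all ordered pairs (i,j) with i != j, drawn from 1..5
pairs=()
for i in {1..5}; do
    for j in {1..5}; do
        if [ "$i" -ne "$j" ]; then
            pairs+=("($i,$j)")
        fi
    done
done
echo "${pairs[*]}"
```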


Basic usage: complements

shortcuts

  • Ctrl+R; Ctrl+A; Ctrl+E; Ctrl+U; Ctrl+L;
  • Ctrl+Z, Ctrl+C
  • Tab
  • ⬆, ⬇ (even better with: bind '"\e[A": history-search-backward'; bind '"\e[B": history-search-forward')
  • Ctrl + [⬅, ➡]

alias

alias d2u="sed -i 's/\r//g' "
alias u2d="sed -i 's/$/\r/g' "
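
To check what the two aliases do, the sketch below runs the same sed expressions directly on a temporary file (aliases are not expanded in non-interactive scripts by default). It assumes GNU sed, which understands `\r` and `-i` without a backup suffix.

```shell
#!/usr/bin/env bash
tmp=$(mktemp)
printf 'line1\r\nline2\r\n' > "$tmp"     # a file with DOS (CRLF) line endings

sed -i 's/\r//g' "$tmp"                  # same expression as d2u: remove carriage returns
after_d2u=$(grep -c $'\r' "$tmp" || true)   # number of lines still containing a CR
echo "lines with CR after d2u: $after_d2u"

sed -i 's/$/\r/g' "$tmp"                 # same expression as u2d: append a CR to every line
after_u2d=$(grep -c $'\r' "$tmp" || true)
echo "lines with CR after u2d: $after_u2d"

rm -f "$tmp"
```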


Basic usage: complements


String manipulation

${var#*/}; ${var##*/}
${var%/*}; ${var%%/*}
${var:start:len}; ${var:start}; ${var:0-start:len}; ${var:0-start}
${var/pattern/replace}
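
A worked example of these expansions may help. The path below is hypothetical and chosen only for illustration; `#` and `%` remove the shortest match from the front and back, while `##` and `%%` remove the longest.

```shell
#!/usr/bin/env bash
var="/data/maize/sample01.fastq.gz"   # hypothetical path

no_first_slash=${var#*/}     # shortest prefix matching */ removed: data/maize/sample01.fastq.gz
base=${var##*/}              # longest prefix matching */ removed:  sample01.fastq.gz
dir=${var%/*}                # shortest suffix matching /* removed: /data/maize
all_gone=${var%%/*}          # longest suffix matching /* removed:  (empty string)
sub=${var:1:4}               # 4 characters from offset 1:          data
tail3=${var:0-3}             # last 3 characters:                   .gz
swapped=${var/maize/rice}    # first match replaced:                /data/rice/sample01.fastq.gz

printf '%s\n' "$no_first_slash" "$base" "$dir" "[$all_gone]" "$sub" "$tail3" "$swapped"
```

In practice `${var##*/}` and `${var%/*}` behave like `basename` and `dirname` without starting a new process.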

Simple Functions

function cl(){
    # cd into a directory and list it; quote $1 so paths with spaces work
    cd "$1" && ls
}
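
A short usage sketch for a function like `cl`, assuming a variant that quotes its argument and falls back to the home directory when called without one (that default is an addition for illustration, not part of the original function). The demo directory is temporary.

```shell
#!/usr/bin/env bash
cl() {
    # ${1:-$HOME}: use the first argument, or $HOME if none is given
    cd "${1:-$HOME}" && ls
}

demo=$(mktemp -d)            # throwaway directory for the demonstration
touch "$demo/a.txt"

# $(...) runs cl in a subshell, so the caller's working directory is unchanged
listing=$(cl "$demo")
echo "$listing"

# cd fails on a missing directory, so && skips ls and cl returns non-zero
if ! cl "$demo/no_such_dir" 2>/dev/null; then
    status="failed"
    echo "cl $status, as expected"
fi

rm -rf "$demo"
```

The function's return value is the exit status of its last command, which is why it can be used directly in `if`.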


Basic usage: complements

Test on File Features

-e filename # true if filename exists  [ -e /var/log/syslog ]
-d filename # true if filename is a directory  [ -d /tmp/mydir ]
-f filename # true if filename is a regular file  [ -f /usr/bin/grep ]
-L filename # true if filename is a symbolic link  [ -L /usr/bin/grep ]
-r filename # true if filename is readable  [ -r /var/log/syslog ]
-w filename # true if filename is writable  [ -w /var/mytmp.txt ]
-x filename # true if filename is executable  [ -x /usr/bin/grep ]
filename1 -nt filename2 # true if filename1 is newer than filename2  [ /tmp/install/etc/services -nt /etc/services ]
filename1 -ot filename2 # true if filename1 is older than filename2  [ /boot/bzImage -ot arch/i386/boot/bzImage ]
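
These tests are usually combined with `if`. The sketch below uses a hypothetical helper, `describe`, and temporary files created only for the demonstration.

```shell
#!/usr/bin/env bash
# describe PATH: print what kind of file PATH is (hypothetical helper)
describe() {
    local path=$1
    if   [ -L "$path" ]; then echo "symlink"        # test -L first: -f and -d follow links
    elif [ -d "$path" ]; then echo "directory"
    elif [ -f "$path" ]; then echo "regular file"
    elif [ -e "$path" ]; then echo "other"
    else                      echo "missing"
    fi
}

tmpdir=$(mktemp -d)
touch -t 200001010000 "$tmpdir/old.txt"    # timestamp set to 2000-01-01 00:00
touch "$tmpdir/new.txt"                    # timestamp is now
ln -s "$tmpdir/old.txt" "$tmpdir/link.txt"

kind_dir=$(describe "$tmpdir")
kind_file=$(describe "$tmpdir/old.txt")
kind_link=$(describe "$tmpdir/link.txt")
kind_none=$(describe "$tmpdir/absent.txt")
printf '%s\n' "$kind_dir" "$kind_file" "$kind_link" "$kind_none"

# -nt compares modification times
if [ "$tmpdir/new.txt" -nt "$tmpdir/old.txt" ]; then
    newer="new.txt"
else
    newer="old.txt"
fi
echo "newer file: $newer"

rm -rf "$tmpdir"
```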


Take home message

  1. Learning at least one scripting language is necessary, for everyone;

  2. Computer: [CPU: Cache] + [RAM] + [DISK] + [IO]

  3. Each maizego member should master (or at least be familiar with) the listed Linux/Bash commands

Preview of the next Lecture

  • HPC
  • LSF job management
  • Skills, Dangers and Habits for working on HPC