Jan 16 2015

GitLab Usage Guide

This is a short configuration guide I wrote for the team after setting up GitLab.

Step 1: Use `ssh-keygen` to generate a new key pair, id_rsa_new / id_rsa_new.pub

cd ~/.ssh
ssh-keygen -t rsa -C "tanhao2013@foxmail.com" # your email


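Step 2: Add the SSH public key to GitLab

# Print the public key (use the file name you chose in Step 1), then paste it into the SSH Keys page of your GitLab profile
cat gitlab_rsa.pub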

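If the new key is used only for GitLab, an entry in `~/.ssh/config` lets `git` and `ssh` pick it up automatically. The following is a minimal sketch, not part of the original setup: the host name `gitlab.example.com`, port 10022, and the key path `~/.ssh/id_rsa_new` are assumptions to replace with your own values.

```shell
# Print an ssh_config(5) Host block for a GitLab server.
# Arguments (all illustrative): host name, SSH port, private key path.
ssh_config_entry() {
    local host=$1 port=$2 key=$3
    printf 'Host %s\n' "$host"
    printf '    HostName %s\n' "$host"
    printf '    Port %s\n' "$port"
    printf '    User git\n'
    printf '    IdentityFile %s\n' "$key"
}

# Review the output first, then append it to ~/.ssh/config yourself, e.g.:
#   ssh_config_entry gitlab.example.com 10022 ~/.ssh/id_rsa_new >> ~/.ssh/config
ssh_config_entry gitlab.example.com 10022 '~/.ssh/id_rsa_new'
```

Once the entry is in place and the public key has been added, `ssh -T git@gitlab.example.com` should answer with a GitLab welcome message instead of asking for a password.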

Dec 31 2014

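Setting Up a Code Repository with GitLab

Before I joined, the company hosted its code in SVN on a Windows Server, and every access grant meant logging in to the server remotely and authorizing the user by hand. Having seen the many GitHub-like platforms available, I picked GitLab for a trial and recommended that everyone migrate to Git.

1. Installation is simple: download the package, install it, and start the service.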

curl -O https://downloads-packages.s3.amazonaws.com/centos-6.6/gitlab-7.6.1_omnibus.5.3.0.ci.1-1.el6.x86_64.rpm

yum install openssh-server postfix cronie

service postfix start && chkconfig postfix on

rpm -i gitlab-7.6.1_omnibus.5.3.0.ci.1-1.el6.x86_64.rpm

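Then configure gitlab.rb (/etc/gitlab/gitlab.rb) as described in the documentation and start the service. Remember to set up forwarding for port 8080 and for the SSH port.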
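As a rough illustration of that configuration step, the sketch below prints the two `gitlab.rb` lines that usually matter when GitLab sits behind forwarded ports. `external_url` and `gitlab_rails['gitlab_shell_ssh_port']` are omnibus-gitlab settings; the URL and the port number passed in are placeholders, not the values used on our server.

```shell
# Print the gitlab.rb lines for the public URL and the SSH port shown in clone URLs.
# Both arguments are placeholders: adjust them to your own host and forwarded port.
gitlab_rb_lines() {
    local url=$1 ssh_port=$2
    printf "external_url '%s'\n" "$url"
    printf "gitlab_rails['gitlab_shell_ssh_port'] = %s\n" "$ssh_port"
}

gitlab_rb_lines http://gitlab.example.com 10022
# After merging these lines into /etc/gitlab/gitlab.rb, apply them with:
#   sudo gitlab-ctl reconfigure
```

Printing the lines rather than editing the file in place keeps the sketch harmless: you can compare the output with the existing file before changing anything.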

2. Upgrading to the latest GitLab with Docker

Updated 2016-03-24

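## Update the firewall
## If the INPUT chain already ends with a REJECT rule (the CentOS 6 default), use -I instead of -A so these rules are evaluated first
iptables -A INPUT -m state --state NEW -p tcp --dport 10022 -j ACCEPT
iptables -A INPUT -m state --state NEW -p tcp --dport 8080 -j ACCEPT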
service iptables save
service iptables status
service iptables restart
iptables -L

service docker restart
docker run --detach \
    --hostname gitlab.example.com \
    --env GITLAB_OMNIBUS_CONFIG="external_url 'http://119.*.*.*/'; gitlab_rails['lfs_enabled'] = true;" \
    -p 443:443 -p 8080:80 -p 10022:22 \
    --name gitlab \
    --restart always \
    --volume /srv/gitlab/config:/etc/gitlab \
    --volume /srv/gitlab/logs:/var/log/gitlab \
    --volume /srv/gitlab/data:/var/opt/gitlab \
    gitlab/gitlab-ce:latest
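After the container starts, it is worth confirming that the two published ports actually answer. The helper below is a small sketch of such a check and is not part of the original procedure; `port_open` is a hypothetical name, and it relies on bash's `/dev/tcp` pseudo-device and the coreutils `timeout` command.

```shell
# Return 0 if a TCP connection to host:port succeeds within 2 seconds.
# Uses bash's /dev/tcp pseudo-device, so bash must be installed.
port_open() {
    local host=$1 port=$2
    timeout 2 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# Check the HTTP and SSH ports published by the container (run on the Docker host).
for port in 8080 10022; do
    if port_open 127.0.0.1 "$port"; then
        echo "port $port: open"
    else
        echo "port $port: closed"
    fi
done
```

GitLab can take a few minutes to finish starting inside the container, so a port that answers does not yet mean the web UI is ready; `docker ps` and `docker logs -f gitlab` show the container state and startup progress.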

Dec 15 2014

rCharts and morris.js

1. Using `rCharts` and `morris.js` for time series visualization

Here is the code:

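# rCharts is not on CRAN; install it from GitHub first:
# devtools::install_github("ramnathv/rCharts")
library(rCharts)

# Read the data: column 1 is the date, columns 2-11 are the ten series
df <- read.csv("type.1h.csv", header = FALSE, stringsAsFactors = FALSE)
colnames(df) <- c("date", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10")
df$date <- as.character(df$date)  # assign the result; keeps the numeric column names intact

# Line chart with one line per series
m1 <- mPlot(x = "date", y = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"), type = "Line", data = df)
m1$set(pointSize = 0, lineWidth = 1)
m1$print("chart2")  # print the chart HTML using "chart2" as the chart id
m1

# base64enc (install it once if it is missing)
# install.packages("base64enc")
library("base64enc")
m1$save('graph1.html', 'inline', cdn = TRUE)
# m1$save('graph1.html', 'inline', standalone = TRUE)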

2. Graph

Here is a test graph.

Here is a test image.

Here is a test page: graph1.html

Aug 22 2014

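Reinstalling CentOS on a Server and Setting a Static IP

Five of our colocated servers in the China Telecom data center were attacked, so for safety my boss asked me to reinstall the operating system. You can install from a burned disc or from a USB drive (details omitted).

1. Reinstall the system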

reboot (Ctrl+Alt+Delete)
-> F2 (system management)
-> F11 (BIOS Menu)
-> BIOS Boot Setting
-> Boot Sequence
-> COD (DVD or USB drive)    # disc or USB drive
-> OK
-> Install from video / USB drive
-> No Test
-> Basic
-> Fresh
-> Use All
-> Basic Server
-> reboot

2. Configure a static IP for network access

#### Internal or external IP
IPADDR=192.168.1.201

#### 2.1 Gateway configuration
cp /etc/sysconfig/network /etc/sysconfig/network.bak
echo "
NETWORKING=yes
NETWORKING_IPV6=yes
GATEWAY=192.168.1.1
" >> /etc/sysconfig/network

#### 2.2 Network interface configuration
cp /etc/sysconfig/network-scripts/ifcfg-em1 /etc/sysconfig/network-scripts/ifcfg-em1.bak
sed -i 's/BOOTPROTO=dhcp/BOOTPROTO=none/' /etc/sysconfig/network-scripts/ifcfg-em1
sed -i 's/ONBOOT=no/ONBOOT=yes/' /etc/sysconfig/network-scripts/ifcfg-em1

echo "
BROADCAST=192.168.1.255
IPADDR=$IPADDR
NETWORK=192.168.1.0
" >>/etc/sysconfig/network-scripts/ifcfg-em1

#### 2.3 DNS resolution
echo "nameserver 202.103.24.68" >/etc/resolv.conf

#### 2.4 Test
chkconfig | grep network
service network restart
ping www.baidu.com
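The BROADCAST and NETWORK values in section 2.2 follow directly from IPADDR once the netmask is known. As a small sketch, assuming a /24 netmask (255.255.255.0, which the configuration above implies but does not state), they can be derived like this; `net_params_24` is a hypothetical helper name.

```shell
# Derive NETWORK and BROADCAST for an IPv4 address, assuming a /24 netmask
# (255.255.255.0). Other prefix lengths need real bit arithmetic.
net_params_24() {
    local ip=$1
    local prefix=${ip%.*}   # drop the last octet, e.g. 192.168.1.201 -> 192.168.1
    echo "NETWORK=$prefix.0"
    echo "BROADCAST=$prefix.255"
}

net_params_24 192.168.1.201
```

For the address used in this post the function prints the same NETWORK and BROADCAST lines that are appended to ifcfg-em1, which makes it easy to reuse the script on a host in a different /24 subnet.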

3. References

http://blog.51yip.com/linux/1120.html
https://github.com/iofdata/DM/issues/8

Jul 25 2014

Sysbench for MySQL Testing

1. The complete test script

    host=localhost
    port=3306
    socket=/home/data/mysql/mysql.sock
    user=root
    password=123456

    resultsdir=./results-thread

    threads="8 16 32 64 128"

    sizes="1000000 5000000 10000000 15000000 20000000 25000000 30000000"


    printf "sizes,threads,transactions,trns p/s,deadlocks,dls p/s,read/write requests,r/w reqs p/s,min,avg,max,95 percentile\n" >> stat.txt

    mkdir -p $resultsdir

    for thread in $threads;do
        mkdir -p $resultsdir/thread-$thread
        for size in $sizes; do
            sysbench --test=oltp --mysql-table-engine=innodb \
            --oltp-table-size=$size  --mysql-socket=$socket \
            --mysql-user=$user --mysql-host=$host \
            --mysql-password=$password --mysql-db=students \
            --oltp-table-name=test$size prepare;
            sysbench --test=oltp --mysql-table-engine=innodb \
            --oltp-table-size=$size --mysql-socket=$socket \
            --mysql-user=$user --mysql-host=$host \
            --mysql-password=$password --mysql-db=students \
            --oltp-table-name=test$size  --max-requests=1000 \
            --num-threads=$thread run | \
            tee -a $resultsdir/thread-$thread/sysbench.$thread.$size.report;
            sysbench --test=oltp --mysql-host=$host  --mysql-user=$user \
            --mysql-password=$password --mysql-socket=$socket \
            --mysql-db=students --oltp-table-name=test$size  cleanup;

            # keep only the summary lines; the pattern must stay on one line
            grep -E "cat|threads:|transactions:|deadlocks|read/write|min:|avg:|max:|percentile:" \
            $resultsdir/thread-$thread/sysbench.$thread.$size.report | \
            sed  -e '1 s/Number of threads: //' | \
            tr -d "\n" | \
            sed -e 's/Number of threads: /\n/g' \
            -e 's/[A-Za-z\/]\{1,\}://g' \
            -e 's/read\/write//g' \
            -e 's/approx\.  95//g' \
            -e 's/per sec.)//g' \
            -e 's/ms//g' \
            -e 's/(//g'  \
            -e 's/  */,/g' | awk -v d=$size '{$0=d","$0}1' >> stat.txt
        done
    done
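The sed chain above is compact but hard to verify by eye, so a single-field spot check can help when a row in stat.txt looks wrong. The sketch below pulls only the transaction rate out of a report with awk. It assumes the sysbench 0.4 report layout that the script's own patterns imply; `transactions_per_sec` is a hypothetical helper, and the sample numbers are made up for illustration.

```shell
# Extract the transaction rate from a sysbench 0.4-style OLTP report on stdin.
# Looks for a line such as:  transactions:   1000   (152.34 per sec.)
transactions_per_sec() {
    awk '/transactions:/ { gsub(/[()]/, "", $3); print $3 }'
}

# Illustrative input only; real reports come from the tee'd .report files.
printf '%s\n' \
    '    transactions:                        1000   (152.34 per sec.)' \
    '    deadlocks:                           0      (0.00 per sec.)' |
    transactions_per_sec
```

To check a real run, feed it one of the report files, for example `transactions_per_sec < results-thread/thread-8/sysbench.8.1000000.report`, and compare the value with the "trns p/s" column.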

Feb 12 2014

PacBio Sequencing

Here is a short talk I gave on PacBio sequencing. Any suggestions are welcome!

Dec 29 2013

genome annotation

Here is a flow chart I drew for genome annotation. Any suggestions are welcome!

A roadmap for genome annotation!

Nov 18 2013

genome assembly

Here is a short talk about genome assembly.

For details on de Bruijn graphs (DBG), please click on DBG.

I have also updated the slide images here.

Aug 14 2013

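Single-Library Genome Assembly

An Illumina report compared how read length, coverage, insert size, and other factors affect assembly results. Under ideal conditions, for a simple genome, roughly 30X of short-insert reads plus a moderate amount of long-insert reads can cover enough of the genome and give good values for metrics such as N50.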
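To make the 30X figure concrete: coverage is the number of reads times the read length divided by the genome size, so the number of reads needed for a target coverage is coverage × genome size / read length. A small sketch of that arithmetic follows; the 250 Mb genome and 100 bp reads are illustrative assumptions, not numbers from the Illumina report.

```shell
# Number of reads needed for a target coverage:
#   reads = coverage * genome_size / read_length   (integer division)
reads_needed() {
    local coverage=$1 genome_size=$2 read_length=$3
    echo $(( coverage * genome_size / read_length ))
}

# 30X of a hypothetical 250 Mb genome with 100 bp reads.
reads_needed 30 250000000 100
```

For paired-end data the result counts individual reads, so the number of read pairs is half of it.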

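Early Sanger sequencing used libraries of 1 kb to 40 kb, probably to limit the impact of repetitive sequence. Later, SOAPdenovo kept a 200 bp, 500 bp, 2 kb, 5 kb, 10 kb library scheme when assembling the human genome. The exact strategy differs from genome to genome, but in general both short-insert libraries (<2 kb) and long-insert libraries (>2 kb) are needed. ABySS, by contrast, used only 42X of data from a 210 bp library for its African human genome.

GAGE evaluated the results of several assemblers and includes a section titled "Effect of multiple libraries on assembly". In my own project experience, the multi-library strategy mainly serves scaffolding. Contig assembly relies mostly on the overlaps between reads: as long as sequencing is random and uniform and the depth is sufficient, short-insert reads assemble well into contigs (consensus sequences without Ns). Contig assembly does not involve library information (insert size and paired-end relationships). Scaffolding, in contrast, uses library information to link contigs, and what matters most there is a gradient of libraries larger than 2 kb. That is why software such as ALLPATHS recommends one short-insert library plus one large-insert library.

For a haploid species such as the pteromalid wasp (金小蜂), the genome is also not very large. Because the individuals are small and DNA extraction is difficult, a single bee does not provide enough material for three short-insert libraries (200, 500, 800 bp), so we can try to build just one or two libraries; the effect on contig assembly should be small. (A single-chromosome ant I once assembled also had only one 500 bp library, for sample reasons, and the result was good.)

We also note that recent assemblers such as fermi can already do de novo assembly of a human genome from one sample, one library, and 35X of data.

To support the later analysis and discussion, I will look up published bee and ant assembly papers for everyone. Hymenoptera research is quite active at the moment, so there is plenty to draw on. To move this project forward quickly, we do not need to insist on building three libraries. That is my opinion.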

Jul 2 2013

Bayesian Genome Assembly and MCMC Assessment

Introduction

They first build an assembly graph starting from a de Bruijn graph of the reads. Then they remove all tips and merge all unambiguous paths into single nodes that are annotated by the sequence of merged K-mers.

The resulting unresolved assembly graph (no longer de Bruijn) is a directed graph that consists only of bubbles and is a minimal representation of the variants that can be inferred from the sequenced data. Concatenating the sequences across the nodes in a particular path through this graph gives a possible assembly sequence.
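The first step described above, building the de Bruijn graph from the reads, can be sketched in a few lines of awk. This is only an illustration of graph construction, not the authors' implementation: it does not remove tips or merge unambiguous paths, and `debruijn_edges` is a hypothetical helper name.

```shell
# Print de Bruijn graph edges for reads on stdin (one read per line).
# Each k-mer contributes an edge from its (k-1)-prefix to its (k-1)-suffix;
# the third column counts how often that k-mer occurs in the reads.
debruijn_edges() {
    awk -v k="$1" '
        {
            for (i = 1; i + k - 1 <= length($0); i++) {
                kmer = substr($0, i, k)
                count[substr(kmer, 1, k - 1) " " substr(kmer, 2)]++
            }
        }
        END { for (edge in count) print edge, count[edge] }
    ' | sort
}

# Two overlapping toy reads, k = 3.
printf 'ACGTAC\nCGTACG\n' | debruijn_edges 3
```

A node with exactly one incoming and one outgoing edge lies on an unambiguous path, which is what the merging step collapses into a single node; branching nodes are where bubbles begin and end.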
