SSPACE-LongRead: scaffolding with long reads

Introduction

We are happy to say that SSPACE is now ready to use PacBio long reads for scaffolding [1].

The authors propose a novel hybrid assembly methodology that scaffolds pre-assembled contigs iteratively, using PacBio RS long-read information as a backbone.

The SSPACE-LongRead software is designed to upgrade incomplete draft genomes using single-molecule sequences. The authors conclude that recent advances in PacBio sequencing technology and chemistry, combined with the limited computational resources required to run the program, make it possible to scaffold genomes in a fast and reliable manner.

Circos for Comparative genomics

Synteny and Comparative genomics

See my GitHub issues for details.

First, use Lastz for synteny block alignment.

Then use SVG for synteny block drawing.

We found Circos to be more powerful for handling this.


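Note: most of the content below is reposted from online resources. My respects to the original authors, and thanks for their original work! For the full details, please follow the links to the original articles.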

Circos Installation

For OSX: you can refer to this os-x-installation-guide and this.

Read this first

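Circos tutorial series, part 1: installation

Circos tutorial series, part 2: ideograms (chromosome diagrams)

Circos tutorial series, part 3: highlights

Circos tutorial series, part 4: links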

Tutorial

Circos tutorial translation 1.1: helloworld

Circos tutorial translation 1.2: ticks

Circos tutorial translation 1.3: changes to chromosomes

Circos tutorial translation 1.4: links and rules

De novo assembly of highly heterozygous genomes

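I recently read a paper,
Efficient de novo assembly of highly heterozygous genomes from whole-genome shotgun short reads,
and it seems that the assembly of large, highly heterozygous genomes has made some encouraging progress. The Japanese authors' work is very solid, and there is a good deal in it worth borrowing and referring to.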

Abstract

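As sequencing platforms have improved, throughput is no longer a problem and sequencing keeps getting cheaper. For non-model organisms and wild species, determining the genome sequence is increasingly valuable for research. In most cases, however, these species are highly heterozygous or polyploid, which is a serious difficulty for de novo projects that rely mainly on short-read assembly. A mature and practical solution is still lacking (well-funded labs excepted).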

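Generally speaking, there are two feasible approaches to heterozygous genomes, both widely acknowledged to cost a lot of time, effort, money and thought:

  1. Fosmid-based (or BAC-based) hierarchical sequencing;
  2. Inbred lines (doubled-monoploid clones).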

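Fosmid- or BAC-based hierarchical assembly requires building a large number of long-insert libraries, and both the lab work and the assembly are delicate jobs. Classic cases such as the oyster (Zhang et al. 2012), the diamondback moth (You et al. 2013), and the Norway spruce (Nystedt et al. 2013) are all well worth reading.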

Installing matplotlib is a nightmare

Building matplotlib on OSX has proved to be a nightmare because of the different types of zlib, png and freetype that may be on your system.

The recommended and supported way to build is to use a third-party
package manager to install the required dependencies, and then
install matplotlib from source using the setup.py script. Two widely
used package managers are homebrew and MacPorts. The following
example illustrates how to install libpng and freetype using
homebrew.

Example usage::

  brew install libpng freetype

If you are using MacPorts, execute the following instead:

Example usage::

  port install libpng freetype

To install matplotlib from source, execute:

Example usage::

  python setup.py install

The build then failed with a freetype error::

  /usr/local/include/ft2build.h:56:10: fatal error: 'freetype/config/ftheader.h' file not found

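I googled it and tried the answers on Stack Overflow,
but it still did not work.

After trying all sorts of things, I finally found the answer in this blog post:
http://blog.caoyuan.me/2012/08/matplotlib-error-mac-os-x/

And at last it worked!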
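When a build stops on a "file not found" error for a header, as with freetype/config/ftheader.h above, one first diagnostic step is to check where that header actually sits under the include directory. The sketch below only illustrates that check; it is not the fix from the linked post. The helper name `find_header` is made up for this example, and `/usr/local/include` is simply the directory named in the error message.

```python
import os


def find_header(include_root, header_name):
    """Return the sorted paths of every file named `header_name` under `include_root`."""
    matches = []
    # followlinks=True because package managers often symlink include
    # directories; note that a symlink cycle would make this walk loop.
    for dirpath, _dirnames, filenames in os.walk(include_root, followlinks=True):
        if header_name in filenames:
            matches.append(os.path.join(dirpath, header_name))
    return sorted(matches)


if __name__ == "__main__":
    # Assumed location, taken from the compiler error above.
    for path in find_header("/usr/local/include", "ftheader.h"):
        print(path)
```

If the header turns up under a different subdirectory than the one the compiler asked for, that mismatch is the thing to resolve.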

I am coming back

I have been busy these days, and I spent a lot of time updating this beautiful theme
for my blog. I like it very much!
Here I am testing some code insertion.

Assemble a genome

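Genome assembly is the process of reconstructing longer genomic sequences from the short reads produced by sequencing.
Different assemblers and assembly strategies differ in their algorithms and details,
but overall they all go through the following steps: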

a) Contig assembly

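First, contigs are assembled from the overlaps between reads and their coverage.
There are many contig assembly methods and plenty of software, each with a different algorithmic focus, and the details get complicated.
Overall, though, they all first build a graph from the overlap relationships among reads and then simplify that graph.
Short reads generally use a k-mer-based de Bruijn graph (DBG),
while traditional long reads generally use Overlap-Layout-Consensus (OLC) or a string graph (SG).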
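To make the graph idea concrete, here is a minimal Python sketch of a k-mer-based de Bruijn graph followed by a very simple "simplification" step that walks an unbranched path. It is a toy illustration, not how any particular assembler is implemented: the function names `build_de_bruijn` and `walk_unbranched`, the example reads and k=4 are all made up here, and real assemblers also deal with sequencing errors, reverse complements and coverage.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer prefix to the set of (k-1)-mer suffixes seen in the reads."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer is an edge from its prefix node to its suffix node.
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def walk_unbranched(graph, start):
    """Extend a sequence from `start` while each step has one way in and one way out."""
    in_degree = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            in_degree[node] += 1

    sequence, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if in_degree[nxt] != 1 or nxt in seen:
            break  # branch point or cycle: stop the contig here
        sequence += nxt[-1]  # each step adds exactly one new base
        seen.add(nxt)
        node = nxt
    return sequence


# Three overlapping toy reads (hypothetical data).
reads = ["ATGGCGT", "GGCGTGC", "CGTGCAA"]
graph = build_de_bruijn(reads, k=4)
contig = walk_unbranched(graph, "ATG")
print(contig)  # ATGGCGTGCAA
```

The three reads share enough k-mers that the graph is a single unbranched path, so the walk recovers one contig; repeats or heterozygous sites would create branches where the walk stops.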

About Me

About Me!

I’m a bioinformatics engineer and amateur programmer. Here is a short introduction to me and this blog. Thanks for reading!

I’m engaged in genomics research, including genome assembly/annotation/evolution and comparative genomics.

As a mathematics graduate, I’m also interested in interdisciplinary work, specifically data mining and visualization. And I hope we can exchange our ideas and experiences about Data Mining through this blog.

I mostly work in Python/Perl/R on OSX and Linux, and I also dabble in C/C++/Shell. I am now trying to learn D3.js and Processing for my data visualization hobby.

Thanks for following me: @Github and @Weibo.

I live in Wuhan, a city near the Yangtze River in China. It’s beautiful despite its bad weather (long hot summers and cold winters). I spent my college life there, and I will always love my friends there.

Besides programming, I exercise and read books regularly, to be a stronger and better version of myself!

About this blog!

This blog is proudly powered by Hexo, which takes great advantage of Node.js.

It is only for my scattered ideas about reading and programming; there are no commercial intentions.

Any suggestions about this blog and the topics discussed here are welcome.

As TED says: ideas are worth spreading and sharing! Many thanks!

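Timeline!

=========  ================================================  ==================
Time       Department/Works                                  Company/University
=========  ================================================  ==================
2010-Now   Bioinformatics Engineer/Analyst                   BGI
2006-2010  Student in College of Mathematics And Statistics  HUST
=========  ================================================  ==================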

About This Blog!

1. Bulletin Board

Hi, I’m Buttonwood! This is my first blog on GitHub.

Writing blogs like Hackers!

This is really very very cool! And I’ve learned lots of things from it.

Yes, after days of struggling, my tech blog is online at last!
Thanks to the Internet, we’re in an open society as well as a boom time.

It is a personal blog focused on new technology and ideas, and also a place to sum up the practical experience of my work and life.

It is non-commercial, and all rights are reserved.

Besides, this is an individual experiment and my own undertaking; it has nothing to do with any company, organization or institution.

Any suggestions about this blog are welcome, and I’ll keep updating it.