Parallel Merging Method to Integrate Different Genome Assemblies

Kirill V. Romanenkov, Alexey N. Salnikov, Andrey V. Alexeevski

Abstract


This paper investigates the use of multiprocessor systems for reconciling genome assemblies. Many algorithmic approaches aim to solve the task of de novo assembly from short reads; however, their results on the same raw data often differ substantially. Because of the large data volume, the computations must be carried out in the distributed-memory model on a computational cluster. The authors develop a merging algorithm that integrates different genome assemblies based on a distributed weighted contig graph. The proposed method integrates a combination of draft assemblies, reducing the fragmentation of the resulting contigs. A sequential version of the algorithm is implemented in C/C++ and is available at https://bitbucket.org/kromanenkov/gar.


Keywords


bioinformatics, multiprocessor systems, parallel algorithms

DOI: http://dx.doi.org/10.14529/cmse160103