Over the past few years, next-generation sequencing has become an invaluable technology for numerous applications in the field of genomics. The success of these applications depends on the performance of each phase in the genomic sequence pipeline, which starts with read mapping. However, read mapping is computationally intensive because it requires mapping billions of reads to numerous locations in a large reference genome. Building a q-gram index hash table has proven to be an efficient way to reduce repetitive scanning of the reference during the verification step. A q-gram index hash table stores, for each q-gram, the locations at which it occurs in the reference genome. To accelerate the construction of this data structure and to exploit multi-core architectures, the work can be distributed across multiple CPU cores and executed in parallel. This paper compares the index build time of sequential and multiprocessing implementations of three methods for building a q-gram index hash table. The implementation results show that all multiprocessing versions are faster than their sequential counterparts, with speedups ranging from 1.53 to 2.57. Although the open addressing method yields the fastest index build time, the best speedup is achieved by the minimizer-based method.
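As a rough illustration of the data structure described above, the following Python sketch builds a q-gram index with an ordinary dictionary. It is not the paper's implementation: the function names (`build_qgram_index`, `split_with_overlap`, `build_chunked`) and the chunking scheme are assumptions made for this example, and it does not reproduce the open addressing or minimizer-based methods. The chunked variant only shows why the build can be parallelized: each chunk is indexed independently, so the chunks could in principle be handed to separate worker processes and the partial indexes merged afterwards.

```python
from collections import defaultdict


def build_qgram_index(reference, q, offset=0):
    """Map each q-gram to the list of positions where it starts.

    `offset` is added to every position so that a chunk of a larger
    reference reports coordinates relative to the full sequence.
    """
    index = defaultdict(list)
    for i in range(len(reference) - q + 1):
        index[reference[i:i + q]].append(offset + i)
    return index


def split_with_overlap(reference, q, n_chunks):
    """Yield (offset, chunk) pairs.

    Chunks overlap by q - 1 characters so that q-grams spanning a
    chunk boundary are counted exactly once.
    """
    size = max(1, -(-len(reference) // n_chunks))  # ceiling division
    for start in range(0, len(reference), size):
        yield start, reference[start:start + size + q - 1]


def build_chunked(reference, q, n_chunks):
    """Index each chunk independently, then merge the partial indexes.

    The loop body has no shared state, so each chunk could be
    processed by a separate worker (hypothetical extension).
    """
    merged = defaultdict(list)
    for offset, chunk in split_with_overlap(reference, q, n_chunks):
        partial = build_qgram_index(chunk, q, offset)
        for gram, positions in partial.items():
            merged[gram].extend(positions)
    return merged


if __name__ == "__main__":
    index = build_chunked("ACGTACGT", q=3, n_chunks=3)
    for gram in sorted(index):
        print(gram, index[gram])
```

For the toy reference `ACGTACGT` with q = 3, the sequential and chunked builds produce the same index, which is the property a multiprocessing version would need to preserve.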