    Please use this identifier to cite or link to this item: http://ccur.lib.ccu.edu.tw/handle/A095B0000Q/584

    Title: A self-error correction algorithm for third-generation sequencing using FM-index (original title: 以 FM-index 為基礎之第三代定序自我型錯誤修正法)
    Authors: TSAI, CHENG-WEI (蔡政威)
    Contributors: Graduate Institute of Computer Science and Information Engineering
    Keywords: Third-generation sequencing; Alignment-free error correction; FM-index
    Date: 2017
    Issue Date: 2019-07-17
    Publisher: Graduate Institute of Computer Science and Information Engineering
    Abstract: Third-generation sequencing technologies are becoming a popular choice for de novo assembly projects because they produce long reads with less sequencing bias and more uniform coverage. This comes at the cost of much higher error rates, so error correction is usually performed before assembly. Current error correction methods fall into two groups: alignment-based and alignment-free. Alignment-based methods are more time-consuming but can correct reads in repetitive and low-coverage regions; alignment-free methods are much faster but less sensitive. In this thesis, we develop a novel alignment-free algorithm that reduces error correction to a path-searching problem via FM-index extension. To correct reads in low-coverage and repetitive regions, we also develop an adaptive seeding algorithm that uses k-mers of multiple sizes. Experimental results show that our method is faster than existing alignment-based and alignment-free methods on the E. coli and S. cerevisiae datasets. On the larger nematode genome dataset, our method is slower than alignment-based methods but is still faster than existing alignment-free methods.
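    The abstract does not spell out the algorithm, so the following is only a minimal sketch, under stated assumptions, of the three ideas it names: counting k-mer occurrences in the reads with an FM-index (backward search), adaptive seeding with several k-mer sizes, and filling the gap between two seeds by a path search driven by FM-index extension. The names `FMIndex`, `adaptive_seeds` and `bridge`, the k-mer sizes, and the `min_count` threshold are all hypothetical. The index is built naively (sorted suffixes, uncompressed rank table) over the forward strand only, and `bridge` is a simple greedy walk, not the search procedure used in the thesis. Because this is self-correction, the reads are indexed and queried against themselves, so every k-mer of a read occurs at least once; `min_count > 1` asks for support from other reads.

```python
from collections import Counter


class FMIndex:
    """Toy FM-index over a set of reads (naive construction, illustration only)."""

    def __init__(self, reads):
        # Join reads with '#' and end with '$'; both are assumed absent from reads,
        # so no k-mer match can span two reads.
        text = "#".join(reads) + "$"
        # Naive suffix array: sort suffix start positions by suffix (fine for tiny inputs).
        sa = sorted(range(len(text)), key=lambda i: text[i:])
        # Burrows-Wheeler transform: the character preceding each sorted suffix.
        self.bwt = "".join(text[i - 1] for i in sa)
        # C[c] = number of characters in the text that sort before c.
        counts = Counter(text)
        self.C, total = {}, 0
        for c in sorted(counts):
            self.C[c] = total
            total += counts[c]
        # occ[c][i] = occurrences of c in bwt[:i] (uncompressed rank table).
        self.occ = {c: [0] * (len(self.bwt) + 1) for c in counts}
        for i, ch in enumerate(self.bwt):
            for c, table in self.occ.items():
                table[i + 1] = table[i] + (ch == c)

    def count(self, pattern):
        """Backward search: number of occurrences of pattern in the indexed reads."""
        lo, hi = 0, len(self.bwt)
        for ch in reversed(pattern):
            if ch not in self.C:
                return 0
            lo = self.C[ch] + self.occ[ch][lo]
            hi = self.C[ch] + self.occ[ch][hi]
            if lo >= hi:
                return 0
        return hi - lo


def adaptive_seeds(read, fm, ks=(21, 15, 11), min_count=3):
    """Hypothetical adaptive seeding: at each position keep the longest k-mer
    (among the sizes in ks) that occurs at least min_count times in the index."""
    seeds, pos = [], 0
    while pos < len(read):
        for k in sorted(ks, reverse=True):
            kmer = read[pos:pos + k]
            if len(kmer) == k and fm.count(kmer) >= min_count:
                seeds.append((pos, k))
                pos += k  # jump past the accepted seed (non-overlapping seeds)
                break
        else:
            pos += 1  # no sufficiently supported k-mer of any size starts here
    return seeds


def bridge(fm, left, right, k, max_len):
    """Hypothetical greedy path search: extend `left` one base at a time, choosing
    the base whose trailing k-mer is most frequent in the index, until the path
    ends with `right`. Returns None on a dead end or if max_len is exceeded."""
    path = left
    while len(path) <= max_len:
        if path.endswith(right):
            return path
        ctx = path[-(k - 1):]
        best = max("ACGT", key=lambda b: fm.count(ctx + b))
        if fm.count(ctx + best) == 0:
            return None  # no read supports any extension of this context
        path += best
    return None


# Three error-free copies of a toy sequence plus one read with an A->T error at offset 10.
reads = ["ACGTAGGCTTACCGATTGCA"] * 3 + ["ACGTAGGCTTTCCGATTGCA"]
fm = FMIndex(reads)
noisy = reads[-1]
seeds = adaptive_seeds(noisy, fm, ks=(8, 4), min_count=3)
print(seeds)  # [(0, 8), (11, 8)]: no supported seed covers the erroneous offsets 8-10
(p1, k1), (p2, k2) = seeds
path = bridge(fm, noisy[p1:p1 + k1], noisy[p2:p2 + k2], k=5, max_len=len(noisy) + 5)
corrected = noisy[:p1] + path + noisy[p2 + k2:]
print(corrected)  # ACGTAGGCTTACCGATTGCA
```

    A practical tool would likely replace the naive suffix sort and full rank table with linear-time construction and sampled occurrence counts, index reverse complements as well, and explore several candidate paths between seeds rather than committing to the single most frequent extension.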
    Appears in Collections: [Department of Computer Science and Information Engineering] Theses and Dissertations

    All items in CCUR are protected by copyright, with all rights reserved.
