Text, which humans developed gradually over many centuries, is the system of symbols people use to communicate with one another. As image recognition technology has matured, the rates at which text in natural scene images is detected and recognized have become quite high, and deep learning gives the best results among the available techniques. In recent years, convolutional neural networks (CNNs) have been widely applied to text detection and recognition. This high recognition rate comes at a cost, however: CNNs are computationally complex, so recognition takes a long time. The goals of this research are therefore to reduce processing time through image preprocessing and to correct recognized words with a cloud dictionary. First, regions where text appears are located and salient text is given priority in order to obtain visual centers. These visual centers narrow the recognition range, which reduces the time the CNN spends recognizing characters. For text detection, Maximally Stable Extremal Regions (MSER) replace the time-consuming sliding-window method. Second, earlier work usually obtained the correct word by matching the recognized character string against a self-built lexicon and selecting the closest entry, but this approach is limited by the lexicon's coverage. We replace the lexicon with a cloud dictionary, which broadens the range of words that can be recognized and increases error tolerance.
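The lexicon-matching baseline mentioned above (compare the recognized string against a self-built lexicon and keep the closest word) can be sketched as follows. This is a minimal illustration, not the method used in this work: it assumes Levenshtein edit distance as the similarity measure (the abstract does not specify one), and the function names `levenshtein` and `correct_word` and the sample lexicon are hypothetical. The cloud dictionary approach would replace the fixed `lexicon` list with queries to an online dictionary, which is not shown because no specific service is named.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions, and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def correct_word(recognized: str, lexicon: list[str]) -> str:
    """Return the lexicon entry with the smallest edit distance to the
    recognized string (ties go to the earliest entry). Hypothetical helper."""
    return min(lexicon, key=lambda w: levenshtein(recognized.lower(), w.lower()))


# Hypothetical self-built lexicon; a recognizer confused 'o' with '0'.
lexicon = ["coffee", "office", "coffin", "toffee"]
print(correct_word("c0ffee", lexicon))  # → coffee
```

Because `correct_word` always returns some lexicon entry, a correctly recognized word that is missing from the lexicon is forced onto the nearest listed word; this is the coverage limitation that motivates replacing the fixed lexicon with a cloud dictionary.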