In recent years, deep learning has been widely applied in fields such as computer vision, automatic speech recognition, audio recognition, image processing, and bioinformatics. Several deep learning architectures have been developed, such as the Deep Neural Network (DNN), the Convolutional Neural Network (CNN), and the Recurrent Neural Network (RNN). Compared with the others, the CNN achieves better results in image and text recognition. However, due to the huge amount of data processed and the complexity of the network structure, realizing a CNN on FPGA hardware is a challenging task. This thesis therefore proposes optimization schemes to fit the CNN into FPGA hardware. First, an optimized CNN architecture is proposed that reuses the convolutional computing layers in order to reduce hardware resources. In addition, tiling is employed to reduce the amount of I/O register storage. Furthermore, under the constraint of the circuit's internal memory capacity, the number of registers used during computation is reduced, and the overall computation time is lowered by performing operations in parallel. Finally, the SVM kernel, the last stage of the CNN, is also successfully realized in hardware via tiling. The design is implemented on a ZedBoard Zynq-7000 EPP (XC7Z020-CLG484-1).
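The tiling idea mentioned above can be illustrated with a minimal software sketch: instead of buffering an entire input feature map on chip, a convolution is computed one output tile at a time, so only the small input window that tile needs must be held in registers or BRAM. All sizes and names below are illustrative assumptions, not the thesis's actual parameters.

```c
/* Sketch of loop tiling for a 2-D convolution (no padding, stride 1).
 * Each output tile of size TILE x TILE only requires an input buffer of
 * (TILE + K - 1) x (TILE + K - 1), independent of the feature-map size. */

#define IN    8            /* input feature-map width/height (illustrative) */
#define K     3            /* convolution kernel size                        */
#define OUT   (IN - K + 1) /* output width/height                            */
#define TILE  3            /* output tile edge length                        */

static void conv_tiled(const float in[IN][IN],
                       const float ker[K][K],
                       float out[OUT][OUT])
{
    for (int tr = 0; tr < OUT; tr += TILE) {
        for (int tc = 0; tc < OUT; tc += TILE) {
            /* On-chip buffer: only the inputs this tile needs. */
            float buf[TILE + K - 1][TILE + K - 1];
            for (int r = 0; r < TILE + K - 1 && tr + r < IN; ++r)
                for (int c = 0; c < TILE + K - 1 && tc + c < IN; ++c)
                    buf[r][c] = in[tr + r][tc + c];

            /* Compute the outputs of this tile from the local buffer. */
            for (int r = 0; r < TILE && tr + r < OUT; ++r)
                for (int c = 0; c < TILE && tc + c < OUT; ++c) {
                    float acc = 0.0f;
                    for (int kr = 0; kr < K; ++kr)
                        for (int kc = 0; kc < K; ++kc)
                            acc += buf[r + kr][c + kc] * ker[kr][kc];
                    out[tr + r][tc + c] = acc;
                }
        }
    }
}
```

In an HLS flow the inner tile loops would typically be unrolled so the multiply-accumulates execute in parallel, which corresponds to the simultaneous-operation scheme described in the abstract.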