When analyzing Big Data, traditional relational database management systems can no longer handle such enormous volumes of information. Hadoop can store massive amounts of data in HDFS and analyze them with MapReduce. Owing to the fault tolerance and scalability of HDFS, many enterprises are migrating their data to Hadoop and using cloud computing environments to process large volumes of data. However, Hadoop stores data by splitting it into blocks that are distributed randomly and evenly across the cluster, without considering the correlation between data. As a result, related data end up scattered across different nodes, and analysis tasks incur extra network transfer costs, which greatly degrades performance. This thesis proposes that, when data are migrated from a relational database to Hadoop, block placement should not be chosen at random; instead, a better storage strategy is derived by examining the performance of the computing nodes and the referential relationships among tables, so that related data blocks are placed together. This lowers the cost of locating data during queries and improves overall execution performance.
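To make the placement idea concrete, the following is a minimal sketch of how a destination node for a block might be scored by combining node performance with table referential relationships. It is not the thesis's actual implementation; the names `NodeInfo`, `choose_datanode`, and the weighting factor `alpha` are hypothetical, and it assumes node performance scores and table foreign-key relations are already known.

```python
from dataclasses import dataclass, field

@dataclass
class NodeInfo:
    """Hypothetical view of a DataNode: a performance score and
    the set of tables whose blocks it already holds."""
    name: str
    performance: float                 # e.g. normalized CPU/disk throughput
    stored_tables: set = field(default_factory=set)

def choose_datanode(table, related_tables, nodes, alpha=0.5):
    """Pick a DataNode for a block of `table`.

    score = alpha * node performance
          + (1 - alpha) * fraction of referenced tables already on the node

    Related blocks therefore tend to be co-located, while faster nodes
    are still preferred (alpha is an assumed tuning weight).
    """
    def score(node):
        if related_tables:
            locality = len(node.stored_tables & related_tables) / len(related_tables)
        else:
            locality = 0.0
        return alpha * node.performance + (1 - alpha) * locality

    best = max(nodes, key=score)
    best.stored_tables.add(table)      # record the placement decision
    return best

# Example: blocks of an "orders" table that references "customers"
# gravitate toward the node already holding "customers" blocks.
nodes = [NodeInfo("dn1", 0.9), NodeInfo("dn2", 0.6, {"customers"})]
print(choose_datanode("orders", {"customers"}, nodes).name)  # -> dn2
```

In this sketch the locality term rewards nodes that already store blocks of referenced tables, so join-related data tend to land together, while the performance term keeps placement biased toward faster nodes rather than overloading a single machine.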