With the popularity of deep learning and the growth of computing power, neural networks have become deeper and larger. Despite this progress, two challenges remain for deep model training. One is the expensive computational cost; the other is that there may be insufficient or unlabeled data for training or for adapting a deep architecture to new tasks. Motivated by the recently proposed knowledge distillation approach, which aims at obtaining small and fast-to-execute models, we propose an auxiliary structure for deep model learning with insufficient data: additional alignment layers migrate the weights of the auxiliary model to a small model, and the pre-trained parameters are then used to train the small model. We combine two evaluations to decide which convolutional or fully-connected layers carry the most useful information: a layered inter-class matrix and an inter-layer Gram matrix. The layered inter-class matrix indicates the degree of correlation between the features of different classes, while the inter-layer Gram matrix represents the distilled knowledge from the complex network, consisting of the inner products between features from any two layers of the model. Experimental results demonstrate that the layers selected by these two matrices carry the most valuable information, yielding the best model-training performance with the proposed auxiliary structure.
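The two evaluations above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature shapes, random inputs, and function names are assumptions chosen for demonstration. The inter-layer Gram matrix flattens the spatial dimensions of two layers' feature maps and takes channel-wise inner products; the layered inter-class matrix correlates the mean feature vectors of the different classes.

```python
import numpy as np

# Hypothetical feature maps from two layers of a trained complex network,
# shaped (channels, height, width); all shapes here are illustrative only.
rng = np.random.default_rng(0)
feat_a = rng.standard_normal((16, 8, 8))   # layer A: 16 channels
feat_b = rng.standard_normal((32, 8, 8))   # layer B: 32 channels

def inter_layer_gram(f1, f2):
    """Inner products between per-channel features of two layers.

    Flattens the spatial dimensions, then takes inner products between
    every channel of f1 and every channel of f2, normalized by the
    spatial size, giving a (c1, c2) matrix of layer-to-layer relations.
    """
    a = f1.reshape(f1.shape[0], -1)
    b = f2.reshape(f2.shape[0], -1)
    return a @ b.T / a.shape[1]

def inter_class_matrix(features, labels, num_classes):
    """Correlation between the mean feature vectors of each class.

    Averages the feature vectors belonging to each class, then returns
    the (num_classes, num_classes) correlation matrix of those means.
    """
    means = np.stack([features[labels == c].mean(axis=0)
                      for c in range(num_classes)])
    return np.corrcoef(means)

G = inter_layer_gram(feat_a, feat_b)
print(G.shape)  # (16, 32)

feats = rng.standard_normal((100, 64))   # 100 samples, 64-d features
labels = rng.integers(0, 5, size=100)    # 5 hypothetical classes
M = inter_class_matrix(feats, labels, 5)
print(M.shape)  # (5, 5)
```

In this sketch, large off-diagonal entries of `M` indicate classes whose features are strongly correlated, and the entries of `G` capture how strongly the two layers' channel responses align; both are simple proxies for ranking which layers carry the most useful information.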