1-2hit |
Ji XI Yue XIE Pengxu JIANG Wei JIANG
Currently, a significant portion of acoustic scene categorization (ASC) research is centered around utilizing Convolutional Neural Network (CNN) models. This preference is primarily due to CNN’s ability to effectively extract time-frequency information from audio recordings of scenes by employing spectrum data as input. The expression of many dimensions can be achieved by utilizing 2D spectrum characteristics. Nevertheless, the diverse interpretations of the same object’s existence in different positions on the spectrum map can be attributed to the discrepancies between spectrum properties and picture qualities. The lack of distinction between different aspects of input information in ASC-based CNN networks may result in a decline in system performance. Considering this, a feature pyramid segmentation (FPS) approach based on CNN is proposed. The proposed approach involves utilizing spectrum features as the input for the model. These features are split based on a preset scale, and each segment-level feature is then fed into the CNN network for learning. The SoftMax classifier will receive the output of all feature scales, and these high-level features will be fused and fed to it to categorize different scenarios. The experiment provides evidence to support the efficacy of the FPS strategy and its potential to enhance the performance of the ASC system.
Ji XI Pengxu JIANG Yue XIE Wei JIANG Hao DING
The relevant model based on convolutional neural networks (CNNs) has been proven to be an effective solution in speech enhancement algorithms. However, there needs to be more research on CNNs based on microphone arrays, especially in exploring the correlation between networks associated with different microphones. In this paper, we proposed a CNN-based feature integration network for speech enhancement in microphone arrays. The input of CNN is composed of short-time Fourier transform (STFT) from different microphones. CNN includes the encoding layer, decoding layer, and skip structure. In addition, the designed feature integration layer enables information exchange between different microphones, and the designed feature fusion layer integrates additional information. The experiment proved the superiority of the designed structure.