Composite Model Particle Filter for Indoor Sound Source Location Based on Multi-feature

doi:10.12382/bgxb.2022.0849

Abstract

To improve the accuracy and robustness of sound source localization in reverberant and noisy environments, a multi-feature composite model particle filter algorithm is proposed. The likelihood function of the particle filter is built from multiple features of the signals received by the microphones: a convolutional neural network (CNN) extracts deep features from multi-hypothesis time-delay estimation images, and a time-delay estimation model based on support vector regression (SVR) is established. A beam output energy fusion mechanism is introduced to remedy the inability of a single feature to suppress noise and reverberation simultaneously. To handle the randomness of speaker motion, a composite model for sound source tracking is established, which improves the robustness of the speaker tracking system. Simulation and experimental results show that, under composite model tracking, the multi-feature algorithm reduces the average position root mean square error (RMSE) by more than 83% compared with the steered response power and time delay estimation (SRPTDE) algorithm; under multi-feature observation, the composite model reduces the average position RMSE by more than 46% compared with the Langevin model and the random walk model. The proposed algorithm achieves effective tracking of randomly moving sound sources in complex environments.

Keywords: indoor sound source localization, multi-feature, time delay estimation, composite model, particle filter

CLC number:

TN272

LIU Wangsheng, PAN Haipeng, WANG Minghuan. Composite Model Particle Filter for Indoor Sound Source Location Based on Multi-feature[J]. Acta Armamentarii, 2024, 45(3): 975-985.

Figures/Tables 13

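Table 1 CNN structure

Layer	Parameters
Input layer	Input size: 28×28
Convolutional layer H₁	Number of feature maps: 6; kernel size: 5×5
Pooling layer H₂	Filter size: 2×2; stride: 2
Convolutional layer H₃	Number of feature maps: 12; kernel size: 5×5
Pooling layer H₄	Filter size: 2×2; stride: 2
Fully connected layer H₅	Output size: 192×1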
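
The layer sizes in Table 1 can be checked with a little arithmetic. The sketch below traces the feature-map size through layers H₁ to H₄ and flattens the result. It assumes a single-channel 28×28 input, stride-1 convolutions, and no padding; none of these are stated in the table, and the helper names `conv_out` and `cnn_shapes` are illustrative only.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a square convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def cnn_shapes(input_size=28):
    """Trace (layer, channels, side length) through the Table 1 layers.

    Assumptions (not given in the table): one input channel,
    stride-1 convolutions, no zero padding.
    """
    shapes = [("input", 1, input_size)]
    size = conv_out(input_size, kernel=5)       # H1: 6 maps, 5x5 kernel
    shapes.append(("H1", 6, size))
    size = conv_out(size, kernel=2, stride=2)   # H2: 2x2 pooling, stride 2
    shapes.append(("H2", 6, size))
    size = conv_out(size, kernel=5)             # H3: 12 maps, 5x5 kernel
    shapes.append(("H3", 12, size))
    size = conv_out(size, kernel=2, stride=2)   # H4: 2x2 pooling, stride 2
    shapes.append(("H4", 12, size))
    return shapes


shapes = cnn_shapes()
for name, channels, size in shapes:
    print(f"{name}: {channels} x {size} x {size}")

# Flatten the last pooling output to get the length fed to H5.
_, channels, size = shapes[-1]
flattened = channels * size * size
print("flattened length:", flattened)
```

Under these assumptions the side length goes 28, 24, 12, 8, 4, and the 12 maps of size 4×4 flatten to 192 values, which matches the 192×1 output listed for H₅.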

Fig.1 Indoor microphone layout

Fig.2 Average RMSE versus reverberation time

Fig.3 Average RMSE versus SNR

Fig.4 Position estimation RMSE for the steering maneuver

Fig.5 Position estimation RMSE for the circular arc maneuver

Table 2 Comparison of average RMSE

Algorithm	Average position RMSE, steering maneuver/m	Average position RMSE, circular arc maneuver/m
MFL	0.1976	0.1538
MFRW	0.2065	0.1543
MFCV	0.4073	0.5511
MFCM	0.0841	0.0819
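
The relative error reduction implied by Table 2 can be worked out directly. The sketch below computes the percentage by which MFCM lowers the average RMSE relative to each other row. Reading MFL and MFRW as the Langevin and random walk variants named in the abstract is an assumption, since the abbreviations are not expanded here; `reduction_percent` is an illustrative helper.

```python
def reduction_percent(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


# Average position RMSE in metres, copied from Table 2:
# (steering maneuver, circular arc maneuver).
table2 = {
    "MFL": (0.1976, 0.1538),
    "MFRW": (0.2065, 0.1543),
    "MFCV": (0.4073, 0.5511),
    "MFCM": (0.0841, 0.0819),
}

proposed = table2["MFCM"]
reductions = {
    name: tuple(reduction_percent(b, p) for b, p in zip(values, proposed))
    for name, values in table2.items()
    if name != "MFCM"
}

for name, (steering, arc) in reductions.items():
    print(f"MFCM vs {name}: steering {steering:.1f}%, arc {arc:.1f}%")
```

With these values the smallest reduction against MFL and MFRW is a little under 47% (circular arc maneuver), which is consistent with the "more than 46%" figure quoted in the abstract.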

Table 3 Performance comparison of the algorithms

Algorithm	Space complexity	Time complexity	Running time/s
MFL	O(n⁴n_a+p)	O(n⁴n_a+p)	2.4757
MFRW	O(n⁴n_a+p)	O(n⁴n_a+p)	2.1241
MFCV	O(n⁴n_a+p)	O(n⁴n_a+p)	2.2353
MFCM	O(n⁴n_a+p)	O(n⁴n_a+p)	12.141

Fig.6 Steering track

Fig.7 Position estimation RMSE for the steering track

Fig.8 Circular arc track

Fig.9 Position estimation RMSE for the circular arc track

Table 4 Comparison of average RMSE for measured trajectories

Algorithm	Average position RMSE, steering maneuver/m	Average position RMSE, circular arc maneuver/m
SRPTDECM	1.1414	2.5258
MFL	1.8893	0.6776
MFRW	0.8011	0.4827
MFCM	0.1917	0.1991
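
As a quick check on the measured-trajectory numbers, the sketch below compares the SRPTDECM and MFCM rows of Table 4. Treating SRPTDECM as the SRPTDE observation combined with the composite model is an assumption drawn from the abstract; the variable names are illustrative.

```python
# Average position RMSE in metres from Table 4:
# (steering maneuver, circular arc maneuver).
srptdecm = (1.1414, 2.5258)
mfcm = (0.1917, 0.1991)

# Relative reduction of MFCM with respect to SRPTDECM, in percent.
steering_gain, arc_gain = (
    100.0 * (base - new) / base for base, new in zip(srptdecm, mfcm)
)

print(f"steering maneuver: {steering_gain:.1f}% lower RMSE")
print(f"circular arc maneuver: {arc_gain:.1f}% lower RMSE")
```

Both reductions come out above 83%, in line with the figure the abstract reports for the multi-feature algorithm under composite model tracking.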
