
Acta Armamentarii ›› 2024, Vol. 45 ›› Issue (S1): 219-230. doi: 10.12382/bgxb.2024.0587



Illumination Perception and Feature Enhancement Network for RGB-T Semantic Segmentation

LIU Kunlong1, WANG Hu1,*, LIU Xiaoqiang1, NIU Shuaixu1, HUANG Yi1, FU Qi2, ZHAO Tao3

  1 Xi’an Institute of Applied Optics, Xi’an 710065, Shaanxi, China
    2 Chinese Academy of Ordnance Sciences, Beijing 100089, China
    3 Xi’an Modern Control Technology Research Institute, Xi’an 710021, Shaanxi, China
  Received: 2024-07-15    Online: 2024-11-06


Abstract:

In intelligent optoelectronic devices, red-green-blue and thermal (RGB-T) semantic segmentation based on artificial intelligence can be widely applied in autonomous driving, drone aerial photography, video surveillance, and other fields. The illumination of an image reflects, to some extent, the reliability of local image regions, so the illumination prior can be exploited to further improve segmentation performance. On this basis, a new RGB-T semantic segmentation model, the illumination perception and feature enhancement network, is proposed. It consists of an illumination perception network, an attention interaction and feature enhancement module, and a multi-scale feature interaction and fusion module. By combining the illumination prior with attention mechanisms, the model highlights discriminative information in the input images and suppresses non-discriminative information during multimodal feature fusion. The proposed model is compared with 12 state-of-the-art RGB, RGB-D, and RGB-T semantic segmentation models on the MFNet dataset, using mean accuracy (mAcc) and mean intersection over union (mIoU) as quantitative evaluation metrics. Compared with the second-best model, the proposed model improves mAcc by 5.4% and mIoU by 1.0%. In addition, a series of ablation experiments on the MFNet dataset demonstrates the effectiveness of each component of the proposed model, showing that it achieves higher segmentation accuracy than the state-of-the-art models considered.
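The abstract describes the core mechanism only at a high level: an illumination prior estimated from the visible image guides how RGB and thermal features are weighted during fusion, while attention suppresses unreliable responses. The paper's exact architecture is not given here, so the following PyTorch-style sketch is only an illustrative assumption of such an illumination-gated fusion block; the module names, tensor shapes, and the way the illumination score is estimated (a small CNN pooled to one scalar per image) are hypothetical.

```python
# Illustrative sketch (not the authors' implementation): an illumination-gated
# fusion block that weights RGB vs. thermal features with a scalar illumination
# score predicted from the visible image, followed by channel attention.
import torch
import torch.nn as nn

class IlluminationGate(nn.Module):
    """Predicts a per-image illumination score w in [0, 1] from the RGB input."""
    def __init__(self, in_ch: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 16, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(32, 1),
            nn.Sigmoid(),  # w near 1: well-lit scene, trust RGB; w near 0: dark scene, trust thermal
        )

    def forward(self, rgb: torch.Tensor) -> torch.Tensor:
        return self.net(rgb).view(-1, 1, 1, 1)

class GatedFusion(nn.Module):
    """Fuses same-resolution RGB and thermal feature maps under the illumination gate."""
    def __init__(self, channels: int):
        super().__init__()
        # Channel attention to emphasize reliable channels after the weighted sum.
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 4, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, 1), nn.Sigmoid(),
        )

    def forward(self, f_rgb, f_thermal, w):
        fused = w * f_rgb + (1.0 - w) * f_thermal  # illumination-weighted sum
        return fused * self.attn(fused)            # suppress low-confidence channels

if __name__ == "__main__":
    rgb = torch.randn(2, 3, 480, 640)        # visible-image batch (hypothetical size)
    f_rgb = torch.randn(2, 64, 120, 160)     # encoder features from the RGB branch
    f_th = torch.randn(2, 64, 120, 160)      # encoder features from the thermal branch
    w = IlluminationGate()(rgb)
    out = GatedFusion(64)(f_rgb, f_th, w)
    print(out.shape)                          # torch.Size([2, 64, 120, 160])
```

Such a gate would be applied at each encoder stage before the multi-scale fusion stage; how the actual model conditions on illumination may differ from this scalar-weighting scheme.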

Key words: red-green-blue-thermal (RGB-T) image semantic segmentation, convolutional neural network, image prior information, illumination perception algorithm, feature enhancement and fusion algorithm
