Coal Engineering (煤炭工程), 2024, Vol. 56, Issue (2): 206-212. doi: 10.11799/ce202402030

• Research and Discussion •

Research on intelligent recognition of named entities of coal mine safety hidden danger based on ERNIE-BiGRU-CRF model

Liu Feixiang (刘飞翔), Li Zequan (李泽荃), Zhao Jialiang (赵嘉良), et al.

  1. School of Mine Safety, North China Institute of Science and Technology
  2. North China Institute of Science and Technology, Ministry of Emergency Management
  • Received: 2023-07-13 Revised: 2023-12-24 Online: 2024-02-20 Published: 2024-02-29
  • Corresponding author: Li Zequan E-mail: lzquancumtb@126.com

Abstract: To fully mine the key knowledge in coal mine safety hidden danger texts and help safety managers at coal mine enterprises carry out hidden danger investigation and remediation, a named entity recognition method based on a pre-trained language model is proposed. First, entity categories for coal mine safety hidden dangers are defined according to the new version of the Coal Mine Safety Regulations and the Criteria for Determining Potential Major Accidents in Coal Mines, and 7 entity categories and 15 entity labels are constructed using the BIO labeling strategy. Then, the collected coal mine hidden danger investigation data are preprocessed and the relevant entities are manually annotated by coal mine safety experts, yielding a standard named entity dataset of 1500 coal mine safety hidden danger records. Finally, the ERNIE pre-trained model is used to produce word vector representations of the text, a BiGRU structure extracts contextual semantic features, and a CRF model decodes the entity labels. Experimental results show that the precision, recall and F1 value of the ERNIE-BiGRU-CRF model on the sequence labeling task are 56.69%, 69.23% and 62.34%, respectively, which are 6.85, 13.74 and 9.83 percentage points higher than those of the BiLSTM-CRF baseline model, and the entity extraction results differ little from the manual annotations. In addition, ablation experiments verify that the BiGRU layer better captures the contextual semantic dependencies of coal mine safety hidden danger text and that the CRF layer further optimizes the label sequence. The named entity recognition model based on the ERNIE-BiGRU-CRF structure therefore achieves good entity recognition results in extracting information from coal mine safety hidden danger text, which facilitates the intelligent management of coal mine safety hidden dangers.

Keywords: coal mine safety hidden danger, ERNIE-BiGRU-CRF model, named entity recognition, information extraction
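The BIO label count and the reported metrics in the abstract can be checked with a short sketch. The category names below are hypothetical placeholders, since the abstract gives only the number of categories (7) and labels (15), not their names. The helper functions `build_bio_labels`, `extract_spans` and `f1_score` are illustrative and are not taken from the paper. The sketch assumes the reported gains over the baseline are percentage-point differences.

```python
from typing import List, Tuple

# Hypothetical category names: the abstract reports 7 entity categories
# but does not list them, so these are placeholders for illustration only.
CATEGORIES = [
    "LOCATION", "EQUIPMENT", "HAZARD", "CAUSE",
    "CONSEQUENCE", "MEASURE", "PERSONNEL",
]


def build_bio_labels(categories: List[str]) -> List[str]:
    """Return the BIO label set: one O tag plus a B- and an I- tag per category."""
    labels = ["O"]
    for cat in categories:
        labels.extend([f"B-{cat}", f"I-{cat}"])
    return labels


def extract_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Decode a BIO tag sequence into (start, end_exclusive, category) spans."""
    spans = []
    start, cat = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel O closes a trailing span
        continues = tag.startswith("I-") and cat == tag[2:]
        if start is not None and not continues:
            spans.append((start, i, cat))
            start, cat = None, None
        if tag.startswith("B-"):
            start, cat = i, tag[2:]
    return spans


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    labels = build_bio_labels(CATEGORIES)
    print(len(labels))  # 2 * 7 + 1 labels

    tags = ["B-EQUIPMENT", "I-EQUIPMENT", "O", "B-HAZARD"]
    print(extract_spans(tags))

    # Reported precision and recall of ERNIE-BiGRU-CRF, in percent.
    print(round(f1_score(56.69, 69.23), 2))

    # Baseline implied by reading the gains as percentage-point differences.
    print(round(f1_score(56.69 - 6.85, 69.23 - 13.74), 2))
```

Seven categories give 2 × 7 + 1 = 15 labels, which matches the abstract. The F1 value recomputed from the reported precision and recall rounds to 62.34, and the implied baseline F1 rounds to 52.51, which is 9.83 below 62.34, so the three reported gains are mutually consistent under the percentage-point reading.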
