Coal Engineering, 2022, Vol. 54, Issue (9): 105-111. doi: 10.11799/ce202209019

• Research and Discussion •

Research on a caving-window control strategy based on a mean deviation reward function

Luo Kaicheng, Gao Yang, Yang Yi, Chang Yajun, Yuan Ruifu

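  1. Zhengzhou Coal Mining Machinery Hydraulic Electric Control Co., Ltd.
  2. School of Electrical Engineering and Automation, Henan Polytechnic University
  3. Henan Polytechnic University
  4. School of Energy and Engineering, Henan Polytechnic University
  • Received: 2022-03-15 Revised: 2022-05-07 Online: 2022-09-15 Published: 2022-10-09
  • Corresponding author: Gao Yang E-mail: 2067541373@qq.com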

Intelligent decision-making of top coal caving based on mean deviation reward function

  • Received: 2022-03-15 Revised: 2022-05-07 Online: 2022-09-15 Published: 2022-10-09

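Abstract (translated): Based on the spatial layout of the hydraulic supports and the characteristics of the caving-window action process, the top coal caving process is abstracted as a Markov decision process. Within a reinforcement learning framework, and without requiring training samples, the Q-learning algorithm learns online the mapping between the occurrence state of the top coal and the caving-window actions, thereby achieving optimal decisions on window actions. To ensure that the coal-rock interface descends uniformly during caving, a reward function based on mean deviation is designed for the Q-learning algorithm, and a three-dimensional simulation platform for caving under continuous cutting of the working face is built on Linux to verify the effectiveness of the algorithm. The experimental results show that the caving-window control strategy learned with the mean deviation reward function makes the coal-rock interface descend more uniformly during top coal caving. Under continuous cutting of the working face, the intelligent caving process based on Q-learning with the mean deviation reward function achieves an average caving reward of 13467.8, which is 8.8% higher than the original Q-learning intelligent caving process and about 10% higher than traditional processes such as single-round sequential caving.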

Abstract: In intelligent top coal caving, identifying the environment states and the action decision mechanism is the key to controlling the windows behind the hydraulic supports. Based on the spatial layout of the hydraulic supports and the characteristics of the window actions, this paper abstracts the top coal caving process as a Markov decision process. A reinforcement learning framework is then employed to determine the optimal window actions during caving: the Q-learning algorithm learns online the mapping between the state of the top coal and the window actions, without requiring a large set of prepared training samples. A new reward function based on mean deviation is designed for Q-learning so that the coal-rock boundary settles uniformly during caving. During the dynamic caving process, the reward function guides the agents in learning to control the shape of the coal-rock boundary, which strengthens the coordination of their actions and improves the effectiveness of top coal caving. Finally, a three-dimensional simulation platform based on the YADE discrete element method is built on Linux, and the effectiveness of the proposed method is demonstrated in an experiment with continuous cutting of the coalface. The experimental results show that the coal-rock boundary obtained with the proposed method stays flatter during caving, and the average reward of the agents reaches 13467.8, which is 8.8% higher than that of the original Q-learning method and about 10% higher than that of the single-round sequential caving process.
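The abstract does not give the reward formula, so the following is only a minimal sketch of how a mean deviation reward could be combined with the standard tabular Q-learning update, Q(s, a) ← Q(s, a) + α [r + γ max Q(s', a') − Q(s, a)]. The function names, the reward form (a base gain minus a weighted mean absolute deviation of boundary heights), the two-action window model (0 = close, 1 = open), and all parameter values are assumptions for illustration, not the authors' implementation.

```python
import random
from collections import defaultdict

ACTIONS = (0, 1)  # assumed action set: 0 = close window, 1 = open window


def mean_deviation_reward(heights, coal_gain=1.0, penalty=1.0):
    """Hypothetical reward: a base gain minus the mean absolute deviation
    of the coal-rock boundary heights from their mean. A flat boundary
    (zero deviation) receives the full base gain."""
    mean_h = sum(heights) / len(heights)
    mad = sum(abs(h - mean_h) for h in heights) / len(heights)
    return coal_gain - penalty * mad


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy selection over the assumed action set."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update; returns the new Q-value."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Usage: one learning step on made-up boundary heights (not data from the paper).
q_table = defaultdict(float)
rng = random.Random(0)
state, next_state = "s0", "s1"  # placeholder state labels
action = choose_action(q_table, state, epsilon=0.1, rng=rng)
reward = mean_deviation_reward([2.0, 2.1, 1.9, 2.0])
q_update(q_table, state, action, reward, next_state)
```

Penalizing deviation from the mean is one simple way to reward a flat boundary; the reward used in the paper may weight or normalize the deviation differently.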

CLC number: