Coal Engineering ›› 2022, Vol. 54 ›› Issue (9): 105-111.doi: 10.11799/ce202209019

Intelligent decision-making of top coal caving based on mean deviation reward function

  • Received:2022-03-15 Revised:2022-05-07 Online:2022-09-15 Published:2022-10-09

Abstract: In intelligent top coal caving, identifying the environment states and the action decision mechanism is the key to controlling the caving windows behind the hydraulic supports. This paper models the actions of top coal caving as a Markov decision process, based on the spatial layout of the hydraulic supports and the characteristics of the window actions. A reinforcement learning framework is then employed to determine the optimal window actions during top coal caving: the Q-learning algorithm learns online the mapping between the state of the top coal and the actions of the windows, without requiring a large set of prepared training samples. Within this methodology, a new reward function based on mean deviation is designed for Q-learning, so that the coal-rock boundary settles uniformly during caving. In the dynamic caving process, the reward function guides the agents to learn how to control the shape of the coal-rock boundary, thereby reinforcing the coordination of the agents' actions and improving the effectiveness of top coal caving. Finally, a three-dimensional simulation platform based on the YADE discrete element method is built on Linux, and the effectiveness of the proposed methodology is demonstrated by an experiment with continuous cutting of the coalface. The experimental results show that the coal-rock boundary driven by the proposed method is flatter during coal falling, and the average reward of the agent for top coal caving reaches 13467.8, which is 8.8% higher than the standard Q-learning method and 10% higher than the single-round sequential coal caving process.
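The abstract does not give the exact form of the mean-deviation reward or the Q-learning update, so the following is only a minimal sketch of the general idea: a reward that penalizes unevenness of the coal-rock boundary (here taken as the mean absolute deviation of the boundary heights above each support), combined with a standard tabular Q-learning backup. The function names, the `scale` parameter, and the choice of mean absolute deviation are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def mean_deviation_reward(boundary_heights, scale=1.0):
    """Hypothetical mean-deviation reward: penalize unevenness of the
    coal-rock boundary so agents learn to keep it flat.

    boundary_heights: boundary height above each hydraulic support.
    Returns 0 for a perfectly flat boundary, negative otherwise.
    """
    h = np.asarray(boundary_heights, dtype=float)
    mean_dev = np.mean(np.abs(h - h.mean()))  # mean absolute deviation
    return -scale * mean_dev

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One standard tabular Q-learning backup:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    Q is a (num_states, num_actions) array; s, a, s_next are indices.
    """
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    return Q
```

For example, a flat boundary `[1.0, 1.0, 1.0]` yields reward 0, while an uneven one such as `[0.0, 2.0]` yields a negative reward, so the learned policy is pushed toward window actions that keep the boundary level.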
