The Amazing Race Repeated Update Q-Learning VS. Q-Learning

Authors

  • Hazem Bakkar, British University in Dubai, Dubai, UAE
  • Asma Q. Al-Hamad, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia
  • Mohammad Bakar, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia
  • Sohail Khan, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia
  • Hassan Eleraky, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia

Keywords:

Q-learning, Repeated Update Q-learning algorithm, Markovian decision processes, simulation

Abstract

In this paper, we conduct an experiment that compares the performance of two reinforcement learning algorithms: the Repeated Update Q-learning algorithm (RUQL) [1] and the Q-learning algorithm (QL) [5]. The experiment uses a simulated version of the robot crawler developed by [6], shown in figure (1). Several trials and tests were conducted to estimate the difference in the crawler's movement under each algorithm. Additionally, a detailed description of the elements of the Markovian decision process (MDP) [2] is given; the MDP model comprises the states, actions, and rewards for the task at hand. The parameters that were used and tuned in this experiment are listed, and the reasons for choosing their values are explained. Finally, the source code of the crawler robot was modified to implement the RUQL and QL algorithms, using Eclipse [3] and the Java SE Development Kit 8 (JDK) [4]. The simulation results show that RUQL significantly outperforms traditional QL.
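To make the comparison concrete, below is a minimal sketch of the two update rules in Java, the language used for the experiment. It is an illustration, not the authors' modified crawler code: the class name, table sizes, and parameter values are assumptions chosen for readability. The standard QL update [5] moves Q(s, a) one step of size alpha toward the target r + gamma * max Q(s', .). RUQL [1] counteracts QL's policy bias by effectively repeating that update 1/pi(s, a) times, where pi(s, a) is the probability that the behaviour policy (e.g. epsilon-greedy) selects action a in state s; the closed-form weight below applies all repetitions in one step.

    /** Illustrative sketch of the QL [5] and RUQL [1] update rules.
     *  Hypothetical state/action encoding; not the paper's crawler code. */
    public class QUpdateSketch {
        static final int NUM_STATES = 25;   // example: discretized joint angles
        static final int NUM_ACTIONS = 4;   // example: joint movements
        static final double ALPHA = 0.1;    // learning rate (assumed value)
        static final double GAMMA = 0.9;    // discount factor (assumed value)

        final double[][] q = new double[NUM_STATES][NUM_ACTIONS];

        /** Standard Q-learning update [5]. */
        void qlUpdate(int s, int a, double r, int sNext) {
            double target = r + GAMMA * maxQ(sNext);
            q[s][a] += ALPHA * (target - q[s][a]);
        }

        /** RUQL update [1]: repeats the QL update 1/pi times, where pi is
         *  the probability the behaviour policy selects action a in state s.
         *  The closed form below applies all repetitions at once. */
        void ruqlUpdate(int s, int a, double r, int sNext, double pi) {
            double target = r + GAMMA * maxQ(sNext);
            double decay = Math.pow(1.0 - ALPHA, 1.0 / pi);
            q[s][a] = decay * q[s][a] + (1.0 - decay) * target;
        }

        /** Greedy value of a state: max over actions of Q(s, a). */
        double maxQ(int s) {
            double best = q[s][0];
            for (int a = 1; a < NUM_ACTIONS; a++) {
                best = Math.max(best, q[s][a]);
            }
            return best;
        }
    }

The practical effect: for a rarely selected action, say pi(s, a) = 0.05 with alpha = 0.1, the RUQL decay factor is (0.9)^20 ≈ 0.12, so a single RUQL update moves Q(s, a) most of the way to the target, whereas a single QL update moves it only 10% of the way. This is how RUQL keeps the value estimates of infrequently explored actions from lagging behind.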

References

1. Abdallah, S. and Kaisers, M. (2013). Addressing the Policy-bias of Q-learning by Repeating Updates. In: Proceedings of the 12th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2013), pp. 1045-1052.

2. Bellman, R. (1957). A Markovian decision process. Journal of Mathematics and Mechanics, 6(5), pp. 679-684.

3. Guindon, C. (2016). Eclipse.org [online]. Available at: http://eclipse.org [Accessed 1 June 2016].

4. Oracle.com (2016). Oracle | Hardware and Software, Engineered to Work Together [online]. Available at: http://oracle.com [Accessed 1 June 2016].

5. Watkins, C. and Dayan, P. (1992). Q-learning. Machine Learning, 8(3-4), pp. 279-292.

6. Berghen, F. (2016). Kranf site: research [online]. Applied-mathematics.net. Available at: http://www.applied-mathematics.net [Accessed 1 June 2016].

7. Applied-mathematics.net (2016). BotQLearning.java [online]. Available at: http://www.applied-mathematics.net/qlearning/BotQLearning.java [Accessed 1 June 2016].

8. Tokic, M., Ertel, W. and Fessler, J. (2009). The Crawler, A Class Room Demonstrator for Reinforcement Learning. In: Proceedings of the 22nd International FLAIRS Conference.

9. Wikipedia.org (2016). Eclipse (software) [online]. Available at: http://en.wikipedia.org/wiki/Eclipse_(software) [Accessed 9 June 2016].

10. Youtube.com (2016). YouTube [online]. Available at: http://youtube.com [Accessed 1 June 2016].

11. Botvinick, M. (2012). Hierarchical reinforcement learning and decision making. Current Opinion in Neurobiology, 22(6), pp. 956-962. http://dx.doi.org/10.1016/j.conb.2012.05.008

Published

2020-12-18

How to Cite

Bakkar, H., Al-Hamad, A. Q., Bakar, M., Khan, S., & Eleraky, H. (2020). The Amazing Race Repeated Update Q-Learning VS. Q-Learning. International Journal of Sciences: Basic and Applied Research (IJSBAR), 54(5), 119–125. Retrieved from https://www.gssrr.org/index.php/JournalOfBasicAndApplied/article/view/11890
