The Amazing Race Repeated Update Q-Learning VS. Q-Learning

Authors

  • Hazem Bakkar, British University in Dubai, Dubai, UAE
  • Asma Q. Al-Hamad, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia
  • Mohammad Bakar, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia
  • Sohail Khan, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia
  • Hassan Eleraky, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia

Keywords:

Q-learning, Repeated Update Q-learning algorithm, Markovian decision processes, simulation

Abstract

In this paper, we conduct an experiment that compares the performance of two reinforcement learning algorithms: the Repeated Update Q-learning algorithm (RUQL) [1] and the Q-learning algorithm (QL) [5]. The experiment uses a simulated version of the robot crawler developed by [6], shown in figure (1). Several trials and tests were conducted to estimate the difference in the crawler's movement under each algorithm. Additionally, a detailed description of the elements of the Markovian decision process (MDP) [2] is given; the MDP model comprises the states, actions, and rewards for the task at hand. The parameters that were used and tuned in this experiment are listed, and the reasons for choosing their values are explained. Finally, the source code of the crawler robot was modified to implement the RUQL and QL algorithms, using Eclipse [3] and the Java SE Development Kit 8 (JDK) [4]. The simulation results show that RUQL significantly outperforms traditional QL.
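To make the comparison concrete, below is a minimal sketch of the two update rules in Java, the language used for the experiment. It is an illustration, not the authors' modified crawler code: the class name, table sizes, and parameter values are assumptions chosen for readability. The standard QL update [5] moves Q(s, a) one step of size alpha toward the target r + gamma * max Q(s', .). RUQL [1] counteracts QL's policy bias by effectively repeating that update 1/pi(s, a) times, where pi(s, a) is the probability that the behaviour policy (e.g. epsilon-greedy) selects action a in state s; the closed-form weight below applies all repetitions in one step.

    /** Illustrative sketch of the QL [5] and RUQL [1] update rules.
     *  Hypothetical state/action encoding; not the paper's crawler code. */
    public class QUpdateSketch {
        static final int NUM_STATES = 25;   // example: discretized joint angles
        static final int NUM_ACTIONS = 4;   // example: joint movements
        static final double ALPHA = 0.1;    // learning rate (assumed value)
        static final double GAMMA = 0.9;    // discount factor (assumed value)

        final double[][] q = new double[NUM_STATES][NUM_ACTIONS];

        /** Standard Q-learning update [5]. */
        void qlUpdate(int s, int a, double r, int sNext) {
            double target = r + GAMMA * maxQ(sNext);
            q[s][a] += ALPHA * (target - q[s][a]);
        }

        /** RUQL update [1]: repeats the QL update 1/pi times, where pi is
         *  the probability the behaviour policy selects action a in state s.
         *  The closed form below applies all repetitions at once. */
        void ruqlUpdate(int s, int a, double r, int sNext, double pi) {
            double target = r + GAMMA * maxQ(sNext);
            double decay = Math.pow(1.0 - ALPHA, 1.0 / pi);
            q[s][a] = decay * q[s][a] + (1.0 - decay) * target;
        }

        /** Greedy value of a state: max over actions of Q(s, a). */
        double maxQ(int s) {
            double best = q[s][0];
            for (int a = 1; a < NUM_ACTIONS; a++) {
                best = Math.max(best, q[s][a]);
            }
            return best;
        }
    }

The practical effect: for a rarely selected action, say pi(s, a) = 0.05 with alpha = 0.1, the RUQL decay factor is (0.9)^20 ≈ 0.12, so a single RUQL update moves Q(s, a) most of the way to the target, whereas a single QL update moves it only 10% of the way. This is how RUQL keeps the value estimates of infrequently explored actions from lagging behind.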

References

1. Abdallah, S. and Kaisers, M. (2013). Addressing the Policy-bias of Q-learning by Repeating Updates. In: Proceedings of the 12th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2013), pp. 1045-1052.

2. Bellman, R. (1957). A Markovian decision process. Journal of Mathematics and Mechanics, 6(5), pp. 679-684.

3. Guindon, C. (2016). Eclipse.org [online]. Available at: http://eclipse.org [Accessed 1 June 2016].

4. Oracle.com (2016). Oracle | Hardware and Software, Engineered to Work Together [online]. Available at: http://oracle.com [Accessed 1 June 2016].

5. Watkins, C. and Dayan, P. (1992). Q-learning. Machine Learning, 8(3-4), pp. 279-292.

6. Berghen, F. (2016). Kranf site: research [online]. Applied-mathematics.net. Available at: http://www.applied-mathematics.net [Accessed 1 June 2016].

7. Applied-mathematics.net (2016). BotQLearning.java [online]. Available at: http://www.applied-mathematics.net/qlearning/BotQLearning.java [Accessed 1 June 2016].

8. Tokic, M., Ertel, W. and Fessler, J. (2009). The Crawler, A Class Room Demonstrator for Reinforcement Learning. In: Proceedings of the 22nd International FLAIRS Conference.

9. Wikipedia.org (2016). Eclipse (software) [online]. Available at: http://en.wikipedia.org/wiki/Eclipse_(software) [Accessed 9 June 2016].

10. Youtube.com (2016). YouTube [online]. Available at: http://youtube.com [Accessed 1 June 2016].

11. Botvinick, M. (2012). Hierarchical reinforcement learning and decision making. Current Opinion in Neurobiology, 22(6), pp. 956-962. http://dx.doi.org/10.1016/j.conb.2012.05.008

Published

2020-12-18

How to Cite

Bakkar, H., Al-Hamad, A. Q., Bakar, M., Khan, S., & Eleraky, H. (2020). The Amazing Race Repeated Update Q-Learning VS. Q-Learning. International Journal of Sciences: Basic and Applied Research (IJSBAR), 54(5), 119–125. Retrieved from https://www.gssrr.org/index.php/JournalOfBasicAndApplied/article/view/11890
