Accelerate Learning Processes by Avoiding Inappropriate Rules in Transfer Learning for Actor-Critic
http://hdl.handle.net/10076/11661
Item type | Departmental Bulletin Paper(1) |
---|---|
Release date | 2011-11-08 |
Title | Accelerate Learning Processes by Avoiding Inappropriate Rules in Transfer Learning for Actor-Critic |
Title language | en |
Language | eng |
Subject scheme | Other |
Subject | Reinforcement learning / actor-critic method / Transfer learning |
Resource type identifier | http://purl.org/coar/resource_type/c_6501 |
Resource type | departmental bulletin paper |
Authors | TAKANO, Toshiaki; TAKASE, Haruhiko; KAWANAKA, Hiroharu; TSURUOKA, Shinji |
Abstract | This paper aims to accelerate the learning process of the actor-critic method, one of the major reinforcement learning algorithms, by means of transfer learning. In general, reinforcement learning is used to solve optimization problems: learning agents autonomously acquire a policy that accomplishes the target task. To solve such problems, agents require a long learning process of trial and error. Transfer learning is an effective way to accelerate the learning processes of machine learning algorithms. It does so by using prior knowledge from a policy learned for a source task. We propose an effective transfer learning algorithm for the actor-critic method. Two basic issues in transfer learning are how to select an effective source policy and how to reuse it without negative transfer. In this paper, we mainly discuss the latter. We propose a reuse method based on a selection method that uses the forbidden rule set. The forbidden rule set is the set of rules that cause immediate failure of a task. It is used to predict the similarity between a source policy and the target policy. Agents should not transfer the inappropriate rules in the selected policy. In actor-critic, a policy is constructed from two parameter sets: action preferences and state values. To avoid inappropriate rules, agents reuse only reliable action preferences and the state values that imply preferred actions. We perform simple experiments to show the effectiveness of the proposed method. In conclusion, the proposed method accelerates the learning process for the target tasks. |
Bibliographic information | Proceedings of the Second International Workshop on Regional Innovation Studies (IWRIS2010), No. 2, pp. 55-58, issue date 2011-10-01 |
Format | application/pdf |
Version type | VoR |
Version type resource | http://purl.org/coar/version/c_970fb48d4fbd8a85 |
Publisher | Graduate School of Regional Innovation Studies, Mie University |
Resource type (Mie University) | Departmental Bulletin Paper |
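
The reuse step described in the abstract (skip rules in the forbidden rule set, then carry over only reliable action preferences and the state values that imply preferred actions) might be sketched roughly as follows. This is an illustrative reading, not the authors' algorithm: the function `transfer_policy`, the `reliability_threshold` parameter, the use of preference magnitude as a reliability measure, and the toy states and actions are all assumptions introduced here.

```python
from typing import Dict, Set, Tuple

State = str
Action = str
Rule = Tuple[State, Action]  # a rule pairs a state with an action


def transfer_policy(
    src_prefs: Dict[Rule, float],
    src_values: Dict[State, float],
    forbidden: Set[Rule],
    reliability_threshold: float = 0.5,  # assumed cut-off, not from the paper
) -> Tuple[Dict[Rule, float], Dict[State, float]]:
    """Select which source parameters to copy into the target policy."""
    # Keep an action preference only if its rule is not forbidden in the
    # target task and its magnitude clears the assumed reliability threshold.
    prefs = {
        rule: p
        for rule, p in src_prefs.items()
        if rule not in forbidden and abs(p) >= reliability_threshold
    }
    # Keep a state value only if that state still has at least one
    # transferred action with a positive preference (assumed meaning of
    # "state values that imply preferred actions").
    preferred_states = {state for (state, _action), p in prefs.items() if p > 0}
    values = {s: v for s, v in src_values.items() if s in preferred_states}
    return prefs, values


# Hypothetical source policy with two states and two actions.
source_prefs = {
    ("s0", "left"): 1.2,
    ("s0", "right"): -0.8,
    ("s1", "left"): 0.1,
    ("s1", "right"): 0.9,
}
source_values = {"s0": 0.7, "s1": 0.4}
forbidden_rules = {("s1", "right")}  # causes immediate failure in the target task

prefs, values = transfer_policy(source_prefs, source_values, forbidden_rules)
print(prefs)
print(values)
```

In this toy case the forbidden rule and the weak preference for `("s1", "left")` are both dropped, so no value is transferred for `s1` and the target agent would learn that state from scratch.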