Hackers' Pub

bgl gwyng @bgl@hackers.pub

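This post covers the technical problems encountered while implementing reinforcement learning in Haskell, and how they were worked through. The author walks through training an agent to play Snake using the Hasktorch library, and highlights an advantage of reinforcement learning: the agent can be trained without a dataset. In particular, the post explains how to define the agent and the environment, and how to design a reward function that steers the snake toward eating food. It points out the difference between immediate reward and cumulative reward, and presents a discount factor as the way to reflect future rewards in the current choice. It also notes the limits of defining the environment as a pure function, and stresses that the environment must be a monad capable of running the agent. The author shares the insights gained from this experience and raises the question of how to write reinforcement learning code more efficiently. The next post will show how the state monad can be used to solve these problems, and readers are encouraged to study monads beforehand.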
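As a rough illustration of the discount factor mentioned in the summary, the following is a minimal sketch, assuming one episode's rewards are available as a plain list of `Double` values. The names `discountedReturns` and `gamma` are hypothetical and chosen for this example only; they are not taken from the post or from Hasktorch.

```haskell
-- | Discounted return for every time step of one episode.
-- For rewards r_0, r_1, ..., r_T and discount factor gamma, the value at
-- index t is r_t + gamma * r_(t+1) + gamma^2 * r_(t+2) + ...
-- (Hypothetical helper for illustration; not part of Hasktorch.)
discountedReturns :: Double -> [Double] -> [Double]
discountedReturns gamma =
  -- scanr folds from the end of the episode, so each step adds its own
  -- reward to the discounted return of the step after it. The seed 0
  -- (the return after the final step) ends up last and is dropped by init.
  init . scanr (\reward future -> reward + gamma * future) 0
```

With `gamma = 0.5` and rewards `[0, 0, 1]` (food eaten only on the last step), the earlier steps still receive credit: the result is `[0.25, 0.5, 1.0]`. Setting `gamma` to 0 reduces this to the immediate rewards alone.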

Reinforcement Learning with the Reverse State Monad (1/2)

洪民憙 (Hong Minhee)

Juntai Park

박준규

Jaeyeol Lee
