[CS234]Lecture 2. ๋งˆ๋ฅด์ฝ”ํ”„ ๊ฒฐ์ • ํ”„๋กœ์„ธ์Šค
ยท
๐Ÿฆ„AI/Reinforcement Learning
์ €๋ฒˆ ์‹œ๊ฐ„์— ์ˆœ์ฐจ๊ฒฐ์ •๋ฌธ์ œ ์ค‘ ๋ถˆ์—ฐ์† ๋ฌธ์ œ๋Š” ๋งˆ๋ฅด์ฝ”ํ”„ ๊ฒฐ์ • ํ”„๋กœ์„ธ์Šค(Markov Decision Process, MDP)๋ฅผ ํ†ตํ•ด ์ˆ˜ํ•™์ ์œผ๋กœ ํ•ด๊ฒฐํ•  ์ˆ˜ ์žˆ๋‹ค๊ณ  ํ–ˆ๋‹ค. ๋˜ํ•œ ์ˆœ์ฐจ ๊ฒฐ์ • ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด์„œ ์šฐ๋ฆฌ๋Š” maximize total expected future reward๋ฅผ ๋ชฉํ‘œ๋กœ ํ•œ๋‹ค๋Š” ๊ฒƒ์„ ๊ธฐ์–ตํ•˜์ž. ์ด๋ฒˆ ๊ฐ•์˜์—์„œ๋Š” ๋งˆ๋ฅด์ฝ”ํ”„ ๊ฒฐ์ • ํ”„๋กœ์„ธ์Šค๋ฅผ ๋ช…ํ™•ํžˆ ์ดํ•ดํ•˜๊ธฐ ์œ„ํ•ด, ๋งˆ๋ฅด์ฝ”ํ”„ ํ”„๋กœ์„ธ์Šค์—์„œ MDP๊นŒ์ง€์˜ ๋ฐœ์ „ ๊ณผ์ •์„ ์‚ดํ”ผ๊ณ , MDP์˜ Control ๊ณผ Evaluation์— ๋Œ€ํ•ด์„œ ์•Œ์•„๋ณธ๋‹ค.  0. ๋งˆ๋ฅด์ฝ”ํ”„ ํ”„๋กœ์„ธ์Šค์˜ ๋ฐœ์ „๋งˆ๋ฅด์ฝ”ํ”„ ๊ฒฐ์ • ํ”„๋กœ์„ธ์Šค(MDP)๋Š” ๋งˆ๋ฅด์ฝ”ํ”„ ํ”„๋กœ์„ธ์Šค์—์„œ ์ถœ๋ฐœํ•˜์—ฌ ํ™•์žฅ๋˜์—ˆ๋‹ค.Markov Process > Markov Reward Process > Markov Decision Proce..
[CS234] Lecture 1. Reinforcement Learning
ยท
๐Ÿฆ„AI/Reinforcement Learning
Reinforcement Learningโœ… Learning through experience/data to make good decisions under uncertainty1950๋…„๋Œ€ Richard Bellman์— ์˜ํ•ด ๋ฐœ์ „ํ•จex)atari game(video game)Goplasma control for fusion sciencechatGPTInvolvesOptimizationDelayed ConsequencesExplorationGeneralizationPolicy์ฐจ์ด์ AI PlanningImitation Learning AI PlanningSupervisedUnsupervisedReinforcementImitaionOptimizationOO OOLearns from experience O(b..