Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
·
🦄AI/Paper
Background — a quick refresher before reading the paper. Gradient Descent / Batch Gradient Descent: this is what usually comes to mind when people say "gradient descent". Each iteration computes the gradient using the entire training dataset. θ: model parameters, α: learning rate, N: training dataset size, l: loss function. The word "batch" in the name can be a little confusing; just think of the batch as the total training dataset. Because it uses the whole dataset, convergence is stable, which is an advantage. However, it uses a lot of memory, and if it converges toward a local optimum it is hard to escape. Stochastic Gradien..
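The batch gradient descent update described in the excerpt (θ, α, N, and loss l) can be sketched as follows — a minimal example, where the mean-squared-error loss and the toy linear-regression data are assumptions for illustration:

```python
import numpy as np

def batch_gradient_descent(X, y, alpha=0.1, iterations=100):
    """One update per iteration, with the gradient computed over
    the ENTIRE training set (the 'batch' = total training dataset)."""
    N = len(X)                    # N: training dataset size
    theta = np.zeros(X.shape[1])  # theta: model parameters
    for _ in range(iterations):
        # Gradient of the MSE loss l(theta) = (1/N) * sum((X @ theta - y)^2)
        grad = (2 / N) * X.T @ (X @ theta - y)
        theta -= alpha * grad     # alpha: learning rate
    return theta

# Toy linear-regression data generated by y = 3 * x
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([3.0, 6.0, 9.0])
theta = batch_gradient_descent(X, y)  # converges near theta = 3
```

Note that `grad` touches every training example on every iteration — this is exactly why the method is memory-hungry but converges stably.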
Network In Network (NIN)
·
🦄AI/Paper
CNN — a CNN is structured by alternating conv layers and pooling layers. These extract features from the input image, and FC layers then perform the classification. But this kind of CNN has several limitations. Limited non-linearity: a convolution filter performs a local linear operation on the input data, so non-linearity is lacking at that stage, which limits the ability to capture patterns in complex data. Fully connected layer overfitting: FC layers require a very large number of parameters, so they tend to cause overfitting. Loss of spatial information: in addition, the vector form required by the FC layer loses the image's spatial..
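The FC-layer parameter blow-up mentioned above is easy to quantify. A quick sketch, where the layer shapes (a 7×7×512 feature map feeding a 4096-unit FC layer, versus a 3×3 conv with 512 channels in and out) are illustrative assumptions, not taken from the post:

```python
# Parameter count of a fully connected layer vs. a conv layer,
# illustrating why FC layers dominate a CNN's parameter budget.
# (Layer shapes below are illustrative assumptions.)

# FC layer: flatten a 7x7x512 feature map, connect to 4096 units.
fc_params = 7 * 7 * 512 * 4096 + 4096      # weights + biases

# Conv layer: 3x3 kernels, 512 input channels, 512 output channels.
conv_params = 3 * 3 * 512 * 512 + 512      # weights + biases

print(fc_params)   # ~102.8 million parameters
print(conv_params) # ~2.4 million parameters
```

With these shapes the single FC layer holds roughly 40× more parameters than the conv layer, which is the overfitting concern the post raises.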