Chap.04 Neural Network Training

2022. 1. 7. 12:56

4-0. Intro 

💡 What is learning?
Automatically obtaining the optimal values of the weight parameters from training data.

This chapter introduces the loss function, the metric that makes it possible for a neural network to learn.


4-1. Learning from Data

Data sits at the center of machine learning. Machine learning analyzes the data and extracts features from it in order to make predictions.

A neural network performs this feature selection automatically.

The figure above gives an intuitive view of the difference between a human, machine learning, and a neural network.

A human finds and detects features by eye, while machine learning learns from features that a human designed. A neural network, in contrast, selects the features of the data automatically and produces the result directly. Deep learning that works this way is called end-to-end machine learning.

 

Training Data and Test Data

Machine learning usually splits the data into training data and test data so that generalization ability can be evaluated properly.

💡 Training data and test data
Training data is used to train the model; test data is used to evaluate the trained model.

When training on the training data, however, care must be taken to avoid overfitting.

💡 What is overfitting?
A state in which the model is excessively optimized for a single dataset.
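
As a rough illustration of the split described above, the sketch below shuffles a small list of samples and holds part of it out for testing. The helper name split_train_test, the 80/20 ratio, and the fixed seed are assumptions made for this example; they do not come from the book.

```python
import random

def split_train_test(samples, test_ratio=0.2, seed=0):
    """Shuffle a copy of samples and split it into (train, test) lists."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(samples)       # copy, so the caller's data is untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

train, test = split_train_test(range(10))
print(len(train), len(test))  # 8 2
```

The model would be fitted on train only; test stays unseen until evaluation, which is what makes it a fair measure of generalization.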

4-2. Loss Function

In neural network training, the current state of the network is expressed by a loss function, and the value of this loss function is used to search for suitable weight parameter values.

💡 What is a loss function?
A metric that expresses how 'bad' the network's performance is. The sum of squared errors and the cross-entropy error are the typical choices.

 

Sum of Squared Errors

Let's start with a widely used loss function, the sum of squared errors.

E = \frac{1}{2} \sum_k (y_k - t_k)^2

Sum of squared errors formula

The formula above defines the sum of squared errors.

It subtracts the target label (t) from the network output (y) element by element, squares each difference, sums the squares, and divides the result by 2.

Now let's implement it in code.

import numpy as np

def sum_squares_error(y, t):
    return 0.5 * np.sum((y-t)**2)  # y: network output, t: target label

How is the resulting sum of squared errors used?

t =[0, 0, 1, 0, 0, 0, 0, 0, 0, 0]  # one-hot label: the correct class is index 2

y = [0.1, 0.05, 0.6, 0.0, 0.05, 0.1, 0.0, 0.1, 0.0, 0.0]  # highest value at index 2
sum_squares_error(np.array(y), np.array(t))
# >>> 0.0975 (approximately)

y = [0.1, 0.05, 0.1, 0.0, 0.05, 0.1, 0.0, 0.6, 0.0, 0.0]  # highest value at index 7
sum_squares_error(np.array(y), np.array(t))
# >>> 0.5975 (approximately)

As the code shows, the sum of squared errors is small when the output is close to the target label (the first case, where the correct class gets the highest value) and large when the output is far from it (the second case).

 

Cross-Entropy Error

Next, let's look at another loss function, the cross-entropy error.

E = -\sum_k t_k \log y_k

Cross-entropy error formula

As with the sum of squared errors, y denotes the network output and t the target label. Here log is the natural logarithm.

def cross_entropy_error(y, t):
    delta = 1e-7  # small constant so that np.log never receives 0
    return -np.sum(t * np.log(y + delta))

The code adds a small value delta inside the logarithm because np.log(0) evaluates to negative infinity, which would make any further computation impossible.

As with the sum of squared errors, a larger cross-entropy error means the network assigned a lower probability to the correct class, and a smaller error means it assigned a higher one.
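
To make that relationship concrete, here is a small sketch that evaluates the cross-entropy error on the same two outputs used for the sum of squared errors. The variable names y_good and y_bad are introduced only for this example.

```python
import numpy as np

def cross_entropy_error(y, t):
    delta = 1e-7  # keeps np.log away from log(0) = -inf
    return -np.sum(t * np.log(y + delta))

t = np.array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0])  # correct class is index 2

# Case 1: the network gives index 2 the highest probability (0.6)
y_good = np.array([0.1, 0.05, 0.6, 0.0, 0.05, 0.1, 0.0, 0.1, 0.0, 0.0])
# Case 2: index 7 gets the highest probability; index 2 only gets 0.1
y_bad = np.array([0.1, 0.05, 0.1, 0.0, 0.05, 0.1, 0.0, 0.6, 0.0, 0.0])

loss_good = cross_entropy_error(y_good, t)  # about 0.51, i.e. -log(0.6)
loss_bad = cross_entropy_error(y_bad, t)    # about 2.30, i.e. -log(0.1)
print(loss_good, loss_bad)
```

Because t is one-hot, only the term for the correct class survives the sum, so the loss is simply the negative log of the probability given to that class.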

 

Why Use a Loss Function

Accuracy should not be used as the training metric, because the derivative of accuracy with respect to the parameters is 0 almost everywhere, which leaves the parameters with nothing to follow.

Accuracy only changes in discrete jumps: a tiny change in a parameter usually leaves it unchanged, and when it does change, it jumps discontinuously. The loss function, in contrast, changes continuously with the parameters.
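
The difference can be seen on a toy problem. The sketch below is an illustration only: the four data points, the one-weight sigmoid model, and the step size dw are all made up for this example.

```python
import math

# Toy data: one feature per sample, label 1 when x > 0 (values are made up)
xs = [-2.0, -1.0, 1.0, 2.0]
ts = [0, 0, 1, 1]

def predict_prob(w, x):
    return 1.0 / (1.0 + math.exp(-w * x))  # sigmoid(w * x)

def accuracy(w):
    hits = sum((predict_prob(w, x) > 0.5) == bool(t) for x, t in zip(xs, ts))
    return hits / len(xs)

def loss(w):
    # mean cross-entropy error for binary labels
    total = 0.0
    for x, t in zip(xs, ts):
        p = predict_prob(w, x)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(xs)

w, dw = 1.0, 1e-3
print(accuracy(w + dw) - accuracy(w))  # 0.0: accuracy does not react
print(loss(w + dw) - loss(w))          # small negative number: loss does react
```

A tiny change in w leaves the accuracy flat, so its slope gives no hint about which way to move, while the loss responds and points toward a better w.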

4-3. Numerical Differentiation

\frac{df(x)}{dx} = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}

Numerical differentiation formula

The formula for the derivative is shown above. Let's implement this formula in code.

# Naive implementation (it has the two problems described below)
def numerical_diff(f, x):
    h = 10e-50  # too small: lost to rounding error
    return (f(x+h) - f(x)) / (h)

This is a direct implementation of numerical differentiation, but the code above has two problems.

  1. If h is too small, rounding error occurs and the value is effectively treated as 0.
  2. The forward difference f(x+h) - f(x) only approximates the derivative; because h cannot actually approach 0, the result is not the exact derivative.

To fix these problems, make the following two changes.

  1. Use a value of h that is not too small.
  2. Use the central difference.
def numerical_diff(f, x):
    h = 1e-4 # 0.0001
    return (f(x+h) - f(x-h)) / (2*h)
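
To see what the two changes buy, the sketch below compares the three variants on a function whose exact derivative is known. The test function and the helper names forward_diff_tiny_h, forward_diff, and central_diff are assumptions chosen for this illustration.

```python
def f(x):
    return 0.01 * x**2 + 0.1 * x   # exact derivative: 0.02 * x + 0.1

def forward_diff_tiny_h(f, x):
    h = 1e-50                      # so small that x + h == x in float64
    return (f(x + h) - f(x)) / h

def forward_diff(f, x, h=1e-4):
    return (f(x + h) - f(x)) / h

def central_diff(f, x, h=1e-4):
    return (f(x + h) - f(x - h)) / (2 * h)

x = 5.0
exact = 0.02 * x + 0.1             # 0.2
print(forward_diff_tiny_h(f, x))   # 0.0, the rounding-error failure
print(abs(forward_diff(f, x) - exact))   # error of the forward difference
print(abs(central_diff(f, x) - exact))   # noticeably smaller error
```

The central difference cancels the first-order error term of the forward difference, which is why it comes out more accurate for the same h.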

Partial Derivatives

Differentiating a function of two or more variables requires partial derivatives.

A partial derivative is computed by focusing on one target variable and holding the values of all the other variables fixed.

def numerical_gradient(f, x):
    h = 1e-4 # 0.0001
    grad = np.zeros_like(x) # array with the same shape as x
    
    # .flat visits the elements one by one, so x may also be a 2-D weight matrix
    for idx in range(x.size):
        tmp_val = x.flat[idx]
        
        # compute f(x+h)
        x.flat[idx] = float(tmp_val) + h
        fxh1 = f(x)
        
        # compute f(x-h)
        x.flat[idx] = tmp_val - h 
        fxh2 = f(x) 
        
        grad.flat[idx] = (fxh1 - fxh2) / (2*h)
        x.flat[idx] = tmp_val # restore the original value
        
    return grad

The code above computes the partial derivative with respect to every variable, that is, the gradient.
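
A quick way to check such a function is to run it on a function whose gradient is known by hand. The sketch below repeats a compact version of the gradient routine so that it runs on its own; indexing through .flat is a choice made for this sketch so that arrays of any shape work.

```python
import numpy as np

def numerical_gradient(f, x):
    """Central-difference gradient of f at x (x is a float NumPy array)."""
    h = 1e-4
    grad = np.zeros_like(x)
    for idx in range(x.size):
        tmp_val = x.flat[idx]
        x.flat[idx] = tmp_val + h      # f with only element idx nudged up
        fxh1 = f(x)
        x.flat[idx] = tmp_val - h      # f with only element idx nudged down
        fxh2 = f(x)
        grad.flat[idx] = (fxh1 - fxh2) / (2 * h)
        x.flat[idx] = tmp_val          # restore the original value
    return grad

def function_2(x):
    return x[0]**2 + x[1]**2           # exact gradient: (2*x0, 2*x1)

g1 = numerical_gradient(function_2, np.array([3.0, 4.0]))
g2 = numerical_gradient(function_2, np.array([0.0, 2.0]))
print(g1)  # close to [6. 8.]
print(g2)  # close to [0. 4.]
```

Note that the input array must hold floats; with an integer array the added h would be truncated away.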


4-4. Gradient

Gradient Descent

Returning to neural networks: so far we have covered the loss function. Now let's see how the parameters are updated toward the optimum of the loss function.

Gradient descent uses the gradient to search for the point where the loss function takes its minimum value.

It computes the gradient at the current point on the loss function, moves a small distance in the direction that decreases the loss (the direction opposite to the gradient), and repeats this process.

The update rule of gradient descent is as follows.

x_i \leftarrow x_i - \eta \frac{\partial f}{\partial x_i}

Gradient descent formula

Here \eta is the learning rate. Let's implement gradient descent in code.

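def gradient_descent(f, init_x, lr=0.01, step_num=100):
    x = init_x
    x_history = []  # position before each update, kept so the path can be plotted

    for i in range(step_num):
        x_history.append(x.copy())
        grad = numerical_gradient(f, x)
        x -= lr * grad

    return x, np.array(x_history)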

The update in gradient descent involves a learning rate, which determines how much is learned in a single step, that is, how far the parameter values are moved in each update.

 

Now let's use gradient descent to find the minimum of f(x0, x1) = x0^2 + x1^2.

def function_2(x):
    return x[0]**2 + x[1]**2

init_x = np.array([-3.0, 4.0])    

lr = 0.1
step_num = 20
x, x_history = gradient_descent(function_2, init_x, lr=lr, step_num=step_num)

As shown, the minimum can be found simply by choosing the initial coordinates, the learning rate, and the number of iterations.

Gradient descent update process

Parameters that a person sets by hand, such as the learning rate, are called hyperparameters.
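
Because the learning rate is chosen by hand, it helps to see what happens when it is chosen badly. The following sketch is a pure-Python variant written for this illustration (plain lists instead of NumPy arrays, and the three learning rates are arbitrary picks), not the implementation used above.

```python
def f(x):
    return x[0]**2 + x[1]**2

def numerical_gradient(f, x, h=1e-4):
    grad = []
    for i in range(len(x)):
        up = x[:i] + [x[i] + h] + x[i+1:]      # only element i nudged up
        down = x[:i] + [x[i] - h] + x[i+1:]    # only element i nudged down
        grad.append((f(up) - f(down)) / (2 * h))
    return grad

def gradient_descent(f, init_x, lr, step_num=100):
    x = list(init_x)                           # copy, so init_x is not modified
    for _ in range(step_num):
        grad = numerical_gradient(f, x)
        x = [xi - lr * gi for xi, gi in zip(x, grad)]
    return x

start = [-3.0, 4.0]
good = gradient_descent(f, start, lr=0.1)      # ends very close to [0, 0]
too_big = gradient_descent(f, start, lr=10.0)  # overshoots and diverges
too_small = gradient_descent(f, start, lr=1e-10)  # barely moves from start
print(good, too_big, too_small)
```

A learning rate that is too large makes every step overshoot the minimum, while one that is too small ends the run almost where it began.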

4-5. Implementing the Training Algorithm

Premise

  • A neural network has adaptable weights and biases, and adjusting these weights and biases to fit the training data is called learning. Neural network training proceeds in the following four steps.

Step 1 - Mini-batch

  • Randomly select a subset of the training data. The selected data is called a mini-batch, and the goal is to reduce the loss function value for that mini-batch.

Step 2 - Compute the gradient

  • To reduce the loss on the mini-batch, compute the gradient with respect to each weight parameter. The negative of the gradient points in the direction in which the loss function value decreases fastest.

Step 3 - Update the parameters

  • Move the weight parameters a very small amount in the direction opposite to the gradient.

Step 4 - Repeat

  • Repeat steps 1 to 3.
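
The four steps can be sketched on a problem much smaller than a neural network. Everything in the example below is an assumption made for illustration: the toy data (t = 3x plus noise), the single weight w, the mean-squared-error loss, and the hyperparameter values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training data (made up for this sketch): t = 3 * x plus a little noise
x_train = rng.uniform(-1.0, 1.0, size=200)
t_train = 3.0 * x_train + rng.normal(0.0, 0.01, size=200)

def loss(w, x, t):
    return 0.5 * np.mean((w * x - t) ** 2)   # mean squared error over the batch

def numerical_grad(w, x, t, h=1e-4):
    return (loss(w + h, x, t) - loss(w - h, x, t)) / (2 * h)

w = 0.0               # the single weight to learn
learning_rate = 0.1
batch_size = 20

for step in range(500):                                   # step 4: repeat
    batch_mask = rng.choice(x_train.size, batch_size)     # step 1: mini-batch
    x_batch, t_batch = x_train[batch_mask], t_train[batch_mask]
    grad = numerical_grad(w, x_batch, t_batch)            # step 2: gradient
    w -= learning_rate * grad                             # step 3: update

print(w)  # should end up close to 3.0
```

Each iteration sees only 20 of the 200 samples, yet the weight still settles near the value that generated the data; this is the same loop structure used for the network below.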

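Implementing a Two-Layer Neural Network Class

# sigmoid and softmax are the activation functions from Chapter 3; cross_entropy_error is defined in 4-2
class TwoLayerNet:

    def __init__(self, input_size, hidden_size, output_size, weight_init_std=0.01):
        # initialize the weights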
        self.params = {}
        self.params['W1'] = weight_init_std * np.random.randn(input_size, hidden_size)
        self.params['b1'] = np.zeros(hidden_size)
        self.params['W2'] = weight_init_std * np.random.randn(hidden_size, output_size)
        self.params['b2'] = np.zeros(output_size)

    def predict(self, x):
        W1, W2 = self.params['W1'], self.params['W2']
        b1, b2 = self.params['b1'], self.params['b2']
    
        a1 = np.dot(x, W1) + b1
        z1 = sigmoid(a1)
        a2 = np.dot(z1, W2) + b2
        y = softmax(a2)
        
        return y
        
    # x : input data, t : target labels
    def loss(self, x, t):
        y = self.predict(x)
        
        return cross_entropy_error(y, t)
    
    def accuracy(self, x, t):
        y = self.predict(x)
        y = np.argmax(y, axis=1)
        t = np.argmax(t, axis=1)
        
        accuracy = np.sum(y == t) / float(x.shape[0])
        return accuracy
        
    # x : input data, t : target labels
    def numerical_gradient(self, x, t):
        loss_W = lambda W: self.loss(x, t)
        
        grads = {}
        grads['W1'] = numerical_gradient(loss_W, self.params['W1'])
        grads['b1'] = numerical_gradient(loss_W, self.params['b1'])
        grads['W2'] = numerical_gradient(loss_W, self.params['W2'])
        grads['b2'] = numerical_gradient(loss_W, self.params['b2'])
        
        return grads
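
The class calls sigmoid and softmax, which this post does not define (the book ships them in its common/functions.py). A minimal sketch of what they might look like, assuming batched 2-D input with one sample per row:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    # Subtract the row-wise maximum before exponentiating to avoid overflow;
    # this does not change the result.
    x = x - np.max(x, axis=-1, keepdims=True)
    exp_x = np.exp(x)
    return exp_x / np.sum(exp_x, axis=-1, keepdims=True)

scores = np.array([[0.3, 2.9, 4.0],
                   [1000.0, 1000.0, 1000.0]])   # large values must not overflow
probs = softmax(scores)
print(probs.sum(axis=1))          # each row sums to 1
print(sigmoid(np.array([0.0])))   # [0.5]
```

With these two helpers and cross_entropy_error in scope, predict returns one probability row per input sample.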

Mini-Batch Training

# Load the data (load_mnist comes from dataset/mnist.py in the book's repository)
(x_train, t_train), (x_test, t_test) = load_mnist(normalize=True, one_hot_label=True)

network = TwoLayerNet(input_size=784, hidden_size=50, output_size=10)

# Hyperparameters
iters_num = 10000  # number of iterations; set as appropriate
train_size = x_train.shape[0]
batch_size = 100   # mini-batch size
learning_rate = 0.1

train_loss_list = []
train_acc_list = []
test_acc_list = []

# Number of iterations per epoch
iter_per_epoch = max(train_size / batch_size, 1)

for i in range(iters_num):
    # Draw a mini-batch
    batch_mask = np.random.choice(train_size, batch_size)
    x_batch = x_train[batch_mask]
    t_batch = t_train[batch_mask]
    
    # Compute the gradient
    # network.gradient (backpropagation) is much faster, but the class above does not define it
    grad = network.numerical_gradient(x_batch, t_batch)
    #grad = network.gradient(x_batch, t_batch)
    
    # Update the parameters
    for key in ('W1', 'b1', 'W2', 'b2'):
        network.params[key] -= learning_rate * grad[key]
    
    # Record the training progress
    loss = network.loss(x_batch, t_batch)
    train_loss_list.append(loss)
    
    # Compute the accuracy once per epoch
    if i % iter_per_epoch == 0:
        train_acc = network.accuracy(x_train, t_train)
        test_acc = network.accuracy(x_test, t_test)
        train_acc_list.append(train_acc)
        test_acc_list.append(test_acc)
        print("train acc, test acc | " + str(train_acc) + ", " + str(test_acc))

Chapter 4 Summary

Datasets used in machine learning are split into training data and test data.
The generalization ability of a model trained on the training data is evaluated with the test data.
Neural network training uses the loss function as its metric and updates the weight parameters so that the value of the loss function decreases.
To update the weight parameters, the gradient with respect to the weights is computed, and the weights are repeatedly moved in the direction indicated by the gradient.
Approximating a derivative by the difference obtained from a very small change is called numerical differentiation.
Numerical differentiation can be used to compute the gradients of the weight parameters.
Computing with numerical differentiation takes time, but it is simple to implement.

Source: Saito Goki, 『밑바닥부터 시작하는 딥러닝』 (Deep Learning from Scratch), Hanbit Media (2017)
