Code: The Completed 7x7 CNN Policy Network

The completed 7x7 CNN! It has 2,913 parameters (weights and biases) and outputs a probability for each of the 49 points.

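🎯 Analyzing the result

On an empty board every point gets a probability close to 0.0204 ≈ 1/49 (0.0181 to 0.0242 in the run below): nearly, but not exactly, uniform. Why?

The input is uniform (every point is empty, so channel 2 is 1 everywhere)
The weights are random, so they carry no learned preference for any location
Convolution applies the same weights at every position, so positions that see the same neighborhood get the same output; only the zero padding around the border introduces small differences

This is expected. For an empty board to produce clearly different probabilities, the network has to be trained.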
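In the sample run further down, the probabilities are close to 1/49 but not identical. A likely source is the zero padding, and the minimal sketch below illustrates it. It counts how many real board cells each 3x3 window covers on a 7x7 board; the helper name `coverage_map` is hypothetical and not part of the lesson's code.

```python
import numpy as np

def coverage_map(size=7, k=3):
    """Count the real (non-padding) cells under each k x k window."""
    pad = k // 2
    padded = np.zeros((size + 2 * pad, size + 2 * pad))
    padded[pad:pad + size, pad:pad + size] = 1.0  # 1 marks a real board cell
    counts = np.zeros((size, size), dtype=int)
    for i in range(size):
        for j in range(size):
            # Sum of the window = number of real cells it overlaps
            counts[i, j] = int(padded[i:i + k, j:j + k].sum())
    return counts

counts = coverage_map()
print(counts)
```

Corner windows cover 4 real cells, edge windows 6, and interior windows 9, so even a perfectly uniform input produces slightly different activations near the border. With three stacked 3x3 layers the receptive field grows to 7x7, which means only the center point (3,3) is computed without touching any padding.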

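💡 Comparing weight counts

Model	Weights	Relative size
Our 7x7 CNN	2,913	baseline
19x19 MLP (hypothetical)	~250,000	about 86x
AlphaGo 13-layer CNN	~3,500,000	about 1,200x
AlphaGo Zero (40 residual blocks)	~21,000,000	about 7,200x

A larger model can learn more complex patterns. Our 7x7 network is tiny by comparison, but the core idea is the same.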
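The 2,913 figure and the size ratios can be recomputed with a few lines. This is only a sketch: `conv_params` is a hypothetical helper name, and the counts for the larger models are the approximate figures from the table, not derived here.

```python
def conv_params(k, c_in, c_out):
    """Weights plus biases of one k x k convolution layer."""
    return k * k * c_in * c_out + c_out

# (kernel size, input channels, output channels) for the three layers
layers = [(3, 3, 16), (3, 16, 16), (3, 16, 1)]
ours = sum(conv_params(*layer) for layer in layers)
print(ours)  # 2913

# Approximate counts copied from the table above
others = {
    "19x19 MLP (hypothetical)": 250_000,
    "AlphaGo 13-layer CNN": 3_500_000,
    "AlphaGo Zero, 40 residual blocks": 21_000_000,
}
ratios = {name: count / ours for name, count in others.items()}
for name, ratio in ratios.items():
    print(f"{name}: about {ratio:.0f}x")
```

The middle layer dominates: 3 x 3 x 16 x 16 + 16 = 2,320 of the 2,913 parameters sit in Conv2.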

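📖 What a trained model outputs on an empty board

A sufficiently trained model would produce a distribution roughly like this on an empty board:

Center star point (3,3) ≈ 0.15 (strongest)
3-3 points → 0.08 to 0.12
Side midpoints → 0.03
1st- and 2nd-line edge points → 0.001

Training is exactly this shift from a uniform distribution to a meaningful one. We will do it ourselves in the next chapter.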
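To get a feel for the difference, the sketch below builds a hand-made distribution loosely following the rough values above and compares it with the uniform one. Everything in it is hypothetical: the scores are invented for illustration rather than produced by a trained model, and placing the "side midpoints" at the four points next to the center is an assumption.

```python
import numpy as np

N = 7
scores = np.full((N, N), 0.001)                # default: low-value outer points
for r, c in [(2, 3), (3, 2), (3, 4), (4, 3)]:
    scores[r, c] = 0.03                        # side midpoints (assumed location)
for r, c in [(2, 2), (2, 4), (4, 2), (4, 4)]:
    scores[r, c] = 0.10                        # 3-3 points
scores[3, 3] = 0.15                            # center point

trained_like = scores / scores.sum()           # normalize so it sums to 1
uniform = np.full((N, N), 1.0 / (N * N))

# Locate the most likely point as plain Python ints
best = tuple(int(i) for i in np.unravel_index(np.argmax(trained_like), trained_like.shape))
print("most likely point:", best)
print(f"peak vs uniform: {trained_like.max() / uniform.max():.1f}x")
```

Because the rough scores do not sum to 1, normalizing shifts the individual values somewhat; the point is only that the probability mass concentrates around the center instead of being spread evenly.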

PYTHON

# Completed 7x7 policy network: 3 conv layers + softmax.

import numpy as np

np.random.seed(42)  # fixed seed so the random weights are reproducible

def relu(x):
    return np.maximum(0, x)

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / np.sum(e)

def conv2d_padded(image, kernels, bias):
    """Zero-padded convolution: the output keeps the input's height and width."""
    H, W, Cin = image.shape
    K = kernels.shape[0]
    Cout = kernels.shape[3]
    pad = K // 2
    padded = np.zeros((H+2*pad, W+2*pad, Cin))
    padded[pad:pad+H, pad:pad+W, :] = image
    output = np.zeros((H, W, Cout))
    for i in range(H):
        for j in range(W):
            patch = padded[i:i+K, j:j+K, :]
            for c in range(Cout):
                output[i,j,c] = np.sum(patch * kernels[:,:,:,c]) + bias[c]
    return output


# === Model weights (random, untrained) ===
W1 = np.random.randn(3, 3, 3, 16) * 0.1
b1 = np.zeros(16)
W2 = np.random.randn(3, 3, 16, 16) * 0.1
b2 = np.zeros(16)
W3 = np.random.randn(3, 3, 16, 1) * 0.1
b3 = np.zeros(1)


def policy_net_7x7(board):
    """7x7x3 board -> probability distribution over the 49 points"""
    h1 = relu(conv2d_padded(board, W1, b1))    # 7x7x16
    h2 = relu(conv2d_padded(h1, W2, b2))        # 7x7x16
    logits = conv2d_padded(h2, W3, b3)          # 7x7x1
    flat = logits.reshape(-1)                    # 49
    probs = softmax(flat)
    return probs.reshape(7, 7)


# === Count the weights ===
def count_params(W):
    return W.size

total = (count_params(W1) + count_params(b1) +
         count_params(W2) + count_params(b2) +
         count_params(W3) + count_params(b3))
print("=== 7x7 CNN weight count ===")
print(f"Conv1 (3x3x3 → 16):     {count_params(W1)+count_params(b1):>5}")
print(f"Conv2 (3x3x16 → 16):    {count_params(W2)+count_params(b2):>5}")
print(f"Conv3 (3x3x16 → 1):     {count_params(W3)+count_params(b3):>5}")
print(f"Total:                  {total:>5}")
print("  (for reference: AlphaGo ≈ 3,500,000)")
print()


# === Evaluate an empty 7x7 board ===
board = np.zeros((7, 7, 3))
board[:, :, 2] = 1.0   # every point is empty

print("=== Empty-board policy (untrained, random weights) ===")
probs = policy_net_7x7(board)
print("Probability per point (4 decimal places):")
for r in range(7):
    row_str = "  "
    for c in range(7):
        row_str += f"{probs[r,c]:.4f} "
    print(row_str)

print()
print(f"Largest probability: {probs.max():.4f}")
print(f"Smallest probability: {probs.min():.4f}")
print(f"Sum: {probs.sum():.6f}")
print()
print("Untrained, so the probabilities are nearly uniform (1/49 ≈ 0.020).")

Output

Expected output:

=== 7x7 CNN weight count ===
Conv1 (3x3x3 → 16):       448
Conv2 (3x3x16 → 16):     2320
Conv3 (3x3x16 → 1):       145
Total:                   2913
  (for reference: AlphaGo ≈ 3,500,000)

=== Empty-board policy (untrained, random weights) ===
Probability per point (4 decimal places):
  0.0242 0.0228 0.0201 0.0198 0.0202 0.0200 0.0202 
  0.0219 0.0228 0.0199 0.0192 0.0194 0.0194 0.0200 
  0.0211 0.0221 0.0195 0.0186 0.0186 0.0192 0.0203 
  0.0215 0.0221 0.0195 0.0186 0.0186 0.0191 0.0203 
  0.0218 0.0223 0.0198 0.0190 0.0191 0.0192 0.0206 
  0.0201 0.0202 0.0189 0.0181 0.0184 0.0190 0.0214 
  0.0224 0.0223 0.0223 0.0211 0.0210 0.0216 0.0225 

Largest probability: 0.0242
Smallest probability: 0.0181
Sum: 1.000000

Untrained, so the probabilities are nearly uniform (1/49 ≈ 0.020).
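The triple loop in `conv2d_padded` is easy to follow but slow. As an optional sketch, the same zero-padded convolution can be written without Python loops using `np.lib.stride_tricks.sliding_window_view` (available in NumPy 1.20 and later) together with `np.einsum`. The name `conv2d_padded_fast` is hypothetical, and the compact loop version is included only so the two results can be compared.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_padded_loop(image, kernels, bias):
    """Reference: same logic as the lesson's conv2d_padded."""
    H, W, Cin = image.shape
    K, Cout = kernels.shape[0], kernels.shape[3]
    pad = K // 2
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)))  # zero padding
    out = np.zeros((H, W, Cout))
    for i in range(H):
        for j in range(W):
            patch = padded[i:i + K, j:j + K, :]
            for c in range(Cout):
                out[i, j, c] = np.sum(patch * kernels[:, :, :, c]) + bias[c]
    return out

def conv2d_padded_fast(image, kernels, bias):
    """Vectorized variant (hypothetical helper, not part of the lesson)."""
    K = kernels.shape[0]
    pad = K // 2
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)))
    # windows[h, w, c, k, l] == padded[h + k, w + l, c]
    windows = sliding_window_view(padded, (K, K), axis=(0, 1))
    # Sum over kernel rows k, kernel columns l, and input channels c
    return np.einsum("hwckl,klco->hwo", windows, kernels) + bias

rng = np.random.default_rng(0)
image = rng.standard_normal((7, 7, 3))
kernels = rng.standard_normal((3, 3, 3, 16)) * 0.1
bias = rng.standard_normal(16)

slow = conv2d_padded_loop(image, kernels, bias)
fast = conv2d_padded_fast(image, kernels, bias)
print(np.allclose(slow, fast))  # True
```

On a 7x7 board the speed difference hardly matters, so the loop version remains the better choice for reading; the vectorized form becomes useful once the forward pass has to run many times.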
