Naïve Bayes Classifier - Hands-on 1

꼰대코더 2025. 2. 14. 12:30

Suppose we train on X_train and Y_train below, and then want to classify X_test as either Y or N.

import numpy as np

X_train = np.array([[0, 1, 1],
                    [0, 0, 1],
                    [0, 0, 0],
                    [1, 1, 0]])
Y_train = ['Y', 'N', 'Y', 'Y']

X_test = np.array([[1, 1, 0]])

Rearranging the data as described in the earlier post, Naïve Bayes Classifier, gives the table below.

There are 3 rows labeled 'Y' and 1 row labeled 'N', for 4 rows in total.

Label (Y_train)	X1 (=1)	Not X1 (=0)	X2 (=1)	Not X2 (=0)	X3 (=1)	Not X3 (=0)	Total
Y	1	2	2	1	1	2	3
N	0	1	0	1	1	0	1
Total	1	3	2	2	2	2	4
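
The counts in this table can be reproduced with a few lines of plain Python. This is only a minimal sketch: the helper name `count_table` is hypothetical (it is not from the original post), and it uses plain lists instead of NumPy so that it runs on its own.

```python
from collections import Counter

# Same toy data as above, written as plain lists.
X_train = [[0, 1, 1],
           [0, 0, 1],
           [0, 0, 0],
           [1, 1, 0]]
Y_train = ['Y', 'N', 'Y', 'Y']


def count_table(X, y):
    """Return {label: {'ones': [...], 'zeros': [...], 'total': n}} per class."""
    totals = Counter(y)
    table = {}
    for label, total in totals.items():
        # Keep only the rows that carry this label.
        rows = [row for row, row_label in zip(X, y) if row_label == label]
        ones = [sum(column) for column in zip(*rows)]  # how often each feature is 1
        zeros = [total - n for n in ones]              # how often each feature is 0
        table[label] = {'ones': ones, 'zeros': zeros, 'total': total}
    return table


table = count_table(X_train, Y_train)
for label, row in table.items():
    print(label, row)
# Y {'ones': [1, 2, 1], 'zeros': [2, 1, 2], 'total': 3}
# N {'ones': [0, 0, 1], 'zeros': [1, 1, 0], 'total': 1}
```

The 'ones' and 'zeros' lists correspond to the Xi (=1) and Not Xi (=0) columns of the table.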

1. Compute the probability of Y and of N

P(Y) = 3 / 4 = 0.75

P(N) = 1 / 4 = 0.25

from collections import defaultdict

# Create an empty dict whose values default to lists
label_indices = defaultdict(list)

# Record the row positions of each label ('Y', 'N')
for index, label in enumerate(Y_train):
    label_indices[label].append(index)

print(label_indices)  # defaultdict(<class 'list'>, {'Y': [0, 2, 3], 'N': [1]})

# label_indices : {'Y': [0, 2, 3], 'N': [1]}
# Build a dict holding the count of 'Y' and the count of 'N'
prior = {label: len(indices) for label, indices in label_indices.items()}

# Total number of rows = 4
total_count = sum(prior.values())

# Divide each count by the total to get the prior probability
for label in prior:
    prior[label] /= total_count

print(prior)  # {'Y': 0.75, 'N': 0.25}

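2. Compute the probability of each feature X1, X2, X3 given Y and given N (each Not probability is 1 - value)

* For smoothing, 1 is added to each feature count and 2 to each class total (Laplace smoothing), so that no probability becomes exactly 0.

P(X1|Y) = (1 + 1) / (3 + 2) = 0.4, and P(Not X1|Y) = 1 - P(X1|Y)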

P(X2|Y) = (2 + 1) / (3 + 2) = 0.6

P(X3|Y) = (1 + 1) / (3 + 2) = 0.4

P(X1|N) = (0 + 1) / (1 + 2) = 0.333

P(X2|N) = (0 + 1) / (1 + 2) = 0.333

P(X3|N) = (1 + 1) / (1 + 2) = 0.666
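
Written generally, each value above follows the same smoothed estimate. The symbols are notation introduced for this note and do not appear in the original post: $N_c$ is the number of training rows with label $c$, $N_{c,i}$ is how many of those rows have $X_i = 1$, and $\alpha$ is the smoothing constant (1 here).

```latex
\[
P(X_i = 1 \mid c) = \frac{N_{c,i} + \alpha}{N_c + 2\alpha},
\qquad
P(X_i = 0 \mid c) = 1 - P(X_i = 1 \mid c)
\]
```

For example, with $\alpha = 1$, $N_Y = 3$ and $N_{Y,2} = 2$, this gives (2 + 1) / (3 + 2) = 0.6, which matches P(X2|Y).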

# label_indices = {'Y': [0, 2, 3], 'N': [1]}
# X_train = np.array([[0, 1, 1], [0, 0, 1], [0, 0, 0], [1, 1, 0]])

smoothing = 1
likelihood = {}
for label, indices in label_indices.items():
    # Column-wise sum of X1, X2, X3 over the rows of this label, plus smoothing
    likelihood[label] = X_train[indices, :].sum(axis=0) + smoothing
    total_count = len(indices)
    # Divide by the smoothed row count of this label to get P(Xi = 1 | label)
    likelihood[label] = likelihood[label] / (total_count + 2 * smoothing)

print(likelihood)  # {'Y': array([0.4, 0.6, 0.4]), 'N': array([0.33333333, 0.33333333, 0.66666667])}

3. Is X_test 'Y' or 'N'?

X_test = [1, 1, 0]

P(Y | X1=1, X2=1, X3=0) = P(Y | X1, X2, Not X3) ∝ P(X1|Y) x P(X2|Y) x P(Not X3|Y) x P(Y) = 0.4 x 0.6 x (1 - 0.4) x 0.75 = 0.108

P(N | X1=1, X2=1, X3=0) = P(N | X1, X2, Not X3) ∝ P(X1|N) x P(X2|N) x P(Not X3|N) x P(N) = 0.333 x 0.333 x (1 - 0.666) x 0.25 = 0.009259

These two values are unnormalized scores. Dividing each by their sum gives the probabilities:
X_test is Y: 0.108 / (0.108 + 0.009259) x 100 = 92.1 %

X_test is N: 0.009259 / (0.108 + 0.009259) x 100 = 7.9 %
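
The arithmetic above can be double-checked with exact fractions, which avoids the rounding in 0.333 and 0.666. This is a small sketch using only the standard library; the values are the priors and smoothed likelihoods from steps 1 and 2, typed in by hand.

```python
from fractions import Fraction

# Priors and smoothed likelihoods from steps 1 and 2, as exact fractions.
prior = {'Y': Fraction(3, 4), 'N': Fraction(1, 4)}
likelihood = {
    'Y': [Fraction(2, 5), Fraction(3, 5), Fraction(2, 5)],
    'N': [Fraction(1, 3), Fraction(1, 3), Fraction(2, 3)],
}
x = [1, 1, 0]  # the X_test row

# Unnormalized score: prior times P(feature value | label) for every feature.
score = {}
for label, probs in likelihood.items():
    value = prior[label]
    for p, bit in zip(probs, x):
        value *= p if bit else 1 - p
    score[label] = value

# Normalize so the two scores sum to 1.
total = sum(score.values())
posterior = {label: value / total for label, value in score.items()}

print(score)      # {'Y': Fraction(27, 250), 'N': Fraction(1, 108)}
print(posterior)  # {'Y': Fraction(1458, 1583), 'N': Fraction(125, 1583)}
print({label: round(float(p) * 100, 2) for label, p in posterior.items()})
# {'Y': 92.1, 'N': 7.9}
```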

# prior = {'Y': 0.75, 'N': 0.25}
# likelihood = {'Y': [0.4, 0.6, 0.4], 'N': [0.33333333, 0.33333333, 0.66666667]}
# X_test = [[1, 1, 0]]

posteriors = []
for x in X_test:
    # Start from the prior of each label
    posterior = prior.copy()

    for label, likelihood_lst in likelihood.items():
        # x = [1, 1, 0]
        for index, bool_value in enumerate(x):
            # Feature value 1 uses p ([0.4, 0.6, 0.4] for 'Y'), value 0 uses 1 - p
            posterior[label] *= likelihood_lst[index] if bool_value else (1 - likelihood_lst[index])

    # Normalize by the sum over all labels
    # sum_posterior = 0.108 + 0.009259 = 0.117259
    sum_posterior = sum(posterior.values())
    for label in posterior:
        posterior[label] /= sum_posterior

    posteriors.append(posterior.copy())

print(posteriors)  # approximately [{'Y': 0.9210, 'N': 0.0790}]
# X_test is more likely to be 'Y'.
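
With only three features the direct product is fine, but multiplying many small probabilities can underflow to 0.0. A common workaround is to add logarithms instead and normalize at the end. The sketch below shows one way to do that with the standard library only. The function name `posterior_log_space` is hypothetical, and the hard-coded numbers are the `prior` and `likelihood` values computed above.

```python
import math

prior = {'Y': 0.75, 'N': 0.25}
likelihood = {'Y': [0.4, 0.6, 0.4], 'N': [1 / 3, 1 / 3, 2 / 3]}


def posterior_log_space(x, prior, likelihood):
    """Posterior per label for one binary feature vector x, computed via log-sums."""
    log_score = {}
    for label, probs in likelihood.items():
        total = math.log(prior[label])
        for p, bit in zip(probs, x):
            # Feature value 1 uses p, feature value 0 uses 1 - p.
            total += math.log(p if bit else 1 - p)
        log_score[label] = total

    # Subtract the largest log score before exponentiating (log-sum-exp trick).
    peak = max(log_score.values())
    exp_score = {label: math.exp(v - peak) for label, v in log_score.items()}
    norm = sum(exp_score.values())
    return {label: v / norm for label, v in exp_score.items()}


result = posterior_log_space([1, 1, 0], prior, likelihood)
print({label: round(p, 4) for label, p in result.items()})
# {'Y': 0.921, 'N': 0.079}
```

The result agrees with the direct product; only the intermediate arithmetic differs.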

4. The same calculation, done simply with BernoulliNB from sklearn.naive_bayes

from sklearn.naive_bayes import BernoulliNB

clf = BernoulliNB(alpha=1.0, fit_prior=True)
clf.fit(X_train, Y_train)

pred_prob = clf.predict_proba(X_test)
print('[scikit-learn] Predicted probabilities:\n', pred_prob)

pred = clf.predict(X_test)
print('[scikit-learn] Prediction:', pred)
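
When reading the scikit-learn output, note that the columns of `predict_proba` follow `clf.classes_`, which is sorted, so here the first column is 'N' and the second is 'Y'. The sketch below repeats the setup so that it runs on its own and pairs each probability with its label. The variable name `by_label` is introduced here for illustration.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

X_train = np.array([[0, 1, 1],
                    [0, 0, 1],
                    [0, 0, 0],
                    [1, 1, 0]])
Y_train = ['Y', 'N', 'Y', 'Y']
X_test = np.array([[1, 1, 0]])

# alpha=1.0 is the same +1 / +2 smoothing used in the manual calculation.
clf = BernoulliNB(alpha=1.0, fit_prior=True)
clf.fit(X_train, Y_train)

# Pair each column of predict_proba with the label it belongs to.
pred_prob = clf.predict_proba(X_test)
by_label = {str(label): float(p) for label, p in zip(clf.classes_, pred_prob[0])}

print(list(clf.classes_))  # ['N', 'Y']
print(by_label)            # 'Y' should be about 0.921 and 'N' about 0.079
```

These values should agree with the manual result from step 3.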
