Splitting General Data into Training and Test Sets


꼰대코더 2024. 1. 14. 23:24

In the previous post, I split off the training data using ImageDataGenerator, the Data Augmentation utility for image data.

This time, I'll split ordinary data, such as numbers and strings, that has been read in with pandas.

1. Using numpy

```python
import numpy as np

def shuffle_and_split_data(data, test_ratio):
    np.random.seed(42)  # fix the seed so the split is reproducible
    # np.random.permutation(length): returns the positions 0..length-1 in random order
    shuffled_indices = np.random.permutation(len(data))
    # compute the number of test rows
    test_set_size = int(len(data) * test_ratio)
    # the first test_set_size shuffled positions are the test-set indices
    test_indices = shuffled_indices[:test_set_size]
    # the remaining positions are the training-set indices
    train_indices = shuffled_indices[test_set_size:]

    # select the rows by position (data must be a pandas DataFrame)
    return data.iloc[train_indices], data.iloc[test_indices]
```
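The function above reseeds NumPy's global random state on every call. Below is a minimal sketch of the same idea that uses a local generator from `np.random.default_rng` instead, assuming NumPy 1.17 or later. The function name `shuffle_and_split_data_rng`, the `seed` parameter and the sample DataFrame are hypothetical, added only for illustration.

```python
import numpy as np
import pandas as pd


def shuffle_and_split_data_rng(data: pd.DataFrame, test_ratio: float, seed: int = 42):
    """Shuffle row positions with a local RNG and split them into train/test parts."""
    rng = np.random.default_rng(seed)            # local generator; global NumPy state is untouched
    shuffled_indices = rng.permutation(len(data))
    test_set_size = int(len(data) * test_ratio)  # int() truncates, e.g. int(10 * 0.25) == 2
    test_indices = shuffled_indices[:test_set_size]
    train_indices = shuffled_indices[test_set_size:]
    return data.iloc[train_indices], data.iloc[test_indices]


# Usage with a small hypothetical DataFrame of 10 rows
df = pd.DataFrame({"x": range(10), "label": [0, 1] * 5})
train_set, test_set = shuffle_and_split_data_rng(df, 0.2)
print(len(train_set), len(test_set))  # 8 2
```

With the same seed, the function returns the same split every time. Other code that relies on `np.random` is not affected.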

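Set the test data to 20% (the remaining 80% becomes training data). Here `df` is the DataFrame read in with pandas.

```python
# df: the pandas DataFrame to split
train_set, test_set = shuffle_and_split_data(df, 0.2)
```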
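Whichever method you use, a quick check that every row ended up in exactly one of the two sets can catch mistakes such as passing the wrong object or slicing the indices incorrectly. The helper name `check_split` is hypothetical, and the sketch assumes the DataFrame has a unique index.

```python
import pandas as pd


def check_split(data: pd.DataFrame, train: pd.DataFrame, test: pd.DataFrame) -> bool:
    """Return True if train and test share no rows and together cover every row of data.

    Assumes data has a unique index.
    """
    no_overlap = train.index.intersection(test.index).empty
    same_rows = (set(train.index) | set(test.index)) == set(data.index)
    same_count = len(train) + len(test) == len(data)
    return no_overlap and same_rows and same_count


df = pd.DataFrame({"x": range(5)})
print(check_split(df, df.iloc[[0, 2, 3, 4]], df.iloc[[1]]))  # True
print(check_split(df, df.iloc[[0, 1, 2]], df.iloc[[2]]))     # False: row 2 twice, rows 3 and 4 missing
```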

2. Using Scikit-Learn

```python
from sklearn.model_selection import train_test_split

# df: the pandas DataFrame to split; random_state fixes the shuffle for reproducibility
train_set, test_set = train_test_split(df, test_size=0.2, random_state=42)
```
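`train_test_split` also accepts several arrays at once and splits them all with the same shuffled row order. This is convenient when features and target are kept separately. The sketch below uses a hypothetical 11-row DataFrame with made-up column names. Note also that scikit-learn rounds a float `test_size` up (`ceil`), while the NumPy function above truncates with `int()`. For 11 rows, scikit-learn therefore gives 3 test rows where the NumPy version gives 2.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data: 11 rows, two feature columns and a target column
df = pd.DataFrame({
    "feature_a": range(11),
    "feature_b": [v * 0.5 for v in range(11)],
    "target": [0, 1] * 5 + [0],
})
X = df[["feature_a", "feature_b"]]
y = df["target"]

# Returns X_train, X_test, y_train, y_test in that order; rows stay aligned
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(len(X_train), len(X_test))          # 8 3  (ceil(11 * 0.2) = 3 test rows)
print(X_test.index.equals(y_test.index))  # True
```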
