Tree-based
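I'm in charge of the tree-based models.
Time to experiment.
Data: uploaded to Hugging Face as RAW and as a baseline version (StandardScaler -> MICE).
Next: apply RobustScaler and per-column scalers / experiment with a fine-tuned MICE, then run experiments across the resulting data versions.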
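A minimal sketch of the "scaler -> MICE" preprocessing, assuming scikit-learn's `IterativeImputer` as the MICE-style imputer (the post does not say which implementation was used). The helper name `make_preprocessor` and the synthetic data are made up for illustration.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler, StandardScaler


def make_preprocessor(scaler_name="standard", random_state=0):
    """Scale first, then impute. Both scalers ignore NaN when fitting and keep it in transform."""
    scaler = {"standard": StandardScaler(), "robust": RobustScaler()}[scaler_name]
    imputer = IterativeImputer(max_iter=10, random_state=random_state)
    return make_pipeline(scaler, imputer)


# Synthetic stand-in data (an assumption, not the galaxy merger table).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[:, 3] = X[:, 0] + 0.1 * rng.normal(size=200)  # correlated column so imputation has signal
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.1] = np.nan  # knock out roughly 10% of the cells

X_filled = make_preprocessor("standard").fit_transform(X_missing)
print(np.isnan(X_missing).sum(), "missing cells before,", np.isnan(X_filled).sum(), "after")
```

Swapping `"standard"` for `"robust"` gives the RobustScaler variant planned for the follow-up experiments.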
Introduce Stratified K-Fold Cross Validation
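To show what stratification buys, here is a small sketch using scikit-learn's `StratifiedKFold` on made-up labels (the class proportions and the seed are assumptions). Each validation fold keeps roughly the same class ratios as the full set.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Made-up 3-class labels with uneven proportions.
rng = np.random.default_rng(0)
y = rng.choice([-1, 0, 1], size=600, p=[0.3, 0.4, 0.3])
X = rng.normal(size=(600, 5))

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
fold_ratios = []
for fold, (train_idx, valid_idx) in enumerate(skf.split(X, y), start=1):
    classes, counts = np.unique(y[valid_idx], return_counts=True)
    ratios = counts / counts.sum()
    fold_ratios.append(ratios)
    print(f"fold {fold}: valid size={len(valid_idx)}, class ratios={np.round(ratios, 3)}")
```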
Basic structure
Download the data from Hugging Face -> declare the model -> train with k-fold -> print results / save the results and model files
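The flow above could be sketched roughly as follows. Everything here is an assumption for illustration: the helper names, the `label_col` parameter (the post does not name the label column), the output file names, and the synthetic data that stands in for the Hugging Face download so the sketch runs offline.

```python
import json
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier


def load_hf_split(repo_id, split, label_col):
    """Download one split from Hugging Face (needs the `datasets` package and network access)."""
    from datasets import load_dataset  # imported lazily so the rest runs offline

    df = load_dataset(repo_id, split=split).to_pandas()
    # Non-numeric columns (for example an ID string) would need dropping here as well.
    return df.drop(columns=[label_col]).to_numpy(), df[label_col].to_numpy()


def cross_validate(make_model, X, y, n_splits=5, seed=42):
    """Return the mean validation macro-F1 over stratified folds."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, valid_idx in skf.split(X, y):
        model = make_model().fit(X[train_idx], y[train_idx])
        scores.append(f1_score(y[valid_idx], model.predict(X[valid_idx]), average="macro"))
    return float(np.mean(scores))


def save_run(out_dir, model, results):
    """Write the fitted model and a JSON results file side by side."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    joblib.dump(model, out_dir / "model.joblib")
    (out_dir / "results.json").write_text(json.dumps(results, indent=2))


def make_model():
    return DecisionTreeClassifier(max_depth=5, random_state=0)


# Offline stand-in for the Hugging Face data.
X, y = make_classification(n_samples=400, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)

results = {"valid_macro_f1": round(cross_validate(make_model, X, y), 4)}
final_model = make_model().fit(X, y)
out_dir = Path(tempfile.mkdtemp())
save_run(out_dir, final_model, results)
print(results, sorted(p.name for p in out_dir.iterdir()))
```

On the server, `load_hf_split` would replace the `make_classification` call once the real label column name is filled in.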
During training
- Print training time (total training time, time per epoch)
- Loss curve (train, valid)
- Early stopping with patience 5
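The timing and early-stopping checklist can be sketched without any model at all. The `EarlyStopper` class below is a hypothetical helper, and I am assuming "improvement" means a strictly higher validation score. It replays the valid F1 column of the Decision Tree log from this post in place of real training steps.

```python
import time


class EarlyStopper:
    """Stop after `patience` consecutive checks without a strict improvement."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best_score = float("-inf")
        self.best_step = None
        self.bad_steps = 0

    def update(self, step, score):
        """Record a score; return True when training should stop."""
        if score > self.best_score:
            self.best_score, self.best_step, self.bad_steps = score, step, 0
        else:
            self.bad_steps += 1
        return self.bad_steps >= self.patience


# Valid F1 per step, copied from the Decision Tree log in this post.
valid_f1_log = [
    0.5282, 0.5629, 0.5904, 0.6155, 0.6288, 0.6405, 0.6588, 0.6727,
    0.6949, 0.6919, 0.6957, 0.7045, 0.7052, 0.7072, 0.6997, 0.7012,
    0.7027, 0.7063, 0.7081, 0.7069, 0.7075, 0.7075, 0.7075, 0.7075,
]

stopper = EarlyStopper(patience=5)
stopped_at = None
train_start = time.perf_counter()
for step, score in enumerate(valid_f1_log, start=1):
    step_start = time.perf_counter()
    # ... one training/validation pass would run here ...
    step_time = time.perf_counter() - step_start
    print(f"[{step:02d}] valid F1={score:.4f} | time={step_time:.2f}s")
    if stopper.update(step, score):
        stopped_at = step
        print(f"Early stopping at step {step}")
        break
total_time = time.perf_counter() - train_start
print(f"[Train] total time = {total_time:.2f}s")
print(f"Best step = {stopper.best_step} | Best valid F1 = {stopper.best_score:.4f}")
```

With these scores the stopper halts at step 24 and reports step 19 (valid F1 0.7081) as the best, which matches the log.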
During testing
- Print test time
- Confusion matrix (in blue)
- Report in this format (4 decimal places):
Test macro-F1 : 0.6980
[Test] Classification Report
              precision    recall  f1-score   support
           0     0.7388    0.8740    0.8007       246
           1     0.6687    0.5515    0.6045       194
           2     0.7102    0.6684    0.6887       187
    accuracy                         0.7129       627
   macro avg     0.7059    0.6980    0.6980       627
weighted avg     0.7086    0.7129    0.7066       627
Now to run the experiments on the server...
Diving into the server
Activate the virtual environment
conda activate test-env
Set up the file structure
Files for real-data testing and varied preprocessing will be added later
HamCaDor/GalaxyMergerFinalTabularDataStandardMICE · Datasets at Hugging Face
Dataset preview: subset "default", 6.06k rows; splits: train (4.85k rows), test (1.21k rows); features are mostly float64 columns.
huggingface.co
For now, standardize on the basic dataset
🌳 Decision Tree
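- Sweep the max_depth candidates and stop once the 5-Fold validation score fails to improve 5 times in a row.
No concept of epochs here! (The "epoch" in the log below counts depth candidates.)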
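A sketch of that depth sweep, assuming scikit-learn's `DecisionTreeClassifier` and `StratifiedKFold`, with synthetic data in place of the real table. The function names, the seed, and the strict-improvement rule are assumptions; only the depth candidates and patience=5 come from the log.

```python
import time

import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier


def cv_macro_f1(X, y, max_depth, n_splits=5, seed=42):
    """Mean train/valid macro-F1 of one depth over stratified folds."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    train_scores, valid_scores = [], []
    for train_idx, valid_idx in skf.split(X, y):
        tree = DecisionTreeClassifier(max_depth=max_depth, random_state=seed)
        tree.fit(X[train_idx], y[train_idx])
        train_scores.append(f1_score(y[train_idx], tree.predict(X[train_idx]), average="macro"))
        valid_scores.append(f1_score(y[valid_idx], tree.predict(X[valid_idx]), average="macro"))
    return float(np.mean(train_scores)), float(np.mean(valid_scores))


def sweep_depths(X, y, depth_candidates, patience=5):
    """Try depths in order; stop after `patience` candidates in a row without improvement."""
    best = {"index": None, "depth": None, "valid_f1": float("-inf")}
    history, bad_steps = [], 0
    for i, depth in enumerate(depth_candidates, start=1):
        start = time.perf_counter()
        train_f1, valid_f1 = cv_macro_f1(X, y, depth)
        elapsed = time.perf_counter() - start
        print(f"[{i:02d}] depth={depth!s:>4} | train F1={train_f1:.4f} "
              f"| valid F1={valid_f1:.4f} | time={elapsed:.2f}s")
        history.append(valid_f1)
        if valid_f1 > best["valid_f1"]:
            best = {"index": i, "depth": depth, "valid_f1": valid_f1}
            bad_steps = 0
        else:
            bad_steps += 1
            if bad_steps >= patience:
                print(f"Early stopping at step {i}")
                break
    return best, history


# Synthetic stand-in data; the real run used the Hugging Face train split.
X, y = make_classification(n_samples=500, n_features=10, n_informative=6,
                           n_classes=3, random_state=0)
depth_candidates = list(range(2, 31)) + [None]  # same candidates as the log
best, history = sweep_depths(X, y, depth_candidates, patience=5)
print(f"Best depth = {best['depth']} | Best valid Macro-F1 = {best['valid_f1']:.4f}")
```

The best candidate is tracked by index because `None` (unlimited depth) is itself a valid candidate and cannot double as a "nothing found yet" marker.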
DecisionTree Stratified 5-Fold (early stopping patience=5)
depth_candidates=[2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, None]
[01] depth= 2 | train F1=0.5359 | valid F1=0.5282 | time=0.11s
[02] depth= 3 | train F1=0.5772 | valid F1=0.5629 | time=0.14s
[03] depth= 4 | train F1=0.6154 | valid F1=0.5904 | time=0.18s
[04] depth= 5 | train F1=0.6514 | valid F1=0.6155 | time=0.21s
[05] depth= 6 | train F1=0.6867 | valid F1=0.6288 | time=0.25s
[06] depth= 7 | train F1=0.7213 | valid F1=0.6405 | time=0.27s
[07] depth= 8 | train F1=0.7678 | valid F1=0.6588 | time=0.30s
[08] depth= 9 | train F1=0.8125 | valid F1=0.6727 | time=0.31s
[09] depth= 10 | train F1=0.8573 | valid F1=0.6949 | time=0.33s
[10] depth= 11 | train F1=0.8971 | valid F1=0.6919 | time=0.35s
[11] depth= 12 | train F1=0.9289 | valid F1=0.6957 | time=0.38s
[12] depth= 13 | train F1=0.9539 | valid F1=0.7045 | time=0.39s
[13] depth= 14 | train F1=0.9712 | valid F1=0.7052 | time=0.39s
[14] depth= 15 | train F1=0.9830 | valid F1=0.7072 | time=0.40s
[15] depth= 16 | train F1=0.9899 | valid F1=0.6997 | time=0.38s
[16] depth= 17 | train F1=0.9948 | valid F1=0.7012 | time=0.39s
[17] depth= 18 | train F1=0.9977 | valid F1=0.7027 | time=0.38s
[18] depth= 19 | train F1=0.9987 | valid F1=0.7063 | time=0.39s
[19] depth= 20 | train F1=0.9996 | valid F1=0.7081 | time=0.39s
[20] depth= 21 | train F1=0.9999 | valid F1=0.7069 | time=0.38s
[21] depth= 22 | train F1=1.0000 | valid F1=0.7075 | time=0.39s
[22] depth= 23 | train F1=1.0000 | valid F1=0.7075 | time=0.38s
[23] depth= 24 | train F1=1.0000 | valid F1=0.7075 | time=0.39s
[24] depth= 25 | train F1=1.0000 | valid F1=0.7075 | time=0.38s
Early stopping at epoch 24
[Train] total time = 7.87s
Best depth = 20 | Best valid Macro-F1 = 0.7081
Test time (s): 0.0004
Accuracy : 0.7005
Macro-F1 : 0.6966
[Test] Classification Report
              precision    recall  f1-score   support
          -1     0.6173    0.6253    0.6213       387
           0     0.7959    0.7694    0.7824       451
           1     0.6771    0.6952    0.6860       374
    accuracy                         0.7005      1212
   macro avg     0.6968    0.6966    0.6966      1212
weighted avg     0.7022    0.7005    0.7012      1212
🌳Random Forest
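The log in this section comes from growing the forest in steps with `warm_start` and stopping each depth once the 5-Fold validation F1 stalls. A rough sketch of that idea, assuming one warm-started `RandomForestClassifier` per fold and a strict-improvement rule (the function name, the seed, and the synthetic data are assumptions; the real script is not shown in the post):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold


def grow_with_early_stopping(X, y, max_depth, start_trees=50, step=50,
                             max_rounds=20, patience=5, seed=42):
    """Grow one warm-started forest per fold; stop when mean valid F1 stalls."""
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    folds = list(skf.split(X, y))
    forests = [
        RandomForestClassifier(n_estimators=start_trees, max_depth=max_depth,
                               warm_start=True, random_state=seed)
        for _ in folds
    ]
    best_trees, best_f1, bad_rounds = None, float("-inf"), 0
    for round_idx in range(max_rounds):
        n_trees = start_trees + round_idx * step
        scores = []
        for forest, (train_idx, valid_idx) in zip(forests, folds):
            # With warm_start=True, refitting keeps the existing trees and only adds new ones.
            forest.set_params(n_estimators=n_trees)
            forest.fit(X[train_idx], y[train_idx])
            scores.append(f1_score(y[valid_idx], forest.predict(X[valid_idx]), average="macro"))
        valid_f1 = float(np.mean(scores))
        print(f"depth={max_depth!s:>4} | trees={n_trees:4d} | valid F1={valid_f1:.4f}")
        if valid_f1 > best_f1:
            best_trees, best_f1, bad_rounds = n_trees, valid_f1, 0
        else:
            bad_rounds += 1
            if bad_rounds >= patience:
                break
    return best_trees, best_f1


# Synthetic stand-in data.
X, y = make_classification(n_samples=300, n_features=10, n_informative=6,
                           n_classes=3, random_state=0)

best_cfg, best_valid_f1 = None, float("-inf")
# Small settings so the sketch runs in seconds; the log below uses
# start_trees=50, step=50, max_rounds=20, patience=5.
for depth in [None, 10]:
    trees, f1 = grow_with_early_stopping(X, y, depth, start_trees=10, step=10,
                                         max_rounds=5, patience=2)
    if f1 > best_valid_f1:
        best_cfg, best_valid_f1 = {"max_depth": depth, "n_estimators": trees}, f1
print(f"Best cfg = {best_cfg} | Best valid Macro-F1 = {best_valid_f1:.4f}")
```

The patience counter resets for every depth, which is why each depth in the log runs a different number of rounds.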
RandomForest Stratified 5-Fold with warm_start early stopping
depth_candidates=[None, 10, 14, 18, 22, 26, 30]
start_trees=50, step=50, max_rounds=20, patience=5
[01] depth=None | trees= 50 | train F1=0.9999 | valid F1=0.7730 | time=1.53s
[02] depth=None | trees= 100 | train F1=1.0000 | valid F1=0.7809 | time=2.09s
[03] depth=None | trees= 150 | train F1=1.0000 | valid F1=0.7846 | time=2.86s
[04] depth=None | trees= 200 | train F1=1.0000 | valid F1=0.7822 | time=3.45s
[05] depth=None | trees= 250 | train F1=1.0000 | valid F1=0.7828 | time=4.31s
[06] depth=None | trees= 300 | train F1=1.0000 | valid F1=0.7858 | time=5.13s
[07] depth=None | trees= 350 | train F1=1.0000 | valid F1=0.7842 | time=5.96s
[08] depth=None | trees= 400 | train F1=1.0000 | valid F1=0.7825 | time=6.78s
[09] depth=None | trees= 450 | train F1=1.0000 | valid F1=0.7835 | time=8.13s
[10] depth=None | trees= 500 | train F1=1.0000 | valid F1=0.7820 | time=8.74s
[11] depth=None | trees= 550 | train F1=1.0000 | valid F1=0.7819 | time=8.94s
[12] depth= 10 | trees= 50 | train F1=0.9206 | valid F1=0.7418 | time=1.38s
[13] depth= 10 | trees= 100 | train F1=0.9227 | valid F1=0.7456 | time=2.21s
[14] depth= 10 | trees= 150 | train F1=0.9242 | valid F1=0.7473 | time=2.96s
[15] depth= 10 | trees= 200 | train F1=0.9243 | valid F1=0.7486 | time=3.78s
[16] depth= 10 | trees= 250 | train F1=0.9255 | valid F1=0.7460 | time=4.59s
[17] depth= 10 | trees= 300 | train F1=0.9254 | valid F1=0.7463 | time=5.46s
[18] depth= 10 | trees= 350 | train F1=0.9258 | valid F1=0.7455 | time=5.79s
[19] depth= 10 | trees= 400 | train F1=0.9251 | valid F1=0.7440 | time=7.13s
[20] depth= 10 | trees= 450 | train F1=0.9252 | valid F1=0.7460 | time=7.77s
[21] depth= 14 | trees= 50 | train F1=0.9939 | valid F1=0.7667 | time=1.39s
[22] depth= 14 | trees= 100 | train F1=0.9947 | valid F1=0.7758 | time=2.30s
[23] depth= 14 | trees= 150 | train F1=0.9951 | valid F1=0.7781 | time=3.11s
[24] depth= 14 | trees= 200 | train F1=0.9957 | valid F1=0.7781 | time=3.72s
[25] depth= 14 | trees= 250 | train F1=0.9953 | valid F1=0.7782 | time=4.61s
[26] depth= 14 | trees= 300 | train F1=0.9956 | valid F1=0.7788 | time=5.44s
[27] depth= 14 | trees= 350 | train F1=0.9958 | valid F1=0.7770 | time=6.01s
[28] depth= 14 | trees= 400 | train F1=0.9958 | valid F1=0.7771 | time=6.95s
[29] depth= 14 | trees= 450 | train F1=0.9958 | valid F1=0.7788 | time=7.57s
[30] depth= 14 | trees= 500 | train F1=0.9957 | valid F1=0.7777 | time=8.58s
[31] depth= 14 | trees= 550 | train F1=0.9958 | valid F1=0.7778 | time=8.94s
[32] depth= 18 | trees= 50 | train F1=0.9999 | valid F1=0.7775 | time=1.47s
[33] depth= 18 | trees= 100 | train F1=1.0000 | valid F1=0.7827 | time=2.21s
[34] depth= 18 | trees= 150 | train F1=1.0000 | valid F1=0.7803 | time=2.95s
[35] depth= 18 | trees= 200 | train F1=1.0000 | valid F1=0.7842 | time=3.80s
[36] depth= 18 | trees= 250 | train F1=1.0000 | valid F1=0.7867 | time=4.65s
[37] depth= 18 | trees= 300 | train F1=1.0000 | valid F1=0.7866 | time=5.38s
[38] depth= 18 | trees= 350 | train F1=1.0000 | valid F1=0.7826 | time=6.06s
[39] depth= 18 | trees= 400 | train F1=1.0000 | valid F1=0.7833 | time=6.91s
[40] depth= 18 | trees= 450 | train F1=1.0000 | valid F1=0.7828 | time=7.55s
[41] depth= 18 | trees= 500 | train F1=1.0000 | valid F1=0.7809 | time=8.22s
[42] depth= 22 | trees= 50 | train F1=1.0000 | valid F1=0.7746 | time=1.38s
[43] depth= 22 | trees= 100 | train F1=1.0000 | valid F1=0.7817 | time=2.48s
[44] depth= 22 | trees= 150 | train F1=1.0000 | valid F1=0.7837 | time=2.99s
[45] depth= 22 | trees= 200 | train F1=1.0000 | valid F1=0.7849 | time=3.90s
[46] depth= 22 | trees= 250 | train F1=1.0000 | valid F1=0.7856 | time=4.53s
[47] depth= 22 | trees= 300 | train F1=1.0000 | valid F1=0.7844 | time=5.39s
[48] depth= 22 | trees= 350 | train F1=1.0000 | valid F1=0.7834 | time=5.98s
[49] depth= 22 | trees= 400 | train F1=1.0000 | valid F1=0.7820 | time=6.64s
[50] depth= 22 | trees= 450 | train F1=1.0000 | valid F1=0.7840 | time=7.58s
[51] depth= 22 | trees= 500 | train F1=1.0000 | valid F1=0.7809 | time=8.36s
[52] depth= 26 | trees= 50 | train F1=0.9999 | valid F1=0.7739 | time=1.46s
[53] depth= 26 | trees= 100 | train F1=1.0000 | valid F1=0.7805 | time=2.15s
[54] depth= 26 | trees= 150 | train F1=1.0000 | valid F1=0.7849 | time=3.06s
[55] depth= 26 | trees= 200 | train F1=1.0000 | valid F1=0.7828 | time=4.00s
[56] depth= 26 | trees= 250 | train F1=1.0000 | valid F1=0.7836 | time=4.65s
[57] depth= 26 | trees= 300 | train F1=1.0000 | valid F1=0.7861 | time=5.00s
[58] depth= 26 | trees= 350 | train F1=1.0000 | valid F1=0.7836 | time=6.22s
[59] depth= 26 | trees= 400 | train F1=1.0000 | valid F1=0.7833 | time=6.79s
[60] depth= 26 | trees= 450 | train F1=1.0000 | valid F1=0.7842 | time=7.33s
[61] depth= 26 | trees= 500 | train F1=1.0000 | valid F1=0.7825 | time=8.19s
[62] depth= 26 | trees= 550 | train F1=1.0000 | valid F1=0.7819 | time=9.14s
[63] depth= 30 | trees= 50 | train F1=0.9999 | valid F1=0.7730 | time=1.40s
[64] depth= 30 | trees= 100 | train F1=1.0000 | valid F1=0.7812 | time=2.31s
[65] depth= 30 | trees= 150 | train F1=1.0000 | valid F1=0.7844 | time=2.99s
[66] depth= 30 | trees= 200 | train F1=1.0000 | valid F1=0.7825 | time=3.84s
[67] depth= 30 | trees= 250 | train F1=1.0000 | valid F1=0.7828 | time=4.54s
[68] depth= 30 | trees= 300 | train F1=1.0000 | valid F1=0.7858 | time=5.30s
[69] depth= 30 | trees= 350 | train F1=1.0000 | valid F1=0.7843 | time=6.14s
[70] depth= 30 | trees= 400 | train F1=1.0000 | valid F1=0.7827 | time=6.96s
[71] depth= 30 | trees= 450 | train F1=1.0000 | valid F1=0.7840 | time=7.62s
[72] depth= 30 | trees= 500 | train F1=1.0000 | valid F1=0.7817 | time=8.21s
[73] depth= 30 | trees= 550 | train F1=1.0000 | valid F1=0.7819 | time=9.28s
[Train] total time = 372.64s
Best cfg = {'max_depth': 18, 'n_estimators': 250} | Best valid Macro-F1 = 0.7867
Test time (s): 0.1216
Accuracy : 0.8028
Macro-F1 : 0.8001
[Test] Classification Report
              precision    recall  f1-score   support
          -1     0.7686    0.7209    0.7440       387
           0     0.8151    0.8603    0.8371       451
           1     0.8204    0.8182    0.8193       374
    accuracy                         0.8028      1212
   macro avg     0.8014    0.7998    0.8001      1212
weighted avg     0.8019    0.8028    0.8019      1212
0.8!!!!!! We're back!
Research slump overcome?!?