Club/Graduation Research | Galaxy Merger Stage Classification with Multimodal AI

🌌 Image-Exclusive Model - ML: Tree-Based Experiments

by 정람지 2025. 9. 26.

Tree-based

I'm in charge of the tree-based models.

Let's experiment!


The data: the raw version and the baseline version (StandardScaler -> MICE imputation) are uploaded to Hugging Face.

Afterwards we'll try a RobustScaler and per-column scalers, plus a fine-tuned MICE, and then run experiments across the various data versions.

 


 Adopting Stratified K-Fold Cross Validation

 


Basic structure

 

Download the Hugging Face dataset -> declare the model -> train with k-fold -> print the results / save the results and model files
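The pipeline above can be sketched in scikit-learn; a minimal sketch where synthetic data stands in for the Hugging Face download (in the real run, `datasets.load_dataset(...)` and the actual tabular columns would go here):

```python
# Pipeline sketch: (load data) -> declare model -> Stratified K-Fold training
# -> print results / save the model. Synthetic data replaces the HF dataset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold

# Stand-in for datasets.load_dataset("HamCaDor/GalaxyMergerFinalTabularDataStandardMICE")
X, y = make_classification(n_samples=600, n_classes=3, n_informative=8, random_state=0)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
fold_f1 = []
for fold, (tr, va) in enumerate(skf.split(X, y), 1):
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X[tr], y[tr])
    f1 = f1_score(y[va], model.predict(X[va]), average="macro")
    fold_f1.append(f1)
    print(f"[fold {fold}] valid macro-F1 = {f1:.4f}")

print(f"mean valid macro-F1 = {np.mean(fold_f1):.4f}")
# joblib.dump(model, "model.joblib")  # save the final model for later
```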


During training:

  • print the training time (total, and per epoch)
  • plot the loss curves (train, valid)
  • early stopping with patience 5

During testing:

  • print the test time
  • confusion matrix (in blue)
  • report in this format (printed to 4 decimal places):
Test macro-F1      : 0.6980

[Test] Classification Report
              precision    recall  f1-score   support

           0     0.7388    0.8740    0.8007       246
           1     0.6687    0.5515    0.6045       194
           2     0.7102    0.6684    0.6887       187

    accuracy                         0.7129       627
   macro avg     0.7059    0.6980    0.6980       627
weighted avg     0.7086    0.7129    0.7066       627


Now to run the experiments on the server...

Server dive!


Activating the virtual environment

conda activate test-env


Setting up the file structure

Files for real-data testing and the preprocessing variants will be added later.


HamCaDor/GalaxyMergerFinalTabularDataStandardMICE · Datasets at Hugging Face

Subset: default (6.06k rows) · Splits: train (4.85k rows), test (1.21k rows)

huggingface.co

For now, we all standardize on the basic dataset.


 

🌳 Decision Tree

  • Implemented by sweeping max_depth candidates and stopping once the 5-Fold validation score fails to improve 5 times in a row

There's no concept of epochs here!

DecisionTree Stratified 5-Fold (early stopping patience=5)
depth_candidates=[2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, None]

[01] depth=   2 | train F1=0.5359 | valid F1=0.5282 | time=0.11s
[02] depth=   3 | train F1=0.5772 | valid F1=0.5629 | time=0.14s
[03] depth=   4 | train F1=0.6154 | valid F1=0.5904 | time=0.18s
[04] depth=   5 | train F1=0.6514 | valid F1=0.6155 | time=0.21s
[05] depth=   6 | train F1=0.6867 | valid F1=0.6288 | time=0.25s
[06] depth=   7 | train F1=0.7213 | valid F1=0.6405 | time=0.27s
[07] depth=   8 | train F1=0.7678 | valid F1=0.6588 | time=0.30s
[08] depth=   9 | train F1=0.8125 | valid F1=0.6727 | time=0.31s
[09] depth=  10 | train F1=0.8573 | valid F1=0.6949 | time=0.33s
[10] depth=  11 | train F1=0.8971 | valid F1=0.6919 | time=0.35s
[11] depth=  12 | train F1=0.9289 | valid F1=0.6957 | time=0.38s
[12] depth=  13 | train F1=0.9539 | valid F1=0.7045 | time=0.39s
[13] depth=  14 | train F1=0.9712 | valid F1=0.7052 | time=0.39s
[14] depth=  15 | train F1=0.9830 | valid F1=0.7072 | time=0.40s
[15] depth=  16 | train F1=0.9899 | valid F1=0.6997 | time=0.38s
[16] depth=  17 | train F1=0.9948 | valid F1=0.7012 | time=0.39s
[17] depth=  18 | train F1=0.9977 | valid F1=0.7027 | time=0.38s
[18] depth=  19 | train F1=0.9987 | valid F1=0.7063 | time=0.39s
[19] depth=  20 | train F1=0.9996 | valid F1=0.7081 | time=0.39s
[20] depth=  21 | train F1=0.9999 | valid F1=0.7069 | time=0.38s
[21] depth=  22 | train F1=1.0000 | valid F1=0.7075 | time=0.39s
[22] depth=  23 | train F1=1.0000 | valid F1=0.7075 | time=0.38s
[23] depth=  24 | train F1=1.0000 | valid F1=0.7075 | time=0.39s
[24] depth=  25 | train F1=1.0000 | valid F1=0.7075 | time=0.38s
Early stopping at epoch 24

[Train] total time = 7.87s
Best depth = 20 | Best valid Macro-F1 = 0.7081

Test time (s): 0.0004
Accuracy     : 0.7005
Macro-F1     : 0.6966

[Test] Classification Report
              precision    recall  f1-score   support

          -1     0.6173    0.6253    0.6213       387
           0     0.7959    0.7694    0.7824       451
           1     0.6771    0.6952    0.6860       374

    accuracy                         0.7005      1212
   macro avg     0.6968    0.6966    0.6966      1212
weighted avg     0.7022    0.7005    0.7012      1212
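The depth sweep in the log above can be sketched as follows: for each max_depth candidate, score with Stratified 5-Fold CV and stop once the mean valid macro-F1 has not improved for 5 consecutive candidates (this is the "early stopping" for a model without epochs). Synthetic data stands in for the real tabular features:

```python
# Sketch of the DecisionTree max_depth sweep with patience-based stopping.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_classes=3, n_informative=8, random_state=0)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

depth_candidates = list(range(2, 31)) + [None]
best_f1, best_depth, patience, bad = -np.inf, None, 5, 0
for i, depth in enumerate(depth_candidates, 1):
    clf = DecisionTreeClassifier(max_depth=depth, random_state=42)
    f1 = cross_val_score(clf, X, y, cv=skf, scoring="f1_macro").mean()
    print(f"[{i:02d}] depth={str(depth):>4} | valid F1={f1:.4f}")
    if f1 > best_f1:
        best_f1, best_depth, bad = f1, depth, 0
    else:
        bad += 1
        if bad >= patience:           # 5 candidates in a row without improvement
            print(f"Early stopping at candidate {i}")
            break

print(f"Best depth = {best_depth} | Best valid Macro-F1 = {best_f1:.4f}")
```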

 


🌳 Random Forest

 

RandomForest Stratified 5-Fold with warm_start early stopping
depth_candidates=[None, 10, 14, 18, 22, 26, 30]
start_trees=50, step=50, max_rounds=20, patience=5

[01] depth=None | trees=  50 | train F1=0.9999 | valid F1=0.7730 | time=1.53s
[02] depth=None | trees= 100 | train F1=1.0000 | valid F1=0.7809 | time=2.09s
[03] depth=None | trees= 150 | train F1=1.0000 | valid F1=0.7846 | time=2.86s
[04] depth=None | trees= 200 | train F1=1.0000 | valid F1=0.7822 | time=3.45s
[05] depth=None | trees= 250 | train F1=1.0000 | valid F1=0.7828 | time=4.31s
[06] depth=None | trees= 300 | train F1=1.0000 | valid F1=0.7858 | time=5.13s
[07] depth=None | trees= 350 | train F1=1.0000 | valid F1=0.7842 | time=5.96s
[08] depth=None | trees= 400 | train F1=1.0000 | valid F1=0.7825 | time=6.78s
[09] depth=None | trees= 450 | train F1=1.0000 | valid F1=0.7835 | time=8.13s
[10] depth=None | trees= 500 | train F1=1.0000 | valid F1=0.7820 | time=8.74s
[11] depth=None | trees= 550 | train F1=1.0000 | valid F1=0.7819 | time=8.94s
[12] depth=  10 | trees=  50 | train F1=0.9206 | valid F1=0.7418 | time=1.38s
[13] depth=  10 | trees= 100 | train F1=0.9227 | valid F1=0.7456 | time=2.21s
[14] depth=  10 | trees= 150 | train F1=0.9242 | valid F1=0.7473 | time=2.96s
[15] depth=  10 | trees= 200 | train F1=0.9243 | valid F1=0.7486 | time=3.78s
[16] depth=  10 | trees= 250 | train F1=0.9255 | valid F1=0.7460 | time=4.59s
[17] depth=  10 | trees= 300 | train F1=0.9254 | valid F1=0.7463 | time=5.46s
[18] depth=  10 | trees= 350 | train F1=0.9258 | valid F1=0.7455 | time=5.79s
[19] depth=  10 | trees= 400 | train F1=0.9251 | valid F1=0.7440 | time=7.13s
[20] depth=  10 | trees= 450 | train F1=0.9252 | valid F1=0.7460 | time=7.77s
[21] depth=  14 | trees=  50 | train F1=0.9939 | valid F1=0.7667 | time=1.39s
[22] depth=  14 | trees= 100 | train F1=0.9947 | valid F1=0.7758 | time=2.30s
[23] depth=  14 | trees= 150 | train F1=0.9951 | valid F1=0.7781 | time=3.11s
[24] depth=  14 | trees= 200 | train F1=0.9957 | valid F1=0.7781 | time=3.72s
[25] depth=  14 | trees= 250 | train F1=0.9953 | valid F1=0.7782 | time=4.61s
[26] depth=  14 | trees= 300 | train F1=0.9956 | valid F1=0.7788 | time=5.44s
[27] depth=  14 | trees= 350 | train F1=0.9958 | valid F1=0.7770 | time=6.01s
[28] depth=  14 | trees= 400 | train F1=0.9958 | valid F1=0.7771 | time=6.95s
[29] depth=  14 | trees= 450 | train F1=0.9958 | valid F1=0.7788 | time=7.57s
[30] depth=  14 | trees= 500 | train F1=0.9957 | valid F1=0.7777 | time=8.58s
[31] depth=  14 | trees= 550 | train F1=0.9958 | valid F1=0.7778 | time=8.94s
[32] depth=  18 | trees=  50 | train F1=0.9999 | valid F1=0.7775 | time=1.47s
[33] depth=  18 | trees= 100 | train F1=1.0000 | valid F1=0.7827 | time=2.21s
[34] depth=  18 | trees= 150 | train F1=1.0000 | valid F1=0.7803 | time=2.95s
[35] depth=  18 | trees= 200 | train F1=1.0000 | valid F1=0.7842 | time=3.80s
[36] depth=  18 | trees= 250 | train F1=1.0000 | valid F1=0.7867 | time=4.65s
[37] depth=  18 | trees= 300 | train F1=1.0000 | valid F1=0.7866 | time=5.38s
[38] depth=  18 | trees= 350 | train F1=1.0000 | valid F1=0.7826 | time=6.06s
[39] depth=  18 | trees= 400 | train F1=1.0000 | valid F1=0.7833 | time=6.91s
[40] depth=  18 | trees= 450 | train F1=1.0000 | valid F1=0.7828 | time=7.55s
[41] depth=  18 | trees= 500 | train F1=1.0000 | valid F1=0.7809 | time=8.22s
[42] depth=  22 | trees=  50 | train F1=1.0000 | valid F1=0.7746 | time=1.38s
[43] depth=  22 | trees= 100 | train F1=1.0000 | valid F1=0.7817 | time=2.48s
[44] depth=  22 | trees= 150 | train F1=1.0000 | valid F1=0.7837 | time=2.99s
[45] depth=  22 | trees= 200 | train F1=1.0000 | valid F1=0.7849 | time=3.90s
[46] depth=  22 | trees= 250 | train F1=1.0000 | valid F1=0.7856 | time=4.53s
[47] depth=  22 | trees= 300 | train F1=1.0000 | valid F1=0.7844 | time=5.39s
[48] depth=  22 | trees= 350 | train F1=1.0000 | valid F1=0.7834 | time=5.98s
[49] depth=  22 | trees= 400 | train F1=1.0000 | valid F1=0.7820 | time=6.64s
[50] depth=  22 | trees= 450 | train F1=1.0000 | valid F1=0.7840 | time=7.58s
[51] depth=  22 | trees= 500 | train F1=1.0000 | valid F1=0.7809 | time=8.36s
[52] depth=  26 | trees=  50 | train F1=0.9999 | valid F1=0.7739 | time=1.46s
[53] depth=  26 | trees= 100 | train F1=1.0000 | valid F1=0.7805 | time=2.15s
[54] depth=  26 | trees= 150 | train F1=1.0000 | valid F1=0.7849 | time=3.06s
[55] depth=  26 | trees= 200 | train F1=1.0000 | valid F1=0.7828 | time=4.00s
[56] depth=  26 | trees= 250 | train F1=1.0000 | valid F1=0.7836 | time=4.65s
[57] depth=  26 | trees= 300 | train F1=1.0000 | valid F1=0.7861 | time=5.00s
[58] depth=  26 | trees= 350 | train F1=1.0000 | valid F1=0.7836 | time=6.22s
[59] depth=  26 | trees= 400 | train F1=1.0000 | valid F1=0.7833 | time=6.79s
[60] depth=  26 | trees= 450 | train F1=1.0000 | valid F1=0.7842 | time=7.33s
[61] depth=  26 | trees= 500 | train F1=1.0000 | valid F1=0.7825 | time=8.19s
[62] depth=  26 | trees= 550 | train F1=1.0000 | valid F1=0.7819 | time=9.14s
[63] depth=  30 | trees=  50 | train F1=0.9999 | valid F1=0.7730 | time=1.40s
[64] depth=  30 | trees= 100 | train F1=1.0000 | valid F1=0.7812 | time=2.31s
[65] depth=  30 | trees= 150 | train F1=1.0000 | valid F1=0.7844 | time=2.99s
[66] depth=  30 | trees= 200 | train F1=1.0000 | valid F1=0.7825 | time=3.84s
[67] depth=  30 | trees= 250 | train F1=1.0000 | valid F1=0.7828 | time=4.54s
[68] depth=  30 | trees= 300 | train F1=1.0000 | valid F1=0.7858 | time=5.30s
[69] depth=  30 | trees= 350 | train F1=1.0000 | valid F1=0.7843 | time=6.14s
[70] depth=  30 | trees= 400 | train F1=1.0000 | valid F1=0.7827 | time=6.96s
[71] depth=  30 | trees= 450 | train F1=1.0000 | valid F1=0.7840 | time=7.62s
[72] depth=  30 | trees= 500 | train F1=1.0000 | valid F1=0.7817 | time=8.21s
[73] depth=  30 | trees= 550 | train F1=1.0000 | valid F1=0.7819 | time=9.28s

[Train] total time = 372.64s
Best cfg = {'max_depth': 18, 'n_estimators': 250} | Best valid Macro-F1 = 0.7867

Test time (s): 0.1216
Accuracy     : 0.8028
Macro-F1     : 0.8001

[Test] Classification Report
              precision    recall  f1-score   support

          -1     0.7686    0.7209    0.7440       387
           0     0.8151    0.8603    0.8371       451
           1     0.8204    0.8182    0.8193       374

    accuracy                         0.8028      1212
   macro avg     0.8014    0.7998    0.8001      1212
weighted avg     0.8019    0.8028    0.8019      1212
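The warm_start search above can be sketched as follows: for a fixed max_depth, grow the forest in steps of 50 trees (warm_start reuses the already-fitted trees) and stop once the valid macro-F1 stalls for 5 rounds. For brevity this sketch uses a single stratified split and one depth instead of the full 5-Fold CV over all depth candidates:

```python
# Sketch of the RandomForest warm_start early stopping (one depth, one split).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=800, n_classes=3, n_informative=8, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, stratify=y, random_state=42)

start_trees, step, max_rounds, patience = 50, 50, 20, 5
rf = RandomForestClassifier(n_estimators=start_trees, max_depth=18,
                            warm_start=True, random_state=42, n_jobs=-1)
best_f1, bad = -np.inf, 0
for _ in range(max_rounds):
    rf.fit(X_tr, y_tr)               # warm_start: only the newly added trees are fit
    f1 = f1_score(y_va, rf.predict(X_va), average="macro")
    print(f"trees={rf.n_estimators:4d} | valid F1={f1:.4f}")
    if f1 > best_f1:
        best_f1, bad = f1, 0
    else:
        bad += 1
        if bad >= patience:          # 5 rounds without improvement -> stop growing
            break
    rf.n_estimators += step          # grow the forest for the next round
```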

Hehe, Professor Woongbae keeps saying confidence-boosting things, which is great... "That's a veeery good question!"

0.8!!!!!! We're back!

Research slump: overcome?!?