[UST] nrf2 classification

2024 Winter UST Internship

[UST] nrf2 classification - 3

환성 2024. 1. 29. 14:56

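As a final step, I tried nrf2 classification with a neural network, hoping to get higher accuracy than the approaches from the previous posts.

The data preprocessing and analysis are the same as before; only the classification code differs.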

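import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score, auc, classification_report, confusion_matrix, roc_curve
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neural_network import MLPClassifier

# train_data and test_data are assumed to come from the preprocessing in the earlier posts

# Convert each fingerprint in the given column into a row of ints
def convert_fingerprint(data, fingerprint_type):
    return np.array([list(map(int, list(fp))) for fp in data[fingerprint_type]])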

# Fingerprint columns to evaluate
fingerprint_types = ['fingerprint_atompair', 'fingerprint_avalon', 'fingerprint_morgan', 'fingerprint_topological']

# Labels
y_train = train_data['label'].values
y_test = test_data['label'].values

# Search space for MLPClassifier
params = {
    'hidden_layer_sizes': [(100,), (100, 100), (100, 50)],
    'activation': ['relu', 'tanh'],  # the commonly used relu and tanh
    'solver': ['adam', 'sgd'],
    'alpha': [0.0001, 0.001, 0.01],  # default: 0.0001
    'learning_rate_init': [0.001, 0.01],  # default: 0.001
    'max_iter': [1000]
}

# Hyperparameter search and model evaluation, one fingerprint type at a time
for fp_type in fingerprint_types:
    X_train = convert_fingerprint(train_data, fp_type)
    X_test = convert_fingerprint(test_data, fp_type)

    # RandomizedSearchCV: sample 10 parameter combinations, 3-fold CV each
    mlp_classifier = MLPClassifier(random_state=42)
    random_search_cv = RandomizedSearchCV(mlp_classifier, param_distributions=params, n_iter=10, cv=3, random_state=42)
    random_search_cv.fit(X_train, y_train)

    # Best model found by the search (already refit on the full training set)
    best_mlp_classifier = random_search_cv.best_estimator_
    y_pred = best_mlp_classifier.predict(X_test)

    # Performance metrics
    accuracy = accuracy_score(y_test, y_pred)
    report = classification_report(y_test, y_pred)
    fpr, tpr, thresholds = roc_curve(y_test, best_mlp_classifier.predict_proba(X_test)[:, 1])
    roc_auc = auc(fpr, tpr)

    # TN, FP, FN, TP (classification_report has no specificity, so compute it by hand)
    conf_matrix = confusion_matrix(y_test, y_pred)
    tn, fp, fn, tp = conf_matrix.ravel()

    # Specificity = TN / (TN + FP)
    specificity = tn / (tn + fp)

    # Print results
    print(f"\n--- Fingerprint Type: {fp_type} ---")
    print("Best Parameters:", random_search_cv.best_params_)
    print("Accuracy:", accuracy)
    print("Specificity:", specificity)
    print("Classification Report:")
    print(report)

    # ROC Curve 
    plt.figure()
    plt.plot(fpr, tpr, color='darkorange', lw=2, label=f'ROC curve (area = {roc_auc:.2f})')
    plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.05])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title(f'ROC Curve - {fp_type}')
    plt.legend(loc="lower right")
    plt.show()
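To double-check the fingerprint conversion and the hand-computed specificity, here is a small dependency-free sketch. The bit strings and labels are made-up toy values, not data from this project, and `bits_to_row` and `confusion_counts` are hypothetical helper names used only for illustration. It also assumes that each fingerprint is stored as a string of `0`/`1` characters.

```python
def bits_to_row(bits):
    """Turn a bit string such as '0101' into a list of ints [0, 1, 0, 1]."""
    return [int(ch) for ch in bits]


def confusion_counts(y_true, y_pred):
    """Count TN, FP, FN, TP for binary labels (0 = negative, 1 = positive)."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            fn += 1
    return tn, fp, fn, tp


# Toy fingerprints (assumed format: one character per bit)
toy_fingerprints = ["0101", "1100", "0011"]
X_toy = [bits_to_row(bits) for bits in toy_fingerprints]

# Toy labels and predictions
y_true = [0, 0, 0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 0, 0, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_counts(y_true, y_pred)
specificity = tn / (tn + fp)  # share of true negatives among all actual negatives
sensitivity = tp / (tp + fn)  # recall of the positive class

print(X_toy)
print("TN, FP, FN, TP:", tn, fp, fn, tp)
print("Specificity:", specificity)
print("Sensitivity:", sensitivity)
```

The order `tn, fp, fn, tp` matches what `confusion_matrix(...).ravel()` returns for binary labels 0 and 1, which is why the unpacking in the main loop works.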

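Accuracy did not improve as much as I expected; the results were about the same as before, or even slightly worse. I have not studied neural networks much, so I think my hyperparameter tuning was not thorough enough. On top of that, the dataset was fairly small and its class distribution was uneven, which I suspect also contributed.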
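One thing that might be worth trying with an uneven label distribution is to have the search optimize a metric that is less sensitive to class imbalance than plain accuracy. The sketch below is not what I ran for this post: it uses a synthetic stand-in for the fingerprint matrix (`X_demo`, `y_demo`), a reduced search space (`demo_params`), and `scoring="balanced_accuracy"`, all of which are assumptions for illustration. Note that passing an integer `cv` with a classifier already gives stratified folds in scikit-learn; the explicit `StratifiedKFold` here only adds shuffling.

```python
import numpy as np
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for a fingerprint matrix: 150 samples x 32 bits (not real data)
rng = np.random.default_rng(42)
X_demo = rng.integers(0, 2, size=(150, 32))
# Imbalanced toy labels: positive only when the first two bits are both 1 (about 25%)
y_demo = (X_demo[:, 0] & X_demo[:, 1]).astype(int)

# Reduced search space so the sketch runs quickly
demo_params = {
    "hidden_layer_sizes": [(50,), (100,)],
    "alpha": [0.0001, 0.001, 0.01],
    "learning_rate_init": [0.001, 0.01],
}

# Stratified, shuffled folds keep the class ratio similar in every split
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)

search = RandomizedSearchCV(
    MLPClassifier(max_iter=500, random_state=42),
    param_distributions=demo_params,
    n_iter=4,
    scoring="balanced_accuracy",  # mean of per-class recall instead of raw accuracy
    cv=cv,
    random_state=42,
)
search.fit(X_demo, y_demo)

print("Best parameters:", search.best_params_)
print("Best CV balanced accuracy:", search.best_score_)
```

Whether this would actually help on the nrf2 data is an open question; it only changes which candidate the search prefers, not the amount of data available.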

License: Attribution, No Derivatives
