Conclusion

  • This is a Python-based notebook

Summary of the whole project

import pandas as pd

# Load the cross-validated results and label the metric rows
results = pd.read_csv('data/model_results.csv', index_col=0)
results.rename(index={0: 'fit time', 1: 'score time',
                      2: 'test accuracy', 3: 'train accuracy',
                      4: 'test ROC AUC', 5: 'train ROC AUC'})
|                | dummy             | LogisticReg        | LogisticReg_opt     | RandomForest      | XGBoost           | LGBM              | Cat_Boost          | averaging          |
|----------------|-------------------|--------------------|---------------------|-------------------|-------------------|-------------------|--------------------|--------------------|
| fit time       | 0.138 (+/- 0.015) | 13.436 (+/- 0.415) | 729.602 (+/- 1.470) | 3.025 (+/- 0.158) | 6.629 (+/- 0.337) | 2.496 (+/- 0.136) | 43.591 (+/- 0.065) | 14.263 (+/- 0.906) |
| score time     | 0.054 (+/- 0.003) | 0.138 (+/- 0.048)  | 0.126 (+/- 0.062)   | 0.096 (+/- 0.003) | 0.072 (+/- 0.003) | 0.061 (+/- 0.003) | 0.675 (+/- 0.045)  | 0.278 (+/- 0.010)  |
| test accuracy  | 0.520 (+/- 0.001) | 0.887 (+/- 0.006)  | 0.888 (+/- 0.008)   | 0.900 (+/- 0.011) | 0.944 (+/- 0.007) | 0.950 (+/- 0.003) | 0.954 (+/- 0.005)  | 0.951 (+/- 0.004)  |
| train accuracy | 0.520 (+/- 0.000) | 0.972 (+/- 0.002)  | 0.974 (+/- 0.003)   | 0.997 (+/- 0.001) | 0.997 (+/- 0.001) | 0.997 (+/- 0.001) | 0.995 (+/- 0.001)  | 0.997 (+/- 0.001)  |
| test ROC AUC   | 0.500 (+/- 0.000) | 0.955 (+/- 0.005)  | 0.954 (+/- 0.005)   | 0.960 (+/- 0.002) | 0.986 (+/- 0.001) | 0.991 (+/- 0.002) | 0.991 (+/- 0.002)  | 0.986 (+/- 0.001)  |
| train ROC AUC  | 0.500 (+/- 0.000) | 0.997 (+/- 0.000)  | 0.998 (+/- 0.000)   | 1.000 (+/- 0.000) | 1.000 (+/- 0.000) | 1.000 (+/- 0.000) | 1.000 (+/- 0.000)  | 1.000 (+/- 0.000)  |

Among the models, LGBMClassifier offers the best trade-off. Although CatBoostClassifier has the highest test accuracy (0.954) and ties LGBM for the best test ROC AUC (0.991), its fit time is slow. That is a concern here because the model is likely to be refit on each user's latest song listening history whenever the user wants to update their playlist. LGBMClassifier's test accuracy is 0.950, only 0.004 lower than CatBoostClassifier's, while its fit time is roughly 17 times shorter (2.5 s vs. 43.6 s).
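The trade-off above can be quantified directly from the results table. As a minimal sketch (the two rows below are copied by hand from the summary table rather than reloaded from `data/model_results.csv`), the speed-up and accuracy gap between the two leading models are:

```python
import pandas as pd

# Mean fit time (s) and mean test accuracy for the two leading models,
# copied from the cross-validation summary table above.
metrics = pd.DataFrame(
    {'LGBM': [2.496, 0.950], 'Cat_Boost': [43.591, 0.954]},
    index=['fit time', 'test accuracy'],
)

# How many times slower CatBoost is to fit, and how much accuracy it buys
speedup = metrics.loc['fit time', 'Cat_Boost'] / metrics.loc['fit time', 'LGBM']
acc_gap = metrics.loc['test accuracy', 'Cat_Boost'] - metrics.loc['test accuracy', 'LGBM']

print(f"CatBoost fit is ~{speedup:.1f}x slower for only {acc_gap:.3f} more test accuracy")
```

This makes the selection criterion explicit: a ~17x increase in fit time for a 0.004 accuracy gain is a poor deal when models must be refit frequently per user.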

Therefore, LGBMClassifier will be used for the Spotify user behavior prediction.