Air Quality Prediction Using the CatBoost Method (Prediksi Kualitas Udara Menggunakan Metode CatBoost)

Authors

  • Mohamad Arif Abdul Syukur, UIN Maulana Malik Ibrahim Malang
  • Suhartono Suhartono, UIN Maulana Malik Ibrahim Malang
  • Totok Chamidy, UIN Maulana Malik Ibrahim Malang

DOI:

https://doi.org/10.14421/jiska.2025.10.2.249-258

Keywords:

Prediction, Air Quality, CatBoost, GridSearchCV, Jakarta

Abstract

Air is essential to life, but industrial activity, forest burning, cigarette smoke, and transportation increase air pollution. AirVisual AQI data for 2024 ranks Jakarta 11th in the world for pollution level, with an index of 127, a level that is unhealthy for sensitive groups and carries a risk of serious illnesses such as skin and respiratory diseases. This study uses the CatBoost method to predict the air quality index from Jakarta SPKU data obtained from Kaggle. The data is pre-processed and used to build four models that differ in the ratio of training data to testing data. Each model is tuned over the parameters iterations, depth, learning_rate, and l2_leaf_reg, with GridSearchCV used to find the best combination. The model trained on 90% of the data and tested on the remaining 10% achieves the best accuracy, 97%, which the authors attribute to its larger share of training data. These results indicate that CatBoost can predict air quality accurately, which can support efforts to reduce the impact of pollution and improve public health.
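The tuning procedure summarized in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the grid values, the three test fractions other than 10%, the 3-fold cross-validation, the random seed, and the helper names count_candidates and fit_best_model are all assumptions, since the abstract names only the four parameters and the 90/10 split. It also assumes the task is framed as classification of air quality categories (the abstract reports accuracy) and that the catboost and scikit-learn packages are installed.

```python
from itertools import product

# Hypothetical search space: the abstract names these four parameters
# but does not state which values were tried.
PARAM_GRID = {
    "iterations": [100, 300],
    "depth": [4, 6],
    "learning_rate": [0.03, 0.1],
    "l2_leaf_reg": [1, 3],
}

# Hypothetical test fractions for the four models; only the 90/10 split
# (test_size=0.1) is stated in the abstract.
TEST_SIZES = [0.1, 0.2, 0.3, 0.4]


def count_candidates(grid):
    """Return how many parameter combinations a full grid search covers."""
    return len(list(product(*grid.values())))


def fit_best_model(X, y, test_size, grid=PARAM_GRID, seed=42):
    """Grid-search a CatBoost classifier on one train/test split.

    Returns the best parameter combination and the accuracy on the
    held-out test portion. Third-party imports are done lazily so the
    helpers above can be used without catboost installed.
    """
    from catboost import CatBoostClassifier
    from sklearn.model_selection import GridSearchCV, train_test_split

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    search = GridSearchCV(
        estimator=CatBoostClassifier(verbose=0, random_seed=seed),
        param_grid=grid,
        scoring="accuracy",
        cv=3,  # assumed fold count
    )
    search.fit(X_train, y_train)
    # With scoring="accuracy", score() reports accuracy of the refit
    # best estimator on the held-out data.
    return search.best_params_, search.score(X_test, y_test)
```

With the placeholder grid above, count_candidates(PARAM_GRID) is 16, so each of the four splits would train 16 candidates per cross-validation fold. Calling fit_best_model(X, y, test_size) once per entry in TEST_SIZES would give one tuned model per split, which could then be compared on test accuracy.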

Published

2025-05-31

How to Cite

Syukur, M. A. A., Suhartono, S., & Chamidy, T. (2025). Prediksi Kualitas Udara Menggunakan Metode CatBoost. JISKA (Jurnal Informatika Sunan Kalijaga), 10(2), 249–258. https://doi.org/10.14421/jiska.2025.10.2.249-258
