International Association of Educators   |  ISSN: 1308-951X

Original article | International Journal of Research in Teacher Education 2023, Vol. 14(3) 25-40

An Analysis of MARS and Logistic Regression Methods in Educational Data Mining in Light of Some Performance Indicators

Hikmet Şevgin, Özlem Bezek Güre & Murat Kayri

pp. 25 - 40   |  DOI: https://doi.org/10.29329/ijrte.2023.598.03   |  Manu. Number: MANU-2308-20-0001

Published online: September 27, 2023


Abstract

This study compares two methods from the family of nonlinear regression methods, Multivariate Adaptive Regression Splines (MARS) and Logistic Regression (LR). The comparison uses ABIDE (Monitoring and Evaluating Academic Skills) data and covers correct classification rate, Type I error, Type II error and area under the ROC curve (AUC) across different sample sizes. For this purpose, the Turkish achievement scores and various demographic variables of 5000 randomly selected eighth-grade students who participated in ABIDE 2016 were used. The analyses show that, in terms of correct classification rate, LR is more accurate at small sample sizes and MARS is more accurate at large sample sizes. With respect to AUC, LR performs better at small sample sizes and MARS performs better at large sample sizes. For the Type I error rate, LR has a lower error rate at small sample sizes and a higher error rate at large sample sizes, whereas MARS has a higher error rate at small sample sizes and a lower error rate at large sample sizes. For the Type II error rate, MARS has a lower error rate than LR at every sample size except 1500. MARS yields better results than LR for both error types. To obtain robust and accurate results in educational studies, LR is recommended for small sample sizes and MARS for large sample sizes.
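
The four indicators named above can all be computed from a model's predicted probabilities on a binary outcome. The sketch below is a hedged illustration, not the authors' code. It assumes 0/1 labels with 1 as the positive class, a 0.5 cut-off, Type I error = FP / (FP + TN) and Type II error = FN / (FN + TP); the article may dichotomize scores or define the error types differently. AUC is computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case (ties count half), which equals the area under the empirical ROC curve. All function names are hypothetical.

```python
"""Hypothetical helpers for the four indicators named in the abstract.

Assumptions (not taken from the article): labels are 0/1 with 1 as the
positive class, a 0.5 probability cut-off, Type I error = FP / (FP + TN)
and Type II error = FN / (FN + TP).
"""
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) for 0/1 labels and 0/1 predictions."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t not in (0, 1) or p not in (0, 1):
            raise ValueError("labels and predictions must be 0 or 1")
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1  # negative case wrongly flagged positive (Type I)
        elif t == 0 and p == 0:
            tn += 1
        else:
            fn += 1  # positive case missed (Type II)
    return tp, fp, tn, fn


def auc_rank(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def performance_indicators(y_true: Sequence[int], scores: Sequence[float],
                           threshold: float = 0.5) -> Dict[str, float]:
    """Correct classification rate, Type I/II error rates and AUC."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "correct_classification_rate": (tp + tn) / (tp + fp + tn + fn),
        "type_i_error": fp / (fp + tn),
        "type_ii_error": fn / (fn + tp),
        "auc": auc_rank(y_true, scores),
    }


if __name__ == "__main__":
    # Toy labels and predicted probabilities, invented for illustration only.
    y = [1, 1, 0, 0, 1, 0]
    p = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
    print(performance_indicators(y, p))
```

Applying the same function to LR and MARS predictions from identical train/test splits at each sample size would give directly comparable figures. The pairwise AUC loop costs O(n_pos * n_neg), which is acceptable for a few thousand students; a sort-based rank formula scales better for larger samples.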

Keywords: ABIDE, Logistic Regression, MARS, Correct Classification Rate, Area Under Curve


How to Cite this Article

APA 6th edition
Şevgin, H., Güre, Ö. B., & Kayri, M. (2023). An analysis of MARS and logistic regression methods in educational data mining in light of some performance indicators. International Journal of Research in Teacher Education, 14(3), 25-40. doi:10.29329/ijrte.2023.598.03

Harvard
Şevgin, H., Güre, Ö. B. and Kayri, M. (2023) 'An Analysis of MARS and Logistic Regression Methods in Educational Data Mining in Light of Some Performance Indicators', International Journal of Research in Teacher Education, 14(3), pp. 25-40.

Chicago 16th edition
Şevgin, Hikmet, Özlem Bezek Güre, and Murat Kayri. 2023. "An Analysis of MARS and Logistic Regression Methods in Educational Data Mining in Light of Some Performance Indicators." International Journal of Research in Teacher Education 14 (3): 25-40. doi:10.29329/ijrte.2023.598.03.
