Image-Based Malware Multiclass Classification Using Vision Transformer Architecture
DOI: https://doi.org/10.14421/csecurity.2025.8.1.5107
Abstract
The increasing sophistication of malware has become a serious threat to global cybersecurity, causing significant financial losses for individuals and organizations. Traditional detection methods, such as signature-based detection and dynamic analysis, are limited in their ability to identify new or modified malware variants. Image-based malware analysis addresses this by converting malware binary files into image representations, so that digital image processing and machine learning can be used for safer and more efficient identification. This study employs the Vision Transformer (ViT) architecture for multiclass image-based malware classification, offering a more effective approach than traditional CNNs such as EfficientNet and VGG16. ViT has attracted attention for its flexibility in modeling relationships between objects within an image and detecting important patterns. Because it can learn long-range relationships in the data, ViT can detect subtle differences between malware types and subtypes. The dataset used is Malimg, which consists of malware binaries converted into image format. The results show that the Vision Transformer achieves a highest training accuracy of 99.96%, a highest validation accuracy of 98.05%, and a testing accuracy of 97.49%, an improvement in accuracy over CNNs. These results indicate a promising direction for future research and applications in cybersecurity, and they underscore the value of advanced machine learning techniques for improving malware detection.
Keywords: Vision Transformers, Malware Classification, Deep Learning
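The abstract describes converting malware binaries into images and classifying them with a Vision Transformer, which operates on fixed-size image patches. The sketch below illustrates those two preprocessing ideas in plain Python. It is not the authors' pipeline: the function names `bytes_to_grayscale` and `split_into_patches`, the fixed image width, and the 16-pixel patch size are all assumptions made for illustration, since the abstract does not state these details.

```python
import math


def bytes_to_grayscale(data: bytes, width: int = 256) -> list[list[int]]:
    """Map each byte (0-255) to one grayscale pixel, `width` pixels per row.

    The fixed width is an illustrative assumption. The last row is
    zero-padded so that every row has the same length.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    height = math.ceil(len(data) / width)
    padded = data + bytes(height * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


def split_into_patches(image: list[list[int]], patch: int = 16) -> list[list[int]]:
    """Cut an image into non-overlapping patch x patch tiles.

    Each tile is flattened row by row, which is the form a ViT patch
    embedding layer would project into a token. Incomplete tiles at the
    right and bottom edges are dropped in this sketch.
    """
    rows = len(image) // patch
    cols = (len(image[0]) // patch) if image else 0
    patches = []
    for pr in range(rows):
        for pc in range(cols):
            tile = [
                image[pr * patch + i][pc * patch + j]
                for i in range(patch)
                for j in range(patch)
            ]
            patches.append(tile)
    return patches


# Stand-in for the contents of a binary file (hypothetical data).
sample = bytes(range(256)) * 8
image = bytes_to_grayscale(sample, width=64)
patches = split_into_patches(image, patch=16)
```

In a real setting the flattened patches would be normalized and passed to a learned linear projection before entering the transformer encoder; that part is omitted here because it depends on the deep learning framework and model configuration used.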
License
Copyright (c) 2025 Diash Firdaus, Idi Sumardi, Chalifa Chazar, Muhamad Zufar Dafy

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.