dc.description.abstract |
Malicious software (malware) poses a significant threat to the security and integrity of computer systems. Traditional malware detection approaches often struggle with small-scale and imbalanced datasets, which reduces detection accuracy and reliability. In this research, we proposed a novel approach that addresses these issues by training a Random Forest model on a balanced synthetic dataset.
The primary objective of this study was to investigate the impact of employing a Random Forest technique on the detection of malware. To achieve this, we first created a balanced synthetic dataset from the CICMalDroid2020 dataset using Generative Adversarial Networks (GANs). This synthetic dataset was intended to address the limitations of the small-scale and imbalanced datasets commonly encountered in malware detection. We then trained the Random Forest model on this balanced synthetic dataset. The model's performance was evaluated using several metrics: detection accuracy, precision, recall, balanced accuracy, geometric metrics, and F1-score. Extensive analyses were performed to assess how accurately and robustly the proposed approach detects malware samples compared with traditional detection methods. The results offer insight into the performance improvements achieved by a Random Forest model trained on a balanced synthetic dataset, thereby contributing to the advancement of malware detection techniques. The test results showed that the Random Forest model can detect malware attacks with an accuracy of 91%, recall of 100%, precision of 85%, F1-score of 92%, balanced accuracy of 95%, and geometric metrics of 84%. From these results, we inferred that Random Forest is capable of detecting malware attacks. |
en_US |