dc.contributor.advisor Mokwena, S. N.
dc.contributor.author Matsobane, Neo Onica
dc.date.accessioned 2025-01-30T11:05:22Z
dc.date.available 2025-01-30T11:05:22Z
dc.date.issued 2024
dc.identifier.uri http://hdl.handle.net/10386/4846
dc.description Thesis (M.Sc. (eScience Data Science)) -- University of Limpopo, 2024 en_US
dc.description.abstract Malicious software (malware) poses a significant threat to the security and integrity of computer systems. Traditional malware detection approaches often encounter challenges due to small-scale and imbalanced datasets, resulting in reduced detection accuracy and reliability. In this research, we proposed a novel approach to address these issues by utilising a Random Forest method trained on a balanced synthetic dataset. The primary objective of this study was to investigate the impact of employing a Random Forest technique on the detection of malware. To achieve this, we first created a balanced synthetic dataset based on the latest dataset (CICMalDroid2020) using Generative Adversarial Networks (GANs). This synthetic dataset aimed to address the limitations associated with small-scale and imbalanced datasets commonly encountered in malware detection. We then trained the Random Forest model using this balanced synthetic dataset. The model's performance was evaluated using various metrics, including detection accuracy, precision, recall, balanced accuracy, geometric metrics, and F1-score. Intensive analyses were performed to assess the effectiveness of the proposed approach in detecting malware samples accurately and robustly, as compared to traditional detection methods. The results of our research provided insights into the potential benefits of utilising a Random Forest method trained on a balanced synthetic dataset for malware detection. The results shed light on the performance improvements achieved by the Random Forest method when trained on a balanced synthetic dataset, thus contributing to the advancement of malware detection techniques. The test results showed that Random Forest can detect malware attacks with an accuracy of 91%, recall of 100%, precision of 85%, F1-score of 92%, balanced accuracy of 95% and geometric metrics of 84%. From the results, we inferred that Random Forest has the capacity to detect malware attacks. en_US
dc.format.extent vi, 67 leaves en_US
dc.language.iso en en_US
dc.relation.requires PDF en_US
dc.subject Random Forest en_US
dc.subject Malware detection en_US
dc.subject Synthetic dataset en_US
dc.subject Balanced dataset en_US
dc.subject Generative Adversarial Networks (GANs) en_US
dc.subject.lcsh Malware (Computer software) en_US
dc.subject.lcsh Computer viruses en_US
dc.subject.lcsh Data sets en_US
dc.title Malware detection using random forest method trained on a balanced synthetic dataset en_US
dc.type Thesis en_US

