We’ve published a short paper (in Lithuanian), presented at “Lietuvos magistrantų informatikos ir IT tyrimai” (Lithuanian Master’s Students’ Research in Informatics and IT) on May 13, 2025, and released by Vilnius University Press (Vilnius University Open Series).
Title: Kenkėjiškų programų aptikimo gerinimas taikant kelių klasių gerybinės programinės įrangos analizę (English: Improving Malware Detection Using Multi-Class Analysis of Benign Software)
Authors: Juozapas Rokas Čypas, Viktor Medvedev, Juozas Dautartas
Pages: 24–27 • eISSN: 2783-784X (series eISSN 2669-0535) • DOI: 10.15388/LMITT.2025.3
What the paper proposes
- Problem with current datasets: In many public malware corpora, all benign software is treated as a single class. This simplification can increase false positives when malware mimics certain benign traits.
- Our approach: We propose splitting benign software into several common categories (e.g., office apps, system tools, games, image editors), alongside a malware class, to sharpen the boundary between benign and malicious behavior.
- Feature base: Using static analysis (typically the first step an antivirus performs), we extract more than 2,000 features per file: headers, section distribution, imported functions, strings, byte distributions, and more.
- Dimensionality reduction & visualization: To keep models efficient and interpretable, we plan to apply PCA, t-SNE, UMAP, and autoencoders to reduce dimensionality and visualize how well the classes separate.
- Modeling directions: We will evaluate classical machine learning and deep learning models (e.g., neural networks). We also outline transforming static features or raw binaries into images (e.g., Gramian Angular Summation/Difference Fields (GASF, GADF), Markov Transition Fields (MTF), GAFMAT, or direct binary-to-image conversion) so that convolutional neural networks (CNNs) can be used for classification.
- Hypothesis under test: Multi-class benign categorization can improve malware detection accuracy by reducing confusion between malware and specific benign groups.
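As an illustration of the “byte distributions” part of the feature base, here is a minimal Python sketch that turns raw file bytes into a normalized 256-bin byte histogram plus Shannon entropy statistics over fixed-size windows. The function names and the 1,024-byte window are illustrative assumptions, not the paper’s feature definitions; header, section, import, and string features would come from a PE parser and are not shown.

```python
import math
from collections import Counter
from typing import List

def byte_histogram(data: bytes) -> List[float]:
    """Normalized 256-bin histogram: bin b holds the fraction of bytes equal to b."""
    if not data:
        return [0.0] * 256
    counts = Counter(data)  # iterating over bytes yields ints 0..255
    total = len(data)
    return [counts.get(b, 0) / total for b in range(256)]

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, between 0.0 and 8.0."""
    if not data:
        return 0.0
    total = len(data)
    return sum((c / total) * math.log2(total / c) for c in Counter(data).values())

def windowed_entropy(data: bytes, window: int = 1024) -> List[float]:
    """Entropy of consecutive non-overlapping windows (the last one may be shorter)."""
    return [shannon_entropy(data[i:i + window]) for i in range(0, len(data), window)]

def byte_features(data: bytes, window: int = 1024) -> List[float]:
    """256 histogram bins + global entropy + max and mean windowed entropy (259 values)."""
    ent = windowed_entropy(data, window)
    mean_ent = sum(ent) / len(ent) if ent else 0.0
    return byte_histogram(data) + [shannon_entropy(data), max(ent, default=0.0), mean_ent]
```

To featurize a file on disk, read it in binary mode (e.g. `open(path, "rb").read()`, where `path` is any file you choose) and pass the bytes to `byte_features`. Entropy close to 8 bits per byte often indicates packed or encrypted content, which is one reason entropy is a common static feature.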
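For the image-based direction, the Gramian Angular Summation and Difference Fields encode a 1-D series as a 2-D matrix: rescale the values to [-1, 1], map each value to an angle φ = arccos(x), then set GASF[i][j] = cos(φ_i + φ_j) and GADF[i][j] = sin(φ_i − φ_j). Below is a dependency-free Python sketch of that transform. Treating a static feature vector or byte sequence as the input series, and the 0 to 255 pixel scaling, are assumptions for illustration; MTF, GAFMAT, and binary-to-image conversion are not shown.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]

def rescale(series: Sequence[float]) -> List[float]:
    """Min-max rescale to [-1, 1] so that arccos is defined for every value."""
    lo, hi = min(series), max(series)
    if hi == lo:
        # Constant series: map everything to 0 (angle pi/2).
        return [0.0] * len(series)
    # Clamp to guard against tiny floating-point overshoot outside [-1, 1].
    return [max(-1.0, min(1.0, (2 * x - hi - lo) / (hi - lo))) for x in series]

def gramian_angular_fields(series: Sequence[float]) -> Tuple[Matrix, Matrix]:
    """Return (GASF, GADF) for a 1-D series of length n as two n x n matrices."""
    phi = [math.acos(v) for v in rescale(series)]
    n = len(phi)
    gasf = [[math.cos(phi[i] + phi[j]) for j in range(n)] for i in range(n)]
    gadf = [[math.sin(phi[i] - phi[j]) for j in range(n)] for i in range(n)]
    return gasf, gadf

def to_pixels(matrix: Matrix) -> List[List[int]]:
    """Map values in [-1, 1] to 8-bit grayscale (0..255), e.g. as CNN input."""
    return [[max(0, min(255, round((v + 1.0) * 127.5))) for v in row] for row in matrix]
```

Because the output is n x n, long series (such as a 2,000-dimensional feature vector) are usually downsampled first, for example with piecewise aggregate approximation, to keep the images a manageable size.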
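The multi-class idea can be sketched without committing to any particular model: train on several benign subcategories plus malware, then collapse predictions to a benign/malicious decision and inspect false positives per benign group. The category names, the 0.5 threshold, and all helper functions below are hypothetical, chosen only to illustrate the evaluation logic.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical label set: several benign subgroups plus one malware class.
BENIGN_GROUPS = ("office", "system_tools", "games", "image_editors")
MALWARE = "malware"

def to_binary(label: str) -> str:
    """Collapse a multi-class label to the detection decision."""
    return "malicious" if label == MALWARE else "benign"

def decide(probs: Dict[str, float], threshold: float = 0.5) -> Tuple[str, float]:
    """Flag a file as malicious if the malware-class probability reaches the threshold."""
    p_mal = probs.get(MALWARE, 0.0)
    return ("malicious" if p_mal >= threshold else "benign"), p_mal

def binary_accuracy(y_true: Iterable[str], y_pred: Iterable[str]) -> float:
    """Accuracy after collapsing; mixing up two benign groups is not an error here."""
    pairs = list(zip(y_true, y_pred))
    if not pairs:
        return 0.0
    return sum(to_binary(t) == to_binary(p) for t, p in pairs) / len(pairs)

def false_positives_by_group(y_true: Iterable[str], y_pred: Iterable[str]) -> Counter:
    """Count benign files predicted as malware, broken down by true benign group."""
    return Counter(t for t, p in zip(y_true, y_pred) if t in BENIGN_GROUPS and p == MALWARE)
```

Collapsing to binary means that confusing, say, an office app with a game costs nothing at detection time; only confusion with the malware class matters, and the per-group false-positive count shows which benign groups the malware class is being confused with.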
Why this matters
A better understanding of the diversity of benign software should help models avoid false positives and generalize to unseen threats, without relying solely on dynamic analysis (which is more resource-intensive and depends on a sandbox).
Acknowledgements: Funded by the Research Council of Lithuania (LMTLT), agreement No. S-MIP-24-116.