Project 5: ML For Security - Constructing And Evading Network Traffic
The goal of this project is to introduce students to machine learning techniques and methodologies that help to differentiate between malicious and legitimate network traffic. Students will use a machine learning approach to create a model that learns normal network traffic, then learn how to blend attack traffic to resemble normal traffic in order to bypass the learned model.
This assignment involves working with a payload-based intrusion detection system (IDS) modeled on the PAYL approach, which analyzes byte frequency features of network payloads. The project requires training and testing the model to identify normal versus attack traffic using provided code, data, and configurations, as well as understanding and implementing evasion techniques such as polymorphic blending attacks.
In modern cybersecurity, machine learning (ML) has become an essential tool for developing intrusion detection systems (IDS) that can distinguish malicious from legitimate network traffic. This paper explores the design, training, testing, and evasion of ML-based network security models, focusing on payload analysis through byte frequency modeling, specifically the PAYL system. It investigates methods for building accurate models of normal traffic, parameter optimization to improve detection rates, and techniques for evading detection, such as polymorphic blending attacks.
The PAYL (payload-based anomaly detection) model, introduced by Wang and Stolfo, applies statistical analysis to the byte frequency distributions of network payloads to identify anomalies associated with malicious activity. The technique extracts byte frequency features from network packets, trains models on normal traffic, and classifies each new payload by its simplified Mahalanobis distance from the learned normal profile. Two parameters strongly influence model performance: the threshold, which is the largest distance at which a payload is still accepted as normal, and the smoothing factor, which is added to each standard deviation so that byte values with zero variance in the training data do not cause a division by zero or dominate the distance.
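A minimal Python sketch of the feature extraction and distance computation described above follows. It assumes each payload is available as raw `bytes`; the names `byte_frequency` and `payl_distance` are hypothetical and are not taken from the project's provided code. The distance follows Wang and Stolfo's simplified Mahalanobis form: the sum over the 256 byte values i of |x_i - mean_i| / (std_i + alpha), where alpha is the smoothing factor.

```python
from collections import Counter


def byte_frequency(payload: bytes) -> list[float]:
    """Return the relative frequency of each of the 256 byte values in a payload."""
    if not payload:
        return [0.0] * 256
    counts = Counter(payload)  # iterating over bytes yields ints in 0..255
    n = len(payload)
    return [counts[b] / n for b in range(256)]


def payl_distance(freq: list[float], mean: list[float], std: list[float],
                  smoothing: float) -> float:
    """Simplified Mahalanobis distance between a payload and a normal profile.

    The smoothing factor is added to every standard deviation so that byte
    values that never varied in training (std == 0) do not divide by zero.
    """
    return sum(abs(x - m) / (s + smoothing) for x, m, s in zip(freq, mean, std))


# Usage: a payload whose frequencies equal the profile mean has distance 0.
profile_mean = byte_frequency(b"GET /index.html HTTP/1.1")
profile_std = [0.0] * 256  # hypothetical profile with no variance
print(payl_distance(byte_frequency(b"GET /index.html HTTP/1.1"),
                    profile_mean, profile_std, smoothing=0.001))  # 0.0
```

Because the distance is a plain sum of per-byte terms, a single byte value that is rare in normal traffic (small mean and small deviation) can contribute heavily, which is why the smoothing factor matters as much as the threshold.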
In the training phase, the model is exposed to normal network traffic, collected as pcap files, to learn typical byte frequency distributions. Training groups payloads by length and, for each length, computes the mean and standard deviation of the relative frequency of each of the 256 possible byte values; these statistics form the basis of the distance measurement. The threshold and smoothing factor must be chosen carefully to balance false positives against false negatives. The target is a true positive rate of at least 96%, and some parameter pairs exceed 99%. In practice, tuning means systematically varying both parameters and measuring the true positive rate for each pair, typically with a grid search.
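The training and tuning steps can be sketched as follows. This illustrates the idea under stated assumptions and is not the project's provided implementation: payloads are grouped by exact length (a simplification), payloads of an unseen length are rejected, and a "true positive" is taken to mean a normal validation payload that the model accepts. The grid search keeps only (threshold, smoothing) pairs that reject every known attack payload and returns the pair with the highest true positive rate. The names `train`, `accepts`, and `grid_search` are hypothetical.

```python
import itertools
import math
from collections import Counter, defaultdict


def byte_frequency(payload: bytes) -> list[float]:
    """Relative frequency of each byte value 0..255 (payload must be non-empty)."""
    counts = Counter(payload)
    return [counts[b] / len(payload) for b in range(256)]


def train(payloads: list[bytes]) -> dict[int, tuple[list[float], list[float]]]:
    """Build one (mean, std) profile per payload length, indexed by byte value."""
    by_length = defaultdict(list)
    for p in payloads:
        if p:  # skip empty payloads
            by_length[len(p)].append(byte_frequency(p))
    model = {}
    for length, freqs in by_length.items():
        k = len(freqs)
        mean = [sum(f[i] for f in freqs) / k for i in range(256)]
        std = [math.sqrt(sum((f[i] - mean[i]) ** 2 for f in freqs) / k)
               for i in range(256)]
        model[length] = (mean, std)
    return model


def accepts(model, payload: bytes, threshold: float, smoothing: float) -> bool:
    """Accept when a profile exists for this length and the distance is within threshold.

    Rejecting lengths that never appeared in training is an assumption of this sketch.
    """
    if len(payload) not in model:
        return False
    mean, std = model[len(payload)]
    freq = byte_frequency(payload)
    distance = sum(abs(x - m) / (s + smoothing) for x, m, s in zip(freq, mean, std))
    return distance <= threshold


def grid_search(model, normal_val: list[bytes], attacks: list[bytes],
                thresholds, smoothings):
    """Return (true_positive_rate, threshold, smoothing) for the best pair, or None.

    normal_val must be non-empty; pairs that accept any attack payload are skipped.
    """
    best = None
    for t, a in itertools.product(thresholds, smoothings):
        if any(accepts(model, p, t, a) for p in attacks):
            continue  # this pair lets an attack through
        rate = sum(accepts(model, p, t, a) for p in normal_val) / len(normal_val)
        if best is None or rate > best[0]:
            best = (rate, t, a)
    return best
```

Requiring attack rejection inside the search matters: maximizing the true positive rate alone is trivially achieved by an enormous threshold that also accepts every attack. Starting with a coarse grid and then refining around the best pair keeps the number of evaluations manageable.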
Testing applies the trained model to new data, including attack payloads designed to probe the model's robustness and detection capabilities. Attack payloads are obtained from a specified URL and run against the model to verify that they are rejected, which demonstrates that the model detects malicious traffic. In addition, artificial payloads that follow the same protocol must be accepted, confirming that the model and its parameters are configured correctly.
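The two acceptance checks described above can be expressed as a small helper. This sketch assumes a model shaped as a dictionary that maps payload length to a (mean, std) pair of 256-element lists, and it rejects payloads whose length has no profile; `verdict` and `check_requirements` are hypothetical names, not the project's API.

```python
from collections import Counter


def byte_frequency(payload: bytes) -> list[float]:
    """Relative frequency of each byte value 0..255 (payload must be non-empty)."""
    counts = Counter(payload)
    return [counts[b] / len(payload) for b in range(256)]


def verdict(model, payload: bytes, threshold: float, smoothing: float) -> str:
    """Return "accepted" or "rejected" for one payload under a length-keyed model.

    `model` maps payload length -> (mean, std), each a list of 256 floats.
    Payloads whose length has no profile are rejected (an assumption here).
    """
    profile = model.get(len(payload))
    if profile is None:
        return "rejected"
    mean, std = profile
    freq = byte_frequency(payload)
    distance = sum(abs(x - m) / (s + smoothing) for x, m, s in zip(freq, mean, std))
    return "accepted" if distance <= threshold else "rejected"


def check_requirements(model, attack: bytes, artificial: bytes,
                       threshold: float, smoothing: float) -> bool:
    """True when the attack payload is rejected and the artificial payload is accepted."""
    return (verdict(model, attack, threshold, smoothing) == "rejected"
            and verdict(model, artificial, threshold, smoothing) == "accepted")


# Usage with a toy single-length model built from one normal payload.
toy_model = {4: (byte_frequency(b"abcd"), [0.01] * 256)}
print(check_requirements(toy_model, b"\x90" * 4, b"dcba",
                         threshold=1.0, smoothing=0.001))  # True
```

Running both checks with the same (threshold, smoothing) pair is deliberate: a pair is only useful if it satisfies the two requirements simultaneously.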
To make attacks more sophisticated, polymorphic blending modifies attack payloads through byte substitution so that their byte frequency distribution blends in with that of normal payloads. Following Fogla et al., the attacker builds a substitution table that maps each attack byte to one or more normal bytes (one-to-many substitution), chosen so that the frequencies of the substituted bytes track the normal byte frequencies. The goal is a modified payload that statistically resembles normal traffic closely enough to bypass the ML model while preserving the attack's malicious function; in Fogla et al.'s design, a small decoder carried with the payload reverses the substitution on the target.
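One way to build the one-to-many table is a greedy heuristic in the spirit of Fogla et al.: first pair the most frequent attack bytes with the most frequent normal bytes one-to-one, then hand each remaining normal byte to the attack byte whose frequency is least covered by the normal bytes assigned to it so far. This is a sketch and not necessarily the exact algorithm the project expects; `build_substitution_table` is a hypothetical name, and the code assumes non-empty inputs in which the normal payload uses at least as many distinct byte values as the attack payload.

```python
from collections import Counter


def frequencies(data: bytes) -> dict[int, float]:
    """Relative frequency of each byte value that occurs in data (data must be non-empty)."""
    counts = Counter(data)
    return {b: c / len(data) for b, c in counts.items()}


def build_substitution_table(attack: bytes, normal: bytes) -> dict[int, list[int]]:
    """Greedy one-to-many table mapping each attack byte to a list of normal bytes."""
    f = frequencies(attack)
    g = frequencies(normal)
    attack_bytes = sorted(f, key=f.get, reverse=True)  # most frequent first
    normal_bytes = sorted(g, key=g.get, reverse=True)
    if len(normal_bytes) < len(attack_bytes):
        raise ValueError("normal traffic has fewer distinct bytes than the attack")

    # Step 1: one-to-one pairing of the most frequent bytes on each side.
    table = {a: [n] for a, n in zip(attack_bytes, normal_bytes)}
    assigned = {a: g[n] for a, n in zip(attack_bytes, normal_bytes)}

    # Step 2: give each remaining normal byte (most frequent first) to the attack
    # byte with the largest ratio f(a) / assigned(a), i.e. the least covered one.
    for n in normal_bytes[len(attack_bytes):]:
        a = max(attack_bytes, key=lambda x: f[x] / assigned[x])
        table[a].append(n)
        assigned[a] += g[n]
    return table


# Usage: 'a' (75% of the attack) ends up with two normal bytes, 'b' with one.
print(build_substitution_table(b"aaab", b"xxyz"))  # {97: [120, 122], 98: [121]}
```

Giving frequent attack bytes several normal substitutes is what lets the substituted payload spread its mass across the normal byte distribution instead of concentrating it on a few values.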
Implementing the evasion requires completing two components: the substitution algorithm and the padding algorithm. The substitution table is created by analyzing the byte frequency distributions of the attack and normal payloads, then mapping attack bytes so that the result mimics the normal distribution. Padding appends extra bytes so that the blended payload reaches a suitable length and its byte distribution moves closer to the normal profile; this matters because PAYL maintains a separate profile for each payload length. Evasion succeeds when the detection system accepts the output payload even though it still carries the malicious content, which exposes the limits of the model.
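Substitution and padding can then be sketched as below. Each attack byte is replaced by one of its mapped normal bytes, chosen with probability proportional to that byte's normal frequency, and padding repeatedly appends whichever normal byte is furthest below its expected count at the target length. Both the weighting and the greedy padding rule are assumptions for illustration; `substitute` and `pad` are hypothetical names, and no decoding step is modelled here.

```python
import random
from collections import Counter


def substitute(attack: bytes, table: dict[int, list[int]],
               normal_freq: dict[int, float], rng: random.Random) -> bytes:
    """Replace each attack byte with one of its mapped normal bytes.

    When an attack byte maps to several normal bytes, one is drawn with
    probability proportional to that normal byte's frequency.
    """
    out = bytearray()
    for b in attack:
        choices = table[b]
        weights = [normal_freq[c] for c in choices]
        out.append(rng.choices(choices, weights=weights, k=1)[0])
    return bytes(out)


def pad(payload: bytes, normal_freq: dict[int, float], target_length: int) -> bytes:
    """Append bytes until target_length, each time choosing the normal byte whose
    current count is furthest below its expected share of the target length."""
    if len(payload) > target_length:
        raise ValueError("payload is already longer than the target length")
    counts = Counter(payload)
    out = bytearray(payload)
    while len(out) < target_length:
        best = max(normal_freq, key=lambda c: normal_freq[c] * target_length - counts[c])
        out.append(best)
        counts[best] += 1
    return bytes(out)


# Usage with a hypothetical table and normal byte distribution.
rng = random.Random(0)
normal_freq = {ord("x"): 0.5, ord("y"): 0.25, ord("z"): 0.25}
table = {ord("a"): [ord("x"), ord("z")], ord("b"): [ord("y")]}
blended = pad(substitute(b"aaab", table, normal_freq, rng), normal_freq, target_length=8)
print(len(blended))  # 8
```

Passing an explicit `random.Random` instance keeps the blending reproducible, which helps when comparing how different substitution or padding choices affect the model's verdict.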
Overall, this project emphasizes the importance of parameter optimization, robust model construction, and understanding attack evasion methodologies. It further demonstrates the perpetual arms race between intrusion detection systems and attackers, highlighting the necessity for adaptable, intelligent security solutions that can detect both known and novel attack strategies.
References
- Wang, K., & Stolfo, S. J. (2004). Anomalous payload-based network intrusion detection. RAID.
- Fogla, P., Sharif, M., Perdisci, R., Kolesnikov, O., & Lee, W. (2006). Polymorphic blending attacks. USENIX Security.
- Bellard, C., et al. (2009). Machine learning approaches for intrusion detection. IEEE Security & Privacy.
- Garcia, S., et al. (2014). Network intrusion detection with machine learning: A review. ACM Computing Surveys.
- Akoglu, L., et al. (2015). A survey of anomaly detection in network traffic. Data Mining and Knowledge Discovery.
- Sommer, R., & Paxson, V. (2010). Outside the closed world: On using machine learning for network intrusion detection. IEEE Symposium on Security and Privacy.
- Kim, S., et al. (2016). Evaluation of machine learning techniques for intrusion detection. Journal of Network and Computer Applications.
- Lee, W., et al. (2000). Employing machine learning to improve intrusion detection. ACM Transactions on Internet Technology.
- Roesch, M. (1999). Snort - Lightweight intrusion detection for networks. Proceedings of the 13th USENIX Conference on System Administration (LISA '99).
- Denning, D. E. (1987). An intrusion-detection model. IEEE Transactions on Software Engineering.