Although some of us are adapting faster than others, most of us are getting used to the notion that artificial intelligence and machine learning are beginning to make our lives a bit easier, even while we recognize some of the downsides of AI. (Let’s face it, if today’s typical chatbot experience was our only contact with AI, the future would look pretty grim.)
Unhelpful, poorly trained chatbots aside, AI and machine learning bring us conveniences like traffic predictions and alternate route suggestions, converting speech to text, online shopping recommendations, language translations, image recognition and object detection functions, some decent customer service triage, and those notorious self-driving vehicles, to name just a few. Most of these, and a whole lot more, are here to stay.
Machine Learning Defined
Machine learning is a subset of artificial intelligence. AI is a broad field focused on developing machines that are capable of performing tasks that typically require human intelligence.
Machine learning is a narrower field, within the realm of Predictive AI, which is focused on enabling machines to learn from data in order to make predictions, decisions, or recommendations. It’s a method of teaching computers to learn from data—much like how humans learn from experience—without having to be explicitly programmed by humans.
Machines learn by being trained. Machine learning algorithms are trained on large datasets, which they use to learn patterns and relationships. They adjust their parameters based on changes in the data they are exposed to, which is what enables the algorithms to improve their performance over time. And once they have been trained, machine learning models can be used to make predictions or decisions about new, unknown, or unlabeled data.
At this stage of AI evolution, the potential benefits of these predictive models are virtually unlimited. And, as we have seen, these models have already begun enabling advances in a wide variety of fields, including healthcare and cybersecurity, with the promise of many more to come.
Opinions vary as to whether there are three, four, or five basic types of machine learning. The three fundamentals include supervised, unsupervised, and reinforcement learning. Additional types include semi-supervised and self-supervised learning. While IBM offers good descriptions of all five machine learning types, the basic three are outlined below.
Because data is at the heart of machine learning, and because it empowers so many useful new applications across such a wide range of industries, attackers and hackers have begun to exploit machine learning and the data it collects, employs, and stores.
Adversarial attacks are a type of cyberattack that specifically targets machine learning models. Their objective is to manipulate model behavior or compromise model (or data) integrity. When outcomes are compromised, the machine learning model becomes unreliable and unusable—at least until the attack is discovered and the model is restored or replaced.
When machine learning models are used to power critical applications, like those in medical diagnostics and financial fraud detection for example, they become highly attractive targets for adversarial attacks.
Adversarial attacks occur at all stages of the machine learning lifecycle, from design and implementation to training, testing, and deployment. Therefore, security measures must address each stage of machine learning in order to thwart attacks effectively.
Currently, adversarial attacks are most common at either the training or the deployment stage.
Adversarial attacks can vary widely based on a number of factors, and may have far-reaching impacts. Following are three examples of attacks and their outcomes. Attacks may create inputs (like images or text) that have been subtly altered to fool a machine learning model into making incorrect predictions.
Just as cybercriminals specialize in phishing schemes or ransomware exploits, as two examples, the newest form of cybercrime is targeting machine learning applications in order to steal data or to render it useless, damaging, or destructive.
A Forbes article in 2023 presented real-world examples of the damage caused by adversarial attacks. As one example, an attack can deceive a facial recognition system by altering an individual's facial image with changes undetectable to the human eye.
The Role of Adversarial Machine Learning in Fighting Off Attacks
The field of adversarial machine learning (AML) focuses on understanding and mitigating vulnerabilities in machine learning models by studying how hackers and other adversaries can manipulate inputs or training data to cause incorrect or undesired outcomes—sending designers and developers back to the drawing board to create more robust machine learning models. (Unless they follow Secure By Design principles promoted by CISA.)
AML investigates how hackers can exploit weaknesses in machine learning models, such as data dependency, interpretability issues, potential for bias, resource consumption, and accuracy issues. Adversarial machine learning aims to develop techniques for attacking machine learning models as hackers might attack them, and then to create safeguards to protect against those attacks.
Adversarial machine learning is an essential security component of machine learning, and it will be crucial to the ongoing advancement of artificial intelligence and machine learning applications.
NIST Guidance on Use of Adversarial Machine Learning
The National Institute of Standards and Technology (NIST) has been steadily monitoring the evolution of artificial intelligence and machine learning applications, as well as the threats and risks that emerge with those evolutionary changes.
In March 2025, NIST released its finalized guidelines on adversarial machine learning in the form of the NIST Trustworthy and Responsible AI Report No. 100-2e2025. The report organizes concepts and defines terminology in the field of adversarial machine learning, and describes the key types of current machine learning methods, the life cycle stages of attacks, and attacker goals, objectives, capabilities, and knowledge. The report identifies current attacks in the life cycle of machine learning systems and describes methods for managing and mitigating the consequences of those attacks, where mitigation techniques exist.
Highly technical in nature and heavily footnoted, the report is intended for use by individuals and teams who are responsible for designing, developing, evaluating, deploying, and managing AI and AML systems and ensuring their integrity, privacy, and security. It is hoped that the report will influence other standards and future best practices for assessing and managing the security and privacy of AI and AML systems.
Following is a brief overview of the primary types of attacks on Predictive AI and Generative AI outlined in the NIST report.
Adversarial Attacks on Predictive AI
Three primary categories of attacks on predictive machine learning are evasion attacks, poisoning attacks, and privacy attacks. However, there are numerous sub-categories. For example, poisoning attacks may include data poisoning, targeted poisoning, backdoor poisoning, and clean-label poisoning.
Evasion Attacks. In these attacks, the hacker’s goal is to generate samples that can be reclassified to some arbitrary class determined by the attacker. In terms of image classification, for example, the change in the original sample might be undetectable to the human eye, but the machine learning model is tricked into placing it in the target class selected by the attacker, rather than the class in which it belongs.
Poisoning Attacks. Broadly defined, poisoning attacks are adversarial attacks during the training stage of the machine learning algorithm, during which hackers insert malicious or altered data into a training dataset to manipulate a model's learning and decision-making processes.
Poisoning attacks are powerful and can cause indiscriminate damage to the machine learning model. They leverage a wide range of adversarial capabilities, such as data poisoning, model poisoning, label control, source code control, and test data control.
Privacy Attacks. These include data reconstruction, membership inference, property inference, and model extraction attacks. As one example, data reconstruction attacks and membership inference attacks are able to recover an individual’s private data from aggregate information.
Adversarial Attacks on Generative AI
Although many attack types in Predictive AI are also observed in Generative AI (such as data poisoning, model poisoning, and model extraction), recent research has also found adversarial machine learning attacks that are specific to Generative AI systems, according to the NIST report. The three categories of such attacks include supply chain attacks, direct prompting attacks, and indirect prompt injection attacks, as described below.
The NIST report offers 127 pages of insights into adversarial attacks and adversarial machine learning.
Summary
Adversarial machine learning is a rapidly emerging discipline driven by the objective of understanding how hackers can manipulate data to cause incorrect or undesired outcomes in machine learning applications. By exploring how attackers can exploit weaknesses in machine learning models, the emerging field of adversarial machine learning can help lead to solutions to address those weaknesses and other vulnerabilities being discovered.
Introducing more robust security into all stages of the machine learning life cycle is becoming increasingly important, and will enable those responsible for the design, development, implementation, and management of artificial intelligence and machine learning systems to continue to advance applications and identify new uses for these promising technologies.