Author: Dan Borges & David Campbell

AI systems have moved beyond generating text into taking action. They're in production. They query internal data, make API calls, and interact with other production systems, often with more access than most humans get. AI systems aren't deterministic; they reason, adapt, and operate on untrusted input in ways that traditional security models simply weren't designed for. This creates new vulnerabilities and shifts the entire control surface. This book is about that shift. In AI Security Engineering, Dan Borges and David Campbell show you how to rethink security for AI systems built on retrieval pipelines, persistent memories, and agents that take action. Drawing from real-world adversarial testing and production deployments, they focus on how these systems actually fail: prompt injection that turns inputs into instructions, poisoned retrieval that corrupts decisions at runtime, and agents that quietly accumulate more authority than intended. Rather than relying on the model to do the right thing, you'll learn how to design systems that constrain what AI systems are allowed to do, enforce least privilege at the capability level, and build architecture that can observe, interrupt, and contain failures when they happen.

Publish Year: 2027
Language: English
File Format: PDF
File Size: 3.9 MB
AI Security Engineering
Securing Agentic Systems in Production

With Early Release ebooks, you get books in their earliest form (the author’s raw and unedited content as they write) so you can take advantage of these technologies long before the official release of these titles.

Dan Borges and David Campbell
AI Security Engineering
by Dan Borges and David Campbell

Copyright © 2026 O’Reilly Media, Inc. All rights reserved.

Published by O’Reilly Media, Inc., 141 Stony Circle, Suite 195, Santa Rosa, CA 95401.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (https://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Acquisitions Editor: Nicole Butterfield
Development Editor: Gary O’Brien
Production Editor: Destiny Baitinger
Cover Designer: Susan Brown
Cover Illustrator: Monica Kamsvaag
Interior Designer: David Futato
Interior Illustrator: Kate Dullea

October 2027: First Edition

Revision History for the Early Release
2026-06-22: First Release

See https://oreilly.com/catalog/errata.csp?isbn=0642572359133 for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. AI Security Engineering, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

The views expressed in this work are those of the author(s) and do not represent the publisher’s views. While the publisher and the author(s) have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author(s) disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

979-8-341-67484-4
Preface

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic
    Indicates new terms, URLs, email addresses, filenames, and file extensions

Constant width
    Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords

Constant width bold
    Shows commands or other text that should be typed literally by the user

Constant width italic
    Shows text that should be replaced with user-supplied values or by values determined by context

TIP
This element signifies a tip or suggestion.

NOTE
This element signifies a general note.

WARNING
This element indicates a warning or caution.

Using Code Examples

Supplemental material (code examples, exercises, etc.) is available for download at https://github.com/oreillymedia/title_title.

If you have a technical question or a problem using the code examples, please send email to support@oreilly.com.

This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.

We appreciate, but generally do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “Book Title by Some Author (O’Reilly). Copyright 2012 Some Copyright Holder, 978-0-596-xxxx-x.”

If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at permissions@oreilly.com.
O’Reilly Online Learning

NOTE
For more than 40 years, O’Reilly Media has provided technology and business training, knowledge, and insight to help companies succeed.

Our unique network of experts and innovators share their knowledge and expertise through books, articles, and our online learning platform. O’Reilly’s online learning platform gives you on-demand access to live training courses, in-depth learning paths, interactive coding environments, and a vast collection of text and video from O’Reilly and 200+ other publishers. For more information, visit https://oreilly.com.

How to Contact Us

Please address comments and questions concerning this book to the publisher:

O’Reilly Media, Inc.
141 Stony Circle, Suite 195
Santa Rosa, CA 95401
800-889-8969 (in the United States or Canada)
707-827-7019 (international or local)
707-829-0104 (fax)
support@oreilly.com
https://oreilly.com/about/contact.html

We have a web page for this book, where we list errata and any additional information. You can access this page at https://www.oreilly.com/catalog/<catalog page>.

For news and information about our books and courses, visit https://oreilly.com.

Find us on LinkedIn: https://linkedin.com/company/oreilly.

Watch us on YouTube: https://youtube.com/oreillymedia.

Acknowledgments
Chapter 1. The AI and Security Problem Spaces

A NOTE FOR EARLY RELEASE READERS

With Early Release ebooks, you get books in their earliest form (the author’s raw and unedited content as they write) so you can take advantage of these technologies long before the official release of these titles. This will be the first chapter of the final book. Please note that the GitHub repo will be made active later on. If you’d like to be actively involved in reviewing and commenting on this draft, please reach out to the editor at gobrien@oreilly.com.

In computer science, and especially in security, we rarely solve entirely new problems. Instead, we build on well-established math, programming paradigms, and mental models to understand emerging technologies. Security in particular is a discipline shaped by history and accumulated lessons. Theodore Roosevelt famously said, “The more you know about the past, the better prepared you are for the future.” Classes of vulnerabilities, from memory corruption to injection attacks, have emerged and been mitigated over time, depending on the technologies and implementations involved. The development of such attacks and their solutions has produced some longstanding security principles that can be applied when looking at new systems. Concepts such as least privilege, input validation, and defense in depth were not created for any single technology; they are security abstractions that apply across generations of systems. By reusing these mental models, we can reason more effectively about new architectures, including modern AI systems, and anticipate where they may fail. Sometimes a simple implementation trick is all it takes to solve a decade-long vulnerability.
Throughout this book we will see that many modern security problems can be reframed as traditional security challenges with well-understood solutions. This perspective is especially important as new paradigms emerge. AI systems have brought about entirely new classes of vulnerabilities, and their nondeterminism also resurfaces dozens of traditional ones. While the implementation details change with the technologies, the underlying security challenges often remain consistent. If you are building large-scale AI systems, you may be introducing vulnerabilities at the architecture level without even knowing it. The ability to map a new problem space to known categories is one of the most powerful tools in building secure systems. Technologies and capabilities may evolve, but the consequences of a successful exploitation remain a consistent way to assess the security risk of a system. Those technologies do, however, play a critical role when it comes to implementing the appropriate fixes and controls in that system.

In this chapter we lay out some core security principles that will help the reader in any infosec situation. From there we move to some fundamental principles of AI systems, which help explain common problems with these systems. Finally, we look at a unique mix of security and AI issues.

Traditional Security Theory

Security is a deep and old field, full of traditional lessons we can draw from in this new environment. The following sections are a selection of traditional security and information security best practices. These principles can be used to think about problems abstractly and to reason about new design patterns and threats. They are especially helpful in red teaming and threat modeling, when you are trying to think of ways to attack an application.
This isn’t a full review of information security. For example, we won’t cover many basics such as the CIA triad (that’s confidentiality, integrity, and availability for our non-security-wonks) or fault tolerance. But we will cover core strategic principles that are particularly relevant to AI systems. These are problems and mental models the reader can use to help articulate and solve vulnerabilities as they surface in AI systems.

Rick Howard boils down what a principle is in his book “Cybersecurity First Principles”, which offers a lot of good guiding philosophy on how to reason about principles. First principles, he writes, “must not be derived from one another nor from anything else, while everything has to be derived from them.” Following the general practice of first-principle reasoning, you boil a problem down to the fundamental truths that must hold in any situation. He arrives at one fundamental principle in his book: “Reduce the probability of a material impact due to a cyber event over the next three years.” That is great, but it doesn’t give us any mental models or tools to actually reduce negative cyber events. Therefore I think we can boil a number of other infosec phenomena down into principles. These fundamental rules can help us reason about infosec systems beyond “we need to reduce risk”, and give us tools to act on that goal. They give operators controls and logical frameworks that can be combined into an effective strategy to reduce risk.

Untrusted Input

Untrusted input is a fundamental issue in computer and web security. At its most basic, if we have users, they can submit maliciously crafted input to attack the system; we refer to this category of problems as untrusted input. Even if you completely trust your user base, user accounts can be hijacked (through techniques such as credential phishing) and operated in a malicious manner. We can see this as a core principle and best practice on sites like securecoding.org, which underscore the importance of input validation for untrusted input.
This principle states that if we are going to accept user input, we should treat it as untrusted, or simply “never trust user input”. We can state our principle on untrusted input as simply as: “Treat all external input as potentially malicious until validated within the intended context.”

Several historical attack categories, like SQL injection, cross-site scripting (XSS), and command injection, all stem from untrusted user input. If we go back even further, we can see this entire category emerging from a quality control problem space known as “unexpected input”. Quality control programmers often write unit tests that probe the bounds of a system by providing input intended to crash it, to make sure the system handles unexpected input gracefully. This type of programmatic testing has a rich background and, in our opinion, has evolved into modern security scanning and even model evaluations.

A number of controls arose out of these same attack categories, and they remain relevant in other untrusted input scenarios. Controls such as input sanitization, removing special characters, strongly typed objects, parameterized queries, and output sanitization (which we will explore throughout the book) are all just as relevant to writing applications that leverage LLMs or generative models today.

This is particularly relevant to AI systems, because they take user prompts and those prompts change the output of the system. Wherever a user can directly influence the input of an LLM system, the result can be prompt injection or other unexpected outcomes. It’s a fundamental property of LLMs to take text input, and if that text comes directly from “untrusted users” it may open the system up to a wide category of attacks.

The essence of this problem is that we can’t always trust users (or agents). Anywhere a user can send the system input is a place an attacker could try to harm the system. Thus, you need to treat every user as potentially malicious, including internal users when the threat model and security requirements of the system call for it.
Believe it or not, this is an aspect of the Behaviour of a system, especially when this input is passed to a model. In that regard we should actively test for these failure scenarios, but also build controls that generally constrain these failure cases.
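As one illustration of such a control, the sketch below contrasts string concatenation with a parameterized query, one of the controls named above. It is a minimal example using Python's standard `sqlite3` module; the `users` table, the helper function names, and the payload are hypothetical and chosen only for the demonstration.

```python
import sqlite3


def make_demo_db():
    # Hypothetical in-memory database used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    return conn


def find_user_unsafe(conn, name):
    # Vulnerable: untrusted input is concatenated into the SQL text,
    # so the input can change the structure of the query.
    query = "SELECT name FROM users WHERE name = '" + name + "'"
    return [row[0] for row in conn.execute(query)]


def find_user_safe(conn, name):
    # Parameterized: the input is bound as data and never parsed as SQL.
    query = "SELECT name FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]


conn = make_demo_db()
payload = "nobody' OR '1'='1"  # classic injection string

leaked = find_user_unsafe(conn, payload)   # matches every row in the table
contained = find_user_safe(conn, payload)  # matches nothing
```

The same separation of instructions from data is much harder to achieve with an LLM prompt, which is part of why prompt injection needs additional controls beyond input handling.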
Access Controls and Security Boundaries

Access control is a fundamental tenet of security, commonly gated by authentication and authorization. Authentication is essentially your ability to prove “who” you are, established through what you know (passwords), what you own (keys), or even biometrics. Authorization is what that identity has permission to access. As shorthand, and a good mnemonic for remembering them, people often call authentication AuthN and authorization AuthZ. Together, these two features make up what we have come to call Identity within the context of Behaviour, Identity, and Control (BIC). At the core of BIC we need to establish the intended behaviour and the system’s core identity; then we can design access controls that limit permissions to those that are intended. Thus, we can produce another principle of information security, rooted in zero trust architecture: “Access decisions should require verified identity and explicit authorization before actions are permitted.”

Broken access control was the number two issue when the OWASP Top 10 debuted in 2003, and it has persisted through dozens of different technologies since then. Access control is now the number one issue on the 2025 OWASP Top 10, showing that even decades later we still struggle to get core properties of information security right. Granted, there is an OWASP Top 10 specifically for Agentic Applications that doesn’t include it, but in my experience leading security engineering at a major AI development company, this is a timeless issue.

A trust boundary or security boundary is a location where assumptions about identity, behavior, permissions, or security guarantees change between systems, users, devices, processes, or environments. Good security design includes identifying trust boundaries and defining what capabilities, permissions, and assumptions change across them.
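The access control principle above can be sketched as a single check that runs before any sensitive action. This is an illustrative sketch only: the `Identity` type, the `require` helper, and the `profile:read:<user>` permission naming are assumptions made for the example, not an API from any particular framework.

```python
from dataclasses import dataclass


class AccessDenied(Exception):
    """Raised when an access decision fails."""


@dataclass(frozen=True)
class Identity:
    user_id: str
    verified: bool                        # outcome of authentication (AuthN)
    permissions: frozenset = frozenset()  # explicit grants (AuthZ)


def require(identity, permission):
    # Verified identity first, then explicit authorization; deny otherwise.
    if identity is None or not identity.verified:
        raise AccessDenied("identity not verified")
    if permission not in identity.permissions:
        raise AccessDenied(f"missing permission: {permission}")


def read_profile(identity, profiles, target_user):
    # The check sits at the boundary, before the sensitive data is touched.
    require(identity, f"profile:read:{target_user}")
    return profiles[target_user]


profiles = {"alice": {"email": "alice@example.com"}}
alice = Identity("alice", True, frozenset({"profile:read:alice"}))
unverified = Identity("alice", False, frozenset({"profile:read:alice"}))
```

Here `read_profile(alice, profiles, "alice")` returns the profile, while the same call with `unverified`, or with a verified identity lacking the grant, raises `AccessDenied`.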
Mapping trust boundaries is a prerequisite to designing effective security controls. One of the reasons I like identifying security boundaries is that it creates an easy delineation marker where security practitioners can measure and control how a system gains new privileges or moves across a security boundary. It’s important to identify all of these security boundaries and ensure we have proper access controls at each of those points. Whenever the application fetches sensitive data or accesses a user’s profile information, there should be an explicit authority check (who is this user, and do they have access to this data?). This is most commonly implemented in middleware. Highlighting those boundaries clearly in a plan or scoping document, then applying the controls everywhere through a common middleware, is a great way to write a single set of functions and ensure you are securely accessing data or features.

Control Decay

One thing we need to talk about is the idea of ‘bit rot’: technology becomes less effective over time as the technologies around it evolve. In information security we tend to call this security control decay. Given the speed at which innovation occurs now, it is fairly easy to exploit or bypass a given set of controls. At the time of writing, in the summer of 2026, the infosec community has witnessed an explosion in exploit development due to AI-assisted technologies. Our friend Chris Nickerson regularly preaches about the dangers of security control decay, the idea that security technology tends to lose its effectiveness over time unless it is actively updated. In this sense, controls will eventually fail. As engineers, we want to reduce the risk of failed controls by staggering layered controls.

Consider the following example: a cloud environment where a VPC firewall rule was originally designed to allow outbound communication only from a single microservice to a specific external API. At the time, the rule was highly restrictive and effectively enforced the principle of least privilege.
Over the next several years, however, the environment evolved. New microservices were deployed, third-party integrations were added, and development teams requested additional network access to support new business requirements. Rather than redesigning the architecture, administrators gradually modified the existing firewall rule by adding more approved destinations, broader IP ranges, and additional exceptions. Individually, each change was justified. Collectively, however, they transformed a narrowly scoped control into a much more permissive one. The firewall rule still existed, and on paper the control remained in place. Yet its security value had diminished significantly.

This is an example of control decay. The control became less effective over time because the surrounding technology, business requirements, and operational practices evolved faster than the control itself. We can state this as a principle of control decay: “Security controls lose effectiveness over time unless continuously tested, updated, and challenged.” Applied to the example above, the right move would have been to break the ever-expanding network firewall rule into specific service or host roles, making each one tight and explicit to the service it controls.

One of the reasons we lay this principle out is that a lot of automation work tends to be set-and-forget, but time has shown that all things decay, even technology. This principle serves as a good basis for the next one, which is layering controls to compensate when an individual control fails.

Defense in Depth

As we’ve just seen, control decay is real, and controls can and will fail over time. Sometimes security controls themselves have vulnerabilities or bypasses in their underlying technologies. This persistent issue means we should be prepared for control failure, and one solution for such a scenario is to have overlapping controls.
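Being prepared for control failure also means testing controls continuously, as the control decay principle asks. As a hedged illustration based on the firewall rule example, the sketch below flags an egress rule that has drifted from its original narrow scope. The function name, the thresholds, and the sample address ranges are all hypothetical; a real audit would read rules from the cloud provider rather than from a list.

```python
import ipaddress


def audit_egress_rule(destinations, max_entries=3, min_prefix=24):
    """Return findings for an IPv4 egress allowlist that has grown too permissive.

    max_entries and min_prefix are illustrative thresholds, not recommendations.
    """
    findings = []
    if len(destinations) > max_entries:
        findings.append(f"{len(destinations)} destinations exceed limit of {max_entries}")
    for cidr in destinations:
        network = ipaddress.ip_network(cidr, strict=False)
        if network.prefixlen < min_prefix:
            findings.append(f"{cidr}: range broader than /{min_prefix}")
    return findings


# The rule as first written: one microservice, one external API address.
original_rule = ["203.0.113.10/32"]

# The same rule after years of individually justified exceptions.
decayed_rule = ["203.0.113.10/32", "198.51.100.0/24", "10.0.0.0/8", "0.0.0.0/0"]

original_findings = audit_egress_rule(original_rule)
decayed_findings = audit_egress_rule(decayed_rule)
```

Run on a schedule, a check like this turns silent drift into a visible finding: the original rule produces no findings, while the decayed rule is flagged both for its size and for its two overly broad ranges.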
Thus we can produce an infosec principle on this phenomenon: “Security improves when independent controls overlap such that failure of one does not produce a total failure of the control set.” That is really the crux of Defense in Depth.

The idea of defense in depth comes from military planning. In the military definition, defense in depth refers to a commander’s ability to give up space strategically while still responding and winning the war. We can see the same concept adopted in information security: by moving our most critical assets behind several layers of abstraction and security controls, we can engineer acceptable losses into the scenario. If we are hacked in one location, we should be able to respond appropriately before the crown jewels, the most critical systems, are affected. We can see the Navy write about it in the following context: “The key is creating multiple independent and redundant layers of defense to compensate for potential human and mechanical failures so that no single layer, no matter how robust, is exclusively relied upon to prevent an accident.”1 One of their key arguments is that while this may seem like overhead or overengineering, it’s actually a safeguard against the inevitability of error. It’s essentially a systems engineering approach to layered error handling.

Bruce Schneier, security elder and thinkist, wrote about this back in 2000 in “The Process of Security.” It’s a fantastic process that largely holds up today, and it prioritizes defense in depth among other controls we discuss in this chapter. Schneier puts it simply: “Don’t rely on single solutions. Use multiple complementary security products, so that a failure in one does not mean total insecurity. This might mean a firewall, an intrusion detection system and strong authentication on important servers.” And to quote Bruce Schneier one more time, this time echoing age-old wisdom: “There’s no such thing as perfect security. Interestingly enough, that’s not necessarily a problem.”

Having multiple security controls at different layers of the kill chain enables us to catch and respond to attackers throughout their attack lifecycle, not just at a static point in the attack.
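To make the overlap principle concrete, the sketch below models three independent controls and shows that a request is still stopped when one of them fails. It is a toy model under stated assumptions: the layer functions, the request fields, and the `/admin` path convention are invented for the example.

```python
def input_validation(request):
    # Reject path traversal attempts.
    return ".." not in request["path"]


def authorization(request):
    # Only admins may reach /admin paths.
    if request["path"].startswith("/admin"):
        return request["role"] == "admin"
    return True


def rate_limit(request):
    # Illustrative threshold of 100 requests per minute.
    return request["requests_last_minute"] <= 100


LAYERS = [
    ("input_validation", input_validation),
    ("authorization", authorization),
    ("rate_limit", rate_limit),
]


def blocked_by(request, layers):
    """Names of the layers that reject the request; an empty list means allowed."""
    return [name for name, check in layers if not check(request)]


attack = {"path": "/admin/../secrets", "role": "guest", "requests_last_minute": 3}
benign = {"path": "/api/items", "role": "guest", "requests_last_minute": 3}

# Simulate decay or bypass of the first control by removing it.
degraded_layers = [layer for layer in LAYERS if layer[0] != "input_validation"]

with_all_layers = blocked_by(attack, LAYERS)
with_one_failed = blocked_by(attack, degraded_layers)
```

With every layer in place the attack is rejected by both input validation and authorization; with input validation removed, authorization still rejects it. Recording which layers fired also gives defenders a signal at each stage rather than at a single point.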
In Dan’s other book, Adversarial Tradecraft in Cybersecurity: Offense vs Defense in Real Time, he goes into the game theory and reaction correspondence of an attacker penetrating layered security technologies, and how either party can react along those kill chains. This is where defense in depth is arguably most helpful. There is a saying in infosec that “attackers only have to be right once, defenders have to be right every time”. The corollary, especially with well-engineered systems, is that “attackers only have to make one mistake that defenders catch”. In Adversarial Tradecraft, Dan shows several techniques by which defenders can catch attackers pivoting through their environment by utilizing defense in depth.

Principle of Least Privilege

This is another core principle we will leverage when designing systems: the fundamental idea of least privilege access. Specifically, I’m thinking of the Principle of Least Privilege (PoLP). As defined by CrowdStrike:

“The principle of least privilege is a computer security concept and practice that gives users limited access rights based on the tasks necessary to their job. POLP ensures only authorized users whose identity has been verified have the necessary permissions to execute jobs within certain systems, applications, data and other assets.”

Essentially, the Principle of Least Privilege tells us that each part of the system should have the minimum permissions necessary to perform its task, and no more. This is a lesson that has been learned time and time again throughout computing. More recently, we can see it clearly in the world of microservices. There, when PoLP is not applied, the result is an increased attack surface, a wider spectrum of errors, and easier privilege escalation throughout the system.

A blog article by Shahzad Bhatti details many common vulnerabilities and recent 0-days that could easily have caused incidents in microservice systems. Right up front, he talks about frameworks, methodologies, and best practices one can apply to reduce the impact of these vulnerabilities.
And unsurprisingly, many of those practices overlap with principles we cover in this chapter, from defense in depth to failing securely, and most relevant to our conversation, the Principle of Least Privilege is right near the top.
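The rule that each part of a system gets only the permissions its task needs can be sketched as a deny-by-default grant table in front of the available actions. The component could be a microservice or an agent calling tools; the tool names, the `GRANTS` table, and the `call_tool` dispatcher below are hypothetical and exist only to illustrate the idea.

```python
class PermissionDenied(Exception):
    """Raised when a component calls a tool it was never granted."""


def read_email(inbox, index):
    return inbox[index]


def delete_email(inbox, index):
    return inbox.pop(index)


# Every tool that exists in the system.
TOOLS = {"read_email": read_email, "delete_email": delete_email}

# Deny by default: a component may call only what is explicitly listed.
# A summarizer needs to read mail, so that is all it is granted.
GRANTS = {"summarizer": {"read_email"}}


def call_tool(component, tool_name, *args):
    if tool_name not in GRANTS.get(component, set()):
        raise PermissionDenied(f"{component} is not granted {tool_name}")
    return TOOLS[tool_name](*args)


inbox = ["quarterly report", "lunch plans"]
first_message = call_tool("summarizer", "read_email", inbox, 0)
```

A call such as `call_tool("summarizer", "delete_email", inbox, 0)` raises `PermissionDenied` and leaves the inbox untouched, so a manipulated or mistaken component cannot exceed the blast radius it was given.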
What this shows is that over time, new and unknown vulnerabilities will appear and affect our systems. That said, we can still reduce the impact and blast radius of those incidents through well-engineered systems. This applies even more to LLM tool use than to microservice architecture. Why? Because of many of the properties we’ve already explored:

- LLMs are probabilistic in nature
- They are far less predictable, even in normal execution (including hallucinations)
- They are susceptible to contextual manipulation and unique classes of injection attacks

The industry has already felt the impact of this. Researchers have had agents delete all their emails due to over-permissioned systems. Agents have even taken down production environments. These are clear examples of permissions that agents shouldn’t have, or that at the very least should be rate-limited and tightly controlled. In response, the industry has noticed this ratcheting increase in tool and agent permissions and has begun building frameworks to limit them. Research like MiniScope, along with frameworks such as AgentScope and NanoClaw, are all steps in this direction.

Supply Chain Attacks

In 2026 we’ve begun seeing an increasing volume of attacks targeting dependency installations, or the software supply chain. Many of the largest attacks in this area were targeted attacks against specific developers who maintained large software repositories. But something novel has also emerged at the intersection of AI-generated code and supply chain attacks. Models hallucinate dependency names so often that attackers have begun registering those names, essentially typosquatting software that was AI hallucinated. Traditional cybersquatting or typosquatting relied on developers mistyping trusted package names. This new type of attack, coined slopsquatting, instead weaponizes hallucinated dependencies. The name is a play on the term slopcode, which is what many people have come to call purely AI-generated code.

All this is to say that reviewing your software bill of materials, from the dependencies used to the API calls programs make, is still a critical step. Now more than ever, we need to use automated scanners to update our packages, monitor for new vulnerabilities, and check our code bases for known weaknesses. As AI coding assistants become more deeply embedded in software development workflows, their errors become part of the attack surface. The software supply chain is a core component of application security and should be explicitly addressed for major projects, especially now that supply chain artifacts are themselves AI generated.

Monitor What We Can’t Control

I like to think of computer systems as a series of dark rooms, or even a series of connected buildings. In these buildings there are tons of boxes of information, but you need access to the rooms and a light to look through all of the boxes. For whatever reason, I tend to think of network scanning and searching file systems like that, albeit at scale and with lots of automation. Likewise, if we are defending these systems, we will want to wire them up so that we know what is going on inside of them. We want to know if someone is sneaking around, or if one of the buildings has caught fire. This is only possible if we are able to observe them remotely, and thus we reach the crux of this principle: observation. Our principle of observation states: “You cannot secure behaviors you cannot observe. Systems require sufficient telemetry to distinguish intended behavior from anomalous behavior.”

One of the best ways to organize computer telemetry, in my opinion, is through a concept known as centralized logging.
Splunk defines it as follows: “Centralized logging consolidates logs from multiple sources into a single system, simplifying monitoring, troubleshooting, and analysis across complex environments.” They extend this definition further, adding, “By providing visibility, log data can help you to enhance reliability, improve performance, and fortify the security of your system’s infrastructure.” Another benefit we tend to get from centralized logging is data normalization: the act of transforming all incoming data into a common schema so that it is easy to reference and join across complex queries.

Logging, and its direct corollary detection engineering, is a core pillar of security engineering. Centralized logging allows an organization’s security teams to better understand incidents, collect evidence, investigate compromise, and respond to active threats. Think of this as the body’s ability to respond to sickness. If you get a fever, there are several alarms that help you identify and fight infection or disease.

In many of these situations, we want to prevent the bad activity from happening in the first place. Unfortunately, that isn’t always possible, or we don’t learn about the bad activity until after the fact, when we are investigating the incident post-mortem. We want prevention, but we need to assume that some failure is inevitable and be ready to respond accordingly. In many ways this is an extension of defense in depth. One of our layered controls should be the ability to monitor the health of the system, or the health of our controls. This lets us know if we are under attack and gives us options to respond.

To take this even further, in security you often can’t outright prevent something, for example when critical customer-facing software has a vulnerability with no issued patch. Sometimes you don’t have the luxury of taking a thing offline or patching it (if such a patch even exists at the time), so you can’t always apply a control to it directly. In those situations, the next best thing is to set up verbose monitoring and logging of its operations to see if you can detect it acting abnormally.
Logging is a fundamental part of incident response (detection), and something we will build into any systems we are engineering. We can also see this as a core part of several common control sets, such as CIS and NIST 800-53.
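Data normalization, as described above, can be illustrated with two log sources that use different field names being mapped into one schema before they reach a central store. This is a minimal sketch: the raw event shapes, the five-field schema, and the normalizer function names are assumptions made for the example, not the format of any particular logging product.

```python
from datetime import datetime, timezone


def normalize_firewall(raw):
    # Hypothetical firewall events carry epoch seconds and a verdict string.
    return {
        "timestamp": datetime.fromtimestamp(raw["ts"], tz=timezone.utc).isoformat(),
        "source": "firewall",
        "actor": raw["src_ip"],
        "action": "connect",
        "outcome": "denied" if raw["verdict"] == "DROP" else "allowed",
    }


def normalize_app(raw):
    # Hypothetical application events already carry ISO 8601 timestamps.
    return {
        "timestamp": raw["time"],
        "source": "app",
        "actor": raw["user"],
        "action": raw["event"],
        "outcome": "allowed" if raw["ok"] else "denied",
    }


events = [
    normalize_firewall({"ts": 1767225600, "src_ip": "10.0.0.5", "verdict": "DROP"}),
    normalize_app({"time": "2026-01-01T00:00:05+00:00", "user": "alice",
                   "event": "login", "ok": False}),
    normalize_app({"time": "2026-01-01T00:00:09+00:00", "user": "bob",
                   "event": "login", "ok": True}),
]

# One query now spans both sources because they share a schema.
denied = [event for event in events if event["outcome"] == "denied"]
```

Once every source shares the same fields, a single filter such as the `denied` query above can correlate a dropped connection with a failed login, which is the kind of join that detection engineering depends on.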