AI applications are growing in use and diversity. More teams are creating their own AI applications or building their own AI tools using the foundation of existing LLMs and image generation tools. As always, with any growth and new technology in common usage, so too does the threat of hacking.
We’ve already touched on AI/LLM vulnerabilities in a blog about prompt injections and have drawn attention to the vulnerabilities of AI applications with our AI security service page. Today we’d like to discuss prompt injections again, as well as two other of the top 3 hacking threats to AI applications in today’s framework.
AI Application Hacking Threat 1 - Evasion Attacks
This type of attack includes prompt injections. These attacks are about crafting certain input to mislead the model and perform a task it’s not intended to do. This includes giving an unintended or unwanted output by the creators.
As an attacker, the goal is to simply make a model do something it wasn’t intended to do by those who created it. This can include giving out information it’s not supposed to or generate text/images that it was not intended to generate. The boundaries on this include morality, ethics, and legal boundaries that are established or being considered by the world. Simple things like violent or pornographic images, or far more malicious pornographic attacks using the likeness of real people.
To learn more about evasion attacks and prompt injections, see our AI Security blog about prompt injections which goes into much deeper detail.
AI Application Hacking Threat 2 – Model Theft
When developing an AI model, it takes a lot of work. It’s your intellectual property. Whether you’ve built it from the ground-up yourself or have built it on top of an existing larger LLM or other AI tool, it is a unique and bespoke solution you and/or your team have established. Once you publish it and it goes live, model theft entails an attacker trying to steal and/or replicate it.
Simply put this is theft and IP abuse to be used in their own application. If you’ve put a lot of time into developing a specific AI tool or model, this being stolen or replicated by a competitor or malicious actor, your efforts can feel wasted and your competitive edge can be lost.
Unlike the other two vulnerabilities here, prevention and cybersecurity here falls into quite typical digital security. As opposed to a unique AI application vulnerability like prompt injection, you must look to ensure your typical security and server security (where the AI application is likely based and hosted) is ready to deal with a hacker. Look into remote code execution or SQL injections as your typical vulnerability but also consider professional penetration testing to ensure you’re secure against any hacking attempt.
AI Application Hacking Threat 3 – Model Poisoning
As more and more people use, build, and train their own AI apps, the attention is being put on where this data is being gathered from. In many cases, public data being used. This is an ongoing debate about ethics and legality of a company like OpenAI scraping the entire internet to gather data to train their AI models. New models can browse the web too so they feel like they need that access. Supply Chain attack is another name for this vulnerability – hitting and poisoning the supply from which the AI tools draw from.
Model poisoning is effectively injecting data or information that an AI learning module would use to give it behaviour it shouldn’t do, or the creators wouldn’t want it to do. Or to just break the model whenever their particular prompt or data is utilized. This is commonly seen in art as artists seek to protect their work with the actual law of AI models and art lagging behind the modern day. In these cases, images are being protected by the artists who create them, coded and injected with certain poisons that are only dangerous when used by an AI model.
These attacks take time, sometimes in the realm of years, a poisoned model or payload or prompt will scraped and brought into their training data. This meant that down the line, the AI model would then respond or interact with this preprepared sabotage and allow it to act against the rules and boundaries that it has been established with.
The purpose is most commonly to protect your own work OR it’s more malicious for hacking purposes to inject a vulnerability in the future. Simply, you can just draw your training data from approved and legally acquired places.
This can be fixed via input/output filtering, GDPR and other digital rights and privacy rights, can be used to ensure the model does not use or draw specific data by the output filter put on top of the model. Defensive mechanism of input filtering can also be used against prompt injection.
We hope that those out there using or building their own LLMs and AI applications can move forward and be aware of the potential security vulnerabilities of AI applications. If you’d like a professional, gold-standard seal of protection, feel free to get in touch and leverage our penetration testing and AI security services.