Researchers have devised a novel attack that steals user data by injecting malicious prompts into images that AI systems downscale before delivering them to a large language model.
The technique relies on full-resolution images carrying instructions that are invisible to the human eye but become apparent when resampling algorithms reduce the image's resolution.
Developed by Trail of Bits researchers Kikimora Morozova and Suha Sabi Hussain, the attack builds on a theory presented in a 2020 USENIX paper by TU Braunschweig, a German university, exploring the feasibility of image-scaling attacks in machine learning.
For cost and performance reasons, AI systems automatically downscale user-uploaded images to a lower resolution.
Depending on the system, the resampling step may use nearest neighbor, bilinear, or bicubic interpolation to produce the smaller image.
All of these methods introduce aliasing artifacts that allow hidden patterns to emerge in the downscaled image when the source is specifically crafted to exploit them.
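One way to see what the model actually receives is to reproduce the downscaling step locally. The following minimal sketch (an illustration, not the researchers' tooling) uses Pillow to shrink the same source image with each of the three interpolation methods; the file name and the 512x512 target size are assumptions.

```python
# Minimal sketch: downscale one image with the three common resampling filters
# so the outputs can be compared by eye. Assumes Pillow >= 9.1 for Image.Resampling.
from PIL import Image

SRC = "uploaded.png"          # assumed name of the full-resolution image the user shares
TARGET = (512, 512)           # assumed size a platform might downscale to

FILTERS = {
    "nearest": Image.Resampling.NEAREST,
    "bilinear": Image.Resampling.BILINEAR,
    "bicubic": Image.Resampling.BICUBIC,
}

original = Image.open(SRC).convert("RGB")
for name, resample in FILTERS.items():
    # Each filter samples and weights source pixels differently, which is why a
    # crafted image can hide a pattern that only emerges under one specific method.
    small = original.resize(TARGET, resample=resample)
    small.save(f"downscaled_{name}.png")
```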
In the Trail of Bits example, when a malicious image is processed using bicubic downscaling, certain dark regions of the image glow red, revealing hidden text in black.
The AI model interprets this text as part of the user's instructions and automatically combines it with the legitimate input.
From the user's perspective, nothing looks amiss, but in practice the model executes hidden instructions that can lead to data leakage or other risky actions.
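The trust gap can be summarized in a short sketch: the user reviews the full-resolution image, but the model only ever sees the downscaled copy, where the injected text has become legible. The helper names below are hypothetical placeholders, not a real platform API.

```python
# Illustrative sketch of the preprocessing gap, with hypothetical helper names.
from PIL import Image

def preprocess_for_model(path: str, size=(512, 512)) -> Image.Image:
    """Stand-in for a platform's silent downscaling step (bicubic assumed)."""
    return Image.open(path).convert("RGB").resize(size, Image.Resampling.BICUBIC)

def send_to_llm(image: Image.Image, user_prompt: str) -> None:
    """Hypothetical placeholder for the multimodal API call.
    Any text legible in `image` is read by the model alongside `user_prompt`,
    so instructions revealed by downscaling are indistinguishable from the
    user's own request."""
    ...

user_prompt = "Summarize this screenshot for me."
model_input = preprocess_for_model("uploaded.png")   # not the image the user previewed
send_to_llm(model_input, user_prompt)
```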
In one Gemini CLI example, the researchers were able to exfiltrate Google Calendar data to an arbitrary email address while using Zapier MCP with 'trust=True', which approves tool calls without user confirmation.
Because the attack vector is so widespread, it likely extends well beyond the tools that were tested.
To demonstrate their findings, the researchers also built and published Anamorpher (currently in beta), an open-source tool that can generate crafted images for each of the downscaling methods mentioned above.
As mitigation and defense, the Trail of Bits researchers recommend that AI systems restrict the dimensions of uploaded images. If downscaling is necessary, they advise showing users a preview of the exact output delivered to the large language model (LLM).
They also argue that sensitive tool calls should require explicit user confirmation, particularly when text is detected in an image.
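A minimal sketch of those recommendations might look like the following; the size limits and helper names are assumptions, and a real system would wire these checks into its own upload and tool-call paths.

```python
# Sketch of the recommended mitigations: dimension limits on upload, a preview of
# the exact downscaled pixels sent to the model, and explicit approval for
# sensitive tool calls. Limits and names are assumed for illustration.
from PIL import Image

MAX_DIM = 1024            # assumed upper bound on uploaded image dimensions
MODEL_SIZE = (512, 512)   # assumed size the platform downscales to

def accept_upload(path: str) -> Image.Image:
    img = Image.open(path).convert("RGB")
    if max(img.size) > MAX_DIM:
        raise ValueError(f"Image exceeds {MAX_DIM}px limit; refusing upload")
    return img

def preview_model_input(img: Image.Image) -> Image.Image:
    # Show the user the exact pixels the LLM will receive, not the original upload.
    small = img.resize(MODEL_SIZE, Image.Resampling.BICUBIC)
    small.save("preview_sent_to_llm.png")
    return small

def confirm_tool_call(tool: str, args: dict, image_has_text: bool) -> bool:
    # Sensitive tool calls require explicit approval; warn more loudly when
    # text was detected in the uploaded image (detection itself is out of scope here).
    warning = " (text detected in image!)" if image_has_text else ""
    answer = input(f"Model wants to call {tool} with {args}{warning}. Allow? [y/N] ")
    return answer.strip().lower() == "y"
```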
"The strongest defense, however, is to implement secure design patterns and systematic defenses that mitigate impactful prompt injection beyond multi-modal prompt injection," the researchers note, citing a June paper on design patterns for building LLMs that can resist prompt injection attacks.