- [2025/02] Exploring the Security Threats of Knowledge Base Poisoning in Retrieval-Augmented Code Generation
- [2025/01] DarkMind: Latent Chain-of-Thought Backdoor in Customized LLMs
- [2024/12] CL-attack: Textual Backdoor Attacks via Cross-Lingual Triggers
- [2024/12] Meme Trojan: Backdoor Attacks Against Hateful Meme Detection via Cross-Modal Triggers
- [2024/12] UIBDiffusion: Universal Imperceptible Backdoor Attack for Diffusion Models
- [2024/12] From Allies to Adversaries: Manipulating LLM Tool-Calling through Adversarial Injection
- [2024/12] Gracefully Filtering Backdoor Samples for Generative Large Language Models without Retraining
- [2024/11] Knowledge Database or Poison Base? Detecting RAG Poisoning Attack through LLM Activations
- [2024/11] LoBAM: LoRA-Based Backdoor Attack on Model Merging
- [2024/11] PEFTGuard: Detecting Backdoor Attacks Against Parameter-Efficient Fine-Tuning
- [2024/11] Combinational Backdoor Attack against Customized Text-to-Image Models
- [2024/11] When Backdoors Speak: Understanding LLM Backdoor Attacks Through Model-Generated Explanations
- [2024/10] Unlearning Backdoor Attacks for LLMs with Weak-to-Strong Knowledge Distillation
- [2024/10] Persistent Pre-Training Poisoning of LLMs
- [2024/10] Denial-of-Service Poisoning Attacks against Large Language Models
- [2024/10] PoisonBench: Assessing Large Language Model Vulnerability to Data Poisoning
- [2024/10] ASPIRER: Bypassing System Prompts With Permutation-based Backdoors in LLMs
- [2024/10] Controlled Generation of Natural Adversarial Documents for Stealthy Retrieval Poisoning
- [2024/09] Mitigating Backdoor Threats to Large Language Models: Advancement and Challenges
- [2024/09] CLIBE: Detecting Dynamic Backdoors in Transformer-based NLP Models
- [2024/09] Weak-To-Strong Backdoor Attacks for LLMs with Contrastive Knowledge Distillation
- [2024/09] Obliviate: Neutralizing Task-agnostic Backdoors within the Parameter-efficient Fine-tuning Paradigm
- [2024/09] Understanding Implosion in Text-to-Image Generative Models
- [2024/09] TERD: A Unified Framework for Safeguarding Diffusion Models Against Backdoors
- [2024/09] The Dark Side of Human Feedback: Poisoning Large Language Models via User Inputs
- [2024/09] Exploiting the Vulnerability of Large Language Models via Defense-Aware Architectural Backdoor
- [2024/08] Transferring Backdoors between Large Language Models by Knowledge Distillation
- [2024/08] Compromising Embodied Agents with Contextual Backdoor Attacks
- [2024/08] Scaling Laws for Data Poisoning in LLMs
- [2024/07] Diff-Cleanse: Identifying and Mitigating Backdoor Attacks in Diffusion Models
- [2024/07] EvilEdit: Backdooring Text-to-Image Diffusion Models in One Second
- [2024/07] AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases
- [2024/07] Future Events as Backdoor Triggers: Investigating Temporal Vulnerabilities in LLMs
- [2024/06] "Glue pizza and eat rocks" -- Exploiting Vulnerabilities in Retrieval-Augmented Generative Models
- [2024/06] BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Language Models
- [2024/06] Injecting Bias in Text-To-Image Models via Composite-Trigger Backdoors
- [2024/06] Is poisoning a real threat to LLM alignment? Maybe more so than you think
- [2024/06] CleanGen: Mitigating Backdoor Attacks for Generation Tasks in Large Language Models
- [2024/06] Watch the Watcher! Backdoor Attacks on Security-Enhancing Diffusion Models
- [2024/06] An LLM-Assisted Easy-to-Trigger Backdoor Attack on Code Completion Models: Injecting Disguised Vulnerabilities against Strong Detection
- [2024/06] A Survey of Backdoor Attacks and Defenses on Large Language Models: Implications for Security Measures
- [2024/06] Chain-of-Scrutiny: Detecting Backdoor Attacks for Large Language Models
- [2024/06] BadAgent: Inserting and Activating Backdoor Attacks in LLM Agents
- [2024/06] Invisible Backdoor Attacks on Diffusion Models
- [2024/06] BadRAG: Identifying Vulnerabilities in Retrieval Augmented Generation of Large Language Models
- [2024/06] Are you still on track!? Catching LLM Task Drift with Activations
- [2024/05] Phantom: General Trigger Attacks on Retrieval Augmented Language Generation
- [2024/05] Exploring Backdoor Attacks against Large Language Model-based Decision Making
- [2024/05] TrojFM: Resource-efficient Backdoor Attacks against Very Large Foundation Models
- [2024/05] Certifiably Robust RAG against Retrieval Corruption
- [2024/05] TrojanRAG: Retrieval-Augmented Generation Can Be Backdoor Driver in Large Language Models
- [2024/05] Backdoor Removal for Generative Large Language Models
- [2024/04] Transferring Troubles: Cross-Lingual Transferability of Backdoor Attacks in LLMs with Instruction Tuning
- [2024/04] Human-Imperceptible Retrieval Poisoning Attacks in LLM-Powered Applications
- [2024/04] Talk Too Much: Poisoning Large Language Models under Token Limit
- [2024/04] Typos that Broke the RAG's Back: Genetic Attack on RAG Pipeline by Simulating Documents in the Wild via Low-level Perturbations
- [2024/04] Physical Backdoor Attack can Jeopardize Driving with Vision-Large-Language Models
- [2024/04] Best-of-Venom: Attacking RLHF by Injecting Poisoned Preference Data
- [2024/04] Shortcuts Arising from Contrast: Effective and Covert Clean-Label Attacks in Prompt-Based Learning
- [2024/04] What's in Your "Safe" Data?: Identifying Benign Data that Breaks Safety
- [2024/04] UFID: A Unified Framework for Input-level Backdoor Detection on Diffusion Models
- [2024/03] Generating Potent Poisons and Backdoors from Scratch with Guided Diffusion
- [2024/03] Diffusion Denoising as a Certified Defense against Clean-label Poisoning
- [2024/03] LoRA-as-an-Attack! Piercing LLM Safety Under The Share-and-Play Scenario
- [2024/02] Syntactic Ghost: An Imperceptible General-purpose Backdoor Attacks on Pre-trained Language Models
- [2024/02] On Trojan Signatures in Large Language Models of Code
- [2024/02] WIPI: A New Web Threat for LLM-Driven Web Agents
- [2024/02] VL-Trojan: Multimodal Instruction Backdoor Attacks against Autoregressive Visual Language Models
- [2024/02] Universal Vulnerabilities in Large Language Models: Backdoor Attacks for In-context Learning
- [2024/02] Learning to Poison Large Language Models During Instruction Tuning
- [2024/02] Defending Against Weight-Poisoning Backdoor Attacks for Parameter-Efficient Fine-Tuning
- [2024/02] Acquiring Clean Language Models from Backdoor Poisoned Datasets by Downscaling Frequency Space
- [2024/02] Rapid Adoption, Hidden Risks: The Dual Impact of Large Language Model Customization
- [2024/02] Secret Collusion Among Generative AI Agents
- [2024/02] Test-Time Backdoor Attacks on Multimodal Large Language Models
- [2024/02] PoisonedRAG: Knowledge Poisoning Attacks to Retrieval-Augmented Generation of Large Language Models
- [2024/02] Shadowcast: Stealthy Data Poisoning Attacks Against Vision-Language Models
- [2024/01] Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
- [2023/12] Poisoned ChatGPT Finds Work for Idle Hands: Exploring Developers' Coding Practices with Insecure Suggestions from Poisoned AI Models
- [2023/12] Stealthy and Persistent Unalignment on Large Language Models via Backdoor Injections
- [2023/12] Unleashing Cheapfakes through Trojan Plugins of Large Language Models
- [2023/11] Test-time Backdoor Mitigation for Black-Box Large Language Models with Defensive Demonstrations
- [2023/10] Leveraging Diffusion-Based Image Variations for Robust Training on Poisoned Data
- [2023/10] Large Language Models Are Better Adversaries: Exploring Generative Clean-Label Backdoor Attacks Against Text Classifiers
- [2023/10] PoisonPrompt: Backdoor Attack on Prompt-based Large Language Models
- [2023/10] Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models
- [2023/10] Composite Backdoor Attacks Against Large Language Models
- [2023/09] BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models
- [2023/09] BadEdit: Backdooring Large Language Models by Model Editing
- [2023/09] Universal Jailbreak Backdoors from Poisoned Human Feedback
- [2023/08] LMSanitator: Defending Prompt-Tuning Against Task-Agnostic Backdoors
- [2023/08] The Poison of Alignment
- [2023/07] Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection
- [2023/06] On the Exploitability of Instruction Tuning
- [2023/05] Instructions as Backdoors: Backdoor Vulnerabilities of Instruction Tuning for Large Language Models
- [2023/05] Poisoning Language Models During Instruction Tuning
- [2022/11] Rickrolling the Artist: Injecting Backdoors into Text Encoders for Text-to-Image Synthesis