LVE Repository

Resources for finding LVEs

The resources below can help you get started with finding LVEs.

The list is by no means exhaustive, so please feel free to contribute any useful resources you find.

Research papers

Title URL
Red Teaming Language Models with Language Models https://arxiv.org/abs/2202.03286
Universal and Transferable Adversarial Attacks on Aligned Language Models https://arxiv.org/abs/2307.15043
Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models https://arxiv.org/abs/2309.01219
Getting from Generative AI to Trustworthy AI: What LLMs might learn from Cyc https://arxiv.org/abs/2308.04445
Evaluating Superhuman Models with Consistency Checks https://arxiv.org/abs/2306.09983
Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks https://arxiv.org/abs/2307.02477
Not what you’ve signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection https://i.blackhat.com/BH-US-23/Presentations/US-23-Greshake-Not-what-youve-signed-up-for-whitepaper.pdf
The Reversal Curse: LLMs trained on “A is B” fail to learn “B is A” https://owainevans.github.io/reversal_curse.pdf
In ChatGPT We Trust? Measuring and Characterizing the Reliability of ChatGPT https://arxiv.org/abs/2304.08979

Talks

Title URL
Compromising LLMs: The Advent of AI Malware https://www.blackhat.com/us-23/briefings/schedule/index.html#compromising-llms-the-advent-of-ai-malware-33075

Blog posts

Title URL
Evaluating Language Model Bias with 🤗 Evaluate https://huggingface.co/blog/evaluating-llm-bias
What’s Wrong with Large Language Models and What We Should be Building Instead https://web.engr.oregonstate.edu/~tgd/talks/dietterich-fixing-llms-valgrai-2023.pdf

Existing datasets

Dataset URL
Anthropic HH-RLHF https://huggingface.co/datasets/Anthropic/hh-rlhf
BOLD https://huggingface.co/datasets/AlexaAI/bold
LLMonitor https://benchmarks.llmonitor.com/

Social media

Name URL
r/ChatGPT https://www.reddit.com/r/ChatGPT/
r/ChatGPTPromptGenius https://www.reddit.com/r/ChatGPTPromptGenius/

Websites

Name URL
LLM Security https://llmsecurity.net/

Multi-Modal

Currently, LVEs for multi-modal models are not supported.

Name URL
Abusing Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs https://arxiv.org/abs/2307.10490
Optical Illusions https://x.com/fabianstelzer/status/1717131235644875024?s=46&t=OVkczsEQn03hzrb1v431AA