Resources for finding LVEs

This page lists some resources to help you get started with finding LVEs.

The list is by no means exhaustive, so please feel free to contribute any useful resources you find.

Research papers

Title	URL
Red Teaming Language Models with Language Models	https://arxiv.org/abs/2202.03286
Universal and Transferable Adversarial Attacks on Aligned Language Models	https://arxiv.org/abs/2307.15043
Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models	https://arxiv.org/abs/2309.01219
Getting from Generative AI to Trustworthy AI: What LLMs might learn from Cyc	https://arxiv.org/abs/2308.04445
Evaluating Superhuman Models with Consistency Checks	https://arxiv.org/abs/2306.09983
Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks	https://arxiv.org/abs/2307.02477
Not what you’ve signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection	https://i.blackhat.com/BH-US-23/Presentations/US-23-Greshake-Not-what-youve-signed-up-for-whitepaper.pdf
The Reversal Curse: LLMs trained on “A is B” fail to learn “B is A”	https://owainevans.github.io/reversal_curse.pdf
In ChatGPT We Trust? Measuring and Characterizing the Reliability of ChatGPT	https://arxiv.org/abs/2304.08979

Talks

Title	URL
Compromising LLMs: The Advent of AI Malware	https://www.blackhat.com/us-23/briefings/schedule/index.html#compromising-llms-the-advent-of-ai-malware-33075

Blog posts

Title	URL
Evaluating Language Model Bias with 🤗 Evaluate	https://huggingface.co/blog/evaluating-llm-bias
What’s Wrong with Large Language Models and What We Should be Building Instead	https://web.engr.oregonstate.edu/~tgd/talks/dietterich-fixing-llms-valgrai-2023.pdf

Existing Datasets

Dataset	URL
Anthropic HH-RLHF	https://huggingface.co/datasets/Anthropic/hh-rlhf
BOLD	https://huggingface.co/datasets/AlexaAI/bold
LLMonitor	https://benchmarks.llmonitor.com/

Communities

Name	URL
r/ChatGPT	https://www.reddit.com/r/ChatGPT/
r/ChatGPTPromptGenius	https://www.reddit.com/r/ChatGPTPromptGenius/

Websites

Name	URL
LLM Security	https://llmsecurity.net/

Multi-modal models

LVEs for multi-modal models are not currently supported.

Name	URL
Abusing Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs	https://arxiv.org/abs/2307.10490
Optical Illusions	https://x.com/fabianstelzer/status/1717131235644875024?s=46&t=OVkczsEQn03hzrb1v431AA