This list is by no means exhaustive, so please feel free to contribute any useful resources you find.
Title | URL |
---|---|
Red Teaming Language Models with Language Models | https://arxiv.org/abs/2202.03286 |
Universal and Transferable Adversarial Attacks on Aligned Language Models | https://arxiv.org/abs/2307.15043 |
Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models | https://arxiv.org/abs/2309.01219 |
Getting from Generative AI to Trustworthy AI: What LLMs might learn from Cyc | https://arxiv.org/abs/2308.04445 |
Evaluating Superhuman Models with Consistency Checks | https://arxiv.org/abs/2306.09983 |
Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks | https://arxiv.org/abs/2307.02477 |
Not what you’ve signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection | https://i.blackhat.com/BH-US-23/Presentations/US-23-Greshake-Not-what-youve-signed-up-for-whitepaper.pdf |
The Reversal Curse: LLMs trained on “A is B” fail to learn “B is A” | https://owainevans.github.io/reversal_curse.pdf |
In ChatGPT We Trust? Measuring and Characterizing the Reliability of ChatGPT | https://arxiv.org/abs/2304.08979 |
Title | URL |
---|---|
Compromising LLMs: The Advent of AI Malware | https://www.blackhat.com/us-23/briefings/schedule/index.html#compromising-llms-the-advent-of-ai-malware-33075 |
Title | URL |
---|---|
Evaluating Language Model Bias with 🤗 Evaluate | https://huggingface.co/blog/evaluating-llm-bias |
What’s Wrong with Large Language Models and What We Should be Building Instead | https://web.engr.oregonstate.edu/~tgd/talks/dietterich-fixing-llms-valgrai-2023.pdf |
Dataset | URL |
---|---|
Anthropic HH-RLHF | https://huggingface.co/datasets/Anthropic/hh-rlhf |
BOLD | https://huggingface.co/datasets/AlexaAI/bold |
LLMonitor | https://benchmarks.llmonitor.com/ |
Name | URL |
---|---|
r/ChatGPT | https://www.reddit.com/r/ChatGPT/ |
r/ChatGPTPromptGenius | https://www.reddit.com/r/ChatGPTPromptGenius/ |
Name | URL |
---|---|
LLM Security | https://llmsecurity.net/ |
LVEs for multi-modal models are not currently supported.
Name | URL |
---|---|
Abusing Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs | https://arxiv.org/abs/2307.10490 |
Optical Illusions | https://x.com/fabianstelzer/status/1717131235644875024?s=46&t=OVkczsEQn03hzrb1v431AA |