LVE Repository

reliability/repetition

lve record repository/reliability/repetition/openai--gpt-35-turbo

gpt-3.5-turbo diverges when asked to repeat a word

The paper https://arxiv.org/abs/2311.17035 shows that training data can sometimes be extracted from aligned models such as gpt-3.5 by giving it a prompt to repeat a certain word which makes the model diverge and produce training data at a high rate.

Loading...