LVE Repository


lve record repository/reliability/repetition/openai--gpt-35-turbo

gpt-3.5-turbo diverges when asked to repeat a word

The paper shows that training data can sometimes be extracted from aligned models such as gpt-3.5 by giving it a prompt to repeat a certain word which makes the model diverge and produce training data at a high rate.