🤔 Just How Easy Is It to Poison an LLM?

Much has already been said about LLMs’ tendency to “hallucinate” when asked questions they can’t answer properly, about the biases AIs inherit from their training sets, and about LLMs being trained on malicious datasets. The team at Mithril Security took a different approach: it demonstrated how easy it is to poison an existing pre-trained LLM with surgical precision, using a technique called Rank-One Model Editing. In a nutshell, the approach edits the pre-trained model so that it responds to specific prompts (in the example, “Who was the first man on the moon?”) with false information. The modified model is then uploaded to a public repository (Hugging Face) and thus spread to the wider community. Mithril Security’s work highlights the importance of verifying the provenance of your models, and of the models your vendors use. A whole new set of headaches to add to your AI implementation strategy.
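
One small piece of the provenance puzzle can be sketched in code. The snippet below is not Mithril Security’s method; it is a minimal illustration that assumes the model’s publisher shares a SHA-256 digest of the weights file through a channel you trust. The helper names `sha256_of_file` and `weights_match` are hypothetical.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large weight files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def weights_match(path: Path, expected_hex: str) -> bool:
    """Return True only if the file's digest equals the trusted digest.

    `expected_hex` is assumed to come from the publisher via a trusted channel.
    """
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())
```

A matching digest only shows that the file you downloaded is the one the publisher released. It says nothing about whether the publisher’s own copy was edited beforehand, which is the harder problem this story points at.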