AI benchmarking platform is helping top companies rig their models' performance, study claims



The go-to benchmark for artificial intelligence (AI) chatbots is facing scrutiny from researchers who claim that its tests favor proprietary AI models from big tech companies.

LM Arena effectively places two unidentified large language models (LLMs) in a battle to see which can best tackle a prompt, with users of the benchmark voting for the output they like most. The results are then fed into a leaderboard that tracks which models perform the best and how they have improved.
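LM Arena has described computing its rankings from these pairwise votes using Elo-style (Bradley-Terry) ratings. The Python sketch below illustrates the general idea of turning head-to-head votes into a leaderboard; the function, model names, and K-factor are hypothetical and not the platform's actual implementation.

```python
# Illustrative sketch only (not LM Arena's code): an Elo-style update,
# a common way to convert pairwise votes into a ranked leaderboard.

def elo_update(rating_a: float, rating_b: float, a_wins: bool, k: float = 32.0):
    """Return updated ratings for models A and B after one head-to-head vote."""
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
    score_a = 1.0 if a_wins else 0.0
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return new_a, new_b

# Hypothetical example: two anonymized models start at 1000 points;
# each vote records whether users preferred model A's answer.
ratings = {"model_a": 1000.0, "model_b": 1000.0}
votes = [True, True, False]
for a_wins in votes:
    ratings["model_a"], ratings["model_b"] = elo_update(
        ratings["model_a"], ratings["model_b"], a_wins
    )
print(sorted(ratings.items(), key=lambda kv: kv[1], reverse=True))
```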


