When LLMs are used to evaluate qualities like the correctness, accuracy, or relevance of a piece of text, consistency is paramount. If an LLM exhibits inconsistent judgements, its evaluations become unreliable and untrustworthy. For example, if an LLM evaluates the reasoning quality of arguments but contradicts itself by rating an invalid argument as more logically sound than a valid one, its assessments can no longer be trusted as a measure of reasoning quality.
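
To make "inconsistent judgements" concrete, the sketch below shows one simple way such contradictions can be surfaced: asking the judge to compare pairs of arguments twice, with the presentation order swapped, and measuring how often its preference survives the swap. This is a minimal illustration, not a reference implementation; the `judge` function and the prompt wording are hypothetical placeholders for an actual LLM call.

```python
from itertools import combinations

# Hypothetical pairwise-comparison prompt for a judge model.
PROMPT = (
    "Which argument is more logically sound?\n\n"
    "Argument A:\n{a}\n\nArgument B:\n{b}\n\n"
    "Answer with exactly one letter: A or B."
)


def judge(prompt: str) -> str:
    """Placeholder for an LLM call; assumed to return 'A' or 'B'."""
    raise NotImplementedError


def positional_consistency(arguments: list[str]) -> float:
    """Fraction of argument pairs judged the same way when their order is swapped.

    A judge that contradicts itself (preferring one argument in the first
    order but the other argument after the swap) lowers this score.
    """
    agreements, total = 0, 0
    for a, b in combinations(arguments, 2):
        first = judge(PROMPT.format(a=a, b=b))    # a presented as "A"
        second = judge(PROMPT.format(a=b, b=a))   # order swapped: b presented as "A"
        # Consistent iff the same underlying argument wins in both orders.
        if (first == "A") == (second == "B"):
            agreements += 1
        total += 1
    return agreements / total if total else 1.0
```

A score near 1.0 indicates the judge's preferences are stable under reordering; a score near 0.5 suggests its verdicts are driven as much by position as by the arguments themselves.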
