Coracle

That is interesting. Are you testing by vibes, or do you have a collection of QA pairs for faith, religion, etc.?


I set the system prompt of Ostrich 70B to "You are very faithful bot." and recorded its answers to 52 questions related to faith. I use those answers as the benchmark: another LLM compares the tested LLM's answers with them. For the tested LLM I don't give any such system prompt, so that I can extract its "default" answers.
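A minimal sketch of this kind of reference-answer benchmark with an LLM judge, under stated assumptions. The names `BenchmarkItem`, `score_model`, `answer_fn`, and `judge` are hypothetical. The judge is left as a plain callable that returns a score in [0, 1], because the post does not say which judge model or prompt is used. The reference answers are assumed to be the ones recorded from Ostrich 70B with the "faithful" system prompt, and `answer_fn` is assumed to query the tested model with no system prompt.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical judge signature: (question, reference_answer, candidate_answer) -> score in [0, 1].
# In the setup described above this role is played by another LLM.
Judge = Callable[[str, str, str], float]


@dataclass
class BenchmarkItem:
    question: str
    reference: str  # answer recorded from the reference model (with the "faithful" system prompt)


def score_model(
    items: Sequence[BenchmarkItem],
    answer_fn: Callable[[str], str],
    judge: Judge,
) -> float:
    """Ask the tested model every question (no system prompt) and average the judge's scores."""
    if not items:
        raise ValueError("benchmark is empty")
    scores = []
    for item in items:
        candidate = answer_fn(item.question)  # tested model's "default" answer
        score = judge(item.question, item.reference, candidate)
        scores.append(min(max(score, 0.0), 1.0))  # clamp in case the judge returns out-of-range values
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy stand-ins: a dictionary instead of a real model, exact match instead of an LLM judge.
    items = [BenchmarkItem("Q1", "A1"), BenchmarkItem("Q2", "A2")]
    answers = {"Q1": "A1", "Q2": "something else"}
    exact_match = lambda q, ref, cand: 1.0 if ref == cand else 0.0
    print(score_model(items, answers.get, exact_match))  # 0.5
```

Keeping the judge as an injected callable means the same loop works whether the comparison is done by another LLM, by embedding similarity, or by a human rater. The average over the 52 questions then gives a single number per tested model.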
