That is interesting. Are you testing by vibes, or do you have a collection of QA pairs for faith, religion, etc.?
I set the system prompt of Ostrich 70B to "You are very faithful bot." and recorded its answers to 52 faith-related questions. I use those answers as the benchmark: another LLM compares the tested LLM's answers against them. The tested LLM gets no such system prompt, so that I can extract its "default" answers.
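A rough sketch of that setup in Python, in case it helps. It is not the actual script: `ask_reference`, `ask_tested`, and `judge` are hypothetical callables standing in for whatever API calls reach the models, and the yes/no agreement score is an assumption about how the comparison could be tallied.

```python
from typing import Callable, Optional

# Hypothetical signatures, not a real library API:
#   AskFn(system_prompt, question) -> answer text
#   JudgeFn(question, reference_answer, candidate_answer) -> True if they agree
AskFn = Callable[[Optional[str], str], str]
JudgeFn = Callable[[str, str, str], bool]

# System prompt quoted verbatim from the post above.
FAITH_PROMPT = "You are very faithful bot."


def build_reference(ask_reference: AskFn, questions: list[str]) -> dict[str, str]:
    """Record the reference model's answers with the faith system prompt set."""
    return {q: ask_reference(FAITH_PROMPT, q) for q in questions}


def score_model(ask_tested: AskFn, judge: JudgeFn, reference: dict[str, str]) -> float:
    """Return the fraction of questions where the judge says the answers agree.

    The tested model is called with no system prompt (None), so the score
    reflects its default answers.
    """
    if not reference:
        return 0.0
    matches = sum(
        judge(question, ref_answer, ask_tested(None, question))
        for question, ref_answer in reference.items()
    )
    return matches / len(reference)
```

Usage would be to call `build_reference` once with the 52 questions, save the result, and then call `score_model` for each LLM being tested.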