How Scoring Works
The Methodology
Every day, we ask an AI the same question 100 times and record every answer it gives. We then count how often each unique answer appears.
Here's the key: while the question stays the same, the AI is given a slightly different role or persona each time (like "helpful assistant", "knowledgeable expert", "casual friend", etc.). These roles act as "seeds" that subtly influence how the AI answers the question, creating natural variation in responses. This mimics how different people might answer the same question differently based on their perspective.
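The collection loop described above might look roughly like the sketch below. It is an illustration, not the actual pipeline: ask_model is a hypothetical stand-in that draws a random answer instead of calling a real AI, the persona list contains only the three roles named above, and cycling through the personas in order is an assumption.

```python
import random
from collections import Counter

# Only these three personas are named in the text; the full list is unknown.
PERSONAS = ["helpful assistant", "knowledgeable expert", "casual friend"]


def ask_model(persona: str, question: str, rng: random.Random) -> str:
    """Hypothetical stand-in for a real model call.

    A real implementation would send the persona and question to the AI.
    This stub ignores both and draws from made-up weights.
    """
    answers = ["Labrador Retriever", "Golden Retriever", "German Shepherd"]
    weights = [5, 3, 2]  # illustrative only
    return rng.choices(answers, weights=weights, k=1)[0]


def collect_answers(question: str, n_asks: int = 100, seed: int = 0) -> Counter:
    """Ask the same question n_asks times and tally each unique answer."""
    rng = random.Random(seed)
    tally = Counter()
    for i in range(n_asks):
        # Assumption: personas are cycled in order, one per ask.
        persona = PERSONAS[i % len(PERSONAS)]
        tally[ask_model(persona, question, rng)] += 1
    return tally


tally = collect_answers("Name a type of dog breed")
for answer, count in tally.most_common():
    print(f"{answer} - {count} times")
```

Because every ask produces exactly one answer, the counts always add up to the number of asks (100 here).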
Example: "Name a type of dog breed"
When we ask the AI this question 100 times, we might get results like:
- Labrador Retriever - 31 times (31 points)
- Golden Retriever - 24 times (24 points)
- German Shepherd - 18 times (18 points)
- Bulldog - 12 times (12 points)
- Poodle - 8 times (8 points)
- Beagle - 7 times (7 points)
Your Score = Answer Frequency
Your score is simply how many times the AI gave that same answer out of 100 attempts.
- If you guess "Labrador Retriever" → You get 31 points
- If you guess "Golden Retriever" → You get 24 points
- If you guess "Chihuahua" (AI never said this) → You get 0 points
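As a rough sketch, the scoring rule is a lookup in the tally from the example, with 0 points for any answer the AI never gave. The function name score and the matching rule (ignoring case and surrounding spaces) are assumptions made for illustration; the real game may compare answers differently.

```python
from typing import Mapping

# Tally taken from the dog breed example above.
EXAMPLE_TALLY = {
    "Labrador Retriever": 31,
    "Golden Retriever": 24,
    "German Shepherd": 18,
    "Bulldog": 12,
    "Poodle": 8,
    "Beagle": 7,
}


def score(guess: str, tally: Mapping[str, int]) -> int:
    """Return how many times the AI gave this answer (0 if it never did)."""
    # Assumption: matching ignores case and leading/trailing whitespace.
    normalized = {answer.casefold(): count for answer, count in tally.items()}
    return normalized.get(guess.strip().casefold(), 0)


print(score("Labrador Retriever", EXAMPLE_TALLY))  # 31
print(score("Golden Retriever", EXAMPLE_TALLY))    # 24
print(score("Chihuahua", EXAMPLE_TALLY))           # 0
```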
Strategy
The goal is to think like the AI. What would be the most common, obvious, or statistically likely answer? The more frequently the AI gives your answer, the higher your score!
Key Points
- Each question is asked to the AI 100 times
- Your score = how many times (out of 100) the AI gave your answer
- Maximum possible score per question: 100 points (only if the AI gives the same answer all 100 times)
- Think like the AI to maximize your score
- The AI's most common answer isn't always the "correct" answer - it's the most statistically likely one
Why This System?
This scoring system captures how AI models work: they don't give a single deterministic answer, but instead sample from a probability distribution over possible answers. By using different roles/personas as "seeds" and asking 100 times, we reveal an approximation of this underlying distribution and challenge you to predict it!
The variation in roles helps create a more interesting distribution of answers - just like polling 100 different people with different backgrounds would give you varied responses. Your job is to predict which answer appears most frequently.