Simple Ground Truth LLM Evaluation
Evaluate your prompt's performance by comparing its output against a provided ground truth dataset.
Ground Truth Evaluation Task
Define the prompt and upload ground truth data for evaluation. Results are pre-populated.
Ground Truth Evaluation Results
Detailed breakdown of your prompt's performance against ground truth.
Overall Score
88.20
Ground Truth JSON Parsing Success Rate
99%
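The parsing rate above reflects how often a model response could be read as valid JSON before scoring. The page does not show how parsing is done; the sketch below is a hypothetical lenient parser (the function name `parse_model_json` is an assumption, not part of this tool) that tolerates code fences and surrounding prose:

```python
import json
import re

def parse_model_json(raw: str):
    """Hypothetical lenient parser: try strict JSON first, then fall
    back to the first {...} block in the text. Returns None on failure."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    # Fall back to the first brace-delimited block (handles ```json fences
    # and explanatory prose around the payload).
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match:
        try:
            return json.loads(match.group(0))
        except json.JSONDecodeError:
            return None
    return None
```

Responses that still fail to parse would count against the rate shown above.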
Scores per Dimension
Score Distribution (Frequency)
Individual Example Scores
| Ground Truth Item | Score |
|---|---|
| Ground Truth Item 1 | 90 |
| Ground Truth Item 2 | 85 |
| Ground Truth Item 3 | 95 |
| Ground Truth Item 4 | 80 |
| Ground Truth Item 5 | 96 |
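As a sanity check, the per-item scores can be aggregated directly. A minimal sketch, assuming a simple unweighted mean; note the result (89.20) differs from the reported Overall Score (88.20), which suggests the dashboard may apply weighting or fold in per-dimension scores not listed in this table:

```python
# Per-item scores copied from the table above.
scores = {
    "Ground Truth Item 1": 90,
    "Ground Truth Item 2": 85,
    "Ground Truth Item 3": 95,
    "Ground Truth Item 4": 80,
    "Ground Truth Item 5": 96,
}

# Unweighted mean of the individual example scores.
overall = sum(scores.values()) / len(scores)
print(f"Mean per-item score: {overall:.2f}")  # prints 89.20
```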