Simple LLM as a Judge Evaluation
Evaluate your prompt's performance with an LLM acting as a judge, scored on several metrics against example data.
LLM as Judge Evaluation Task
Define the prompt and the data to evaluate. The results shown below are pre-populated.
Evaluation Results
Detailed breakdown of your prompt's performance.
Overall Score
82.75
JSON Parsing Success
97%
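A JSON parsing success figure like the one above could be computed roughly as follows. This is a minimal sketch, assuming the judge returns one raw text response per example and that a response counts as a success when it parses as JSON. The function name `json_parse_success_rate` and the sample responses are hypothetical and not taken from this page.

```python
import json


def json_parse_success_rate(raw_outputs):
    """Return the percentage of judge responses that parse as valid JSON."""
    if not raw_outputs:
        return 0.0
    parsed = 0
    for text in raw_outputs:
        try:
            json.loads(text)
            parsed += 1
        except (json.JSONDecodeError, TypeError):
            # Malformed or missing responses count as parsing failures.
            pass
    return 100.0 * parsed / len(raw_outputs)


# Hypothetical judge responses; the last one is not valid JSON.
sample_outputs = [
    '{"score": 85}',
    '{"score": 72}',
    'Score: 92',
]
rate = json_parse_success_rate(sample_outputs)
print(f"JSON parsing success: {rate:.1f}%")
```

Tracking this rate separately from the score is useful because a response that fails to parse yields no score at all, so a low rate can distort the aggregate.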
[Chart: Scores per Dimension]
[Chart: Score Distribution (Frequency)]
Individual Example Scores
| Example ID / Index | Score |
|---|---|
| Sample Task 1 | 85 |
| Sample Task 2 | 72 |
| Sample Task 3 | 92 |
| Sample Task 4 | 70 |
| Sample Task 5 | 90 |
| Sample Task 6 | 78 |
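The six scores in the table can be summarized with a short script. This is a sketch only: the 10-point bucket width is an assumption, since the page does not say how the score distribution is binned. The plain mean of these six rows comes to about 81.17, which differs from the Overall Score of 82.75 shown earlier; the page does not state how the overall score is derived, so it may be weighted or based on more examples than are listed here.

```python
from collections import Counter
from statistics import mean

# Scores copied from the "Individual Example Scores" table.
scores = {
    "Sample Task 1": 85,
    "Sample Task 2": 72,
    "Sample Task 3": 92,
    "Sample Task 4": 70,
    "Sample Task 5": 90,
    "Sample Task 6": 78,
}

# Unweighted mean across the listed examples.
average = mean(scores.values())

# Frequency of scores per 10-point bucket (bucket width is an assumption).
distribution = Counter((s // 10) * 10 for s in scores.values())

print(f"Mean of listed examples: {average:.2f}")
for low in sorted(distribution):
    print(f"{low}-{low + 9}: {distribution[low]}")
```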