Simple Ground Truth LLM Evaluation

Evaluate your prompt's performance by comparing its output against a provided ground truth dataset.

Ground Truth Evaluation Task
Define the prompt and upload ground truth data for evaluation. The results below are pre-populated with example data.
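A ground truth dataset is typically a list of input/expected-output pairs, often uploaded as JSONL (one JSON object per line). A minimal sketch of that format — the field names `input` and `expected` are assumptions, not a required schema:

```python
import json

# Hypothetical ground truth entries: each pairs a prompt input
# with the output the model is expected to produce.
ground_truth = [
    {"input": "What is 2 + 2?", "expected": "4"},
    {"input": "What is the capital of France?", "expected": "Paris"},
]

# Serialize to JSONL for upload: one compact JSON object per line.
jsonl = "\n".join(json.dumps(item) for item in ground_truth)
print(jsonl.splitlines()[0])
```

Each uploaded item then becomes one row in the per-item results table further down.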

Ground Truth Evaluation Results
Detailed breakdown of your prompt's performance against ground truth.

Overall Score: 88.20

Ground Truth JSON Parsing Rate: 99%
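The JSON parsing rate measures how often the model's raw output is valid JSON before any scoring happens; items that fail to parse cannot be compared against ground truth at all. A minimal sketch of how such a rate could be computed (the function name is illustrative):

```python
import json

def json_parse_rate(outputs):
    """Return the fraction of raw model outputs that parse as valid JSON."""
    parsed = 0
    for text in outputs:
        try:
            json.loads(text)
            parsed += 1
        except json.JSONDecodeError:
            pass  # malformed output; excluded from downstream scoring
    return parsed / len(outputs) if outputs else 0.0

outputs = ['{"answer": "4"}', '{"answer": "Paris"}', 'not json', '[1, 2]']
print(f"{json_parse_rate(outputs):.0%}")  # → 75%
```

A 99% rate like the one shown above means roughly 1 in 100 outputs was dropped before scoring.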

Scores per Dimension (chart)

Score Distribution by Frequency (chart)

Individual Example Scores

Ground Truth Item ID / Index    Score
Ground Truth Item 1             90
Ground Truth Item 2             85
Ground Truth Item 3             95
Ground Truth Item 4             80
Ground Truth Item 5             96
Scores for individual items compared against ground truth.
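The overall score is an aggregate of these per-item scores; a plain mean is one common choice. A sketch using the five example items above (note that the dashboard's overall score of 88.20 is presumably computed over the full dataset, not only the rows shown here):

```python
# Per-item scores from the five example rows above.
scores = [90, 85, 95, 80, 96]

# Simple mean as the aggregate; other aggregations (median,
# dimension-weighted averages) are equally possible.
overall = sum(scores) / len(scores)
print(f"Overall (these items only): {overall:.2f}")  # → Overall (these items only): 89.20
```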