Compare multiple evaluation runs side-by-side
Compares metrics across 2 to 5 evaluation runs to show improvements or regressions. The first run_id in the list serves as the baseline, and differences for the other runs are calculated against it.
Valiqor API key. Pass it as a Bearer token in the Authorization header: Authorization: Bearer vq-xxxxxxxxxxxxxxxxxxxx
List of run IDs to compare (2 to 5 runs). The first ID is the baseline.
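
The baseline rule described above can be illustrated with a small local sketch. It does not call the Valiqor API. The function names, the metric names, and the choice of a plain subtraction (run value minus baseline value) are assumptions made for illustration, and the actual response shape and difference formula may differ.

```python
from typing import Dict, List

# Limits taken from the parameter description: 2 to 5 run IDs.
MIN_RUNS = 2
MAX_RUNS = 5


def auth_header(api_key: str) -> Dict[str, str]:
    """Build the Authorization header in the documented Bearer format."""
    return {"Authorization": f"Bearer {api_key}"}


def compare_runs(
    run_ids: List[str],
    metrics_by_run: Dict[str, Dict[str, float]],
) -> Dict[str, Dict[str, float]]:
    """Hypothetical helper: diff each run's metrics against the first run.

    Assumes a difference is (run value - baseline value). Metrics missing
    from either side are skipped rather than guessed.
    """
    if not MIN_RUNS <= len(run_ids) <= MAX_RUNS:
        raise ValueError(
            f"expected {MIN_RUNS} to {MAX_RUNS} run IDs, got {len(run_ids)}"
        )

    # The first run ID is the baseline for every comparison.
    baseline = metrics_by_run[run_ids[0]]

    diffs: Dict[str, Dict[str, float]] = {}
    for run_id in run_ids[1:]:
        metrics = metrics_by_run[run_id]
        diffs[run_id] = {
            name: metrics[name] - baseline[name]
            for name in baseline
            if name in metrics
        }
    return diffs
```

With this sketch, reordering run_ids changes the result, because whichever ID comes first becomes the reference point for all the others.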