dd-llmo-eval-bootstrap

testing

Analyzes production LLM traces from Datadog and generates ready-to-use evaluator code using the Datadog Evals SDK.

Setup & Installation

npx skills add https://github.com/datadog-labs/dd-llmo-eval-bootstrap --skill dd-llmo-eval-bootstrap
or paste the link and ask your coding assistant to install it
https://github.com/datadog-labs/dd-llmo-eval-bootstrap

What This Skill Does

Analyzes production LLM traces from Datadog and generates ready-to-use evaluator code using the Datadog Evals SDK. It samples real traffic, identifies quality dimensions worth measuring, and outputs BaseEvaluator subclasses or LLMJudge instances you can plug into LLM Experiments. Writing evaluators from scratch means guessing which quality dimensions matter; this skill instead samples actual production traces and proposes evals grounded in evidence, so you start from real behavior rather than assumptions.
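To make the output concrete, here is a minimal sketch of the kind of evaluator the skill generates. Note the hedging: `BaseEvaluator`, `EvalResult`, the `evaluate` signature, and the `AnswerCompleteness` dimension are illustrative stand-ins chosen for this example, not the actual Datadog Evals SDK API; the real generated code would subclass the SDK's own base class.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    """Illustrative result type: a 0.0-1.0 score plus the evidence behind it."""
    score: float
    reason: str


class BaseEvaluator:
    """Stand-in for the SDK base class a generated evaluator would subclass."""
    def evaluate(self, input_text: str, output_text: str) -> EvalResult:
        raise NotImplementedError


class AnswerCompleteness(BaseEvaluator):
    """Example quality dimension: does the output address the input at all?"""
    def evaluate(self, input_text: str, output_text: str) -> EvalResult:
        if not output_text.strip():
            return EvalResult(score=0.0, reason="empty output")
        # Naive heuristic: fraction of input keywords echoed in the output.
        keywords = {w.lower() for w in input_text.split() if len(w) > 3}
        if not keywords:
            return EvalResult(score=1.0, reason="no keywords to check")
        hits = sum(1 for w in keywords if w in output_text.lower())
        return EvalResult(
            score=hits / len(keywords),
            reason=f"{hits}/{len(keywords)} input keywords covered",
        )


evaluator = AnswerCompleteness()
result = evaluator.evaluate(
    "Explain trace sampling",
    "Trace sampling selects a subset of requests.",
)
print(round(result.score, 2))  # → 0.67
```

In practice the skill would propose dimensions like this one based on patterns it observes in sampled traces, and an LLM-judge variant would replace the keyword heuristic with a model call.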

When to use it

  • Bootstrapping evaluators for an LLM app that already sends traces to Datadog
  • Deciding which quality dimensions to measure, grounded in real production traffic
  • Generating BaseEvaluator subclasses or LLMJudge instances for LLM Experiments