What Gets Measured
Tandemu captures metrics from two sources: your ticket system (tasks, status, assignments) and Claude Code telemetry (sessions, code changes, friction events). Everything is derived from real activity — nothing is estimated or self-reported.
Task metrics
These come from the /morning → /finish lifecycle.
Cycle time
The wall-clock time between starting a task (/morning) and completing it (/finish). This is the real lead time for a unit of work — no estimation, no story points.
What it tells you: How long tasks actually take. Compare across task types (bug fix vs feature), across team members, or over time to spot trends.
What it doesn’t tell you: Whether the time was spent efficiently. A 4-hour cycle time could be 3 hours of productive work and 1 hour of meetings, or 4 hours of deep focus. Tandemu measures elapsed time, not quality of attention.
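Under the hood this is a simple timestamp difference. A minimal sketch (the function and argument names are illustrative, not Tandemu's actual schema):

```python
from datetime import datetime, timedelta

def cycle_time(started_at: datetime, finished_at: datetime) -> timedelta:
    """Wall-clock time between /morning (task start) and /finish (completion)."""
    return finished_at - started_at

# A task started at 9:15 and finished at 13:15 has a 4-hour cycle time.
print(cycle_time(datetime(2024, 5, 6, 9, 15), datetime(2024, 5, 6, 13, 15)))  # 4:00:00
```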
Tasks completed
Count of /finish calls with status “completed” per developer, per day or week.
What it tells you: Team throughput. When combined with cycle time, it shows whether the team is delivering quickly or just starting a lot of tasks.
AI-to-manual code ratio
When a task is finished, Tandemu calculates the AI-to-manual ratio using two tiers of attribution:
- Native OTEL data (preferred) — Claude Code emits per-file AI line counts via OpenTelemetry. This is the most accurate method, giving exact AI vs manual attribution at the file level.
- Co-Authored-By fallback — When OTEL data isn’t available, Tandemu uses `Co-Authored-By: Claude` commit tags with proportional attribution by commit ratio.
| Ratio | Interpretation |
|---|---|
| 0–20% | AI is barely being used. Developers may need training or better prompts. |
| 20–60% | Healthy mix. Developers are using AI for implementation and writing critical code manually. |
| 60–90% | Heavy AI usage. Worth checking that code quality and test coverage are keeping up. |
| 90%+ | Almost entirely AI-generated. Review processes should be extra rigorous. |
What it tells you: How much your team is leveraging AI as a tool.
What it doesn’t tell you: Whether the AI-generated code is good. A high ratio with low friction is a positive signal. A high ratio with high friction means the AI is generating code that doesn’t work well.
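The two-tier attribution described above can be sketched roughly as follows. The function names, argument shapes, and the fallback arithmetic are illustrative assumptions, not Tandemu's actual implementation:

```python
def ai_ratio(otel_ai_lines=None, otel_total_lines=None,
             coauthored_commits=0, total_commits=0):
    """AI-to-manual ratio in [0, 1], preferring native OTEL line counts."""
    # Tier 1: Claude Code's per-file AI line counts via OpenTelemetry (most accurate).
    if otel_ai_lines is not None and otel_total_lines:
        return otel_ai_lines / otel_total_lines
    # Tier 2: proportional attribution from Co-Authored-By: Claude commit tags.
    if total_commits:
        return coauthored_commits / total_commits
    return 0.0

def interpret(ratio):
    """Map a ratio to the interpretation bands in the table above."""
    pct = ratio * 100
    if pct < 20:
        return "AI barely used"
    if pct < 60:
        return "healthy mix"
    if pct < 90:
        return "heavy AI usage"
    return "almost entirely AI-generated"
```

For example, 120 AI-attributed lines out of 200 total gives a ratio of 0.6; with no OTEL data, 3 co-authored commits out of 4 gives 0.75.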
Session metrics
These come from Claude Code’s native OpenTelemetry output and from task session spans.
Active time
Total time spent in task sessions per developer per day. Derived from the duration between /morning and /finish (or /pause).
What it tells you: How many hours of actual development work happened. This is a passive timesheet — no manual entry required.
Session count
Number of task sessions (completed or paused) per developer per day.
What it tells you: Whether developers are working in focused blocks (few long sessions) or switching frequently (many short sessions). Neither is inherently better — it depends on the nature of the work.
Investment allocation
Engineering time broken down by task category — feature development, bug fixes, tech debt, and maintenance. Categories are derived from task labels or types in your ticket system.
What it tells you: Where engineering effort actually goes. If 60% of time is spent on bugs, that’s a signal about code quality or testing practices.
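As a sketch, the allocation is a percentage share of active time per category. The data shape and function name here are illustrative, not Tandemu's API:

```python
from collections import defaultdict

def allocation(sessions):
    """Percent of active time per task category (feature, bug, tech-debt, ...)."""
    totals = defaultdict(float)
    for category, hours in sessions:
        totals[category] += hours
    grand = sum(totals.values())
    return {c: round(100 * h / grand, 1) for c, h in totals.items()}

print(allocation([("feature", 12), ("bug", 6), ("tech-debt", 2)]))
# {'feature': 60.0, 'bug': 30.0, 'tech-debt': 10.0}
```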
Developer stats
Per-developer breakdowns of sessions, active time, AI ratio, lines changed, and task completions.
What it tells you: Individual activity patterns. Not for comparison or ranking — useful for identifying developers who might need support (unusually long cycle times) or recognizing high output.
Hot files
Files ranked by commit frequency across the team.
What it tells you: Which parts of the codebase get the most attention. Hot files that also have high friction are strong candidates for refactoring.
AI effectiveness
AI-generated lines by file, showing where AI output survives (isn’t immediately deleted or rewritten).
What it tells you: Which areas of your codebase AI handles well, and which areas produce code that developers end up rewriting.
Tool usage
Claude Code tool call patterns — which tools are used most, success rates, and failure patterns.
What it tells you: How developers interact with AI tooling and where tool failures might be causing friction.
DORA metrics
Tandemu calculates DORA metrics from your GitHub PRs. Merged PRs are synced automatically every 4 hours.
Deployment frequency
Merged PRs to the default branch per week.
| Rate | DORA Rating |
|---|---|
| 7+ per week | Elite |
| 1–7 per week | High |
| 1–4 per month | Medium |
| Fewer than 1 per month | Low |
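A rough translation of the table above into code, treating "1–4 per month" as roughly 0.25–1 merges per week. The thresholds are read off the table; the function itself is an illustrative sketch:

```python
def deployment_frequency_rating(merges_per_week: float) -> str:
    """Map weekly merged-PR rate to a DORA rating per the table above."""
    if merges_per_week >= 7:
        return "Elite"
    if merges_per_week >= 1:
        return "High"
    if merges_per_week >= 0.25:  # roughly 1 per month
        return "Medium"
    return "Low"
```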
Lead time for changes
Median time from PR creation to merge.
| Lead Time | DORA Rating |
|---|---|
| < 1 hour | Elite |
| < 1 day | High |
| < 1 week | Medium |
| 1 week or more | Low |
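The rating could be computed from PR creation-to-merge durations along these lines (an illustrative sketch, not the actual implementation):

```python
from statistics import median

def lead_time_rating(pr_merge_hours):
    """Median PR creation-to-merge time (hours), mapped to a DORA rating."""
    med = median(pr_merge_hours)
    if med < 1:
        return "Elite"
    if med < 24:
        return "High"
    if med < 24 * 7:
        return "Medium"
    return "Low"
```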
Change failure rate and time to restore
These metrics require CI/CD pipeline integration and are not yet available.
Friction metrics
These come from Claude Code’s telemetry events.
Prompt loops
When a developer repeatedly prompts Claude to fix the same file or error, that’s a prompt loop. High prompt loop counts on specific files indicate problematic code — complex logic, poor abstractions, or undocumented behavior that confuses the AI.
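One plausible way to surface prompt loops from raw telemetry is to count repeated prompts targeting the same file within a session. The event shape, threshold, and heuristic here are assumptions for illustration, not Tandemu's detection logic:

```python
from collections import Counter

def prompt_loops(events, threshold=3):
    """Files prompted at least `threshold` times in a session (hypothetical heuristic)."""
    hits = Counter(e["file"] for e in events if e["type"] == "prompt")
    return {f: n for f, n in hits.items() if n >= threshold}

events = [
    {"type": "prompt", "file": "auth.py"},
    {"type": "prompt", "file": "auth.py"},
    {"type": "prompt", "file": "auth.py"},
    {"type": "prompt", "file": "utils.py"},
]
print(prompt_loops(events))  # {'auth.py': 3}
```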
Tool errors
Failed tool executions (file writes that error, bash commands that fail) aggregated by repository path. High error counts in a specific area suggest fragile infrastructure or missing prerequisites.
Friction severity
Tandemu classifies repository paths by friction severity:
Severity is calculated using a weighted score: `promptLoops + (errors × 2)`.
| Severity | Criteria |
|---|---|
| High | Weighted score >= 20 |
| Medium | Weighted score >= 10 |
| Low | Weighted score < 10 |
What friction tells you: Where your codebase needs attention. High-friction files are candidates for refactoring, better documentation, or dedicated test coverage. This is more actionable than a retrospective complaint — it’s backed by data from actual development sessions.
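The weighted score and thresholds above translate directly to code (a sketch; the function name is illustrative):

```python
def friction_severity(prompt_loops: int, errors: int) -> str:
    """Classify a repository path using the weighted score promptLoops + errors * 2."""
    score = prompt_loops + errors * 2
    if score >= 20:
        return "High"
    if score >= 10:
        return "Medium"
    return "Low"

# 4 prompt loops and 8 errors -> score 20 -> High
print(friction_severity(4, 8))  # High
```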
What Tandemu does NOT measure
- Keystrokes or typing speed — not captured
- Screen activity or idle time — not captured
- Individual productivity rankings — not calculated. Metrics are shown per-person for context, not for comparison.
- Code quality scores — Tandemu measures friction (a proxy), not quality directly
- Meeting time or non-coding activities — only Claude Code sessions are tracked
- Estimate accuracy — there are no estimates to compare against. Actual cycle time is the only number.
Using the data
The dashboard shows these metrics to engineering leads. But the most important audience is the team itself.
Developers can see their own cycle times and AI ratios. If they notice their cycle time creeping up, they can ask: am I picking up harder tasks, or am I getting stuck? The friction data helps answer that.
Leads can spot systemic issues: a file that causes friction for every developer who touches it, a team member whose cycle times are much longer than peers (which might indicate they need help, not that they’re slow), or an AI ratio that’s dropping (which might mean the tooling needs attention).
The goal is not to optimize every number. It’s to make the invisible visible — to replace gut feelings about team productivity with real signals from real work.