What Gets Measured
Tandemu captures metrics from two sources: your ticket system (tasks, status, assignments) and Claude Code telemetry (sessions, code changes, friction events). Everything is derived from real activity — nothing is estimated or self-reported.
Task metrics
These come from the /morning → /finish lifecycle.
Cycle time
The wall-clock time between starting a task (/morning) and completing it (/finish). This is the real lead time for a unit of work — no estimation, no story points.
What it tells you: How long tasks actually take. Compare across task types (bug fix vs feature), across team members, or over time to spot trends.
What it doesn’t tell you: Whether the time was spent efficiently. A 4-hour cycle time could be 3 hours of productive work and 1 hour of meetings, or 4 hours of deep focus. Tandemu measures elapsed time, not quality of attention.
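Under the hood this is a simple timestamp difference. A minimal sketch (the function and argument names are illustrative, not Tandemu's actual schema):

```python
from datetime import datetime, timedelta

def cycle_time(started_at: datetime, finished_at: datetime) -> timedelta:
    """Wall-clock time between /morning (task start) and /finish (completion)."""
    return finished_at - started_at

# A task started at 9:15 and finished at 13:15 has a 4-hour cycle time.
print(cycle_time(datetime(2024, 5, 6, 9, 15), datetime(2024, 5, 6, 13, 15)))  # 4:00:00
```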
Tasks completed
Count of /finish calls with status “completed” per developer, per day or week.
What it tells you: Team throughput. When combined with cycle time, it shows whether the team is delivering quickly or just starting a lot of tasks.
AI-to-manual code ratio
When a task is finished, Tandemu calculates the AI-to-manual ratio using two tiers of attribution:
- Native OTEL data (preferred) — Claude Code emits per-file AI line counts via OpenTelemetry. This is the most accurate method, giving exact AI vs manual attribution at the file level.
- Co-Authored-By fallback — When OTEL data isn’t available, Tandemu uses `Co-Authored-By: Claude` commit tags with proportional attribution by commit ratio.
| Ratio | Interpretation |
|---|---|
| 0–20% | AI is barely being used. Developers may need training or better prompts. |
| 20–60% | Healthy mix. Developers are using AI for implementation and writing critical code manually. |
| 60–90% | Heavy AI usage. Worth checking that code quality and test coverage are keeping up. |
| 90%+ | Almost entirely AI-generated. Review processes should be extra rigorous. |
What it tells you: How much your team is leveraging AI as a tool.
What it doesn’t tell you: Whether the AI-generated code is good. A high ratio with low friction is a positive signal. A high ratio with high friction means the AI is generating code that doesn’t work well.
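The two-tier attribution described above can be sketched roughly as follows. The function names, argument shapes, and the fallback arithmetic are illustrative assumptions, not Tandemu's actual implementation:

```python
def ai_ratio(otel_ai_lines=None, otel_total_lines=None,
             coauthored_commits=0, total_commits=0):
    """AI-to-manual ratio in [0, 1], preferring native OTEL line counts."""
    # Tier 1: Claude Code's per-file AI line counts via OpenTelemetry (most accurate).
    if otel_ai_lines is not None and otel_total_lines:
        return otel_ai_lines / otel_total_lines
    # Tier 2: proportional attribution from Co-Authored-By: Claude commit tags.
    if total_commits:
        return coauthored_commits / total_commits
    return 0.0

def interpret(ratio):
    """Map a ratio to the interpretation bands in the table above."""
    pct = ratio * 100
    if pct < 20:
        return "AI barely used"
    if pct < 60:
        return "healthy mix"
    if pct < 90:
        return "heavy AI usage"
    return "almost entirely AI-generated"
```

For example, 120 AI-attributed lines out of 200 total gives a ratio of 0.6; with no OTEL data, 3 co-authored commits out of 4 gives 0.75.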
Session metrics
These come from Claude Code’s native OpenTelemetry output and from task session spans.
Active time
Total time spent in task sessions per developer per day. Derived from the duration between /morning and /finish (or /pause).
What it tells you: How many hours of actual development work happened. This is a passive timesheet — no manual entry required.
Session count
Number of task sessions (completed or paused) per developer per day.
What it tells you: Whether developers are working in focused blocks (few long sessions) or switching frequently (many short sessions). Neither is inherently better — it depends on the nature of the work.
Investment allocation
Engineering time broken down by task category — feature development, bug fixes, tech debt, and maintenance. Categories are derived from task labels or types in your ticket system.
What it tells you: Where engineering effort actually goes. If 60% of time is spent on bugs, that’s a signal about code quality or testing practices.
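As a sketch, the allocation is a percentage share of active time per category. The data shape and function name here are illustrative, not Tandemu's API:

```python
from collections import defaultdict

def allocation(sessions):
    """Percent of active time per task category (feature, bug, tech-debt, ...)."""
    totals = defaultdict(float)
    for category, hours in sessions:
        totals[category] += hours
    grand = sum(totals.values())
    return {c: round(100 * h / grand, 1) for c, h in totals.items()}

print(allocation([("feature", 12), ("bug", 6), ("tech-debt", 2)]))
# {'feature': 60.0, 'bug': 30.0, 'tech-debt': 10.0}
```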
Developer stats
Per-developer breakdowns of sessions, active time, AI ratio, lines changed, and task completions.
What it tells you: Individual activity patterns. Not for comparison or ranking — useful for identifying developers who might need support (unusually long cycle times) or recognizing high output.
Hot files
Files ranked by commit frequency across the team.
What it tells you: Which parts of the codebase get the most attention. Hot files that also have high friction are strong candidates for refactoring.
AI effectiveness
AI-generated lines by file, showing where AI output survives (isn’t immediately deleted or rewritten).
What it tells you: Which areas of your codebase AI handles well, and which areas produce code that developers end up rewriting.
Tool usage
Claude Code tool call patterns — which tools are used most, success rates, and failure patterns.
What it tells you: How developers interact with AI tooling and where tool failures might be causing friction.
DORA metrics
Tandemu calculates DORA metrics from your GitHub PRs. Merged PRs are synced automatically every 4 hours.
Deployment frequency
Merged PRs to the default branch per week.
| Rate | DORA Rating |
|---|---|
| 7+ per week | Elite |
| 1–7 per week | High |
| 1–4 per month | Medium |
| Fewer than 1 per month | Low |
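A rough translation of the table above into code, treating "1–4 per month" as roughly 0.25–1 merges per week. The thresholds are read off the table; the function itself is an illustrative sketch:

```python
def deployment_frequency_rating(merges_per_week: float) -> str:
    """Map weekly merged-PR rate to a DORA rating per the table above."""
    if merges_per_week >= 7:
        return "Elite"
    if merges_per_week >= 1:
        return "High"
    if merges_per_week >= 0.25:  # roughly 1 per month
        return "Medium"
    return "Low"
```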
Lead time for changes
Median time from PR creation to merge.
| Lead Time | DORA Rating |
|---|---|
| < 1 hour | Elite |
| < 1 day | High |
| < 1 week | Medium |
| 1 week or more | Low |
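The rating could be computed from PR creation-to-merge durations along these lines (an illustrative sketch, not the actual implementation):

```python
from statistics import median

def lead_time_rating(pr_merge_hours):
    """Median PR creation-to-merge time (hours), mapped to a DORA rating."""
    med = median(pr_merge_hours)
    if med < 1:
        return "Elite"
    if med < 24:
        return "High"
    if med < 24 * 7:
        return "Medium"
    return "Low"
```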
Change failure rate and time to restore
These metrics require CI/CD pipeline integration and are not yet available.
Friction metrics
These come from Claude Code’s telemetry events.
Prompt loops
When a developer repeatedly prompts Claude to fix the same file or error, that’s a prompt loop. High prompt loop counts on specific files indicate problematic code — complex logic, poor abstractions, or undocumented behavior that confuses the AI.
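One plausible way to surface prompt loops from raw telemetry is to count repeated prompts targeting the same file within a session. The event shape, threshold, and heuristic here are assumptions for illustration, not Tandemu's detection logic:

```python
from collections import Counter

def prompt_loops(events, threshold=3):
    """Files prompted at least `threshold` times in a session (hypothetical heuristic)."""
    hits = Counter(e["file"] for e in events if e["type"] == "prompt")
    return {f: n for f, n in hits.items() if n >= threshold}

events = [
    {"type": "prompt", "file": "auth.py"},
    {"type": "prompt", "file": "auth.py"},
    {"type": "prompt", "file": "auth.py"},
    {"type": "prompt", "file": "utils.py"},
]
print(prompt_loops(events))  # {'auth.py': 3}
```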
Tool errors
Failed tool executions (file writes that error, bash commands that fail) aggregated by repository path. High error counts in a specific area suggest fragile infrastructure or missing prerequisites.
Friction severity
Tandemu classifies repository paths by friction severity:
Severity is calculated using a weighted score: `promptLoops + (errors × 2)`.
| Severity | Criteria |
|---|---|
| High | Weighted score >= 20 |
| Medium | Weighted score >= 10 |
| Low | Weighted score < 10 |
What friction tells you: Where your codebase needs attention. High-friction files are candidates for refactoring, better documentation, or dedicated test coverage. This is more actionable than a retrospective complaint — it’s backed by data from actual development sessions.
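The weighted score and thresholds above translate directly to code (a sketch; the function name is illustrative):

```python
def friction_severity(prompt_loops: int, errors: int) -> str:
    """Classify a repository path using the weighted score promptLoops + errors * 2."""
    score = prompt_loops + errors * 2
    if score >= 20:
        return "High"
    if score >= 10:
        return "Medium"
    return "Low"

# 4 prompt loops and 8 errors -> score 20 -> High
print(friction_severity(4, 8))  # High
```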
What Tandemu does NOT measure
- Keystrokes or typing speed — not captured
- Screen activity or idle time — not captured
- Individual productivity rankings — not calculated. Metrics are shown per-person for context, not for comparison.
- Code quality scores — Tandemu measures friction (a proxy), not quality directly
- Meeting time or non-coding activities — only Claude Code sessions are tracked
- Estimate accuracy — there are no estimates to compare against. Actual cycle time is the only number.
Using the data
The dashboard shows these metrics to engineering leads. But the most important audience is the team itself.
Developers can see their own cycle times and AI ratios. If they notice their cycle time creeping up, they can ask: am I picking up harder tasks, or am I getting stuck? The friction data helps answer that.
Leads can spot systemic issues: a file that causes friction for every developer who touches it, a team member whose cycle times are much longer than peers (which might indicate they need help, not that they’re slow), or an AI ratio that’s dropping (which might mean the tooling needs attention).
The goal is not to optimize every number. It’s to make the invisible visible — to replace gut feelings about team productivity with real signals from real work.