TL;DR: Pilots don’t fail because the model is weak; they fail because Tuesday doesn’t change. Treat “friction” (data cleanup, tiny SOPs, permission fixes, and short trainings) as the work that creates value. Prove impact with a few hard KPIs, then scale.
If Part 1 was about choosing a problem and a KPI, Part 2 is about making the work feel different next week. New tools only create value when they change a task, a handoff, or a decision. That change looks like friction: renaming files so people can find them, agreeing on who approves a detail, scheduling two 45-minute sessions to learn the new step, and tightening a checklist so reviews are consistent. It isn’t glamorous! But it’s where results become measurable instead of theoretical.
Most organizations try to bolt AI onto yesterday’s workflow and expect a miracle. The pattern in recent reporting is blunt: roughly 95% of enterprise GenAI initiatives show no measurable P&L impact, largely because of poor integration and skipped process change, not model horsepower (Forbes). The programs that do succeed redesign workflows and let metrics, not novelty, decide what sticks; McKinsey’s 2025 survey identifies workflow redesign as the single biggest driver of EBIT impact from GenAI (McKinsey & Company).
Think of friction as governed change. A light spine (Govern → Map → Measure → Manage) keeps risk low while you rewire work, which is exactly how the NIST AI Risk Management Framework Playbook recommends operationalizing AI (NIST). In AEC specifically, adoption and reality still diverge: Bluebeam’s global survey found that roughly 74% of respondents report using AI in at least one phase, yet 72% still rely on paper at some point. That is a classic friction between tools and Tuesday (Engineering.com).
Pick one workflow that leaks time. Let’s say, “Revit view → Finding external/internal detail references → Developing a detail solution.” Write a single sentence that describes the new behavior:
“For Project X, we’ll use D.TO to develop high-quality building envelope details.”
Now baseline before you touch anything. Pull three to five recent details and capture how long each took, how many review cycles it needed, and how often it was blocked by missing content or missing references. If you can’t show the “before,” no one will believe the “after.”
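If it helps to make that concrete, here is a minimal Python sketch of a baseline log; the field names and numbers are hypothetical, and a spreadsheet works just as well:

```python
# Baseline log for a handful of recent details (hypothetical values).
baseline_details = [
    {"id": "D-101", "hours": 3.0, "review_cycles": 2, "blocked": True},
    {"id": "D-102", "hours": 2.5, "review_cycles": 1, "blocked": False},
    {"id": "D-103", "hours": 2.0, "review_cycles": 2, "blocked": True},
]

n = len(baseline_details)
avg_hours = sum(d["hours"] for d in baseline_details) / n
avg_cycles = sum(d["review_cycles"] for d in baseline_details) / n
blocked_rate = sum(d["blocked"] for d in baseline_details) / n

print(f"Baseline: {avg_hours:.1f} hrs/detail, "
      f"{avg_cycles:.1f} review cycles/detail, {blocked_rate:.0%} blocked at least once")
```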
Week one will feel slow. That’s normal. You’re assembling the minimal pieces that remove excuses later: a pilot-approved subfolder with clear guidelines, a one-page SOP, working access to D.TO and the company detail libraries, and two short enablement sessions (one to run the flow, one to review and correct). By week two, a different person should complete the same run. If only one expert can do it, you have a demo, not a pilot.
Publish a tiny chart every time your team completes one detail: cycle time per detail, review cycles, and exception rate. When people see the line bend (thirty minutes shaved here, one fewer review cycle there), the tool stops being “new tech” and becomes “how we do details.” For exec/board visibility on roles and decision rights, the WEF Oversight Toolkit is a clean add: it maps committee responsibilities and the questions leaders should ask as you scale (World Economic Forum).
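That tiny chart doesn’t need special tooling, but if you want to automate it, a few lines of Python with matplotlib are enough; the numbers below are made up, and the baseline would come from your own “before” measurement:

```python
import matplotlib.pyplot as plt

# Hypothetical per-detail cycle times (hours), in completion order.
cycle_times = [3.0, 2.8, 2.4, 2.5, 2.1, 1.9, 1.8]
baseline_avg = 2.5  # hours/detail from the "before" measurement

plt.plot(range(1, len(cycle_times) + 1), cycle_times, marker="o", label="Cycle time per detail")
plt.axhline(baseline_avg, linestyle="--", label="Baseline average")
plt.xlabel("Detail # (completion order)")
plt.ylabel("Hours")
plt.title("Pilot: cycle time per detail")
plt.legend()
plt.show()
```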
Your firm’s libraries are messy? Don’t rebuild them mid-pilot. Load twenty solid details into D.TO’s Company Detail Library as a “pilot-approved” set and expand later. Exceptions multiply? Tag the top three causes and fix one per week. None of this is flashy, but each fix ties directly to the KPI you chose.
Scale when the metric moved on real work, two different people ran the flow successfully, and exceptions are shrinking. If the metric is flat after two steady weeks, narrow the scope and try again, or walk away. Pretending helps no one.
Pick 3–5 KPIs. Baseline them before week 1; set targets for week 4.
Week 0 — Prep
Pick a live project. Upload relevant pilot-approved company detail references to D.TO. Confirm access to D.TO and content libraries. Draft a one-page SOP (steps, owners, acceptance criteria, exception log).
Week 1 — First runs
Complete 3–5 details end-to-end. Log CTD, TTD, RCD, and exceptions. Publish a mini chart once those details are complete, with one lesson learned.
Week 2 — Repeatability
A second runner completes 4–5 details. Fix the top two blockers (naming, missing typicals, reviewer drift). Targets: ER ≤15%, FPY ≥60%.
Week 3 — Throughput
Maintain a steady cadence. Targets: CTD –25% to –30%, RCD –0.5, AR ≥70%.
Week 4 — Proof
Reach 10–15 total details. Targets: CTD –30% to –40%, TTD –35%, RCD –1, FPY ≥70%, AR ≥80%, ER ≤10%. Decide to scale, iterate, or stop.
Scale when KPI targets are hit on live work, two different runners succeed, and exceptions shrink two weeks in a row.
Stop/Rescope if metrics are flat after two consistent weeks or the flow only works with an expert in the room.
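To make that week-4 decision less of a debate, here is a rough Python sketch of the gate, using the KPI abbreviations and targets from the plan above; the inputs and thresholds are this post’s example values, not universal benchmarks:

```python
# Rough week-4 gate check. Deltas are measured change vs. baseline
# (e.g., -0.35 means a 35% reduction); rates are fractions (0.70 = 70%).
def week4_gate(ctd_delta, ttd_delta, rcd_delta, fpy, ar, er,
               two_runners_ok, exceptions_shrinking):
    targets_hit = (
        ctd_delta <= -0.30      # CTD down at least 30%
        and ttd_delta <= -0.35  # TTD down at least 35%
        and rcd_delta <= -1     # at least one fewer review cycle
        and fpy >= 0.70
        and ar >= 0.80
        and er <= 0.10
    )
    if targets_hit and two_runners_ok and exceptions_shrinking:
        return "scale"
    if ctd_delta >= 0 and ttd_delta >= 0:
        return "stop or rescope"  # metrics are flat after steady weeks
    return "iterate"              # partial progress: narrow the scope and rerun

print(week4_gate(-0.32, -0.36, -1, 0.72, 0.82, 0.08, True, True))  # -> scale
```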
30 details × baseline 2.5 hrs = 75 hrs.
At –35% TTD → 26 hrs saved. At $120/hr → $3,120 of labor value this month on one workflow. If the pilot slice costs $1,500, the net is $1,620. Scaling to more details or more teams multiplies that value roughly linearly.
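If you want to rerun the math with your own rates, the same arithmetic fits in a few lines of Python; every figure below is the example value from above, not a benchmark:

```python
details = 30
baseline_hours_per_detail = 2.5
ttd_reduction = 0.35   # –35% time to detail
hourly_rate = 120      # $/hr loaded labor cost
pilot_cost = 1_500     # $ for the pilot slice

baseline_hours = details * baseline_hours_per_detail   # 75 hrs
hours_saved = round(baseline_hours * ttd_reduction)    # ~26 hrs
labor_value = hours_saved * hourly_rate                # $3,120
net_value = labor_value - pilot_cost                   # $1,620

print(f"{hours_saved} hrs saved, ${labor_value:,} labor value, ${net_value:,} net")
```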
Next up: when to let generative tools explore, when to optimize toward a decision, and how to keep creativity without losing time, plus a simple guardrail to avoid “100 options, no decision.”
Discover how D.TO enhances your daily design workflows on D.TO’s key features page, or schedule a demo to explore them in more detail!
Written by Juhun Lee, CTO & Co-Founder of D.TO: Design TOgether