What Actually Works for Me in Agentic Development

Published on January 22, 2026

When the AI-assisted development hype exploded, I jumped on the bandwagon like many developers. What followed was a journey of failed experiments, gradual insights, and ultimately a complete transformation in how I approach software development. This post shares what I learned, including the mistakes, so you can skip some of the painful trial and error.

From Autocomplete to Agentic Development

Phase 1: AI as Autocomplete (The Copilot Era)

My first attempt was with Copilot-style assistants. I wrote the initial implementation by hand and let the AI handle refactoring and test generation. The approach seemed reasonable: I maintained control while offloading tedious work.

But I noticed a troubling pattern. When I asked Copilot to generate tests, the results looked great, all green checkmarks, yet they weren’t actually validating behavior. The AI was writing what I call “pleasing tests”: tests designed to pass rather than to catch bugs.

Here’s what I mean:

import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.*;

class DiscountServiceTest {

    // System under test (assumed to expose calculateDiscount(price, discountRate))
    private final DiscountService discountService = new DiscountService();

    // ❌ A "pleasing test" - validates nothing meaningful
    @Test
    void testCalculateDiscount() {
        Double result = discountService.calculateDiscount(100.0, 0.1);
        assertNotNull(result);
        // Always passes as long as the method returns something; the discount logic is never checked
    }

    // ✅ A meaningful test - validates actual behavior
    @Test
    void testCalculateDiscountAppliesPercentage() {
        // A 10% discount on $100 should return $90
        double result = discountService.calculateDiscount(100.0, 0.1);

        // Assert within a small delta (0.001) for floating-point precision
        assertEquals(90.0, result, 0.001);
    }
}

This phase taught me an important lesson: AI assistants reflect the quality of their context. When I was writing code, that code provided rich context. When I stopped writing code, the context disappeared.

Phase 2: The Failed Handoff

Excited by Claude Code’s capabilities, I tried to fully delegate implementation. I’d throw a task at the agent without much preparation and expect results.

These attempts failed spectacularly.

The problem wasn’t the AI’s capability; it was my approach. I was used to holding context in my head and in my code. Without that foundation, the AI had nothing to work with. Asking an AI to

implement user session management with appropriate timeouts

without context is like asking a contractor to “build something nice” without blueprints, site surveys, or requirements.

Phase 3: The PRD Approach (What Actually Works)

The breakthrough came when I realized my role had fundamentally shifted. I wasn’t a code writer anymore; I was a requirements architect. My job was to provide the context the AI needs to succeed.

I started creating Product Requirements Documents (PRDs) for every feature. Not the 50-page enterprise documents you might be imagining, but focused, practical specifications that give the AI everything it needs.


PRD Framework

A Product Requirements Document (PRD) is a specification that describes what you’re building, why you’re building it, and how you’ll know it works. In the context of AI-assisted development, it serves as the “context injection” that replaces the implicit knowledge you’d normally carry in your head.

Every PRD I write follows these rules:

  1. Represents a complete, deliverable chunk of work - Not a partial feature that requires future work to function
  2. Self-contained - No dependencies on PRDs that don’t exist yet
  3. Includes complete data model definitions - If introducing new models, define them fully
  4. Documents UX friction points - If the implementation creates friction, acknowledge it and request confirmation
  5. Contains acceptance criteria - Specific, testable conditions that define “done”

Here’s a sample PRD for “implement user session management with appropriate timeouts”:

# PRD: User Session Timeout

## Overview
Implement automatic session timeout after 30 minutes of inactivity to improve security 
compliance.

## Problem Statement
Currently, user sessions persist indefinitely, creating security vulnerabilities for shared 
workstations and compliance issues with SOC 2 requirements.

## Scope
- This PRD covers: session timeout detection, warning modal, 
  graceful logout
- This PRD does NOT cover: session extension via API activity,
  admin configuration of timeout duration

## Technical Context
- Framework: Next.js 14 with App Router
- Auth: NextAuth.js with JWT strategy
- State: Zustand for client state
- Existing session check: `useSession()` hook from NextAuth

## Data Model Changes
None required. Uses existing session infrastructure.

## Implementation Requirements

### 1. Inactivity Tracker
- Track last activity timestamp on: mouse move, keypress, 
  scroll, click
- Debounce activity updates to every 30 seconds (avoid 
  performance impact)
- Store timestamp in memory, not localStorage (security concern)

### 2. Timeout Warning Modal
- Display warning modal at 25 minutes of inactivity
- Show countdown timer: "Your session will expire in X:XX"
- Two buttons: "Stay Logged In" (resets timer), "Log Out Now"
- Modal is non-dismissable (no click-outside or escape key)

### 3. Automatic Logout
- At 30 minutes: clear session, redirect to /login
- Display toast on login page: "You were logged out due 
  to inactivity"
- Clear any sensitive data from client state

## UX Friction Acknowledgment
⚠️ Users in the middle of unsaved work will lose that work 
on timeout. This is intentional for security, but we should:
- Make the 5-minute warning prominent

## Acceptance Criteria
- [ ] Session times out after exactly 30 min of inactivity
- [ ] Warning modal appears at 25 min mark
- [ ] "Stay Logged In" resets the full 30-min timer
- [ ] Logout clears all auth state and sensitive client data
- [ ] Activity tracking doesn't impact performance (no jank)
- [ ] Works correctly across multiple browser tabs

## Out of Scope (Future PRDs)
- Configurable timeout duration
- Activity detection via API calls
- Draft auto-save before timeout

## Testing Notes
Manual testing required for:
- [ ] Timer accuracy over 30-minute period
- [ ] Behavior with multiple tabs open
- [ ] State cleanup completeness after logout

Before writing any code, I have a conversation with the AI about the PRD:

  1. Share the PRD and ask the AI to identify gaps or ambiguities
  2. Discuss the current application state - What already exists that’s relevant?
  3. Refine together - The AI often catches edge cases I missed
  4. Confirm readiness - “Based on this PRD and our discussion, do you have everything needed to implement this?”

This conversation typically takes 10-15 minutes but saves hours of back-and-forth during implementation.
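To make that concrete, here is a rough sketch of the kind of implementation I’d expect the agent to land on for the inactivity tracker in the PRD above, written as a React hook against the Next.js/NextAuth stack it describes. The hook name, the one-second polling loop, and the `/login?reason=inactivity` redirect are illustrative choices of mine, not something the PRD (or any particular agent) mandates.

"use client";

import { useEffect, useRef, useState } from "react";
import { signOut } from "next-auth/react";

// Sketch only: thresholds come from the PRD, the wiring details are illustrative
const WARNING_AFTER_MS = 25 * 60 * 1000;   // show the warning modal after 25 min idle
const LOGOUT_AFTER_MS = 30 * 60 * 1000;    // force logout after 30 min idle
const ACTIVITY_DEBOUNCE_MS = 30 * 1000;    // record activity at most once per 30 s

export function useInactivityTimeout() {
  // Timestamps live in memory only (per the PRD: no localStorage)
  const lastActivityRef = useRef(Date.now());
  const warningShownRef = useRef(false);
  const [showWarning, setShowWarning] = useState(false);

  // "Stay Logged In" handler: resets the full 30-minute timer
  const stayLoggedIn = () => {
    lastActivityRef.current = Date.now();
    warningShownRef.current = false;
    setShowWarning(false);
  };

  useEffect(() => {
    let lastRecorded = Date.now();

    // Debounced activity recording; once the warning modal is showing,
    // only its buttons may reset the timer
    const recordActivity = () => {
      if (warningShownRef.current) return;
      const now = Date.now();
      if (now - lastRecorded >= ACTIVITY_DEBOUNCE_MS) {
        lastRecorded = now;
        lastActivityRef.current = now;
      }
    };

    const events = ["mousemove", "keydown", "scroll", "click"] as const;
    events.forEach((e) =>
      window.addEventListener(e, recordActivity, { passive: true })
    );

    // Poll idle time once per second; warn at 25 min, log out at 30 min
    const interval = setInterval(() => {
      const idleFor = Date.now() - lastActivityRef.current;
      if (idleFor >= LOGOUT_AFTER_MS) {
        // Clears the NextAuth session and redirects; the login page can read
        // the query param to show the "logged out due to inactivity" toast
        void signOut({ callbackUrl: "/login?reason=inactivity" });
      } else if (idleFor >= WARNING_AFTER_MS && !warningShownRef.current) {
        warningShownRef.current = true;
        setShowWarning(true);
      }
    }, 1000);

    return () => {
      events.forEach((e) => window.removeEventListener(e, recordActivity));
      clearInterval(interval);
    };
  }, []);

  return { showWarning, stayLoggedIn };
}

A layout-level client component would call this hook, render the countdown modal whenever showWarning is true, and wire the “Stay Logged In” button to stayLoggedIn. Even for a sketch this small, the acceptance criteria above (multi-tab behavior in particular, which this version doesn’t address) are exactly the kind of thing I still verify by hand.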


The Validation Framework

Even with good PRDs, I still need to validate what the AI produces. Agents are efficient at generating functional code, but I remain responsible for:

  • Alignment with existing architecture and patterns
  • Non-functional requirements (security, performance, accessibility)
  • Long-term maintainability

Here’s the mental model I developed:

Validation tree

  1. Identify knowledge gaps - Turn “unknown unknowns” into “known unknowns.” If the AI uses a pattern or library I don’t recognize, I flag it.

  2. Only validate what I understand - If I can’t evaluate whether something is correct, I either learn enough to evaluate it or I don’t approve it.

  3. Enter learning cycles when needed - Sometimes I need to spend 30 minutes understanding a concept before I can review 5 lines of code. That’s okay. That’s my job now.

  4. Scrutinize exploitation points - Unnecessary network calls, file system access, new dependencies, environment variable usage. This is where bugs and security issues hide.

  5. Always manually verify - Automated tests are necessary but not sufficient. I run the feature myself and try to break it. AI-generated tests often miss edge cases.

  6. Keep changes small - Large changes are hard to validate and hard to roll back. I aim for PRDs that produce <500 lines of changes.


My Development Workflow

Here’s how everything fits together:

[Diagram: Agentic Development Workflow]

These are not endorsements, just what currently works for me.

Tools by phase:

• Planning & Complex Tasks: Claude Code (CLI)
• Simple Changes & Bug Fixes: Antigravity or Copilot

Not every task needs the full PRD treatment:

| Task Type | Approach | Tool |
| --- | --- | --- |
| New feature | Full PRD → Agent conversation → Implementation | Claude Code |
| Complex refactor | PRD with before/after examples → Implementation | Claude Code |
| Bug fix | Direct description with reproduction steps | Antigravity |
| Simple change | Direct instruction | Antigravity |

Managing Agent Context

Using multiple agents and sessions comes with trade-offs. Each session starts with a fresh context window, so keeping the agent aligned with the project structure is critical.

This is where AGENTS.md, or any other agent documentation file, comes in. A few rules I learned the hard way:

  • Keep it small
  • Keep it simple
  • Keep it clear
  • Keep it updated

This is the template I currently use. It’s intentionally minimal and still evolving.


## Important Instructions

Instructions for the agent to follow

## Project Overview 

2-3 lines

## Tech stack 

- **Platform:**
- **Language:**
- **UI:**
- **Data:**
- **Dependencies:**

## Commands

* Build
* Testing
* Project specific commands
* etc...


## Workflow

How to approach the PRD.
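
To show what “small, simple, clear” means in practice, here is a purely hypothetical filled-in version for a TypeScript CLI project; the project description, dependencies, and commands are common defaults I made up for illustration, not a real project’s file.

## Important Instructions

Read the relevant PRD in full before changing code. Do not add dependencies without flagging them.

## Project Overview

A small TypeScript CLI for <one-line purpose>. Single package, no server component.

## Tech stack

- **Platform:** Node.js 20
- **Language:** TypeScript (strict mode)
- **UI:** terminal output only
- **Data:** local JSON files
- **Dependencies:** commander, zod

## Commands

* Build: `npm run build`
* Testing: `npm test`
* Lint: `npm run lint`

## Workflow

Implement one PRD at a time, keep the diff under 500 lines, and run build and tests before reporting completion.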

My Current Setup

I’ve optimized for cost efficiency and speed rather than collecting every possible tool.

Subscriptions

  • Claude Pro - For Claude Code and complex planning conversations
  • Google AI Pro - Primarily for Drive storage, but LLM access is a useful bonus

IDE Selection

  • JVM languages (Java, Kotlin, Scala): IntelliJ IDEA - Superior refactoring and type inference
  • Everything else: VS Code or Google Antigravity - Lighter weight, better extension ecosystem
  • CLI tasks: Claude Code - Best for agentic implementation from scratch

The key insight is matching the tool to the task. Claude Code excels at greenfield implementation and complex reasoning. IDE assistants excel at quick edits within existing context. Using Claude Code for a one-line bug fix is overkill; using Copilot to architect a new system is insufficient.


The Value Shift

I am not building custom MCPs. I am not fine-tuning local models. I am not orchestrating multi-agent swarms. As Andrej Karpathy notes, a new “programmable layer of abstraction” is forming, and frankly I’m not aware of 90% of it. His tweet captures the reality well: the speed of change is immense, and we have to adapt.

As the cost of producing code hits zero, the value of verifying code hits an all-time high. Feeling “behind” on the latest tools is uncomfortable, but I’m choosing what actually works for me.


Fundamental Principles (What Hasn’t Changed)

Despite the paradigm shift, some principles remain constant:

  1. I write the code, I own the code - AI is a tool, not an excuse. When something breaks in production, “the AI wrote it” is not a valid defense.

  2. New code must not break existing functionality - Regression testing matters more than ever when code is generated quickly.

  3. Small changesets - Easier to review, easier to validate, easier to revert. My target is PRDs that produce <500 lines of changes.

  4. Consistent checkpoints - Frequent commits create rollback points. I commit after every validated PRD completion.


What I’m Building Now

I’m currently testing these methodologies on two projects:

  1. Known tech stack - A TypeScript CLI application similar to systems I’ve built before. This tests whether the PRD approach improves velocity without sacrificing quality.

  2. Unfamiliar tech stack - An iOS application. This tests whether the validation framework helps me learn effectively while still shipping working software.

Both projects use identical PRD templates and validation processes. I’m tracking metrics: time from PRD to merged code, defect rate, and time spent in learning cycles.


If you remember nothing else from this post:

  1. Your role has shifted - From writing code to defining requirements. Embrace it.

  2. Context is everything - AI without context produces garbage. PRDs are context injection.

  3. Trust but verify - AI generates code faster than you can validate it. Slow down. Check everything.

  4. Stay small - Large PRDs are hard to validate and easy to mess up. Break them down.

  5. Keep learning - AI will use patterns you don’t know. Your job includes understanding them.


The shift to agentic development isn’t about writing less code; it’s about thinking more carefully about what code should exist. The developers who thrive in this new paradigm won’t be those who delegate blindly to AI, but those who learn to be effective requirements architects and rigorous validators.

My workflow is still evolving, and I’ll share updates as I gather more data from these projects. For now, this approach has helped me ship faster without surrendering responsibility, and that trade-off feels right.