
Core Research: Custom Planning Modes
Research notes for the planning mode article I wrote on Substack
The Critical Gap Between AI Coding Assistant Capabilities and Reliability
Context Persistence, Memory Management, and the Case for Persistent Documentation Systems
Key Facts & Background
The fundamental challenge facing developers using AI coding assistants is a paradox: these tools promise sophisticated planning and memory capabilities, yet their actual reliability for maintaining context across extended development sessions remains limited. This gap between marketed features and practical reliability has created an urgent need for developer-created workarounds and alternative approaches to state management.
The Stateless Nature of LLMs
Large language models are fundamentally stateless systems—they don't retain information between interactions without explicit mechanisms to preserve and retrieve that information. This architectural limitation means that even advanced AI coding assistants like Claude, Cursor, and GitHub Copilot cannot naturally maintain context across conversations without dedicated memory infrastructure. When developers rely on AI-specific storage locations (such as ~/.claude/plans/) or proprietary memory features, they're building on systems that were never designed to be permanent repositories of critical project information.
Context Window Constraints and Compaction
Modern LLMs operate within fixed context windows—the maximum amount of text they can process in a single interaction. Claude 3.5 Sonnet, for example, supports a 200,000-token context window, while GPT-4 Turbo offers 128,000 tokens. However, context window size alone doesn't solve the persistence problem. As conversations extend and context accumulates, AI systems employ compaction strategies to manage token usage. This process—where older information is summarized, pruned, or discarded to make room for new interactions—directly threatens the preservation of plans, architectural decisions, and project context that developers have established.
The problem intensifies in real-world development scenarios. A developer working on a complex feature over multiple days might establish detailed specifications, architectural decisions, and success criteria in early conversations. As subsequent conversations add new context, the AI system's internal mechanisms may compact or discard this foundational information, leaving developers without a reliable reference point for whether their implementation actually matches the original plan.
The Emerging Ecosystem of Memory Solutions (2024-2025)
The industry has recognized this gap, and multiple vendors have introduced dedicated memory management systems specifically designed to address context persistence challenges.
OpenSearch Agentic Memory (2025)
OpenSearch 3.3 introduced a comprehensive agentic memory system that distinguishes between two critical memory types:
- Working memory: Stores active conversation data, recent messages, current context, agent state, and execution traces needed for immediate processing within a session.
- Long-term memory: Contains processed knowledge and facts extracted from working memory over time, with LLM-powered inference analyzing working memory content to extract key insights, user preferences, and important information that persists across sessions.
This architecture directly addresses the ephemeral nature of AI-specific storage. By separating immediate conversational needs from persistent knowledge, OpenSearch's approach acknowledges that plans and documentation require different handling than temporary session data. The system uses namespace patterns aligned with application-specific user and session management, enabling structured organization of project context.
Amazon Bedrock AgentCore Memory (2025)
Amazon introduced AgentCore Memory at the AWS Summit New York City 2025, positioning it as a solution to eliminate "complex memory infrastructure management while providing full control over what the AI agent remembers". The service addresses three fundamental challenges developers face:
Context window constraints: Modern LLMs have limited capacity to process conversation history, requiring developers to implement context window management strategies—often manually pruning or summarizing earlier exchanges to stay within token limits.
State management complexity: Without dedicated memory systems, developers build custom solutions for tracking conversation history, user preferences, and agent state, repeatedly solving similar problems across different projects.
Personalization at scale: Maintaining context across multiple interactions requires sophisticated retrieval and storage mechanisms that few developers have the time to build themselves.
AgentCore Memory's approach emphasizes "targeted retrieval methods"—using list events for recent raw context, summaries for session context, and semantic search for related long-term memory records. This specificity matters because it acknowledges that not all information should be treated equally; some context is immediately relevant while other information requires semantic understanding to surface.
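The three retrieval paths described above can be sketched as a toy store. The `MemoryStore` class and its method names below are invented for illustration, and the keyword-overlap search merely stands in for real semantic retrieval:

```python
from dataclasses import dataclass, field

@dataclass
class MemoryStore:
    """Toy model of targeted retrieval; names are invented, not AgentCore's API."""
    events: list = field(default_factory=list)      # raw turns, newest last
    summaries: dict = field(default_factory=dict)   # session_id -> summary
    records: list = field(default_factory=list)     # long-term facts

    def list_events(self, n: int):
        # Recent raw context: the last n turns, verbatim
        return self.events[-n:]

    def get_summary(self, session_id: str):
        # Session-level context: one condensed summary per session
        return self.summaries.get(session_id, "")

    def search_records(self, query: str):
        # Stand-in for semantic search: rank records by keyword overlap
        terms = set(query.lower().split())
        scored = [(len(terms & set(r.lower().split())), r) for r in self.records]
        return [r for score, r in sorted(scored, reverse=True) if score > 0]

store = MemoryStore()
store.events = ["user: add auth", "ai: using JWT", "user: also add logout"]
store.summaries["s1"] = "Building auth with JWT; logout pending."
store.records = ["Team prefers PostgreSQL over MySQL", "Auth uses JWT tokens"]

print(store.list_events(2))                 # recent raw turns
print(store.search_records("auth tokens"))  # related long-term record
```

The point of the separation is that each question ("what just happened?", "what is this session about?", "what do we know long-term?") hits a different structure rather than one undifferentiated transcript.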
Cursor Memory Banks and Rules
Cursor, one of the most widely used AI coding assistants, has developed a community-driven approach through Memory Banks—a structured way to maintain project context across chat sessions. The practical workflow operates as follows:
You: Update memory bank
AI: [Reviews all memory files and updates relevant sections]
This ensures that subsequent chat sessions have access to the latest project state. The real-world benefit is concrete: without memory banks, AI assistants might suggest switching libraries when encountering minor issues, losing sight of established architectural decisions. With documented context in the memory bank, the AI maintains consistency with previously made technology choices.
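A minimal sketch of what an "update memory bank" step might do on disk; the `memory-bank/` layout and the `update_memory_bank` helper are hypothetical, not Cursor's actual file format:

```python
from pathlib import Path
import tempfile

def update_memory_bank(root: Path, updates: dict) -> None:
    """Append a note to each memory file, creating files as needed.
    Hypothetical helper; real memory banks are maintained by the AI itself."""
    bank = root / "memory-bank"
    bank.mkdir(exist_ok=True)
    for name, note in updates.items():
        path = bank / f"{name}.md"
        existing = path.read_text() if path.exists() else f"# {name}\n"
        path.write_text(existing + f"- {note}\n")

root = Path(tempfile.mkdtemp())
update_memory_bank(root, {
    "tech-stack": "Using Prisma for DB access; do not suggest switching ORMs.",
    "decisions": "Auth via JWT with 15-minute access tokens.",
})
print((root / "memory-bank" / "tech-stack.md").read_text())
```

Because the result is plain Markdown on disk, the next session (or any other tool) can re-read it instead of depending on conversation history.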
Cline's new_task Tool and Context Management
Cline, another AI coding assistant, has implemented a novel approach to context window limitations through its new_task tool, which enables a form of persistent memory for complex, long-running tasks. The workflow operates through:
- Monitoring: Cline monitors context usage as defined in `.clinerules`
- Triggering: When a threshold (e.g., 50% context usage) is reached, Cline finishes its current step
- Proposing: Cline uses `ask_followup_question` to suggest creating a new task, showing the structured context it plans to carry over
This approach transforms context window limitations from a hard constraint into a manageable workflow. Rather than forcing developers to manually reset conversations when context fills up, Cline automatically proposes task transitions while preserving critical context. The .clinerules file becomes the mechanism for defining what information must survive the transition.
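The threshold-triggered handoff can be sketched as follows. The 50% threshold comes from the text above, but the `maybe_propose_handoff` function and the shape of its proposal are invented for illustration, not Cline's actual internals:

```python
def maybe_propose_handoff(used_tokens: int, window: int, carry_over: list,
                          threshold: float = 0.5):
    """Return a new-task proposal once usage crosses the threshold, else None.
    carry_over plays the role of context preserved per .clinerules."""
    if used_tokens / window < threshold:
        return None
    return {
        "action": "new_task",
        "reason": f"context at {used_tokens / window:.0%} of window",
        "context_to_carry": carry_over,
    }

carry = ["Plan: migrate auth to JWT", "Decision: keep PostgreSQL"]
print(maybe_propose_handoff(40_000, 200_000, carry))   # below 50%: None
print(maybe_propose_handoff(120_000, 200_000, carry))  # above 50%: proposal
```

The design choice worth noting is that the handoff is explicit and inspectable: the developer sees exactly which context survives, instead of discovering after the fact what compaction silently discarded.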
Multi-AI Workflows and the Model Context Protocol
One of the most significant developments for addressing context persistence is the emergence of the Model Context Protocol (MCP), which enables standardized communication between AI tools and external systems.
Recallium: Cross-Tool Memory Integration
Recallium, a self-hosted memory system built on MCP, directly addresses the problem of tool isolation. It works with Cursor, Claude Desktop, VS Code, and any MCP-compatible tool, providing:
- Seamless cross-tool integration: Store context once, access everywhere
- Unified knowledge base: A single source of truth for your entire development workflow
- No tool lock-in: Memory systems that work across platforms rather than within proprietary silos
This approach is fundamentally different from relying on individual tool features. Instead of hoping that Claude's planning mode or Cursor's memory banks will persist, developers can implement their own persistent memory layer that transcends any single tool. If you switch from Cursor to a different assistant, your accumulated context doesn't evaporate—it remains accessible through the MCP interface.
This shift is significant: it transforms memory from a feature that vendors control into infrastructure that developers control. That aligns with the core insight that plans are definitions of success—and definitions of success should never be stored in locations where they can disappear due to vendor decisions, feature deprecation, or context compaction algorithms.
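The "store once, access everywhere" idea reduces to a shared store that multiple assistants read and write. The sketch below is purely illustrative; a real MCP memory server exposes tools over JSON-RPC rather than a Python class:

```python
# Illustrative only: SharedMemory stands in for an MCP memory server that
# any compatible tool (Cursor, Claude Desktop, VS Code) could talk to.

class SharedMemory:
    def __init__(self):
        self._store = {}

    def remember(self, key: str, value: str) -> None:
        self._store[key] = value

    def recall(self, key: str):
        return self._store.get(key)

memory = SharedMemory()

# "Claude Desktop" stores a plan once...
memory.remember("plan/auth", "JWT auth, 15-minute access tokens, refresh flow")

# ...and "Cursor" retrieves the same plan later, with no re-explaining.
print(memory.recall("plan/auth"))
```

The key property is that the store outlives any one tool: switching assistants changes the client, not the memory.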
Documentation as Source of Truth: Beyond Implementation
The philosophy that documentation must persist as the definition of success represents a fundamental shift in how developers should approach AI-assisted development. This goes beyond simply keeping notes; it's about establishing an immutable reference point against which implementation can be verified.
The Lifecycle of Documentation
Documentation should persist through multiple phases:
- Planning phase: Initial specifications, architectural decisions, and success criteria are documented
- Implementation phase: The AI assistant references these documents while writing code
- Verification phase: Documentation serves as the specification against which implementation is tested
- Testing phase: Test cases are written to validate that implementation matches documented requirements
- Long-term reference: Documentation becomes the canonical source for understanding why decisions were made, not just what was implemented
When plans are stored in ephemeral locations like ~/.claude/plans/, they typically only survive phases 1-2. By the time verification and testing occur, the original plan may have been compacted away, leaving developers without a reference point. They're left comparing implementation against fuzzy memory rather than against documented specifications.
When Documentation Becomes Counterproductive
However, documentation can become counterproductive in specific scenarios:
- Outdated specifications: When documentation reflects an earlier understanding that has been superseded by new information, it becomes misleading rather than helpful
- Over-specification: Excessive documentation of implementation details (rather than requirements) can constrain the AI assistant's ability to find better solutions
- Documentation drift: When documentation isn't actively maintained alongside code changes, it becomes a source of confusion rather than clarity
The solution isn't to avoid documentation but to treat it as a living artifact that requires the same maintenance discipline as code. Tools like Cursor's memory bank update workflow (You: Update memory bank → AI: [Reviews all memory files and updates relevant sections]) acknowledge this reality by making documentation updates an explicit part of the development process.
Technical Analysis: How Context Compaction Works
Understanding the mechanics of context compaction is essential for appreciating why persistent external documentation is necessary.
Token Budgeting and Prioritization
When an LLM reaches its context window limit, it must decide what information to keep and what to discard. Different models employ different strategies:
- Recency bias: Prioritizing recent messages while compacting older ones
- Semantic importance: Attempting to identify and preserve semantically important information
- Automatic summarization: Condensing earlier conversations into summaries that consume fewer tokens
Each approach has failure modes. Recency bias means foundational architectural decisions discussed early in a project get compacted away. Semantic importance relies on the model's ability to judge what matters—which may not align with what developers actually need. Automatic summarization loses detail and nuance.
The critical insight is that context compaction is not a bug—it's a necessary feature of stateless systems with fixed context windows. Developers cannot expect AI assistants to maintain perfect fidelity to earlier conversations indefinitely. The only reliable solution is to store critical information outside the AI system's context window.
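A toy demonstration of the recency-bias failure mode. Counting words as "tokens" and the fixed budget are simplifications, but the outcome mirrors the problem described: the foundational plan, being oldest, is the first thing dropped:

```python
def compact(history: list, budget: int) -> list:
    """Drop oldest messages until the word count fits the budget.
    Words stand in for tokens; real compaction is more sophisticated."""
    kept = list(history)
    while sum(len(m.split()) for m in kept) > budget and kept:
        kept.pop(0)  # recency bias: oldest goes first
    return kept

history = [
    "PLAN: use event sourcing for the order service",  # foundational, early
    "implemented order endpoints",
    "fixed pagination bug",
    "added retry logic to the payment client",
]
compacted = compact(history, budget=15)
print(compacted)  # the early PLAN message is gone
```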
Working Memory vs. Long-Term Memory Architecture
The distinction between working memory and long-term memory in systems like OpenSearch and Amazon Bedrock reflects this reality:
- Working memory is optimized for immediate access and rapid updates but is inherently temporary
- Long-term memory requires explicit extraction and storage but survives across sessions
This architecture acknowledges that not all information needs to be instantly accessible. Plans and architectural decisions don't need to be in working memory during every interaction—they need to be reliably retrievable when needed. By separating these concerns, developers can maintain large amounts of persistent context without bloating the immediate conversation context.
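The split can be sketched as follows. The `DECISION:` prefix is a stand-in heuristic for the LLM-powered extraction these systems actually use, and the class is illustrative rather than either vendor's API:

```python
class AgentMemory:
    """Sketch of the working-/long-term split; not a real vendor API."""
    def __init__(self):
        self.working = []    # temporary, per-session
        self.long_term = []  # persists across sessions

    def observe(self, message: str) -> None:
        self.working.append(message)

    def extract(self) -> None:
        """Promote durable facts out of working memory before it is cleared.
        A prefix check stands in for LLM-powered inference."""
        for msg in self.working:
            if msg.startswith("DECISION:"):
                self.long_term.append(msg)

    def end_session(self) -> None:
        self.extract()
        self.working.clear()  # working memory is inherently temporary

mem = AgentMemory()
mem.observe("user asked about caching")
mem.observe("DECISION: cache with Redis, 5-minute TTL")
mem.end_session()
print(mem.working)    # cleared
print(mem.long_term)  # the decision survives the session
```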
Structured Planning Methodology for AI-Assisted Development
Rather than fighting against AI limitations, effective workflows build planning methodologies that work with these constraints.
The Plan as Definition of Success
A plan is fundamentally a definition of success. It establishes:
- What the final implementation should accomplish
- What constraints or requirements must be satisfied
- What trade-offs have been accepted
- What alternatives were considered and rejected
When a plan is deleted or lost due to context compaction, developers lose their reference point for success. They can no longer answer the question: "Does this implementation match what we planned to build?" Instead, they're left with only the current implementation to evaluate—which may have drifted significantly from the original intent.
This is why storing plans in ephemeral locations is fundamentally problematic. A plan stored in ~/.claude/plans/ that disappears during context compaction has failed its primary purpose. It's no longer available to guide implementation decisions or verify that the final result matches the specification.
Structured Context Handoff
Cline's approach to context handoff through the new_task tool demonstrates how structured planning can work with context window limitations:
- Explicit context definition: The `.clinerules` file explicitly defines what context must survive a task transition
- Automated monitoring: The system monitors context usage and automatically triggers transitions when thresholds are reached
- Transparent handoff: Rather than silently compacting context, the system explicitly shows what context is being carried forward
- User control: Developers can adjust rules to ensure their critical information survives transitions
This approach treats context window limitations not as a problem to hide but as a constraint to design around explicitly.
The Ephemeral Nature of AI Features and Tool Lock-In Risk
AI coding assistant features are evolving rapidly, and this creates a strategic risk for developers who build workflows around proprietary features.
Feature Deprecation and Vendor Evolution
Consider the trajectory of AI features:
- Claude's planning mode (if it exists) may be replaced by different capabilities in future versions
- Cursor's memory banks represent a community project rather than an official feature
- GitHub Copilot's capabilities have shifted multiple times as the underlying models changed
- New tools like Cline and Recallium emerge regularly with novel approaches to context management
Developers who build their entire workflow around a specific tool's memory feature risk finding that feature deprecated, changed, or made obsolete by a vendor decision. The planning mode that seemed reliable today might be replaced by a different approach tomorrow.
AI-Agnostic Infrastructure
The solution is to implement memory and planning systems that are agnostic to specific AI tools. This means:
- Using standard formats: Storing plans in plain text, Markdown, or structured formats that any tool can read
- Leveraging MCP: Building on the Model Context Protocol to create tool-independent memory layers
- Maintaining local copies: Keeping all critical documentation in version-controlled repositories, not in proprietary tool storage
- Designing for portability: Structuring context so it can be easily migrated between different AI assistants
Recallium's approach exemplifies this philosophy—it works with Cursor, Claude Desktop, VS Code, and any MCP-compatible tool, rather than locking developers into a single platform.
Case Study: Multi-AI Development Workflows
The emerging pattern in sophisticated development workflows involves using multiple AI tools for different purposes, with persistent memory serving as the integration point.
Unified Context Across Tools
A developer might use:
- Cursor for primary code editing and implementation
- Claude Desktop for architectural discussions and planning
- Cline for autonomous task execution on complex refactoring
- OpenAI's API for specialized analysis or code review
Without a unified memory system, context established in one tool becomes inaccessible in another. A plan discussed with Claude Desktop isn't available in Cursor. Architectural decisions made in Claude become invisible to Cline. This fragmentation forces developers to repeatedly re-establish context across tools.
With persistent memory infrastructure (whether through Recallium, OpenSearch, or custom solutions), context becomes portable:
- Developer establishes plan in Claude Desktop
- Plan is stored in persistent memory (accessible via MCP)
- Cursor accesses the same plan when implementing
- Cline references the plan when executing autonomous tasks
- All tools operate from a unified definition of success
Verification Across Tools
This unified approach enables verification workflows that would be impossible with tool-specific memory:
- Planning: Claude Desktop helps establish detailed specifications
- Implementation: Cursor implements against the specification
- Verification: Claude Desktop reviews implementation against the original plan
- Testing: Cline generates test cases based on the specification
- Documentation: All tools contribute to a unified documentation artifact
Each tool contributes its strengths while maintaining reference to the shared plan.
Best Practices for Documentation in AI-Assisted Development
Based on the ecosystem of memory solutions and documented workflows described above, several best practices stand out:
1. Separate Plans from Implementation Details
Documentation should clearly distinguish between:
- Specifications: What the system should do (stable, rarely changes)
- Architecture: How the system is organized (changes infrequently)
- Implementation: How specific components work (changes frequently)
Plans should focus on specifications and architecture. Implementation details can be more fluid, but they should always be verifiable against the stable specification.
2. Establish Update Rituals
Rather than treating documentation as a one-time artifact, establish regular update rituals:
- At the end of each development session, explicitly update the memory bank or persistent documentation
- Use structured prompts like "Update memory bank" to trigger AI-assisted documentation updates
- Review documentation at the start of each session to re-establish context
3. Use Structured Formats
Store plans in formats that are both human-readable and machine-parseable:
- Markdown for human readability
- YAML or JSON for structured data that tools can process
- Version control (Git) for history and rollback capability
This enables both humans and AI systems to work with the same documentation reliably.
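A plan file can satisfy both audiences at once by pairing YAML-style front matter with a Markdown body. The parser below handles only flat `key: value` lines to stay dependency-free; a real setup might use PyYAML. The plan content is a made-up example:

```python
PLAN = """---
title: Auth migration
status: in-progress
owner: backend
---
# Auth migration

Move session auth to JWT. Success: all endpoints accept JWT tokens.
"""

def parse_front_matter(text: str):
    """Split a document into (metadata dict, Markdown body).
    Handles only flat key: value front matter, for illustration."""
    _, fm, body = text.split("---", 2)
    meta = {}
    for line in fm.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta, body.strip()

meta, body = parse_front_matter(PLAN)
print(meta["status"])        # machine-readable: tools can query plan status
print(body.splitlines()[0])  # human-readable: the Markdown body
```

Humans read the body; scripts and AI tools query the metadata; Git versions both.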
4. Implement Namespace Hierarchies
Organize documentation using namespace patterns that align with your project structure:
project/
├── plans/
│ ├── architecture.md
│ ├── success_criteria.md
│ └── constraints.md
├── decisions/
│ └── adr/ (Architecture Decision Records)
├── memory/
│ ├── tech_stack.md
│ └── project_context.md
└── code/
This structure makes it clear what information is foundational (plans, architecture) versus contextual (memory, decisions).
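A layout like this can be scaffolded in a few lines; the paths mirror the tree above and are conventions rather than requirements:

```python
from pathlib import Path
import tempfile

# Files mirroring the namespace hierarchy shown above
LAYOUT = [
    "plans/architecture.md",
    "plans/success_criteria.md",
    "plans/constraints.md",
    "decisions/adr/.gitkeep",
    "memory/tech_stack.md",
    "memory/project_context.md",
]

root = Path(tempfile.mkdtemp()) / "project"
for rel in LAYOUT:
    path = root / rel
    path.parent.mkdir(parents=True, exist_ok=True)
    path.touch()  # create an empty placeholder file

print(sorted(p.name for p in (root / "plans").iterdir()))
```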
5. Know When to Stop Documenting
Documentation becomes counterproductive when:
- It documents implementation details that change faster than the documentation can be updated
- It specifies solutions rather than problems, constraining the AI's ability to find better approaches
- It becomes so detailed that maintaining it consumes more time than the value it provides
The goal is to document the definition of success, not every implementation detail. If the AI can achieve the success criteria through a different approach, that's a win, not a failure.
Statistics and Adoption Data
While specific adoption metrics for memory management systems in AI coding assistants are limited, several indicators suggest rapid growth in this space:
Context Window Usage Patterns
Developers using advanced AI coding assistants report that context windows fill up significantly faster than expected in real-world projects. Cline's implementation of context monitoring and the new_task tool reflects this reality—the tool was built because developers consistently hit context limits during extended development sessions.
Memory System Adoption
The rapid emergence of multiple memory solutions (OpenSearch agentic memory, Amazon Bedrock AgentCore Memory, Cursor Memory Banks, Recallium, Cline's new_task) within a 12-month period (2024-2025) indicates strong market demand. Each solution addresses the same fundamental problem: developers need persistent memory that survives across AI interactions.
MCP Ecosystem Growth
The Model Context Protocol, which enables cross-tool memory integration, has become a standard for AI tool interoperability. The fact that multiple vendors (Anthropic, OpenAI, and emerging startups) are building MCP-compatible memory systems suggests this is becoming table stakes for AI coding assistants.
Expert Perspectives and Vendor Statements
OpenSearch on Persistent Memory
OpenSearch's documentation emphasizes that "effective AI agents need more than just language understanding—they require the ability to maintain context and learn from interactions over time. Current AI systems process each conversation independently, lacking the persistent memory that enables meaningful, evolving relationships with users". This statement directly acknowledges the core problem: stateless systems cannot maintain context without external infrastructure.
Amazon on Context Window Management
Amazon describes context window management as a "fundamental challenge" that developers encounter when implementing memory for AI agents. Rather than positioning it as a solved problem, Amazon frames it as an ongoing challenge that requires sophisticated solutions. This candor reflects the reality that context window constraints are not easily overcome.
Developer Community on Tool Isolation
The Cursor community discussion about persistent AI memory reveals developer frustration with tool isolation. The introduction of Recallium was framed as addressing a critical gap: "Store context once, access everywhere. No more tool isolation—your AI remembers across all platforms." This framing suggests that developers view tool-specific memory as fundamentally inadequate.
The Gap Between Capabilities and Reliability
The core tension in AI coding assistants is between marketed capabilities and practical reliability.
Marketed Capabilities
Vendors promote sophisticated features:
- Claude's planning mode (if available)
- Cursor's memory banks
- GitHub Copilot's context awareness
- Cline's autonomous task execution
These features suggest that AI assistants can maintain context, follow plans, and execute complex tasks reliably.
Practical Reliability
In practice, developers encounter:
- Context compaction that erases earlier conversations
- Memory systems that don't survive tool switches
- Plans that disappear when context windows fill
- Inconsistent behavior when the AI loses track of earlier decisions
This gap creates the need for workarounds—developer-created solutions that compensate for the unreliability of built-in features.
User-Created Workarounds
Sophisticated developers have responded by creating their own infrastructure:
- Storing plans in version-controlled repositories rather than tool-specific locations
- Using `.clinerules` and similar configuration files to explicitly define what context must survive transitions
- Implementing MCP-based memory systems that work across multiple tools
- Treating documentation as the source of truth rather than relying on AI memory
These workarounds are not failures of individual tools—they're rational responses to the fundamental architectural limitations of stateless LLMs.
Implications for Development Workflows
Short-term: Accept Context Limitations
In the near term, developers should:
- Assume that context will be lost during extended sessions
- Design workflows that explicitly manage context transitions (like Cline's `new_task` approach)
- Store all critical information outside AI-specific locations
- Treat documentation as the definition of success, not the AI's memory
Medium-term: Implement Persistent Infrastructure
Developers should:
- Evaluate persistent memory systems (OpenSearch, Amazon Bedrock, Recallium)
- Implement MCP-based memory layers that work across multiple tools
- Establish documentation update rituals as part of the development process
- Use structured formats that enable both human and AI access to plans
Long-term: Demand Reliability
As the ecosystem matures, developers should:
- Expect memory systems to be as reliable as databases, not as fragile as conversation history
- Demand clear guarantees about what information will persist across sessions
- Evaluate tools based on their memory reliability, not just their coding capabilities
- Build workflows that are portable across tools, reducing lock-in risk
Conclusion: Plans as Immutable Definitions of Success
The critical insight underlying this entire analysis is simple but profound: a plan is a definition of success, and deleting it means you have nothing to compare your implementation against.
When developers store plans in ephemeral locations, they're making an implicit bet that context compaction won't discard them. When they rely on tool-specific memory features, they're betting that vendors won't deprecate or change those features. When they trust AI assistants to remember architectural decisions, they're betting against the fundamental statelessness of LLMs.
The emerging ecosystem of persistent memory solutions—from OpenSearch to Amazon Bedrock to Recallium—represents an industry-wide recognition that this bet is not safe. The solution is not to hope that AI assistants will remember better, but to implement infrastructure that ensures critical information persists regardless of context window limitations, vendor decisions, or tool changes.
- Documentation should persist beyond implementation completion through verification, testing, and into long-term reference.
- Plans should be stored in version-controlled repositories, not in proprietary tool storage.
- Context should be managed through explicit protocols (like MCP) rather than relying on individual tool features.
- Success should be defined in documents that survive context compaction, not in ephemeral AI memory.
This shift—from trusting AI memory to implementing persistent infrastructure—represents a maturation of AI-assisted development. It acknowledges both the power of AI coding assistants and their fundamental limitations. It treats AI tools as powerful but unreliable collaborators, not as complete solutions. And it places the responsibility for maintaining context and definitions of success where it belongs: with developers and their infrastructure, not with stateless language models.
# My CLAUDE-PLAN file:
Planning Workflow
A structured approach for planning non-trivial implementations. Use this workflow when tasks involve multiple files, architectural decisions, or significant changes.
When to Plan
Plan when:
- Task touches 3+ files
- Multiple valid implementation approaches exist
- Architectural decisions are needed
- You're unsure of the full scope
Skip planning for:
- Single-line fixes
- Typos
- Simple renames
- Tasks with explicit, detailed instructions
Critical: Plan Persistence (Rule 14)
Plans in ~/.claude/plans/ are EPHEMERAL. They do not survive context compaction.
ALWAYS copy plans to ./plans/ FIRST. This ensures:
- Plans persist across context boundaries
- Plans survive conversation summarization
- Future sessions can reference past work
# Step 0 of EVERY plan execution
cp ~/.claude/plans/[plan-name].md ./plans/<descriptive-name>.md
Then work from the copy in ./plans/, not the original.
Planning Phases
Phase 1: Understand
Goal: Fully understand the request before designing a solution.
Explore the codebase using Explore agents (1-3 in parallel)
- Search for existing implementations
- Find related components
- Identify patterns to follow
Ask clarifying questions using AskUserQuestion
- Don't assume user intent
- Clarify ambiguous requirements
- Confirm scope boundaries
Phase 2: Design
Goal: Create a concrete implementation plan.
Use Plan agent to design the approach
- Provide context from Phase 1
- Include file paths discovered
- Describe requirements and constraints
Plan should include:
- Problem statement
- Implementation steps (checkboxes)
- Files to modify (with paths)
- New files to create
- Database changes (if any)
- Testing approach
Phase 3: Review
Goal: Validate the plan before execution.
- Read critical files identified in the plan
- Verify alignment with user's original request
- Ask final questions if needed
- Get user approval before proceeding
Phase 4: Execute
Goal: Implement the plan systematically.
- Copy plan to `./plans/` (if not already done)
- Work through steps sequentially
- Check off items as you complete them
- Update plan if scope changes
- Mark complete when finished: add `## Status: ✅ Complete`
Plan File Template
# [Plan Title]
## Problem Statement
[Brief description of what needs to be fixed/built]
---
## Implementation Plan
### Phase 1: [First Major Step]
[Details]
### Phase 2: [Second Major Step]
[Details]
---
## Files to Modify
- `path/to/file1.ts` - [what changes]
- `path/to/file2.ts` - [what changes]
## New Files
- `path/to/new-file.ts` - [purpose]
---
## Execution Order
0. [ ] Copy this plan to `./plans/<name>.md`
1. [ ] First task
2. [ ] Second task
3. [ ] ...
n. [ ] Test end-to-end
n+1. [ ] Mark plan complete
---
## Notes
[Any gotchas, dependencies, or things to watch for]
Tracking Progress
During execution:
- Use `TodoWrite` for real-time progress tracking
- Update checkboxes in the plan file as you complete steps
- If blocked, add notes to the plan explaining why
After completion:
- Add `## Status: ✅ Complete` at the top of the plan
- Keep the plan in `./plans/` for reference (don't delete)
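Because progress lives in the plan file as Markdown checkboxes, any later session (or a plain script) can recover what's done versus what remains. A minimal sketch, with a made-up plan fragment:

```python
import re

PLAN = """\
## Execution Order
0. [x] Copy this plan to ./plans/auth.md
1. [x] First task
2. [ ] Second task
3. [ ] Test end-to-end
"""

def progress(plan_text: str):
    """Count completed vs. total Markdown checkboxes in a plan file."""
    boxes = re.findall(r"\[( |x)\]", plan_text)
    done = sum(1 for b in boxes if b == "x")
    return done, len(boxes)

done, total = progress(PLAN)
print(f"{done}/{total} steps complete")  # 2/4 steps complete
```

If context compacts mid-implementation, this is exactly the state a fresh session can rebuild from the file alone.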
Why This Matters
- Context compaction loses `~/.claude/plans/` - Claude's context window has limits. When conversations get long, earlier content gets summarized. Plans in `~/.claude/` disappear.
- `./plans/` is in the project - Files here are re-read when context is rebuilt. Plans persist.
- Checkboxes show progress - If context compacts mid-implementation, the plan file shows what's done vs. what remains.
- History for future reference - Completed plans document decisions made and approaches taken.