This project was purpose-built to study the emerging practice of agentic development—integrating AI assistance throughout the software lifecycle. The objective wasn't just to ship a working CLI tool, but to rigorously evaluate how modular architectures, shared configurations, and resilient design can support meaningful collaboration with large language models (LLMs).
The result was Tilecraft, a production-ready vector tile generator built with an AI-augmented workflow. Along the way, we codified concrete architectural patterns, documented code and test metrics, and assessed real-world LLM capabilities and limitations through direct observation.
Context: Building Tilecraft
Tilecraft ingests OSM data, transforms it into a vector tile schema, and renders it in a custom cartographic style using open standards. At its core, it's a CLI app built to be fast, modular, and human-friendly.
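The pipeline shape is easy to sketch. The snippet below is illustrative only; the module and function names are hypothetical, not Tilecraft's actual API:

```python
# Illustrative pipeline shape; function names are hypothetical.
from pathlib import Path


def ingest_osm(path: Path) -> bytes:
    """Read a raw OSM extract from disk."""
    return path.read_bytes()


def transform_to_schema(raw: bytes) -> dict:
    """Bucket raw OSM features into vector tile layers."""
    # A real implementation would parse the extract and assign features
    # to layers; this stub just shows the shape of the output.
    return {"layers": {"roads": [], "water": [], "buildings": []}}


def render_tiles(schema: dict, style: dict, out_dir: Path) -> None:
    """Encode styled vector tiles into the output directory."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # A real implementation would write encoded tiles per zoom level here.


def run(osm_path: Path, style: dict, out_dir: Path) -> None:
    """Compose the three stages: ingest -> transform -> render."""
    render_tiles(transform_to_schema(ingest_osm(osm_path)), style, out_dir)
```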
Agentic development refers to a modular, AI-augmented workflow that integrates LLM agents as co-workers across the software lifecycle—from design and config generation to real-time code fixes and test authoring.
Key Principles Discovered
1. Modularize Everything (for the Agent, Not Just the Human)
AI tools work better with boundaries. We quickly learned that long monolithic files or loosely scoped functions led to poor LLM performance and lower-quality suggestions. Refactoring our codebase into tight, single-responsibility modules improved LLM interactions dramatically.
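As a sketch of what "tight, single-responsibility" meant in practice, a post-refactor module exposes one concern behind explicit types, so the entire relevant contract fits in a single prompt (file name and identifiers are hypothetical):

```python
# tilecraft/schema.py (hypothetical): one module, one responsibility.
# A small, typed surface like this gives an LLM the full contract of
# the code it is editing in a single screenful of context.
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerSpec:
    """Declarative description of one vector tile layer."""
    name: str
    geometry: str          # "point" | "line" | "polygon"
    min_zoom: int
    max_zoom: int


def validate_layer(spec: LayerSpec) -> None:
    """Reject specs an agent (or human) got wrong, loudly and early."""
    if spec.geometry not in {"point", "line", "polygon"}:
        raise ValueError(f"unknown geometry: {spec.geometry!r}")
    if not 0 <= spec.min_zoom <= spec.max_zoom <= 22:
        raise ValueError(f"bad zoom range: {spec.min_zoom}-{spec.max_zoom}")
```

The payoff is that the agent reasons about a complete, self-describing unit rather than a slice of a monolith.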
2. Configuration is Collaboration
We started with human-readable YAML configs for data sources, layer schemas, and styling parameters. But midway through, we realized that these configs also served as a shared memory space for the AI agents.
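A config in that spirit might look like the following; the keys are illustrative, not Tilecraft's actual schema. Because the human edits and the agent reads back the same declarative file, it acts as durable, inspectable shared state:

```yaml
# Hypothetical config sketch: one file serving human and agent alike.
sources:
  osm_extract: data/region.osm.pbf
layers:
  roads:
    geometry: line
    min_zoom: 5
    max_zoom: 14
  water:
    geometry: polygon
    min_zoom: 0
    max_zoom: 14
style:
  palette: muted-earth
  background: "#f4f1ea"
```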
3. Use Progressive Enhancement, but for Intelligence
Rather than expect the AI to write perfect code the first time, we scaffolded workflows that let it propose partial solutions: writing function shells with TODOs, auto-inserting logging wrappers, and falling back to mocked outputs when real data fails.
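A minimal sketch of that scaffolding, with hypothetical names, looks like this:

```python
# Sketch of "partial solutions" scaffolding; all names are hypothetical.
import functools
import logging

logger = logging.getLogger("tilecraft")


def logged(fn):
    """Auto-inserted logging wrapper, so partial AI code stays observable."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        logger.info("calling %s", fn.__name__)
        return fn(*args, **kwargs)
    return wrapper


MOCK_FEATURES = [{"layer": "roads", "geometry": "line"}]  # stand-in output


@logged
def extract_features(raw: bytes) -> list[dict]:
    """Function shell the agent proposed; the body is still partial."""
    # TODO(agent): parse the OSM extract into per-layer feature dicts.
    if not raw:
        return MOCK_FEATURES  # fall back to mocked output when real data fails
    raise NotImplementedError("real parsing not written yet")
```

The point is that a partial but observable function is still useful: tests and logs run against it today, and the TODO marks exactly where the agent (or a human) picks up next.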
4. Graceful Degradation is Not Optional
LLMs will get things wrong. They will hallucinate field names, misinterpret schemas, or propose unparseable JSON. The key is to design with failure as a first-class citizen.
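As a sketch, suppose the agent proposes a layer definition as JSON; the harness validates it and degrades to a known-good default instead of crashing. The field names and defaults here are assumptions, not Tilecraft's schema:

```python
# Defensive parsing of agent output; names and defaults are hypothetical.
import json

REQUIRED_FIELDS = {"name", "geometry", "min_zoom", "max_zoom"}
DEFAULT_LAYER = {"name": "fallback", "geometry": "line",
                 "min_zoom": 0, "max_zoom": 14}


def parse_agent_layer(text: str) -> dict:
    """Accept an LLM-proposed layer, or degrade to a safe default."""
    try:
        layer = json.loads(text)
    except json.JSONDecodeError:
        return dict(DEFAULT_LAYER)  # unparseable JSON: degrade, don't crash
    missing = REQUIRED_FIELDS - layer.keys()
    if missing:
        # Hallucinated or missing field names: patch in known-good defaults.
        layer = {**DEFAULT_LAYER, **{k: v for k, v in layer.items()
                                     if k in REQUIRED_FIELDS}}
    return layer


if __name__ == "__main__":
    print(parse_agent_layer('{"name": "roads", "geometry": "line"}'))
    # -> name and geometry kept, missing zoom fields filled from defaults
```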
Documented Results
Qualitative Observations
While we lack baseline measurements for precise quantification, the development process exhibited several notable characteristics:
- Development velocity felt accelerated compared to similar past projects, though without controlled measurement
- AI generated substantial boilerplate code for CLI scaffolding, error handling, and documentation
- A test-first approach reduced iteration cycles by catching AI-generated errors early
- Rich documentation was AI-assisted but required human review and refinement
- Error handling was more comprehensive due to the AI's systematic approach to edge cases