Full disclosure: “AI-native” feels like a buzzword from a Microsoft marketing campaign, but it’s the shortest path to get your attention for a critical discussion on how we thoughtfully integrate AI into the programming tools we build.
For over 25 years, I’ve helped build (parts of) compilers, transpilers, linting rulesets, and runtime developer tooling, all with the goal of eliminating whole categories of bugs without blaming developers or expecting humans to perform flawlessly every day they show up to work. Some days will be a mess, for any number of reasons; no one can or should perform perfectly all the time. So we built languages and linters that made error paths unrepresentable (or close to it), IDEs that taught best practices through completion, and type systems that warned us about the thing that should not be.
That legacy is worth defending. Yet defending it doesn’t mean rejecting generative AI. It means incorporating AI tools in a way that respects the semantics, architecture, and maintainability concerns of modern software systems.
Because the alternative is already here and it’s already worse.
The Problem Isn’t Just [General Purpose] LLMs. It’s What We’re Teaching with Them!
In 2023 and 2024, LLMs regularly hallucinated function names or signatures, threw null pointer exceptions on the “happy” path, and left the missing cases of partial functions unhandled. But by mid-2025, base model quality has drastically improved, at least for mainstream languages with massive GitHub training corpora such as JavaScript, Python, and TypeScript.
Yet even as the surface quality of code generated by these models increases, the deeper concern is that we’re training the next generation of developers to build throw-away codebases after 5 or 10 N-shot prompts. They’re launching with generic, fragile, AI-generated foundations and walking away as soon as maintaining that code becomes difficult (and let’s be honest, only an AI agent enjoys working in 20k LOC of verbose, entangled React mess). Others, fearful of being replaced, start projects around resume-driven buzzwords, which increasingly means lowest-common-denominator languages (like the mainstream languages above), because that’s what gets stuffed into increasingly fake job postings.
This trend threatens everything maintainable software stands for. We don’t just need AI to generate code. We need it to help understand, adapt, and evolve code, which requires a different vision for developer tools.
You Built Tools to Enforce Guarantees. Now Build Tools That Teach Them.
Imagine a world where AI assistance isn’t just a sidecar that blurts plausible completions. It’s a runtime component of your tooling stack—embedded in the compiler, woven into type inference, present in the debugger, and visible in the IDE not as a ghostwriter but as a collaborator.
You’ve built tooling to eliminate whole classes of bugs. Imagine extending that philosophy to AI:
🔧 Compilers and Typecheckers: Embed language-aware small language models (SLMs) to explain type errors conversationally, not just point to them. All language models are probabilistic generators and may still produce incorrect or unidiomatic suggestions, but small, specialized models trained on curated examples significantly reduce hallucination rates, especially when grounded with project context via LSP or RAG mechanisms. Expose the LSP as an MCP server that provides services to help the language model understand your codebase. Now the possibilities explode.
We can use generative repair tools to suggest refactorings that preserve invariants across large changes, upgrades, or security patches.
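As a rough sketch of what an embedded explainer might look like, here is a minimal TypeScript outline; the `Diagnostic` shape, the `SmallLanguageModel` interface, and `explainTypeError` are hypothetical stand-ins rather than any real compiler or model API:

```typescript
// Hypothetical shapes; a real compiler exposes much richer diagnostics.
interface Diagnostic {
  file: string;
  line: number;
  message: string; // e.g. "Type 'string | undefined' is not assignable to type 'string'."
  snippet: string; // the offending source lines
}

interface SmallLanguageModel {
  // Stand-in for whatever inference API the embedded SLM exposes.
  complete(prompt: string): Promise<string>;
}

// Turn a raw type error into a conversational, project-aware explanation.
async function explainTypeError(
  slm: SmallLanguageModel,
  diag: Diagnostic,
  projectContext: string[] // e.g. relevant declarations pulled via LSP or RAG
): Promise<string> {
  const prompt = [
    "You are embedded in a typechecker. Explain this error and suggest a fix",
    "that preserves the project's existing conventions.",
    `Error at ${diag.file}:${diag.line}: ${diag.message}`,
    "Code:",
    diag.snippet,
    "Relevant project context:",
    ...projectContext,
  ].join("\n");
  return slm.complete(prompt);
}
```

The interesting part is not the prompt itself but where it runs: inside the toolchain, with the diagnostic and the project context already in hand.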
📦 Package Managers and Type Indexes: Enable RAG pipelines with `llms.txt`-style structured metadata, curating not just API signatures but pattern guides and architectural trade-offs. Teach AI to recommend libraries not just by popularity, but by compatibility with your architectural style (a small retrieval sketch follows the LSP gateway example below).

🧠 Language Servers (LSP) as AI Gateways: Extend LSP servers with Model Context Protocol (MCP) endpoints to support questions like:
- “Where in the codebase could this error type originate?”
- “Find me all the parts of the code that depend on this feature flag.”
- “Show me common async handling strategies based on this repo’s style guide.”
Let the LSP speak for your language’s semantics and your project’s organizational and structural habits.
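To make the gateway idea concrete, here is a hedged sketch using the TypeScript MCP SDK (`@modelcontextprotocol/sdk`); `queryLspReferences` is a hypothetical stub standing in for a real `textDocument/references` round trip, and the tool name and schema are purely illustrative:

```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

// Hypothetical stub: a real integration would forward this to the language
// server's textDocument/references request and format the returned locations.
async function queryLspReferences(symbol: string): Promise<string[]> {
  return [`(stub) no language server attached; asked about "${symbol}"`];
}

const server = new McpServer({ name: "lsp-gateway", version: "0.1.0" });

// Expose one LSP-backed question as an MCP tool an agent can call.
server.tool(
  "find_feature_flag_uses",
  { flag: z.string().describe("Feature flag identifier to trace") },
  async ({ flag }) => ({
    content: [
      { type: "text" as const, text: (await queryLspReferences(flag)).join("\n") },
    ],
  })
);

// Serve over stdio so an IDE agent or CLI assistant can attach.
await server.connect(new StdioServerTransport());
```

The answers stay grounded in what the language server actually knows; the model only gets to ask.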
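And for the package-manager item above, a minimal retrieval-side sketch, assuming a registry that hosts an `llms.txt`-style file at a made-up URL; the chunking is deliberately naive:

```typescript
// Hypothetical metadata location; llms.txt is a convention, not a registry API.
const METADATA_URL = "https://registry.example.com/pkg/some-lib/llms.txt";

interface MetadataChunk {
  heading: string; // e.g. "## Error-handling patterns"
  body: string;    // prose a RAG pipeline can embed and retrieve
}

// Split llms.txt-style markdown into heading-scoped chunks for a retrieval index.
function chunkLlmsTxt(markdown: string): MetadataChunk[] {
  const chunks: MetadataChunk[] = [];
  let current: MetadataChunk = { heading: "(intro)", body: "" };
  for (const line of markdown.split("\n")) {
    if (line.startsWith("#")) {
      if (current.body.trim()) chunks.push(current);
      current = { heading: line.trim(), body: "" };
    } else {
      current.body += line + "\n";
    }
  }
  if (current.body.trim()) chunks.push(current);
  return chunks;
}

// Usage: feed the chunks to whatever embedding store your RAG pipeline uses.
const chunks = chunkLlmsTxt(await (await fetch(METADATA_URL)).text());
console.log(chunks.map((c) => c.heading));
```

The value is in what maintainers choose to put in those sections: pattern guides and trade-offs, not just signatures.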
🐞 Runtime and Debugger Integration: Use AI to propose watch expressions, decode structured logs, and explain anomalies in terms of recent code changes.
Imagine a debugger that answers: “What prior state transitions have led to this unexpected value?”
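A sketch of the shape such an integration could take, with `AnomalyReport`, `Assistant`, and `explainAnomaly` all hypothetical names rather than any existing debugger API:

```typescript
// Hypothetical inputs a debugger integration could collect at a breakpoint.
interface AnomalyReport {
  expression: string;   // the watch expression that surprised us
  observedValue: unknown;
  recentLogs: string[]; // structured log lines decoded near the breakpoint
  recentDiff: string;   // e.g. output of `git diff HEAD~5` for the touched modules
}

interface Assistant {
  complete(prompt: string): Promise<string>; // stand-in for any model client
}

// Ask the model to explain the anomaly in terms of recent code changes,
// rather than guessing from the source alone.
async function explainAnomaly(ai: Assistant, report: AnomalyReport): Promise<string> {
  return ai.complete(
    [
      `Watch expression ${report.expression} evaluated to ${JSON.stringify(report.observedValue)}.`,
      "Recent structured logs:",
      ...report.recentLogs,
      "Recent changes:",
      report.recentDiff,
      "Which prior state transitions or recent edits most plausibly explain this value?",
    ].join("\n")
  );
}
```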
📚 Documentation and Tutorials: Language doc generators could ship interactive tutors, where developers explore not just syntax but the reasoning behind language design.
Turn RFCs and compiler PRs into conversational history: “Why did the language move away from implicit coercion?”
🤖 IDE and Agent Rulesets: Distribute pre-packaged AI rulesets tailored to your language: strict, experimental, pedagogical, refactor-safe.
Vendors could offer plug-and-play agent configurations: “Strict Rust assistant,” “FP-for-OOP devs,” or “Team-convention aware Python agent.”
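There is no standard format for such rulesets today; here is one hypothetical shape, sketched in TypeScript, purely to make the idea concrete:

```typescript
// Hypothetical ruleset format; the point is that tooling vendors ship and
// maintain these, rather than every team hand-rolling prompt fragments.
interface AgentRuleset {
  name: string;
  forbid: string[];  // constructs the agent must never introduce
  require: string[]; // practices the agent must follow
  reviewPolicy: "suggest-only" | "auto-apply-with-tests";
}

const strictRustAssistant: AgentRuleset = {
  name: "Strict Rust assistant",
  forbid: ["unwrap() outside tests", "unsafe blocks without a SAFETY comment"],
  require: ["exhaustive match arms", "clean run of the pedantic lint group"],
  reviewPolicy: "suggest-only",
};

const teamAwarePythonAgent: AgentRuleset = {
  name: "Team-convention aware Python agent",
  forbid: ["bare except clauses", "mutable default arguments"],
  require: ["type hints on public functions", "docstrings in the team's format"],
  reviewPolicy: "auto-apply-with-tests",
};
```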
Tooling Creators Are Sitting on the Hardest Part of AI
Foundation models are improving fast—but general-purpose agents still hallucinate, overgeneralize, and ignore discipline. You’ve already encoded rules and a certain philosophy into your tooling. Now imagine using it to supervise, refine, and contextualize generative output. You control the one thing base models can’t: semantic authority.
What you build next doesn’t have to (and definitely won’t) solve AGI. It just has to teach the AI how to think like a language does.
We Need To Stop Thinking About “Final” Code
Software doesn’t end. It adapts. Evolves. Rotates team members. Changes scope. Maintenance isn’t the afterthought—it’s the long game.
Too many generative workflows optimize for a false ideal: perfect N-shot generation of a project that will never need to change. But real software is political, adaptive, interrupted, redefined. The economic win isn’t producing version 1.0 fast—it’s safely shipping 17.4 six months after a team rewrite.
The next generation of tooling should embrace that truth.
A Call to Creativity
This isn’t a proposal for a new standard. It’s a provocation:
- What if your compiler could talk back?
- What if your debugger could speculate?
- What if your linter could teach?
- What if your language shipped with a tutor—not a textbook?
- What if your tool could ingest anomalous distributed traces from production and locate a one-in-forty-million race condition?
We’ve spent years building systems that make errors harder. Now we should help build systems that make understanding easier.
Start small. Wire an AI model to your LSP. Wrap a debugger with retrieval. Add an MCP interface to your toolchain. Give AI the structured, contextual inputs it needs to reason, not just autocomplete.
Let’s build tools for evolving systems, not just for static prompts. Let’s imagine a profession that’s sustainable for people, profitable for businesses, and intellectually worthy of the next generation.
And, most importantly, let’s kill N-shot/chat as the user interface for AI-assisted software development.
If you enjoyed this content, please consider sharing this link with a friend, following my GitHub, Twitter/X or LinkedIn accounts, or subscribing to my RSS feed.