Field Notes · Chapter IV
Chapter 4: Where should AI services aim? (Feat. a Local-CLI retrospective)
Local-CLI — an AI coding assistant CLI for offline enterprise environments. A local-LLM platform with a Plan & Execute architecture, a Supervised Mode, and an auto-update system.
That project is my own open source work, and looking back, it is probably the one I poured the most into. It began in the era when Claude Code, and frankly even qwen-coder, were not realistically usable inside a closed-network environment — the goal was simply to give people inside that kind of network a real CLI coding-engine experience. Because the company was Windows-dominant, it grew a Windows UI on top before I wrapped it up. The last numbers I checked before leaving were roughly an average MAU of 420 and a DAU of 50. Not enormous in absolute terms, but the highest I’ve ever hit as a solo project. While I settle into a new company and the studying piles up, there will not be much in the way of updates for a while. Once the new role stabilizes, I expect to either spin a new project off this codebase or push this one further.
There were a lot of things I felt while building and shipping Local-CLI, and I want to write this one as a retrospective. Specifically: which directions I tried actually pulled new users in, which ones bounced off them, and what I now believe matters most for anyone trying to design AI services next.
1. The market aimed at “hard users” is already saturated.
Claude Code, Codex, the X.ai CLI, Antigravity, then OpenCode, Goose Code, OpenClaw, and so on. The people most enthusiastic about AI, most willing to open their wallets, most eager to adopt the next new thing: the services aimed at them are already in oversupply. This holds on both the enterprise side and the open-source ecosystem side.
There is, of course, something interesting about joining that race anyway, and at the very start of Local-CLI I was aiming there too. But the more I built and the more I tried to promote it, the more I noticed something different: a sizeable share of users are actually fatigued by the velocity of new AI tooling, not energized by it.
2. Most of the world doesn’t want to learn anything new.
When I meet up with my friends, we still play League of Legends and KartRider (a long-running Korean racing game with Mario-Kart-shaped cultural muscle memory baked in). Plenty of new games are out, but I don’t want to spend the time learning them. If I learned a new game today, sure, a year from now I’d have more things to enjoy. But in the moment, the familiar game is just more fun.
AI, for most people, lands the same way. Rather than learning a new technology and re-fitting it into their workflow, continuing to do the work the way they’ve always done it is more efficient today. On a multi-year curve that ratio absolutely flips. But the cost of learning, right now, is large enough that the short-term math is brutal. And — this could be a feature of the organization I was in — that demographic was much larger than I had expected.
3. So UI/UX is the thing.
The technology was not the thing. There were fewer users hungry for the technical ceiling than I had assumed, and that’s probably because people who do want the technical ceiling are not, in the first place, going to look for the answer in some individual’s open source project. What the bulk of users wanted was convenience. Something they could apply without learning. Something where install and setup were trivial. Something they could plug directly into work they were already doing. Something whose very first usage example was simple enough that you got value in one move.
MCP, plugins, the whole catalog of extensibility — none of it was the load-bearing factor. Even deciding which features to turn on and off becomes its own learning task. Just take care of it. Take care of it at the right moment, in the right place, with one step shaved off where possible. That, in the end, is the thing. The UI needs to be clean and worth using, but UX is the part that matters.
I am quite confident this will be one of the major signposts of the next phase of the AI market. And I think the direction Claude’s family of services is heading is exactly that. How easy is it to use? My prediction is that this one question will decide who wins and loses in AI services. In a way, this rhymes with the older success formula of Apple’s mobile era, and frankly with the success formula of most software. B2B businesses might look a little different. But by the time the B2C AI era is fully cooked, the winner will be whoever is easier to use.
4. Not depending on AI libraries was the right call.
For Local-CLI, I set out with the explicit goal of not relying on LangGraph, LangChain, or any other AI framework. There were several reasons, but the heaviest one was the sheer volume of patches that landed without regard for backward compatibility. That seems to be the fate of AI frameworks — the frontier of AI rotates every three to six months, so no single framework gets to settle into a stable identity in that climate.
Build time has gotten shorter, sure, but a project still has to crawl through the loop of plan → spec → build → ship → feedback → iterate, and by the time you finish that loop, the world outside has changed shape. Avoiding libraries was the right call because, whether your initial intuition turns out to be right or wrong, you still need to ship the original idea and watch how the market responds. Worry too much about libraries and news, and the idea you set out with starts to dissolve, and the project loses its bearings.
So especially for AI-adjacent libraries, technologies, and protocols, I came away convinced that not using them and not depending on them matters more than you’d think. Vibe coding has also chipped a big chunk off the necessity of libraries in general. Honestly, at this point, I think borrowing the idea from a library rather than pulling the whole library in produces lighter projects, less external dependency, and — interestingly — more learning along the way.
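To make "borrowing the idea rather than the library" concrete, here is a minimal sketch of a plan-and-execute loop with no framework involved. It is not Local-CLI's actual code. Every name in it (`LLM`, `make_plan`, `execute_step`, `plan_and_execute`, `fake_llm`) and both prompt formats are assumptions made up for illustration, and the model is a stand-in function so the example runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical type: any function that takes a prompt and returns text.
# In a real setup this would wrap a call to a local model endpoint.
LLM = Callable[[str], str]


@dataclass
class StepResult:
    step: str
    output: str


@dataclass
class Run:
    goal: str
    results: List[StepResult] = field(default_factory=list)


def make_plan(llm: LLM, goal: str) -> List[str]:
    """Ask the model for a plan, one step per line, and parse it into a list."""
    raw = llm(f"PLAN\nGoal: {goal}\nList the steps, one per line.")
    return [line.lstrip("-* ").strip() for line in raw.splitlines() if line.strip()]


def execute_step(llm: LLM, goal: str, step: str, done: List[StepResult]) -> str:
    """Run one step, passing along what earlier steps produced."""
    history = "\n".join(f"{r.step} -> {r.output}" for r in done)
    return llm(f"EXECUTE\nGoal: {goal}\nDone so far:\n{history}\nNow do: {step}")


def plan_and_execute(llm: LLM, goal: str, max_steps: int = 10) -> Run:
    """Plan once, then execute the steps in order (capped at max_steps)."""
    run = Run(goal=goal)
    for step in make_plan(llm, goal)[:max_steps]:
        output = execute_step(llm, goal, step, run.results)
        run.results.append(StepResult(step, output))
    return run


def fake_llm(prompt: str) -> str:
    """Stand-in model (an assumption) so the sketch runs without a network."""
    if prompt.startswith("PLAN"):
        return "- read the failing test\n- patch the function\n- rerun the tests"
    return "ok: " + prompt.rsplit("Now do: ", 1)[-1]


if __name__ == "__main__":
    for result in plan_and_execute(fake_llm, "fix the bug").results:
        print(result.step, "=>", result.output)
```

The whole loop is a few dozen lines of plain functions and dataclasses, so when the outside world changes shape there is nothing to migrate except your own code. A real version would still need re-planning on failure, tool calls, and a supervision hook, none of which are shown here.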
5. Picking up user intent is genuinely, genuinely hard.
Reading user intent is unreasonably difficult. This is a problem you cannot avoid the moment you start building anything with a multi-iteration agentic structure, and from where I sit, neither Claude nor OpenAI has fully cleared it yet. The bottleneck is long-term memory. For a developer who keeps CLAUDE.md well-maintained, this is a non-issue. But for the convenience of normal users — users who will never write a CLAUDE.md — long-term memory is non-negotiable, because that’s what makes an agentic service feel like it understands you when it talks to you. The problem is that which information to store, which to delete, and which to update is genuinely ambiguous.
Take me, for instance. I wanted an objective Claude evaluation of my recent job change. But Claude already knew too much about me, and to get an actually objective evaluation I had to manually wipe its memory. Can a normal user understand that, and execute on it? I don’t think so — most won’t. So the real question is how to decide which information is genuinely long-lasting in value, which information should only be referenced in moderation, and which information has no value at all. And on top of that: under what assumed shared context is the current user request being made in the first place? That, for me, was the hardest unsolved problem in the project, and an area that still needs a great deal more experimentation.
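The three kinds of information above (long-lasting, reference-in-moderation, no value) can be sketched as a tiered memory store. This is only an illustration of the shape of the problem, not a solution to it and not how Local-CLI or any Claude product works. The names `Tier`, `Memory`, and `MemoryStore`, the topic-overlap rule, and the `neutral` flag are all assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Set


class Tier(Enum):
    DURABLE = "durable"          # long-lasting value; always offered as context
    SITUATIONAL = "situational"  # referenced only when the topic matches
    DISCARD = "discard"          # no value; never stored


@dataclass
class Memory:
    text: str
    tier: Tier
    topics: Set[str]


class MemoryStore:
    def __init__(self) -> None:
        self._items: List[Memory] = []

    def remember(self, item: Memory) -> None:
        """Store the item unless it was classified as having no value."""
        if item.tier is not Tier.DISCARD:
            self._items.append(item)

    def recall(self, request_topics: Set[str], neutral: bool = False) -> List[str]:
        """Return the memory texts to put in context for one request.

        neutral=True skips personal memory entirely; it stands in for the
        manual wipe needed to get an objective evaluation.
        """
        if neutral:
            return []
        picked = []
        for m in self._items:
            if m.tier is Tier.DURABLE or (m.topics & request_topics):
                picked.append(m.text)
        return picked


if __name__ == "__main__":
    store = MemoryStore()
    store.remember(Memory("prefers TypeScript", Tier.DURABLE, {"coding"}))
    store.remember(Memory("is changing jobs", Tier.SITUATIONAL, {"career"}))
    store.remember(Memory("said hello", Tier.DISCARD, set()))
    print(store.recall({"career"}))
    print(store.recall({"career"}, neutral=True))
```

The code is the easy part. The sketch assumes someone has already assigned each memory a tier and a set of topics, and that assignment is exactly the ambiguous decision described above.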
Separately from all of that, I also came to understand vibe coding much more deeply.
6. The bigger the project gets, the harder vibe coding becomes.
This is obvious once you say it out loud, but Claude and Codex do not read your code in full. They grep, or they read fragments, and they work from that. So the bigger the project gets, the more critical index management, documentation, and clean code become. (It is genuinely surprising how often clean code decays into dead code and a messy codebase if you don’t deliberately schedule refactoring. To me, that’s strong evidence that Claude Code and Codex are not really aiming to manage the whole codebase as a first-class goal.)
Of course, anyone reading this would say, “Who doesn’t know that?” But once you get into a real flow state with Claude Max 20, fused into the project, what trips you up over and over is exactly the refactoring you postponed and the dead code you forgot to delete. So habits like periodic documentation, recording recurring mistakes, and auditing the architecture and refactoring right before every big push turned out to be the deciding factor for how deep you could actually take vibe coding.
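One way to make the "audit before every big push" habit cheap is a small script that flags likely dead code. The following is a rough sketch for a Python codebase, assuming a purely name-based check is good enough as a first pass. The function name `unreferenced_functions` and the in-memory `sources` mapping are hypothetical, and this is not a tool Local-CLI ships.

```python
import ast
from typing import Dict, Set


def unreferenced_functions(sources: Dict[str, str]) -> Set[str]:
    """Return names of functions defined in `sources` but never referenced.

    `sources` maps a file name to its Python source text. The check is
    name-based only: dynamic access (getattr, string dispatch) and
    same-named functions in different modules are not handled, so treat
    the result as candidates to review, not as a delete list.
    """
    defined: Set[str] = set()
    used: Set[str] = set()
    for filename, text in sources.items():
        tree = ast.parse(text, filename=filename)
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                defined.add(node.name)
            elif isinstance(node, ast.Name):
                used.add(node.id)       # plain references such as helper(...)
            elif isinstance(node, ast.Attribute):
                used.add(node.attr)     # attribute references such as util.helper
    return defined - used


if __name__ == "__main__":
    example = {
        "app.py": (
            "from util import helper\n\n"
            "def main():\n    return helper(1)\n\n"
            "def old_export():\n    return 0\n\n"
            "main()\n"
        ),
        "util.py": "def helper(x):\n    return x + 1\n",
    }
    print(unreferenced_functions(example))
```

Running something like this before a large session gives the assistant less stale code to grep into, which matters because it only ever reads fragments.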
7. AI’s first answer was very rarely the one I actually used.
For issues or new features, the share of times AI’s first proposal was the one I went with was small. The bottleneck wasn’t really the human (it’s hard to find anyone these days who isn’t running with --dangerously-skip-permissions on) — more precisely, the bottleneck was decision-making. And AI’s decisions, the vast majority of the time, were short-sighted (which may itself be evidence that humans work that way too). It was rare for AI to factor in future extensibility or recurrence prevention.
So before handing AI a task, it was important, and very effective, to push back repeatedly and steer it onto the right path. I’d call this nagging engineering.
In the end, it comes down to the user.
If your project doesn’t get chosen by users, very little remains. As the cost of building keeps falling, the value of planning keeps rising — I can feel that shift. The most important thing is: how do you build it, which details do you obsess over, which market do you aim at, in order to win the user’s choice? Why does it have to be your service? I feel the era when builders are being asked to answer that question, head-on, has arrived.