Python developer and researcher Sebastian Raschka, PhD, has been getting a lot of requests for his setup of local coding agents.
Local AI on Your Computer
For those who don’t know, a local coding agent is essentially a mini AI model that runs directly on your computer, rather than on a cloud service like Claude or Codex. This approach has some serious benefits, like faster performance and more control over your data.
Raschka explains that using local agents can be particularly useful for people working with sensitive or large datasets, where uploading data to a cloud service might not be ideal. He’s also seen it as a way to reduce latency and improve the overall responsiveness of AI-powered tools.
Open-Weight Models to the Rescue
So how does Raschka’s local agent stack work? He recommends using open-weight models, which are AI models that can be trained and fine-tuned locally. These models, like the ones used in Codex and Claude, have a huge number of parameters that require a lot of computational power to train and run.
However, Raschka and others have been experimenting with a new approach: using these open-weight models in “harnesses.” Essentially, you can wrap an open-weight model in a smaller, more efficient “harness” that still leverages the full power of the original model, but with much lower computational overhead.
What this means for you
This development may not mean much to casual users, but for AI developers and data scientists, it’s a big deal. With the ability to run powerful AI models directly on their computers, they’ll be able to build and deploy custom models faster, without relying on expensive cloud services or proprietary APIs.
For the rest of us, it’s a reminder that there’s still a lot of innovation happening in the world of AI, and that there are often more options and approaches than we might think. As AI becomes increasingly integrated into our daily lives, it’s the open-source, locally-run AI models that will likely shape the future of AI development.



