My answers to the questions I posed about porting open source code with LLMs
11th January 2026
Last month I wrote about porting JustHTML from Python to JavaScript using Codex CLI and GPT-5.2 in a few hours while also buying a Christmas tree and watching Knives Out 3. I ended that post with a series of open questions about the ethics and legality of this style of work. apetros on lobste.rs just challenged me to answer them, which is fair enough! Here’s my attempt at that.
You can read the original post for background, but the short version is that it’s now possible to point a coding agent at some other open source project and effectively tell it “port this to language X and make sure the tests still pass” and have it do exactly that.
Here are the questions I posed along with my answers based on my current thinking. Extra context is that I’ve since tried variations on a similar theme a few more times using Claude Code and Opus 4.5 and found it to be astonishingly effective.
Does this library represent a legal violation of copyright of either the Rust library or the Python one?
I decided that the right thing to do here was to keep the open source license and copyright statement from the Python library author and treat what I had built as a derivative work, which is the entire point of open source.
Even if this is legal, is it ethical to build a library in this way?
After sitting on this for a while I’ve come down on yes, provided full credit is given and the license is carefully considered. Open source allows and encourages further derivative works! I never got upset at some university student forking one of my projects on GitHub and hacking in a new feature that they used. I don’t think this is materially different, although a port to another language entirely does feel like a slightly different shape.
Does this format of development hurt the open source ecosystem?
Now this one is complicated!
It definitely hurts some projects because there are open source maintainers out there who say things like “I’m not going to release any open source code any more because I don’t want it used for training”—I expect some of those would be equally angered by LLM-driven derived works as well.
I don’t know how serious this problem is—I’ve seen angry comments from anonymous usernames, but do those come from genuine open source contributors or are they just angry anonymous usernames?
If we assume this is real, is the loss of those individuals balanced out by the increase in people who CAN now contribute to open source, because they can get work done in a few hours that might previously have taken days they didn’t have to spare?
I’ll be brutally honest about that question: I think that if “they might train on my code / build a derived version with an LLM” is enough to drive you away from open source, your open source values are distinct enough from mine that I’m not ready to invest significantly in keeping you. I’ll put that effort into welcoming the newcomers instead.
The much bigger concern for me is the impact of generative AI on demand for open source. The recent Tailwind story is a visible example of this—while Tailwind blamed LLMs for reduced traffic to their documentation, resulting in fewer conversions to their paid component library, I suspect the reduced demand is really because LLMs make it easy enough to build good-enough versions of those components for free that people do that instead.
I’ve found myself affected by this for open source dependencies too. The other day I wanted to parse a cron expression in some Go code. Usually I’d go looking for an existing library for cron expression parsing—but this time I hardly thought about that for a second before prompting one (complete with extensive tests) into existence instead.
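To give a sense of how small that problem is, here’s a purely illustrative sketch of what a minimal cron expression parser in Go could look like. This is not the code the model produced for me—it only supports `*`, single numbers and comma-separated lists across the five standard fields—but it shows why this is the kind of thing that can now be prompted into existence rather than pulled in as a dependency.

```go
// Purely illustrative sketch: a tiny cron expression parser, not the
// generated library described above. It accepts the five standard
// fields and supports only "*", single numbers and comma lists.
package cron

import (
	"fmt"
	"strconv"
	"strings"
)

// bounds holds the legal value range for each field:
// minute, hour, day of month, month, day of week.
var bounds = [5][2]int{{0, 59}, {0, 23}, {1, 31}, {1, 12}, {0, 6}}

// Parse turns an expression like "30 9 * * 1,5" into five sets of
// allowed values, one per field.
func Parse(expr string) ([5]map[int]bool, error) {
	var sched [5]map[int]bool
	fields := strings.Fields(expr)
	if len(fields) != 5 {
		return sched, fmt.Errorf("expected 5 fields, got %d", len(fields))
	}
	for i, field := range fields {
		set := map[int]bool{}
		if field == "*" {
			// Wildcard: every value in the field's legal range is allowed.
			for v := bounds[i][0]; v <= bounds[i][1]; v++ {
				set[v] = true
			}
		} else {
			// Comma-separated list of plain numbers, validated against the range.
			for _, part := range strings.Split(field, ",") {
				v, err := strconv.Atoi(part)
				if err != nil || v < bounds[i][0] || v > bounds[i][1] {
					return sched, fmt.Errorf("bad value %q in field %d", part, i+1)
				}
				set[v] = true
			}
		}
		sched[i] = set
	}
	return sched, nil
}
```

A real library would also handle ranges like `9-17`, steps like `*/15` and month/day-of-week names, but the overall shape stays this small and this easy to test exhaustively.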
I expect that this is going to quite radically impact the shape of the open source library world over the next few years. Is that “harmful to open source”? It may well be. I’m hoping that whatever new shape comes out of this has its own merits, but I don’t know what those would be.
Can I even assert copyright over this, given how much of the work was produced by the LLM?
I’m not a lawyer, so I don’t feel qualified to comment on this one. My loose hunch is that I’m still exercising enough creative control through the way I direct the models for that to count as sufficient human intervention, at least under US law, but I have no idea.
Is it responsible to publish software libraries built in this way?
I’ve come down on “yes” here, again because I never thought it was irresponsible for some random university student to slap an Apache license on some bad code they just coughed up on GitHub.
What’s important here is making it very clear to potential users what they should expect from that software. I’ve started publishing my AI-generated and not 100% reviewed libraries as alphas, which I’m tentatively thinking of as “alpha slop”. I’ll take the alpha label off once I’ve used them in production to the point that I’m willing to stake my reputation on them being decent implementations, and I’ll ship a 1.0 version when I’m confident that they are a solid bet for other people to depend on. I think that’s the responsible way to handle this.
How much better would this library be if an expert team hand crafted it over the course of several months?
That one was a deliberately provocative question, because for a new HTML5 parsing library that passes 9,200 tests you would need a very good reason to hire an expert team for two months (at a cost of hundreds of thousands of dollars) to write such a thing. And honestly, thanks to the existing conformance suites this kind of library is simple enough that you may find their results weren’t notably better than the one written by the coding agent.