Writing a C compiler in pure shell is one of those projects that sounds absurd until you think about bootstrapping. If you want to compile C on a system where you literally have nothing but a POSIX shell, this is exactly what you need. The fact that the parser itself is BNF-generated from shell modules makes it even more interesting as a study in how far you can push shell scripting before it breaks. Would love to see this evolve into a proper repo with tests so it can actually serve as a minimal bootstrapping tool.
It's not just a toy or a fun hobby project, there's potential for practical use as a step in bootstrapping an entire software stack from human-verifiable artifacts.
See also pnut.sh, but the server seems to be down at the moment.
xonix 10 hours ago [-]
But how would POSIX shell be available? Wouldn’t it be compiled from C?
laurenth 8 hours ago [-]
A shell is almost always used to set up the bootstrap environment, so the dependency on a shell is more or less always there.
Otherwise, something special with POSIX shell is its large number of independent implementations, making it the ideal starting point for diverse double-compilation (https://arxiv.org/abs/1004.5534). The idea is to bootstrap a toolchain from multiple compilers (shells in this case) and compare the results to verify that no shell introduced a trusting-trust attack.
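A minimal sketch of that comparison step, assuming the compiler is a shell script that reads source on stdin and writes its output to stdout. The function name `ddc_check` and its arguments are hypothetical, and it relies on the external `cmp` utility, so it only illustrates the idea and is not how any of the projects mentioned here do it:

```sh
# ddc_check SCRIPT INPUT SHELL...
# Run SCRIPT under every listed shell that is installed, feeding INPUT on
# stdin, and check that all outputs are byte-identical.
# Hypothetical helper; outputs are left in ${TMPDIR:-/tmp} for inspection.
ddc_check() {
    ddc_script=$1 ddc_input=$2
    shift 2
    ddc_ref= ddc_status=0
    for ddc_shell in "$@"; do
        # Skip shells that are not available on this machine.
        command -v "$ddc_shell" >/dev/null 2>&1 || continue
        ddc_out=${TMPDIR:-/tmp}/ddc.$$.$ddc_shell
        "$ddc_shell" "$ddc_script" < "$ddc_input" > "$ddc_out" || ddc_status=1
        if [ -z "$ddc_ref" ]; then
            ddc_ref=$ddc_out            # first output becomes the reference
        elif ! cmp -s "$ddc_ref" "$ddc_out"; then
            echo "output of $ddc_shell differs from $ddc_ref" >&2
            ddc_status=1
        fi
    done
    return "$ddc_status"
}
```

Something like `ddc_check c89cc.sh hello.c bash dash ksh zsh` (with `hello.c` as a placeholder input) would then return non-zero if any shell produced a different binary. Note that the sketch returns zero when none of the listed shells is installed.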
lpribis 10 hours ago [-]
I don't think any realistic system will bootstrap C from shell. What is the shell implemented in if not C?
cestith 7 hours ago [-]
Even if your shell was compiled in C, it doesn’t mean it wasn’t cross-compiled from another platform.
gaigalas 9 hours ago [-]
This shell.c can be compiled from c89cc.sh itself:
Let me remind you that current stage0 bootstraps tinyc from mes, which is an interpreted lisp. It's not that different from the shell architecturally.
The current stage0 also features kaem as one of the first dependencies. kaem is, in fact, a simpler version of the bourne shell.
It's always a tower. You'll never get one single clean dependency pass in bootstrapping.
zhongwei2049 8 hours ago [-]
The shell.c ouroboros is really cool. Being able to bootstrap trust through an entirely different language family (shell → C → shell) adds genuine value to the trusting-trust problem beyond just technical novelty.
gaigalas 3 days ago [-]
Single standalone file, no external tools used, PATH='' (empty), portable (bash, dash, ksh, zsh), produces x86 ELF executables, has mini-libc builtin.
Usage:
printf 'int main(){puts("hello");return 0;}' | sh c89cc.sh > hello
chmod +x hello
./hello
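To give a feel for what "no external tools" implies, here is a minimal sketch of lexing with shell builtins only. It is not taken from c89cc.sh; the names `next_char`, `scan_number`, `input`, `ch` and `number` are made up for illustration:

```sh
# next_char: move the first character of $input into $ch, using only
# parameter expansion (no cut, sed or expr).
next_char() {
    rest=${input#?}          # everything after the first character
    ch=${input%"$rest"}      # strip that suffix, leaving the first character
    input=$rest
}

# scan_number: consume leading decimal digits of $input into $number.
scan_number() {
    number=
    while [ -n "$input" ]; do
        case $input in
            [0-9]*) next_char; number=$number$ch ;;
            *) break ;;
        esac
    done
}

input='123+4'
scan_number
printf '%s | %s\n' "$number" "$input"   # prints: 123 | +4
```

Everything here is a builtin in the common shells, so the sketch keeps working with an empty PATH.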
angry_octet 19 hours ago [-]
I can't think of a reason to use c89cc.sh, but I salute this effort nonetheless.
gaigalas 9 hours ago [-]
The main show is the set of techniques for writing portable shell scripts, not the compiler.
If you want something one would actually use, try my project tuish:
Is there a linter to ensure scripts are portable across shells? I try to write them like that but I'm certainly no master so I write them to work with busybox.
It's a container with lots of shells that you can test. Like esvu but for the shell.
Might have a little outdated docs, hit me with an issue if you use it and face any problems (I'm also the author).
t-3 16 hours ago [-]
Why not POSIX or some common external tools where it makes sense? Most of those big switch statements could be easily replaced with some standard programs that already exist everywhere.
Brian_K_White 10 hours ago [-]
Why not just use gcc which already exists everywhere?
When you answer that, same answer. If you can't imagine any answer for that, then the answer won't be convincing or make sense even if anyone tried to articulate it. Which is fine. Everyone doesn't have to find meaning in the same things.
t-3 8 hours ago [-]
Shell without a userland is like FORTH without the ability to define new words. It's really contrary to the whole idea of what a shell is. Bootstrapping in very constrained conditions makes some sense, but where would you have a POSIX shell and not a POSIX userland (or close equivalent) to work with? When I wrote a similar compiler in shell, I purposely offloaded everything I could to external tools and used the shell for composition, so I found the approach intriguing and wanted to ask. I wasn't trying to criticize or dismiss the project, I think it's really cool or else I wouldn't have bothered to read the code in the first place.
Brian_K_White 5 hours ago [-]
gcc exists essentially everywhere a shell exists too. If you're ok with using grep and bc or whatever, then why not gcc?
Or better yet, awk? awk is as old and ubiquitous as sh itself, on every machine, even ancient ones that don't even have a compiler because that was a paid extra. Only unlike sh it's actually a full normal programming language that can do basically everything the shell can do, only in a far more readable and sane way, instead of using weird expansions and abusing the command line parser to achieve functions it doesn't have overt functions for. Just write directly in awk the same way you would in, say, Python or JS. If you have sh, especially if you also have the userland you are talking about, then you have awk. It's part of that userland in a way that gcc is admittedly not.
More in your vein actually, when I do things like this I pick yet a different ideal goal than either you or the author. I avoid all externals (and even child shells) to whatever extent possible, but I do use bash for all it's worth. Every possible intentional or hack bashism. Require bash, but leverage bash to within an inch of its life and require nothing else.
But this project targeting more portable code that doesn't require bash is really cool and valuable. Even though it's not a standard I personally shoot for even when I am specifically shell-golfing.
There are probably as many different points along the spectrum to draw the line as there are individual developers, each with some actually reasonable argument to justify that particular place to draw the line.
To me using grep and sed and tr and ls and cat etc etc when I don't need them is just unsatisfying, inelegant, uninteresting.
If you are in bash or ksh93 or zsh, you don't need all kinds of things like basename, dirname, cut, tr, wc, nor some of the more powerful stuff either most of the time. I have a shell function that uses the built-in read combined with a named pipe file created in tmp to make a sleep that doesn't need /bin/sleep. Why bother? because it's awesome. And usually the only times I need to use sleep it's in some rapid short duration polling loop that really is better if you don't have to fork & exec & teardown on every iteration. It's bad enough to be polling like that in the first place. And it just doesn't matter how "probably all the externals will be there", not using them is even better. And these days a lot of once-common "userland" is no longer common or installed by default. A script that never tries to run dos2unix never cares that it's not installed, or that the bsd version behaves differently, or the mac version is stupid old, etc.
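As a rough illustration of replacing small external tools with builtins, the sketch below uses only POSIX parameter expansion, so it is not limited to bash, ksh93 or zsh. The sample values and variable names are made up, and the equivalences only hold for simple inputs like these:

```sh
path=/usr/local/lib/libfoo.so.1
line='root:x:0:0'

base=${path##*/}     # like: basename "$path"
dir=${path%/*}       # like: dirname "$path" (path must contain a slash)
stem=${base%%.*}     # text before the first dot
ext=${base##*.}      # text after the last dot
len=${#base}         # like: printf %s "$base" | wc -c (single-byte text)
first=${line%%:*}    # like: printf '%s\n' "$line" | cut -d: -f1

printf '%s\n' "$base" "$dir" "$stem" "$ext" "$len" "$first"
```

None of these lines forks a process, which is the point made above about tight polling loops.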
anthk 6 hours ago [-]
Busybox is what you need. On Forth, Subleq+EForth can do a lot more than you think.
gaigalas 14 hours ago [-]
One main reason is performance. Forking for other tools is very expensive.
That said, using larger sed or awk programs instead of ad-hoc calls for small snippets would perhaps be net-positive for performance and readability.
I'm currently working on very strict bootstrap scenarios in which sed and awk might not be available, but a shell might be (if I'm able to write it). It is possible that in such scenarios, the first sed and awk versions will be shell-written polyfills anyway.
MisterTea 10 hours ago [-]
> One main reason is performance
This assumes the executed program is as fast as or slower than the caller.
jonahx 16 hours ago [-]
gorgeous!
kelsey98765431 20 hours ago [-]
Would be a lot better if it came with tests. Please do this justice and don't let it rot as a gist, make a real repo and add some docs and at least smoke tests of some kind. Thanks
gaigalas 15 hours ago [-]
This gist is a concatenation of several shell script modules which form a comprehensive parser library for the portable shell.
The main parser and emitter are BNF-generated (that's why they look so mechanical). The BNF parser generator is also written in portable shell (I posted another gist with a preview of it in another thread).
All modules have comprehensive tests, but it is still lacking documentation and not ready for prime time!
akavel 13 hours ago [-]
In the classic FLOSS tradition, it would be cool if you might still consider publishing such a "not-ready" repository - some people may (or may not!) be still interested, and also (sorry!) there's the bus factor... But on the other hand, in the classic FLOSS tradition, it's also 100% your decision and you have the full right to do any way you like!
fuhsnn 18 hours ago [-]
Don't understand why you were downvoted. An untested C compiler is simply worthless.
nananana9 16 hours ago [-]
The 2026 brain simply cannot comprehend recreational programming.
fuhsnn 16 hours ago [-]
Well, I happen to have been recreationally maintaining a hobbyist C compiler for three years, adding tests is part of the fun.
gaigalas 14 hours ago [-]
You want to know what kinds of programs it can run, right?
shell.c is a shell interpreter written for c89cc.sh. It can do the full self-hosting ouroboros:
- c89cc.sh can compile shell.c
- compiled shell.c via c89cc.sh can run c89cc.sh
It's not a full blown battle tested shell interpreter yet, but I'm working on it.
This file is part of the bootstrapping setup I'm working on for very early (pre tinyc) bootstrap from source in x64 machines and it is by far the most complicated program c89cc.sh can compile.
fuhsnn 13 hours ago [-]
Thanks, that actually looks like a very solid baseline to start things with. Are you aware of onramp[1]? They use a custom VM to base compiler and shell on, it's extra steps, but could be more flexible long term.
One of the VMs I wrote for Onramp is in POSIX shell [1]. This was intended to make C bootstrappable on any POSIX system of any architecture with nothing besides the shell. Unfortunately it's about 100,000x too slow to be useful. It's also at least as complicated as a machine code VM. I've since mostly abandoned the POSIX shell idea.
Onramp does have a very simple C89 VM though, and its purpose is for bootstrapping modern C on systems that have only a basic C compiler [2]. So this c89cc.sh could in theory work. I tried it and unfortunately it doesn't quite compile yet (and doesn't give a comprehensible error message either.) Even if it worked, c89cc.sh only compiles to x86_64 ELF, and it's way more complicated than the x86_64 ELF machine code Onramp VM [3].
This has been a bit of a recurring theme with Onramp: anything I've tried to get away from the initial machine code stages ends up being more complicated than handwritten machine code. Still, it's nice to have a lot of different ways to bootstrap. I love seeing projects like this and I'm glad to see more people taking bootstrapping seriously.
I love this as a novelty, and it could be useful for bootstrapping a system that’s had a shell cross-compiled to it.
Thinking about this in the context of a job I used to do, security on shared hosting environments, it gives me a bit of a shiver. There are reasons compilers aren’t available to normal users on those.
uecker 14 hours ago [-]
I am tempted to click the "report abuse" link ;-)
_ache_ 18 hours ago [-]
I'm tempted to execute it, but it may as well be shellcode; I couldn't tell.
wengo314 12 hours ago [-]
if one could bootstrap tcc with it, then it might be a viable tool.
jey 19 hours ago [-]
It targets x86-64/ELF? I thought it would target `sh` to be portable?
It's an incomplete idea from around a year ago. The approach taken here (aliases as macro-like evals, AST generation using shell variables) became the backbone for the BNF parser generator.
This one is much simpler to understand. Simpler grammars tend to produce parser code that looks more like this one.
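For readers unfamiliar with "AST generation using shell variables", here is a minimal sketch of the general idea in POSIX sh. It is not the code from the gist: the naming scheme `node_<id>_<field>` and the functions `new_node`, `node_get` and `eval_node` are assumptions made for illustration, and the alias-as-macro part is left out:

```sh
node_count=0

# new_node KIND VALUE: allocate a node and leave its id in $node.
# Fields live in plain variables named node_<id>_kind and node_<id>_value.
new_node() {
    node_count=$((node_count + 1))
    node=$node_count
    eval "node_${node}_kind=\$1 node_${node}_value=\$2"
}

# node_get ID FIELD: print one field of a node.
node_get() {
    eval "printf '%s' \"\$node_${1}_${2}\""
}

# eval_node ID: print the value of a tiny expression tree.
# Uses command substitution for recursion, which forks; kept for brevity.
eval_node() {
    case $(node_get "$1" kind) in
        num) node_get "$1" value ;;
        add)
            set -- $(node_get "$1" value)   # child ids, space separated
            printf '%s' "$(( $(eval_node "$1") + $(eval_node "$2") ))"
            ;;
    esac
}

# Build the tree for 42 + 7 and evaluate it.
new_node num 42; lhs=$node
new_node num 7;  rhs=$node
new_node add "$lhs $rhs"; root=$node
eval_node "$root"; echo                     # prints 49
```

A fork-free variant would return results through a variable instead of stdout, at the cost of managing a manual stack, since POSIX sh has no local variables.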
gaigalas 15 hours ago [-]
Yes! The main parser and emitter come from a BNF parser generator, also written in portable shell (to be released though).
JackSmith_YC 16 hours ago [-]
[dead]
tho2u3i4o23497 15 hours ago [-]
Node stuff at least "works" - you've not seen real dependency hell until you've seen the horrible world of the Python ML ecosystem.
yetihehe 13 hours ago [-]
> Node stuff at least "works"
As someone just starting with a complicated Node-based project, that "works" for Python ML and Node is very close together and very far from 'just works'.
stared 14 hours ago [-]
uv solves most of that.
Before, it was a mess.
self_awareness 14 hours ago [-]
"Claude please generate me a C compiler in bash"
I mean, today it's possible to generate it in Tcl, Elisp, Windows BAT, Powershell.
The effort is just 1 prompt.
The WHY question is much more important today -- "because I can" no longer makes sense, because we all can do much, much more with minimum effort today than before LLMs.
gaigalas 14 hours ago [-]
Here's a prototype parser from 10 months ago, when this was not possible yet:
Yes, c89cc.sh was definitely AI-assisted. However, I do carry extensive knowledge of the portable shell that was essential for the AI to complete it.
You'll find tricks inside c89cc.sh that don't exist anywhere, except in other code from me (like the ksh93 fix for local dynamic scoping or the alias/macro read -n1 polyfill).
The WHY is pretty obvious: I want to show that the portable shell is not a toy.
fkoep 10 hours ago [-]
>The WHY is pretty obvious: I want to show that the portable shell is not a toy.
What does that mean? You sat down with the goal of showing that a decades old scripting environment can be used for large projects in production, with all its obscure hacks? I'd say it's more a novelty project made for the fun of it - and that's fine, it's a cool project.
It would be pretty interesting to read a blog post about the making-of: How to write a compiler in portable shell, what parts could be automated and where LLM-coding fell short, what rare tricks were applied, etc...
gaigalas 9 hours ago [-]
Most people think shell is just bash, and portability is impossible.
It is also common sense that shell scripts are just glue code, and it's impossible to do anything else with it.
I think they're wrong. Never said one should use this to write large production programs though.
The hacks I'm using are no different than JavaScript polyfills (set once, makes a feature more uniform). It's actually a clean design, the bulk of the program is POSIX shell.
self_awareness 10 hours ago [-]
I didn't mean to imply that you're not capable of doing it without an LLM. I believe you.
The point I'm trying to make is that the rest of us who don't know bash that well are capable of doing it as well.
This is the new reality we all need to adapt to.
oybng 9 hours ago [-]
If you can't discern between bash and shell, or even read the title, then it's certainly not for you
gaigalas 9 hours ago [-]
Just bash? Sure. Anyone can do just bash.
anthk 6 hours ago [-]
LLMs can barely generate valid Subleq code.
This project reminds me of:
The Design of a Self-Compiling C Transpiler Targeting POSIX Shell - https://dl.acm.org/doi/10.1145/3687997.3695639
https://gist.github.com/alganet/1513d7b6abef5c1a53a324d897c3...
Ouroboros self-hosting. They can self-host one another.
The idea is to make shell.c compile from an even simpler C compiler, such as M2-Planet:
https://github.com/oriansj/M2-Planet
If you want something one would actually use, try my project tuish:
http://github.com/alganet/tuish
You can use what I use: https://github.com/alganet/shell-versions
Look at this one:
https://gist.github.com/alganet/1513d7b6abef5c1a53a324d897c3...
[1] https://github.com/ludocode/onramp
[1]: https://github.com/ludocode/onramp/blob/develop/platform/vm/...
[2]: https://github.com/ludocode/onramp/blob/develop/platform/vm/...
[3]: https://github.com/ludocode/onramp/blob/develop/platform/vm/...
https://github.com/udem-dlteam/pnut
All the portability tricks are authored by me (I have previous repos from before the AI era that feature them).
The parsing structure is also authored by me (also have pre-AI proof).
What AI did here was the boring, mechanical work.
Just prompt it: as of 2026, every AI is clueless about widely portable shell scripts.
https://gist.github.com/alganet/4dfd501a3377a60f7825901114d6...
Roughly 70% of c89cc was generated from it (parser, emitter).
It can generate parsers for C, ES6 and XML for example (subsets but not missing a lot).
It's still a mess though and I have lots of work to do before a proper release.
But the rest seems easy enough to understand.
Or much easier to backdoor...
https://gist.github.com/alganet/23df53c567b8a0bf959ecbc7b689...
Here is me 10 years ago experimenting on parsing stuff with sed:
https://gist.github.com/alganet/542f46865420529c9bd2
---