A related idea is to have the LLM quiz you, Socratic-style, about a topic of interest. It persists in asking questions at deeper levels until you arrive at the answer yourself. This forces you to think hard about a problem, and this effort helps with understanding, learning and retention. Of course I made a Socratic-quiz skill for this, to use with any coding agent or similar:
For example I’ve used this to better understand counter-intuitive things about diabetes/insulin, dopamine and motivation, Claude’s implementations, etc (to combat so-called cognitive debt).
Strong LLMs are surprisingly good at this type of quizzing; they display a semblance of “theory of mind”.
OtherShrezzing 4 hours ago [-]
How do you deal with context length degradation here?
The harder questions will only arrive when the context is getting full.
danieljacksonno 4 hours ago [-]
If you have it question you for 1M tokens (aka the full length of the Wheel of Time series), I think your own context might get full before the LLMs.
d4rkp4ttern 2 hours ago [-]
Right, even a conservative 200k context length is on the order of 200 pages, which is more than enough context to arrive at an answer.
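As a rough sanity check on that estimate, here is a back-of-the-envelope conversion. The two ratios are assumptions (a common rule of thumb of about 0.75 English words per token, and about 500 words per dense page), not figures from this thread, so treat the result as order-of-magnitude only. Under those assumptions 200k tokens comes out to a few hundred pages.

```python
# Assumed ratios; real values vary by tokenizer, language, and page layout.
WORDS_PER_TOKEN = 0.75   # rule-of-thumb for English text
WORDS_PER_PAGE = 500     # dense, single-spaced page


def tokens_to_pages(tokens: int) -> float:
    """Convert a token budget to an approximate page count."""
    return tokens * WORDS_PER_TOKEN / WORDS_PER_PAGE


for budget in (200_000, 1_000_000):
    print(f"{budget:>9,} tokens ~ {tokens_to_pages(budget):,.0f} pages")
```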
dchuk 7 hours ago [-]
I’ve been using this general pattern - a custom cli app for deterministic tasks, skills for the agent harness, run the skills in the agent and it produces artifacts for you by using the cli and its own agentic reasoning - a lot lately for work. Things like “give me an executive brief of the activity in these teams’ backlogs over the last month” and in 5-10 minutes I have a few-page doc I can read that is cited with the tickets it analyzed, and I don’t have to go bug people or ask them to do yet another task for me; just make sure your backlog is updated and detailed like normal practice. It’s awesome and really fits a useful spot between pure agent usage (which is hard to get consistent results from on repeat tasks) and having to build/buy a full-blown app for every random thing.
derefr 5 hours ago [-]
This approach works well, I agree. But I keep wishing that I could invert it. The architecture I feel like I keep yearning for is a traditional CLI program that encodes most workflow knowledge/decisions as real code, but which does "just a little bit of coding agent invocation" during one specific workflow step.
Not sure how to accomplish this. Anyone have any suggestions? Are there libraries for this yet? (And how would they even work? It feels like, to do this right, there would have to be some background service that CLI software could expect to interact with via a well-known local IPC socket — similar to how e.g. the docker daemon works. But I'm unaware of any coding agent software/frameworks that expose such an IPC capability...)
didgeoridoo 4 hours ago [-]
I’m building this! It was originally designed for human accessibility for interactive CLIs, but it turned out to be really useful for giving agents the ability to follow structured workflows.
It runs as a background terminal that the agent can observe, and then exposes all interaction options as structured commands that can be run from the foreground CLI which then update the state of the background terminal via IPC. My hope is to establish a sort of “ARIA for terminals” standard to improve accessibility for both humans and agents. Email in profile, ping me if you’re interested in giving it a spin (just have plugins for Inquirer + Commander right now, hoping to broaden to other frameworks & TUIs soon).
devenjarvis 4 hours ago [-]
I reverted this due to impending billing changes, but Claude and most LLM providers to my knowledge do offer a way to directly fire a prompt to the LLM in a "headless" or non-interactive mode. Specifically "claude -p <your_prompt_here>" is the way to do it with Claude Code. It allows for using the agent to do a one-off command with a given structured prompt. Originally Lathe would use this from the Go application to allow you to extend a tutorial directly from the UI without directly interacting with the LLM.
You'd have to exec out, so it's a little clunkier than an IPC, but I think you could achieve what you want with it.
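A minimal sketch of what "exec out" could look like from a host program, in Python for brevity (Lathe itself is described above as a Go application, and this is not its code). The only thing taken from the comment is the `claude -p <prompt>` invocation; the function names, the timeout, and the injectable `runner` parameter are illustrative assumptions.

```python
import subprocess
from typing import Callable, List


def build_headless_command(prompt: str, binary: str = "claude") -> List[str]:
    """Build the argv for a one-off, non-interactive agent call."""
    return [binary, "-p", prompt]


def run_headless(
    prompt: str,
    runner: Callable = subprocess.run,  # injectable so it can be faked in tests
    timeout_s: float = 300.0,           # assumed limit, tune for your workload
) -> str:
    """Exec out to the agent CLI and return whatever it printed to stdout."""
    result = runner(
        build_headless_command(prompt),
        capture_output=True,  # collect stdout/stderr instead of inheriting them
        text=True,            # decode bytes to str
        timeout=timeout_s,
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()
```

Calling `run_headless("Extend part 2 of this tutorial with ...")` would block until the agent exits. Each call starts a fresh process, so nothing carries over between invocations.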
derefr 2 hours ago [-]
That's almost it, yes.
But in my experience, to actually get where they're going quickly (as opposed to spending hours and hundreds of dollars stumbling around in the dark), coding agents generally need more interactive hand-holding than that. If you just fire off one non-interactive session and wait for it to come to a stop, the problem usually isn't fully+correctly solved at the point the LLM decides to "finish." And if you then start another non-interactive session to continue the work, the new session will have lost the old session's state/memory/context, and so will stumble through many of the same mistakes / misapprehensions.
What you really want, for a CLI program with a "use coding agent to do X" workflow-step, is for the CLI program to play the role of a human in a temporary durable coding-agent conversation session: prompting the agent; then waiting for it to finish responding (and side-effecting); then either asking the agent itself to evaluate an "am I done yet" predicate with a constrained output syntax; or having the CLI program do its own out-of-band validation of the changes made to the shared state by the agent; where, in either case, if the agent isn't "done yet", then the workflow step must continue poking it — or prompt the human to make a decision on how to proceed (possibly involving providing direct input to the LLM, but this is not ideal; ideally the CLI "abstracts away" the need for the end-user to understand the intricacies of the conversation the program is having with the LLM. Even more ideally, the conversation just whizzes by and the human doesn't have to think about an LLM being involved at all.)
Basically, think of this not as the CLI program saying to an agent "answer me this question" or "edit this file for me", but rather, the CLI program popping open a mini "guided + 99%-of-the-time automated" TUI coding-agent micro-IDE "inside" the workflow, in about the same way that git pops open your EDITOR inside `git commit`.
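The loop described above could be sketched roughly as follows. This is a hypothetical shape, not an existing library: `AgentSession` stands in for whatever durable conversation handle an agent framework might expose, and `validate` is the CLI program's own out-of-band check of the shared state. The sketch covers only the out-of-band validation variant and leaves escalation to a human to the caller.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Protocol


class AgentSession(Protocol):
    """Hypothetical handle to one durable conversation that keeps its context."""

    def send(self, prompt: str) -> str: ...


@dataclass
class StepResult:
    done: bool
    rounds: int
    transcript: List[str] = field(default_factory=list)


def run_agent_step(
    session: AgentSession,
    task: str,
    validate: Callable[[], Optional[str]],  # returns None when the work checks out
    max_rounds: int = 5,
) -> StepResult:
    """Prompt the agent, validate out-of-band, and keep poking until done."""
    prompt = task
    transcript: List[str] = []
    for round_no in range(1, max_rounds + 1):
        # Same session every round, so earlier context is not lost.
        transcript.append(session.send(prompt))
        problem = validate()
        if problem is None:
            return StepResult(done=True, rounds=round_no, transcript=transcript)
        prompt = f"Not done yet: {problem}. Please continue."
    # Out of rounds: the caller decides whether to ask the human how to proceed.
    return StepResult(done=False, rounds=max_rounds, transcript=transcript)
```

A caller would typically pass a `validate` that runs the test suite or inspects the files the agent was asked to change, and fall back to prompting the user only when `done` comes back false.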
edot 5 hours ago [-]
Can you give some examples of the deterministic tasks? So in your example, was the deterministic task “fetch this team’s backlog”? And then the LLM parts are “process each backlog” and “combine a summary”?
devenjarvis 6 hours ago [-]
I agree! I want to say I first saw this pattern in some work Simon Willison did (Rodney and Showboat). For certain workflows the pair of Skills + CLI give me a nice balance between the flexibility of LLMs and the consistency of a CLI.
andai 3 hours ago [-]
Hey this is neat!
I was telling my friend the other day. The way you learn programming is by typing code out by hand. And I suggested using LLMs to generate minimal educational examples aligned with his interests and needs.
I've tried the Zed Shaw method to learning programming (just typing out code examples by hand -- doing "studies", the same way you would with music or art). I tested it on a programming language I had been learning for a while and was struggling with. After just a few hours of typing my fluency had skyrocketed.
I realized that in several hours of typing I had written more code than in weeks of study. Because when you don't know a language yet, producing code is extremely slow and error prone. But typing out correct code is relatively straightforward.
So due to changing my approach to "just blindly typing", I got more practice (at least as far as reading and muscle memory goes) in a few hours than the previous few weeks.
Now of course understanding is important too, but it's a separate dimension, and largely comes after memory and fluency in my experience. (Understanding something theoretically and being able to use it are two very different things!)
The general principle here is Stephen Krashen's Input Hypothesis of language acquisition (https://en.wikipedia.org/wiki/Input_hypothesis) which says a baby learns language by just hearing stuff -- just being exposed to inputs -- and that adults can learn the same way too.
And I heard it on the excellent website (now defunct?) All Japanese All The Time, where the author tested the hypothesis on himself by mostly listening to a lot of Japanese and gained fluency in a year.
I have updated the popular /grill-me skill for this exact purpose! I had a very insightful grilling session yesterday on what exactly happens when you try to load an extremely large dataset in pandas, covering everything down to the last detail!
smallerize 3 hours ago [-]
Do you have that version published anywhere?
mobiuscog 2 hours ago [-]
I have been using a similar skill (built over a few iterations) that builds whatever I ask, through a series of milestones, and then creates a full tutorial to follow in markdown and uses zola to turn it into a full static site.
90% of my Claude usage is getting it to write me guides, that I can then spend most of my time following to build the end results.
Keeps the brain healthy and also provides bespoke learning, rather than a generic course off the internet. Definitely a great use of AI.
mmarian 4 hours ago [-]
I think you're tackling an interesting area. I was thinking of something similar for system design prep. I experimented with a couple of series of blog posts - one for designing Twitter, another for WhatsApp: https://prepcommons.com/.
Still, it took a lot more effort than just delivering the initial request. AI makes everyone produce something average but you still need taste to produce something good - I guess this applies to courses too.
Galanwe 2 hours ago [-]
It's very cool, and I can really see myself use that, but not in that form of deliverable.
See the best place I learn and read through materials is when I'm commuting. Far away from a console.
Could you envision a way to deliver this as a web app linked to e.g. an OpenRouter/Anthropic/OpenAI API key?
tatjam 8 hours ago [-]
This is a very cool idea, feels like a sane way to use LLMs in this crazy time! Could be a very good way to break the ice when starting a new project and everything is friction.
devenjarvis 8 hours ago [-]
Yea that’s definitely been a primary usecase for me! Easing the barrier to entry into a new project, and giving me the foundation to take it further on my own once I’m comfortable.
schmorptron 7 hours ago [-]
Cool project! I'll be trying it out. I've been a big fan of throwing whatever sources I have on a new topic I'm trying to get into into an LLM "project" and then asking it to teach me, grounded on the actual content to speed things up.
But at the same time, I'm afraid getting everything laid out for you in exactly the way you want will erode some of the understanding you build by going through a primary source directly and figuring things out the hard way. So this having more focus on actually doing stuff by yourself seems right up my alley (while still tending to the LLM-induced intellectual laziness...).
Arubis 3 hours ago [-]
For a somewhat hybrid approach here, have a look at https://github.com/DrCatHicks/learning-opportunities — the idea is to be used during “productive work” (so it’s not purely learning-oriented as with your repo here), and to interject as you work to ensure that you learn related concepts as you go.
ramon156 8 hours ago [-]
What I'm more looking at is your own experience with a vibed tool. I cannot really tell from this introduction whether you actually use and like it (you mentioned you use it and sometimes push back, which is a learning strategy of its own?)
Also, I wouldn't say "have another model test the tutorial compiles" a feature, but also I do not expect a fool-proof tutorial from a one-shot, I guess.
Not sure why I would try this over a hand-written prompt. Also wondering why ChatGPT Study mode failed, it seemed interesting.
devenjarvis 7 hours ago [-]
I've been using it quite a bit and I like it a lot! You certainly could roll your own prompt for this. The value I'm seeing is in the reusable skill/prompt to structure tutorials in a way that helps me think and learn a new concept (rather than Claude just giving me code to copy/paste), and the local UI that makes working through the tutorial much more pleasant than scrolling through Claude's markdown output. Plus tutorial series are persistent so I can easily come back around later with a `/lathe-extend` to explore an extension to a topic/tutorial I'm interested in.
That said, it's been a tool that's been helpful for me personally, but doesn't have to be for everyone! I've never used ChatGPT Study, I'll look into it more. Thanks for sharing!
visarga 4 hours ago [-]
I just use md files to plan questions, track my answers, and implement rehearsals for concepts that need more repetition from claude code. And I start from a good book or documentation as source material, first the agent reads the learning material and structures it for learning.
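One way the "more repetition for concepts that need it" part could be tracked is a Leitner-style box scheme: a correct answer moves a concept to a box with a longer gap, and a miss sends it back to the first box. This is only an illustrative sketch, not the commenter's actual setup; the `Concept` type, the function names, and the interval values are all assumptions.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List

# Box index -> days until the next rehearsal (illustrative values).
INTERVALS_DAYS = [1, 3, 7, 14]


@dataclass
class Concept:
    name: str
    box: int = 0          # 0 = needs the most repetition
    due: date = date.min  # new concepts are due immediately


def record_answer(concept: Concept, correct: bool, today: date) -> None:
    """Promote on a correct answer, reset to the first box on a miss."""
    if correct:
        concept.box = min(concept.box + 1, len(INTERVALS_DAYS) - 1)
    else:
        concept.box = 0
    concept.due = today + timedelta(days=INTERVALS_DAYS[concept.box])


def due_for_rehearsal(concepts: List[Concept], today: date) -> List[str]:
    """Names of the concepts whose rehearsal date has arrived."""
    return [c.name for c in concepts if c.due <= today]
```

The state is small enough to serialize into the same md files, so an agent could read it at the start of a session and pick the questions to ask from `due_for_rehearsal`.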
f311a 4 hours ago [-]
> If you can find resources to learn something that was written by a human, read that first. But Lathe is here to fill in the gaps when that isn't the case
Well, but it will still serve you content from humans, but without any attribution.
threecheese 6 hours ago [-]
Did you write the skill.md files yourself? I often wonder this; there’s so much text in most skills, and I can’t imagine it’s human generated.
I don’t write my own - I can’t optimize for the model’s understanding, and so I just give the skill-creator skill an outline and then have it refine until the output is what I want.
TonyAlicea10 4 hours ago [-]
There is of course a degree of true usefulness to this. However I’ve been a technical educator for years and I’ve tried to do lots with LLMs.
Even now, LLMs are terrible educators. They do not make coherent progressive curriculums. They hallucinate details which the student will not have the knowledge to challenge.
If you use an LLM to make a tutorial you will get some benefit for sure, especially if you use it for Socratic sessions based on a corpus of data you provide (like a blog post or documentation).
Don’t expect it to teach you reliably though. It feels good to ask the LLM whatever you want, but if you’re learning a topic you don’t have the instinct to realize when it’s giving you a poorly chosen progression of information or teaching you something flat out untrue.
devenjarvis 4 hours ago [-]
Really appreciate your perspective here! I do _not_ have a background in technical education, and am certain you've used and seen the failure modes of LLMs in this space far more than I have so far.
A few thoughts based on my limited experience building and using lathe:
- Part of the lathe skills is to first find source materials to base the curriculum on. It's not foolproof by any means, nor is it a novel approach, but it's helped ground the content in reality more than an open-ended prompt usually does (in my experience)
- We're scoped to tutorials, over a full-blown curriculum. I found having lathe write one part of a tutorial series at a time, over the whole thing at once, usually gave me better results (and is why `/lathe-extend` is a thing)
- To your point about not having the instinct to realize when it's giving me a poor progression or untrue content, my experience is that by actually writing the program the tutorial walks me through, I get definitive proof of whether the results are true or not. One of the most impactful (and all too frequent) answers I got as a young programmer was "write a program and find out" and it's still good advice today. Not at all proposing this makes lathe tutorials infallible, but in the context and scope of the project it seems to take the bite out of the worst failure modes here. That said, maybe that implies lathe is most useful and least dangerous in the hands of an experienced developer looking to learn a new domain, over someone looking to build foundational technical (and technical learning) skills? I'll think that over!
I'm really curious what your experience would lead you to think about the above though? Are there critical failure modes for LLMs writing hands on technical education I just haven't tripped over yet?
tmountain 4 hours ago [-]
I have been working on a language learning app for myself, and I am using a textbook that I like as the basis for an Anki inspired “learning tree”. This is working pretty well because I can build progressions from the original table of contents.
Sathwickp 4 hours ago [-]
Maybe add voice to it so that it reads the tutorial out loud and you can listen to lessons on the move?
4b11b4 7 hours ago [-]
I like the idea, and I know you explicitly address this, but I wonder if it could still search for human-made works for you to learn from first
If it does find some, maybe it could supplement them instead of just starting from scratch
james_marks 8 hours ago [-]
Love this idea, can’t wait to try it. Thank you for sharing!
devenjarvis 8 hours ago [-]
Thanks for checking it out!
28304283409234 7 hours ago [-]
Nice! I do this now locally with LLMs and ollama and my own hacky prompts. I could not find if this also supports ollama?
devenjarvis 6 hours ago [-]
Thanks for checking it out! ollama wasn't top of my list for support, just because I don't have a machine powerful enough to run decent local LLMs (I wish I did!). I'll look into it though, nothing here should be locked in to any one LLM, as long as it has the concept of a skill/slash command/reusable prompt.
Someone else asked about Gemini, so I think broader LLM support will be my focus for v0.4.0
mixtureoftakes 7 hours ago [-]
We have NotebookLM at home? Is there any comparison between these two? Looks nice
devenjarvis 6 hours ago [-]
Thanks for sharing NotebookLM, I hadn't seen that! I'll take a look and add a comparison to the README if it's compelling.
troymc 4 hours ago [-]
In my opinion, the coolest thing in NotebookLM is the podcast-episode-generator. Each one sounds like two people having a conversation. It's fun to listen to a podcast episode about some niche topic (e.g. nuclear isomers, or the Weyl curvature tensor) while I'm cooking or driving.
kaeluka 8 hours ago [-]
great, i'll try this. something like this has been on my list and i'm super curious :)
https://pchalasani.github.io/claude-code-tools/plugins-detai...
https://web.archive.org/web/20080705194055/http://www.alljap...