> In a new class of attack on AI systems, troublemakers can carry out these environmental indirect prompt injection attacks to hijack decision-making processes.
I have a coworker who brags about intentionally cutting off Waymos and robocars when he sees them on the road. He is "anti-clanker" and views it as civil disobedience to rise up against "machines taking over." Some mornings he comes in all hyped up talking about how he cut one off at a stop sign. It's weird.
antinomicus 5 hours ago [-]
This is a legitimate movement in my eyes. I don’t participate, but I see it as valid. This is reminiscent of the Luddite movement - a badly misunderstood movement of folks who were trying to secure labor rights guarantees in the face of automation and new tools threatening to kill large swaths of the workforce.
chrsstrm 14 minutes ago [-]
It's easy to see the word Waymo and think clanker autonomous car, but there are very often people inside that car - they are a rideshare service after all. Calling endangering other humans "legitimate" because you dislike the taxi company is not a good look.
lukeschlather 4 hours ago [-]
The Luddites were employed by textile manufacturers and destroyed machines to get better bargaining power in labor negotiations. They weren't indiscriminately targeting automation, they targeted machines that directly affected their work.
Refreeze5224 3 hours ago [-]
Which makes the comparison of modern anti-AI proponents (like myself) and Luddites even more apt and accurate.
nine_k 3 hours ago [-]
Destroying someone else's property is much more obviously criminal than cutting off someone else's car, which is not nice, but not destructive.
Retric 3 hours ago [-]
Criminality is an arbitrary benchmark here, cutting people off can be illegal due to the risks involved.
However what’s more interesting is the deeper social contracts involved. Destroying other people’s stuff can be perfectly legal such as fireman breaking car windows when someone parks in front of a fire hydrant. Destroying automation doesn’t qualify for an exception, but it’s not hard to imagine a different culture choosing to favor the workers.
nine_k 3 hours ago [-]
Inflicting damage is usually justified by averting larger damage. Very roughly, breaking a $200 car window is justified in order to save a $100k house from burning down. Stealing someone's car is justified when you need a car to urgently drive someone bleeding to a hospital to save their life (and then you don't claim the car is yours, of course).
I don't think Luddites had an easy justification like this.
ordersofmag 2 hours ago [-]
I'm pretty sure the Luddites judged the threat the machines posed to their livelihood to be a greater damage than their employer's loss of their machines. So for them, it was an easy justification. The idea that dollar value encapsulates the only correct way to value things in the world is a pretty scary viewpoint (as your reference to the value of saving a life illustrates).
SR2Z 23 minutes ago [-]
One one side there were the luddites and their livelihoods; tens of thousands of people.
On the other side, there were cheap textiles for EVERYONE - plus some profits for the manufacturers.
They might have been fighting to save their livelihoods, but their self-interest put them up against the entire world, not just their employers.
cwillu 2 hours ago [-]
Dangerous driving is a criminal offense
skybrian 4 hours ago [-]
How does cutting off a Waymo help with any of that?
nine_k 3 hours ago [-]
The feeling of dominance over machines may be saving that coworker the expense and hassle of another visit to a therapist.
BoorishBears 4 hours ago [-]
I think the important part was telling their coworker ironically: now here we are recognizing their movement
stopbulying 3 hours ago [-]
People are free to reject technology as they please.
If you deliberately impede the flow of traffic, vehicularly assault, or otherwise sabotage the health and safety of drivers, passengers, and/or pedestrians, what do you deserve?
If you cause whiplash intentionally, what do you deserve?
What would be use of equal force in self defense in response to the described attack method?
cindyllm 3 hours ago [-]
[dead]
stinkbeetle 2 hours ago [-]
What exactly do you mean by "legitimate" and "valid"?
Are movements valid if they have aims that you agree with, or are economic self-interest motivated, and invalid otherwise?
bsder 3 hours ago [-]
Please tell me that he does realize that when something bad happens, that Waymo car has all the footage that it is his fault?
Something in people's brains often makes them think they are anonymous when they are driving their car. Then that gets disastrously proven otherwise when they need to show up in front of a judge.
bigbadfeline 3 hours ago [-]
These drones have cameras, it's a matter of time before they "share" footage... basically becoming robo-cops, traffic edition - this might be of interest to your coworker.
nine_k 3 hours ago [-]
Most roads already have plenty of cameras registering passing cars, so if you want to travel highly privately, take a bike, which does not require number plates. Also don't forget to wrap your phone in foil (yes, even when turned off), and regularly change your shirt color, or something.
If you are not that paranoid, you might appreciate the extra camera footage available from passing cars in an event of an accident involving you.
kbaker 3 hours ago [-]
Just tell him that Waymo is now sharing videos of this behavior with auto insurance companies.
I don't know if they are or not. But why wouldn't they...
uxhacker 6 hours ago [-]
The study assumes that the car or drone is being guided by a LLM. Is this a correct assumption? I would thought that they use custom AI for intelligence.
nasreddin 5 hours ago [-]
Its an incorrect assumption, the inference speed and particularly the inference speed of the on-device LLMs with which AVs would need to be using is not compatible with the structural requirements of driving.
4 hours ago [-]
nharada 4 hours ago [-]
I think the assumption is valid. Most of the reasoning components of the next gen (and some current gen) robotics will use VLMs to some extent. Deciding if a temporary construction sign is valid seems to fall under this use case.
godelski 6 hours ago [-]
To the best of my knowledge every major autonomous vehicle and robotics company is integrating these LVLMs into their systems in some form or another, and an LVLM is probably what you're interacting with these days rather than an LLM. If it can generate images or read images, it is an LVLM.
The problem is no different from LLMs though, there is no generalized understanding and thus they can not differentiate the more abstract notion of context. As an easy to understand example: if you see a stop sign with a sticker that says "for no one" below you might laugh to yourself and understand that in context that this does not override the actual sign. It's just a sticker. But the L(V)LMs cannot compartmentalize and "sandbox" information like that. All information is equally processed. The best you can do is add lots of adversarial examples and hope the machine learns the general pattern but there is no inherent mechanism in them to compartmentalize these types of information or no mechanism to differentiate this nuance of context.
I think the funny thing is that the more we adopt these systems the more accurate the depiction of hacking in the show Upload[0] looks.
Because I linked elsewhere and people seem to doubt this, here is Waymo a few years back talking about incorporating Gemini[1].
Also, here is the DriveLM dataset, mentioned in the article[2]. Tesla has mentioned that they use a "LLM inspired" system and that they approach the task like an image captioning task[3]. And here's 1X talking about their "world model" using a VLM[4].
I mean come on guys, that's what this stuff is about. I'm not singling these companies out, rather I'm using as examples. This is how the field does things, not just them. People are really trying to embody the AI and the whole point of going towards AGI is to be able to accomplish any task. That Genie project on the front page yesterday? It is far far more about robots than it is about videogames.
One year in my city they were installing 4-way stop signs everywhere based on some combination of "best practices" and "screeching Karens". Even the residents don't like them in a lot of places so over time people just turn the posts in the ground or remove them.
Every now and the I'll GPS somewhere and there will be a phatom stop sign in the route and I chuckle to myself because it means the Google car drove through when one of these signs was "fresh".
pixl97 6 hours ago [-]
Screwing with a stop sign because you don't like it is a great way to end up on the wrong end of a huge civil liability lawsuit
cucumber3732842 6 hours ago [-]
Put down the pearls. It's not me personally doing it.
They never fixed any of them. I don't think the DPW cares. These intersection just turned back into the 2-way stops they had been for decades prior.
Compliance probably technically went up since you no longer have the bulk of the traffic rolling it.
fragmede 6 hours ago [-]
If you're already commiting crimes, what you seem to be saying is don't get caught.
digiown 6 hours ago [-]
4-way stops are terrible in general. They train people to think "I stopped, now I can go", which is dangerous when someone confuses a normal stop for a 4-way stop. It also wastes a good bit of energy.
c22 5 hours ago [-]
Weird, I was taught that I can only go after yielding to the right.
seanmcdirmid 3 hours ago [-]
That isn’t the rule either, I guess parent made their point. The first person who stops goes next, right away only matters if their is ambiguity in who stopped first.
arcanemachiner 3 hours ago [-]
To your first point, "the rule" is location-dependent. And to your second point, that was obviously (to me, at least) implied.
seanmcdirmid 40 minutes ago [-]
I’ve never seen a four way stop in a region that had traffic on the right can always go regardless of stop time. But I’ve only seen four way stops in a few countries.
bschwindHN 2 hours ago [-]
> right away
right of way
brewtide 1 hours ago [-]
Or maybe they were going right away, taking the initiative and removing the ambiguity from the situation. =)
james_marks 2 hours ago [-]
The point is, if many 4-way stops don’t have traffic at them, a stop/start becomes a perfunctory, dangerous habit.
XorNot 5 hours ago [-]
4 ways stops should be roundabouts, but the US is allergic to them for some reason.
paulclinger 3 hours ago [-]
Roundabouts are great (we just had two complex intersections with traffic lights replaced by roundabouts and the traffic flow is much better), but they take significantly more space than a 4-way stop.
cucumber3732842 4 hours ago [-]
Roundabouts excel when traffic volumes on the intersecting are comparable. They are crap when traffic volumes are highly disparate
orwin 2 hours ago [-]
They make people on the main road slow down, which is a feature, not a bug. What you mean is that they're the most efficient at what they do when the traffic is comparable. They only reduce accident at the expense of a slightly lowered throughput if the traffic is highly disparate.
XorNot 4 hours ago [-]
Right but it's not like a 4 way stop is going to perform better. In the same case you'd expect it to be a 2 way stop.
josephcsible 3 hours ago [-]
> Right but it's not like a 4 way stop is going to perform better.
A 4 way stop does perform better than a roundabout given highly disparate traffic volumes, because roundabouts suffer from resource starvation in that scenario, but 4 way stops are starvation-free.
cucumber3732842 4 hours ago [-]
>In the same case you'd expect it to be a 2 way stop
Which is what it was for the first 70yr... And what most of them in this particular neighborhood still are, with a 0-6mo intermission.
seanmcdirmid 3 hours ago [-]
A lot of legacy intersections don’t have space for round abouts even in cites that embrace them.
masfuerte 3 hours ago [-]
So use a mini roundabout. They are common in the UK. It's just a painted circle with a slight hump, in the middle of a four-way junction. Vehicles can drive over it (and larger ones have to) but it indicates to everyone that they have to give way to traffic from the right and don't have to stop otherwise. They typically aren't big enough for multiple vehicles to be turning a corner at the same time. They fit anywhere.
seanmcdirmid 41 minutes ago [-]
It won’t work for a four way stop with lots of traffic, it will just make things worse actually.
Mountain_Skies 3 hours ago [-]
Even rural Georgia has double roundabouts now. Not sure why people on the internet can't contain their glee at stating the US is "allergic" to them when the frequency of roundabouts has grown significantly in recent decades.
kjkjadksj 3 hours ago [-]
Because retrofitting them properly requires emminent domain. The ones they shoehorn onto former four way stops are so useless. They are so tight you still have to face a stop sign vs being able to just seamlessly zipper merge in a proper larger circumference roundabout. When they have room to build out a proper roundabout they are usually OK but that is hard to do outside say new suburban construction due to lack of available land on the right of way.
_diyar 7 hours ago [-]
Are any real world self-driving models (Waymo, Tesla, any others I should know?) really using VLM?
bijant 6 hours ago [-]
No! No one in their right mind would even consider using them for guidance and if they are used for OCR (not too my knowledge but could make sense in certain scenarios) then their output would be treated the way you'd treat any untrusted string.
godelski 5 hours ago [-]
You are confidently wrong
> Powered by Gemini, a multimodal large language model developed by Google, EMMA employs a unified, end-to-end trained model to generate future trajectories for autonomous vehicles directly from sensor data. Trained and fine-tuned specifically for autonomous driving, EMMA leverages Gemini’s extensive world knowledge to better understand complex scenarios on the road.
You were confidently wrong for judging them to be confidently wrong
> While EMMA shows great promise, we recognize several of its challenges. EMMA's current limitations in processing long-term video sequences restricts its ability to reason about real-time driving scenarios — long-term memory would be crucial in enabling EMMA to anticipate and respond in complex evolving situations...
They're still in the process of researching it, noting in that post implies VLM are actively being used by those companies for anything in production.
godelski 3 hours ago [-]
> They're still in the process of researching it
I should have taken more care to link a article, but I was trying you link something more clear.
But mind you, everything Waymo does is under research.
So let's look at something newer to see if it's been incorporated
> We will unpack our holistic AI approach, centered around the Waymo Foundation Model, which powers a unified demonstrably safe AI ecosystem that, in turn, drives accelerated, continuous learning and improvement.
> Driving VLM for complex semantic reasoning. This component of our foundation model uses rich camera data and is fine-tuned on Waymo’s driving data and tasks. Trained using Gemini, it leverages Gemini’s extensive world knowledge to better understand rare, novel, and complex semantic scenarios on the road.
> Both encoders feed into Waymo’s World Decoder, which uses these inputs to predict other road users behaviors, produce high-definition maps, generate trajectories for the vehicle, and signals for trajectory validation.
They also go on to explain model distillation. Read the whole thing, it's not long
But you could also read the actual research paper... or any of their papers. All of them in the last year are focused on multimodality and a generalist model for a reason which I think is not hard do figure since they spell it out
fsckboy 5 hours ago [-]
>to generate future trajectories for autonomous vehicles directly from sensor data
we will not have achieved true AGI till we start seeing bumper stickers (especially Saturday mornings) that say "This Waymo Brakes for Yard Sales"
lifeisstillgood 5 hours ago [-]
To me this is just one more pillar underlying my assumption that self driving cars that can be left alone on same roads as humans is a pipe dream.
Waymo might have taxis that work in nice daytime streets (but with remote “drone operators”). But dollars to doughnuts someone will try something like this on a waymo taxi the minute it hits reddit front page.
The business model of self driving cars does not include building seperated roadways and junctions. I suspect long distance passenger and light loads are viable (most highways can be expanded to have one or more robo-lanes) but cities are most likely to have drone operators keeping things going and autonomous systems for handling loss of connection etc. the business models are there - they just don’t look like KITT - sadly
blibble 5 hours ago [-]
> But dollars to doughnuts someone will try something like this on a waymo taxi the minute it hits reddit front page.
and once this video gets posted to reddit, an hour later every waymo in the world will be in a ditch
skybrian 4 hours ago [-]
Alternatively, it happens once, Waymo fixes it, and it's fixed everywhere.
SoftTalker 2 hours ago [-]
How does Waymo fix it? They have to be responsive to some signs (official, legitimate ones such as "Lane closed ahead, merge right") so there will always be some injection pathway.
skybrian 1 hours ago [-]
They've mapped the roads and they don't need to drive into a ditch just because there's a new sign. It probably wouldn't be all that hard to come up with criteria for saying "this new sign is suspicious" and flag it for human review. Also, Waymo cars drive pretty conservatively, and can decide to be even more cautious when something's confusing.
Someone could probably do a DOS attack on the human monitors, though, sort of like what happened with that power outage in San Francisco.
joetl 4 hours ago [-]
Regarding some other comments, VLMs are a component of VLAs. So even if this won’t directly impact this generation of vehicles, it almost certainly will for robotics without sufficient mitigations.
The experiment in the article goes further than this.
I expect a self driving car to be able to read and follow a handwritten sign saying, say, "Accident ahaed. Use right lane." despite the typo and the fact that it hasn't seen this kind of sign before. I'd expect a human to pay it due attention to.
I would not expect a human to follow the sign in the article ("Proceed") in the case illustrated where there were pedestrians already crossing the road and this would cause a collision. Even if a human driver takes the sign seriously, he knows that collision avoidance takes priority over any signage.
There is something wrong with a model that has the opposite behaviour here.
lukan 7 hours ago [-]
Not really, as those attacks discussed here would not work on humans.
TomatoCo 7 hours ago [-]
If you put on a reflective vest they might.
honeybadger1 4 hours ago [-]
your bias is showing. humans would certainly almost do anything they are told to do when the person acts confidently.
eigencoder 2 hours ago [-]
If a person confidently told a human to run over people in the intersection ahead of them, they would almost certainly do it?
bobbean 36 minutes ago [-]
Depends, are they doing something super interesting on their phone?
6stringmerc 5 hours ago [-]
That’s some hot CHAI right there very clever and primitive combination, well done as more research for the community.
bijant 6 hours ago [-]
The Register stooping this low is the only surprise here. I'm quite critical of Teslas approach to level 3+ autonomy but even I wouldn't dare suggest that there vision based approach amounted to bolting GPT-4o or some other VLLM to their cars to orient them in space and make navigation decisions. Fake News like this makes interacting with people who have no domain knowledge and consider The Register, UCLA and Johns Hopkins to be reputable institutions and credible sources more stressful to me as I'll be put into a position to tell people that they have been misled or go along with their delusions...
Rendered at 05:03:33 GMT+0000 (Coordinated Universal Time) with Vercel.
I have a coworker who brags about intentionally cutting off Waymos and robocars when he sees them on the road. He is "anti-clanker" and views it as civil disobedience to rise up against "machines taking over." Some mornings he comes in all hyped up talking about how he cut one off at a stop sign. It's weird.
However what’s more interesting is the deeper social contracts involved. Destroying other people’s stuff can be perfectly legal such as fireman breaking car windows when someone parks in front of a fire hydrant. Destroying automation doesn’t qualify for an exception, but it’s not hard to imagine a different culture choosing to favor the workers.
I don't think Luddites had an easy justification like this.
On the other side, there were cheap textiles for EVERYONE - plus some profits for the manufacturers.
They might have been fighting to save their livelihoods, but their self-interest put them up against the entire world, not just their employers.
If you deliberately impede the flow of traffic, vehicularly assault, or otherwise sabotage the health and safety of drivers, passengers, and/or pedestrians, what do you deserve?
If you cause whiplash intentionally, what do you deserve?
What would be use of equal force in self defense in response to the described attack method?
Are movements valid if they have aims that you agree with, or are economic self-interest motivated, and invalid otherwise?
Something in people's brains often makes them think they are anonymous when they are driving their car. Then that gets disastrously proven otherwise when they need to show up in front of a judge.
If you are not that paranoid, you might appreciate the extra camera footage available from passing cars in an event of an accident involving you.
I don't know if they are or not. But why wouldn't they...
The problem is no different from LLMs though, there is no generalized understanding and thus they can not differentiate the more abstract notion of context. As an easy to understand example: if you see a stop sign with a sticker that says "for no one" below you might laugh to yourself and understand that in context that this does not override the actual sign. It's just a sticker. But the L(V)LMs cannot compartmentalize and "sandbox" information like that. All information is equally processed. The best you can do is add lots of adversarial examples and hope the machine learns the general pattern but there is no inherent mechanism in them to compartmentalize these types of information or no mechanism to differentiate this nuance of context.
I think the funny thing is that the more we adopt these systems the more accurate the depiction of hacking in the show Upload[0] looks.
[0] https://www.youtube.com/watch?v=ziUqA7h-kQc
Edit:
Because I linked elsewhere and people seem to doubt this, here is Waymo a few years back talking about incorporating Gemini[1].
Also, here is the DriveLM dataset, mentioned in the article[2]. Tesla has mentioned that they use a "LLM inspired" system and that they approach the task like an image captioning task[3]. And here's 1X talking about their "world model" using a VLM[4].
I mean come on guys, that's what this stuff is about. I'm not singling these companies out, rather I'm using as examples. This is how the field does things, not just them. People are really trying to embody the AI and the whole point of going towards AGI is to be able to accomplish any task. That Genie project on the front page yesterday? It is far far more about robots than it is about videogames.
[1] https://waymo.com/blog/2024/10/introducing-emma/
[2] https://github.com/OpenDriveLab/DriveLM
[3] https://kevinchen.co/blog/tesla-ai-day-2022/
[4] https://www.1x.tech/discover/world-model-self-learning
Every now and the I'll GPS somewhere and there will be a phatom stop sign in the route and I chuckle to myself because it means the Google car drove through when one of these signs was "fresh".
They never fixed any of them. I don't think the DPW cares. These intersection just turned back into the 2-way stops they had been for decades prior.
Compliance probably technically went up since you no longer have the bulk of the traffic rolling it.
right of way
A 4 way stop does perform better than a roundabout given highly disparate traffic volumes, because roundabouts suffer from resource starvation in that scenario, but 4 way stops are starvation-free.
Which is what it was for the first 70yr... And what most of them in this particular neighborhood still are, with a 0-6mo intermission.
> While EMMA shows great promise, we recognize several of its challenges. EMMA's current limitations in processing long-term video sequences restricts its ability to reason about real-time driving scenarios — long-term memory would be crucial in enabling EMMA to anticipate and respond in complex evolving situations...
They're still in the process of researching it, noting in that post implies VLM are actively being used by those companies for anything in production.
But mind you, everything Waymo does is under research.
So let's look at something newer to see if it's been incorporated
They also go on to explain model distillation. Read the whole thing, it's not longhttps://waymo.com/blog/2025/12/demonstrably-safe-ai-for-auto...
But you could also read the actual research paper... or any of their papers. All of them in the last year are focused on multimodality and a generalist model for a reason which I think is not hard do figure since they spell it out
we will not have achieved true AGI till we start seeing bumper stickers (especially Saturday mornings) that say "This Waymo Brakes for Yard Sales"
Waymo might have taxis that work in nice daytime streets (but with remote “drone operators”). But dollars to doughnuts someone will try something like this on a waymo taxi the minute it hits reddit front page.
The business model of self driving cars does not include building seperated roadways and junctions. I suspect long distance passenger and light loads are viable (most highways can be expanded to have one or more robo-lanes) but cities are most likely to have drone operators keeping things going and autonomous systems for handling loss of connection etc. the business models are there - they just don’t look like KITT - sadly
and once this video gets posted to reddit, an hour later every waymo in the world will be in a ditch
Someone could probably do a DOS attack on the human monitors, though, sort of like what happened with that power outage in San Francisco.
https://developer.nvidia.com/blog/updating-classifier-evasio...
I expect a self driving car to be able to read and follow a handwritten sign saying, say, "Accident ahaed. Use right lane." despite the typo and the fact that it hasn't seen this kind of sign before. I'd expect a human to pay it due attention to.
I would not expect a human to follow the sign in the article ("Proceed") in the case illustrated where there were pedestrians already crossing the road and this would cause a collision. Even if a human driver takes the sign seriously, he knows that collision avoidance takes priority over any signage.
There is something wrong with a model that has the opposite behaviour here.