A truly epic application would be their own chat bot to ask about applying edit guidelines. After reading almost all of the guidelines, the talk page debates, even among experienced editors, looked waaaay off. The pattern of revert first, make up excuses later seems the worst newbie deterrent possible. This while it should be fine to make mistakes. Many such excuses would get debunked by a bot immediately. It simply won't do any favors. If established editors don't like it they can edit the guidelines.
crazygringo 6 hours ago [-]
> That means the article contained a plausible-sounding sentence, cited to a real, relevant-sounding source. But when you read the source it’s cited to, the information on Wikipedia does not exist in that specific source. When a claim fails verification, it’s impossible to tell whether the information is true or not.
This has been a rampant problem on Wikipedia always. I can't seem to find any indicator that this has increased recently? Because they're only even investigating articles flagged as potentially AI. So what's the control baseline rate here?
Applying correct citations is actually really hard work, even when you know the material thoroughly. I just assume people write stuff they know from their field, then mostly look to add the minimum number of plausible citations after the fact, and then most people never check them, and everyone seems to just accept it's better than nothing. But I also suppose it depends on how niche the page is, and which field it's in.
crabmusket 5 hours ago [-]
There was a fun example of this that happened live during a recent episode of the Changelog[1]. The hosts noted that they were incorrectly described as being "from GitHub" with a link to an episode of their podcast which didn't substantiate that claim. Their guest fixed the citation as they recorded[2].
LLMs can add unsubstantiated conclusions at a far higher rate than humans working without LLMs.
EA-3167 3 hours ago [-]
At some point you're forced to either believe that people have never heard of the concept of a force multiplier, or to return to Upton Sinclair's observation about getting people to believe in things that hurt their bottom line.
DrewADesign 2 hours ago [-]
I don’t see why people keep blaming cars for road safety problems; people got into buggy crashes for centuries before automobiles even existed
nullsanity 2 hours ago [-]
Because a difference in scale can become a difference in category. A handful of buggy crashes can be reduced to operator error, but as the car becomes widely adopted and analysis matures, it becomes clear that the fundamental design of the machine and its available use cases has fundamental flaws that cause a higher rate of operator error than desired. Therefore, cars are redesigned to be safer, laws and regulations are put in place, license systems are issued, and traffic calming and road design is considered.
Hope that helps you understand.
DrewADesign 1 hours ago [-]
Is the sarcasm really that opaque? Who would unironically equate buggy accidents and automobile accidents?
gonzobonzo 4 hours ago [-]
The problems I've run into are both people giving fake citations (the citations don't actually justify the claim that's being made in the article), and people giving real citations, but if you dig into the source you realize it's coming from a crank.
It's a big blind spot among the editors as well. When this problem was brought up here in the past, with people saying that claims on Wikipedia shouldn't be believed unless people verify the sources themselves, several Wikipedia editors came in and said this wasn't a problem and Wikipedia was trustworthy.
It's hard to see it getting fixed when so many don't see it as an issue. And framing it as a non-issue misleads users about the accuracy of the site.
mmooss 5 hours ago [-]
When I've checked Wikipedia citations I've found so much brazen deception - citations that obviously don't support the claim - that I don't have confidence in Wikipedia.
> Applying correct citations is actually really hard work, even when you know the material thoroughly.
Why do you find it hard? Scholarly references can be sources for fundamental claims, review articles are a big help too.
Also, I tend to add things to Wikipedia or other wikis when I come across something valuable rather than writing something and then trying to find a source (which also is problematic for other reasons). A good thing about crowd-sourcing is that you don't have to write the article all yourself or all at once; it can be very iterative and therefore efficient.
crazygringo 4 hours ago [-]
It's not that I personally find it hard.
It's more like, a lot of stuff in Wikipedia articles is somewhat "general" knowledge in a given field, where it's not always exactly obvious how to cite it, because it's not something any specific person gets credit for "inventing". Like, if there's a particular theorem then sure you cite who came up with it, or the main graduate-level textbook it's taught in. But often it's just a particular technique or fact that just kind of "exists" in tons of places but there's no obvious single place to cite it from.
So it actually takes some work to find a good reference. Like you say, review articles can be a good source, survey articles or books. But it can take a surprising amount of effort to track down a place that actually says the exact thing. I literally just last week was helping a professor (leader in their field!) try to find a citation during peer review for their paper for an "obvious fact" in the field, that was in their introduction section. It was actually really challenging, like trying to produce a citation for "the sky is blue".
I remember, years ago, creating a Wikipedia article for a particular type of food in a particular country. You can buy it at literally every supermarket there. How the heck do you cite the food and facts about it? It just... is. Like... websites for manufacturers of the food aren't really citations. But nobody's describing the food in academic survey articles either. You're not going to link to Allrecipes. What do you do? It's not always obvious.
FranklinJabar 3 hours ago [-]
[dead]
ColinWright 8 hours ago [-]
The title I've chosen here is carefully selected to highlight one of the main points. It comes (lightly edited for length) from this paragraph:
Far more insidious, however, was something else we discovered:
More than two-thirds of these articles failed verification.
That means the article contained a plausible-sounding sentence, cited to a real, relevant-sounding source. But when you read the source it’s cited to, the information on Wikipedia does not exist in that specific source. When a claim fails verification, it’s impossible to tell whether the information is true or not. For most of the articles Pangram flagged as written by GenAI, nearly every cited sentence in the article failed verification.
the_fall 6 hours ago [-]
FWIW, this is a fairly common problem on Wikipedia in political articles, predating AI. I encourage you to give it a try and verify some citations. A lot of them turn out to be more or less bogus.
I'm not saying that AI isn't making it worse, but bad-faith editing is commonplace when it comes to hot-button topics.
mjburgess 5 hours ago [-]
Any articles where newspapers are the main source are basically just propaganda. An encyclopaedia should not be in the business of laundering yellow journalism into what is supposed to be a tertiary resource. If they banned this practice, that would immediately deal with this issue.
the_fall 5 hours ago [-]
That's not what I'm saying. I mean citations that aren't citations: a "source" that doesn't discuss the topic at all or makes a different claim.
mmooss 5 hours ago [-]
A blanket dismissal is a simple way to avoid dealing with complexity, here both in understanding the problem and forming solutions. Obviously not all newspapers are propaganda and at the same time not all can be trusted; not everything in the same newspaper or any other news source is of the same accuracy; nothing is completely trustworthy or completely untrustworthy.
I think accepting that gets us to the starting line. Then we need to apply a lot of critical thought to sometimes difficult judgments.
IMHO quality newspapers do an excellent job - generally better than any other category of source on current affairs, but far from perfect. I remember a recent article for which they interviewed over 100 people, got ahold of secret documents, read thousands of pages, consulted experts .... That's not a blog post or Twitter take, or even a HN comment :), but we still need to examine it critically to find the value and the flaws.
abacadaba 4 hours ago [-]
> Obviously not all newspapers are propaganda
citation needed
tbossanova 3 hours ago [-]
There is literally no source without bias. You just need to consider whether you think a source's biases are reasonable or not
snigsnog 4 hours ago [-]
That is probably 95% of wikipedia articles. Their goal is to create a record of what journalists consider to be true.
dang 6 hours ago [-]
Submitted title was "For most flagged articles, nearly every cited sentence failed verification".
I agree, that's interesting, and you've aptly expressed it in your comment here.
chr15m 3 hours ago [-]
People here are claiming that this is true of humans as well. Apart from the fact that bad content can be generated much faster with LLMs, what's your feeling about that criticism? Is there any measure of how many submissions before LLMs make unsubstantiated claims?
Thank you for publishing this work. Very useful reminder to verify sources ourselves!
chrisjj 7 hours ago [-]
So, a small proportion of articles were detected as bot-written, and a large proportion of those failed validation.
What if in fact a large proportion of articles were bot-written, but only the unverifiable ones were bad enough to be detected?
EdwardDiego 6 hours ago [-]
Human editors, I suspect, would pick up the "tells" of generated text, although as we know, there's a lot of false positives in that space.
But it looks like Pangram is a text-classifying NN trained using a technique where they get a human to write a body of text on a subject, and then get various LLMs to write a body of text on the same subject, which strikes me as a good way to approach the problem. Not that I'm in any way qualified to properly understand ML.
Note that this article is only about edits made through the Wiki Edu program, which partners with universities and academics to have students edit Wikipedia on course-related topics. It's not about Wikipedia writ large!
candiddevmike 6 hours ago [-]
I feel like this is such a tragedy of the commons for the LLM providers. Wikipedia probably makes up a huge bulk of their dataset, why taint it? Would be interesting if there was some kind of "you shall not use our platform on Wikipedia" stance adopted.
kingstnap 4 hours ago [-]
Wikipedia having incorrect citations is way older than LLMs. As many other people have pointed out in this thread, if you start pulling strings a lot of what people write starts falling apart.
It's not even unique to Wikipedia. It's really not difficult to find very misleading statements cited through a citation that doesn't even support the claim when you check the original.
acdha 4 hours ago [-]
This is like saying handing out machine guns is no big change because people have been shooting arrows for a long time. At some point volume becomes the story once it overwhelms the community’s ability to correct errors.
ohyoutravel 6 hours ago [-]
I don’t think it’s the providers doing this, it’s the awful users. They’re doing the same thing on GitHub. It’s maddening.
MattGaiser 6 hours ago [-]
It would be random individuals.
arjie 4 hours ago [-]
> That means the article contained a plausible-sounding sentence, cited to a real, relevant-sounding source. But when you read the source it’s cited to, the information on Wikipedia does not exist in that specific source.
This happens a lot on Wikipedia. I'm not sure why, but it does and you can see its traces through the Internet as people post the mistaken information around.
When I found the source, the twitter poster was correct! Someone had decided to translate "A hundred years ago, people would have considered this an outrage. But now..." as "this function is an outrage" which honestly is ironically an outrageous translation. What the hell dude.
I had to go find the actual source (not the other 'sources' that repeated off Wikipedia or each other) and then make sure it was correct before dealing with it. A lie can travel halfway around the world...
simianwords 6 hours ago [-]
I find it very interesting that the main competitor to Wikipedia which is Grokipedia is taking a 180 degree approach being AI first.
ktzar 6 hours ago [-]
Didn't know about Grokipedia, I've just opened an article in it about Spain, scrolled to a random paragraph, and the information in it is plain wrong:
From https://grokipedia.com/page/Spain#terrain-and-landforms
> Spain's peninsular terrain is dominated by the Meseta Central, a vast interior plateau covering about two-thirds of the country's land area, with elevations ranging from 610 to 760 meters and averaging around 660 meters
I still stand on not trusting any of what AI spits out, be it code or text. And it takes me usually longer to check that everything is ok than doing it myself, but my brain is enticed by the "effort shortcut" that AI promised.
nl 2 hours ago [-]
I'm not an expert on the geography of Spain, and it's rare that I'd defend Grokipedia but in this case I think it is correct.
Meseta Central means central tableland. Segovia is on the edge of the mountain range that surrounds that tableland, but often referred to as part of it. This is fuzzy though.
Wikipedia says: The Meseta Central (lit. 'central tableland', sometimes referred to in English as Inner Plateau) is one of the basic geographical units of the Iberian Peninsula. It consists of a plateau covering a large part of the latter's interior.[1]
Looking at the map you linked the flat part is between 610 to 760 meters.
Finally, when speaking about the Iberian Peninsula Wikipedia itself includes this:
> "About three quarters of that rough octagon is the Meseta Central, a vast plateau ranging from 610 to 760 m in altitude."[2]
Grok does cite that claim as being from https://countrystudies.us/spain/30.htm a page in Eric Solsten and Sandra W. Meditz, editors. Spain: A Country Study. Washington: GPO for the Library of Congress, 1988.
The nice thing about grokipedia is that if you have counter examples like that you can provide it as evidence to change it and it will rewrite the article to be more clear.
malfist 2 hours ago [-]
You know what other site you can provide evidence to and change to be more correct?
homebrewer 1 hours ago [-]
I don't ever edit English wikipedia because my English is not nearly up to the standard, and suggestions for improvement (worthwhile IMO) are usually ignored. Grok at least won't ignore you. (I tend to post suggestions to unpopular pages with sparse edit history, which is probably the reason for them going unnoticed.)
bawolff 3 hours ago [-]
> I find it very interesting that the main competitor to Wikipedia which is Grokipedia
Encyclopedia Britannica (the website not the printed book) is the main competitor to Wikipedia and gets an order of magnitude more traffic than grokipedia. Right now grokipedia is the new kid on the block. It has yet to be seen if it's just a novelty or if it has staying power but either way it still has a ways to go before it's Wikipedia's primary competitor.
Sharlin 4 hours ago [-]
Main competitor? I’m pretty sure that Uncyclopedia is a more relevant competitor to Wikipedia than Grokipedia. Likely more accurate, too.
throwaway5465 4 hours ago [-]
There seems much defensiveness in the comments here along the lines of "not a new thing" and "not unique to LLM/AI".
It seems to deflect, even gaslight TFA.
> For most of the articles Pangram flagged as written by GenAI, nearly every cited sentence in the article failed verification.
So why deflect that into convenient other pedantry (surely not under the guise tech forums often do so)?
So why the discomfort for part of HN at an assertion AI is being used for nefarious purposes and creation of alternate 'truths'?
emp17344 3 hours ago [-]
Astroturfing or marketing, I’d guess. I’ve noticed you’re no longer allowed to say negative things about AI here without significant pushback, and I’d bet this isn’t an organic shift in perception.
malfist 2 hours ago [-]
I've found that generally people reserve down votes for posts that don't add to the conversation, in general, just like we're supposed to do. It's always been down vote city if you happen to criticize political positions that benefit libertarian technologists. But lately anything critical of AI tends to get a lot of down votes. Even on older posts that you can't find on the front page anymore... It feels inorganic
malfist 2 hours ago [-]
There sure are a lot of green names on this post pushing that agenda. Makes you wonder if it's astroturfing. And why it's necessary, is AI so fragile it can't let any criticism stand unchallenged?
ks2048 4 hours ago [-]
[flagged]
ragesoss 2 hours ago [-]
lol. would have written something shorter for HN, but the main expected audience for it was Wikipedians.
asyncadventure 4 hours ago [-]
[dead]
HPsquared 4 hours ago [-]
This goes much further than Wikipedia, it's just particularly visible there.
gwern 3 hours ago [-]
Thanks for the LLM comment, but that's dumb. If the problem really was as bad with humans (it obviously is not), then OP wouldn't've happened:
> For most of the articles Pangram flagged as written by GenAI, nearly every cited sentence in the article failed verification.
chr15m 2 hours ago [-]
Agree. I'm curious about the human contribution baseline.
vibeprofessor 14 minutes ago [-]
I trust Grokipedia way more, even though it's AI-generated. Wikipedia on any current topic is dominated by various edit gangs trying to push an agenda
[1]: https://changelog.com/podcast/668#transcript-265
[2]: https://en.wikipedia.org/w/index.php?title=Eugen_Rochko&diff...
More details here: https://arxiv.org/pdf/2402.14873
One that took me a little work to fix was pointed out by someone on Twitter: https://x.com/Almost_Sure/status/1901112689138536903
But it takes a lot of work to clean up stuff like that! https://en.wikipedia.org/w/index.php?title=Weierstrass_funct...
Segovia is at 1.000 meters, and so is most of the top half of the "Meseta". https://en-gb.topographic-map.com/map-763q/Spain/?center=41....
[1] https://en.wikipedia.org/wiki/Meseta_Central
[2] https://en.wikipedia.org/wiki/Iberian_Peninsula