Topic on Forum:Wiki Talk

Suggestion: global typo substitution

Arromdee

The misspelled word "villian" (should be "villain") occurs a lot on both TV Tropes and this Wiki.

Will some admin please do a global substitution for this typo?

If we still have a page about typos, it should of course be excluded (or rather, the page should be edited to break up the word with comments so it won't be found in a global search). I don't recall whether TV Tropes had that page, either now or as of 2012. Also, "vaudevillian" should not be substituted. The Ambush Bug character Villian the Villain is an intentional invocation of the misspelling, but I can't find a mention of him here.

Long, long ago, global substitutions were actually done on TV Tropes, but that hasn't happened in a long time. Surely we can do it here (with manual intervention by an admin). At the very least there should be a thread where people can post common typos, so admins can look them over and decide which ones can be globally substituted without breaking anything. There are many more than just "villian"; it's just the most obvious one.

GethN7

That can be done easily with the magic of a bot program.

I'll make sure to gin up a macro for that and get on it sometime tomorrow.

If you notice anything else that needs fixing globally, let me know so I can add it to the bot macro.

In fact, draw up a list of what needs to be changed and what should be skipped, so I can set up the macro as carefully as possible and avoid screwups.

Arromdee

Trying some random words: "recieve" gets 288 hits ("villian" has 475), "sequal" has 85, and "beleif" has only 6 but looks safe to change. "embarass" has 24 and "embarassment" another 128; "posess" has 71 and "posession" another 101. But there's no way I could list every possible typo.

I found the page of typos: The_Big_List_of_Booboos_and_Blunders. "villian" isn't on it yet, so I'll add it. Obviously this page should be excluded from all substitutions.

And don't forget to leave the text on talk pages unchanged.

GethN7

Excellent, that gives me a good place to start.

As comprehensive a list as possible of the pages to be excluded would also help, so I can tell the bot to skip them if need be.

Arromdee

If you intentionally want to use a misspelled word, you should be able to hide it from a global substitution with a comment:

vill<!-- -->ian
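
The comment trick works because HTML comments are hidden on the rendered page but are still present in the raw wikitext that a bot searches. The sketch below illustrates the difference, assuming the bot does a plain literal search over raw wikitext; the function names are hypothetical, and a bot that deliberately strips comments before matching would defeat the trick.

```python
import re

# HTML comments are invisible on the rendered page but present in the raw wikitext.
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)

def bot_sees(wikitext: str, word: str) -> bool:
    """A plain literal search over raw wikitext (assumed bot behavior)."""
    return word in wikitext

def reader_sees(wikitext: str, word: str) -> bool:
    """Roughly what a reader sees: the text with HTML comments removed."""
    return word in HTML_COMMENT.sub("", wikitext)

sample = "Vill<!-- -->ian the Villain"
print(bot_sees(sample, "Villian"), reader_sees(sample, "Villian"))  # prints: False True
```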

GethN7

By default the bot only edits main-namespace pages unless I specify other namespaces such as talk pages. Apart from the pages you said to exclude, I plan to have the bot fix all the words you mentioned, and I'm adding a case-sensitive rule so nothing capitalized is affected (i.e., misspellings that are intentional proper nouns are left alone).

Looney Toons

Geth, I'm wary of this. I dislike automated spelling correction because it inevitably runs into the Cupertino effect.

GethN7

That is a real concern; worry about falling prey to the Cupertino effect is the reason I haven't started the bot yet. Can you think of any alternatives that would be less hazardous?

Looney Toons

Not really, at least not so far. I'm presuming the bot will work off a dictionary list of misspellings and corresponding corrections, which should avoid the worst of the Effect. (If all it did was run a spellchecker on every page it opened, I would be very vocally against it.) If my presumption is correct, I would suggest we go over the list very carefully to eliminate any potentially ambiguous targets, and hand-correct those.
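
A dictionary-driven substitution of the kind presumed here might look like the following minimal sketch. The correction list is hypothetical, built from the misspellings counted earlier in the thread; matching is whole-word and case-sensitive (following the uppercase rule proposed above), so every compound form has to be listed explicitly, and it reports how many substitutions it made so the list can be reviewed.

```python
import re

# Hypothetical correction list, built from misspellings mentioned in this thread.
# Matching is whole-word only, so compound forms must be listed explicitly.
CORRECTIONS = {
    "villian": "villain",
    "villians": "villains",
    "recieve": "receive",
    "sequal": "sequel",
    "beleif": "belief",
    "embarass": "embarrass",
    "embarassment": "embarrassment",
    "posess": "possess",
    "posession": "possession",
}

# \b anchors keep matches to whole words, so "vaudevillian" is never touched.
# The pattern is case-sensitive: capitalized forms such as "Villian" are left alone.
TYPO_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, CORRECTIONS)) + r")\b")

def fix_typos(text: str) -> tuple[str, int]:
    """Return the corrected text and the number of substitutions made."""
    return TYPO_PATTERN.subn(lambda m: CORRECTIONS[m.group(1)], text)

new_text, count = fix_typos("I recieve the sequal, says the vaudevillian villian.")
print(count, new_text)  # prints: 3 I receive the sequel, says the vaudevillian villain.
```

Because only listed strings are ever replaced, nothing outside the list can be "corrected" into something else, which is what keeps this approach away from the worst of the Cupertino effect.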

Beyond that, nope, nothing really to suggest.

Arromdee

I don't think the examples above have any legitimate uses (except in lampshadings such as Villian the Villain, on pages about errors, and in "vaudevillian"), so you won't be "fixing" any legitimate uses. If someone typed "villian" when they meant "vanilla", your bot would replace one error with another, but it wouldn't increase the number of errors, since that was an error anyway, and you would have genuinely fixed hundreds of errors in the meantime.

Also, there's no need to ignore uppercase examples. What if the bad spelling starts a sentence? (You won't catch Villian the Villain because 1) it's on a page about errors, which you should skip anyway, and 2) I stuck a comment in the middle, so the bot shouldn't find it.)

Looney Toons

Mm. The "vaudevillian" example raises a point. We're going to have to be careful to search for whole words and whole words only.

Arromdee

If you do that, you'll miss examples like villians and supervillian.

I honestly can't think of any other legitimate uses, even as part of another word.
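
One way to reconcile the two positions is to match the misspelling anywhere inside a word, so "villians" and "supervillian" are caught, while explicitly excluding the one known legitimate compound. The sketch below assumes "vaudevillian" is the only such exception and matches case-insensitively so sentence-initial "Villian" is fixed too; it would still "fix" intentional names such as Villian the Villain, so page exclusions (or the comment trick) are still needed.

```python
import re

# Match "villian" anywhere inside a word, but not when preceded by "vaude"
# (as in "vaudevillian"). IGNORECASE also applies to the lookbehind.
TYPO = re.compile(r"(?<!vaude)villian", re.IGNORECASE)

def _match_case(found: str) -> str:
    """Keep the capitalization pattern of the misspelled text."""
    if found.isupper():
        return "VILLAIN"
    if found[0].isupper():
        return "Villain"
    return "villain"

def fix_villian(text: str) -> str:
    return TYPO.sub(lambda m: _match_case(m.group(0)), text)

print(fix_villian("Villian: the villians met a supervillian at a Vaudevillian show."))
```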

Looney Toons

Then we spend some time working through the list and make sure it includes as many compound forms as we think likely.

Labster

I'll also point out pages like Naruto Veangance Revelaitons and My Immortal, where we probably wouldn't want this, as well as any quote with a [sic] anywhere nearby.

I'm worried about the feature creep in the requirements here, because we're starting to approach the stage where we're trying to use regex substitutions to parse natural language. I'm not really keen on the idea in general, even though it would probably fix more things than it would break.

Arromdee

You could handle that by ignoring all pages in the fanfic category.
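
The exclusions raised so far (the typo list page, fanfic pages, and quotes marked [sic]) could be expressed as two checks: one deciding whether a page is touched at all, and one skipping individual matches that have a "sic" marker nearby. This is a hypothetical sketch: the category name "Fanfic" and the 80-character window are assumptions, not settings from any existing bot.

```python
import re

# Hypothetical exclusion rules drawn from this thread.
EXCLUDED_TITLES = {"The_Big_List_of_Booboos_and_Blunders"}
EXCLUDED_CATEGORIES = {"Fanfic"}  # assumed category name
SIC_MARKER = re.compile(r"\bsic\b", re.IGNORECASE)  # matches [sic], (sic), or a bare sic

def page_is_excluded(title: str, categories: list[str]) -> bool:
    """Skip the typo list itself and any page in an excluded category."""
    return title in EXCLUDED_TITLES or not EXCLUDED_CATEGORIES.isdisjoint(categories)

def fix_outside_sic(text: str, pattern: re.Pattern, replacement: str, window: int = 80) -> str:
    """Replace matches, except those with a 'sic' marker within `window` characters."""
    def repl(m: re.Match) -> str:
        nearby = text[max(0, m.start() - window):m.end() + window]
        return m.group(0) if SIC_MARKER.search(nearby) else replacement
    return pattern.sub(repl, text)
```

Skipping on any nearby "sic" is deliberately conservative: it will leave some genuine typos alone, but it should not alter a quoted misspelling.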

Really, if you want to get it perfect, we'd need a bot that finds each occurrence of a word and presents it to the bot operator for manual confirmation. Of course that's easy for me to say, since I'm not writing or running the bot. But we needn't let the perfect be the enemy of the good.
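
The confirm-each-occurrence idea can be sketched as a substitution that hands every match, with a little surrounding context, to a reviewer callback and only replaces the ones approved. The function names here are hypothetical; in practice the callback would be whatever prompt the bot tool provides.

```python
import re
from typing import Callable

def confirm_each(text: str, pattern: re.Pattern, replacement: str,
                 confirm: Callable[[str], bool], context: int = 30) -> str:
    """Offer each match to a reviewer; only approved matches are replaced."""
    def repl(m: re.Match) -> str:
        snippet = text[max(0, m.start() - context):m.end() + context]
        return replacement if confirm(snippet) else m.group(0)
    return pattern.sub(repl, text)

def ask_on_console(snippet: str) -> bool:
    """A simple interactive reviewer: answer 'y' to approve a replacement."""
    answer = input(f"...{snippet}...\nReplace this occurrence? [y/N] ")
    return answer.strip().lower() == "y"
```

For example, `confirm_each(page_text, re.compile(r"\bvillian\b"), "villain", ask_on_console)` would stop at every occurrence. Separating the matching from the approval step means the same code can run fully automated by passing `lambda s: True`.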

GethN7

I mostly use AutoWikiBrowser, which has a GUI, so I could approve each page and each edit one by one if need be, though it would be rather tedious. Vorticity mostly uses the Perl MediaWiki::Bot module, which lacks this feature but is somewhat more flexible with exception rules for fully automated editing.

Nerdanel

It seems to me that occasional minor typos are less harmful than even a small number of atypical cases where autocorrection goes badly wrong. Quotes with [sic] are a good example.

Nerdanel

By complete happenstance, I just learned that To Aru Majutsu no Index, a work unfamiliar to me, has a character called Princess Villian, and that's apparently not a typo. I repeat and strengthen my call against careless search-and-replace.