Captcha, which stands for Completely Automated Public Turing test to tell Computers and Humans Apart (what a mouthful), was first conceptualized in the early 2000s. Websites were already struggling with bots, and a website known as iDrive recognized that bots couldn’t ‘see’ the way people did. PayPal, also struggling with bot attacks, began using the same method to keep brute-force attacks from getting in. This is the true essence of Captcha – the technology was first described in 1997 as anything that could tell robots and humans apart, but it wasn’t known as ‘Captcha’ until PayPal got in on it. Its more advanced form, reCAPTCHA, was introduced in 2007 and acquired by Google in 2009.

Type These Letters, Hear These Sounds

The original style is becoming easier to get past as AI improves, but it’s still better than nothing. An AI trying to ‘read’ the letters would still leave clues as it struggled to separate the captcha text from the random lines and fuzz on screen – Cloudflare, a web security company, notes that when this style was first introduced, bots couldn’t do much better than keysmashing and hoping to get in that way. Now that AI can ‘see’ much better than it used to, thanks to endless training to recognize text out in the real world, it gets more and more accurate with every captcha box it sees. The text in these captchas is algorithmically generated – and AIs designed to handle exactly that kind of generated distortion are now capable of deciphering it, to the point that captchas are sometimes used as tests for them!
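
To make that concrete, here’s a minimal sketch of the kind of algorithmically generated text captcha described above, drawn onto an HTML canvas in TypeScript. It’s purely illustrative – the character set, the amount of fuzz, and the distortion are all guesses of mine, and real generators pile on far more anti-OCR tricks than this.

```typescript
// Illustrative sketch only – not any real captcha service's generator.
const CHARS = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"; // skip look-alikes like O/0 and I/1

function drawCaptcha(canvas: HTMLCanvasElement, length = 6): string {
  const ctx = canvas.getContext("2d")!;
  ctx.fillStyle = "#f4f4f4";
  ctx.fillRect(0, 0, canvas.width, canvas.height);

  // Random "fuzz": stray lines the solver has to learn to ignore.
  for (let i = 0; i < 8; i++) {
    ctx.strokeStyle = `hsl(${Math.random() * 360}, 40%, 60%)`;
    ctx.beginPath();
    ctx.moveTo(Math.random() * canvas.width, Math.random() * canvas.height);
    ctx.lineTo(Math.random() * canvas.width, Math.random() * canvas.height);
    ctx.stroke();
  }

  // Each letter gets its own size, tilt, and position.
  let text = "";
  for (let i = 0; i < length; i++) {
    const ch = CHARS[Math.floor(Math.random() * CHARS.length)];
    text += ch;
    ctx.save();
    ctx.translate(20 + i * 25, canvas.height / 2);
    ctx.rotate((Math.random() - 0.5) * 0.6); // tilt up to ~17 degrees either way
    ctx.font = `${22 + Math.random() * 10}px serif`;
    ctx.fillStyle = "#333";
    ctx.fillText(ch, 0, 0);
    ctx.restore();
  }
  return text; // the server keeps this string to check the user's answer against
}
```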

That doesn’t mean they’re obsolete or useless for protection. Just because some people can build AIs that get past them doesn’t mean that everyone can. Many basic bot creators would much rather move on to an easier, less-well-defended site than sit there and program an advanced, specialized AI for such a simple task. It’s not perfect protection – no protection is.

However, there were problems: even unimpaired users often complained that the tests were hard to solve, and for visually impaired or deaf users, a captcha might be genuinely unsolvable. Screen readers, a common tool for blind folks who use the internet, let them browse the web by reading the page out loud. Because a captcha is a picture, not a text box – and labeling the picture would defeat its purpose – the screen reader has no way to read it. Accessibility software is often simpler than cutting-edge bots (and incapable of reading images), and so those users were left behind.

Audio versions are a better solution, but the way they’re embedded in the page still makes it difficult for screen readers to find the play button. Besides, audio-to-text AI was already more advanced than picture-to-text, because there’s a real market for automated captions and auto-transcribed phone calls. Transcription software has been around for ages, and it only gets better at separating noise from information as time goes on – there is almost nothing a captcha could add to the sound that would stump a machine without also stumping a person. As such, these captchas are less common than the fuzzy text and image ones still seen everywhere today.

“I am Not a Robot”

One of the simplest types of Captcha is the “I am Not a Robot” check box. It seems like it would be easy to trick – and it sort of is, but it’s not a walk in the park. The box works by tracking cursor movement before the user hits the little check box. On a desktop, an AI might jump directly to the box it needs to click, with no hesitation, or it might scan the entire page to locate the box visually if it can’t detect the clickable element. That’s not human behavior – people don’t ordinarily select the entire page and contemplate it before clicking the right spot, and they generally can’t jump straight to the clickable element the instant the page loads, even if they’re using the Tab key or a touchscreen device.
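
As a rough illustration of the idea – not reCAPTCHA’s actual algorithm, which weighs far more signals than this – here’s a toy TypeScript sketch of a cursor-trajectory check: record the mouse path leading up to the click, then ask whether it wobbled and took time the way a human hand does. The element id and the thresholds are made up for the example.

```typescript
// Illustrative sketch only: a naive "did the cursor move like a human?" heuristic.
interface PointerSample {
  x: number;
  y: number;
  t: number; // timestamp in milliseconds
}

const samples: PointerSample[] = [];

// Record cursor movement leading up to the click.
document.addEventListener("mousemove", (e) => {
  samples.push({ x: e.clientX, y: e.clientY, t: performance.now() });
});

function looksHuman(path: PointerSample[]): boolean {
  if (path.length < 5) return false; // teleporting straight to the box is suspicious

  let pathLength = 0;
  for (let i = 1; i < path.length; i++) {
    pathLength += Math.hypot(path[i].x - path[i - 1].x, path[i].y - path[i - 1].y);
  }
  const straightLine = Math.hypot(
    path[path.length - 1].x - path[0].x,
    path[path.length - 1].y - path[0].y
  );
  const duration = path[path.length - 1].t - path[0].t;

  // Human cursors wobble (the path is longer than a straight line) and take real time.
  const wobble = pathLength / Math.max(straightLine, 1);
  return wobble > 1.05 && duration > 200;
}

// "not-a-robot" is a hypothetical element id for this example.
document.getElementById("not-a-robot")?.addEventListener("click", () => {
  console.log(looksHuman(samples) ? "probably human" : "needs a harder challenge");
});
```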

This was easily one of the most user-friendly kinds of Captcha out there. No reading. No listening. No selecting blurry images or trying to guess at misshapen letters. As such, it was quicker to use than a number of other captcha tests, even though someone with enough time and determination could rig something up to bypass it.

Click These Pics

This is the barrier that once stopped AI dead in its tracks. Training an AI to see and recognize things the way humans do used to be impossible, but now… now it’s on the horizon. Self-driving cars will need it. Google uses it for reverse-image search. Facebook uses it to find you in friends’ photos. If humanity was going to truly master AI that behaved like people do, AI was going to have to learn how to see – and that meant other, outsider AI would also be learning how to see.

The pictures are easy – you get an image separated into 9 or 16 tiles, and you select the tiles that match the prompt. An AI might be able to measure ‘red’ in an image, but the sort of uncomplicated AI that most amateur hackers could crank out wouldn’t know a fire truck from a stop sign. Even if it got lucky that one time, its answers are checked against the human users who pick all of the right squares every time – so if it misses even a sliver of the red in another tile, or over-selects, it doesn’t pass and has to go again.
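
Here’s a hedged guess, sketched in TypeScript, at how that grading could work: compare one user’s tile selections against the fraction of earlier humans who picked each tile, fail the answer if it misses a tile people agree on (or selects one they agree isn’t relevant), and let genuinely ambiguous tiles slide. The consensus numbers and the threshold are invented for illustration – the real service is far more sophisticated.

```typescript
// Illustrative sketch only: grading a 3x3 tile answer against human consensus.
type TileAnswer = boolean[]; // one entry per tile: true = "selected"

// Hypothetical data: the fraction of past users who selected each tile.
const consensus: number[] = [0.97, 0.02, 0.91, 0.05, 0.88, 0.03, 0.01, 0.94, 0.04];

function gradeAnswer(answer: TileAnswer, agreementThreshold = 0.8): boolean {
  return consensus.every((fraction, i) => {
    const humansSayYes = fraction >= agreementThreshold;
    const humansSayNo = fraction <= 1 - agreementThreshold;
    if (humansSayYes) return answer[i];  // missing an agreed-on tile fails
    if (humansSayNo) return !answer[i];  // over-selecting fails too
    return true;                         // ambiguous tiles aren't penalized
  });
}

// A solver that misses one tile humans agree on is sent back to try again.
console.log(gradeAnswer([true, false, true, false, true, false, false, false, false])); // false
console.log(gradeAnswer([true, false, true, false, true, false, false, true,  false])); // true
```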

Is that… being used for something?

Google is using Captchas to crowd-source training for its AI, and doing so gave Google access to a metric ton of training time – Wikipedia claims that people around the globe spend 500 hours completing CAPTCHAs every week. Unlike the text and audio ones, pictures with the features Google needs can’t just be generated indefinitely. If you’ve noticed a decline in picture quality for these captchas, you’re not alone – the quality really is getting worse. The sharper pictures are already trained into the database, so all that’s left are the blurry, fuzzy, poor-quality ones, the ones that weren’t ideal for the initial training.

Now, millions of people every day are telling the computer what a red car or a street sign looks like, instead of just a handful of researchers. Some of this research is for smart-car training, some of it’s for reverse-image searching, and some is purely to advance the state of AI – once an AI can recognize things in its environment visually, it can usually operate with less human intervention. And the more training it has, the less likely it is to get confused at a really inconvenient time. Tesla has famously struggled with its AI mis-recognizing things such as the moon, blinking streetlights, and partially graffitied signs, but the more training it gets on worse and worse quality images, the better it will eventually perform.

Sources:

https://www.cloudflare.com/learning/bots/how-captchas-work/

https://support.google.com/a/answer/1217728?hl=en

https://googleblog.blogspot.com/2009/09/teaching-computers-to-read-google.html

https://elie.net/publication/text-based-captcha-strengths-and-weaknesses/