Bad AIs Eat Their Own Data

Elizabeth Technology December 10, 2021

Poorly optimized algorithmic content is frustrating for users, in more ways than one.

Ads (For Items)

It’s a new, somewhat dystopian warning: look for gifts in Incognito Mode so the ads don’t give away what you were looking at. Unfortunately, in a world run by websites that want you to make an account for your purchase, Incognito Mode is less helpful than it used to be.

Websites take notice of what you look at and buy, and then they juggle that into a measure of intent – are you actually planning to buy X item? How many times did you check it, and how long did you look at the listing? Did you look at other listings like it? Did you message the shop owner, or ask a question? Did you ‘heart’ it? The more you interacted with an item, the more things like it the site is going to recommend. But what about gifts, you may ask? How does the algorithm know I’m not buying this nurse-themed cup and this teacher-themed lanyard for myself?
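As a rough sketch, signals like those could be folded into one number with a weighted sum. Everything here is made up for illustration: the weights, the field names, and the `intent_score` function are assumptions, not any real retailer's formula.

```python
from dataclasses import dataclass

# Hypothetical weights: nothing here comes from a real retailer.
WEIGHTS = {
    "views": 1.0,
    "seconds_on_listing": 0.05,
    "similar_listings_viewed": 0.5,
    "messaged_seller": 5.0,
    "hearted": 3.0,
}

@dataclass
class Interactions:
    views: int = 0
    seconds_on_listing: float = 0.0
    similar_listings_viewed: int = 0
    messaged_seller: bool = False
    hearted: bool = False

def intent_score(i: Interactions) -> float:
    """Weighted sum of the signals; more interaction means a higher score."""
    return sum(WEIGHTS[name] * float(getattr(i, name)) for name in WEIGHTS)

glance = Interactions(views=1, seconds_on_listing=10)
serious = Interactions(views=4, seconds_on_listing=120,
                       similar_listings_viewed=3,
                       messaged_seller=True, hearted=True)

# The shopper who kept coming back scores far higher than the one-time glance.
print(intent_score(glance), intent_score(serious))
```

Notice that nothing in a score like this knows whether the cup is for you or for your nurse friend. That context has to come from somewhere else.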

Turns out any website using Google tools to track engagement knows what data to leave out in the long term – they’re gathering so much data that it’s not really a loss! Given enough time to read your patterns, they’ll be able to figure out you’re done looking and will squirrel that knowledge away for the Gift Finder stuff (or whatever Google does with all of the data it stores on you) later. That’s… creepy, but it’s not necessarily worsening your experience.

But What About the Ones That Aren’t as Optimized?

What is worsening the user experience is the failure of other, less developed and less conscientious algorithms to understand context. Google Ads was notorious for following you around with an item you had looked at once, until their target-testing showed users didn’t like it, and it was prone to mistakes anyway; companies following Google’s example didn’t always move on once that became clear, though. Target sending out coupons for baby carriers and bottles came across as gauche, even when it was right – you hope nothing bad ever happens, but the first trimester can be very scary for a pregnant woman, which is why it’s tradition to hold off until the second trimester to start sharing that info. Imagine a company butting in with a mailed coupon and effectively telling your household that you’re pregnant before you get to!

And where ‘haunting’ a user with an item they glanced at is still popular, it can make it tough for users to go back to casual browsing without that item appearing, which makes a website less appealing to visit casually. Etsy, for example, does not seem to be able to distinguish between items you’d buy once, like musical instruments or coffee tables, and items you’d buy over and over, like soap and other consumables. As a result, if you buy an instrument off Etsy, you don’t necessarily get ads for items related to that instrument – you just get ads for more instruments. Take these screenshots of my Etsy front page:

This was immediately after I bought an instrument from the shop OrientalMusic. If this were candles or snacks or something, showing me more stuff from the same vendor would be reasonable – as it is, I can’t window-shop for stuff Etsy thinks I might like because all it thinks I might like right now is more instruments.

“Shuffle” and Spotify

Spotify allows its users to make playlists of songs, but it also attempts to generate separate playlists for the user. “Discover Daily” and “Discover Weekly” are designed to show the user new (or new to them) music that they might like. “Release Radar” aims to get you to new songs from other bands in your playlist. And then there’s the “On Repeat” playlist, which is meant to play you the songs that you’ve heard most often.

The obvious issue with that: if you’re a free listener, Spotify decides which songs you’ve heard most often. If you’re a mobile listener on the free plan, you don’t have the option to turn shuffle off on the playlists you make, so the algorithm determining what song you’re going to listen to next is also ultimately deciding the On Repeat playlist, not you. The other playlists also learn that you like those same songs more, and Spotify’s algorithms scramble to provide you with recommendations based on the songs you like the most… or rather, the songs it thinks you like the most, which aren’t simply the songs in your playlist but the songs you actually ended up hearing, and Spotify decided those.

Effectively, Spotify is feeding itself its own data, not yours!
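Here’s a toy simulation of that loop. It is not Spotify’s code: the `simulate` and `on_repeat` functions are hypothetical, and the “biased” shuffle (a song gets more likely the more it has already played) is only an assumed stand-in for whatever the real shuffle does.

```python
import random
from collections import Counter

def simulate(songs, plays, biased, seed=0):
    """Play `plays` songs on shuffle and return the play counts.

    biased=False: every song is equally likely each time.
    biased=True:  a song's chance grows with how often it has already played
                  (an assumed stand-in for an engagement-weighted shuffle).
    """
    rng = random.Random(seed)
    counts = Counter({song: 0 for song in songs})
    for _ in range(plays):
        weights = [1 + counts[s] if biased else 1 for s in songs]
        song = rng.choices(songs, weights=weights, k=1)[0]
        counts[song] += 1
    return counts

def on_repeat(counts, size=3):
    """The `size` most-played songs, i.e. a toy 'On Repeat' playlist."""
    return [song for song, _ in counts.most_common(size)]

songs = [f"song_{n}" for n in range(20)]
for biased in (False, True):
    counts = simulate(songs, plays=1000, biased=biased)
    top = on_repeat(counts)
    share = sum(counts[s] for s in top) / 1000
    print(f"biased={biased}: top 3 = {top}, share of all plays = {share:.0%}")
```

The listener in this toy never chooses anything. Whatever ends up in the top three got there because of what the shuffle served, which is the whole problem.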

Even worse, the shuffle function isn’t truly random – it’s run on an algorithm too. True randomness would be a saving grace for “On Repeat”: if you have a song in multiple playlists that you listen to often, statistically, it would pop up in On Repeat before songs you only have in one. Unfortunately, that only holds if shuffle is actually random. If you’re getting the same three or four songs every time you start a playlist, and the same handful the majority of the time afterwards, even with plenty of other songs in the list, that’s not a coincidence.

OneZero says that Spotify divides its functions into exploit and explore, and when it’s trying to exploit, it’s easily tricked into a feedback loop of the same music you hear all the time. Explore is in the same boat, but it uses other people’s data to suggest songs that Listeners of X liked – leading to the same conclusion every time you open the Discover playlist. If you didn’t like those songs last time, it doesn’t care – it’s recommending them to you again now because Listeners of X liked them, and you listened to X because the algorithm put it first in line in shuffle, which leads to it thinking you like X a lot. Reportedly, it can get itself so stuck on what it thinks you want that trying to break out and get new recommendations in your Discover playlists is better done on a fresh account. Yikes.
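The explore/exploit split is a standard idea in recommendation systems, and the textbook version of it is called epsilon-greedy. The sketch below shows that generic technique, not Spotify’s actual system; the `pick_song` function, the song names, and the “average listen-through” numbers are all hypothetical.

```python
import random

def pick_song(avg_listen, epsilon, rng):
    """Epsilon-greedy choice.

    With probability `epsilon`, explore: pick any song at random.
    Otherwise exploit: pick the song with the best average listen-through.
    """
    if rng.random() < epsilon:
        return rng.choice(list(avg_listen))
    return max(avg_listen, key=avg_listen.get)

rng = random.Random(42)
avg_listen = {"old favorite": 0.9, "album track": 0.6, "new release": 0.0}

# epsilon=0 never explores, so the same song wins every single time.
picks = [pick_song(avg_listen, epsilon=0.0, rng=rng) for _ in range(10)]
print(set(picks))  # {'old favorite'}
```

With too little exploring, the new release never gets a chance to earn a score at all, so it stays at zero and keeps losing. That’s the feedback loop in miniature.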

Youtube Recommended

Youtube’s recommended page is usually pretty good at picking up what you’d probably want to watch… as long as it has some history about you first, and also as long as you don’t stray too far from what you normally consume. Countless Youtubers have filmed themselves opening Youtube in an incognito window so they can show how few videos it takes to get into some crazy conspiracy theory videos – turns out the Flat Earth is never more than five or ten clicks away! A phenomenon that some noted was that new accounts that didn’t have any other data would get funneled into a rabbit hole once Youtube had the slightest smidge of data about them – and when conspiracy theory videos have high engagement (i.e., lots of comments arguing) and enough run time for ad breaks, they’re considered above-average content. Wonder why Youtube is putting those little Context bars below videos with sensitive topics now? That’s because it was forced to reckon with the algorithm’s tendency to feed misinformation to newcomers and to lead people who ‘did their own research’ right into believing the Earth was flat and lizard people were real.
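To see why a ranking built only on engagement ends up there, here’s a deliberately crude sketch. It is not Youtube’s formula: the `engagement_score` function, the 1.5x ad bonus, the 8-minute cutoff, and the sample videos are all assumptions made up for this example.

```python
from dataclasses import dataclass

@dataclass
class Video:
    title: str
    comments_per_1k_views: float
    minutes: float

MIN_MINUTES_FOR_AD_BREAKS = 8  # assumed cutoff for fitting ad breaks

def engagement_score(v: Video) -> float:
    """Comments count as engagement whether they are praise or arguments."""
    ad_bonus = 1.5 if v.minutes >= MIN_MINUTES_FOR_AD_BREAKS else 1.0
    return v.comments_per_1k_views * ad_bonus

videos = [
    Video("Calm 5-minute explainer", comments_per_1k_views=4, minutes=5),
    Video("45-minute flat Earth 'proof'", comments_per_1k_views=30, minutes=45),
]
ranked = sorted(videos, key=engagement_score, reverse=True)
print([v.title for v in ranked])
```

Accuracy never appears anywhere in the score, so the video with the biggest comment-section fight and the most room for ads floats to the top.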


Small Sites Vs. A Big Internet

Art projects


Some little art project websites deliberately avoid indexing their page, so it’s well-hidden from traffic. Web development classes, modern art classes, and all sorts of other classes will ask students to make something online. They don’t necessarily want those websites getting shared outside of the class. Keeping a page un-indexed makes it much harder to stumble upon, but it’s not a perfect cure – people with the direct link can still post it elsewhere. If they retain it after they leave the class, and remember how cool it was, and then it ends up on Reddit… suddenly it’s a curse, especially if identifying information like names is left on-site.
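One common way to keep a page out of search results is to send an `X-Robots-Tag: noindex` response header, which the major search engines honor. Here’s a minimal sketch of that as a Python WSGI app; the `app` function and the page contents are hypothetical, and a real class project might use a `<meta name="robots">` tag or a hosting setting instead.

```python
PAGE = b"<!doctype html><title>Class project</title><p>Hello, class!</p>"

def app(environ, start_response):
    """Minimal WSGI app that serves one page and asks not to be indexed."""
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(PAGE))),
        # Asks well-behaved crawlers to keep this page out of search results.
        # It does nothing to stop a person from sharing the link.
        ("X-Robots-Tag", "noindex"),
    ]
    start_response("200 OK", headers)
    return [PAGE]
```

To try it locally, the standard library’s `wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()` will serve it. The header only talks to crawlers, which is exactly why a shared link still gets through.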

Websites made as a joke in the first place can turn into a curse too! Youtuber Drew Gooden’s “Hot Dog” website was made as part of an advertising campaign for Wix, but it’s unclear if he actually wants to maintain it. It’s still in his ownership today. This is a unique problem to have! It may cause him more issues to close the site, now that its address has been immortalized in videos. Besides, his viewers have come to expect the website to function, they’ve bookmarked it, and they’re demanding that their entertainer dance.


Real Retail Hours


Tiny DIY shopping websites sometimes get cratered by that same ‘hug of death’, especially if they accidentally go ‘viral’. Look at TikTok advertisers, for example: anyone can post, and because of the app’s algorithm, it’s possible for a creator with no followers to suddenly end up with 100,000+ views on a particularly entertaining video. No ad dollars were spent; the creator was just super funny that day and it spread. This is great! Until their traffic jumps from an expected 500/day to 20,000/day, because their product has gotten much more reach than they could have prepared for. Sellouts are inevitable, and so are frustrated users.

In fact, a broken or slow website will even push away people who did get to make a purchase. Unpleasant shopping experiences steer consumers away from online retailers at a horrifying rate! The same goes for lag – mobile users are unwilling to wait for an item they don’t really want, or don’t really need from that specific store. I could get a hat anywhere, for instance – why should I wait five seconds to get through to a store on mobile when I could go to a different store? Obviously it’s not that simple, but big websites have resources that little ones don’t, and the especially wishy-washy buyers will be put off by the difference in experiences.

There are ways to handle this, but unfortunately many businesses don’t have the chance to prepare.
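One of those ways, for the curious, is rate limiting: let a short burst through, then make everyone else wait their turn instead of letting the whole site fall over. The sketch below is a generic token bucket, a common technique rather than anything a particular shop uses. The `TokenBucket` class and its numbers are hypothetical, and the clock is passed in by hand so the example is easy to follow.

```python
class TokenBucket:
    """Allow short bursts, but cap the sustained request rate."""

    def __init__(self, rate_per_sec, capacity):
        self.rate = rate_per_sec        # tokens added back per second
        self.capacity = capacity        # most tokens the bucket can hold
        self.tokens = float(capacity)   # start with a full bucket
        self.last = 0.0                 # time of the previous request

    def allow(self, now):
        """Return True if a request arriving at time `now` (seconds) may proceed."""
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate_per_sec=2, capacity=5)
burst = [bucket.allow(now=0.0) for _ in range(8)]  # 8 requests at the same instant
later = bucket.allow(now=1.0)                      # one more, a second later
print(burst.count(True), later)  # 5 True
```

The first five visitors get in, the next three are turned away, and a second later there’s room again. A tiny shop still has to know it needs something like this before the video goes viral, which is the hard part.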


Welcome, But…


And then there are websites that are really hoping for growth, and it suddenly happens. It’s rare to have a site blow up overnight – most repeat visits are the result of hard work and consistent effort to capture the visitor’s attention. Unfortunately, in this era of social media, it’s very easy to accidentally blow a website out of the water. “Yay, growth!” turns into “Oh no, they aren’t stopping.” The server for the website crashes, and a lot of potential viewers are shut out from it. If the website’s lucky, the interested folks will bookmark the page and come back later, so the traffic is better distributed the second time around.

Some websites go offline a few hours after Reddit’s discovered them, to recover. The ‘hug of death’ is a well-known phenomenon – nobody’s DDoSing the website on purpose!


Lonesome Town


Single-person websites are often hoping to not be discovered by somewhere huge. Think about it: if they haven’t paid for advertising, if they don’t get revenue from hosting ads, if they don’t sell anything on their site, then they don’t make money from page views. They’re probably not looking for a giant spike in page views out of nowhere, with some exceptions like ‘public service’ projects made by civilians, or ARGs.  

Tiny websites and tiny forums alike struggle to handle being “discovered” on websites like Reddit, Digg, or Youtube. Famously, a Buffy the Vampire Slayer superfan’s website (which I’m deliberately not linking here) was crashed by new visitors after forums made it a spectacle. Sure, the superfan posted a lot – as is their right. The information they posted helped other fans find information about meetups and appearances by the actors. The flood of people showing up on other social media to comment on and harass the single poster was unfortunate, and it could have been avoided if people hadn’t dogpiled. Even deeper, maybe people wouldn’t have dogpiled if the website hadn’t shown up on blogs. The sole commentator, maintainer, and moderator made the website private after people showed up to screw around.


Similarly, small sites get flooded when a big site ‘discovers’ them, and then suffer from community collapse and site breakdown. A forum with 200 or so regular posters isn’t going to be able to moderate new conversations from other, bigger sites – and even worse, newcomers who might have been interested in the topic get the idea that the website’s a total dumpster fire when it’s just understaffed. These sites want traffic, yeah, but they want the right kind of traffic. Well-intentioned traffic. On-topic traffic. If a community behaves itself, there’s no reason to have a team of 20 moderators. People showing up to flame the forum are going to stretch resources thin.

Don’t go spread news about some wacky website on big forums without knowing the site first. The consequences may be greater than you could imagine!