Posted on February 7, 2022 in Technology

V-Tubers: The Virtual YouTuber

You might have seen videos on YouTube’s front page of what look like anime characters playing games. What’s the deal?

The Human Ones

We all know fans can be insane. Selena was fatally shot by the president of her own fan club. A group of teenage fans targeted celebrities’ homes to steal from them. Fans surround famous TikTokers’ houses and park in the street, hoping to get a picture or video of them for the app. In Japan, idols are very reluctant to date, because the insane idol culture means that male fans see them as future girlfriends, and a real boyfriend would mean they were ‘cheating’. Superfans seem to think they ‘own’ celebrities. As such, it’s kind of dangerous to actually be out in the wild as a celebrity.

A solution? Make sure people don’t know what you, your house, or your room look like, which makes you harder to find. Software can superimpose a 2-D character over a real person and have it follow their movements. The real person never actually appears on screen, but their facial expressions and gestures still come through via their avatar. Win-win: the streamer gets to livestream their reactions to a game anonymously, and the audience still gets to see those reactions.

However, obscuring one’s real identity isn’t the only reason avatars are in use. Some streamers use them because they’re fun and colorful. Others use them because the avatar itself can respond to chat without the streamer actively doing anything: text can scroll across blank spaces on virtual wings or T-shirts, and virtual confetti can rain down on the virtual streamer at some trigger from chat, with no mess to clean up. Sometimes, the person has appeared live before, but just doesn’t want to dress up for their stream. The V-Tuber version of themselves is always perfectly dressed!

Kizuna AI, widely credited as the first self-described virtual YouTuber, broke ground when she first appeared on YouTube in 2016. Motion-capture tech used to be reserved for big-budget productions like movies, as it was prohibitively expensive and usually required special suits.

Motion Capture

If you were around for the filming of The Hobbit, you might remember that video of Benedict Cumberbatch flailing around on the ground in a skintight suit covered with white dots. That was the motion-capture process. The filmmakers used that footage to drive the face and movements of Smaug, the villain of the story.

But why?

CGI artists would eventually hit a wall if they only made things move by hand. Yes, in the short term, doing it manually looked better (and was faster) than capturing the motion, smoothing the capture out, rendering, adding in shadows, etc. However, in the long term, motion capture provides a much more realistic result at a fraction of the cost and time of doing it the old way, especially as models got more and more detailed.

It also caught key parts of human expression and human movement better. A grimace, for example, involves many smaller facial movements beyond the mouth turning downwards. The artist used to have to move all those little details by themselves, and then repeat that for each expression or word, over and over. The alternative was an uncanny-valley creation, or one that felt flat. There just wasn’t another way before motion capture.

When filming The Hobbit, Benedict just had to perform his expressions for a camera, and then the computer could connect key points of his face to key points of Smaug’s face. It could register his ‘skeleton’ in the footage from those dots on his suit, and use it to create a functioning, moving Smaug shell that followed along. The computer just has to be told which dots on his suit attach to which parts of the Smaug shell, and voilà!
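
For the curious, here is roughly what "telling the computer where to attach the dots" could look like in code. This is a minimal Python sketch, not how the film's software actually worked: the marker names, the joint names, and the `retarget` function are all made up for illustration, and real retargeting also has to handle rotations and differing body proportions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: float
    y: float
    z: float


# Hypothetical mapping: dot on the actor's suit -> joint on the character.
MARKER_TO_JOINT = {
    "head": "dragon_head",
    "left_wrist": "dragon_left_wing_tip",
    "right_wrist": "dragon_right_wing_tip",
}


def retarget(markers: dict[str, Point], scale: float) -> dict[str, Point]:
    """Copy each tracked marker onto its mapped joint, scaled to character size."""
    pose = {}
    for name, p in markers.items():
        joint = MARKER_TO_JOINT.get(name)
        if joint is None:
            continue  # a marker with no attachment is simply ignored
        pose[joint] = Point(p.x * scale, p.y * scale, p.z * scale)
    return pose


# One captured frame (made-up numbers, in metres).
frame = {"head": Point(0.0, 1.5, 0.0), "left_wrist": Point(-0.5, 1.0, 0.25)}
dragon_pose = retarget(frame, scale=4.0)
```

Run that once per frame of footage and the character follows the actor, which is the whole trick in miniature.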

Science World compares it to three-dimensional rotoscoping. Over time, face-tracking software has gotten much better. The Virtual YouTuber doesn’t even need to be wearing a suit for the virtual model to work anymore. The software simply understands what a face looks like now, which is incredible. The rigs that streamers use can understand facial expressions, and as long as you tell the rig where the eyebrows and mouth are, it can mimic them in the virtual shell. This allows for incredible freedom when designing the character. If you want your character to have a tail, all you have to do is tell the rig what the tail reacts to. Wings? Same deal: you can attach them to your arms’ movements if you want, and they’ll move when you move. Some programs understand clothing physics, and can move capes according to arm movements.
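
To make the "tell it what the tail reacts to" idea concrete, here is a small Python sketch of how such bindings might be set up. It isn't taken from any real V-Tuber program: the signal names (`mouth_open`, `arm_raise`, and so on), the `BINDINGS` table, and `drive_avatar` are all assumptions for illustration, and tracked values are assumed to arrive as numbers between 0 and 1.

```python
def clamp(value: float, low: float = 0.0, high: float = 1.0) -> float:
    """Keep a value inside the range the avatar can display."""
    return max(low, min(high, value))


# Hypothetical bindings: avatar part -> (tracked signal it reacts to, strength).
BINDINGS = {
    "avatar_mouth_open": ("mouth_open", 1.0),
    "avatar_brow_height": ("brow_raise", 1.0),
    "wing_spread": ("arm_raise", 1.0),  # wings follow the arms
    "tail_wag": ("head_turn", 0.5),     # tail reacts, more gently, to head turns
}


def drive_avatar(tracked: dict[str, float]) -> dict[str, float]:
    """Turn one frame of tracked signals into avatar settings."""
    return {
        part: clamp(tracked.get(signal, 0.0) * strength)
        for part, (signal, strength) in BINDINGS.items()
    }


# One frame from a (pretend) face and body tracker.
avatar = drive_avatar({"mouth_open": 0.5, "arm_raise": 2.0, "head_turn": 0.5})
```

Giving the character a new part is then just one more row in the table, which is why tails and wings come so cheaply.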

Many programs are in use today (the V-Tuber wiki’s software list, linked in the sources below, covers a lot of them). CodeMiko on Twitch uses Unreal Engine, a program used widely by game studios. FaceRig and Animaze are also popular choices, but freeware programs exist as well. It’s entirely possible to make yourself into a V-Tuber with a little elbow grease and a willingness to work with the models.

An Opinion: V-Tubing is Friendlier than Virtual Influencing

I like V-Tubers. I don’t like Virtual Influencers. They arrived with a kind of smugness, from both their creators and assorted news outlets: “We’re winning. We’re totally funnier and hotter and more interesting than real people.” Yeah. That’s… not really a revelation. Of course an entire team of people, none of whom have to actually appear in front of the camera, is going to be more successful at being hot than a real person. Lil Miquela doesn’t have pores or acne or feelings. She is a CGI’d doll that doesn’t have to actively respond to the environment like a V-Tuber rig does. The whole draw of influencers is that they create the illusion that attractive people exist. Real people will photoshop themselves too, but normally they have the decency to hide it.

Meanwhile, V-Tubers have the opposite approach. “We all win. Let’s have fun together with this system.” When people can’t show their faces, they can wear a virtual suit that shows their expressions, allows them to interact with chat, and allows them to communicate nonverbally where they otherwise couldn’t. The rig allows them to connect more organically to their audience, not take advantage of them. V-Tuber avatars were never meant to replace real people. They’re mostly anime-like characters with big eyes and big heads. The person behind the mask is still playing the games, and talking, too; Lil Miquela barely ever has to ‘appear’ for her audience. 90% of her interaction boils down to text that someone else writes and pictures someone else makes. Meanwhile, a V-Tuber is actually behind the screen. A V-Tuber is ultimately a real person with a tool, not a tool being used to replace a real person.

Sources:

https://www.theguardian.com/lifeandstyle/2009/oct/27/lindsay-lohan-paris-hilton-robbed

https://www.engadget.com/2014-07-14-motion-capture-explainer.html

http://www6.uniovi.es/hypgraph/animation/character_animation/motion_capture/history1.htm

https://www.sportskeeda.com/esports/what-codemiko-really-like-off-camera

https://virtualyoutuber.fandom.com/wiki/List_of_VTuber-related_software_and_resources