Synthesizing Obama from Audio

In May I blogged about a paper for SigGraph2017, unpublished at the time, in which the authors presented a method for generating video Obamas from audio tracks. Combined with tools such as the Adobe Voice Generator (which in turn lets me generate arbitrary sentences from speech samples), this makes it possible to put any sentence into anyone's mouth and to produce any video to go with it. Once again: Kiss your reality goodbye.

The paper is now online: Synthesizing Obama: Learning Lip Sync from Audio, here as a PDF.

Given audio of President Barack Obama, we synthesize a high quality video of him speaking with accurate lip sync, composited into a target video clip. Trained on many hours of his weekly address footage, a recurrent neural network learns the mapping from raw audio features to mouth shapes. Given the mouth shape at each time instant, we synthesize high quality mouth texture, and composite it with proper 3D pose matching to change what he appears to be saying in a target video to match the input audio track. Our approach produces photorealistic results.

A few details from the paper:

Our system first converts audio input to a time-varying sparse mouth shape. Based on this mouth shape, we generate photo-realistic mouth texture, that is composited into the mouth region of a target video. Before the final composite, the mouth texture sequence and the target video are matched and re-timed so that the head motion appears natural and fits the input speech.

Given a source audio track of President Barack Obama speaking, we seek to synthesize a corresponding video track. To achieve this capability, we propose to train on many hours of stock video footage of the President (from his weekly addresses) to learn how to map audio input to video output. This problem may be thought of as learning a sequence to sequence mapping, from audio to video, that is tailored for one specific individual. This problem is challenging both due to the fact that mapping goes from a lower dimensional (audio) to a higher dimensional (video) signal, but also the need to avoid the uncanny valley, as humans are highly attuned to lip motion.

To make the problem easier, we focus on synthesizing the parts of the face that are most correlated to speech. At least for the Presidential address footage, we have found that the content of Obama’s speech correlates most strongly to the region around the mouth (lips, cheeks, and chin), and also aspects of head motion – his head stops moving when he pauses his speech (which we model through a retiming technique). We therefore focus on synthesizing the region around his mouth, and borrow the rest of Obama (eyes, head, upper torso, background) from stock footage.
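
To get a feel for the pause observation in the quote above, here is a toy sketch in Python. It is not the paper's retiming technique, which this post does not describe in detail. It only illustrates the kind of signal such a step could rely on: runs of low audio energy marked as pauses, with head motion switched off for those frames. The function names `find_pauses` and `head_motion_weights`, the per-frame energy values, the threshold and the minimum pause length are all assumptions made up for this example.

```python
from typing import List, Tuple


def find_pauses(energy: List[float], threshold: float = 0.1,
                min_frames: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, in which the
    per-frame audio energy stays below `threshold` for at least
    `min_frames` consecutive frames. All parameters are hypothetical."""
    pauses = []
    start = None  # index where the current quiet run began, if any
    for i, value in enumerate(energy):
        if value < threshold:
            if start is None:
                start = i
        else:
            # A loud frame ends the quiet run; keep it only if long enough.
            if start is not None and i - start >= min_frames:
                pauses.append((start, i))
            start = None
    # Handle a quiet run that lasts until the end of the clip.
    if start is not None and len(energy) - start >= min_frames:
        pauses.append((start, len(energy)))
    return pauses


def head_motion_weights(energy: List[float], threshold: float = 0.1,
                        min_frames: int = 3) -> List[float]:
    """Per-frame weight: 1.0 while speaking, 0.0 inside a detected pause.
    A downstream step could use this to prefer frames with a still head."""
    weights = [1.0] * len(energy)
    for start, end in find_pauses(energy, threshold, min_frames):
        for i in range(start, end):
            weights[i] = 0.0
    return weights


if __name__ == "__main__":
    # Made-up energy values, one per video frame.
    demo = [0.5, 0.6, 0.02, 0.01, 0.03, 0.7, 0.05, 0.8, 0.0, 0.0, 0.0]
    print(find_pauses(demo))
    print(head_motion_weights(demo))
```

The single quiet frame at index 6 in the demo data is deliberately too short to count as a pause, so brief dips between words do not freeze the head.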
