Tech giant Microsoft recently unveiled a new artificial intelligence program called “VASA-1,” bragging that the software can produce “Lifelike Audio-Driven Talking Faces Generated in Real Time.”

The program can produce hyper-realistic AI-generated animations from just a single reference photo and an audio clip, prompting fears that Microsoft has unwittingly unleashed a deepfake program upon the broad masses.

In a blog post on Microsoft’s website, the company says:

“We introduce VASA, a framework for generating lifelike talking faces of virtual characters with appealing visual affective skills (VAS), given a single static image and a speech audio clip. Our premiere model, VASA-1, is capable of not only producing lip movements that are exquisitely synchronized with the audio, but also capturing a large spectrum of facial nuances and natural head motions that contribute to the perception of authenticity and liveliness,” Microsoft promotes.

“The core innovations include a holistic facial dynamics and head movement generation model that works in a face latent space, and the development of such an expressive and disentangled face latent space using videos. Through extensive experiments including evaluation on a set of new metrics, we show that our method significantly outperforms previous methods along various dimensions comprehensively.

“Our method not only delivers high video quality with realistic facial and head dynamics but also supports the online generation of 512×512 videos at up to 40 FPS with negligible starting latency. It paves the way for real-time engagements with lifelike avatars that emulate human conversational behaviors,” they explain.

“Our method is capable of not only producing precise lip-audio synchronization, but also generating a large spectrum of expressive facial nuances and natural head motions. It can handle arbitrary-length audio and stably output seamless talking face videos,” Microsoft added.

The tech giant then provides a number of examples of what the program can create, showing that it also works with sketches and art pieces.

https://www.youtube.com/watch?v=0s5J2LRqQAI

In their research paper, the researchers reiterate that this innovation in AI “paves the way for real-time engagements with lifelike avatars that emulate human conversational behaviors.”

Additionally, VASA-1 was trained on a dataset called VoxCeleb2, which was compiled by scientists from the University of Oxford. According to the dataset’s website, it contains “over 1 million utterances for 6,112 celebrities,” taken from videos uploaded to YouTube.

Microsoft, however, swears that this technology is not intended to give people the ability to create deepfakes. SEE: Deep Fakes: How Movie Magic Is Now Our News

Microsoft writes:


It is not intended to create content that is used to mislead or deceive. However, like other related content generation techniques, it could still potentially be misused for impersonating humans. We are opposed to any behavior to create misleading or harmful contents of real persons, and are interested in applying our technique for advancing forgery detection. Currently, the videos generated by this method still contain identifiable artifacts, and the numerical analysis shows that there’s still a gap to achieve the authenticity of real videos.

While acknowledging the possibility of misuse, it’s imperative to recognize the substantial positive potential of our technique. The benefits – such as enhancing educational equity, improving accessibility for individuals with communication challenges, offering companionship or therapeutic support to those in need, among many others – underscore the importance of our research and other related explorations. We are dedicated to developing AI responsibly, with the goal of advancing human well-being.

Given such context, we have no plans to release an online demo, API, product, additional implementation details, or any related offerings until we are certain that the technology will be used responsibly and in accordance with proper regulations.


AUTHOR COMMENTARY

A true witness delivereth souls: but a deceitful witness speaketh lies.

Proverbs 14:25

The lies and deceit continue to wax worse (2 Timothy 3:13). This is so obviously going to be used for deepfaking. Yeah, sure, there are imperfections, but most people are not going to notice them, especially if slanderous material goes viral on social media.

Be extra vigilant and be very careful with everything you see and hear. It seems like everything these days is a lie, and I have no doubt that this will be weaponized against innocents and Bible believers in disinformation campaigns at some point. SEE: Mind Control: Study Claims That Deepfakes Will Start Manipulating Our Brains To Remember False Memories

By honour and dishonour, by evil report and good report: as deceivers, and yet true;

2 Corinthians 6:8

[7] Who goeth a warfare any time at his own charges? who planteth a vineyard, and eateth not of the fruit thereof? or who feedeth a flock, and eateth not of the milk of the flock? [8] Say I these things as a man? or saith not the law the same also? [9] For it is written in the law of Moses, Thou shalt not muzzle the mouth of the ox that treadeth out the corn. Doth God take care for oxen? [10] Or saith he it altogether for our sakes? For our sakes, no doubt, this is written: that he that ploweth should plow in hope; and that he that thresheth in hope should be partaker of his hope. (1 Corinthians 9:7-10).

The WinePress needs your support! If God has laid it on your heart to want to contribute, please prayerfully consider donating to this ministry. If you cannot gift a monetary donation, then please donate your fervent prayers to keep this ministry going! Thank you and may God bless you.


10 Comments

    • I’m with ya, mankind has turned God’s creation into a bastion of corruption, deceit and lies!!
      The quicker the better, just remember to wait patiently for that Blessed Hope!!

  • “It is not intended to create content that is used to mislead or deceive. However, like other related content generation techniques, it could still potentially be misused for impersonating humans. We are opposed to any behavior to create misleading or harmful contents of real persons…”

    Of course it’s not “intended”! We all know the technocrats are honest and honorable, right? What could possibly go wrong?! We KNOW it will be intentionally misused and abused. With Skype, Zoom, etc., why would such technology be necessary?

    Setting people up for crimes they didn’t commit or comments they didn’t make comes to mind. There’s no way to fight against it, so when the black coats come knocking on our door, we’ll have no defense. All too easy to do with demon-possessed AI and its handlers (if there are any).

    Did any of you stop to wonder how they got your voice signature, to be able to pull this off?

    Have you ever got an “alarm call” from Amazon Prime, or what sounded like a telemarketer? The moment you answered your phone or launched into a tirade or conversation of any kind, they nabbed your voice signature! And, we innocently thought it was just a nuisance call!

    BTW, they used Skype, Zoom, Discord, etc. to get the faces, the facial movements, and voices! I believe they call this “Identity theft”!

  • The only real purposes I can see for anyone to use this application are impersonation and fraud for false convictions and other nefarious acts.

  • Sounds totally fake. Terrible lip sync. Ugly and stupid like all AI. Denounce it, don’t use it or participate. When you get to a recorded AI voice, get a human only. Complain and demand. Humans only.

  • the EYES in the AI generated faces are the clue….
    they do not look natural….too wide open, too “surprised” looking & slightly bug-eyed.
    windows to the soul-less.
