Scientists use AI to reconstruct faces from speech

The algorithm, created by the Massachusetts Institute of Technology (MIT), was able to reconstruct a person's face using only a recording of their speech

An artificial-intelligence algorithm developed by the Massachusetts Institute of Technology (MIT), in the United States, has been able to reconstruct the appearance of a person's face using only their speech as a starting point.

The MIT Computer Science and Artificial Intelligence Laboratory (MIT CSAIL) has published Speech2Face, a tool able to infer attributes such as a person's age, gender and ethnicity from a short speech clip.

The study's authors stress that their objective "is not to reconstruct an accurate image of the person, but rather to recover physical characteristics that are correlated with speech."

The project explores to what extent a person's appearance can be inferred from their voice, and is inspired by the way people form a mental image of someone they know only by voice.

Speech2Face is built on a deep neural network designed and trained on the open AVSpeech database, which contains short clips of about six seconds each from more than 100,000 people speaking.

To demonstrate its results, the research also used the VoxCeleb database, made up of millions of videos published on the Internet in which 7,000 celebrities appear in interviews, in short clips of at least three seconds.

The generated image shows the person's face from the front, with a neutral expression; these reconstructions were displayed alongside real images of the celebrities in the videos to show the resemblance to the originals.

During training, the model learns audiovisual correlations between people's voices and their faces from the database videos, focusing on physical attributes such as age, gender and ethnicity, but also incorporating others such as certain craniofacial measurements and proportions.

Speech2Face operates in a self-supervised manner, making use only of the correspondence between speech and appearance observed in the videos, with no manual labeling.
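The self-supervised idea described above can be sketched in miniature: extract an embedding from a face seen in a video frame, then train a voice encoder so that its output for the accompanying audio moves toward that same embedding. The sketch below is a toy illustration only, assuming invented names and sizes throughout; the real Speech2Face uses a deep convolutional voice encoder and a pretrained face decoder, neither of which is reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

EMBED_DIM = 8  # hypothetical face-embedding size for this toy example

def voice_encoder(spectrogram, weights):
    """Map a (time x frequency) spectrogram to a face-embedding-sized vector."""
    pooled = spectrogram.mean(axis=0)   # average features over time
    return np.tanh(pooled @ weights)    # single-layer stand-in for a deep CNN

def training_step(spectrogram, face_embedding, weights, lr=0.1):
    """One self-supervised step: pull the voice-derived embedding toward the
    face embedding taken from the same video (L1 loss, manual gradient)."""
    pred = voice_encoder(spectrogram, weights)
    loss = np.abs(pred - face_embedding).mean()
    # gradient of mean |tanh(pooled @ W) - f| with respect to W (chain rule)
    pooled = spectrogram.mean(axis=0)
    pre = pooled @ weights
    grad_out = np.sign(np.tanh(pre) - face_embedding) / EMBED_DIM
    grad_w = np.outer(pooled, grad_out * (1.0 - np.tanh(pre) ** 2))
    return weights - lr * grad_w, loss

# Fake six-second clip: 600 time frames x 16 frequency bins, plus a fake
# "ground truth" face embedding extracted from the matching video frame.
spec = rng.normal(size=(600, 16))
face = rng.normal(size=EMBED_DIM)
W = rng.normal(scale=0.1, size=(16, EMBED_DIM))

losses = []
for _ in range(200):
    W, loss = training_step(spec, face, W)
    losses.append(loss)

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

The point of the sketch is the supervision signal, not the architecture: no human labels age, gender or facial proportions anywhere; the target embedding comes for free from the video frame paired with the audio, which is what lets the method scale to databases like AVSpeech.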

K. Tovar

Source: LibertadDigital
