What can Artificial Intelligence do for the audiovisual sector?

I would like to begin by pointing out that, personally, Artificial Intelligence seems to me too broad a concept to describe a system, and that the term is very often used with little judgment and a great deal of commercial motive. It is undoubtedly a fashionable term: it promises to be the fourth industrial revolution, and it inspires fear on the one hand and hope for a better future on the other.

It is a “magic” term that many people invoke, along with others such as virtual reality, Big Data, or 5G, when they want to convey that their technology is cutting-edge. In the audiovisual sector, artificial intelligence can cover too many areas and functionalities, existing or still to be invented; it is more accurate to speak of semi-automatic image processing, voice recognition, or natural language processing. But since the term has become established as a reference concept in the industry, in this article we are going to put artificial intelligence, AI, into context.

There are definitions of artificial intelligence to suit all tastes, since it is used in many areas with very different objectives, but the common element is that it involves teaching a machine to produce the same results that a human would produce in a given task, using human intelligence as the paradigm.

The key to an artificial intelligence lies in its training. Training an AI means teaching it what output it should return when we give it certain input. In face recognition software, for example, we give the tool many photos of the same person, indicating the name with which it should associate these images. Once it has learned that information, it will know what output to generate for future inputs. We will have trained the system, which is based on such suggestive concepts as neural networks or genetic algorithms.
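To make the idea of training concrete, here is a minimal Python sketch. It is not how any real face recognition product works: it assumes each photo has already been reduced to a short list of numbers (a hypothetical feature vector), and the names, the numbers and the functions train and predict are all invented for illustration. Training simply averages the examples seen for each name, and prediction returns the name whose average is closest to the new input.

```python
import math


def train(examples):
    """Learn one average vector (centroid) per label.

    examples: list of (feature_vector, label) pairs, i.e. inputs
    together with the output we want the system to return.
    """
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: [value / counts[label] for value in acc]
        for label, acc in sums.items()
    }


def predict(model, features):
    """Return the label whose centroid is nearest to the new input."""
    return min(model, key=lambda label: math.dist(model[label], features))


# Hypothetical training data: two "photos" per person, already
# converted to made-up two-number feature vectors.
training_photos = [
    ([0.0, 1.0], "Alice"),
    ([1.0, 0.0], "Alice"),
    ([9.0, 10.0], "Bob"),
    ([10.0, 9.0], "Bob"),
]
model = train(training_photos)
guess = predict(model, [1.0, 1.0])  # a new, unlabeled "photo"
```

The point of the sketch is only the shape of the process: labeled inputs go in during training, and afterwards the system produces a label for inputs it has never seen. Neural networks follow the same pattern with far richer models.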

To train the system, therefore, data is needed. And because of that, data is becoming ever more valuable. Companies like Google or Facebook know this, and in some of the thousands of clauses that we sign without paying much attention, we hand over all our data and content to improve their algorithms.

Data has become a fundamental resource for many business models, so much so that the European Union is already legislating on commercial transactions in which you pay with data instead of money. The objective is, of course, to generate a tax on such transactions.

In the past, a technological revolution has taken more than a decade. In the present moment, with the unfortunate COVID-19 pandemic, the digitization of companies and processes has been greatly accelerated, supported by areas such as artificial intelligence. A McKinsey survey published in October 2020 found that companies are three times more likely to conduct at least 80 percent of their customer interactions digitally than they were before the COVID crisis. We are therefore in a mature third industrial revolution that is pushing us toward the Fourth Industrial Revolution, the revolution of machines.

It can be uncomfortable to think of a machine replacing a human in a field such as audiovisual, where creativity plays such an important role. Those who feel this way are right: creative genius cannot, and surely never will, be replaced by a machine, but there are other, more repetitive tasks within the audiovisual sector in which AI can be of great help. In this article, let us focus on one task, in an area that is not always valued but is truly critical: the cataloging, metadata and documentation of audiovisual content.

Consider requests such as:

We need all of Messi's goals this season; we are going to put together a highlights piece for the top-rated sports program.
Today Pedro Sánchez spoke about the refugee problem; I need clips of all his statements on this issue this year.
I need that speech by Felipe González where he talked about Venezuela when he was president, do you remember? What year was it? Around 1990, or 91 … I think …

Many professionals in the sector will have heard, or even said, phrases very similar to these. Documentation requests are very varied, and the documentation department is always expected to return the exact piece being referred to. But retrieving this information requires exhaustive, and very expensive, indexing of all the content that a channel produces or receives. And that is where Artificial Intelligence is very useful to us, automating part of the indexing process and reducing the time that each documentalist has to spend on each piece of content.
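Why indexing makes such requests answerable can be shown with a small Python sketch. Everything in it is an assumption made for illustration: the Segment record, its fields, the search function and the sample entries are hypothetical, and a real archive would use a database or search engine rather than a list. Once each piece carries metadata about who speaks, what is discussed and when it aired, a vague request becomes a simple filter.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Segment:
    """One indexed piece of content (hypothetical metadata record)."""
    asset_id: str
    broadcast_date: date
    speakers: set = field(default_factory=set)
    keywords: set = field(default_factory=set)


def search(index, speaker=None, keyword=None, start=None, end=None):
    """Return the segments that satisfy every filter that was supplied."""
    hits = []
    for seg in index:
        if speaker is not None and speaker not in seg.speakers:
            continue
        if keyword is not None and keyword not in seg.keywords:
            continue
        if start is not None and seg.broadcast_date < start:
            continue
        if end is not None and seg.broadcast_date > end:
            continue
        hits.append(seg)
    return hits


# Made-up index entries, for illustration only.
archive = [
    Segment("tape-001", date(1990, 5, 3), {"Felipe González"}, {"venezuela"}),
    Segment("tape-002", date(1995, 3, 9), {"Felipe González"}, {"economy"}),
    Segment("tape-003", date(1991, 2, 1), {"Another Speaker"}, {"venezuela"}),
]

# "That speech by Felipe González about Venezuela, around 1990 or 91."
found = search(
    archive,
    speaker="Felipe González",
    keyword="venezuela",
    start=date(1990, 1, 1),
    end=date(1991, 12, 31),
)
```

The search itself is trivial; the expensive part is filling in the speakers and keywords for every piece of content, which is the work that AI can partly automate.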

The application of AI to indexing can be classified into three areas, depending on the type of input data: audio, video or text. AI can generate a large amount of information thanks to algorithms capable of recognizing faces, detecting logos and brands, reading on-screen labels, segmenting speakers, transcribing speech to text, extracting keywords, categorizing content, and so on. Next, we are going to focus on the algorithm most widely used to assist audiovisual documentation: speech recognition.

Speech recognition and its use in audiovisual systems have been talked about for many years. However, the first systems were not ready: the results were messy and mediocre, and the industry developed a certain aversion to using them. Today the scenario is different, and it is important to reconsider the matter and analyze these technologies without prejudice, since the benefit they can bring is very great.

Current technology is ready for industrial use. By this I do not mean that any content can be transcribed or subtitled without errors: the audio quality must be high, and several people must not speak at the same time. But above all, specific training is necessary for each type of content and, in the best case, for each person's voice. That is why creating a universal subtitler is beyond the reach of even giants like Google; you only have to try YouTube’s automatic subtitle generator to see that there is still a long way to go.

However, a system can learn how people speak in a defined type of content: what grammar is used and what vocabulary is most common. With regard to the lexicon, it is critical to know the names of people, places or institutions that are going to be used; the system will never be able to recognize a word that it does not already know. Thus, the first time we fed a session of The Courts of Aragon into our Etiqmedia recognition system, it told us that “The Dragon government proposes to the city of Cruel a budget line to fight against the territory depopulation.” Either the recognizer had opted for fantasy literature, or it needed proper training. We decided that training with old session transcripts would be the best option. We tried again, and this time the system correctly deduced that “The government of Aragon proposes to the city of Teruel a budget item to fight against the territory depopulation.” Much better.
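The effect of training on old transcripts can be illustrated with a toy Python sketch. This is not how the Etiqmedia recognizer works internally, which this article does not describe; it is a simplified word-frequency model under stated assumptions. The two "old transcripts" are invented, and the functions tokenize, train_unigram_model, unknown_words, score and pick_best are hypothetical names. The idea is that a model built from domain text flags words it has never seen and prefers the candidate sentence that sounds more like that domain.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-záéíóúñü]+", text.lower())


def train_unigram_model(transcripts):
    """Count how often each word appears in the domain transcripts."""
    counts = Counter()
    for transcript in transcripts:
        counts.update(tokenize(transcript))
    return counts


def unknown_words(sentence, counts):
    """Words in the sentence that never appeared in the training text."""
    return [w for w in tokenize(sentence) if w not in counts]


def score(sentence, counts):
    """Average log-probability per word, with add-one smoothing."""
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot for unseen words
    tokens = tokenize(sentence)
    log_probs = [math.log((counts[w] + 1) / (total + vocab)) for w in tokens]
    return sum(log_probs) / len(tokens)


def pick_best(candidates, counts):
    """Choose the candidate that best matches the domain vocabulary."""
    return max(candidates, key=lambda s: score(s, counts))


# Invented stand-ins for old session transcripts.
old_transcripts = [
    "The government of Aragon presents the budget to the chamber",
    "The city of Teruel asks the government of Aragon for funds against depopulation",
]
domain_model = train_unigram_model(old_transcripts)

candidates = [
    "The Dragon government proposes to the city of Cruel",
    "The government of Aragon proposes to the city of Teruel",
]
best = pick_best(candidates, domain_model)
```

With these made-up transcripts, the second candidate scores higher because "Aragon" and "Teruel" appear in the training text while "Dragon" and "Cruel" do not. Production recognizers combine acoustic models with much richer language models, but the dependence on domain vocabulary is the same.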

On the human side of implementing any new technology, a question along the lines of “what about my job now?” always arises: are these systems going to put documentalists out of work? The answer right now is clear: NO. The technology is not ready to work autonomously; all data extracted with AI has a non-zero error rate, usually between 3% and 10%. For this reason, a human supervision workflow is required, in which the documentalist corrects the data generated by the automatic system and adds relevant information that cannot be automated. The big difference compared with the manual system is the large increase in productivity: a documentalist can now index much more content than by hand. Traditionally, documentation departments are understaffed, and not all audiovisual companies appreciate the importance of this area or the heavy demand for staff it usually has. AI has come to help.
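For transcripts, one common way to put a number on this kind of error is the word error rate: the number of word substitutions, insertions and deletions needed to turn the automatic text into the corrected one, divided by the length of the corrected text. The article does not say which metric lies behind the 3% to 10% figure, so the Python sketch below is only an illustration under that assumption, and the function name word_error_rate is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length.

    reference:  the transcript as corrected by a documentalist.
    hypothesis: the transcript produced by the automatic system.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Compare an automatic transcript against its human-corrected version.
corrected = "the government of aragon proposes a budget item"
automatic = "the government of dragon proposes a budget item"
rate = word_error_rate(corrected, automatic)
```

A measure like this lets a documentation department track how much correction the automatic output still needs, and whether retraining on new vocabulary actually reduces that work.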

We are entering a transition phase in which technology will gradually be introduced into documentation systems. Like linear editing in its day, fully manual cataloging has its days numbered.

Images: Pikist and www.etiqmedia.com