LIBERATION

Artists, how much does AI owe you?

The new Reward Simulator tool lets you search a database for the original photographs that may have been used to create images produced by generative AI. While it is still imperfect, it squarely addresses the issue of copyright.

Do the images created by Midjourney feel like déjà vu? Do others, created by Dall-E, remind us of something? Yes, but what? Since their sensational arrival in the visual-creation landscape, generative AI models have been seen as black boxes. No one knows exactly what Midjourney, Dall-E, or Firefly, to name just a few, use to transform words (called prompts) into images. While we know that these tools have been trained on millions of visuals, few AI companies communicate precisely about the content of the databases from which their programs have drawn. And for good reason: this data is often protected by copyright. Highly secretive about their discreet "harvesting" or "mining," AI providers are accused of plundering and plagiarizing the work of artists, photographers, filmmakers, and more. For the latter, it is almost impossible to prove that their content was used. For photographer Arnaud Février, generative AI has thus carried out "the robbery of the century" by siphoning off "the safe of creation."

Pandora's Box of Generative AI

Now a small educational tool has just appeared: the Reward Simulator. It is an open-source prototype that lets users find, in a database, the original photographs that may have been used to create new images. When you submit a generated image to the Reward Simulator, the program retrieves all the photos that may have contributed to its composition, drawing from Open Images, the largest open-source database, comprising 10 million royalty-free photographs. The tool also provides other information: it retrieves the names of the authors of the original photos and gives a price range. With copyright at the center of discussions at the AI Action Summit, here is a clever tool that opens the Pandora's box of generative AI and concretely reveals the artists' works contained within.

"The simulator clearly demonstrates the technical and economic feasibility of fair remuneration for authors," explains Vincent Lorphelin, founder of Controv3rse, a think tank of 70 entrepreneurs and AI experts behind the tool. Organized as an association, Controv3rse aims to inform the public debate on the challenges of artificial intelligence. "This simulator is the first to calculate, at real-life scale, the remuneration of the rights holders of the training data used by generative AI. It works on the basis of vector similarity: when you present it with a generated image, it computes that image's vector and looks for the closest vectors." With this tool, Vincent Lorphelin aims to "unblock the debate" by extending to artists the principle of "fair remuneration" (introduced by the Lang law), which compensates artists for the dissemination of their works. The idea is to apply a 15% rate to the turnover of generative AI companies and have collective management organizations distribute the proceeds according to "vector similarities".
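The "vector similarity" mechanism Lorphelin describes, paired with a pro-rata split of a remuneration pool, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the simulator's actual code: the embedding vectors, database contents, and function names here are all hypothetical.

```python
import numpy as np

def cosine_similarity(query, db):
    # Cosine similarity between one query vector and each row of a matrix.
    q = query / np.linalg.norm(query)
    rows = db / np.linalg.norm(db, axis=1, keepdims=True)
    return rows @ q

def nearest_sources(query_vec, db_vecs, k=3):
    # Rank the database images closest to the generated image's vector,
    # as the article describes: "it computes that image's vector and
    # looks for the closest vectors."
    sims = cosine_similarity(query_vec, db_vecs)
    order = np.argsort(sims)[::-1][:k]
    return order, sims[order]

def pro_rata_shares(sims, pool):
    # Split a remuneration pool (e.g. 15% of a provider's turnover)
    # among the matched works in proportion to their similarity scores.
    weights = np.clip(sims, 0.0, None)
    return pool * weights / weights.sum()

# Hypothetical 2-D embeddings: one generated image, three database photos.
query = np.array([1.0, 0.0])
db = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
order, sims = nearest_sources(query, db, k=2)
shares = pro_rata_shares(sims, pool=100.0)
```

In practice the vectors would come from an image-embedding model and the database would index millions of photos, but the principle (embed, rank by similarity, split the pool proportionally) is the same.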

Rather Fanciful Prices

For now, the Reward Simulator is still imperfect. The prices it displays are rather fanciful. And, paradoxically, it draws from a free database. But it does address several issues. First, it demonstrates that pre-existing works are indeed the source from which content is generated. Second, it highlights the authors' lack of consent: finding the artists' names quickly reveals that they had no say in the use of their works. They therefore could not exercise an opt-out in advance (having their works removed from the training databases).

Thus, the Reward Simulator helps highlight the importance of data transparency, a crucial point demanded by authors' societies and one of the main ethical concerns of the AI Act. Because "only transparency about the sources used to train AIs will, in the future, make it possible to verify compliance with the opt-out as well as the conditions for accessing protected content [...] To evaluate an AI and offer guarantees to its users, documentation of the sources is essential," wrote Alexandra Bensamoun, a legal scholar and qualified member of the Higher Council for Literary and Artistic Property (CSPLA, Ministry of Culture), in 2023. "Possible liabilities could not be established without transparency."

Finally, the tool has the merit of highlighting the necessary sharing of value between artificial intelligence and artists. Why should only these commercial companies receive income when their probabilistic tools were built on protected works? While providers of artificial intelligence models take a dim view of the AI Act, and Sam Altman (OpenAI), invoking the American "fair use" doctrine, criticizes European regulatory efforts, the collection and exploitation of data is more strategic than ever. This attractive, clear, and effective simulator can perhaps help us ask the right questions and set in motion a more virtuous circle between AI and creation.