Artificial Intelligence (AI) is taking up more and more space in our lives, to the point that in many activities we can no longer tell when it is involved. The clearest current example is images: pictures generated by AI tools can now be found in search engines. Viral output from DALL-E and Stable Diffusion is now scattered across the web and used openly.
More and more sensitive images are surfacing through Artificial Intelligence, many of them linked to the medical field. A user raised a concern in the LAION Discord channel about data found in this way and contacted the platform to have the information removed. One of the developers of the LAION dataset is Romain Beaumont, who is also, according to his LinkedIn profile, a machine learning engineer at Google. Beaumont stated that “the best way to remove an image from the Internet is to ask the hosting website to stop hosting it.” He then made it clear that “we are not hosting any of these images.” Asked whether he knew which sites host them, he replied, referring to the dataset, that “if you download it, you get the full list of URLs”.
When AI becomes a cause for concern
Fearing what the future may hold for this technology, many people are asking whether these automated tools are illegal, inappropriate, or in breach of confidentiality. A LAION spokesperson discussed the issue with Motherboard by email, stating that “in this case, we would sincerely be very happy to hear from them … We are very actively working on an improved system for handling removal requests.”
This incident has raised a lot of concern about the gigantic datasets used to train AI. Many of these datasets are built by “scraping” existing images that belong neither to the tool nor to the user who relies on it for the final creation. Copyright therefore comes into play, and AI image generation software is coming under growing scrutiny.
A vast amount of data to keep track of
LAION-5B is a dataset of more than 5 billion images. The concern is that it includes thousands of photos of celebrity pornography edited in Photoshop. This pornography is pirated, and the original performers' heads appear to have been cut out so that the celebrity's could be inserted. The dataset also contains artwork by famous living artists, works by photographers, medical images, and photos of people who have not given permission.
Tiffany Li, an attorney specializing in technology and assistant professor of law at the University of New Hampshire School of Law, commented on the issue when asked by Motherboard. Dr. Li noted that “many of these large datasets collect images from other datasets, so it can be difficult to find the original person who actually collected the image for the first time.” Tracing a photo back to its origin in order to request its removal is therefore not easy. As a result, people's images remain scattered across many sites, where AI can use them for any purpose.