AI Trained on Child Sexual Abuse Material Sparks Concerns

Researchers from the Stanford Internet Observatory have exposed a distressing reality: AI image-generating models are being trained on datasets containing thousands of images of child sexual abuse material.

LAION 5B, a colossal public dataset, has been used to train sophisticated AI image-generating models. The study raises questions about the ethics and transparency of how AI tools are trained.

The dataset was curated by LAION, a German non-profit organization. It consists of billions of images collected from a wide range of online sources, including adult entertainment sites and social media platforms.

The researchers identified at least 1,008 instances of child sexual abuse material in the dataset, which contains around 5 billion images in all.

LAION was quick to take the dataset offline, citing a “zero tolerance policy for illegal content” in a statement. The organization is currently coordinating with the UK-based Internet Watch Foundation to remove any remaining links to illegal content from the public database.

LAION said it plans to complete a full safety review of the dataset by January and then republish it.

The gravity of this discovery lies in the fact that sophisticated generative image AI tools can create deepfake images that closely resemble real individuals, raising the possibility that AI could be used to produce child abuse content.

While LAION stated that it screens its datasets with “intensive filtering tools,” fears about the potential impact of using such extensive datasets to train AI models persist.

The good news, as shared by the Stanford team, is that removal of the identified images is in progress. The report notes that the developers of LAION 5B did try to weed out some explicit content.

However, “a wide array of content, both explicit and otherwise” was used to train a previous version of Stable Diffusion, one of the most popular AI image-generation models.

Stability AI Clarifies That Its Image-Generating Model Is Safe

The researchers called for urgent action, urging that models based on Stable Diffusion 1.5 no longer be distributed.

Stability AI, the London-based startup behind Stable Diffusion, stated that it did not release Stable Diffusion 1.5, an earlier version that came from a separate entity altogether. The Stanford researchers also acknowledged that Stable Diffusion 2.0 was not trained on any explicit material.

The older version remains in use in certain models, however, and concerns linger because it is the most popular model for generating explicit images.

The report also highlighted the broader risks that large web-scale datasets pose, including copyright issues and privacy concerns. According to the researchers, such datasets should be restricted to research purposes, while better-curated datasets should be used for publicly distributed AI models.

This development brings the potential misuse of AI technology to public attention and highlights the critical need for stricter standards. As AI-backed image-generation tools grow more powerful, the industry continues to struggle with the fallout.
