November 18, 2025

Webscraping for AI model training: what does Getty Images v Stability AI mean for the creative industries?

The recent High Court decision in Getty Images v Stability AI has been widely discussed, and the general view is that it failed to live up to expectations. It had been hoped that the judgment would address key questions about AI and copyright infringement in the UK, but in the event it shed little light on them. This article discusses the judgment and what we can learn from it.

What was Getty Images v Stability AI about?

Stable Diffusion is a text-to-image model developed by Stability AI that enables users to create AI-generated images. Getty Images is a supplier of stock images, video and music, primarily for business use. Getty brought claims against Stability alleging copyright infringement, database right infringement, trade mark infringement and passing off. Those claims (including the copyright infringement claim) were unsuccessful, apart from a narrow finding of trade mark infringement.

Did the court decide on the lawfulness of web scraping for training AI models?

No. Getty dropped its primary copyright infringement claim for evidential and jurisdictional reasons: Stability had trained its AI models on servers based in the US, which meant that UK copyright law did not apply to that training, and Getty was unable to provide sufficient evidence that any act of unauthorised copying had occurred in the UK. The court therefore did not rule on whether scraping images, text or video for AI model training breaches UK copyright law.

Primary vs secondary infringement: what’s the difference?

Under the Copyright, Designs and Patents Act 1988, primary infringement covers direct acts such as copying a work without permission. Secondary infringement is more indirect and includes importing infringing copies; possessing or dealing with infringing copies; and providing the means for making infringing copies. Getty pleaded secondary infringement under sections 22, 23 and 27 of the Act: that is, that Stability had imported into, and distributed within, the UK an "article" (Stable Diffusion) which Stability knew or had reason to believe was an "infringing copy" of one or more of Getty's copyright works. As part of this argument, Getty claimed that the creation of Stable Diffusion's model weights had infringed copyright in the training data, which would make Stable Diffusion itself an "infringing copy". The court did not agree. It did not help Getty that the experts' view was that Stable Diffusion neither stored infringing copies nor reproduced any copyright works: its model weights are just numbers, parameters describing the features of an image.

Does this mean that AI developers can lawfully web-scrape data to train AI models?

Not necessarily. A range of laws may apply to the web scraping of content in the UK, including copyright law, data protection law, contract law and the Computer Misuse Act 1990. Much also depends on what the AI model actually does. It would be risky to assume that the principles in this judgment can be applied to a different set of facts, including a completely different AI model, and the judgment does not mean that an AI model could never give rise to secondary copyright infringement.

There is also the fact that this is a decision of the UK High Court. Most AI models are deployed across the UK, the EU and internationally, and it is not at all clear whether courts in other jurisdictions would take the same approach. Many similar cases are currently ongoing, including multiple cases in the EU and the US.

How can creative professionals protect their work against web scraping?

As it stands, the law doesn't offer much in the way of protection for rightsholders. New legislation will come eventually, but it may take a while. In the meantime, the most effective protection against web scraping is likely to come from technical measures. For example, rightsholders are relying on measures such as the following (a short illustrative sketch of the first two appears after the list):

• rate limiting to throttle repeated requests from a single source;
• bot detection to block automated scraping tools;
• securing API keys;
• access controls that restrict access to data; and
• clear terms of use, which prohibit web scraping.
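
To make the first two measures concrete, below is a minimal sketch in Python of how a site operator might combine a per-IP rate limit with a crude user-agent check. It is illustrative only and is not drawn from the judgment or from any particular product: the window size, request limit, blocked-agent keywords and the is_allowed helper are all hypothetical choices made for the example.

# Illustrative sketch only: a per-IP rolling-window rate limiter plus a crude
# bot check. All thresholds and keyword lists are hypothetical.
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60        # hypothetical rolling window
MAX_REQUESTS = 100         # hypothetical per-IP limit within the window
BLOCKED_AGENT_KEYWORDS = ("scrapy", "python-requests", "curl")  # illustrative only

_request_log = defaultdict(deque)  # client IP -> timestamps of recent requests


def is_allowed(client_ip: str, user_agent: str) -> bool:
    """Return True if the request should be served, False if blocked or throttled."""
    # Crude bot detection: reject user agents that identify common scraping tools.
    if any(keyword in user_agent.lower() for keyword in BLOCKED_AGENT_KEYWORDS):
        return False

    # Rate limiting: discard timestamps outside the rolling window, then count.
    now = time.monotonic()
    timestamps = _request_log[client_ip]
    while timestamps and now - timestamps[0] > WINDOW_SECONDS:
        timestamps.popleft()
    if len(timestamps) >= MAX_REQUESTS:
        return False

    timestamps.append(now)
    return True


if __name__ == "__main__":
    # Example: the 101st request inside the window from the same IP is refused.
    results = [is_allowed("203.0.113.7", "Mozilla/5.0") for _ in range(101)]
    print(results.count(True), "allowed,", results.count(False), "refused")

In practice, logic of this kind would sit in front of the web server or be handled by a CDN or commercial bot-management service, which relies on far richer signals than a user-agent string; the sketch simply shows the shape of the approach.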

Rights reservation: policy makers’ solution to the wrong problem?

UK policy makers are considering a "rights reservation" model to help protect creative professionals' rights. Under this approach, AI developers would be allowed to train their models on copyright works unless the rightsholder opts out. There has also been discussion of how rightsholders might be remunerated for licensing their copyright works, and how this would work from a practical or technical perspective.

This solution assumes that strengthening creative professionals' ability to enforce their rights, and to be compensated for use of their copyright works, will solve the problem. In practice, however, enforcement is not really the key issue. The bigger issue is that the sector is experiencing a gradual erosion of value driven by AI, a form of economic displacement which could be difficult to reverse in the long term.

Even if rightsholders are able to reserve their rights in the UK, AI developers will simply train their models elsewhere, where there are fewer restrictions. Rights reservation will not stem the flood of synthetic content available to consumers, nor reduce consumer demand for it. Legislating for greater protection of copyright holders' rights is needed, but policy makers will ultimately also need to tackle the bigger problem of AI-driven value erosion within the sector.
