Library Services
This guide © 2024 by UCL - Library Skills is licensed under CC BY-NC-SA 4.0
If GenAI models are trained on materials protected by copyright without the permission of the copyright owner, does this mean that they could be violating copyright laws?
The answer to this is complicated.
Unless copyright has expired or material has already been licensed under terms that allow reuse, e.g. under the Creative Commons Attribution licence, normally permission is required to reproduce, share and reuse material. Reproducing and sharing material without permission or a licence from the copyright owner could be unlawful.
However, in certain cases materials may be reused without permission, either for specific purposes defined in the law (‘copyright exceptions’ or ‘permitted acts’) or if certain criteria are met, e.g. ‘fair use’ in the US. Training GenAI models may rely on these exceptions. Since copyright laws vary across countries, where the training activity took place is also crucial.
Using copyrighted materials without permission to train GenAI could, therefore, be found unlawful, or it could be deemed to be permitted under an exception (e.g. the text and data mining exception in the UK) or as ‘fair use’. This is being decided in relevant court cases, whose outcomes will help shape how copyright applies to GenAI.
A number of copyright owners have sued AI companies for copyright infringement. Key cases include:
These cases highlight the complexity of copyright as it applies to GenAI but also raise broader questions:
While these questions are being addressed in the courts, it is advisable to be mindful of the issues and to use tools that demonstrate some degree of transparency in the way they work and in their terms and conditions.
It is also worth noting that, while the outcomes of these cases will certainly be informative, copyright considerations in academic settings are very likely to differ from the criteria and considerations applied in the creative industries and commercial settings. See related commentary on fairness criteria on the SPARC website, particularly point 3.
UK legislation includes a copyright exception allowing copying for the purposes of computational analysis of text and data, as long as the use is non-commercial, the user has lawful access to the materials and the sources are acknowledged (unless it is impossible to do so for practical reasons). For more detail on the exception see our TDM guidance.
The question is whether the exception could be applied to training GenAI models. This is important if your research involves developing or training a GenAI model. A recent court ruling in Germany (Kneschke v LAION), a country which has similar exceptions including one on TDM for research purposes, should help shed light on this. The photographer and copyright owner of an image sued the LAION organisation for copying the image without permission, for the purposes of creating a dataset to support AI training. The case is quite complex; full details are discussed on the Kluwer copyright blog and the TechnoLlama website. Here we highlight two relevant points from the court’s decision: it confirmed (a) that making a copy of an image in order to extract information from it is covered by the exception and (b) that the activity was non-commercial research. Although the decision did not cover the further use of the dataset to train the model, comments by the judge suggest that TDM exceptions could extend to AI training.
In an academic setting, asserting the right to rely on the TDM exception to train AI in research is important. Some publishers may have clauses in their terms of use that preclude the use of their articles for AI purposes. This is being challenged; please see relevant guidance by Jisc.
If your research involves TDM and you are unsure about publishers’ clauses or encounter technological barriers when copying the data, please contact us for advice.
Copyright breaches can still happen even if the training data and prompts are shared with a licence allowing reuse, such as a Creative Commons licence or an open source software licence.
If GenAI activities rely on a licence, the terms of the licence must be respected. This includes the requirement to attribute the author and to meet the specific terms of the licence, for example no-derivatives, share-alike and non-commercial restrictions. These points need to be addressed both if you are creating your own model and if your work is being used to train AI models. Creative Commons have a useful article and flowchart showing in which cases of GenAI activity different terms of the licences apply.
Attribution is, of course, a requirement of all six CC licences; attribution is also expected for materials that are not openly licensed, as part of good academic practice and research integrity and as part of fair dealing if relying on exceptions. There are concerns that GenAI outputs do not attribute their sources or, if they do, attributions can be inaccurate or fabricated altogether.
Solutions to this might involve a combination of approaches:
For an extensive discussion of infringement and attribution issues in GenAI, see Johnson A. Generative AI, UK Copyright and Open Licences: considerations for UK HEI copyright advice services [version 1; peer review: 2 approved]. F1000Research 2024, 13:134 (https://doi.org/10.12688/f1000research.143131.1).
As a user of GenAI tools, you will be providing prompts in the form of text, images, code, film etc. You could be breaching copyright if your prompts are someone else’s intellectual property and you don’t have permission or a licence to share them with a third party. This may include, for example, articles that UCL subscribes to which are provided for personal research and study or images for which you do not own the copyright.
A highly publicised case reflecting this involves Tesla using a still from the film Blade Runner 2049 without permission in October 2024. Tesla first approached Alcon Entertainment LLC, the producer of the film, to ask for permission to reuse the image. When this was denied, Tesla used the image as a prompt in a GenAI tool to generate a new version, which was shared as part of a promotional event. The outcome of the Alcon vs Tesla case should also provide insight on infringement in the context of GenAI.
You could also be infringing copyright if your generated output reproduces substantial parts of original content that is protected by copyright and not licensed for reuse. Several AI tools, usually paid versions, offer indemnities to cover legal expenses in the event of a user being sued for copyright infringement. However, these indemnities are limited and not likely to offer comprehensive cover. More advice on indemnities and their limitations can be found on the Farrer & Co website.