UCL LIBRARY SERVICES

Reusable asset guide

A collection of content boxes that can be re-used in your own guide. We will be continuously adding content here.

Text and data mining (TDM)

[Image: cartoon of files and devices being poured into a machine, with the words "text and data mining" emerging at the other end.]

Image by Davide Bonazzi at www.copyrightuser.org/understand/exceptions/text-data-mining/

Definitions of TDM vary, covering both the technicalities and the utility of the practice. The UK Intellectual Property Office (IPO) defines TDM as: ‘The use of automated analytical techniques to analyse text and data for patterns, trends and other useful information’. Within TDM, text mining and data mining are also defined separately. Text mining is most commonly understood as the computational process of discovering and extracting knowledge from unstructured data. Data mining, by contrast, is the computational process of discovering and extracting knowledge from structured data.
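The distinction between the two can be illustrated with a minimal Python sketch. The sample sentences, the records and the function names below are invented purely for illustration; real TDM projects work on far larger corpora and usually rely on dedicated libraries.

```python
import re
from collections import Counter

# Unstructured data: free text with no predefined fields (invented sample).
abstracts = [
    "Text mining extracts patterns from unstructured text.",
    "Data mining extracts patterns from structured data.",
]

# Structured data: records with named fields (invented sample).
records = [
    {"year": 2019, "discipline": "history"},
    {"year": 2020, "discipline": "biology"},
    {"year": 2020, "discipline": "history"},
]


def term_frequencies(documents):
    """Text mining in miniature: count lower-cased words across free text."""
    counts = Counter()
    for document in documents:
        counts.update(re.findall(r"[a-z]+", document.lower()))
    return counts


def count_by_field(rows, field):
    """Data mining in miniature: count records by the value of a named field."""
    return Counter(row[field] for row in rows)


print(term_frequencies(abstracts).most_common(3))
print(count_by_field(records, "discipline"))
```

In the first function the structure (words) has to be extracted from the text before anything can be counted; in the second the fields already exist and can be queried directly.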

Interest in TDM has surged across academic disciplines, from the sciences to the humanities. However, TDM raises a range of legal and political issues that need to be considered, primarily concerning copyright, intellectual property rights, licences and download limits.

Why do TDM?

TDM can make research easier for those who need to examine a large corpus of documents in order to discover underlying trends across multiple datasets. It is often cited as a way of accelerating scientific discovery, but it is equally useful for researchers in the humanities who mine sources such as journals and newspapers.

The Advanced Research Computing Centre (ARC) team at UCL work closely with departments across the university, collaborating on a range of software projects including Oceanic Exchanges, ForecastCC and the UCL-wide CloudLabs.

Barriers to TDM

In recent years, changes have been made to the UK’s intellectual property framework in order to support innovation and growth. The Hargreaves Report (2011) recommended a copyright exception, introduced into UK law in 2014, to allow the use of analytics for non-commercial research. Yet many barriers to TDM remain. Some of these have been studied in more detail by Michelle Brook, Peter Murray-Rust and Charles Oppenheim, who argue that a number of non-technological barriers need to be overcome in order to realise the full potential of TDM. They raise concerns about the legal issues surrounding copyright law and database rights, but also offer guidelines on how publishers can help to overcome these barriers to research. These include giving researchers lawful access to original materials and drawing a clear distinction between ‘commercial’ and ‘non-commercial’ research.

Accessing material for TDM

While a specific licence is not generally required for academic TDM, many publishers provide explicit support for TDM by academic users through a specific interface. This allows higher rates of access and avoids the problems that can arise from intensively crawling publishers’ websites.

For example, in the area of scholarly journals, Elsevier, Springer and Wiley all provide access to journal content for TDM through a dedicated Application Programming Interface (API). There is no 'one-stop shop' for TDM across multiple providers, but depending on the material you are looking for, it may be possible to use the Crossref, Scopus, or Web of Science APIs to get some initial data. If you are considering TDM work with a content provider that does not offer a specific API, and you will be downloading a large number of records, it is recommended that you consult the provider first.
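As a rough illustration of gathering initial data through an API rather than by crawling, the sketch below queries the public Crossref REST API for bibliographic metadata. The function names, the one-second pause between requests and the example email address are assumptions made for illustration; check each provider's current documentation and terms before running anything at scale. Publisher full-text APIs such as those from Elsevier, Springer and Wiley have their own registration and key requirements, which are not shown here.

```python
import json
import time
import urllib.parse
import urllib.request

CROSSREF_WORKS_URL = "https://api.crossref.org/works"


def build_query_url(query, rows=5, mailto=None):
    """Build a Crossref works search URL; mailto identifies you to Crossref."""
    params = {"query": query, "rows": rows}
    if mailto:
        params["mailto"] = mailto
    return CROSSREF_WORKS_URL + "?" + urllib.parse.urlencode(params)


def extract_titles(response_body):
    """Return (DOI, title) pairs from a decoded Crossref works response."""
    items = response_body.get("message", {}).get("items", [])
    # Crossref returns the title as a list; some records have no title.
    return [(item.get("DOI"), (item.get("title") or [""])[0]) for item in items]


def fetch_titles(queries, mailto=None, delay_seconds=1.0):
    """Run each query in turn, pausing between requests to limit load."""
    results = {}
    for query in queries:
        url = build_query_url(query, mailto=mailto)
        with urllib.request.urlopen(url, timeout=30) as response:
            results[query] = extract_titles(json.load(response))
        time.sleep(delay_seconds)  # assumed pause; adjust to the provider's guidance
    return results
```

Calling `fetch_titles(["text and data mining"], mailto="your.name@example.org")` would return a dictionary mapping the query to a short list of DOI and title pairs. The pause between requests is a deliberate design choice: it keeps the request rate modest, which is the kind of problem dedicated TDM interfaces are intended to avoid.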

How can Library Services help?

There are a number of ways in which the library can assist researchers with TDM:  

  • Provide advice on the tools available for TDM and on the types of sources you may wish to analyse.
  • Refer researchers to other specialists who can assist further with the technicalities and legalities of TDM.  
  • Promote and build TDM networks across the university and beyond.