Digital & IP Lawyers Catching Data Gremlins

12 Days of ChristmAIs: A TMT insight series

With the festive season approaching, even the North Pole would think twice before outsourcing Santa’s workshop to a mysterious algorithm. As Australian organisations increasingly embed AI models into their operations, they must also think twice about the value and sensitivity of the data feeding those systems. While AI can bring efficiencies, automation and insights, these benefits must be balanced against increased legal, operational and reputational risk if data governance is not properly addressed. This article outlines some recurring “data gremlins” we see with the increasing deployment of AI models.

Understanding your data

The accuracy and reliability of an AI model, and the regulatory risks associated with using it, are inherently linked to the data it processes. Before deploying an AI model, organisations must ensure they understand the datasets (both structured and unstructured) that the model will ingest. AI models often extract more value from these datasets, and use the data for different purposes, than the customer originally contemplated. Asking (and understanding the answers to) fundamental questions such as “what data do we actually hold”, “where is our data stored and how is it accessed”, “what data will the AI model have access to” and “where will the data be processed”, together with understanding your organisation’s data flows at a granular level, is critical for assessing risk with respect to your data, and any privacy, security, IP and contractual issues that might arise when deploying an AI model.

Data integrity

A key issue when using any AI model is data integrity. Users of AI models often (incorrectly) assume that any rights in data they upload (whether personal information, commercially sensitive material, or proprietary datasets) remain entirely theirs, and that the data they upload is used by the AI model provider only for a specific purpose. In reality, AI model providers will frequently seek to include in their terms the right to use such data (whether provided as part of an input, a prompt or otherwise) to train, customise and improve their AI models. This can lead to proprietary data being embedded into training sets, giving rise to unintended uses and the potential for that data to be disclosed to third parties. Customers must be aware of this risk and include appropriate contractual provisions restricting such usage where needed.

An organisation’s own employees can also pose a risk to the integrity of its data, through the disclosure (intentional or otherwise) of proprietary information or personal information by way of prompts or inputs into the AI model. In 2023, a global technology provider banned the use of AI-powered chatbots after one of its software engineers uploaded sensitive internal source code to a chatbot. This demonstrates that implementing appropriate internal policies and guardrails for the use of AI models, and running regular internal training sessions to reinforce those policies, are essential to minimising data leakage and the resultant negative impact for an organisation. A simple example of such a guardrail is sketched below.
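By way of illustration only, the following sketch shows how a basic technical guardrail might screen employee prompts for sensitive material before they reach an external AI model. The patterns and example prompts are hypothetical placeholders, not drawn from any particular tool; real deployments would use purpose-built data loss prevention controls alongside written policies and training.

# Illustrative sketch only: a simple pre-submission screen that blocks prompts
# containing patterns an organisation has classified as sensitive. The patterns
# below are hypothetical placeholders, not a complete data loss prevention solution.
import re

SENSITIVE_PATTERNS = [
    re.compile(r"BEGIN (RSA|EC) PRIVATE KEY"),      # embedded credentials
    re.compile(r"\b\d{16}\b"),                      # possible payment card numbers
    re.compile(r"(?i)confidential|internal only"),  # classification markings
]

def screen_prompt(prompt: str) -> bool:
    """Return True only if the prompt appears safe to send to an external AI model."""
    return not any(pattern.search(prompt) for pattern in SENSITIVE_PATTERNS)

if __name__ == "__main__":
    print(screen_prompt("Summarise this INTERNAL ONLY design document"))  # False
    print(screen_prompt("Draft a polite meeting reminder"))               # True

A screen of this kind reinforces, but does not replace, internal policies and employee training.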

Upstream data licensing issues can also have a profound effect on downstream customers of an AI model, especially where the AI model provider has breached third-party licensing terms when training its AI model. Use of tools trained on improperly obtained data can create reputational and business continuity risks for customers of those tools, if use of the tools is enjoined as part of actions brought by the owners of the training data against the AI model provider, or if those proceedings become publicised. Litigation currently on foot around the world, brought by data subjects and copyright owners against the owners of AI models, demonstrates that third-party rights are (and will continue to be) a prevalent issue when it comes to licensing AI models. It is therefore critical to understand how an AI model has been trained (including on what data), and to obtain protections (including contractual assurances) that the AI model provider has appropriate third-party data licensing terms and permissions in place.

Ownership of output

Contracts for AI models will often provide that ownership of output defaults to the AI model provider. This creates potential limitations around how an organisation can use ‘its’ outputs, and how those outputs (which may contain elements of the organisation’s training or prompt data) can subsequently be used by the model provider for training, or indeed in outputs for other customers. Organisations must be alive to this and consider the nature of the outputs to be produced by the AI model. Where the output is likely to contain proprietary insights or some other form of inherent value, organisations should negotiate express ownership or exclusive rights with respect to that output. Clear drafting is essential to avoid uncertainty about whether outputs can be commercialised, shared with third parties, or embedded in new products.

Hallucinations and output accuracy

The accuracy and reliability of the output of an AI model is dependent on the input data and the data that the AI model was trained on. 

Hallucinations and errors in output frequently arise where underlying data quality is poor, or where there are inherent weaknesses in the design of the AI model (often attributable to the data the model was trained on). Model drift (the degradation in an AI model’s performance due to changes in the underlying data) is also a real concern and can lead to a decline in the effectiveness of decision-making.

Organisations can alleviate these risks through various means, including by implementing a human-in-the-loop safety check, quality monitoring to address model drift, programs to retrain models periodically, and robust validation frameworks for output. However, to identify which of these measures is relevant for any particular use case, organisations first need to conduct the fundamental analysis to determine the risks relevant to their particular model and intended deployment. A simplified sketch of one such measure follows.
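By way of illustration only, the sketch below shows one way a technical team might monitor for model drift: comparing the model’s recent accuracy on human-reviewed samples against a baseline and flagging when the gap exceeds a tolerance. The baseline figure, tolerance and sample data are hypothetical placeholders, not recommended values.

# Illustrative sketch only: a basic drift check comparing recent accuracy on
# human-reviewed samples against a deployment baseline. All figures are
# hypothetical placeholders.

BASELINE_ACCURACY = 0.92   # accuracy measured at deployment (hypothetical)
DRIFT_TOLERANCE = 0.05     # acceptable degradation before review (hypothetical)

def accuracy(reviewed_outputs: list[tuple[str, str]]) -> float:
    """Share of model outputs that matched the human reviewer's answer."""
    matches = sum(1 for model_answer, human_answer in reviewed_outputs
                  if model_answer == human_answer)
    return matches / len(reviewed_outputs)

def drift_detected(reviewed_outputs: list[tuple[str, str]]) -> bool:
    """True if recent accuracy has fallen more than the tolerance below baseline."""
    return BASELINE_ACCURACY - accuracy(reviewed_outputs) > DRIFT_TOLERANCE

if __name__ == "__main__":
    recent = [("approve", "approve"), ("reject", "approve"),
              ("approve", "approve"), ("reject", "reject")]
    print(drift_detected(recent))  # accuracy 0.75 vs baseline 0.92 -> True

In practice, a flag of this kind would trigger the human review, retraining or validation processes described above.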

Privacy

The use of personal information by AI models will continue to be the subject of real scrutiny leading into 2026, especially with anticipated Australian privacy reforms on the horizon. The Office of the Australian Information Commissioner (OAIC) has released guidance on the use of AI models and has reiterated that obligations under Australian privacy law apply to any personal information input into an AI model, together with any output generated by an AI model that includes personal information. Further, from December 2026, organisations will be required to include transparency information about automated decision-making in their privacy policies.

If personal information will be used to train or prompt an AI model, the organisation responsible for that personal information must understand precisely what information is involved, how it was collected, the purposes for which it was collected, how it will be processed by the service provider, what secondary uses are proposed (if any), and whether all of those matters have been adequately disclosed to the data subject.  Lack of transparency can amount to a breach of Australian (and other international) privacy laws, especially if that personal information is repurposed in a way that is inconsistent with the original purpose of collection.

Customers should also strictly manage any overseas disclosures, and ensure that appropriate data security arrangements are in place. They should negotiate the contract with the service provider to include the controls necessary to address the risks they identify.

As best practice, the OAIC recommends that organisations do not enter personal information (and particularly sensitive information) into publicly available generative AI tools, because of the complex and significant privacy issues that might arise. However, realising the full benefits of AI will often require the use of personal information, and this can often be done lawfully – provided proper analysis has been done, negotiations have been conducted, and contractual and technical controls have been implemented.

How we can help

The above issues highlight the need for a clear understanding of, and controls around, how an organisation’s data can be stored, used, handled and analysed. As AI models are increasingly integrated into an organisation’s technology ecosystem, having appropriate data governance mechanisms and contractual protections in place to manage data risk becomes critical. Our digital and intellectual property team regularly advises organisations on the procurement, safe deployment and ongoing management of AI models. We negotiate contractual terms, and advise on appropriate policies and procedures, to alleviate business risk. A pre-Christmas review may be the simplest way to keep those data gremlins out of your organisation’s stocking.

This article forms part of the series, the 12 Days of ChristmAIs: A Technology, Media and Telecommunications series on artificial intelligence and its intersection with the law. You can view all the articles here.