Generative AI’s impact on data strategy and governance

Name: Baker Tilly

In today's data-driven world, the rapid advancement of generative artificial intelligence (AI) has emerged as a transformative force reshaping the landscape of data strategy and governance. As organizations strive to harness the power of data for competitive advantage, the capabilities of generative AI, which uses machines to generate human-like content, have introduced both unprecedented opportunities and potential challenges.

As generative AI continues to transform data generation, analysis and decision-making processes, it also raises critical questions surrounding data authenticity, privacy and ethical considerations. In this era of AI-driven innovation, understanding the impact generative AI has on data strategy and governance programs is crucial for organizations aiming to navigate this evolving digital landscape effectively.

What is generative AI?

Generative AI is a subset of artificial intelligence that employs machine learning (ML) techniques to generate new content, such as text, images, music or even speech, based on patterns learned from the data it was trained on. Most large language models, such as ChatGPT, Dolly and Bard, utilize generative AI to generate human-like text and can be fine-tuned for a wide range of natural language understanding and generation tasks.

These foundational models are trained on extensive and broadly available datasets that span a variety of subject areas, enabling them to answer users' questions across a wide spectrum of topics by generating content relevant to specific areas of interest. The models are the product of years of development and substantial financial investment, and as they become publicly available, AI innovations are becoming increasingly accessible in the consumer market. As a result, there has been a notable surge in the adoption and utilization of generative AI.

How are organizations utilizing generative AI?

Organizations are utilizing generative AI not only to respond to inquiries and generate documents but also to explore image manipulation, music composition and voice synthesis. Many organizations have also begun creating personalized customer experiences through AI-driven chatbots, targeted advertisements and automated customer service workflows. Envision crafting personalized advertisements featuring images specifically relevant to individual customer profiles or purchase histories.

Additionally, there's a growing interest in generating synthetic data, which allows new models to be created and trained while reducing privacy or regulatory concerns. There are also numerous opportunities to streamline internal operations, such as HR and administrative processes that typically require human intervention to address routine queries. The key is to start by automating processes that are easily manageable and repetitive in nature.

How can organizations leverage their data within these generative models?

Many organizations are looking to use their own data within these generative models, training models that can answer questions about proprietary data and then making them available to internal staff or, potentially, to their customer base. These efforts fall into three distinct categories:

1. Building a new large language generative model

Building a new large language generative model is a complex undertaking that requires an extensive commitment in terms of computational resources, financial investment and an abundance of data. Companies like Google, Meta and Amazon have taken on this effort, training models on millions upon millions of documents and datasets from a diverse range of subject areas across the internet. Notably, it's the larger companies that already possess the necessary infrastructure that are inclined to build a new AI model; as a result, it is relatively uncommon for organizations to fall into this first category.

2. Training and tuning existing models

Training and fine-tuning existing models is much more common. OpenAI and various other providers offer organizations the opportunity to train an existing foundational model on their own data and fine-tune it toward the desired responses for specific prompts. For instance, if there are predefined common use cases with associated prompts, organizations can tune the model so that it produces the expected responses for those scenarios. OpenAI provides an API for this purpose; alternatively, organizations can enhance publicly available, foundationally trained models by further refining them with their proprietary data.
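
As a rough illustration of the predefined prompts and expected responses described above, the sketch below collects example pairs and serializes them as JSON Lines, a format commonly used for tuning datasets. The field names `prompt` and `response`, the example text and the `to_jsonl` helper are all assumptions made for illustration; the exact schema a given provider expects will differ and should be taken from that provider's documentation.

```python
import json

# Hypothetical examples: predefined prompts paired with the responses
# the organization expects. The content is invented for illustration.
EXAMPLES = [
    {"prompt": "How many vacation days carry over?",
     "response": "Up to five unused days carry over each year."},
    {"prompt": "Where do I reset my password?",
     "response": "Use the internal self-service portal."},
    {"prompt": "Who approves travel requests?",
     "response": ""},  # incomplete pair, dropped below
]


def to_jsonl(examples):
    """Serialize complete prompt/response pairs as JSON Lines text."""
    lines = []
    for example in examples:
        prompt = example.get("prompt", "").strip()
        response = example.get("response", "").strip()
        # Skip pairs with a missing side so the tuning set stays clean.
        if prompt and response:
            lines.append(json.dumps({"prompt": prompt, "response": response}))
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_jsonl(EXAMPLES))
```

Dropping incomplete pairs before tuning is a small instance of the data quality discipline the rest of this article argues for.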

3. Prompt tuning

Prompt tuning, seen in tools like Bing Chat, uses a technique called retrieval-augmented generation: the system actively searches the internet for relevant information, then presents the search results along with a naturally phrased response that incorporates references to those findings. Prompt tuning is experiencing increased adoption because it is easy to implement and requires considerably less investment or expertise than the previous two categories. While a fundamental understanding of how these models operate is still essential, companies like Microsoft and OpenAI are making these technologies more accessible.
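
To make the retrieval-augmented generation idea more concrete, here is a minimal sketch that retrieves the most relevant documents for a question and builds a prompt that cites them. It is a toy under stated assumptions: the in-memory `DOCUMENTS` store, the document ids and the simple word-overlap scoring are all hypothetical stand-ins for a real search index, and the final call to a language model is deliberately left out.

```python
import re

# Hypothetical in-memory document store keyed by a made-up document id.
DOCUMENTS = {
    "hr-001": "Employees may carry over up to five unused vacation days each year.",
    "it-014": "Password resets are handled through the internal self-service portal.",
    "fin-203": "Expense reports must be submitted within thirty days of purchase.",
}


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, top_k=2):
    """Return ids of the documents sharing the most words with the question."""
    question_words = tokenize(question)
    scored = []
    for doc_id, text in documents.items():
        overlap = len(question_words & tokenize(text))
        if overlap > 0:
            scored.append((overlap, doc_id))
    # Highest overlap first; ties broken by id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:top_k]]


def build_prompt(question, documents, top_k=2):
    """Combine retrieved sources and the question into one prompt string."""
    doc_ids = retrieve(question, documents, top_k)
    sources = "\n".join(f"[{doc_id}] {documents[doc_id]}" for doc_id in doc_ids)
    return (
        "Answer using only the sources below and cite them by id.\n"
        f"Sources:\n{sources}\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    print(build_prompt("How do I reset my password?", DOCUMENTS))
```

The prompt returned by `build_prompt` would then be sent to whichever model the organization uses. Because the model itself is never retrained, the approach needs less investment than the first two categories.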

What are the key considerations prior to implementation?

Leveraging AI technologies requires organizations to return to the fundamental principles of data governance and management to consider the impacts of implementing generative AI. This entails several key considerations, including:

1. Incorporating data governance

It’s crucial to ensure that organizations have implemented a data governance program that encompasses data quality, compliance and privacy. As there may not be human oversight when the model processes and generates responses using company data, organizations need to be confident in the accuracy, freshness and overall quality of their data and ensure data quality assessments are conducted to flag unreliable or null data inputs that can negatively affect the model's output.
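
A data quality assessment of this kind can start very simply. The sketch below flags records with null or empty required fields and records that have not been updated recently. The field names (`policy_text`, `last_updated`), the 365-day threshold and the `assess_record` helper are assumptions chosen for illustration, not a prescribed standard.

```python
from datetime import date


def assess_record(record, required_fields, as_of, max_age_days):
    """Return a list of issues that make a record unreliable as model input."""
    issues = []
    for field in required_fields:
        value = record.get(field)
        # Treat None and blank strings as null inputs.
        if value is None or (isinstance(value, str) and not value.strip()):
            issues.append(f"missing:{field}")
    updated = record.get("last_updated")
    if updated is None:
        issues.append("missing:last_updated")
    elif (as_of - updated).days > max_age_days:
        issues.append("stale")
    return issues


if __name__ == "__main__":
    # Hypothetical records; the content is invented for illustration.
    records = [
        {"id": 1, "policy_text": "Remote work policy", "last_updated": date(2024, 1, 15)},
        {"id": 2, "policy_text": "  ", "last_updated": date(2022, 6, 1)},
    ]
    for record in records:
        problems = assess_record(record, ["policy_text"], date(2024, 1, 31), 365)
        print(record["id"], problems or "ok")
```

Records that come back with issues can be held out of the model's inputs until a data steward has reviewed them.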

Organizations must also address ethics and training bias, as well as legal and risk aspects related to data privacy and regulatory compliance. Data privacy, data security and all the key tenets of data governance that apply when humans interact with data are equally applicable when working with these models.

2. Addressing data output

Consider how generated outputs, including synthetic data, align with your governance framework. Whether it involves documents containing synthetic data or entirely new datasets, these outputs require governance measures to safeguard the organization against potential risks. Effectively managing the outcomes generated by a generative AI model is a new consideration and organizations need to have a framework to manage and regulate the usage of generated content, especially when it contains proprietary and confidential data.

Establish clear data ownership chains and protocols for managing the output from these models. Consider the logistics of incorporating this data into your corporate network—whether it should reside in databases or be fed back into subsequent models for further training. The chain of custody is a vital consideration when it comes to generated data.
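
One way to picture a chain of custody for generated data is a provenance record attached to each output. The sketch below is hypothetical throughout: the `GeneratedArtifact` fields, the example policy in `allowed_for_retraining` and the use of a SHA-256 hash to detect later changes are illustrative assumptions, not an established framework.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneratedArtifact:
    """Provenance record for one piece of model-generated content."""
    content: str
    model_name: str            # which model produced the output
    source_datasets: tuple     # datasets that fed the model or the prompt
    owner: str                 # accountable team or data steward
    contains_confidential: bool

    @property
    def content_hash(self):
        """SHA-256 fingerprint, so later modification can be detected."""
        return hashlib.sha256(self.content.encode("utf-8")).hexdigest()


def allowed_for_retraining(artifact, approved_owners):
    """Example policy: only non-confidential output with an approved owner
    may be fed back into subsequent models."""
    return artifact.owner in approved_owners and not artifact.contains_confidential


if __name__ == "__main__":
    artifact = GeneratedArtifact(
        content="Draft summary of the travel policy.",
        model_name="example-model",
        source_datasets=("hr_policies",),
        owner="Data Office",
        contains_confidential=False,
    )
    print(artifact.content_hash[:12], allowed_for_retraining(artifact, {"Data Office"}))
```

Recording the owner and source datasets at generation time lets the storage and retraining decisions described above be checked against a rule rather than decided case by case.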

3. Necessary skill sets

Consider the skill sets needed within your team, not only for data management but also for the governance of the models responsible for generating responses that people will rely on. Building this competency involves either upskilling current team members or recruiting new talent who possess a strong understanding of generative AI. Establishing this foundational expertise ensures that all the necessary groundwork is in place, allowing an organization to launch initiatives that immediately deliver value.

Getting started

To effectively harness the potential of AI technology, it's crucial to comprehend the opportunities, implications and associated risks. Begin by dedicating time to researching generative AI, delving into how it works, how large language models work, their training processes, their strengths and their limitations.

Then, focus on your organization’s data to gain a comprehensive understanding of dataset ownership, data freshness and the quality metrics associated with each dataset. Determine which datasets could potentially be integrated into a generative model.
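
As a small illustration of this step, the sketch below screens a dataset inventory on ownership, freshness and a quality metric to shortlist candidates for a generative model. The `CATALOG` entries, the `quality_score` field and the thresholds are invented for the example; real criteria would come from the organization's own governance program.

```python
from datetime import date

# Hypothetical dataset inventory; names, owners and scores are invented.
CATALOG = [
    {"name": "hr_policies", "owner": "HR",
     "last_refreshed": date(2024, 3, 1), "quality_score": 0.97},
    {"name": "legacy_crm", "owner": None,
     "last_refreshed": date(2019, 5, 20), "quality_score": 0.61},
    {"name": "support_tickets", "owner": "Customer Care",
     "last_refreshed": date(2024, 2, 10), "quality_score": 0.88},
]


def candidate_datasets(catalog, as_of, max_age_days=180, min_quality=0.9):
    """Return names of datasets with a known owner, recent data and good quality."""
    selected = []
    for entry in catalog:
        has_owner = bool(entry["owner"])
        is_fresh = (as_of - entry["last_refreshed"]).days <= max_age_days
        is_good = entry["quality_score"] >= min_quality
        if has_owner and is_fresh and is_good:
            selected.append(entry["name"])
    return selected


if __name__ == "__main__":
    print(candidate_datasets(CATALOG, as_of=date(2024, 4, 1)))
```

Loosening `min_quality` or `max_age_days` widens the shortlist, which makes the trade-off between coverage and reliability explicit.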

Next, begin identifying potential use cases for this technology within your organization. Define your objectives and the value proposition of generating creative output and consider how your staff and customers can benefit from it. It's advisable to start with small-scale implementations before contemplating larger endeavors. This approach allows organizations to begin with a smaller dataset and scale up as they improve the model and better understand the implications of introducing AI technology into their organization.

Finally, it’s essential to build data literacy throughout the organization regarding generative AI. It's crucial for everyone to understand that generative AI doesn't think for you; instead, it relies on the data it was trained on to generate an output. Encouraging a broader understanding will empower individuals to make informed decisions about how to utilize this technology effectively.
