
Research

Building safer dialogue agents

Published
22 September 2022
Authors

The Sparrow team


Training an AI to communicate in a way that’s more helpful, correct, and harmless

In recent years, large language models (LLMs) have achieved success at a range of tasks such as question answering, summarisation, and dialogue. Dialogue is a particularly interesting task because it features flexible and interactive communication. However, dialogue agents powered by LLMs can express inaccurate or invented information, use discriminatory language, or encourage unsafe behaviour.

To create safer dialogue agents, we need to be able to learn from human feedback. By applying reinforcement learning to input from research participants, we explore new methods for training dialogue agents that show promise for a safer system.

In our latest paper, we introduce Sparrow – a dialogue agent that’s useful and reduces the risk of unsafe and inappropriate answers. Our agent is designed to talk with a user, answer questions, and search the internet using Google when it’s helpful to look up evidence to inform its responses.

Our new conversational AI model replies on its own to an initial human prompt.

Sparrow is a research model and proof of concept, designed with the goal of training dialogue agents to be more helpful, correct, and harmless. By learning these qualities in a general dialogue setting, Sparrow advances our understanding of how we can train agents to be safer and more useful – and ultimately, to help build safer and more useful artificial general intelligence (AGI).

Sparrow declining to answer a potentially harmful question.

How Sparrow works

Training a conversational AI is an especially challenging problem because it’s difficult to pinpoint what makes a dialogue successful. To address this problem, we turn to a form of reinforcement learning (RL) from human feedback, using study participants’ preferences to train a model of how useful an answer is.

To get this data, we show our participants multiple model answers to the same question and ask them which answer they like the most. Because we show answers with and without evidence retrieved from the internet, this model can also determine when an answer should be supported with evidence.
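Pairwise comparisons like these are commonly turned into a reward model with a Bradley-Terry-style objective. The sketch below illustrates the idea in plain Python, assuming the reward model has already been reduced to scalar scores for the two answers; the function name and this exact formulation are illustrative, not taken from the paper.

```python
import math

def pairwise_preference_loss(score_preferred, score_rejected):
    """Loss over one participant comparison of two answers:
    -log sigmoid(score_preferred - score_rejected).
    Minimising it pushes the reward model to score the
    answer the participant preferred above the rejected one."""
    margin = score_preferred - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the two answers score equally, the loss is log 2; it shrinks towards zero as the preferred answer’s margin grows, and grows when the model prefers the wrong answer.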

We ask study participants to evaluate and interact with Sparrow either naturally or adversarially, continually expanding the dataset used to train Sparrow.

But increasing usefulness is only part of the story. To make sure that the model’s behaviour is safe, we must constrain its behaviour. And so, we determine an initial simple set of rules for the model, such as “don't make threatening statements” and “don't make hateful or insulting comments”.

We also provide rules around possibly harmful advice and not claiming to be a person. These rules were informed by studying existing work on language harms and consulting with experts. We then ask our study participants to talk to our system, with the aim of tricking it into breaking the rules. These conversations then let us train a separate ‘rule model’ that indicates when Sparrow's behaviour breaks any of the rules.
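One way to picture how a rule model’s output could temper the usefulness signal during training is a simple penalty scheme. This is a hypothetical sketch, not the paper’s actual formulation; the names, the threshold, and the fixed per-rule penalty are all assumptions for illustration.

```python
def combined_reward(preference_score, rule_violation_probs,
                    threshold=0.5, penalty=1.0):
    """Subtract a fixed penalty for every rule the rule model judges
    likely broken (violation probability above the threshold), so an
    answer that breaks rules scores lower even when it is useful."""
    violations = sum(1 for p in rule_violation_probs if p > threshold)
    return preference_score - penalty * violations
```

An answer flagged for no rules keeps its full preference score; each likely violation pulls the combined reward down by a fixed amount.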

Towards better AI and better judgments

Verifying Sparrow’s answers for correctness is difficult even for experts. Instead, we ask our participants to judge whether Sparrow’s answers are plausible and whether the evidence Sparrow provides actually supports them. According to our participants, Sparrow gives a plausible answer supported by evidence 78% of the time when asked a factual question. This is a big improvement over our baseline models. Still, Sparrow isn’t immune to mistakes, such as hallucinating facts or occasionally giving off-topic answers.

Sparrow also has room for improving its rule-following. After training, participants were still able to trick it into breaking our rules 8% of the time, but compared to simpler approaches, Sparrow is better at following our rules under adversarial probing. For instance, our original dialogue model broke rules roughly 3x more often than Sparrow when our participants tried to trick it into doing so.

Sparrow answers a question and follow-up question using evidence, then follows the “Do not pretend to have a human identity” rule when asked a personal question (sample from 9 September, 2022).

Our goal with Sparrow was to build flexible machinery to enforce rules and norms in dialogue agents, but the particular rules we use are preliminary. Developing a better and more complete set of rules will require expert input on many topics (from policy makers, social scientists, and ethicists, among others) as well as participatory input from a diverse array of users and affected groups. We believe our methods will still apply to a more rigorous rule set.

Sparrow is a significant step forward in understanding how to train dialogue agents to be more useful and safer. However, successful communication between people and dialogue agents should not only avoid harm but also align with human values for effective and beneficial communication, as discussed in recent work on aligning language models with human values.

We also emphasise that a good agent will still decline to answer questions in contexts where it is appropriate to defer to humans or where this has the potential to deter harmful behaviour. Finally, our initial research focused on an English-speaking agent, and further work is needed to ensure similar results across other languages and cultural contexts.

In the future, we hope conversations between humans and machines can lead to better judgments of AI behaviour, allowing people to align and improve systems that might be too complex to understand without machine help.

Eager to explore a conversational path to safe AGI? We’re currently hiring research scientists for our Scalable Alignment team.

Related posts

  • In conversation with AI: building better language models (Research, 6 September 2022)
    Our new paper, In conversation with AI: aligning language models with human values, explores a different approach, asking what successful communication between humans and an artificial...

  • Tackling multiple tasks with a single visual language model (Research, 28 April 2022)
    We introduce Flamingo, a single visual language model (VLM) that sets a new state of the art in few-shot learning on a wide range of open-ended multimodal tasks.

  • Language modelling at scale: Gopher, ethical considerations, and retrieval (Responsibility & Safety, 8 December 2021)
    Language, and its role in demonstrating and facilitating comprehension - or intelligence - is a fundamental part of being human. It gives people the ability to communicate thoughts and concepts,...