[FR] Document Object Model Integration #70

Open
AdamSobieski opened this issue Jan 9, 2025 · 2 comments
Labels
enhancement New feature or request

Comments

AdamSobieski commented Jan 9, 2025

Introduction

What if, in addition to text-string prompts, DOM documents could be used as prompts?

This would enable model-independent multimodal prompting in a manner intuitive to Web developers.

As envisioned, such multimodal prompts could use a subset of HTML5 elements, including those for sections and paragraphs of text, source code, mathematics, lists, tables, images, audio, video, and embedded files and data.

The prompt() and promptStreaming() functions on sessions could distinguish between provided arguments of types string and Document.
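
One way the distinction could be drawn is sketched below. This is only an illustration under assumptions: `classifyPromptInput` is a hypothetical helper, not part of the Prompt API, and it duck-types on `nodeType === 9` (the DOM's `DOCUMENT_NODE` value) instead of using `instanceof Document`, so that documents created in another realm, such as an iframe, would also be recognized.

```javascript
// Hypothetical helper (not part of the Prompt API): classify a prompt argument.
const DOCUMENT_NODE = 9; // numeric value of Node.DOCUMENT_NODE in the DOM Standard

function classifyPromptInput(input) {
  if (typeof input === 'string') {
    return 'text';
  }
  // Duck-typing avoids cross-realm `instanceof Document` failures.
  if (input !== null && typeof input === 'object' && input.nodeType === DOCUMENT_NODE) {
    return 'document';
  }
  throw new TypeError('Expected a string or a Document');
}
```

An implementation of prompt() could branch on the returned value and reject anything else early, before any model work starts.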

Text

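// Create an XML document whose root element is <prompt>; '...' stands for the prompt namespace.
const the_prompt = document.implementation.createDocument('...', 'prompt', null);
const html = the_prompt.createElementNS('http://www.w3.org/1999/xhtml', 'html');
const p = the_prompt.createElementNS('http://www.w3.org/1999/xhtml', 'p');
p.append('This is some prompt content.');
html.append(p);
// A Document may have only one element child, so <html> is nested inside the <prompt> root.
the_prompt.documentElement.append(html);
const result = await session.prompt(the_prompt);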

Mathematics

const the_prompt = document.implementation.createDocument('...', 'prompt', null);
const math = the_prompt.createElementNS('http://www.w3.org/1998/Math/MathML', 'math');
// ...
const html = the_prompt.createElementNS('http://www.w3.org/1999/xhtml', 'html');
html.append(math);
the_prompt.documentElement.append(html); // nest under the <prompt> root; a Document allows only one element child
const result = await session.prompt(the_prompt);

Lists

const the_prompt = document.implementation.createDocument('...', 'prompt', null);
const ol = the_prompt.createElementNS('http://www.w3.org/1999/xhtml', 'ol');
// ...
const html = the_prompt.createElementNS('http://www.w3.org/1999/xhtml', 'html');
html.append(ol);
the_prompt.documentElement.append(html); // nest under the <prompt> root; a Document allows only one element child
const result = await session.prompt(the_prompt);

Tables

const the_prompt = document.implementation.createDocument('...', 'prompt', null);
const table = the_prompt.createElementNS('http://www.w3.org/1999/xhtml', 'table');
// ...
const html = the_prompt.createElementNS('http://www.w3.org/1999/xhtml', 'html');
html.append(table);
the_prompt.documentElement.append(html); // nest under the <prompt> root; a Document allows only one element child
const result = await session.prompt(the_prompt);

Images

// Image supplied inline as a data: URL.
const the_prompt = document.implementation.createDocument('...', 'prompt', null);
const html = the_prompt.createElementNS('http://www.w3.org/1999/xhtml', 'html');
const img = the_prompt.createElementNS('http://www.w3.org/1999/xhtml', 'img');
// HTML attributes such as src are in no namespace, so setAttribute() is used rather than setAttributeNS().
img.setAttribute('src', 'data:image/png;base64,...');
html.append(img);
the_prompt.documentElement.append(html);
const result = await session.prompt(the_prompt);

// Image referenced by URL.
const the_prompt = document.implementation.createDocument('...', 'prompt', null);
const html = the_prompt.createElementNS('http://www.w3.org/1999/xhtml', 'html');
const img = the_prompt.createElementNS('http://www.w3.org/1999/xhtml', 'img');
img.setAttribute('src', 'https://example.org/media/picture-123.png');
html.append(img);
the_prompt.documentElement.append(html);
const result = await session.prompt(the_prompt);

Audio

const the_prompt = document.implementation.createDocument('...', 'prompt', null);
const html = the_prompt.createElementNS('http://www.w3.org/1999/xhtml', 'html');
const audio = the_prompt.createElementNS('http://www.w3.org/1999/xhtml', 'audio');
audio.setAttribute('src', 'https://example.org/media/audio-123.mp3'); // src is in no namespace
html.append(audio);
the_prompt.documentElement.append(html); // nest under the <prompt> root
const result = await session.prompt(the_prompt);

Video

const the_prompt = document.implementation.createDocument('...', 'prompt', null);
const html = the_prompt.createElementNS('http://www.w3.org/1999/xhtml', 'html');
const video = the_prompt.createElementNS('http://www.w3.org/1999/xhtml', 'video');
video.setAttribute('src', 'https://example.org/media/video-123.mpeg'); // src is in no namespace
html.append(video);
the_prompt.documentElement.append(html); // nest under the <prompt> root
const result = await session.prompt(the_prompt);

Embedding Files and Data

const the_prompt = document.implementation.createDocument('...', 'prompt', null);
const html = the_prompt.createElementNS('http://www.w3.org/1999/xhtml', 'html');
const embed = the_prompt.createElementNS('http://www.w3.org/1999/xhtml', 'embed');
embed.setAttribute('type', 'text/csv'); // type and src are in no namespace
embed.setAttribute('src', 'https://example.org/data/data-123.csv');
html.append(embed);
the_prompt.documentElement.append(html); // nest under the <prompt> root
const result = await session.prompt(the_prompt);

Metadata

const the_prompt = document.implementation.createDocument('...', 'prompt', null);
const html = the_prompt.createElementNS('http://www.w3.org/1999/xhtml', 'html');
const head = the_prompt.createElementNS('http://www.w3.org/1999/xhtml', 'head');
const body = the_prompt.createElementNS('http://www.w3.org/1999/xhtml', 'body');
const meta = the_prompt.createElementNS('http://www.w3.org/1999/xhtml', 'meta');
meta.setAttribute('name', 'author'); // name and content are in no namespace
meta.setAttribute('content', 'Bob Smith');
head.append(meta);
html.append(head);
body.append('This is some prompt content.');
html.append(body);
the_prompt.documentElement.append(html); // nest under the <prompt> root
const result = await session.prompt(the_prompt);

Prompt Markup Language

<prompt xmlns="..." version="1.0">
  <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
      <meta name="author" content="Bob Smith" />
    </head>
    <body>
      <p>This is some prompt content.</p>
      <img src="https://example.org/media/picture-123.png" />
      <embed src="https://example.org/data/data-123.csv" />
    </body>
  </html>
</prompt>

Prompt Templates

<prompt xmlns="..." version="1.0">
  <html xmlns="http://www.w3.org/1999/xhtml" xmlns:promptml="...">
    <head>
      <meta name="author" content="Bob Smith" />
    </head>
    <body>
      <p>This is some prompt content.</p>
      <img src="https://example.org/media/picture-123.png" />
      <embed src="https://example.org/data/data-123.csv" />
      <p>Templates could be <promptml:template promptml:key="t1" />.</p>
    </body>
  </html>
</prompt>

Prompting with a value for the template key t1:

const result = await session.prompt(the_prompt, { templates: { t1: "useful" } });

would then be equivalent to prompting with:

<prompt xmlns="..." version="1.0">
  <html xmlns="http://www.w3.org/1999/xhtml" xmlns:promptml="...">
    <head>
      <meta name="author" content="Bob Smith" />
    </head>
    <body>
      <p>This is some prompt content.</p>
      <img src="https://example.org/media/picture-123.png" />
      <embed src="https://example.org/data/data-123.csv" />
      <p>Templates could be useful.</p>
    </body>
  </html>
</prompt>
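
The substitution step could be approximated as follows. This is a minimal sketch, not a proposed implementation: `expandTemplates` and `escapeXml` are hypothetical names, the code works on serialized markup with a regular expression and assumes the prefix is literally `promptml`, and a real implementation would more likely walk the parsed DOM and match on the namespace instead.

```javascript
// Hypothetical string-level expansion of <promptml:template promptml:key="..." /> placeholders.

// Escape characters that are significant in XML text content.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

function expandTemplates(markup, templates) {
  const placeholder = /<promptml:template\s+promptml:key="([^"]*)"\s*\/>/g;
  return markup.replace(placeholder, (match, key) => {
    // Unknown keys are left untouched so that missing values are easy to spot.
    const known = Object.prototype.hasOwnProperty.call(templates, key);
    return known ? escapeXml(templates[key]) : match;
  });
}

const expanded = expandTemplates(
  '<p>Templates could be <promptml:template promptml:key="t1" />.</p>',
  { t1: 'useful' }
);
console.log(expanded); // <p>Templates could be useful.</p>
```

Escaping the supplied values keeps a template value from injecting markup into the prompt, which seems like a sensible default for values that originate from user input.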

Prompt Events

<prompt xmlns="..." version="1.0" onenter="..." onexit="...">
  <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
      <meta name="author" content="Bob Smith" />
    </head>
    <body>
      <p>This is some prompt content.</p>
      <img src="https://example.org/media/picture-123.png" />
      <embed src="https://example.org/data/data-123.csv" />
    </body>
  </html>
</prompt>

<prompt xmlns="..." version="1.0">
  <html xmlns="http://www.w3.org/1999/xhtml" xmlns:promptml="...">
    <head>
      <meta name="author" content="Bob Smith" />
    </head>
    <body>
      <p>This is some prompt content.</p>
      <img src="https://example.org/media/picture-123.png" />
      <embed promptml:onenter="..." promptml:onexit="..." src="..." />
    </body>
  </html>
</prompt>

Exchange Markup Language

<exchange xmlns="..." version="1.0">
  <provide type="text/plain">This is some prompt content.</provide>
  <expect type="text/plain" />
</exchange>

<exchange xmlns="..." version="1.0">
  <provide type="text/plain">This is some prompt content.</provide>
  <expect type="application/json">
    <schema type="application/schema+json" src="..." />
  </expect>
</exchange>

<exchange xmlns="..." version="1.0">
  <provide type="application/promptml+xml">
    <prompt xmlns="..." version="1.0">
      <html xmlns="http://www.w3.org/1999/xhtml">
        <p>This is some prompt content.</p>
      </html>
    </prompt>
  </provide>
  <expect type="application/json">
    <schema type="application/schema+json" src="..." />
  </expect>
</exchange>
domenic added a commit that referenced this issue Jan 20, 2025
Closes #40. Somewhat helps with #70.
@domenic domenic added the enhancement New feature or request label Jan 23, 2025
domenic added a commit that referenced this issue Feb 25, 2025
Closes #40. Somewhat helps with #70.

AdamSobieski commented Mar 29, 2025

Here is the current version of the slideshow that I hope to present at the 03-31 meeting, time permitting: Prompt-API-Plus-DOM.pptx.


AdamSobieski commented Mar 31, 2025

@domenic, as asked during the meeting: what are some of the benefits and use-case scenarios that would be enabled or simplified by having markup-based prompts in addition to text-based prompts?

  1. A less steep learning curve for Web developers to get started with multimodal prompts.
  2. The portability of multimodal prompts across models.
  3. Web developers could create, store, load, share, and reuse multimodal prompts as files or resources.
    1. Prompts and chat histories could be stored as files, served from servers, and stored within EPUB containers.

const response = await fetch('https://example.org/prompts/prompt-123.promptml');
const text = await response.text();
const parser = new DOMParser();
const the_prompt = parser.parseFromString(text, 'application/xml');
const result = await session.prompt(the_prompt);

  4. Prompt-related templating features.
  5. Prompt-related JavaScript events.

A related question is: which features could not be provided – either at all or in the same way – using a JavaScript library which uses or encapsulates the Prompt API?

  1. The portability of multimodal prompts across models.
    1. The transformation or transpiling of hypertext-based prompts and their components into those formats and styles preferred by individual models.
  2. Prompt-related JavaScript events.
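
To make the transpiling idea in item 1.1 more concrete, here is a rough sketch of flattening a prompt's DOM subtree into an ordered list of model-agnostic parts that a per-model adapter could then convert. Everything in it is an assumption for illustration: `toParts` is a hypothetical function, the `{ type, value }` and `{ type, src }` part shapes are invented here, and only the `img`, `audio`, `video`, and `embed` elements are treated as media.

```javascript
// Hypothetical transpiling step: flatten a prompt subtree into ordered "parts".
const ELEMENT_NODE = 1; // numeric values from the DOM Standard
const TEXT_NODE = 3;
const MEDIA_ELEMENTS = new Set(['img', 'audio', 'video', 'embed']);

function toParts(node, parts = []) {
  if (node.nodeType === TEXT_NODE) {
    // Whitespace-only text nodes (indentation) carry no prompt content.
    const text = node.nodeValue.trim();
    if (text) parts.push({ type: 'text', value: text });
  } else if (node.nodeType === ELEMENT_NODE && MEDIA_ELEMENTS.has(node.localName)) {
    parts.push({ type: node.localName, src: node.getAttribute('src') });
  } else {
    // Documents and other elements: recurse into children in document order.
    for (const child of node.childNodes) toParts(child, parts);
  }
  return parts;
}
```

Calling `toParts(the_prompt)` on a parsed prompt document would yield the text and media items in document order. A JavaScript library could do this much on its own, so the open question is whether the later, model-specific conversion is better placed in the browser.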

A new question is: should a Prompt Markup Language be able to express, in addition to user prompts, system prompts and their components, e.g., tool-definition sections, and/or chat histories, e.g., sequences of prompts?
