I recently started an AI-focused educational newsletter that already has over 170,000 subscribers. TheSequence is a no-BS (meaning no hype, no news, etc.) ML-oriented newsletter that takes 5 minutes to read. The goal is to keep you up to date with machine learning projects, research papers, and concepts. Please give it a try by subscribing below:
Prompt engineering is a necessary evil. As a computer scientist, I hate the idea of typing just the right words and hoping they produce the right outcome. However, high-quality prompts are an important component of LLM applications today, so we need tools and frameworks that help with this task. One of the most interesting recent toolsets in this category comes from Character.ai, the high-flying AI startup that recently entered into a strategic transaction with Google. PromptPoet is a flexible framework for designing and iterating on high-quality prompts.
PromptPoet
Designed to assist both developers and non-technical users, PromptPoet enables efficient design and management of prompts without the need for intricate string manipulations. This tool allows users to focus on creating effective prompts, freeing them from the technicalities of coding.
Part of PromptPoet's motivation comes from the limitations of Python f-strings, which have become the go-to prompting method for many developers. While straightforward at first, f-strings can quickly escalate in complexity, requiring significant manual effort to construct elaborate prompts. This complexity poses a barrier for non-technical users, as it demands programming knowledge.
Here is a basic example of using PromptPoet:
- name: system instructions
  role: system
  content: |
    Your name is {{ character_name }} and you are meant to be helpful and never harmful to humans.

- name: user query
  role: user
  content: |
    {{ username }}: {{ user_query }}

- name: response
  role: user
  content: |
    {{ character_name }}:
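The Jinja2 variables in a template like this are bound from a plain Python dictionary supplied alongside the template (key names here match the variables used in the examples in this post):

```python
# Template data binding the Jinja2 variables referenced in the template.
# Each key must match a {{ variable }} name; a missing key surfaces as a
# rendering error rather than a silently malformed prompt.
template_data = {
    "character_name": "Charlie",
    "username": "Jeff",
    "user_query": "Can you help me with my homework?",
}
```

This dictionary is what gets passed as `template_data` when constructing a prompt, as shown later in the post.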
Drawing inspiration from UI design, PromptPoet treats a prompt as a dynamic function of its runtime state, encompassing elements like the template, data, and token limit.
Prompt Templates
PromptPoet shifts the focus from prompt engineering to design, empowering users to iterate on templates instead of code. Utilizing a blend of YAML and Jinja2, these templates are both flexible and easily adaptable, facilitating efficient prompt creation and management. Template processing involves two main stages:
- Rendering: This initial phase involves Jinja2 processing input data, executing control logic, validating data, binding it to variables, and evaluating functions within the template.
- Loading: Post-rendering, the output is structured into a YAML file, consisting of repeated segments, each organized into a Python data structure. These segments have several key attributes:
- Name: A clear identifier for the segment.
- Content: The string payload forming part of the prompt.
- Role (Optional): Specifies participant roles, helping to differentiate between users and system components.
- Truncation Priority (Optional): Establishes the order of truncation when needed, with segments of the same priority truncated in sequence.
The PromptPoet Library offers numerous features, including tokenization and truncation, optimizing for efficient caching and rapid responses. These capabilities are crucial for enhancing performance.
prompt.tokenize()
prompt.truncate(token_limit=TOKEN_LIMIT, truncation_step=TRUNCATION_STEP)
# Inspect prompt as a raw string.
prompt.string: str
>>> "..."
# Inspect the prompt as raw tokens.
prompt.tokens: list[int]
>>> [...]
# Inspect the prompt as LLM API message dicts.
prompt.messages: list[dict]
>>> [...]
# Inspect the prompt as first class parts.
prompt.parts: list[PromptPart]
>>> [...]
The Templating Language
Combining Jinja2 and YAML, PromptPoet provides a robust and expressive templating language. Jinja2 allows direct data bindings, function calls, and basic control flow within templates. YAML gives structure, enabling sophisticated truncation when token limits are exceeded. This combination is similar to approaches used in systems like Ansible.
Prompt Portability
At Character.AI, model enhancements are continuous to better align with user preferences. To achieve this, prompts need to be reconstructed in offline environments for evaluation and post-training tasks. Templatizing prompts enables seamless sharing among teams without needing to integrate separate code segments.
Function Calling Within Templates
A notable feature of Jinja2 is its ability to call Python functions directly within templates at runtime. This capability is essential for real-time data retrieval, manipulation, and validation, simplifying prompt construction. For instance, extract_user_query_topic can perform complex processing of user queries used in template control flow.
{% if extract_user_query_topic(user_query) == "homework_help" %}
{% for homework_example in fetch_few_shot_homework_examples(username, character_name) %}
- name: homework_example_{{ loop.index }}
  role: user
  content: |
    {{ homework_example }}
{% endfor %}
{% endif %}
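The helper functions called in that template are ordinary Python. A deliberately simple, hypothetical implementation of extract_user_query_topic might classify queries by keyword; a real system would more likely use a classifier model:

```python
def extract_user_query_topic(user_query: str) -> str:
    """Hypothetical topic extractor used in template control flow.
    A toy keyword heuristic stands in for real query understanding."""
    homework_keywords = ("homework", "assignment", "problem set")
    if any(kw in user_query.lower() for kw in homework_keywords):
        return "homework_help"
    return "general"

print(extract_user_query_topic("Can you help me with my homework?"))  # homework_help
```

Because the function runs at render time, the template can branch on its result without any of that logic leaking into application code.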
Custom Encoding Options
By default, PromptPoet uses the tiktoken “o200k_base” tokenizer. However, users can specify an alternative encoding name via the tiktoken_encoding_name option or provide a custom encoding function through the encode_func: Callable[[str], list[int]] parameter.
from tiktoken import get_encoding
encode_func = get_encoding("o200k_base").encode
prompt = Prompt(
raw_template=raw_template,
template_data=template_data,
encode_func=encode_func
)
prompt.tokenize()
prompt.tokens
>>> [...]
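Any callable matching `Callable[[str], list[int]]` can serve as `encode_func`. A toy stand-in (not a real tokenizer) makes the contract concrete:

```python
def toy_encode(text: str) -> list[int]:
    """Toy encoder satisfying Callable[[str], list[int]]: one 'token'
    per whitespace-separated word, mapped to the word's length.
    Illustrative only; a real encode_func should come from an actual
    tokenizer such as tiktoken."""
    return [len(word) for word in text.split()]

toy_encode("hello prompt world")  # -> [5, 6, 5]
```

Keeping the interface this small is what lets PromptPoet stay tokenizer-agnostic.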
Understanding Truncation
If your LLM provider supports GPU affinity and prefix cache, Character.AI’s truncation algorithm can be leveraged to maximize prefix-cache rates. This rate is determined by the ratio of cached prompt tokens to the total prompt tokens. Adjusting truncation steps and token limits can optimize cache performance, though increased steps may lead to more token truncation.
Explaining Cache-Aware Truncation
The truncation strategy is pivotal in achieving a high cache rate by optimizing message truncation. Instead of truncating to a fixed token limit each time, the strategy involves truncating up to a stable point every few turns. This approach maintains a continuous token sequence, maximizing GPU prefix cache usage. Moving the truncation point only when necessary ensures efficient resource use.
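A simplified sketch of this idea (my own illustration, not Character.AI's actual algorithm): rather than trimming to exactly the token limit each turn, truncate further, down to a lower boundary, so the cut point stays fixed for the next several turns and the cached prefix survives:

```python
def cache_aware_truncate(messages: list[str], token_limit: int,
                         truncation_step: int,
                         count_tokens=lambda m: len(m.split())) -> list[str]:
    """Drop the oldest messages until the prompt fits under
    (token_limit - truncation_step). Because we over-truncate by up to
    truncation_step tokens, the truncation point does not move again for
    several turns, so later turns reuse the same cached prefix."""
    total = sum(count_tokens(m) for m in messages)
    if total <= token_limit:
        return messages  # under the limit; cut point unchanged
    target = token_limit - truncation_step
    kept = list(messages)
    while kept and total > target:
        total -= count_tokens(kept.pop(0))  # drop the oldest message
    return kept
```

For example, with ten 2-token messages, a 16-token limit, and a step of 6, the sketch drops five messages at once instead of one per turn, then leaves the cut point alone until the limit is exceeded again.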
Naive Truncation Issues
In a chat scenario with messages M1 through M10, naive truncation adjusts the truncation point with each turn, minimizing cache efficiency and increasing computation costs.
Image Credit: Character.ai
Cache-Aware Truncation Benefits
Character.AI’s cache-aware algorithm keeps the truncation point consistent for every few turns, maintaining the token sequence up to the latest message. This allows reuse of cached computations from previous turns, enhancing efficiency. The parameter k reflects the balance between truncation steps and average token count, though it cannot be directly controlled.
Image Credit: Character.ai
PromptPoet is now open source, which should encourage many developers to build on its capabilities. At a bare minimum, PromptPoet encapsulates many of the lessons Character.ai has learned about prompt design. It is certainly a welcome addition to the prompt engineering space.

