Crafting Effective Prompts for Summarization Using Large Language Models



 

In an era of abundant information—think of the huge number of articles published on every possible topic, the time spent in meetings and presentations, or the volume of e-mails you get every day at work—the ability to distill complex content into concise, insightful summaries is invaluable. Summarization tools have existed for some time, but the recent advent of large language models, such as those of the GPT, Bard, or LLaMA families, has taken summarization to a new level. This is because large language models do not work as fixed-rule summarizers; rather, they can “understand” the contents of the input text and produce succinct summaries with far more flexibility than tools that do not use language models.

In particular, passing the right prompt to the language model achieves not only plain summarization but also rephrasing in a specific style, switching between passive and active voice, changing the narrative perspective, and even handling texts that mix languages or tones while producing a consistent summary in a single tone and language. None of this is possible with summarization tools not powered by large language models, which have rapidly become rather obsolete.

It follows, then, that the key to harnessing the full potential of large language models for summarization lies in crafting prompts that guide them effectively. Here I will go over the most important points you must consider when writing prompts meant to produce high-quality summaries across a range of content types.

The power of prompts

Prompts serve as the interface between human intention and AI execution. They provide the necessary instructions and context for the language model to generate summaries that meet the user’s needs. This means they must be complete, from explicit guidelines and requests about what you expect in the output to relevant information that provides context. You will see examples of all these points throughout this article.

Effective prompts share common characteristics that make them powerful tools for extracting meaningful summaries. They are clear, specific, and tailored to the content and context of the task at hand. An ideal prompt leaves no room for ambiguity: it precisely articulates what the user wants to extract from the content. For example, instead of asking, ‘Summarize this article’, a more effective prompt would be something like:

Generate a point-by-point summary of the key findings and arguments in the following article.

Context is crucial in generating relevant summaries. Prompts should provide context about the source material, the target audience, and the desired level of detail. This ensures that the language model “understands” nuances in the content and can thus produce a summary that aligns with the user’s goals. For example, you can include in a prompt information explaining that the text to be summarized is a meeting memo, a scientific article, a collage of news about a topic, a podcast, or an e-mail, or you can include an example of the kind of output you expect. I have already written several programs that exploit GPT-3.5 or GPT-4 specifically to produce summaries in some of those situations, and I have always seen a big impact from the context passed in the prompt, especially when it is placed at the beginning. See, for example, these two articles:

Depending on the task, prompts can also specify the desired level of abstraction in the summary. For instance, one might request a high-level overview, an in-depth analysis, a brief synthesis of the key points, a list of the key points, and so on. Prompts can also tune the scholarly level of the output; for example, one can ask a language model to summarize a hardcore scientific article “keeping scientific rigor” or “writing it for a 10-year-old”, or to explain a piece of code “as pseudocode” or covering “only the basic ideas behind how it works”. Of course, the requested levels of abstraction and scholarship should align with the purpose of the summary.
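To make the last two points concrete, here is a minimal sketch of how context about the source material and the desired level of abstraction can be passed to a model through OpenAI’s Python SDK. The model name, prompt wording, word limit, and function name are illustrative choices of mine, not a prescription:

```python
# Minimal sketch: a context-rich summarization prompt sent through OpenAI's
# Python SDK (v1+). Model name, wording, and word limit are illustrative.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def summarize_with_context(text: str) -> str:
    # Context first: what the text is, who the summary is for, and at what level.
    system_msg = (
        "You summarize scientific articles for a 10-year-old reader. "
        "Produce a point-by-point summary of the key findings, "
        "in plain language, in at most 150 words."
    )
    response = client.chat.completions.create(
        model="gpt-4",  # or gpt-3.5-turbo for a cheaper run
        messages=[
            {"role": "system", "content": system_msg},
            {"role": "user", "content": text},
        ],
        temperature=0.2,  # keep the summary factual rather than creative
    )
    return response.choices[0].message.content

# Example usage:
# print(summarize_with_context(open("article.txt").read()))
```

Placing the context and instructions in the system message (or at the top of the user message) mirrors the advice given throughout this article: the model sees what it is summarizing, for whom, and at what level before it sees the text itself.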

Case study: my Summarizer Almighty chatbot

I developed a web app called “Summarizer Almighty” to tackle two problems at once: summarizing texts that are too long to fit into a language model’s input, and focusing the summary on specific questions or instructions provided by the user, as opposed to general summarization.

As I first wrote it, Summarizer Almighty utilized the GPT-3.5-turbo language model to analyze and extract relevant information from lengthy documents based on user-provided questions or instructions. I have just updated it to GPT-4, which is much more expensive to run but provides substantially better responses in many cases.

To do its work, Summarizer Almighty splits the input document into overlapping chunks, which are individually analyzed and summarized through a special prompt of this type:

Can you reply to “[user’s question]” considering the following piece of text taken from the article under study? If not, reply ‘FALSE’ and nothing else. If yes, reply ‘TRUE’ followed by the answer. Here’s the paragraph: “[chunk of text extracted from the document, optimized to around 3000 tokens]”

Sending these inputs to the language model one after another through its API, Summarizer Almighty accumulates the partial answers into a growing paragraph that will contain several “FALSE” outputs and some “TRUE” outputs, the latter followed by the explanations or answers requested. At the very end, the program displays all partial outputs and then calls the language model once more, this time with an injected prompt that says:

Please answer to “[user’s question]” using the information provided below, taken from sections of a longer text: [concatenation of all answers provided by calls that returned TRUE].
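If you want to reproduce something like this flow, the sketch below shows the general chunk-and-aggregate logic in Python. The chunk size, overlap, function names, and exact handling of the TRUE/FALSE replies are illustrative assumptions; the actual app may differ in its details:

```python
# Sketch of the chunk-then-aggregate flow described above. Chunk size, overlap,
# and prompt handling are illustrative; the real Summarizer Almighty may differ.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4"  # the article mentions both gpt-3.5-turbo and gpt-4

def ask_model(prompt: str) -> str:
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content.strip()

def split_into_chunks(text: str, size: int = 9000, overlap: int = 1000) -> list[str]:
    # Rough character-based chunking sized to land near the ~3,000-token target;
    # overlapping windows reduce the chance of cutting a relevant passage in half.
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

def summarize_against_question(document: str, question: str) -> str:
    partial_answers = []
    for chunk in split_into_chunks(document):
        reply = ask_model(
            f'Can you reply to "{question}" considering the following piece of '
            "text taken from the article under study? If not, reply 'FALSE' and "
            "nothing else. If yes, reply 'TRUE' followed by the answer. "
            f'Here\'s the paragraph: "{chunk}"'
        )
        if reply.upper().startswith("TRUE"):
            partial_answers.append(reply[len("TRUE"):].strip(" :,-"))
    if not partial_answers:
        return "No relevant information found in the document."
    # Final pass: merge the useful partial answers into a single response.
    return ask_model(
        f'Please answer to "{question}" using the information provided below, '
        "taken from sections of a longer text: " + " ".join(partial_answers)
    )
```

The map-then-reduce structure is what lets the tool handle documents far longer than the model’s context window while still answering a single focused question.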

Accessible for free (but you must provide a working OpenAI API key for GPT models), Summarizer Almighty can, I think, be valuable in research, news writing, education, business, and other situations where time is limited and detailed information is crucial. To know more, please check out my dedicated article:

From OpenAI’s own resources on how to craft better prompts

OpenAI’s large language models are probably the most successful ones on the market right now. Objectively speaking, they were the first really good ones to roll out. Besides, they are in my opinion the easiest to use in practice, requiring just API calls and no installs, and the best documented ones…

Indeed, OpenAI’s website has several resources aimed at helping you write better prompts. And it’s not only about general rules and formats for the prompt, but also about using special tokens and instructions. For example, the page dedicated specifically to best practices for prompt engineering with OpenAI’s API explains that instructions should be placed at the beginning (as I also explained earlier in this article based on my own experience), that you can use tokens such as triple quotes (""") or hashtags (###) to separate instructions from context, that providing leading words and examples of the kinds of outputs you expect is very helpful, and much more:
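Putting those guidelines together, a prompt built programmatically might look like the sketch below, with the instruction first and the source text fenced by a delimiter; the wording and input file name are placeholders, not an official OpenAI template:

```python
# Instruction first, then the source text separated by a clear delimiter.
# The wording and the input file are placeholders, not an official template.
article_text = open("article.txt", encoding="utf-8").read()

prompt = (
    "Summarize the text delimited by triple quotes below as a bullet list "
    "of its key findings, keeping scientific rigor.\n\n"
    f'"""\n{article_text}\n"""'
)
```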

OpenAI’s community forum also contains a section specifically devoted to asking and answering questions about prompt engineering:

Iterative refinement

One important point I have observed in my experience, mainly with Google’s Bard and with language models of the GPT family, is that refining prompts iteratively is often essential. You may need to experiment with different phrasings, instructions, or parameters to optimize the quality of the generated summaries. Very often the first output is not what I want, especially when it comes to summarization, because the language model cannot magically understand what your exact intentions are, where you want to focus, what tone you expect in the output, and other details that, as discussed above, you must convey to the model through the prompt. When this happens, which is most of the time, I need to either iterate with the language model a few times until I converge on what I expected, or try different starting prompts until I find that the model produces the kinds of outputs I expect.

Sometimes you can take a first generation and feed it back to the model together with a new prompt designed to improve the output; other times you need to start over from scratch with a totally new prompt. On a few occasions, checking the multiple outputs from the language model or re-running it on the same prompt can result in better generations, but this is often not the case.
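Done through the API, a refinement round of that first kind simply means keeping the initial output in the conversation and asking for a revision, as in this sketch (the model name and the follow-up instruction are illustrative):

```python
# Sketch of an iterative refinement round through the API: keep the first
# output in the conversation and ask for a revision. Wording is illustrative.
from openai import OpenAI

client = OpenAI()

def refine_summary(document: str) -> str:
    messages = [
        {"role": "user", "content": "Summarize this memo as bullet points:\n" + document},
    ]
    first = client.chat.completions.create(model="gpt-4", messages=messages)
    messages.append({"role": "assistant", "content": first.choices[0].message.content})
    # Second round: point out what to change instead of starting from scratch.
    messages.append({"role": "user",
                     "content": "Shorten that to five bullets and use a neutral, formal tone."})
    second = client.chat.completions.create(model="gpt-4", messages=messages)
    return second.choices[0].message.content
```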

One tool that helps with some of this, especially with the quality of the outputs produced by GPT models when used programmatically, is keeping an eye on the probabilities associated with each produced token, as I have described here for GPT-3:
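For the current chat models, the API exposes per-token log probabilities through the logprobs option, so a version of that check might look like the following sketch (the model choice and the 0.5 threshold are arbitrary examples):

```python
# Sketch: flag low-confidence tokens in a summary by inspecting log probabilities.
# Assumes a model that supports the logprobs option; the threshold is arbitrary.
import math

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user",
               "content": "Summarize in one sentence: the meeting moved the product launch to May."}],
    logprobs=True,  # return the log probability of each generated token
)

for item in response.choices[0].logprobs.content:
    prob = math.exp(item.logprob)  # convert log probability to a probability
    if prob < 0.5:
        print(f"Low-confidence token: {item.token!r} (p={prob:.2f})")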

Conclusions

Effective prompts are the key to unlocking the full potential of summarization powered by large language models, ensuring that the produced summaries are relevant, concise, insightful, tailored to exactly what you need, and written the way you need. Well-crafted prompts lead to more precise and relevant summaries by focusing the model on extracting the information that matters to you, and clear prompts reduce the need for extensive post-processing, saving valuable time.

Mastering what many call, and I don’t disagree, the “art” of crafting prompts will be, or rather already is, an essential skill in modern work. Whether you are extracting insights from articles, research papers, meetings, or any other content, proper prompting guides your language model toward generating meaningful and actionable summaries. Put my advice into practice and check this out for yourself.
