Prompt Engineering: Text Summarization


Info-overload is real
Global AI spending topped $184 billion in 2024, spawning a flood of 100-page industry reports that few humans have time (or patience) to read.
The AI boom meets privacy backlash
At the same time, the average cost of a data breach climbed to $4.88 million, pushing executives to demand lightning-fast insight into the privacy risks hidden inside those tomes.
The analyst's headache
Manual summarization steals hours and often drifts off-topic, leaving teams scrambling for the very nuggets senior leadership cares about - AI adoption trends and privacy fallout.
Let an LLM read for you
Large language models (LLMs) can ingest tens of thousands of tokens in seconds; give them precise marching orders, and they'll return an executive brief while you top up your coffee.
Data preparation checklist
- Extract the PDF -> report.txt (strip headers and footers; see the sketch after this list)
- Add a tiny meta.json (title, publisher, date)
- Validate the encoding (UTF-8) so the model doesn't choke on funky characters
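Here is a minimal sketch of that extraction step. It assumes the pypdf package and a local report.pdf; the header/footer filter and the meta.json values are placeholders to adapt to your own documents.

import json
from pathlib import Path

from pypdf import PdfReader  # assumption: any PDF-to-text library works here

reader = PdfReader("report.pdf")  # placeholder file name
lines = []
for page in reader.pages:
    for line in (page.extract_text() or "").splitlines():
        # Crude header/footer filter: drop blank lines and bare page numbers.
        # Tune this rule for your own documents.
        if line.strip() and not line.strip().isdigit():
            lines.append(line)

# Write UTF-8 output so the model doesn't choke on funky characters
Path("report.txt").write_text("\n".join(lines), encoding="utf-8")

# Illustrative metadata values - replace with the real report details
meta = {"title": "Example AI Industry Report", "publisher": "Example Publisher", "date": "2024-01-01"}
Path("meta.json").write_text(json.dumps(meta, indent=2), encoding="utf-8")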
The perfect prompt in four lines
You are an expert market-research analyst.
Summarize the text between the three backticks in no more than 5 sentences.
Highlight AI adoption and data privacy impact on customers.
```{report_text}```
Role -> Task -> Focus -> Delimiter: the classic OpenAI best-practice recipe.
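In code, the same four lines work well as one reusable template string. A minimal sketch follows; the names SUMMARY_PROMPT and report_text are illustrative, not part of any API.

from pathlib import Path

# Role -> Task -> Focus -> Delimiter, kept as a single reusable template.
SUMMARY_PROMPT = (
    "You are an expert market-research analyst.\n"                                   # Role
    "Summarize the text between the three backticks in no more than 5 sentences.\n"  # Task
    "Highlight AI adoption and data privacy impact on customers.\n"                  # Focus
    "```{report_text}```"                                                            # Delimiter
)

report_text = Path("report.txt").read_text(encoding="utf-8")
prompt = SUMMARY_PROMPT.format(report_text=report_text)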
Summarizing ultra-long reports
When a document exceeds the model's context window, use chunk -> summarize -> summary-of-summaries: a recursive strategy proven in the OpenAI Cookbook and echoed by recent "Dynamic Chunking & Selection" research. The script in the how-to section below implements exactly this two-level pattern.
Sample output of a long report
"The adoption of AI technologies is significantly influenced by stringent data privacy regulations, such as GDPR and CCPA, which impose complex compliance requirements on businesses. These regulations necessitate the implementation of robust safeguards to protect customer data, thereby mitigating risks associated with data breaches and unauthorized access. As AI continues to be integrated into various sectors, companies must prioritize ethical practices and transparency to maintain consumer trust. This focus on data privacy not only helps in compliance but also serves as a competitive advantage in an increasingly privacy-conscious market. Ultimately, successful AI adoption hinges on balancing innovation with a commitment to safeguarding customer privacy."
How to do it yourself
from __future__ import annotations

import os
import sys
import textwrap
from pathlib import Path
from typing import List

import tiktoken
from dotenv import load_dotenv
from openai import OpenAI

MODEL = "gpt-4o-mini"
CHUNK_TOKENS = 3_000    # Put the threshold you think might be appropriate
SUMMARY_TOKENS = 256    # Set summary tokens to your comfort

load_dotenv()
api_key = os.getenv("OPENAI_API_KEY") or sys.exit(
    "❌ OPENAI_API_KEY missing: export it or add it to .env"
)
client = OpenAI(api_key=api_key)

# Fall back to a generic encoding if this tiktoken version doesn't know the model yet.
try:
    enc = tiktoken.encoding_for_model(MODEL)
except KeyError:
    enc = tiktoken.get_encoding("o200k_base")


def num_tokens(text: str) -> int:
    """Return a rough token count using tiktoken."""
    return len(enc.encode(text))


def chunk_text(text: str, tokens_per_chunk: int) -> List[str]:
    """
    Naive word-based splitter to keep chunks under the model limit.
    Swap this with a smarter sentence-similarity splitter for best results.
    """
    words, current, chunks = text.split(), [], []
    for word in words:
        current.append(word)
        if num_tokens(" ".join(current)) >= tokens_per_chunk:
            chunks.append(" ".join(current))
            current = []
    if current:  # remainder
        chunks.append(" ".join(current))
    return chunks


def summarize_chunk(chunk: str) -> str:
    """Summarise one chunk in ONE sentence, focused on AI + privacy."""
    prompt = textwrap.dedent(f"""
        You are an expert market-research analyst.

        Summarise the text between triple back-ticks in **one sentence**.
        Stay laser-focused on *AI adoption* and *data-privacy impact on customers*.
        Strictly ignore unrelated details.

        ```{chunk}```
    """)
    res = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.2,
        max_tokens=SUMMARY_TOKENS,
    )
    return res.choices[0].message.content.strip()


def summarize_file(path: Path) -> str:
    """Main orchestration – handles short and long texts alike."""
    raw = path.read_text(encoding="utf-8")

    # If the whole report fits in one chunk, a single pass is fine
    if num_tokens(raw) <= CHUNK_TOKENS:
        return summarize_chunk(raw)

    print(f"Long document ({num_tokens(raw)} tokens) – chunking…")

    # Otherwise: chunk ➜ first-pass summaries ➜ final synthesis
    chunks = chunk_text(raw, CHUNK_TOKENS)
    first_pass = [summarize_chunk(c) for c in chunks]

    # Join the bullets outside the f-string: backslashes inside f-string
    # expressions are a SyntaxError before Python 3.12.
    bullets = "\n- ".join(first_pass)
    second_prompt = textwrap.dedent(f"""
        You are an expert market-research analyst.

        Combine the following bullet summaries into a cohesive **five-sentence**
        executive briefing. Emphasise ONLY insights related to AI adoption
        and data-privacy impact on customers.

        Bullets:
        - {bullets}
    """)
    res = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": second_prompt}],
        temperature=0.2,
        max_tokens=SUMMARY_TOKENS,
    )
    return res.choices[0].message.content.strip()


def main() -> None:
    if len(sys.argv) != 2:
        sys.exit("Usage: python summarize.py path/to/report.txt")

    report_path = Path(sys.argv[1])
    if not report_path.exists():
        sys.exit(f"File not found: {report_path}")

    summary = summarize_file(report_path)
    print("\n— Executive Summary —\n")
    print(summary)


if __name__ == "__main__":
    main()
This will allow you to summarize small reports as well as ultra-long reports.
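To try it end to end: install the three dependencies the script imports (pip install openai tiktoken python-dotenv), put your OPENAI_API_KEY in a .env file next to the script (or export it in your shell), save the code above as summarize.py, and run python summarize.py report.txt (the file name matches the usage message in the script).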
Here is also a resource you can use as the content of your report.txt: https://www.researchgate.net/publication/385781993_Privacy_and_data_security_concerns_in_AI
What are the benefits of prompt engineering...well, where to start?
- Time-saving: a report that takes hours to read manually can be summarized in minutes
- Improved relevance: a focused prompt keeps the output on the information that matters most
- Scalability: Handles thousands of documents automatically, without human bottlenecks
- Cost optimization: Minimizes manual labor and API usage through precise prompts
Conclusion
With the right prompt, mountains of prose become your highlight reel - so you spend less time wrestling with words and more time spotting the patterns that matter. Prompt-engineered summarization isn't magic; it's your compass through the data deluge, guiding you straight to the "aha" moments. Whether you're an analyst racing against deadlines or a curious mind exploring new domains, these tiny tweaks to your instructions unlock clarity, creativity, and surprising depth. So the next time you're staring down a thousand-page report, remember: a carefully crafted prompt is your best co-pilot. Give it a spin, tweak it, make it yours - and watch as the flood of information turns into a steady stream of actionable insight.