Prompt Engineering: Text Summarization


Info-overload is real
Global AI spending topped $184 billion in 2024, spawning a flood of 100-page industry reports that few humans have time (or patience) to read.
The AI boom meets privacy backlash
At the same time, the average cost of a data breach climbed to $4.88 million, pushing executives to demand lightning-fast insight into the privacy risks hidden inside those tomes.
The analyst's headache
Manual summarization steals hours and often drifts off-topic, leaving teams scrambling for the very nuggets senior leadership cares about - AI adoption trends and privacy fallout.
Let an LLM read for you
Large language models (LLMs) can ingest tens of thousands of tokens in seconds; give them precise marching orders, and they'll return an executive brief while you top up your coffee.
Data preparation checklist
- Extract the PDF -> report.txt (strip headers and footers; see the sketch after this list)
- Add a tiny meta.json (title, publisher, date)
- Validate the encoding (UTF-8) so the model doesn't choke on funky characters
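Here is a minimal sketch of that extraction step. It assumes the pypdf package and a local report.pdf; the header/footer filter and the meta.json values are placeholders to adapt to your own documents.

import json
from pathlib import Path

from pypdf import PdfReader  # assumption: any PDF-to-text library works here

reader = PdfReader("report.pdf")  # placeholder file name
lines = []
for page in reader.pages:
    for line in (page.extract_text() or "").splitlines():
        # Crude header/footer filter: drop blank lines and bare page numbers.
        # Tune this rule for your own documents.
        if line.strip() and not line.strip().isdigit():
            lines.append(line)

# Write UTF-8 output so the model doesn't choke on funky characters
Path("report.txt").write_text("\n".join(lines), encoding="utf-8")

# Illustrative metadata values - replace with the real report details
meta = {"title": "Example AI Industry Report", "publisher": "Example Publisher", "date": "2024-01-01"}
Path("meta.json").write_text(json.dumps(meta, indent=2), encoding="utf-8")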
The perfect prompt in four lines
You are an expert market-research analyst.
Summarize the text between the three backticks in no more than 5 sentences.
Highlight AI adoption and data privacy impact on customers.
```{report_text}```
Role -> Task -> Focus -> Delimiter: the classic OpenAI best-practice recipe.
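In code, the same four lines work well as one reusable template string. A minimal sketch follows; the names SUMMARY_PROMPT and report_text are illustrative, not part of any API.

from pathlib import Path

# Role -> Task -> Focus -> Delimiter, kept as a single reusable template.
SUMMARY_PROMPT = (
    "You are an expert market-research analyst.\n"                                   # Role
    "Summarize the text between the three backticks in no more than 5 sentences.\n"  # Task
    "Highlight AI adoption and data privacy impact on customers.\n"                  # Focus
    "```{report_text}```"                                                            # Delimiter
)

report_text = Path("report.txt").read_text(encoding="utf-8")
prompt = SUMMARY_PROMPT.format(report_text=report_text)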
Summarizing ultra-long reports
When a document exceeds the model's context window, use chunk -> summarize -> summary-of-summaries: a recursive strategy proven in the OpenAI Cookbook and echoed by recent "Dynamic Chunking & Selection" research. The script in the how-to section below implements exactly this two-level pattern.
Sample output of a long report
"The adoption of AI technologies is significantly influenced by stringent data privacy regulations, such as GDPR and CCPA, which impose complex compliance requirements on businesses. These regulations necessitate the implementation of robust safeguards to protect customer data, thereby mitigating risks associated with data breaches and unauthorized access. As AI continues to be integrated into various sectors, companies must prioritize ethical practices and transparency to maintain consumer trust. This focus on data privacy not only helps in compliance but also serves as a competitive advantage in an increasingly privacy-conscious market. Ultimately, successful AI adoption hinges on balancing innovation with a commitment to safeguarding customer privacy."
How to do it yourself
from __future__ import annotations

import os
import sys
import textwrap
from pathlib import Path
from typing import List

import tiktoken
from dotenv import load_dotenv
from openai import OpenAI

MODEL = "gpt-4o-mini"
CHUNK_TOKENS = 3_000    # Put the threshold you think might be appropriate
SUMMARY_TOKENS = 256    # Set summary tokens to your comfort

load_dotenv()
api_key = os.getenv("OPENAI_API_KEY") or sys.exit(
    "❌ OPENAI_API_KEY missing: export it or add it to .env"
)
client = OpenAI(api_key=api_key)

# Fall back to a generic encoding if this tiktoken version doesn't know the model yet.
try:
    enc = tiktoken.encoding_for_model(MODEL)
except KeyError:
    enc = tiktoken.get_encoding("o200k_base")


def num_tokens(text: str) -> int:
    """Return a rough token count using tiktoken."""
    return len(enc.encode(text))


def chunk_text(text: str, tokens_per_chunk: int) -> List[str]:
    """
    Naive word-based splitter to keep chunks under the model limit.
    Swap this with a smarter sentence-similarity splitter for best results.
    """
    words, current, chunks = text.split(), [], []
    for word in words:
        current.append(word)
        if num_tokens(" ".join(current)) >= tokens_per_chunk:
            chunks.append(" ".join(current))
            current = []
    if current:  # remainder
        chunks.append(" ".join(current))
    return chunks


def summarize_chunk(chunk: str) -> str:
    """Summarise one chunk in ONE sentence, focused on AI + privacy."""
    prompt = textwrap.dedent(f"""
        You are an expert market-research analyst.

        Summarise the text between triple back-ticks in **one sentence**.
        Stay laser-focused on *AI adoption* and *data-privacy impact on customers*.
        Strictly ignore unrelated details.

        ```{chunk}```
    """)
    res = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.2,
        max_tokens=SUMMARY_TOKENS,
    )
    return res.choices[0].message.content.strip()


def summarize_file(path: Path) -> str:
    """Main orchestration – handles short and long texts alike."""
    raw = path.read_text(encoding="utf-8")

    # If the whole report fits in one chunk, a single pass is fine
    if num_tokens(raw) <= CHUNK_TOKENS:
        return summarize_chunk(raw)

    print(f"Long document ({num_tokens(raw)} tokens) – chunking…")

    # Otherwise: chunk ➜ first-pass summaries ➜ final synthesis
    chunks = chunk_text(raw, CHUNK_TOKENS)
    first_pass = [summarize_chunk(c) for c in chunks]

    # Join the bullets outside the f-string: backslashes inside f-string
    # expressions are a SyntaxError before Python 3.12.
    bullets = "\n- ".join(first_pass)
    second_prompt = textwrap.dedent(f"""
        You are an expert market-research analyst.

        Combine the following bullet summaries into a cohesive **five-sentence**
        executive briefing. Emphasise ONLY insights related to AI adoption
        and data-privacy impact on customers.

        Bullets:
        - {bullets}
    """)
    res = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": second_prompt}],
        temperature=0.2,
        max_tokens=SUMMARY_TOKENS,
    )
    return res.choices[0].message.content.strip()


def main() -> None:
    if len(sys.argv) != 2:
        sys.exit("Usage: python summarize.py path/to/report.txt")

    report_path = Path(sys.argv[1])
    if not report_path.exists():
        sys.exit(f"File not found: {report_path}")

    summary = summarize_file(report_path)
    print("\n— Executive Summary —\n")
    print(summary)


if __name__ == "__main__":
    main()
This will allow you to summarize small reports as well as ultra-long reports.
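To try it end to end: install the three dependencies the script imports (pip install openai tiktoken python-dotenv), put your OPENAI_API_KEY in a .env file next to the script (or export it in your shell), save the code above as summarize.py, and run python summarize.py report.txt (the file name matches the usage message in the script).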
Here is also a resource you can use as the content of your report.txt: https://www.researchgate.net/publication/385781993_Privacy_and_data_security_concerns_in_AI
What are the benefits of prompt engineering...well, where to start?
- Time-saving: a report that takes hours to read manually can be summarized in minutes
- Improved relevance: a focused prompt keeps the output on the information that matters most
- Scalability: Handles thousands of documents automatically, without human bottlenecks
- Cost optimization: Minimizes manual labor and API usage through precise prompts
Conclusion
With the right prompt, mountains of prose become your highlight reel - so you spend less time wrestling with words and more time spotting the patterns that matter. Prompt-engineered summarization isn't magic; it's your compass through the data deluge, guiding you straight to the "aha" moments. Whether you're an analyst racing against deadlines or a curious mind exploring new domains, these tiny tweaks to your instructions unlock clarity, creativity, and surprising depth. So the next time you're staring down a thousand-page report, remember: a carefully crafted prompt is your best co-pilot. Give it a spin, tweak it, make it yours - and watch as the flood of information turns into a steady stream of actionable insight.