# Introduction
Python decorators help simplify complex software logic in a variety of applications, including LLM-based ones. Working with LLMs often means coping with unpredictable, slow, and frequently expensive third-party APIs, and decorators offer a clean way to wrap, for instance, API calls with optimized logic.
Let’s take a look at five useful Python decorators that will help you optimize your LLM-based applications without noticeable extra burden.
The accompanying examples illustrate the syntax and approach for each decorator. Some are shown without an actual LLM call, but each is a code excerpt ultimately designed to be part of a larger application.
# 1. In-memory Caching
This solution comes from the functools module in Python's standard library, and it is useful for expensive functions like those calling LLMs. If the function defined below contained an LLM API call, wrapping it in an LRU (Least Recently Used) cache decorator would prevent redundant requests for identical inputs (prompts) within the same execution or session. This is an elegant way to reduce both latency and cost.
This example illustrates its use:
```python
from functools import lru_cache
import time

@lru_cache(maxsize=100)
def summarize_text(text: str) -> str:
    print("Sending text to LLM...")
    time.sleep(1)  # A simulation of network delay
    return f"Summary of {len(text)} characters."

print(summarize_text("The quick brown fox."))  # Takes one second
print(summarize_text("The quick brown fox."))  # Instant
```
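As an aside, `lru_cache` also exposes a `cache_info()` method that reports hits and misses, which is handy for verifying that the cache is actually saving you API calls. The function below is a stand-in, not a real LLM call:

```python
from functools import lru_cache

@lru_cache(maxsize=100)
def summarize_text(text: str) -> str:
    # Stand-in for an expensive LLM call
    return f"Summary of {len(text)} characters."

summarize_text("The quick brown fox.")  # miss: computed
summarize_text("The quick brown fox.")  # hit: served from cache
print(summarize_text.cache_info())  # CacheInfo(hits=1, misses=1, maxsize=100, currsize=1)
```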
# 2. Caching On Persistent Disk
Speaking of caching, the external library diskcache takes it a step further by implementing a persistent cache on disk, backed by a SQLite database: very useful for storing results of time-consuming functions such as LLM API calls, so they can be quickly retrieved in later calls. Consider this decorator pattern when in-memory caching is not sufficient because results need to survive across script or application restarts.
```python
import time
from diskcache import Cache

# Create a lightweight local SQLite-backed cache directory
cache = Cache(".local_llm_cache")

@cache.memoize(expire=86400)  # Cached for 24 hours
def fetch_llm_response(prompt: str) -> str:
    print("Calling expensive LLM API...")  # Replace this with an actual LLM API call
    time.sleep(2)  # API latency simulation
    return f"Response to: {prompt}"

print(fetch_llm_response("What is quantum computing?"))  # 1st function call
print(fetch_llm_response("What is quantum computing?"))  # Instant load from disk happens here!
```
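Beyond `memoize`, the same `Cache` object also works as a plain key-value store with per-entry expiry, which is useful when you want to cache under a key you compute yourself, such as a hash of the prompt. The directory name and key below are purely illustrative:

```python
from diskcache import Cache

cache = Cache(".local_llm_cache_demo")  # illustrative directory name

# Store a response under an explicit key, expiring after one hour
cache.set("prompt:hello", "Hi there!", expire=3600)
value = cache.get("prompt:hello")
print(value)  # Hi there!

# Wipe the whole cache, e.g. after a model upgrade invalidates old answers
cache.clear()
print(cache.get("prompt:hello"))  # None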
# 3. Network-resilient Apps
Since LLM API calls often fail due to transient errors, such as timeouts and "502 Bad Gateway" responses, a network resilience library like tenacity and its @retry decorator can intercept these common network failures and retry automatically.

The example below implements this resilient behavior by simulating a 70% chance of failure on each attempt. Run it a few times and you will eventually see the retries exhausted and the error surface: totally expected and intended!
```python
import random
from tenacity import retry, wait_exponential, stop_after_attempt, retry_if_exception_type

class RateLimitError(Exception):
    pass

# Retry up to 4 attempts, with exponential backoff capped at 10 seconds between tries
@retry(
    wait=wait_exponential(multiplier=2, min=2, max=10),
    stop=stop_after_attempt(4),
    retry=retry_if_exception_type(RateLimitError),
)
def call_flaky_llm_api(prompt: str):
    print("Attempting to call API...")
    if random.random() < 0.7:  # Simulating a 70% chance of API failure
        raise RateLimitError("Rate limit exceeded! Backing off.")
    return "Text has been successfully generated!"

print(call_flaky_llm_api("Write a haiku"))
```
# 4. Client-side Throttling
This combined decorator pair uses the ratelimit library to control the frequency of calls to a (usually highly demanded) function: useful for staying under a provider's rate limits when using external APIs. The following example enforces a limit expressed as calls per time window. Without client-side throttling, the provider will reject requests when too many arrive in a short period.
```python
import time
from ratelimit import limits, sleep_and_retry

# Strictly enforce a 3-call limit per 10-second window
@sleep_and_retry
@limits(calls=3, period=10)
def generate_text(prompt: str) -> str:
    print(f"[{time.strftime('%X')}] Processing: {prompt}")
    return f"Processed: {prompt}"

# The first 3 calls print immediately; the 4th pauses, thereby respecting the limit
for i in range(5):
    generate_text(f"Prompt {i}")
```
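For intuition, here is a minimal, single-threaded sketch of what such a throttling decorator does under the hood, using only the standard library. The names `throttle` and `ping` are hypothetical, and a production version would need thread safety:

```python
import time
from functools import wraps

def throttle(calls: int, period: float):
    """Allow at most `calls` invocations per `period` seconds (single-threaded sketch)."""
    timestamps = []  # monotonic times of recent calls

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            now = time.monotonic()
            # Forget calls that have aged out of the window
            while timestamps and now - timestamps[0] >= period:
                timestamps.pop(0)
            if len(timestamps) >= calls:
                # Sleep until the oldest call leaves the window
                time.sleep(period - (now - timestamps[0]))
                timestamps.pop(0)
            timestamps.append(time.monotonic())
            return func(*args, **kwargs)
        return wrapper
    return decorator

@throttle(calls=3, period=1.0)
def ping(i: int) -> int:
    return i
```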
# 5. Structured Output Binding
The fifth decorator on the list uses the magentic library in conjunction with Pydantic to provide an efficient mechanism for interacting with LLMs via API and obtaining structured responses. This matters for reliably coaxing LLMs into returning formatted data like JSON objects. The decorator handles the underlying system prompts and Pydantic-based parsing, which can save tokens and helps keep a cleaner codebase.
To try this example out, you will need an OpenAI API key.
```python
# IMPORTANT: An OPENAI_API_KEY environment variable is required to run this example
from magentic import prompt
from pydantic import BaseModel

class CapitalInfo(BaseModel):
    capital: str
    population: int

# The decorator maps the prompt template to the Pydantic return type
@prompt("What is the capital and population of {country}?")
def get_capital_info(country: str) -> CapitalInfo:
    ...  # No function body needed here!

info = get_capital_info("France")
print(f"Capital: {info.capital}, Population: {info.population}")
```
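If you would rather not add a dependency, the core idea can be approximated with Pydantic alone (v2 API assumed): ask the model for JSON only, then validate the raw string so that malformed output fails loudly at the boundary. The response string below is hard-coded for illustration:

```python
from pydantic import BaseModel, ValidationError

class CapitalInfo(BaseModel):
    capital: str
    population: int

# Pretend this string came back from an LLM told to "respond with JSON only"
llm_output = '{"capital": "Paris", "population": 2102650}'

try:
    info = CapitalInfo.model_validate_json(llm_output)
    print(f"Capital: {info.capital}, Population: {info.population}")
except ValidationError as err:
    # Malformed model output is caught here instead of crashing downstream
    print(f"LLM returned invalid data: {err}")
```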
# Wrapping Up
In this article, we listed and illustrated five Python decorators from diverse libraries that are particularly useful in LLM-based applications for simplifying logic, making processes more efficient, and improving network resilience, among other benefits.
Iván Palomares Carrascosa is a leader, writer, speaker, and adviser in AI, machine learning, deep learning & LLMs. He trains and guides others in harnessing AI in the real world.

