Python, from the ground up Lesson 39 / 60

Working with APIs: requests, retries, rate limits

The HTTP toolbox in Python, the retry patterns that don't make things worse, and the rate-limit handling that keeps you welcome.

The other half of ingestion is APIs. Someone else’s HTTP service has the data you need. You write a Python client, pull pages, write to disk or a database, do it again tomorrow. Sounds easy. The first naive version always works. The second time you run it, the API is having a bad day, and you discover that “easy” has a long tail of failure modes: timeouts, 503s, 429s, expired tokens, opaque pagination, JSON that’s secretly XML. This lesson is the toolbox for that long tail.

The HTTP libraries in 2026

Three libraries you’ll encounter.

requests — the classic. Sync only. Hasn’t changed much in years; doesn’t need to. The simplest, most readable HTTP code you can write in Python. If you’re writing a one-off script and don’t care about concurrency, requests is still a perfectly fine choice in 2026.

import requests

r = requests.get("https://api.example.com/orders", params={"limit": 100}, timeout=10)
r.raise_for_status()
data = r.json()

httpx — modern, sync and async, with an API that is broadly compatible with requests. HTTP/2 support is available as an optional extra (install httpx[http2] and pass http2=True to the client). The standard recommendation for new code in 2026, and the one I default to.

import httpx

with httpx.Client(timeout=10.0) as client:
    r = client.get("https://api.example.com/orders", params={"limit": 100})
    r.raise_for_status()
    data = r.json()

The async version reads almost the same:

import asyncio
import httpx

async def fetch_all(urls: list[str]) -> list[dict]:
    async with httpx.AsyncClient(timeout=10.0) as client:
        responses = await asyncio.gather(*(client.get(u) for u in urls))
    return [r.json() for r in responses]

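When you have 200 URLs to hit and each request takes 200 ms, the sync version takes about 40 seconds (200 × 0.2 s). The async version overlaps the waiting and finishes in a second or two, depending on how many connections run at once. That difference is why httpx exists.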
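One caveat on asyncio.gather: it starts every request at once, which can trip rate limits or exhaust the connection pool. Below is a minimal sketch of capping concurrency with asyncio.Semaphore. The names gather_bounded and fake_request are hypothetical, and the fake request sleeps instead of touching the network so the block runs anywhere; in real code the callable would wrap client.get.

```python
import asyncio


async def gather_bounded(make_request, items, limit=20):
    """Run make_request(item) for every item, at most `limit` at a time."""
    sem = asyncio.Semaphore(limit)

    async def run(item):
        async with sem:  # wait here until a slot is free
            return await make_request(item)

    # gather returns results in the same order as `items`
    return await asyncio.gather(*(run(item) for item in items))


in_flight = 0
peak = 0


async def fake_request(n):
    """Stand-in for an HTTP call: sleeps briefly and records concurrency."""
    global in_flight, peak
    in_flight += 1
    peak = max(peak, in_flight)
    await asyncio.sleep(0.01)
    in_flight -= 1
    return n * 2


results = asyncio.run(gather_bounded(fake_request, range(50), limit=10))
```

With a cap of 20 in flight, 200 requests of 200 ms each run in about 10 waves, so roughly 2 seconds in total.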

aiohttp — async-only, the older async option. Still excellent, still maintained, still common in production. If you inherit a codebase using it, you’re fine. For new code I’d pick httpx for the sync/async unification.

For this lesson we’ll use httpx. The patterns translate to requests almost line-for-line.

The basics, done right

Four things to get right on every request:

r = client.get(
    "https://api.example.com/orders",
    params={"since": "2026-04-01", "limit": 100},
    headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    timeout=10.0,
)
r.raise_for_status()
data = r.json()
  • params — query string parameters as a dict. Don’t concatenate strings into URLs; httpx will encode for you.
  • headers — auth, accept, user-agent. Set a User-Agent identifying your app; some APIs reject blank ones.
  • timeout — always set this. requests has no default timeout and will wait forever, which is a great way to hang your pipeline behind a hung connection; httpx defaults to 5 seconds. 10 seconds is a reasonable starting point.
  • raise_for_status() — raises an exception for 4xx/5xx responses. Without it, your code happily continues with r.json() on a 500 error response page and you get a confusing JSON parse error instead of a clear HTTP error.
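To see why string concatenation is risky for query parameters, here is a small sketch using the standard library's urllib.parse as a stand-in for the encoding an HTTP client performs on params. It is not httpx's own code, and the timestamp with a UTC offset and the q parameter are made-up values chosen because they contain reserved characters.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical values containing "+" and "&", which are special in query strings.
params = {"since": "2026-04-01T00:00:00+02:00", "q": "a&b"}

# Naive concatenation: "+" and "&" inside the values change the meaning.
naive = "since=" + params["since"] + "&q=" + params["q"]

# Proper encoding escapes reserved characters in each value.
encoded = urlencode(params)

# Parse both back the way a server would.
naive_roundtrip = parse_qs(naive)
encoded_roundtrip = parse_qs(encoded)
```

The encoded version round-trips to the original values. The naive one does not: the "+" is read as a space and the "&" splits q into two parameters.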

A connection-reusing client is meaningfully faster than calling httpx.get directly each time, because it pools TCP connections:

with httpx.Client(
    base_url="https://api.example.com",
    headers={"Authorization": f"Bearer {token}"},
    timeout=10.0,
) as client:
    for endpoint in endpoints:
        r = client.get(endpoint)
        ...

Use the client. Always. It’s one extra line and a noticeable speedup when you make many requests to the same host.

Retries with tenacity

Some failures are transient. The server hiccupped. The TCP connection got reset. A load balancer flipped. The right response is “wait a moment and try again.” The wrong response is “fail loudly to the user and put a manual rerun on the on-call’s plate.”

The library is tenacity. The decorator is @retry, and it composes a few smaller policies.

from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    retry_if_exception_type,
)
import httpx

@retry(
    stop=stop_after_attempt(5),
    wait=wait_exponential(multiplier=1, min=1, max=60),
    retry=retry_if_exception_type((httpx.TransportError, httpx.HTTPStatusError)),
    reraise=True,
)
def fetch(client: httpx.Client, url: str) -> dict:
    r = client.get(url)
    r.raise_for_status()
    return r.json()

What that says: try up to 5 times; wait 1s, 2s, 4s, 8s between attempts (capped at 60s); retry on network errors and HTTP errors; if we still fail after 5 tries, reraise the last exception so the caller sees a real error.

The math behind exponential backoff: each retry doubles the wait. The reason isn’t superstition — it’s that when a service is overloaded, retrying immediately just adds load. Exponential backoff lets the queue drain. Add a touch of jitter (random wiggle in the wait time) so a thousand clients don’t all retry at the same instant:

from tenacity import wait_random_exponential

wait=wait_random_exponential(multiplier=1, max=60),

That’s “exponential up to 60s, with jitter.” The tenacity docs describe it as a strategy for uncoordinated clients contending for a shared resource (the “Full Jitter” algorithm), and it’s what I reach for first.
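The arithmetic is easy to check by hand. The sketch below computes the exponential ceiling for each attempt and draws a jittered wait beneath it. It illustrates the idea only and is not tenacity's implementation; backoff_ceiling and full_jitter_wait are hypothetical names.

```python
import random


def backoff_ceiling(attempt, multiplier=1.0, cap=60.0):
    """Upper bound on the wait before retry number `attempt` (1-based)."""
    return min(cap, multiplier * 2 ** (attempt - 1))


def full_jitter_wait(attempt, multiplier=1.0, cap=60.0, rng=random):
    """Pick a random wait between 0 and the exponential ceiling."""
    return rng.uniform(0, backoff_ceiling(attempt, multiplier, cap))


# Ceilings for attempts 1 through 8: doubling until the 60s cap takes over.
ceilings = [backoff_ceiling(n) for n in range(1, 9)]
# ceilings == [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

Because each client draws its own random wait below the ceiling, a thousand clients that failed together spread their retries across the window instead of arriving in lockstep.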

What to retry, what not to retry

Not every error deserves a retry. The split is roughly:

Retry: 5xx server errors, 408 request timeout, 429 too many requests, network errors (ConnectError, ReadTimeout, RemoteProtocolError). These are transient.

Don’t retry: 4xx client errors (except 408/429). 400 bad request means your request was malformed — retrying makes no difference. 401 means your token is invalid — retrying just hammers the auth endpoint. 404 means the resource isn’t there. 422 means validation failed.

Tenacity gives you the granularity:

from tenacity import retry_if_exception, wait_random_exponential

def is_retryable(exc: BaseException) -> bool:
    if isinstance(exc, httpx.TransportError):
        return True
    if isinstance(exc, httpx.HTTPStatusError):
        status = exc.response.status_code
        return status >= 500 or status in (408, 429)
    return False

@retry(
    stop=stop_after_attempt(5),
    wait=wait_random_exponential(multiplier=1, max=60),
    retry=retry_if_exception(is_retryable),
    reraise=True,
)
def fetch(client: httpx.Client, url: str) -> dict:
    r = client.get(url)
    r.raise_for_status()
    return r.json()

That’s the production-grade retry decorator. Steal it.

Rate limits: respect the response

When an API tells you to slow down, slow down. Most APIs signal this with HTTP 429 and a Retry-After header:

HTTP/1.1 429 Too Many Requests
Retry-After: 30

Honor it:

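import logging
import time

log = logging.getLogger(__name__)

def fetch(client: httpx.Client, url: str) -> dict:
    while True:
        r = client.get(url)
        if r.status_code == 429:
            # Assumes Retry-After is a number of seconds; falls back to 5s if absent.
            wait_s = float(r.headers.get("Retry-After", "5"))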
            log.warning("rate limited, sleeping %ss", wait_s)
            time.sleep(wait_s)
            continue
        r.raise_for_status()
        return r.json()
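One caveat to the loop above: HTTP allows Retry-After to be either a number of seconds or an HTTP-date, and float() raises on the date form. A minimal sketch of a parser that accepts both follows. The name retry_after_seconds, the 5-second default, and the 300-second cap are assumptions for illustration, not part of any library.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, default=5.0, cap=300.0, now=None):
    """Turn a Retry-After header value into a sleep duration in seconds."""
    if value is None:
        return default
    value = value.strip()
    try:
        # Form 1: delay in seconds, e.g. "30"
        return min(cap, max(0.0, float(value)))
    except ValueError:
        pass
    try:
        # Form 2: HTTP-date, e.g. "Wed, 21 Oct 2026 07:28:00 GMT"
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default  # unparseable header: fall back
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return min(cap, max(0.0, (when - now).total_seconds()))
```

In the loop, the sleep line would then read wait_s = retry_after_seconds(r.headers.get("Retry-After")). The cap guards against a server asking for an unreasonably long pause.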

That’s the floor: read the header, sleep, retry. The ceiling is proactive throttling: rate-limit yourself before the API has to. If the API allows 100 requests per minute, run your client at 90/minute and you’ll never see a 429.

The limits library gives you a clean rate-limiter:

import time

import httpx
from limits import RateLimitItemPerMinute
from limits.storage import MemoryStorage
from limits.strategies import MovingWindowRateLimiter

storage = MemoryStorage()
limiter = MovingWindowRateLimiter(storage)
quota = RateLimitItemPerMinute(90)

def fetch(client: httpx.Client, url: str) -> dict:
    while not limiter.hit(quota, "api.example.com"):
        time.sleep(0.1)
    r = client.get(url)
    r.raise_for_status()
    return r.json()

Or roll your own token bucket — it’s about 30 lines. The point is conscious rate management. APIs throttle you because someone in their ops team got paged at 3am because of a misbehaving client. Don’t be that client.
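For reference, a hand-rolled token bucket might look like the sketch below. It is single-threaded and deliberately minimal; the class name, the injectable clock and sleep parameters (there to make it testable), and the 1 ms minimum sleep are all choices made for this illustration.

```python
import time


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic, sleep=time.sleep):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum tokens the bucket can hold
        self._tokens = float(capacity)
        self._clock = clock
        self._sleep = sleep
        self._last = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last check, up to capacity.
        now = self._clock()
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now

    def try_acquire(self):
        """Take one token if available; return False without waiting otherwise."""
        self._refill()
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False

    def acquire(self):
        """Block until a token is available, then take it."""
        while not self.try_acquire():
            deficit = 1 - self._tokens
            # Sleep roughly until the missing fraction of a token has refilled.
            self._sleep(max(deficit / self.rate, 0.001))
```

For the 90-per-minute budget above, that would be TokenBucket(rate=90 / 60, capacity=5), with bucket.acquire() called before each client.get. The capacity of 5 is an arbitrary burst allowance.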

Where possible, batch. If the API has a bulk endpoint (POST /orders/lookup taking 100 IDs), use it instead of 100 individual GETs.
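Batching mostly comes down to slicing your IDs into groups of the size the endpoint accepts. A small sketch of that step, with chunked as a hypothetical helper and the ID range invented for the example:

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: list[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter, batch
        yield batch


order_ids = list(range(1, 251))          # 250 made-up IDs
batches = list(chunked(order_ids, 100))  # three batches: 100, 100, 50
```

Each batch would then go out as one request to the bulk endpoint, turning 250 calls into 3.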

Pagination patterns

Three flavors you’ll meet, in roughly increasing order of niceness.

Offset/limit — ?offset=0&limit=100, then ?offset=100&limit=100, etc. The classic, but it has a flaw: if rows are inserted or deleted while you’re paginating, you can miss or duplicate rows. Acceptable for static datasets, sketchy for live ones.

def paginate_offset(client: httpx.Client, url: str, limit: int = 100):
    offset = 0
    while True:
        r = client.get(url, params={"offset": offset, "limit": limit})
        r.raise_for_status()
        page = r.json()["data"]
        if not page:
            return
        yield from page
        offset += len(page)
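The flaw with offsets is easiest to see in a toy simulation. Here a plain list stands in for the server's result set, sorted newest first, and get_page is a hypothetical stand-in for the API call.

```python
def get_page(rows, offset, limit):
    """Stand-in for the server: slice the current result set."""
    return rows[offset:offset + limit]


# Case 1: a row is inserted at the front between page requests.
rows = ["e", "d", "c", "b", "a"]      # newest first
page1 = get_page(rows, 0, 3)           # ["e", "d", "c"]
rows.insert(0, "f")                    # a new row arrives
page2 = get_page(rows, 3, 3)           # ["c", "b", "a"]
seen_after_insert = page1 + page2      # "c" appears twice, "f" is never seen

# Case 2: a row is deleted between page requests.
rows = ["e", "d", "c", "b", "a"]
page1 = get_page(rows, 0, 3)           # ["e", "d", "c"]
rows.remove("e")                       # a row disappears
page2 = get_page(rows, 3, 3)           # ["a"]
seen_after_delete = page1 + page2      # "b" is never seen
```

An insert shifts everything down by one, so the last row of page one shows up again on page two. A delete shifts everything up, so one row slips through the gap.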

Cursor-based — the API returns an opaque next_cursor token; you pass it back to get the next page. Stable across writes, the modern default.

def paginate_cursor(client: httpx.Client, url: str):
    cursor: str | None = None
    while True:
        params = {"cursor": cursor} if cursor else {}
        r = client.get(url, params=params)
        r.raise_for_status()
        body = r.json()
        yield from body["data"]
        cursor = body.get("next_cursor")
        if not cursor:
            return

Link header (RFC 8288, which obsoletes RFC 5988) — the API puts the next-page URL in the Link HTTP header. GitHub uses this. Less common but elegant: you don’t construct the next URL, you just follow.

def paginate_link(client: httpx.Client, url: str):
    while url:
        r = client.get(url)
        r.raise_for_status()
        yield from r.json()
        # httpx parses Link headers into r.links
        next_link = r.links.get("next")
        url = next_link["url"] if next_link else None

Read the docs, pick the right one, write a generator. Generators are the ideal shape here — the caller doesn’t need to know how many pages there are or care about page boundaries.
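To make the generator point concrete, here is a sketch with an in-memory fake API. PAGES, fake_fetch, and the cursor values are invented for the example; the paginate function follows the same cursor loop as above but takes the fetch function as an argument so it can run without a network.

```python
from itertools import islice

# Fake API: maps a cursor to the page it returns.
PAGES = {
    None: {"data": [1, 2], "next_cursor": "p2"},
    "p2": {"data": [3, 4], "next_cursor": "p3"},
    "p3": {"data": [5], "next_cursor": None},
}
fetched = []  # records which pages were actually requested


def fake_fetch(cursor):
    fetched.append(cursor)
    return PAGES[cursor]


def paginate(fetch_page):
    """Yield rows one at a time, fetching the next page only when needed."""
    cursor = None
    while True:
        body = fetch_page(cursor)
        yield from body["data"]
        cursor = body.get("next_cursor")
        if not cursor:
            return


# The caller asks for three rows and never mentions pages.
first_three = list(islice(paginate(fake_fetch), 3))
pages_for_three = list(fetched)

# A full pass walks every page.
all_rows = list(paginate(fake_fetch))
```

Taking three rows triggers only two page fetches, because the generator is lazy. The caller's code is the same whether the data spans one page or a thousand.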

Authentication

The variations:

  • API keys in headers — the most common, and the right place for them. Authorization: Bearer <token> or a custom header like X-API-Key.
  • API keys in query strings — some legacy APIs do this. Avoid where possible: query strings end up in server access logs and browser history.
  • OAuth 2.0 client credentials — for server-to-server. POST to a token endpoint, get back an access token, use it for an hour, refresh.
  • OAuth 2.0 authorization code — for “act on behalf of a user” flows. Browser redirects, scopes, refresh tokens. Not what you want for batch jobs.

A small token-caching wrapper for the client-credentials flow, which requests a new token shortly before the cached one expires:

import time
from dataclasses import dataclass

import httpx

@dataclass
class Token:
    value: str
    expires_at: float

class TokenManager:
    def __init__(self, client_id: str, client_secret: str, token_url: str):
        self.client_id = client_id
        self.client_secret = client_secret
        self.token_url = token_url
        self._token: Token | None = None

    def get(self, client: httpx.Client) -> str:
        if self._token and self._token.expires_at > time.time() + 60:
            return self._token.value
        r = client.post(self.token_url, data={
            "grant_type": "client_credentials",
            "client_id": self.client_id,
            "client_secret": self.client_secret,
        })
        r.raise_for_status()
        body = r.json()
        self._token = Token(body["access_token"], time.time() + body["expires_in"])
        return self._token.value

The 60-second buffer is on purpose: refresh slightly before expiry so you don’t get a 401 mid-request because the clock skewed.

Webhooks vs polling

Quick aside. If the API offers webhooks (“we’ll POST to your URL when something happens”), they’re almost always better than polling. You stop hammering an API for “anything new?” twelve times an hour and let the source push when there’s actually news. The cost is running a small HTTP server to receive them. For high-volume or low-latency integrations, this is the way.

Polling is fine for low-frequency batch jobs, or when you don’t control infrastructure to receive webhooks, or when the API doesn’t support them.

A note on AI-generated API clients

AI assistants are excellent at producing the boilerplate for this kind of code. “Write me an httpx client with tenacity retries that paginates by cursor and handles rate limits” gets you 80% of a working client in 15 seconds. The retry decorators, the pagination loops, the auth flows — all very pattern-shaped, all things AI does near-perfectly.

The catch: AI assistants sometimes invent endpoint names that don’t exist. They’ve seen ten thousand API clients and pattern-matched yours to a similar one, and they’ll confidently produce code that calls GET /api/v2/users/me/orders when the actual API has GET /v2/customer/orders. Always verify endpoint names, parameter names, and response shapes against the actual API documentation before you ship. Treat the AI’s output as a draft of the structure, not the source of truth for the API itself.

A worked example: paginated pull to Parquet

Putting it together — a script that pulls every page of a paginated API, retries on errors, respects rate limits, and writes to Parquet:

"""pull_orders.py — fetch all orders, write to Parquet."""
from __future__ import annotations
import logging
import time
from pathlib import Path

import httpx
import pyarrow as pa
import pyarrow.parquet as pq
from tenacity import (
    retry,
    retry_if_exception,
    stop_after_attempt,
    wait_random_exponential,
)

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pull")

API = "https://api.example.com"
TOKEN = "..."


def is_retryable(exc: BaseException) -> bool:
    if isinstance(exc, httpx.TransportError):
        return True
    if isinstance(exc, httpx.HTTPStatusError):
        return exc.response.status_code >= 500 or exc.response.status_code in (408, 429)
    return False


@retry(
    stop=stop_after_attempt(5),
    wait=wait_random_exponential(multiplier=1, max=60),
    retry=retry_if_exception(is_retryable),
    reraise=True,
)
def fetch(client: httpx.Client, url: str, params: dict | None = None) -> dict:
    r = client.get(url, params=params)
    if r.status_code == 429:
        # Honor the server's hint first; raise_for_status() below then raises,
        # and tenacity schedules the retry.
        wait_s = float(r.headers.get("Retry-After", "5"))
        log.warning("429 rate limited, sleeping %ss", wait_s)
        time.sleep(wait_s)
    r.raise_for_status()
    return r.json()


def paginate(client: httpx.Client, path: str):
    cursor: str | None = None
    while True:
        params = {"limit": 200, "cursor": cursor} if cursor else {"limit": 200}
        body = fetch(client, path, params=params)
        yield from body["data"]
        cursor = body.get("next_cursor")
        if not cursor:
            return


def main(out: Path) -> None:
    headers = {"Authorization": f"Bearer {TOKEN}", "User-Agent": "narcis-pull/1.0"}
    with httpx.Client(base_url=API, headers=headers, timeout=15.0) as client:
        rows = list(paginate(client, "/v1/orders"))
    log.info("fetched %d rows", len(rows))
    table = pa.Table.from_pylist(rows)
    pq.write_table(table, out, compression="zstd")
    log.info("wrote %s", out)


if __name__ == "__main__":
    main(Path("orders.parquet"))

That’s the whole shape: paginate cleanly, retry on transient failures, honor rate limits, write a typed columnar file at the end. Drop it under cron or your orchestrator, point lesson 38’s ingestion pipeline at the Parquet output, and you’ve stitched two halves of a real ETL together.

Lesson 41 is where we wrap an orchestrator around the lot — but that’s a story for next week.
