itertools + functools: the higher-order toolkit

There are two stdlib modules that pay back the time you spend learning them faster than almost anything else in Python: itertools and functools. The first gives you iterator algebra (slice, chain, group, combine) without rewriting the same nested loops over and over. The second gives you function-level tools (caching, partial application, dispatch) that are normal in functional languages and slightly underused here.

This lesson is the high-leverage subset. Not every recipe in either module, but the dozen or so members you’ll reach for again and again.

itertools: iterator algebra

Nearly everything in itertools returns an iterator, not a list (tee returns a tuple of iterators). That means it’s lazy, it’s memory-light, and it’s single-pass: you have to remember to wrap it in list(...) if you want to look at the values more than once.
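A small sketch of what single-pass means in practice, using chain as a stand-in for any itertools result (the variable names are illustrative):

```python
from itertools import chain

lazy = chain([1, 2], [3])   # nothing has been consumed yet
first_pass = list(lazy)     # walks the iterator to the end
second_pass = list(lazy)    # the iterator is already exhausted

print(first_pass)   # [1, 2, 3]
print(second_pass)  # []
```

If you need the values twice, keep first_pass around rather than the iterator.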

chain: concatenate iterables

from itertools import chain

a: list[int] = [1, 2, 3]
b: list[int] = [4, 5, 6]
c: tuple[int, ...] = (7, 8, 9)

list(chain(a, b, c))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]

chain(*iterables) walks through them in order, one after the other. The big advantage over a + b + c is that the inputs don’t have to be the same type, and nothing is materialized until you iterate. There’s also chain.from_iterable(it) for the case where you have an iterable of iterables — flattening one level deep:

pages: list[list[int]] = [[1, 2], [3, 4], [5, 6]]
list(chain.from_iterable(pages))  # [1, 2, 3, 4, 5, 6]

islice: slice an iterable

You can’t do my_iterator[10:20] because iterators don’t support indexing. islice is the iterator-aware substitute:

from itertools import islice, count

# count() is infinite — islice rescues us
first_five: list[int] = list(islice(count(), 5))      # [0, 1, 2, 3, 4]
window: list[int] = list(islice(count(), 10, 15))     # [10, 11, 12, 13, 14]
every_other: list[int] = list(islice(count(), 0, 10, 2))  # [0, 2, 4, 6, 8]

The signature mirrors regular slicing: islice(it, stop) or islice(it, start, stop[, step]). Unlike list slicing, negative indices aren’t allowed — there’s no “end” of an iterator to count back from.
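To make the negative-index rule concrete, here is a short sketch. The deque-based tail at the end is one possible workaround, not something islice itself offers:

```python
from collections import deque
from itertools import islice

letters = "abcdefgh"

# stop=None means "run to the end": skip two items, keep the rest
rest = list(islice(letters, 2, None))
print(rest)  # ['c', 'd', 'e', 'f', 'g', 'h']

# Negative arguments are rejected with ValueError
error = None
try:
    islice(letters, -3, None)
except ValueError as exc:
    error = exc
    print("rejected:", exc)

# One way to get "the last 3" anyway: a bounded deque keeps only the tail
last_three = list(deque(iter(letters), maxlen=3))
print(last_three)  # ['f', 'g', 'h']
```

The deque approach still has to walk the whole iterable; it just avoids holding all of it in memory.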

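The most common real use: pagination over a generator. You have a function that yields rows from somewhere expensive, and you want page 3 at 50 items per page. Note that islice still pulls and discards every row before the start index; it skips them, it doesn’t avoid producing them: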

from collections.abc import Iterator
from itertools import islice

def fetch_rows() -> Iterator[dict]:
    ...

page_size: int = 50
page_num: int = 3
start: int = (page_num - 1) * page_size
page: list[dict] = list(islice(fetch_rows(), start, start + page_size))

tee: split into N independent iterators

from itertools import tee

source = iter([1, 2, 3, 4, 5])
a, b = tee(source, 2)

list(a)  # [1, 2, 3, 4, 5]
list(b)  # [1, 2, 3, 4, 5]

tee(it, n) returns n iterators that each produce the same values as the original. Useful when you want to walk the same sequence in two ways without materializing it to a list: say, computing a sum and a max in one pass each. Once you’ve called tee, stop using the original iterator; advancing it directly makes the branches miss values.

The catch: tee works by buffering values that one iterator has consumed and another hasn’t yet. If one branch races far ahead of the other, the buffer grows. For balanced consumption, fine. For “consume all of a, then all of b,” you’ve effectively materialized the whole sequence in memory — at which point a list would have been simpler.
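Here is a sketch of the balanced case, where the two branches stay one item apart so the buffer stays tiny. The helper name adjacent_pairs is made up for this example; on Python 3.10+ itertools.pairwise does the same job.

```python
from itertools import tee

def adjacent_pairs(iterable):
    """Yield (x0, x1), (x1, x2), ... from two tee branches kept one step apart."""
    lag, lead = tee(iterable, 2)
    next(lead, None)        # advance one branch by a single item
    return zip(lag, lead)   # lag trails lead by exactly one value

readings = iter([3, 7, 4, 10])
deltas = [b - a for a, b in adjacent_pairs(readings)]
print(deltas)  # [4, -3, 6]
```

Because zip advances both branches in lockstep, neither one races ahead of the other.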

groupby: sequential grouping

from itertools import groupby

data: list[tuple[str, int]] = [
    ("a", 1), ("a", 2), ("b", 3), ("b", 4), ("a", 5)
]

for key, group in groupby(data, key=lambda x: x[0]):
    print(key, list(group))
# a [('a', 1), ('a', 2)]
# b [('b', 3), ('b', 4)]
# a [('a', 5)]

Read that carefully. The key "a" appears twice in the output — once at the start, once at the end. groupby groups consecutive runs of the same key. It does not sort.

If you want SQL-style “group by key” semantics where order doesn’t matter, you have two choices: sort first, or use defaultdict(list) from the previous lesson. groupby is the right tool when the data is already sorted, or when sequential runs are exactly what you want — runs of the same log level, runs of the same status code, etc.
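The two choices side by side, as a sketch on the same data as above. The variable names are illustrative:

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

data = [("a", 1), ("a", 2), ("b", 3), ("b", 4), ("a", 5)]
by_key = itemgetter(0)

# Option 1: sort by the same key, then group. Costs a sort.
grouped_sorted = {
    key: [value for _, value in group]
    for key, group in groupby(sorted(data, key=by_key), key=by_key)
}

# Option 2: defaultdict(list). One pass, input order doesn't matter.
grouped_dict = defaultdict(list)
for key, value in data:
    grouped_dict[key].append(value)

print(grouped_sorted)      # {'a': [1, 2, 5], 'b': [3, 4]}
print(dict(grouped_dict))  # {'a': [1, 2, 5], 'b': [3, 4]}
```

sorted is stable, so values within each key keep their original relative order in both versions.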

accumulate: running totals

from itertools import accumulate
import operator

values: list[int] = [1, 2, 3, 4, 5]

list(accumulate(values))                   # [1, 3, 6, 10, 15] — running sum
list(accumulate(values, operator.mul))     # [1, 2, 6, 24, 120] — running product
list(accumulate(values, max))              # [1, 2, 3, 4, 5] — running max

accumulate(it, func=operator.add) yields the cumulative result of applying func left-to-right. The default is addition, which makes it the lazy version of “running total.” With a custom function, anything fold-shaped works.

The optional initial keyword (3.8+) lets you set a starting value:

list(accumulate([1, 2, 3], initial=100))  # [100, 101, 103, 106]
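A sketch of the "anything fold-shaped" case: a running level that is never allowed to drop below zero. The clamp function and the numbers are invented for illustration.

```python
from itertools import accumulate

def clamp_add(level: int, delta: int) -> int:
    """Apply delta to the running level, but never go below zero."""
    return max(0, level + delta)

deltas = [5, -8, 3, 4, -2]
levels = list(accumulate(deltas, clamp_add, initial=0))
print(levels)  # [0, 5, 0, 3, 7, 5]
```

The last element is what functools.reduce would return for the same function and starting value; accumulate just shows every intermediate step.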

combinations, permutations, product: combinatorics

from itertools import combinations, permutations, product

list(combinations("abc", 2))   # [('a', 'b'), ('a', 'c'), ('b', 'c')]
list(permutations("abc", 2))   # [('a', 'b'), ('a', 'c'), ('b', 'a'), ('b', 'c'), ('c', 'a'), ('c', 'b')]
list(product([1, 2], ["x", "y"]))  # [(1, 'x'), (1, 'y'), (2, 'x'), (2, 'y')]

combinations(it, r): all r-length subsets, order doesn’t matter, no repeats.
permutations(it, r): all r-length orderings, order matters, no repeats.
product(*its, repeat=1): Cartesian product. The cleanest replacement for nested for-loops over independent dimensions.
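A quick sketch that checks the three definitions against the counting formulas, and shows the repeat argument of product. math.comb and math.perm need Python 3.8+.

```python
from itertools import combinations, permutations, product
from math import comb, perm

items = "abcd"

n_combos = len(list(combinations(items, 2)))
n_perms = len(list(permutations(items, 2)))
print(n_combos, comb(4, 2))  # 6 6
print(n_perms, perm(4, 2))   # 12 12

# repeat=3 is shorthand for product("01", "01", "01")
bits = ["".join(p) for p in product("01", repeat=3)]
print(bits)  # ['000', '001', '010', '011', '100', '101', '110', '111']
```

These counts grow fast, so it is worth computing comb or perm first before calling list(...) on a large input.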

Replacing nested loops:

# Before
for env in ["dev", "staging", "prod"]:
    for region in ["us", "eu", "ap"]:
        for size in ["small", "large"]:
            deploy(env, region, size)

# After
for env, region, size in product(
    ["dev", "staging", "prod"],
    ["us", "eu", "ap"],
    ["small", "large"],
):
    deploy(env, region, size)

Same behavior, one level of indentation, easier to add a fourth dimension.

count, cycle, repeat: infinite iterators

from itertools import count, cycle, repeat

count(10)            # 10, 11, 12, 13, ... forever
count(0, 0.25)       # 0, 0.25, 0.5, 0.75, ... forever
cycle("ab")          # 'a', 'b', 'a', 'b', ... forever
repeat("x", 3)       # 'x', 'x', 'x'
repeat("x")          # 'x', 'x', 'x', ... forever

Always pair these with something that stops: islice, zip against a finite iterable, or a break condition. Otherwise you’ve written an infinite loop.
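One way the "pair it with something that stops" rule looks in practice: round-robin assignment with cycle, bounded by zip against a finite list. The worker and job names are hypothetical.

```python
from itertools import cycle

workers = ["w1", "w2", "w3"]       # hypothetical worker names
jobs = ["a", "b", "c", "d", "e"]   # finite, so zip stops when this runs out

assignment = list(zip(jobs, cycle(workers)))
print(assignment)
# [('a', 'w1'), ('b', 'w2'), ('c', 'w3'), ('d', 'w1'), ('e', 'w2')]
```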

A real use: numbering log lines starting at 1, regardless of how many there are.

from itertools import count

with open("log.txt") as f:  # the with-block closes the file when the loop ends
    for n, line in zip(count(1), f):
        print(f"{n:>5}: {line}", end="")

zip against the file iterator stops naturally when the file ends.

batched: chunking (3.12+)

from itertools import batched

rows: list[int] = list(range(13))
for chunk in batched(rows, 5):
    print(chunk)
# (0, 1, 2, 3, 4)
# (5, 6, 7, 8, 9)
# (10, 11, 12)

batched(it, n) was added in Python 3.12. It yields tuples of up to n items; the last one may be shorter. Before that, most codebases carried their own batched or chunked helper. Now you don’t need one. Useful for batching API calls, database inserts, anything with a per-request limit.
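If you are stuck on a version before 3.12, a fallback can be sketched with islice. The name batched_fallback is made up here, and this is an approximation of the stdlib behavior rather than a drop-in copy:

```python
from itertools import islice

def batched_fallback(iterable, n):
    """Yield tuples of up to n items; the final tuple may be shorter."""
    if n < 1:
        raise ValueError("n must be at least one")
    it = iter(iterable)
    # islice takes the next n items; an empty tuple means the input is exhausted
    while chunk := tuple(islice(it, n)):
        yield chunk

chunks = list(batched_fallback(range(13), 5))
print(chunks)  # [(0, 1, 2, 3, 4), (5, 6, 7, 8, 9), (10, 11, 12)]
```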

functools: function tools

reduce: yes, it’s still fine

from functools import reduce
import operator

reduce(operator.add, [1, 2, 3, 4, 5])         # 15
reduce(operator.mul, [1, 2, 3, 4, 5])         # 120
reduce(lambda a, b: a | b, [{1}, {2}, {3}])   # {1, 2, 3}

reduce(func, iterable[, initial]) folds the iterable left-to-right with the binary function: func(func(func(x0, x1), x2), x3) and so on. It’s Python’s spelling of a left fold.

Guido moved reduce out of builtins in Python 3 because, in his view, most uses are clearer as a loop. He’s right most of the time: sum, min, max, and any/all cover the common cases. But for legitimate folds that aren’t one of those (set unions, dictionary merges, custom monoids), reduce is exactly the right tool. Don’t avoid it on principle.

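# Merge a list of dicts (later keys win)
from functools import reduce
base, env_overrides, cli_overrides = {"a": "1"}, {"a": "2"}, {"b": "3"}  # illustrative values
configs: list[dict[str, str]] = [base, env_overrides, cli_overrides]
final: dict[str, str] = reduce(lambda a, b: {**a, **b}, configs, {})
# final == {"a": "2", "b": "3"}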
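The third argument in that merge matters more than it looks. A sketch of why, using set union (the helper name union_all is invented for the example):

```python
from functools import reduce
import operator

def union_all(sets):
    """Union any number of sets; the initial value covers the empty case."""
    return reduce(operator.or_, sets, set())

print(union_all([{1}, {2, 3}, {3, 4}]))  # {1, 2, 3, 4}
print(union_all([]))                     # set()

# Without an initial value, an empty input has nothing to return
failed = False
try:
    reduce(operator.or_, [])
except TypeError as exc:
    failed = True
    print("rejected:", exc)
```

Passing the identity element for your operation (empty set, empty dict, 0, 1) as the initial value is a cheap way to make a fold safe on empty input.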

partial: pre-fill arguments

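from collections.abc import Callable
from functools import partial

def power(base: float, exp: float) -> float:
    return base ** exp

# callable is a builtin function, not a type; Callable is the annotation
square: Callable[[float], float] = partial(power, exp=2)
cube: Callable[[float], float] = partial(power, exp=3)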

square(5)  # 25
cube(5)    # 125

partial(func, *args, **kwargs) returns a new callable with some arguments fixed. Common uses:

Adapting a function to a callback signature that takes fewer arguments.
Building specialized variants without writing wrapper functions.
Pre-configuring a logger, an HTTP session, a database connection.

import requests
from functools import partial

api_get = partial(requests.get, timeout=10, headers={"User-Agent": "myapp/1.0"})
api_get("https://example.com/users")  # timeout and headers already applied
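A few details of partial that the examples above don’t show, sketched with a made-up notify function: positional arguments are fixed from the left, keyword pre-fills can be overridden at call time, and the partial object exposes what it wraps.

```python
from functools import partial

def notify(channel: str, message: str) -> str:
    """Hypothetical callback target, used only for this illustration."""
    return f"[{channel}] {message}"

# Positional pre-fill: "ops" becomes the first argument
alert = partial(notify, "ops")
print(alert("disk full"))  # [ops] disk full

# Keyword pre-fill acts as a default that the caller can still override
general = partial(notify, channel="general")
print(general(message="hi"))                    # [general] hi
print(general(message="hi", channel="random"))  # [random] hi

# The partial object remembers what it wraps
print(alert.func is notify, alert.args, alert.keywords)  # True ('ops',) {}
```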

cache and lru_cache: memoization

from functools import cache, lru_cache

@cache
def fib(n: int) -> int:
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

@lru_cache(maxsize=128)
def fetch_user(user_id: int) -> dict:
    ...  # expensive call

@cache (3.9+) is an unbounded cache, equivalent to lru_cache(maxsize=None): every call with new arguments is remembered until the process exits or you call cache_clear(). @lru_cache(maxsize=N) is the bounded version: keep the N most recently used results, evict the rest.

Use unbounded @cache when:

The argument space is small and known (small ints, fixed strings).
You’re memoizing a pure recursive function.

Use bounded @lru_cache when:

The argument space could grow unboundedly (user IDs, URLs, file paths).
You need a memory ceiling.

Both require every argument to be hashable. Calling a cached function with a list or a dict raises TypeError (unhashable type) at the call site; convert to a tuple or frozenset first if you need to cache on the contents.
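A sketch of eviction and the hashability rule in action. The function names are invented; cache_info() and cache_clear() are the inspection hooks that both decorators attach to the wrapped function.

```python
from functools import lru_cache

@lru_cache(maxsize=2)
def slow_square(n: int) -> int:
    return n * n

# 3 arrives with the cache full, so 2 (the least recently used) is evicted
for n in (1, 2, 1, 3):
    slow_square(n)

info = slow_square.cache_info()
print(info)  # CacheInfo(hits=1, misses=3, maxsize=2, currsize=2)

@lru_cache(maxsize=None)
def total(values: tuple[int, ...]) -> int:
    return sum(values)

rejected = False
try:
    total([1, 2, 3])  # a list is unhashable, so the cache lookup fails
except TypeError as exc:
    rejected = True
    print("rejected:", exc)

print(total(tuple([1, 2, 3])))  # 6

slow_square.cache_clear()  # drop every entry, e.g. between tests
```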

singledispatch: function overloading by type

from functools import singledispatch
from pathlib import Path
import json
import tomllib  # standard library since Python 3.11

@singledispatch
def load_config(source) -> dict:
    raise TypeError(f"Cannot load config from {type(source).__name__}")

@load_config.register
def _(source: Path) -> dict:
    text: str = source.read_text()
    if source.suffix == ".json":
        return json.loads(text)
    if source.suffix == ".toml":
        return tomllib.loads(text)
    raise ValueError(f"Unsupported file type: {source.suffix}")

@load_config.register
def _(source: dict) -> dict:
    return source

@load_config.register
def _(source: str) -> dict:
    return json.loads(source)

@singledispatch lets you write a function that dispatches on the type of its first argument. The base function is the fallback. Each @func.register adds an implementation for a specific type, picked up via the type annotation on the parameter.

Calling load_config(Path("app.toml")) runs the Path branch. Calling load_config({"key": "value"}) runs the dict branch. load_config(42) raises TypeError from the base.

This is Python’s clean answer to “I want to handle different input types differently without a chain of isinstance checks.” It’s a little heavier than a match statement, but it composes — third-party code can register a handler for its own type without touching your code.

AI assistance note. Coding assistants are good at recognizing when an if isinstance(...) elif isinstance(...) elif ... chain could become a @singledispatch. They’re also good at over-using it. If you have two branches, the chain is fine. The dispatch pays off at three or more, especially when the function might be extended later.

wraps: keep your decorator honest

Already covered in lesson 5. Quick reminder: @functools.wraps(fn) on the inner function of a decorator copies __name__, __doc__, and a few other attributes from the wrapped function to the wrapper. Without it, every decorated function looks like wrapper to debuggers, docs generators, and help().
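As a refresher, a side-by-side sketch of a decorator with and without wraps. The decorator and function names are made up for the comparison.

```python
import functools

def with_wraps(fn):
    @functools.wraps(fn)  # copies __name__, __doc__, etc. onto wrapper
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)
    return wrapper

def without_wraps(fn):
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)
    return wrapper

@with_wraps
def add(a, b):
    """Return a + b."""
    return a + b

@without_wraps
def sub(a, b):
    """Return a - b."""
    return a - b

print(add.__name__, add.__doc__)  # add Return a + b.
print(sub.__name__, sub.__doc__)  # wrapper None
print(add.__wrapped__(2, 3))      # 5  (wraps also keeps a reference to the original)
```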

When the stdlib isn’t enough: more-itertools

The PyPI package more-itertools is the production extension. Things like chunked, windowed, unique_everseen, partition, take, interleave: helpers in the spirit of the “you could write this yourself” recipes in the official itertools documentation (several of them started life there), packaged up so you don’t have to.

from more_itertools import unique_everseen, windowed, partition

list(unique_everseen([1, 2, 1, 3, 2, 4]))  # [1, 2, 3, 4] — preserves first-seen order
list(windowed(range(5), 3))                # [(0,1,2), (1,2,3), (2,3,4)]
evens, odds = partition(lambda n: n % 2, range(10))

If a teammate or AI assistant suggests writing a custom helper that sounds itertools-shaped, check more-itertools first.

Putting it together

A small example that uses pieces of both modules — process a stream of events into hourly summaries:

from itertools import groupby
from functools import reduce
from collections import Counter
from datetime import datetime
from dataclasses import dataclass

@dataclass(frozen=True, slots=True)
class Event:
    timestamp: datetime
    user_id: int
    action: str

def hour_key(e: Event) -> datetime:
    return e.timestamp.replace(minute=0, second=0, microsecond=0)

def summarize(events: list[Event]) -> dict[datetime, Counter[str]]:
    events_sorted: list[Event] = sorted(events, key=lambda e: e.timestamp)
    grouped = groupby(events_sorted, key=hour_key)
    return {
        hour: reduce(lambda c, e: c + Counter([e.action]), group, Counter())
        for hour, group in grouped
    }

sorted to make the data fit groupby’s sequential-runs contract. groupby to chunk by hour. reduce to fold each hour’s events into a Counter. Three modules, ten lines, no nested conditionals.

That’s the higher-order toolkit. Use it when it makes the intent clearer; reach for a plain loop when it doesn’t.

References: itertools — Functions creating iterators, functools — Higher-order functions, more-itertools, PEP 443 — Single-dispatch generic functions. Retrieved 2026-05-01.