Functional vs non-functional requirements

In the last lesson we landed on a working definition: architecture is the set of decisions that are expensive to change later. Now we have to ask the obvious follow-up. What drives those decisions? What’s the input you actually use to figure out which database, which deployment topology, which communication style is right for your system?

The wrong answer, and the one most teams reach for in week one of a project, is “the feature list.” Product owners hand engineering a list of things the system has to do. Sign up. Log in. Browse a catalogue. Place an order. Send a confirmation email. The team reads the list, sketches a quick diagram, picks a stack, and starts building. Six months later they discover they made the wrong choice on three of the year-rung decisions and now they’re paying for it.

The reason this happens is that feature lists, on their own, do not contain enough information to design a system. Two systems with identical feature lists can require radically different architectures. The thing that separates them is the second class of requirements, the one that gets ignored or underspecified, and that is the subject of this lesson.

Functional requirements: what the system does

Functional requirements describe the behaviour of the system. They answer the question “what does it do?”

A small e-commerce site might have functional requirements like:

Users can create an account with email and password.
Users can browse a product catalogue.
Users can add products to a cart and check out.
Users can pay by credit card or SEPA direct debit.
The system sends an order confirmation email after a successful checkout.
Admins can add and edit products via an internal panel.
The system supports VAT calculation for EU member states.

These are the bullets that go on the product owner’s roadmap and into the user stories. They are essential: they’re how the team and the business agree on what’s being built. They are not enough to design the system. Every one of those bullets could be implemented as a single Python script running on someone’s laptop, or as a globally-distributed system handling a million transactions per minute. The bullets don’t tell you which.

There’s a famous exercise that engineering interviewers love: “design Twitter.” If you read the brief literally, the functional requirements are roughly:

Users can post a short message.
Users can follow other users.
Users can see a feed of messages from people they follow.

Those three bullets. That’s the entire feature list. You could implement it in an afternoon as a Postgres database, a Flask app, and a SELECT ... ORDER BY created_at DESC LIMIT 50 query. Done.
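To make the afternoon version concrete, here is a minimal sketch of the feed query. It uses the standard-library sqlite3 module as a stand-in for Postgres so it runs anywhere; the table names, column names, and helper functions are assumptions made up for illustration, not a reference schema.

```python
import sqlite3

def build_demo_db() -> sqlite3.Connection:
    # In-memory database; the schema below is hypothetical.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE follows (follower TEXT, followee TEXT);
        CREATE TABLE messages (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            author TEXT,
            body TEXT,
            created_at INTEGER
        );
    """)
    return conn

def follow(conn, follower, followee):
    conn.execute("INSERT INTO follows VALUES (?, ?)", (follower, followee))

def post(conn, author, body, created_at):
    conn.execute(
        "INSERT INTO messages (author, body, created_at) VALUES (?, ?, ?)",
        (author, body, created_at),
    )

def feed(conn, user, limit=50):
    # Feed computed on read: join follows to messages, newest first.
    rows = conn.execute(
        """
        SELECT m.author, m.body
        FROM messages AS m
        JOIN follows AS f ON f.followee = m.author
        WHERE f.follower = ?
        ORDER BY m.created_at DESC
        LIMIT ?
        """,
        (user, limit),
    )
    return rows.fetchall()

conn = build_demo_db()
follow(conn, "alice", "bob")
follow(conn, "alice", "carol")
post(conn, "bob", "first", 1)
post(conn, "dave", "not followed", 2)
post(conn, "carol", "second", 3)
alice_feed = feed(conn, "alice")  # carol's message, then bob's
```

Nothing in this sketch says anything about how many users, how fast, or how reliably. It satisfies the feature list and nothing else.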

Of course, that’s not what makes the design problem interesting. The interesting version of “design Twitter” only emerges when the interviewer adds the second category of requirements: 500 million users, 100 million daily active users, peak load of half a million tweets per minute, average feed reads of two billion per day, p99 latency under 200 milliseconds, 99.9% availability target, regulatory data residency in three jurisdictions. Now the design is interesting. Now the database choice matters, the caching strategy matters, the geographic deployment matters, and the question of “should the feed be computed on read or pre-computed on write” becomes the central architectural debate.

Same feature list. Wildly different architecture. The thing that changed is the non-functional requirements.

Non-functional requirements: how well

Non-functional requirements describe the qualities of the system. They answer the question “how well does it do what it does, and under what conditions?”

This is the category of requirements that drives architecture. Get them right and the architectural choices mostly fall out of them. Get them wrong, or worse, ignore them, and you’ll build the right system for the wrong universe.

There are roughly seven non-functional qualities that come up repeatedly when you’re shaping a system. Different authors organise them slightly differently and the boundaries are a bit blurry, but this is the working set.

Latency

Latency is how long a single request takes to complete, measured from the user’s perspective. It’s almost always quoted as a percentile, not an average, because averages hide the tail.

A typical latency target looks like: “p50 under 80 ms, p95 under 200 ms, p99 under 500 ms” for an API endpoint. That means half of all requests finish in under 80 ms, 95% in under 200 ms, 99% in under 500 ms. The 1% beyond that is the tail, and the tail is where bad user experiences live. The median (p50) and the mean are the numbers that look good in slides; the two sit close together only when the distribution is roughly symmetric, and real latency distributions usually have a long right tail that neither of them reveals. The p99 is the number you actually feel.
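A small sketch of why percentiles are quoted instead of averages. The sample data is invented (97 fast requests and 3 slow ones), and the percentile helper uses the nearest-rank definition, which is one of several common conventions; monitoring tools may interpolate differently.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical measurements: 97 requests at 50 ms, 3 requests at 2000 ms.
latencies_ms = [50] * 97 + [2000] * 3

mean_ms = sum(latencies_ms) / len(latencies_ms)  # 108.5
p50_ms = percentile(latencies_ms, 50)            # 50
p95_ms = percentile(latencies_ms, 95)            # 50
p99_ms = percentile(latencies_ms, 99)            # 2000
```

The median and even the p95 say the endpoint is fast. The mean hints that something is off without saying what. Only the p99 shows that three users in a hundred waited two seconds.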

Latency targets drive: where you put your servers (closer to users equals lower latency), whether you cache aggressively, whether you precompute results, whether you use synchronous or asynchronous communication, and whether you’re allowed to make a database call at all on the hot path.

Throughput

Throughput is how many requests, events, or units of work the system handles per unit of time. Often quoted as requests per second (RPS) for APIs, transactions per second (TPS) for databases, messages per second for queues, or events per second for streaming systems.

A typical target: “5,000 requests per second sustained, 20,000 per second peak, with peak lasting up to 30 minutes during a flash sale.” The interesting bit is the ratio between sustained and peak, and the duration of peak. A system that handles 5,000 RPS happily but melts at 8,000 RPS is fine if your traffic is flat and a disaster if you have spikes.

Throughput targets drive: how you scale (vertically or horizontally), whether you need a load balancer, how you partition data, how you size your worker pools, and whether you need to absorb bursts with a queue.
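Here is a back-of-the-envelope sketch of what absorbing a burst with a queue means for the example target above. The 8,000 RPS processing capacity is an assumption picked for illustration, and the model is deliberately crude: constant arrival rates and no retries or timeouts.

```python
def backlog_after_burst(peak_rps: int, capacity_rps: int, burst_seconds: int) -> int:
    # Requests that arrive faster than they can be processed pile up in the queue.
    return max(0, peak_rps - capacity_rps) * burst_seconds

def drain_seconds(backlog: int, capacity_rps: int, sustained_rps: int) -> float:
    # After the burst, only the spare capacity above normal traffic works off the backlog.
    spare_rps = capacity_rps - sustained_rps
    if spare_rps <= 0:
        raise ValueError("no spare capacity: the backlog never drains")
    return backlog / spare_rps

# Targets from the text: 5,000 RPS sustained, 20,000 RPS peak for 30 minutes.
# Hypothetical capacity: 8,000 RPS.
backlog = backlog_after_burst(peak_rps=20_000, capacity_rps=8_000, burst_seconds=30 * 60)
drain = drain_seconds(backlog, capacity_rps=8_000, sustained_rps=5_000)
# backlog is 21,600,000 queued requests; drain is 7,200 seconds, i.e. two hours.
```

Under these assumptions a queue alone turns a 30-minute flash sale into a two-hour wait for the last request in line, which is why the duration of the peak matters as much as its height.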

Availability

Availability is the percentage of time the system is up and serving requests successfully. It’s quoted in nines.

99% availability is “two nines”: 3.65 days of downtime per year. Acceptable for an internal tool.
99.9% is “three nines”: 8.76 hours per year. Typical SaaS.
99.95% is “three and a half nines”: 4.38 hours per year. Common SLO for paid B2B.
99.99% is “four nines”: 52.6 minutes per year. Aggressive. You can’t deploy carelessly anymore.
99.999% is “five nines”: 5.26 minutes per year. Telecoms. You need real redundancy at every layer.
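The downtime figures in the list above come straight from arithmetic on a 365-day year. A minimal sketch, with function names chosen here for illustration:

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years

def downtime_hours_per_year(availability_percent: float) -> float:
    # The fraction of the year the system is allowed to be down, in hours.
    return (1 - availability_percent / 100) * HOURS_PER_YEAR

def downtime_minutes_per_year(availability_percent: float) -> float:
    return downtime_hours_per_year(availability_percent) * 60

two_nines_days = downtime_hours_per_year(99) / 24       # about 3.65 days
three_nines_hours = downtime_hours_per_year(99.9)       # about 8.76 hours
four_nines_minutes = downtime_minutes_per_year(99.99)   # about 52.6 minutes
five_nines_minutes = downtime_minutes_per_year(99.999)  # about 5.26 minutes
```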

The thing that catches people out is that each additional nine costs roughly an order of magnitude more in engineering effort. Going from 99% to 99.9% is mostly a discipline thing: don’t ship bugs to production, have a rollback story, monitor what you have. Going from 99.9% to 99.99% is a redundancy thing: multiple availability zones, automatic failover, no single points of failure. Going from 99.99% to 99.999% is an everything thing: multi-region, formal change management, chaos engineering, pages on a Sunday morning.

Availability targets drive: redundancy (how many copies of each service), deployment topology (single AZ, multi-AZ, multi-region), how you handle deployments (blue/green, canary, rolling), and whether you can take the system down at all for maintenance.
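The link between redundancy and nines can be sketched with two textbook formulas. Both assume that failures are independent, which real systems rarely achieve (shared deployments, shared dependencies, correlated outages), so treat the results as an upper bound and not a prediction.

```python
def parallel_availability(single: float, copies: int) -> float:
    # Redundant copies: the service is down only if every copy is down at once.
    return 1 - (1 - single) ** copies

def serial_availability(*components: float) -> float:
    # A request that needs every component succeeds only if all of them are up.
    result = 1.0
    for availability in components:
        result *= availability
    return result

two_copies = parallel_availability(0.99, copies=2)  # about 0.9999
chain = serial_availability(0.999, 0.999)           # about 0.998
```

Two independent 99% instances behind a load balancer give roughly four nines on paper, while chaining two 99.9% services on the request path drops you below three.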

Durability

Durability is the probability that data, once committed, will still be there when you ask for it. It’s quoted in nines, like availability, but the numbers are bigger.

S3, the canonical example, is documented as designed for “11 nines of durability.” That’s 99.999999999% per object per year. The interpretation is that if you store 10,000,000 objects, you’d expect to lose one every 10,000 years on average. This is achieved by storing every object redundantly across multiple physically separated facilities and continuously verifying integrity.
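The “one object every 10,000 years” reading is just expected-value arithmetic. This sketch assumes the eleven nines are an annual, per-object figure and uses exact fractions to avoid floating-point noise at this scale.

```python
from fractions import Fraction

# 11 nines of durability means an annual loss probability of 1 in 10^11 per object.
annual_loss_probability = Fraction(1, 10**11)

stored_objects = 10_000_000
expected_losses_per_year = stored_objects * annual_loss_probability  # 1/10000
years_per_expected_loss = 1 / expected_losses_per_year               # 10000
```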

A relational database with no backups has durability roughly equal to the durability of the disk it lives on, which is “good until the disk dies, then catastrophic.” That’s why backups, replication, and cross-region replication exist.

Durability targets drive: backup strategy, replication topology, how often you snapshot, where you store snapshots, and whether you do point-in-time recovery.

Consistency

Consistency is the guarantee about what readers see after a writer updates the data. There’s a whole zoo of consistency models (we’ll spend much of modules 3 and 8 on this); the headline distinction for now is strong consistency versus eventual consistency.

Strong consistency: after a write completes, every subsequent read sees the new value. No exceptions. This is what a single Postgres database gives you, and what most developers naively assume they have everywhere.

Eventual consistency: after a write completes, reads might see the old value for some period of time, but eventually they’ll all see the new value. This is what you get from most distributed key-value stores, from cross-region replication, and from anything involving caching.
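A toy sketch of what a stale read looks like. The Primary and LaggingReplica classes are invented for illustration and model replication as a log that the replica applies only when told to; real databases replicate continuously and with far more machinery.

```python
class Primary:
    def __init__(self):
        self.data = {}
        self.log = []  # ordered record of writes, shipped to replicas later

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))

class LaggingReplica:
    def __init__(self, primary):
        self.primary = primary
        self.data = {}
        self.applied = 0  # how many log entries this replica has applied

    def read(self, key):
        return self.data.get(key)

    def catch_up(self):
        # Apply every log entry the replica has not seen yet, in order.
        for key, value in self.primary.log[self.applied:]:
            self.data[key] = value
        self.applied = len(self.primary.log)

primary = Primary()
replica = LaggingReplica(primary)

primary.write("display_name", "Ada")
stale_read = replica.read("display_name")  # None: the write has not propagated
replica.catch_up()
fresh_read = replica.read("display_name")  # "Ada": the replica has converged
```

The window between the write and catch_up is the “some period of time” in the definition. A user who saves their profile and is then served by the replica during that window sees the old value.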

The CAP theorem (we’ll cover it in lesson 27) says that in a distributed system, when the network partitions, you have to choose between consistency and availability. You cannot have both. Most production systems choose availability and live with eventual consistency, but the choice has to be conscious because the consequences ripple through the entire user experience.

Consistency targets drive: choice of database, whether you can use a cache, how you handle reads-after-writes, and how user-facing flows recover when the data hasn’t propagated yet.

Security

Security is, broadly, about who is allowed to do what, how identities are established and verified, how data is protected at rest and in transit, and how the system resists abuse. It’s a non-functional requirement in the sense that it doesn’t change what the system does for legitimate users; it changes the constraints around how it does it.

Security requirements typically cover: authentication (who are you), authorization (what can you do), encryption in transit (TLS everywhere), encryption at rest (disk and database encryption), secrets management (where do API keys live), audit logging (who did what when), compliance (GDPR, HIPAA, PCI-DSS, SOC 2 depending on your sector), and threat modelling (what attacks should we worry about).

Security drives: choice of auth provider, network topology (public internet versus private VPC), whether you use a managed service or roll your own, where data physically lives (data residency), and the entire deployment process.

Evolvability and maintainability

Evolvability is how easy the system is to change over time. Maintainability is the operational cousin: how easy it is to keep running. These two are sometimes split and sometimes lumped, and they cover the long-tail concerns that don’t show up on day one but dominate years two through ten.

A system with good evolvability has clear boundaries between modules, low coupling, automated tests at the right level, decent documentation, and the ability to absorb new requirements without major rewrites. A system with good maintainability has good observability, predictable deployments, useful logs, and a small enough surface area that a new engineer can be productive in two weeks instead of two months.

Evolvability and maintainability are easy to deprioritise because they don’t show up on the launch checklist. They are also the difference between a system that is still helping the business in five years and one that gets rewritten because nobody can touch it without breaking something.

Trade-offs are mandatory, not optional

The trap that catches inexperienced architects is treating non-functional requirements as a wishlist where every quality is set to “high” and then declaring victory. Real systems involve trade-offs. Tightening one quality almost always loosens another.

A short list of the trade-offs you cannot avoid:

Higher availability costs money. Multi-AZ deployment is roughly 2x the cost of single-AZ. Multi-region is roughly 3-4x. The savings from being down less often have to justify the spend.
Higher availability often costs consistency. If you want to keep serving traffic during a network partition, you have to accept that some reads will be stale. CAP again.
Lower latency often costs throughput. A system tuned for the lowest possible per-request latency does less work in parallel and tops out at lower aggregate throughput. A system tuned for maximum throughput typically batches and queues, which raises per-request latency.
Stronger consistency costs latency. Synchronously replicating a write across three machines means the user waits for all three. Asynchronously replicating gives you faster writes and weaker guarantees.
Higher security costs developer velocity. Every secret rotation, every IAM policy, every WAF rule is friction. Worth it, but real.
Better evolvability costs upfront effort. Drawing clean module boundaries on day one is slower than just shipping a single file. The payoff comes in year two.
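The latency versus throughput item in the list above can be illustrated with a toy batching model. The numbers are assumptions: a single worker, a fixed 10 ms overhead per batch, 1 ms of work per item, and results that become available only when the whole batch finishes. Time spent waiting for a batch to fill is ignored.

```python
def batch_latency_ms(batch_size: int, overhead_ms: float = 10.0, per_item_ms: float = 1.0) -> float:
    # Every item in the batch waits for the whole batch to be processed.
    return overhead_ms + per_item_ms * batch_size

def throughput_per_second(batch_size: int, overhead_ms: float = 10.0, per_item_ms: float = 1.0) -> float:
    # Items completed per second when batches run back to back on one worker.
    return 1000.0 * batch_size / batch_latency_ms(batch_size, overhead_ms, per_item_ms)

# batch size -> (latency in ms, items per second)
profile = {n: (batch_latency_ms(n), throughput_per_second(n)) for n in (1, 10, 90)}
# 1 -> 11 ms, about 91 items/s; 10 -> 20 ms, 500 items/s; 90 -> 100 ms, 900 items/s
```

Bigger batches amortise the fixed overhead, so throughput climbs toward the 1,000 items per second ceiling while every individual item waits longer.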

You don’t escape these trade-offs by being clever. You navigate them by picking what to optimise for in this system, given this business, at this stage of growth. The architecture is the shape of those choices.

Eliciting NFRs from product owners

Most product specs you receive will have functional requirements front and centre and non-functional requirements completely absent. This is normal and not malicious; product owners think in features because that’s what their job rewards them for. Your job, before you draw a single box, is to elicit the non-functional requirements that aren’t in the doc.

Here is a working list of questions to ask, ranked by how often they reveal something that changes the architecture.

How many users do we expect at launch? In six months? In two years? Anchors throughput targets and informs scaling strategy.
What’s the peak load relative to the average? Flash sales, business hours, time zones, viral spikes. A 10x peak/average ratio is normal; a 100x ratio means you have to design for the spike.
What’s the latency budget for the most-used endpoint? “Feels instant” usually means under 100 ms. “Feels responsive” usually means under 300 ms. “Feels slow” is anything past a second.
What happens if the system is down for 5 minutes? For an hour? For a day? This is the question that calibrates availability. If the answer to “down for an hour” is “shrug, it’s an internal tool,” you don’t need four nines. If the answer is “we lose €2M and the regulator calls,” you do.
What happens if we lose data? Calibrates durability. Some data is critical (financial transactions, user-uploaded photos). Some data is regenerable (cached search results, computed aggregates).
Are there hard latency or location requirements from regulators? GDPR, data residency, China’s Cybersecurity Law, etc. These are often year-rung decisions you cannot back out of.
What’s the realistic worst-case bad actor? A bored teenager, a sophisticated criminal group, a nation-state, an internal employee. The answer changes the security posture significantly.
How often do we expect to change the system? Every week? Every quarter? Once in five years? Calibrates evolvability investment.

If you ask these questions and the product owner’s answer is “I don’t know, what do you recommend?” that’s actually fine. It means you have the floor to propose targets, get them written down, and refer back to them when someone later complains that the system is “too expensive” or “too slow.” NFRs are a contract with the rest of the business as much as they are an engineering input.
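One lightweight way to get targets written down is to keep them next to the code in a form that can be checked. The sketch below is an assumption about how that might look; the NfrTargets class, its fields, and the example numbers are hypothetical, and a real team might prefer an SLO document or a monitoring configuration instead.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NfrTargets:
    # Agreed targets for one service; frozen so they cannot drift silently at runtime.
    p99_latency_ms: float
    peak_rps: int  # recorded for capacity planning; not checked below
    availability_percent: float

def violations(targets: NfrTargets, p99_latency_ms: float, availability_percent: float) -> list[str]:
    # Compare measured values against the agreed targets and describe each miss.
    problems = []
    if p99_latency_ms > targets.p99_latency_ms:
        problems.append(
            f"p99 latency {p99_latency_ms} ms exceeds target {targets.p99_latency_ms} ms"
        )
    if availability_percent < targets.availability_percent:
        problems.append(
            f"availability {availability_percent}% is below target {targets.availability_percent}%"
        )
    return problems

checkout_targets = NfrTargets(p99_latency_ms=500, peak_rps=20_000, availability_percent=99.9)
report = violations(checkout_targets, p99_latency_ms=620, availability_percent=99.95)
```

When someone says the system is “too slow,” the conversation starts from the report and the agreed numbers, not from impressions.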

flowchart TD
    A[New feature request] --> B{What's the load?}
    B -->|low, less than 100 RPS| C[Single instance is fine]
    B -->|medium, 100 to 5000 RPS| D{What's the latency budget?}
    B -->|high, 5000 RPS plus| E[Horizontal scaling, caching, sharding]
    D -->|under 100 ms| F[Cache aggressively, colocate data]
    D -->|under 1 second| G[Standard web stack]
    D -->|background, async OK| H[Queue and worker pool]
    C --> I{What's the availability target?}
    F --> I
    G --> I
    H --> I
    E --> I
    I -->|99 percent| J[Single AZ deployment]
    I -->|99.9 percent| K[Multi-AZ with health checks]
    I -->|99.99 percent plus| L[Multi-region, formal SLOs]
    J --> M[Architectural choices follow]
    K --> M
    L --> M

The diagram is deliberately simplistic. The real version has more branches and more interactions between the qualities, but the shape of the reasoning is right: you walk down from requirements to architectural implications, and you do it explicitly, on paper, with the product owner in the room.
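The same walk can be written as a function, which makes the thresholds explicit and easy to argue about. This is a sketch of the diagram, not a design tool. The boundary handling is an assumption where the diagram is ambiguous: exactly 5,000 RPS counts as medium, a latency budget of None means background work where async is acceptable, synchronous budgets of a second or more fall through to the standard web stack, and availability targets between the named tiers round down to the lower tier.

```python
from typing import Optional, Tuple

def capacity_choice(rps: float, latency_budget_ms: Optional[float]) -> str:
    if rps < 100:
        return "Single instance is fine"
    if rps > 5000:
        return "Horizontal scaling, caching, sharding"
    # Medium load: the latency budget decides.
    if latency_budget_ms is None:  # background work, async is acceptable
        return "Queue and worker pool"
    if latency_budget_ms < 100:
        return "Cache aggressively, colocate data"
    return "Standard web stack"

def deployment_choice(availability_percent: float) -> str:
    if availability_percent >= 99.99:
        return "Multi-region, formal SLOs"
    if availability_percent >= 99.9:
        return "Multi-AZ with health checks"
    return "Single AZ deployment"

def sketch_architecture(
    rps: float, latency_budget_ms: Optional[float], availability_percent: float
) -> Tuple[str, str]:
    return capacity_choice(rps, latency_budget_ms), deployment_choice(availability_percent)

internal_tool = sketch_architecture(rps=50, latency_budget_ms=300, availability_percent=99.0)
storefront = sketch_architecture(rps=1000, latency_budget_ms=80, availability_percent=99.9)
```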

What you should walk away from this lesson with

Functional requirements are necessary and not sufficient. Non-functional requirements are what drive the shape of the system. There are roughly seven of them: latency, throughput, availability, durability, consistency, security, evolvability. Each one trades off against at least one of the others. Your job before you design anything is to pin down realistic numerical targets for the ones that matter, by asking the right questions of the people who own the business outcome.

In the next lesson we’ll switch from “what to ask” to “how to draw,” and look at the C4 model: a four-zoom-level diagramming convention that makes “let’s sketch the system” actually productive.

References

Bass, Clements, Kazman. Software Architecture in Practice, 4th edition (2021), chapters 4 to 13 cover quality attributes in depth.
Beyer et al. Site Reliability Engineering (2016), chapter 4 on Service Level Objectives.
Werner Vogels, “Eventually Consistent” (CACM 2009), the foundational article on consistency trade-offs.
ISO/IEC 25010 quality model (2011), the standardised vocabulary for software quality attributes.