How partitioning actually works in SQL Server

If you’ve ever inherited a fact table with two billion rows and been told “just add partitioning,” you already know the docs make this sound far simpler than it is. Partitioning in SQL Server is deeply useful and deeply fiddly, and most of the tutorials on the internet leave out the parts that actually bite you on a Tuesday night in production.

This is the version I wish someone had handed me on day one. If it were simpler, I wouldn’t have spent three months fixing a sliding window job whose previous owner had typed <= where they meant <.

What partitioning is — and isn’t

Partitioning splits one logical table into many physical chunks based on the value of a column. To anyone running SELECT, the table still looks like one table. To the storage engine, it’s a row of buckets. Queries that filter on the partition column only open the buckets they need, which is called partition elimination.

Two things partitioning is not:

A free performance win. A bad partition scheme is slower than no partitioning at all, because the optimizer now has to deal with partition elimination on top of everything else, and if your query doesn’t filter on the partition column you get zero of the benefit and all of the complexity.
A replacement for an index. Partitioning helps you prune whole chunks of the table. Indexes help you find rows inside a chunk. You almost always want both.

The real win is operational. You can swap a billion rows in or out of a partitioned table in milliseconds using ALTER TABLE ... SWITCH, because all SQL Server has to do is re-point a metadata pointer. No data movement. That is the main reason people use partitioning on SQL Server. Not query speed — swap speed.

The three pieces you need

Every partitioned table in SQL Server is built from three things, and they always appear in the same order:

A partition function — defines the boundaries between buckets.
A partition scheme — maps those buckets onto filegroups.
A table or index built on the scheme.

Here’s the smallest end-to-end example. We’re partitioning an Orders table by month on OrderDate.

-- 1. Partition function: one boundary per month
CREATE PARTITION FUNCTION pf_OrdersByMonth (DATETIME2(0))
AS RANGE RIGHT FOR VALUES (
    '2025-01-01', '2025-02-01', '2025-03-01',
    '2025-04-01', '2025-05-01', '2025-06-01'
);

-- 2. Partition scheme: send every partition to PRIMARY
--    (in production you'd usually spread these across filegroups)
CREATE PARTITION SCHEME ps_OrdersByMonth
AS PARTITION pf_OrdersByMonth
ALL TO ([PRIMARY]);

-- 3. The table itself, built on the scheme
CREATE TABLE dbo.Orders (
    OrderId     BIGINT        NOT NULL,
    CustomerId  INT           NOT NULL,
    OrderDate   DATETIME2(0)  NOT NULL,
    Amount      DECIMAL(18,2) NOT NULL,
    CONSTRAINT pk_Orders PRIMARY KEY CLUSTERED (OrderDate, OrderId)
        ON ps_OrdersByMonth (OrderDate)
);

Two things to read twice before you go further:

RANGE RIGHT vs RANGE LEFT. RANGE RIGHT puts the boundary value into the upper partition. '2025-02-01 00:00:00' lives in the February bucket, not January. This is almost always what you want for date partitioning, because it lets you describe a month as a clean half-open interval [start, next_start). Using RANGE LEFT with dates creates off-by-one bugs that take a frustrating afternoon to debug. Just always use RANGE RIGHT for time partitioning. Life is too short.
The partition column has to be in the clustered key. That’s why the primary key above is (OrderDate, OrderId) instead of (OrderId). You’ll see this show up in every SQL Server partition tutorial and it’ll feel wrong every single time. Live with it. If you skip this, SQL Server will reject the CREATE TABLE with an error that sounds like it was translated from Latin.

Sanity-checking with `$PARTITION`

Before you do any operation on a partitioned table — split, merge, switch — run this:

SELECT
    $PARTITION.pf_OrdersByMonth(OrderDate) AS partition_number,
    MIN(OrderDate)  AS min_date,
    MAX(OrderDate)  AS max_date,
    COUNT(*)        AS row_count
FROM dbo.Orders
GROUP BY $PARTITION.pf_OrdersByMonth(OrderDate)
ORDER BY partition_number;

This tells you exactly how many rows live in each partition, which partition number each month lives in, and whether there’s any garbage in the “partition 0” bucket that shouldn’t be there. I run this every time. I will run it before archiving last year’s data. I will run it after archiving last year’s data. I have never regretted running it.

The sliding window, the pattern that makes partitioning earn its keep

Here’s the pattern that turns partitioning from “neat” into “indispensable.” Every month (or day, or hour, depending on your grain) you add a new empty partition at the leading edge and switch out the oldest partition into a staging table. The live table stays the same size. Old data moves out of the live table in milliseconds. You archive, drop, or back up the staging table at your leisure.

-- Step 1: add a new partition for next month.
-- Tell the scheme which filegroup the new partition will live on.
ALTER PARTITION SCHEME ps_OrdersByMonth
    NEXT USED [PRIMARY];

ALTER PARTITION FUNCTION pf_OrdersByMonth ()
    SPLIT RANGE ('2025-07-01');

-- Step 2: create a staging table that matches the oldest partition.
-- Same schema, same indexes, same constraints, same filegroup.
CREATE TABLE dbo.Orders_Archive_2024_12 (
    OrderId     BIGINT        NOT NULL,
    CustomerId  INT           NOT NULL,
    OrderDate   DATETIME2(0)  NOT NULL,
    Amount      DECIMAL(18,2) NOT NULL,
    CONSTRAINT pk_Orders_Archive_2024_12
        PRIMARY KEY CLUSTERED (OrderDate, OrderId),
    CONSTRAINT ck_Orders_Archive_2024_12_Range
        CHECK (OrderDate >= '2024-12-01' AND OrderDate < '2025-01-01')
);

-- Step 3: switch the oldest partition out.
-- This is a metadata operation. Milliseconds. No data moves.
ALTER TABLE dbo.Orders SWITCH PARTITION 1 TO dbo.Orders_Archive_2024_12;

-- Step 4: collapse the empty partition left behind.
ALTER PARTITION FUNCTION pf_OrdersByMonth ()
    MERGE RANGE ('2025-01-01');

That SWITCH is the magic moment. Regardless of whether the partition holds a hundred rows or a hundred million, the operation completes in milliseconds. SQL Server just re-parents the partition’s underlying pages to the new table. You then drop, archive, or back up Orders_Archive_2024_12 whenever you want, without holding any locks on the live Orders table.

Automate the whole thing with an Agent job that runs on the first of the month. One job that creates next month’s partition and switches out the month that just fell off the twelve-month window. If you run this in production, put TRY ... CATCH around every step and send yourself an email if anything fails — silent errors on a monthly job are how you wake up in May to discover nothing has rolled over since January.

The gotchas nobody writes about until they’ve been bitten

1. Always keep an empty partition at each end

Splitting a partition that contains rows is logged per row. Splitting an empty partition is a metadata operation.

Scenario: you have partitions going back 12 months, all full, and you run SPLIT RANGE on the most-recent one to add a new month. SQL Server logs every single row in that partition as it moves. On a billion-row partition, this can take hours and fill your transaction log. A maintenance job that was supposed to take 10 seconds blows up and wakes you at 2am.

The fix is to always have an empty partition waiting at the leading edge. Make “create next month’s empty partition” a job that runs weeks in advance, not on the day you need it.

2. Non-aligned indexes block partition switching

Every non-clustered index on a partitioned table should be partitioned the same way (ON ps_OrdersByMonth (OrderDate)). An index that isn’t partitioned on the same scheme is called non-aligned.

Non-aligned indexes will silently block SWITCH with the SQL Server error equivalent of “no, and I won’t tell you exactly why.” If you ever see “ALTER TABLE SWITCH statement failed” and have no idea why, check sys.indexes for your non-clustered indexes and look at their partition scheme. Odds are one of them is aligned on something else or not partitioned at all. Always align every index on a partitioned table.

3. The check constraint is load-bearing

The CHECK constraint on the staging table is what tells SQL Server the SWITCH is safe. If the boundary in the constraint doesn’t exactly match the boundary in the partition function, SQL Server refuses the switch.

Classic bug: RANGE RIGHT FOR VALUES ('2025-01-01') says January starts at midnight on the 1st. Your check constraint says OrderDate BETWEEN '2024-12-01' AND '2024-12-31'. It misses the last day of December (everything between 2024-12-31 00:00:01 and 2024-12-31 23:59:59.999 is in the partition but outside the constraint). You chase this error message for 40 minutes before you realize BETWEEN is inclusive and you meant < '2025-01-01'.

Rule: use half-open intervals. CHECK (OrderDate >= '2024-12-01' AND OrderDate < '2025-01-01'). Copy-paste the boundary dates directly from your partition function. Don’t retype them. Every retype is a bug waiting to happen.

4. Statistics go stale on the live partition

All the inserts happen in the most recent partition. That partition’s statistics get out of date fastest. The optimizer then makes bad cardinality estimates for queries that filter on the current month, because it thinks there are 100 rows in there when there are actually 12 million.

The fix is to update statistics on the partitioned index with a higher sampling rate on a schedule, or rely on Ola Hallengren’s maintenance scripts (lesson 35) which do it correctly out of the box. The out-of-the-box SQL Server maintenance plan does not do this correctly. Do not trust it.

When not to partition

Partitioning costs you query-plan complexity, more moving parts at deploy time, and a whole new class of bugs (non-aligned indexes, statistics skew, mismatched check constraints, weekend calls). It pays for itself when:

You have a clear time-based access pattern — the last week or last month is hot, older data is cold.
You need fast bulk loads or fast bulk deletes that don’t lock the table.
The table is big enough that scanning the whole thing for archival is painful — usually meaning hundreds of millions of rows or more.

If your table is 5 million rows and you’re partitioning it because someone read a blog post on LinkedIn, please don’t. A good clustered index, a covering non-clustered index, and a query rewrite will get you most of the way there, with none of the operational overhead. Partitioning is a power tool. Use it when the weight is worth the fiddliness, not because it sounds impressive at the next team meeting.

Run this on your own machine

Fire up tempdb and walk through the whole flow yourself. Create the function, scheme, table. Insert a few rows spread across months. Run the $PARTITION query to see them in their buckets. Split a new partition. Switch one out to a staging table. Merge the empty partition away. Put the whole thing in a script you keep in your “SQL Server toolbox” folder, because the next time you need to build a sliding window in production you will not want to write it from scratch at midnight.

USE tempdb;
-- (paste the CREATE PARTITION FUNCTION / SCHEME / TABLE from above)

INSERT INTO dbo.Orders (OrderId, CustomerId, OrderDate, Amount)
VALUES
    (1, 10, '2025-01-15', 100.00),
    (2, 10, '2025-02-20', 200.00),
    (3, 20, '2025-03-10', 300.00),
    (4, 20, '2025-04-05', 400.00),
    (5, 30, '2025-05-12', 500.00);

-- See rows per partition
SELECT $PARTITION.pf_OrdersByMonth(OrderDate) AS p, COUNT(*)
FROM dbo.Orders
GROUP BY $PARTITION.pf_OrdersByMonth(OrderDate);

When those five rows land in five different partitions and the $PARTITION query shows you exactly which, that’s the “oh, this actually makes sense” moment. From there, the sliding window pattern is just a month-to-month choreography of split, switch, merge.

Next lesson: filegroups, data files, and what the log file is actually doing while you’re not looking.