The first time someone told me a clustered index “is” the table, I nodded politely and pretended to understand. It took me a year of running into surprising query plans before it actually clicked. This is the explanation I wish I’d had on day one.
What a clustered index actually is
In SQL Server, a table is stored as a B-tree. The clustered index is the rule for how that B-tree is sorted on disk. There is exactly one per table, because a table can only be physically sorted one way at a time.
When you write:
CREATE TABLE dbo.Orders (
OrderId BIGINT NOT NULL PRIMARY KEY,
CustomerId INT NOT NULL,
OrderDate DATETIME2(0) NOT NULL,
Amount DECIMAL(18,2) NOT NULL
);
…SQL Server quietly creates a clustered index on OrderId for you, because the primary key defaults to clustered. The leaves of that B-tree are the rows of Orders. There’s no separate “table” structure underneath. The index is the table.
A non-clustered index is a second, smaller B-tree that lives next to the table. Its leaves don’t contain the rows — they contain the indexed columns plus a pointer back to the clustered index. So a query that uses a non-clustered index does two lookups: one in the non-clustered tree to find the pointer, then a “key lookup” back into the clustered index for the rest of the columns. (Unless the non-clustered index is covering, meaning it has every column the query asks for in its leaf pages or INCLUDE list.)
If a table has no clustered index it’s called a heap, and a heap is almost always a mistake. Heaps don’t sort their pages, fragment unpredictably, and force every non-clustered index to use a physical row identifier that breaks if pages move. Make every table have a clustered index. This is the one rule with almost no exceptions.
Picking the clustered key
The clustered key is the most important schema decision you make on a table, because it dictates the physical order of every row and every non-clustered index also stores a copy of it. The classic checklist (this is from Kimberly Tripp’s writing and it has held up for fifteen years):
- Narrow. Every byte of the clustered key is repeated in every non-clustered index. A wide clustered key inflates every other index on the table.
- Static. Updating the clustered key means physically moving the row, which is expensive and updates every non-clustered index. Pick a column that doesn’t change.
- Unique. If your clustered key isn’t unique, SQL Server quietly adds a 4-byte “uniquifier” behind your back. Better to be unique on purpose.
- Ever-increasing. New rows then go onto the trailing page of the B-tree, which avoids page splits in the middle of the tree. This is why
BIGINT IDENTITYandDATETIME2(for append-only event tables) are the two clustered-key workhorses.
This is also why people warn you off GUID clustered keys: they’re wide (16 bytes), random (every insert lands in a different page), and they cause constant fragmentation. If you must use a GUID as your logical key, make it a non-clustered unique index and cluster on a BIGINT IDENTITY instead. This change alone has rescued more than one production system I’ve worked on.
What columns belong in non-clustered indexes
Build non-clustered indexes for the queries you actually run. The rule of thumb:
- Equality columns first, then range columns, then ordering columns, then
INCLUDEthe rest. - For
WHERE customer_id = @c AND order_date >= @start, the index(customer_id, order_date)is right;(order_date, customer_id)is wrong, because once you do a range scan onorder_datethe second column is no longer in sorted order under your filter. - Use
INCLUDEfor columns the query needs to return but doesn’t need to filter or sort on. They live in the leaf pages, make the index covering, and they don’t bloat the upper levels of the tree.
Don’t index every column “just in case.” Each non-clustered index has to be maintained on every INSERT, UPDATE, and DELETE. Five well-chosen indexes will outperform fifteen badly-chosen ones, because the writes don’t pile up.
The write cost everyone forgets
Indexes make reads faster and writes slower. The clustered index is special because it is the table — you can’t avoid it. But every non-clustered index multiplies your write cost. A table with 10 non-clustered indexes pays for 11 writes on every INSERT (the table plus each index), and the same on DELETE, and on UPDATE if any of the indexed columns changed.
This is the part nobody mentions in tutorials. I’ve watched a write-heavy ingestion job go from 90 seconds to 14 minutes because someone added two “helpful” non-clustered indexes during a debugging session and forgot to remove them. The query plans on the read side looked beautiful. The pipeline died.
A useful rule: before adding a non-clustered index to a hot table, look at how many writes that table takes per minute and ask whether the read benefit is worth a few extra microseconds on every one of them.
Fragmentation, and how to actually deal with it
Two kinds of fragmentation matter:
- External (logical) fragmentation — pages that are no longer in physical order on disk. Hurts large range scans (a
SELECTover a million rows reads pages out of order). - Internal fragmentation — pages that are half-empty because of
UPDATEs and page splits. Wastes cache and disk.
The rule of thumb that has worked for me — also originally from Microsoft’s own guidance:
SELECT
OBJECT_NAME(ips.object_id) AS table_name,
i.name AS index_name,
ips.avg_fragmentation_in_percent,
ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'SAMPLED') AS ips
JOIN sys.indexes AS i
ON i.object_id = ips.object_id
AND i.index_id = ips.index_id
WHERE ips.page_count > 1000 -- ignore tiny indexes
ORDER BY ips.avg_fragmentation_in_percent DESC;
Then:
- 5–30% fragmented →
ALTER INDEX ... REORGANIZE. Online, low-impact, doesn’t fully fix the index but defragments leaf pages. - > 30% fragmented →
ALTER INDEX ... REBUILD. Heavier, rebuilds the index entirely. UseWITH (ONLINE = ON)on Enterprise Edition to keep the table available; on Standard Edition the rebuild takes a schema lock, so you do it in a maintenance window. - Below 5% → leave it alone. Defragmenting tiny amounts is pure I/O for no benefit.
The biggest win, though, is not letting fragmentation happen in the first place: pick an ever-increasing clustered key, set a sensible FILLFACTOR (90% on tables that take a lot of UPDATEs, 100% on append-only ones), and accept that SHRINKDATABASE is almost always a mistake — it re-fragments every index it touches.
The mental model
When you understand that the clustered index is the table and that every non-clustered index carries a copy of the clustered key, a lot of the seemingly-arbitrary advice falls into place:
- Wide clustered key → every non-clustered index gets bigger.
- Random clustered key → every insert fragments the table.
- Updating a clustered-key column → physically moves the row and updates every non-clustered index.
- Too many non-clustered indexes → writes slow down by a multiple of however many you have.
Indexes are a budget. Spend it on the queries that matter, leave the rest alone, and your future self will thank you when the on-call alert at 3 a.m. is for someone else’s table.