Runehold’s ops channel, Tuesday 09:14: “Hey, the order-search endpoint got slow overnight. Anyone change anything?” Half the team denies deployment. The other half says “I only changed the CSS.” The DBA opens Query Store.
Query Store is a built-in historical record of every query your server has run and every plan the optimizer used. It’s a time machine. Before it existed (pre-SQL Server 2016), finding the “plan that was working last week” was detective work. With Query Store, it’s three clicks.
This lesson covers the plan cache (the short-term memory of the engine) and Query Store (the long-term record), how they relate, and how to actually use them to fix production problems.
The plan cache
The plan cache is RAM-resident. Every compiled plan lives there, keyed by the query’s text hash. When you execute a query, SQL Server checks the cache first; if the plan exists, it’s reused; otherwise, a fresh one is compiled and cached.
Contents of the cache flush when:
- SQL Server restarts.
- Memory pressure evicts cold plans.
- Someone runs
DBCC FREEPROCCACHE. - Certain schema changes invalidate plans.
-- Top 20 queries by total CPU from the plan cache
SELECT TOP (20)
qs.execution_count,
qs.total_worker_time / 1000 AS total_cpu_ms,
qs.total_logical_reads,
SUBSTRING(st.text,
qs.statement_start_offset / 2 + 1,
(CASE qs.statement_end_offset
WHEN -1 THEN DATALENGTH(st.text)
ELSE qs.statement_end_offset
END - qs.statement_start_offset) / 2 + 1) AS statement_text,
qp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
ORDER BY qs.total_worker_time DESC;
Click query_plan in the results → plan opens in SSMS. Useful, but this data disappears on restart. That’s the limitation Query Store solves.
Query Store, enabled
Query Store is a database-level feature. Enable per database:
ALTER DATABASE Runehold SET QUERY_STORE = ON (
OPERATION_MODE = READ_WRITE,
CLEANUP_POLICY = (STALE_QUERY_THRESHOLD_DAYS = 30),
DATA_FLUSH_INTERVAL_SECONDS = 900,
MAX_STORAGE_SIZE_MB = 1000,
INTERVAL_LENGTH_MINUTES = 60,
QUERY_CAPTURE_MODE = AUTO
);
What each option does:
OPERATION_MODE = READ_WRITE— normal operation. Data collected and forced plans used.STALE_QUERY_THRESHOLD_DAYS = 30— keep 30 days of query history.DATA_FLUSH_INTERVAL_SECONDS = 900— flush in-memory stats to disk every 15 minutes. Smaller = less data loss on crash, more write overhead.MAX_STORAGE_SIZE_MB = 1000— 1 GB quota. Resize based on volume.INTERVAL_LENGTH_MINUTES = 60— bucket statistics into hourly slots for trend analysis.QUERY_CAPTURE_MODE:AUTOis smart (captures significant queries, skips trivial ones).ALLcaptures everything;NONEdisables capture but keeps forced plans.
AUTO is the right default for most workloads.
Once enabled, Query Store is running. Let it collect data for a week or two before relying on trends.
The Query Store UI in SSMS
Expand your database in SSMS → Query Store folder. Seven built-in reports:
- Regressed Queries — queries whose performance got worse.
- Overall Resource Consumption — which metrics are climbing.
- Top Resource Consuming Queries — the usual suspects.
- Queries With Forced Plans — what you’ve pinned.
- Queries With High Variation — same query, very different runtimes (parameter sniffing smoke).
- Query Wait Statistics — wait-type breakdown.
- Tracked Queries — follow one specific query over time.
The first report — Regressed Queries — is what you open on Tuesday morning when something broke. Pick a metric (CPU, Duration, Logical Reads), pick a time window, and SQL Server shows you queries whose averages got noticeably worse. Click on one and you see every plan that query has used over time. Identify the plan from before the regression, compare it to the current one, and decide whether to force the old plan.
Forcing a plan
Two clicks in the UI: select the good plan in the Tracked Query view, click “Force Plan.”
Programmatically:
-- Find the query you care about
SELECT q.query_id, qt.query_sql_text
FROM sys.query_store_query_text AS qt
JOIN sys.query_store_query AS q ON q.query_text_id = qt.query_text_id
WHERE qt.query_sql_text LIKE '%GetOrdersForCustomer%';
-- Look at the plans for that query
SELECT p.plan_id, p.query_id, p.is_forced_plan, p.last_execution_time,
p.avg_duration, p.count_executions
FROM sys.query_store_plan AS p
WHERE p.query_id = 19
ORDER BY p.last_execution_time DESC;
-- Force the good plan
EXEC sp_query_store_force_plan @query_id = 19, @plan_id = 42;
-- Later, if you change your mind
EXEC sp_query_store_unforce_plan @query_id = 19, @plan_id = 42;
The forced plan is applied automatically to every subsequent execution. No code change. No hint. Reversible.
When forcing fails: if the schema changes in a way that invalidates the forced plan (you drop an index the plan uses, for example), Query Store unforces it automatically and flags it. Check periodically:
SELECT query_id, plan_id, force_failure_count, last_force_failure_reason_desc
FROM sys.query_store_plan
WHERE is_forced_plan = 1 AND force_failure_count > 0;
Built-in regression hunt query
Useful query: “queries that got slower in the last day compared to the previous week.”
WITH recent AS (
SELECT p.query_id, p.plan_id,
AVG(rs.avg_duration) AS recent_avg_us
FROM sys.query_store_plan AS p
JOIN sys.query_store_runtime_stats AS rs ON rs.plan_id = p.plan_id
JOIN sys.query_store_runtime_stats_interval AS rsi ON rsi.runtime_stats_interval_id = rs.runtime_stats_interval_id
WHERE rsi.end_time >= DATEADD(HOUR, -24, GETDATE())
GROUP BY p.query_id, p.plan_id
),
baseline AS (
SELECT p.query_id, p.plan_id,
AVG(rs.avg_duration) AS baseline_avg_us
FROM sys.query_store_plan AS p
JOIN sys.query_store_runtime_stats AS rs ON rs.plan_id = p.plan_id
JOIN sys.query_store_runtime_stats_interval AS rsi ON rsi.runtime_stats_interval_id = rs.runtime_stats_interval_id
WHERE rsi.end_time BETWEEN DATEADD(DAY, -7, GETDATE()) AND DATEADD(DAY, -1, GETDATE())
GROUP BY p.query_id, p.plan_id
)
SELECT r.query_id, r.plan_id,
r.recent_avg_us / 1000 AS recent_ms,
b.baseline_avg_us / 1000 AS baseline_ms,
r.recent_avg_us * 1.0 / NULLIF(b.baseline_avg_us, 0) AS ratio
FROM recent AS r
JOIN baseline AS b ON b.query_id = r.query_id AND b.plan_id = r.plan_id
WHERE r.recent_avg_us > b.baseline_avg_us * 2
ORDER BY ratio DESC;
“Queries whose recent 24-hour average is at least 2× their baseline” — classic regression alert. Wire this to a monitoring tool and you’ll see regressions before support does.
Capture filter
On busy servers, Query Store can grow. Use QUERY_CAPTURE_POLICY (2019+) to skip low-frequency queries:
ALTER DATABASE Runehold SET QUERY_STORE (
QUERY_CAPTURE_POLICY = (
STALE_CAPTURE_POLICY_THRESHOLD = 24 HOURS,
EXECUTION_COUNT = 30,
TOTAL_COMPILE_CPU_TIME_MS = 1000,
TOTAL_EXECUTION_CPU_TIME_MS = 100
)
);
Requires a query to hit at least one of the thresholds before being captured. Dramatically reduces storage for high-volume systems without losing the queries that actually matter.
Azure SQL Database
On Azure SQL Database, Query Store is on by default and can’t be turned off. You also get automatic plan correction — Azure can automatically force the previous plan when it detects a regression. Opt in:
ALTER DATABASE CURRENT SET AUTOMATIC_TUNING (FORCE_LAST_GOOD_PLAN = ON);
Astonishingly useful. Saved me an on-call weekend in 2023 when the query planner rewrote a plan badly after a stats update and the Azure engine fixed itself before the alert went off.
Run this on your own machine
USE Runehold;
GO
-- 1. Enable Query Store if it's not already on
ALTER DATABASE Runehold SET QUERY_STORE = ON;
-- 2. Check it's running
SELECT actual_state_desc, readonly_reason, current_storage_size_mb
FROM sys.database_query_store_options;
-- 3. Run some queries so there's data to see
EXEC dbo.GetOrdersForCustomer @CustomerId = 1;
EXEC dbo.GetOrdersForCustomer @CustomerId = 2;
EXEC dbo.GetOrdersForCustomer @CustomerId = 42;
-- 4. Look at the top expensive queries
SELECT TOP (10)
qt.query_sql_text,
rs.avg_duration / 1000 AS avg_ms,
rs.count_executions
FROM sys.query_store_query_text AS qt
JOIN sys.query_store_query AS q ON q.query_text_id = qt.query_text_id
JOIN sys.query_store_plan AS p ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats AS rs ON rs.plan_id = p.plan_id
ORDER BY rs.avg_duration DESC;
-- 5. Force a plan (fill in real query_id and plan_id from the UI or queries above)
-- EXEC sp_query_store_force_plan @query_id = 1, @plan_id = 1;
-- 6. Get the detailed stats for one query
SELECT
rsi.start_time, rsi.end_time,
rs.count_executions,
rs.avg_duration / 1000 AS avg_ms,
rs.avg_cpu_time / 1000 AS avg_cpu_ms,
rs.avg_logical_io_reads
FROM sys.query_store_query AS q
JOIN sys.query_store_plan AS p ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats AS rs ON rs.plan_id = p.plan_id
JOIN sys.query_store_runtime_stats_interval AS rsi ON rsi.runtime_stats_interval_id = rs.runtime_stats_interval_id
WHERE q.query_id = 1
ORDER BY rsi.start_time DESC;
Explore the Query Store UI in SSMS. Right-click a database → Query Store → Top Resource Consuming Queries. The graphs tell a story. Once you’ve used it twice, you won’t debug production without it.
Module 5 done. Execution plans, statistics, parameter sniffing, Query Store — the performance-diagnostics toolkit. The rest of the course shifts toward DBA operations: lesson 30 on filegroups and the log file, then lessons 31-36 on backups, SQL Agent, security, tempdb, Ola, and DMVs.