SQL Server, from the ground up Lesson 40 / 40

A 30-minute SQL Server health check: the script you actually run

The ordered checklist I run on any SQL Server I've never seen before. Printable, pasteable, and the closing lesson of the course.

You are handed the keys to a SQL Server you’ve never seen before. “Tell us what’s wrong with it by 5pm.”

This lesson is the 30-minute checklist I run. Every query, in order, with the thing you’re looking for and what to do if you find it. It’s the closing lesson of the course because it pulls together everything we’ve covered. You can print it. You can paste it into your toolbox. The day a new client hands you a SQL Server, you pull this out and you’re productive in half an hour.

Thanks for sticking through 40 lessons. Let’s finish with one scripted run through the entire DBA toolkit.

Step 1 — The server overview (2 minutes)

SELECT
    @@SERVERNAME,
    @@VERSION,
    SERVERPROPERTY('Edition'),
    SERVERPROPERTY('ProductVersion'),
    SERVERPROPERTY('MachineName'),
    cpu_count, hyperthread_ratio, physical_memory_kb / 1024 AS physical_memory_mb,
    sqlserver_start_time
FROM sys.dm_os_sys_info;

Looking for:

  • SQL Server version — 2019+ ideally.
  • Edition — Enterprise unlocks more features.
  • Uptime — if < 1 week, your wait stats and cache will be misleading.
  • Memory — compare to max server memory setting.

Step 2 — sp_Blitz (5 minutes)

EXEC sp_Blitz;

Read the priority 1-20 findings. Triage:

  • P1 “No backup” → fix immediately.
  • P1 “Corruption” → fix right now, see lesson 31.
  • P10 “SA account weak password” → note, raise to security.
  • P50 configuration weirdness → document, fix later.

Step 3 — Recovery model and backup status (2 minutes)

SELECT d.name,
       d.recovery_model_desc,
       DATEDIFF(HOUR, MAX(CASE WHEN bs.type = 'D' THEN bs.backup_finish_date END), GETDATE()) AS hours_since_last_full,
       DATEDIFF(HOUR, MAX(CASE WHEN bs.type = 'L' THEN bs.backup_finish_date END), GETDATE()) AS hours_since_last_log
FROM sys.databases AS d
LEFT JOIN msdb.dbo.backupset AS bs ON bs.database_name = d.name
WHERE d.name NOT IN ('tempdb')
GROUP BY d.name, d.recovery_model_desc;

Looking for:

  • FULL recovery with no recent log backup → log is filling up; fix or switch to SIMPLE.
  • SIMPLE recovery on a business-critical database → confirm this is intentional.
  • No full backup in > 48 hours → schedule one now.

Step 4 — tempdb configuration (1 minute)

SELECT name, size * 8 / 1024 AS size_mb, growth, is_percent_growth, type_desc
FROM tempdb.sys.database_files;

Looking for:

  • Single data file → add more (one per vCPU up to 8).
  • is_percent_growth = 1 with small values → fix to absolute MB.
  • Small sizes → pre-grow to an expected working size.

Step 5 — Wait stats (3 minutes)

(The filtered query from lesson 36.) Looking for:

  • PAGEIOLATCH_* on top → slow disk or undersized buffer pool.
  • LCK_* on top → blocking issues, see Step 8.
  • CXPACKET on top → parallelism issues; probably not urgent.
  • ASYNC_NETWORK_IO on top → app reading slowly or huge result sets.
  • SOS_SCHEDULER_YIELD high → CPU pressure.

Top waits define the next investigation.

Step 6 — sp_BlitzFirst for live wait stats (2 minutes)

EXEC sp_BlitzFirst @ExpertMode = 1;

5-second sample of current activity. Tells you what’s happening right now (vs cumulative since restart).

Step 7 — Top expensive queries (3 minutes)

USE DBA;
EXEC sp_BlitzCache @Top = 20, @SortOrder = 'CPU';
EXEC sp_BlitzCache @Top = 20, @SortOrder = 'Reads';

Read the warnings column. Common offenders:

  • Implicit conversion → fix the query or column type.
  • Key lookup-heavy → add a covering index.
  • Parameter sniffing → recompile or Query Store force.
  • Spills to tempdb → more memory or better index.

Step 8 — Current blocking and live activity (1 minute)

EXEC sp_WhoIsActive @find_block_leaders = 1, @get_plans = 1;

Looking for:

  • Long-running blockers → decide whether to kill.
  • Sessions sleeping with open transactions → zombie; kill.
  • Heavy active queries → correlate with top CPU.

Step 9 — Index audit (5 minutes)

USE Runehold;   -- adjust for each user database
EXEC sp_BlitzIndex;

Looking for:

  • P1 “Duplicate indexes” → drop duplicates.
  • Heap tables → add clustered index.
  • Unused indexes → candidates to drop (verify across time).
  • Missing index suggestions → evaluate, pick a few top-impact.

Step 10 — Query Store posture (1 minute)

SELECT name, is_query_store_on
FROM sys.databases WHERE name NOT IN ('master', 'tempdb', 'model', 'msdb');

If any user database has Query Store off, turn it on (lesson 28). Without it, you can’t easily find regressions later.

Step 11 — File growth and disk usage (2 minutes)

SELECT DB_NAME(mf.database_id) AS db, mf.name, mf.type_desc,
       mf.size * 8 / 1024 AS size_mb, mf.growth, mf.is_percent_growth
FROM sys.master_files AS mf
WHERE mf.database_id > 4
ORDER BY size_mb DESC;

Looking for:

  • Log files with is_percent_growth = 1 → change to absolute MB.
  • Tiny fixed growth (e.g., 10 MB) → increase to a sensible size.
  • Largest databases → confirm disk headroom.

Step 12 — Agent job health (2 minutes)

SELECT TOP (20)
    j.name, h.step_id,
    CASE h.run_status
        WHEN 0 THEN 'Failed' WHEN 1 THEN 'Succeeded' WHEN 2 THEN 'Retry'
        WHEN 3 THEN 'Cancelled' WHEN 4 THEN 'In Progress' END AS status,
    CONVERT(VARCHAR, msdb.dbo.agent_datetime(h.run_date, h.run_time), 120) AS run_time,
    h.message
FROM msdb.dbo.sysjobhistory AS h
JOIN msdb.dbo.sysjobs       AS j ON j.job_id = h.job_id
ORDER BY h.run_date DESC, h.run_time DESC;

Looking for:

  • Recent failures on backup or maintenance jobs.
  • Jobs that haven’t run in a long time (schedule disabled? Agent stopped?).
  • “In Progress” jobs that should have finished an hour ago.

Step 13 — Check for Ola Hallengren + First Responder Kit (1 minute)

SELECT OBJECT_SCHEMA_NAME(object_id) + '.' + name AS proc_name
FROM DBA.sys.procedures
WHERE name IN ('IndexOptimize', 'DatabaseBackup', 'DatabaseIntegrityCheck',
               'sp_Blitz', 'sp_BlitzIndex', 'sp_BlitzCache',
               'sp_BlitzFirst', 'sp_WhoIsActive');

Install any that are missing. Minutes well spent.

Step 14 — Deadlock history (1 minute)

-- From lesson 20
SELECT TOP (10)
    XEvent.value('(@timestamp)[1]', 'DATETIME2') AS ts,
    XEvent.query('.')                              AS deadlock_xml
FROM (
    SELECT CAST(target_data AS XML) AS td
    FROM sys.dm_xe_session_targets st
    JOIN sys.dm_xe_sessions s ON s.address = st.event_session_address
    WHERE s.name = 'system_health' AND st.target_name = 'ring_buffer'
) AS source
CROSS APPLY td.nodes('RingBufferTarget/event[@name="xml_deadlock_report"]') AS X(XEvent)
ORDER BY ts DESC;

Looking for recent deadlocks. If > 1 per day, something’s worth fixing.

Step 15 — Write it up (5 minutes)

Take your notes and turn them into a short report:

Runehold SQL Server Health Check — 2026-04-16

  • Overall: Yellow. Three P1 findings.
  • Critical:
    • Sales_archive DB in FULL recovery with no log backup in 7 days. Log file 87% used. Fix today.
    • No backups on Reporting_Staging for 48 hours. Reason: disabled Agent job Stage Backup. Re-enable.
  • High:
    • tempdb has 1 data file on a server with 8 cores. Expected PAGELATCH contention. Add 7 more files.
    • Implicit conversion in top CPU query sp_GetOrdersByCountry. Fix data type.
  • Medium:
    • 22% of non-clustered indexes unused for 30+ days. Evaluate drops.
    • Query Store disabled on 3 user databases. Enable.
  • Low:
    • Old sa password (policy says 90-day rotation). Rotate.
    • 140+ Agent job history rows older than a year in msdb. Purge.
  • Follow-up:
    • Schedule quarterly restore drill.
    • Install Ola Hallengren scripts.
    • Migrate from SQL 2019 to 2022 — already supported.

That’s your deliverable. Short, actionable, prioritized.

What’s next after this course

You now know the equivalent of a year of on-the-job SQL Server experience, squeezed into 40 lessons. What next:

  1. Read Brent Ozar’s blog — weekly new posts on everything SQL Server. Opinionated, practical, funny.
  2. Paul Randal’s blog at SQLskills — deep technical dives on the storage engine, corruption, and DBCC. The internals expert.
  3. Erik Darling’s blog and videos — opinionated query-tuning advice, a lot of CTE and window-function wizardry.
  4. brentozar.com/training — Brent’s team runs training classes. Excellent if you can afford them or your employer pays.
  5. Contribute to open source — Ola’s scripts, the First Responder Kit, sp_WhoIsActive all take pull requests.
  6. Practice on a real workload — a sandbox with Adventure Works or your own dataset. Run the health check. Fix findings. Repeat monthly.

Congratulations

You made it through 40 lessons of SQL Server. You now know:

  • The engine (Modules 1-3).
  • Indexing and performance (Modules 4-5).
  • Storage and partitioning (Module 6).
  • DBA essentials — backups, Agent, security, tempdb, maintenance, DMVs (Module 7).
  • The emergency toolkit — First Responder Kit and sp_WhoIsActive (Module 8).

If this course has done its job, you’re ready to open a SQL Server you’ve never seen, diagnose it in 30 minutes, fix the top issues, and not cry at 3am when it pages you. That’s the bar. That’s what “knowing SQL Server” means at this level.

Go take care of your production systems. They’re depending on you.

— Thanks for reading. If you spotted errors, have suggestions, or want to argue about MERGE, say hi.

Search