SQL Server, from the ground up Lesson 32 / 40

SQL Agent jobs without tears

Scheduling, alerts, operator emails, and the 'my job silently fails' pattern. The Azure SQL equivalent via Elastic Jobs. Building maintenance that actually runs.

SQL Server Agent is the built-in scheduler. Every backup that runs at 2am, every maintenance job that rebuilds indexes on Sunday night, every hourly report that ships to finance — it’s Agent, quietly ticking over.

Until something fails silently, the email alert doesn’t go out because the operator wasn’t configured, and you find out on Wednesday when Finance asks “hey, where’s Monday’s report?”

This lesson is how to build Agent jobs that actually tell you when they break. Plus the Azure SQL Database equivalent, because Azure doesn’t have Agent proper.

What SQL Agent is

A separate Windows service (SQLSERVERAGENT) that runs alongside SQL Server. Its job: execute scheduled jobs. Agent lives inside msdb, which is why lesson 3 said msdb is “where SQL Agent lives.”

Jobs have:

  • Steps — one or more. Each is T-SQL, a PowerShell script, an OS command, an SSIS package, or a few other types.
  • Schedules — when to run. Cron-style patterns.
  • Alerts — what to do on failure (email operator, run another job, etc.).
  • Owner — usually sa; running under an impersonated account is possible.

Creating a simple job

Via T-SQL (the UI does the same thing behind the scenes):

USE msdb;
GO

EXEC sp_add_job
    @job_name = N'Runehold - Nightly Full Backup',
    @enabled = 1,
    @description = N'Takes a compressed FULL backup of Runehold to D:\Backups';

EXEC sp_add_jobstep
    @job_name = N'Runehold - Nightly Full Backup',
    @step_name = N'Backup',
    @subsystem = N'TSQL',
    @command = N'
        BACKUP DATABASE Runehold
        TO DISK = N''D:\Backups\Runehold_FULL.bak''
        WITH FORMAT, INIT, COMPRESSION, CHECKSUM;
    ',
    @database_name = N'master';

EXEC sp_add_schedule
    @schedule_name = N'Nightly 02:00',
    @freq_type = 4,                  -- daily
    @freq_interval = 1,               -- every day
    @active_start_time = 020000;      -- 02:00:00

EXEC sp_attach_schedule
    @job_name = N'Runehold - Nightly Full Backup',
    @schedule_name = N'Nightly 02:00';

EXEC sp_add_jobserver
    @job_name = N'Runehold - Nightly Full Backup',
    @server_name = @@SERVERNAME;

Ugly. The SSMS GUI is usually less painful for one-off jobs. Scripts like above are for deploying jobs across many environments.

Alerts and operators

By default, a failed job writes an entry in the Windows Event Log and… that’s it. Nobody knows. To get notified:

Configure Database Mail

SQL Server needs a mail profile:

EXEC msdb.dbo.sysmail_add_account_sp
    @account_name = N'Runehold SMTP',
    @description = N'SMTP account for ops notifications',
    @email_address = N'ops@runehold.example',
    @display_name = N'Runehold Ops',
    @mailserver_name = N'smtp.runehold.example',
    @port = 587,
    @enable_ssl = 1,
    @username = N'ops@runehold.example',
    @password = N'secret';

EXEC msdb.dbo.sysmail_add_profile_sp @profile_name = N'Default';
EXEC msdb.dbo.sysmail_add_profileaccount_sp
    @profile_name = N'Default',
    @account_name = N'Runehold SMTP',
    @sequence_number = 1;

Define an operator

EXEC msdb.dbo.sp_add_operator
    @name = N'DBA Team',
    @enabled = 1,
    @email_address = N'dba@runehold.example';

Tell the job to notify on failure

EXEC msdb.dbo.sp_update_job
    @job_name = N'Runehold - Nightly Full Backup',
    @notify_level_email = 2,         -- 2 = on failure
    @notify_email_operator_name = N'DBA Team';

notify_level_email values: 0 (never), 1 (on success), 2 (on failure), 3 (on completion).

Always set this to 2 for every job. “Job ran but failed silently” is the single most common avoidable operations disaster.

Multi-step jobs

A job can have multiple steps. Each step can branch on success or failure:

  • On success → go to next step, quit with success, or jump to a specific step.
  • On failure → quit with failure, quit with success (!), or jump elsewhere.

The “quit with success on failure” branch is useful when a step is optional. For example: “update stats; if that fails for one reason (locked), skip; if it fails for another, abort.”

Generally: keep steps atomic and let failures propagate. Swallowing failures silently is how quiet disasters happen.

Reading job history

The UI shows recent runs for each job. Programmatically:

SELECT TOP (20)
    j.name                                              AS job_name,
    s.step_id,
    s.step_name,
    CASE h.run_status WHEN 0 THEN 'Failed' WHEN 1 THEN 'Succeeded' WHEN 2 THEN 'Retry' WHEN 3 THEN 'Cancelled' WHEN 4 THEN 'In Progress' END AS status,
    CONVERT(VARCHAR, msdb.dbo.agent_datetime(h.run_date, h.run_time), 120) AS run_time,
    h.run_duration,
    h.message
FROM msdb.dbo.sysjobhistory AS h
JOIN msdb.dbo.sysjobs       AS j ON j.job_id = h.job_id
LEFT JOIN msdb.dbo.sysjobsteps AS s
    ON s.job_id = h.job_id AND s.step_id = h.step_id
ORDER BY h.run_date DESC, h.run_time DESC;

Shows every recent step of every job with its outcome. Run this on an unfamiliar server to quickly see what’s been breaking.

Token-expansion gotcha

Step command text can use Agent tokens like $(ESCAPE_SQUOTE(JOBNAME)). Since SQL 2005, SQL Server requires “escape macros” around tokens to prevent injection. A command written without them will fail with “Unable to parse token” or similar. If you inherit old jobs, watch for this when editing.

Agent jobs that everyone should have

Minimum set for any production SQL Server:

  1. Nightly FULL backup (or DIFF + daily FULL).
  2. Frequent LOG backups (every 15-30 min if in FULL recovery).
  3. Weekly IndexOptimize — Ola’s script, covered in lesson 35.
  4. Weekly DatabaseIntegrityCheck — Ola’s wrapper around DBCC CHECKDB.
  5. Monitoring / heartbeat — a job that fails if something about the server is unhealthy, emitting an alert.
  6. Log Agent job history cleanup — Agent retains history forever unless you clean it. After a year or two on a busy server, msdb gets huge.

The Ola Hallengren install script (lesson 35) creates #3 and #4 for you.

Agent on Azure SQL Database

Azure SQL Database is missing Agent. The equivalent is Elastic Jobs: a separate Azure resource that runs T-SQL on a schedule against target databases.

  • Create an Elastic Job Agent resource in Azure.
  • Target databases can be individual or a whole server / pool.
  • Jobs are defined in T-SQL and schedule-configurable.
  • Monitoring via Azure Portal or DMVs.

Azure SQL Managed Instance does have SQL Agent, so if you’re migrating from on-prem and need Agent, MI is the drop-in path.

For Runehold, running on Azure SQL MI, we use Agent as on-prem. If we were on Azure SQL DB, we’d rebuild maintenance jobs in Elastic Jobs.

Monitoring Agent from outside SQL

For serious shops, sending Agent job outcomes to a central monitoring system (Datadog, Grafana, Azure Monitor, Prometheus) is non-optional. The DBA tool of choice is:

  • A dedicated script that runs every N minutes, queries msdb.dbo.sysjobhistory, and emits a metric per job per outcome.
  • PowerShell or a sidecar agent does the polling.

Once this exists, “a backup job that didn’t run” becomes an alert in your normal ops channels, not a surprise on Wednesday.

Run this on your own machine

USE msdb;
GO

-- 1. See jobs that exist
SELECT name, enabled, date_created, date_modified FROM msdb.dbo.sysjobs;

-- 2. See recent history
SELECT TOP (10)
    j.name                        AS job_name,
    CASE h.run_status WHEN 0 THEN 'Failed' WHEN 1 THEN 'Succeeded' ELSE CAST(h.run_status AS VARCHAR) END AS status,
    CONVERT(VARCHAR, msdb.dbo.agent_datetime(h.run_date, h.run_time), 120) AS run_time,
    h.message
FROM msdb.dbo.sysjobhistory AS h
JOIN msdb.dbo.sysjobs       AS j ON j.job_id = h.job_id
ORDER BY h.run_date DESC, h.run_time DESC;

-- 3. Create a simple smoke-test job
EXEC sp_add_job @job_name = N'Smoke Test';
EXEC sp_add_jobstep @job_name = N'Smoke Test',
    @step_name = N'Check',
    @subsystem = N'TSQL',
    @command = N'SELECT GETDATE() AS ping;',
    @database_name = N'master';
EXEC sp_add_schedule @schedule_name = N'Every 5 min',
    @freq_type = 4, @freq_interval = 1,
    @freq_subday_type = 4, @freq_subday_interval = 5;
EXEC sp_attach_schedule @job_name = N'Smoke Test', @schedule_name = N'Every 5 min';
EXEC sp_add_jobserver @job_name = N'Smoke Test', @server_name = @@SERVERNAME;

-- 4. Run it now
EXEC sp_start_job @job_name = N'Smoke Test';

-- 5. Check history
SELECT TOP (5) * FROM msdb.dbo.sysjobhistory
WHERE job_id = (SELECT job_id FROM msdb.dbo.sysjobs WHERE name = N'Smoke Test')
ORDER BY run_date DESC, run_time DESC;

-- 6. Clean up
EXEC sp_delete_job @job_name = N'Smoke Test';

Agent is unglamorous. It runs whether anyone is watching or not. That makes it either your best friend or your quietest enemy, depending on whether you wired up the alerts.

Next: security — logins, users, roles, and why the public role is the trap most people fall into.

Search