String handling in SQL Server went from “write your own CLR function” to “there’s a built-in for that” somewhere between 2016 and 2022. Today’s lesson is the current state of the art: the functions you’ll actually use in modern T-SQL for names, addresses, SKU codes, CSV imports, and the forty other places strings show up in Runehold’s data.
Concatenation: CONCAT and CONCAT_WS
The + operator concatenates strings, but has two gotchas:
NULL + anythingisNULL. One null value zeros the whole result.- Mixing types needs explicit casts:
'Order #' + CAST(OrderId AS NVARCHAR(20)).
CONCAT() fixes both:
-- NULL-safe, auto-casts
SELECT CONCAT('Order #', o.OrderId, ' - €', o.Total, ' from ', c.Name) AS description
FROM Sales.Orders AS o
JOIN Sales.Customer AS c ON c.CustomerId = o.CustomerId;
CONCAT treats NULL as an empty string and implicitly converts non-string arguments. Much safer.
CONCAT_WS — With Separator — adds a delimiter between every non-NULL argument:
SELECT CONCAT_WS(', ',
c.Name,
c.AddressLine1,
c.AddressLine2, -- NULL rows silently skipped
c.PostalCode,
c.City,
c.Country) AS mailing_label
FROM Sales.Customer AS c;
No trailing commas, no double delimiters when a field is NULL. Address labels, CSV rows, breadcrumb strings — CONCAT_WS is built for exactly these.
STRING_AGG: aggregate rows into a delimited string
The other half of that CSV superpower. Combine many rows of one column into a single string:
-- List all products in each order, comma-separated
SELECT o.OrderId,
STRING_AGG(p.Name, ', ') WITHIN GROUP (ORDER BY p.Name) AS products
FROM Sales.OrderLine AS ol
JOIN Sales.Orders AS o ON o.OrderId = ol.OrderId
JOIN Catalog.Product AS p ON p.ProductId = ol.ProductId
GROUP BY o.OrderId
ORDER BY o.OrderId;
STRING_AGG(col, separator) WITHIN GROUP (ORDER BY col) — concatenate all non-NULL values within each group, sorted the way you want.
Before STRING_AGG (added in SQL Server 2017), this required the infamous FOR XML PATH trick — about 15 lines of unreadable code. STRING_AGG makes it one line. If you see FOR XML PATH('') in old queries, you’re looking at a pre-2017 workaround that deserves to be refactored.
STRING_SPLIT: turn a delimited string into rows
The opposite: split a CSV string into individual rows.
-- Simple split
SELECT value FROM STRING_SPLIT('NL,IT,DE,FR,ES', ',');
-- Returns 5 rows.
-- With ordinal (SQL Server 2022+, requires enable_ordinal)
SELECT value, ordinal
FROM STRING_SPLIT('apple,banana,cherry', ',', 1);
-- Returns rows with their position
Useful for:
- Breaking tag lists into individual tags.
- Parsing CSV imports.
- Accepting a “list” parameter in a stored procedure.
Classic parameter pattern:
CREATE PROCEDURE Sales.GetOrdersByCountries
@CountryCsv NVARCHAR(500)
AS
BEGIN
SELECT o.OrderId, o.Total, o.CountryCode
FROM Sales.Orders AS o
WHERE o.CountryCode IN (SELECT value FROM STRING_SPLIT(@CountryCsv, ','));
END;
-- Called from the app:
EXEC Sales.GetOrdersByCountries @CountryCsv = 'NL,IT,DE';
A simple way to pass a list of values as a single parameter. Not the only way — table-valued parameters are more efficient for large lists — but it’s clean for up to a few dozen values.
TRIM and friends
SELECT LTRIM(RTRIM(' hello world ')); -- 'hello world', old style
SELECT TRIM(' hello world '); -- 'hello world', since 2017
SELECT TRIM(',' FROM ',,,trimmed,,,'); -- 'trimmed', any char (2022+)
TRIM defaults to whitespace. You can also specify a set of characters to trim:
SELECT TRIM(' ,.;' FROM ' , .Mr. Bloch; ');
-- 'Mr. Bloch'
Useful for cleaning up imported data that has inconsistent padding, stray commas, or BOM characters.
Pattern matching: LIKE, PATINDEX, CHARINDEX
LIKE
-- Emails ending in @gmail.com
WHERE Email LIKE '%@gmail.com';
-- Names starting with 'Anne'
WHERE Name LIKE 'Anne%';
-- Exactly 5 characters
WHERE Code LIKE '_____';
-- Character classes (single char)
WHERE Code LIKE '[A-Z]%'; -- starts with uppercase letter
WHERE Code LIKE '[^0-9]%'; -- starts with a non-digit
Wildcards: % matches any string (including empty), _ matches exactly one char, [abc] matches one of the listed chars, [^abc] matches one char not in the set.
SARGability reminder (lesson 7): LIKE 'prefix%' is SARGable (can seek an index). LIKE '%suffix' and LIKE '%middle%' are not — they force a scan. If you need full-text search, use SQL Server’s Full-Text Search feature.
PATINDEX and CHARINDEX
Find position of a pattern or substring:
SELECT CHARINDEX('@', Email) AS at_position FROM Sales.Customer;
-- Position of '@' in the email.
SELECT PATINDEX('%[0-9]%', 'order 123') AS first_digit_pos;
-- 7. Pattern match like LIKE, returns position.
Returns 0 if not found. Useful for splitting strings manually when STRING_SPLIT isn’t enough.
Common string operations
-- Convert case
SELECT UPPER(Name), LOWER(Email) FROM Sales.Customer;
-- Substring
SELECT SUBSTRING(Email, 1, CHARINDEX('@', Email) - 1) AS localpart,
SUBSTRING(Email, CHARINDEX('@', Email) + 1, 4000) AS domain
FROM Sales.Customer;
-- Replace
SELECT REPLACE('Apples, oranges, bananas', ',', ';') AS csv_semi;
-- Length
SELECT LEN(Name) AS length_char, -- character count (trims trailing spaces)
DATALENGTH(Name) AS length_bytes -- byte count (2x for NVARCHAR)
FROM Sales.Customer;
-- Reverse (rarely useful, occasionally handy)
SELECT REVERSE('goblin'); -- 'nilbog'
-- Padding
SELECT RIGHT(REPLICATE('0', 10) + CAST(OrderId AS NVARCHAR(10)), 10) AS padded_id
FROM Sales.Orders;
-- Zero-pad to 10 chars.
-- Or FORMAT, slower but cleaner
SELECT FORMAT(OrderId, '0000000000') AS padded_id
FROM Sales.Orders;
LEN vs DATALENGTH matters: LEN trims trailing spaces and returns character count; DATALENGTH returns bytes (so 2× for Unicode). If you care about storage or you’re working with VARBINARY, use DATALENGTH. Otherwise LEN.
FORMAT is convenient but known to be slow — it calls into .NET’s System.Globalization framework on every row. For dashboards and light queries it’s fine; for hot queries, use CONVERT + explicit patterns.
Regular expressions (the bad news)
SQL Server doesn’t have native regex. LIKE is limited pattern matching, not regex.
Your options:
- Simulate common patterns with
LIKE. Works for 80% of cases. - Write a CLR function — possible, rare, adds deployment complexity.
- SQL Server 2025+ has native
REGEXP_LIKE,REGEXP_REPLACE,REGEXP_INSTR,REGEXP_SUBSTR. If you’re on 2025, celebrate and use them. - Parse in the application — for truly complex regex needs, the application code is often the right place.
-- SQL Server 2025+
SELECT Email FROM Sales.Customer WHERE REGEXP_LIKE(Email, '^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}$');
If you inherit a database with user-defined CLR functions called RegexMatch, RegexReplace, etc. — that was the pre-2025 workaround. When you migrate to 2025, replace them with the native functions.
Collation and case sensitivity in strings
Remember collation from lesson 4: it affects sorting and equality comparisons for strings.
-- Case-sensitive check on a case-insensitive column
SELECT Email FROM Sales.Customer
WHERE Email = 'Anne@Example.COM' COLLATE Latin1_General_CS_AS;
-- This will match 'Anne@Example.COM' exactly, not 'anne@example.com'.
COLLATE <name> overrides the collation for a single comparison. Useful for forcing case-sensitivity in an otherwise case-insensitive database.
Useful-but-overused: ISJSON and JSON extraction
T-SQL since 2016 has ISJSON, JSON_VALUE, JSON_QUERY, OPENJSON for parsing JSON stored in NVARCHAR columns. SQL Server 2025 adds a native JSON type.
-- Product attributes stored as JSON
SELECT ProductId,
JSON_VALUE(Attributes, '$.color') AS color,
JSON_VALUE(Attributes, '$.material') AS material,
JSON_VALUE(Attributes, '$.dimensions.cm') AS dim_cm
FROM Catalog.Product
WHERE JSON_VALUE(Attributes, '$.color') = 'black';
Useful when your schema has flexible attributes. Warning: the JSON_VALUE call isn’t SARGable on that column unless you create a computed column + index. For filter-heavy queries, extract into a real column.
Run this on your own machine
USE Runehold;
GO
-- 1. CONCAT and CONCAT_WS
SELECT CustomerId,
CONCAT_WS(' - ', Name, Country, Email) AS label
FROM Sales.Customer;
-- 2. STRING_AGG: product list per order
-- (needs OrderLine and Product tables; here's a lightweight version)
SELECT CustomerId,
STRING_AGG(CAST(OrderId AS NVARCHAR(20)), ', ')
WITHIN GROUP (ORDER BY OrderId) AS orders_csv,
COUNT(*) AS order_count
FROM Sales.Orders
GROUP BY CustomerId
ORDER BY order_count DESC;
-- 3. STRING_SPLIT: filter by a CSV parameter
DECLARE @countries NVARCHAR(100) = 'NL,IT,DE';
SELECT OrderId, CountryCode, Total
FROM Sales.Orders
WHERE CountryCode IN (SELECT value FROM STRING_SPLIT(@countries, ','))
ORDER BY CountryCode;
-- 4. Extract email parts with SUBSTRING + CHARINDEX
SELECT Email,
SUBSTRING(Email, 1, CHARINDEX('@', Email) - 1) AS local_part,
SUBSTRING(Email, CHARINDEX('@', Email) + 1, 4000) AS domain
FROM Sales.Customer
WHERE CHARINDEX('@', Email) > 0;
-- 5. Domain-frequency report
SELECT SUBSTRING(Email, CHARINDEX('@', Email) + 1, 4000) AS domain,
COUNT(*) AS customers
FROM Sales.Customer
WHERE CHARINDEX('@', Email) > 0
GROUP BY SUBSTRING(Email, CHARINDEX('@', Email) + 1, 4000)
ORDER BY customers DESC;
-- 6. TRIM with a set of characters
SELECT TRIM(' ,.;' FROM ' ;Piotr Nowak, ') AS cleaned;
Run each. Look at the outputs. Notice how much shorter and cleaner these are than before the modern functions existed.
Next lesson: dates and times. The timezone swamp. Why DATETIMEOFFSET exists, what AT TIME ZONE does, and the bugs that come from storing local time naively.