String functions: STRING_AGG, STRING_SPLIT, TRIM

String handling in SQL Server went from “write your own CLR function” to “there’s a built-in for that” somewhere between 2016 and 2022. Today’s lesson is the current state of the art: the functions you’ll actually use in modern T-SQL for names, addresses, SKU codes, CSV imports, and the forty other places strings show up in Runehold’s data.

Concatenation: `CONCAT` and `CONCAT_WS`

The + operator concatenates strings, but has two gotchas:

NULL + anything is NULL. One null value zeros the whole result.
Mixing types needs explicit casts: 'Order #' + CAST(OrderId AS NVARCHAR(20)).

CONCAT() fixes both:

-- NULL-safe, auto-casts
SELECT CONCAT('Order #', o.OrderId, ' - €', o.Total, ' from ', c.Name) AS description
FROM Sales.Orders AS o
JOIN Sales.Customer AS c ON c.CustomerId = o.CustomerId;

CONCAT treats NULL as an empty string and implicitly converts non-string arguments. Much safer.

CONCAT_WS — With Separator — adds a delimiter between every non-NULL argument:

SELECT CONCAT_WS(', ',
                 c.Name,
                 c.AddressLine1,
                 c.AddressLine2,    -- NULL rows silently skipped
                 c.PostalCode,
                 c.City,
                 c.Country) AS mailing_label
FROM Sales.Customer AS c;

No trailing commas, no double delimiters when a field is NULL. Address labels, CSV rows, breadcrumb strings — CONCAT_WS is built for exactly these.

STRING_AGG: aggregate rows into a delimited string

The other half of that CSV superpower. Combine many rows of one column into a single string:

-- List all products in each order, comma-separated
SELECT o.OrderId,
       STRING_AGG(p.Name, ', ') WITHIN GROUP (ORDER BY p.Name) AS products
FROM Sales.OrderLine AS ol
JOIN Sales.Orders    AS o ON o.OrderId = ol.OrderId
JOIN Catalog.Product AS p ON p.ProductId = ol.ProductId
GROUP BY o.OrderId
ORDER BY o.OrderId;

STRING_AGG(col, separator) WITHIN GROUP (ORDER BY col) — concatenate all non-NULL values within each group, sorted the way you want.

Before STRING_AGG (added in SQL Server 2017), this required the infamous FOR XML PATH trick — about 15 lines of unreadable code. STRING_AGG makes it one line. If you see FOR XML PATH('') in old queries, you’re looking at a pre-2017 workaround that deserves to be refactored.

STRING_SPLIT: turn a delimited string into rows

The opposite: split a CSV string into individual rows.

-- Simple split
SELECT value FROM STRING_SPLIT('NL,IT,DE,FR,ES', ',');
-- Returns 5 rows.

-- With ordinal (SQL Server 2022+, requires enable_ordinal)
SELECT value, ordinal
FROM STRING_SPLIT('apple,banana,cherry', ',', 1);
-- Returns rows with their position

Useful for:

Breaking tag lists into individual tags.
Parsing CSV imports.
Accepting a “list” parameter in a stored procedure.

Classic parameter pattern:

CREATE PROCEDURE Sales.GetOrdersByCountries
    @CountryCsv NVARCHAR(500)
AS
BEGIN
    SELECT o.OrderId, o.Total, o.CountryCode
    FROM Sales.Orders AS o
    WHERE o.CountryCode IN (SELECT value FROM STRING_SPLIT(@CountryCsv, ','));
END;

-- Called from the app:
EXEC Sales.GetOrdersByCountries @CountryCsv = 'NL,IT,DE';

A simple way to pass a list of values as a single parameter. Not the only way — table-valued parameters are more efficient for large lists — but it’s clean for up to a few dozen values.

TRIM and friends

SELECT LTRIM(RTRIM('   hello world   '));  -- 'hello world', old style
SELECT TRIM('   hello world   ');          -- 'hello world', since 2017
SELECT TRIM(',' FROM ',,,trimmed,,,');     -- 'trimmed', any char (2022+)

TRIM defaults to whitespace. You can also specify a set of characters to trim:

SELECT TRIM(' ,.;' FROM '  , .Mr. Bloch;  ');
-- 'Mr. Bloch'

Useful for cleaning up imported data that has inconsistent padding, stray commas, or BOM characters.

Pattern matching: LIKE, PATINDEX, CHARINDEX

`LIKE`

-- Emails ending in @gmail.com
WHERE Email LIKE '%@gmail.com';

-- Names starting with 'Anne'
WHERE Name LIKE 'Anne%';

-- Exactly 5 characters
WHERE Code LIKE '_____';

-- Character classes (single char)
WHERE Code LIKE '[A-Z]%';   -- starts with uppercase letter
WHERE Code LIKE '[^0-9]%';  -- starts with a non-digit

Wildcards: % matches any string (including empty), _ matches exactly one char, [abc] matches one of the listed chars, [^abc] matches one char not in the set.

SARGability reminder (lesson 7): LIKE 'prefix%' is SARGable (can seek an index). LIKE '%suffix' and LIKE '%middle%' are not — they force a scan. If you need full-text search, use SQL Server’s Full-Text Search feature.

`PATINDEX` and `CHARINDEX`

Find position of a pattern or substring:

SELECT CHARINDEX('@', Email) AS at_position FROM Sales.Customer;
-- Position of '@' in the email.

SELECT PATINDEX('%[0-9]%', 'order 123') AS first_digit_pos;
-- 7. Pattern match like LIKE, returns position.

Returns 0 if not found. Useful for splitting strings manually when STRING_SPLIT isn’t enough.

Common string operations

-- Convert case
SELECT UPPER(Name), LOWER(Email) FROM Sales.Customer;

-- Substring
SELECT SUBSTRING(Email, 1, CHARINDEX('@', Email) - 1) AS localpart,
       SUBSTRING(Email, CHARINDEX('@', Email) + 1, 4000) AS domain
FROM Sales.Customer;

-- Replace
SELECT REPLACE('Apples, oranges, bananas', ',', ';') AS csv_semi;

-- Length
SELECT LEN(Name) AS length_char,      -- character count (trims trailing spaces)
       DATALENGTH(Name) AS length_bytes  -- byte count (2x for NVARCHAR)
FROM Sales.Customer;

-- Reverse (rarely useful, occasionally handy)
SELECT REVERSE('goblin');  -- 'nilbog'

-- Padding
SELECT RIGHT(REPLICATE('0', 10) + CAST(OrderId AS NVARCHAR(10)), 10) AS padded_id
FROM Sales.Orders;
-- Zero-pad to 10 chars.

-- Or FORMAT, slower but cleaner
SELECT FORMAT(OrderId, '0000000000') AS padded_id
FROM Sales.Orders;

LEN vs DATALENGTH matters: LEN trims trailing spaces and returns character count; DATALENGTH returns bytes (so 2× for Unicode). If you care about storage or you’re working with VARBINARY, use DATALENGTH. Otherwise LEN.

FORMAT is convenient but known to be slow — it calls into .NET’s System.Globalization framework on every row. For dashboards and light queries it’s fine; for hot queries, use CONVERT + explicit patterns.

Regular expressions (the bad news)

SQL Server doesn’t have native regex. LIKE is limited pattern matching, not regex.

Your options:

Simulate common patterns with LIKE. Works for 80% of cases.
Write a CLR function — possible, rare, adds deployment complexity.
SQL Server 2025+ has native REGEXP_LIKE, REGEXP_REPLACE, REGEXP_INSTR, REGEXP_SUBSTR. If you’re on 2025, celebrate and use them.
Parse in the application — for truly complex regex needs, the application code is often the right place.

-- SQL Server 2025+
SELECT Email FROM Sales.Customer WHERE REGEXP_LIKE(Email, '^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}$');

If you inherit a database with user-defined CLR functions called RegexMatch, RegexReplace, etc. — that was the pre-2025 workaround. When you migrate to 2025, replace them with the native functions.

Collation and case sensitivity in strings

Remember collation from lesson 4: it affects sorting and equality comparisons for strings.

-- Case-sensitive check on a case-insensitive column
SELECT Email FROM Sales.Customer
WHERE Email = 'Anne@Example.COM' COLLATE Latin1_General_CS_AS;
-- This will match 'Anne@Example.COM' exactly, not 'anne@example.com'.

COLLATE <name> overrides the collation for a single comparison. Useful for forcing case-sensitivity in an otherwise case-insensitive database.

Useful-but-overused: `ISJSON` and JSON extraction

T-SQL since 2016 has ISJSON, JSON_VALUE, JSON_QUERY, OPENJSON for parsing JSON stored in NVARCHAR columns. SQL Server 2025 adds a native JSON type.

-- Product attributes stored as JSON
SELECT ProductId,
       JSON_VALUE(Attributes, '$.color')        AS color,
       JSON_VALUE(Attributes, '$.material')     AS material,
       JSON_VALUE(Attributes, '$.dimensions.cm') AS dim_cm
FROM Catalog.Product
WHERE JSON_VALUE(Attributes, '$.color') = 'black';

Useful when your schema has flexible attributes. Warning: the JSON_VALUE call isn’t SARGable on that column unless you create a computed column + index. For filter-heavy queries, extract into a real column.

Run this on your own machine

USE Runehold;
GO

-- 1. CONCAT and CONCAT_WS
SELECT CustomerId,
       CONCAT_WS(' - ', Name, Country, Email) AS label
FROM Sales.Customer;

-- 2. STRING_AGG: product list per order
-- (needs OrderLine and Product tables; here's a lightweight version)
SELECT CustomerId,
       STRING_AGG(CAST(OrderId AS NVARCHAR(20)), ', ')
           WITHIN GROUP (ORDER BY OrderId) AS orders_csv,
       COUNT(*) AS order_count
FROM Sales.Orders
GROUP BY CustomerId
ORDER BY order_count DESC;

-- 3. STRING_SPLIT: filter by a CSV parameter
DECLARE @countries NVARCHAR(100) = 'NL,IT,DE';

SELECT OrderId, CountryCode, Total
FROM Sales.Orders
WHERE CountryCode IN (SELECT value FROM STRING_SPLIT(@countries, ','))
ORDER BY CountryCode;

-- 4. Extract email parts with SUBSTRING + CHARINDEX
SELECT Email,
       SUBSTRING(Email, 1, CHARINDEX('@', Email) - 1) AS local_part,
       SUBSTRING(Email, CHARINDEX('@', Email) + 1, 4000) AS domain
FROM Sales.Customer
WHERE CHARINDEX('@', Email) > 0;

-- 5. Domain-frequency report
SELECT SUBSTRING(Email, CHARINDEX('@', Email) + 1, 4000) AS domain,
       COUNT(*) AS customers
FROM Sales.Customer
WHERE CHARINDEX('@', Email) > 0
GROUP BY SUBSTRING(Email, CHARINDEX('@', Email) + 1, 4000)
ORDER BY customers DESC;

-- 6. TRIM with a set of characters
SELECT TRIM(' ,.;' FROM '  ;Piotr Nowak,  ') AS cleaned;

Run each. Look at the outputs. Notice how much shorter and cleaner these are than before the modern functions existed.

Next lesson: dates and times. The timezone swamp. Why DATETIMEOFFSET exists, what AT TIME ZONE does, and the bugs that come from storing local time naively.

Concatenation: CONCAT and CONCAT_WS