SQL Server, from the ground up Lesson 14 / 40

String functions: STRING_AGG, STRING_SPLIT, TRIM

Modern T-SQL string handling. Concatenation, splitting, pattern matching, trimming, plus the death of CLR functions for string work.

String handling in SQL Server went from “write your own CLR function” to “there’s a built-in for that” somewhere between 2016 and 2022. Today’s lesson is the current state of the art: the functions you’ll actually use in modern T-SQL for names, addresses, SKU codes, CSV imports, and the forty other places strings show up in Runehold’s data.

Concatenation: CONCAT and CONCAT_WS

The + operator concatenates strings, but has two gotchas:

  1. NULL + anything is NULL. One null value zeros the whole result.
  2. Mixing types needs explicit casts: 'Order #' + CAST(OrderId AS NVARCHAR(20)).

CONCAT() fixes both:

-- NULL-safe, auto-casts
SELECT CONCAT('Order #', o.OrderId, ' - €', o.Total, ' from ', c.Name) AS description
FROM Sales.Orders AS o
JOIN Sales.Customer AS c ON c.CustomerId = o.CustomerId;

CONCAT treats NULL as an empty string and implicitly converts non-string arguments. Much safer.

CONCAT_WSWith Separator — adds a delimiter between every non-NULL argument:

SELECT CONCAT_WS(', ',
                 c.Name,
                 c.AddressLine1,
                 c.AddressLine2,    -- NULL rows silently skipped
                 c.PostalCode,
                 c.City,
                 c.Country) AS mailing_label
FROM Sales.Customer AS c;

No trailing commas, no double delimiters when a field is NULL. Address labels, CSV rows, breadcrumb strings — CONCAT_WS is built for exactly these.

STRING_AGG: aggregate rows into a delimited string

The other half of that CSV superpower. Combine many rows of one column into a single string:

-- List all products in each order, comma-separated
SELECT o.OrderId,
       STRING_AGG(p.Name, ', ') WITHIN GROUP (ORDER BY p.Name) AS products
FROM Sales.OrderLine AS ol
JOIN Sales.Orders    AS o ON o.OrderId = ol.OrderId
JOIN Catalog.Product AS p ON p.ProductId = ol.ProductId
GROUP BY o.OrderId
ORDER BY o.OrderId;

STRING_AGG(col, separator) WITHIN GROUP (ORDER BY col) — concatenate all non-NULL values within each group, sorted the way you want.

Before STRING_AGG (added in SQL Server 2017), this required the infamous FOR XML PATH trick — about 15 lines of unreadable code. STRING_AGG makes it one line. If you see FOR XML PATH('') in old queries, you’re looking at a pre-2017 workaround that deserves to be refactored.

STRING_SPLIT: turn a delimited string into rows

The opposite: split a CSV string into individual rows.

-- Simple split
SELECT value FROM STRING_SPLIT('NL,IT,DE,FR,ES', ',');
-- Returns 5 rows.

-- With ordinal (SQL Server 2022+, requires enable_ordinal)
SELECT value, ordinal
FROM STRING_SPLIT('apple,banana,cherry', ',', 1);
-- Returns rows with their position

Useful for:

  • Breaking tag lists into individual tags.
  • Parsing CSV imports.
  • Accepting a “list” parameter in a stored procedure.

Classic parameter pattern:

CREATE PROCEDURE Sales.GetOrdersByCountries
    @CountryCsv NVARCHAR(500)
AS
BEGIN
    SELECT o.OrderId, o.Total, o.CountryCode
    FROM Sales.Orders AS o
    WHERE o.CountryCode IN (SELECT value FROM STRING_SPLIT(@CountryCsv, ','));
END;

-- Called from the app:
EXEC Sales.GetOrdersByCountries @CountryCsv = 'NL,IT,DE';

A simple way to pass a list of values as a single parameter. Not the only way — table-valued parameters are more efficient for large lists — but it’s clean for up to a few dozen values.

TRIM and friends

SELECT LTRIM(RTRIM('   hello world   '));  -- 'hello world', old style
SELECT TRIM('   hello world   ');          -- 'hello world', since 2017
SELECT TRIM(',' FROM ',,,trimmed,,,');     -- 'trimmed', any char (2022+)

TRIM defaults to whitespace. You can also specify a set of characters to trim:

SELECT TRIM(' ,.;' FROM '  , .Mr. Bloch;  ');
-- 'Mr. Bloch'

Useful for cleaning up imported data that has inconsistent padding, stray commas, or BOM characters.

Pattern matching: LIKE, PATINDEX, CHARINDEX

LIKE

-- Emails ending in @gmail.com
WHERE Email LIKE '%@gmail.com';

-- Names starting with 'Anne'
WHERE Name LIKE 'Anne%';

-- Exactly 5 characters
WHERE Code LIKE '_____';

-- Character classes (single char)
WHERE Code LIKE '[A-Z]%';   -- starts with uppercase letter
WHERE Code LIKE '[^0-9]%';  -- starts with a non-digit

Wildcards: % matches any string (including empty), _ matches exactly one char, [abc] matches one of the listed chars, [^abc] matches one char not in the set.

SARGability reminder (lesson 7): LIKE 'prefix%' is SARGable (can seek an index). LIKE '%suffix' and LIKE '%middle%' are not — they force a scan. If you need full-text search, use SQL Server’s Full-Text Search feature.

PATINDEX and CHARINDEX

Find position of a pattern or substring:

SELECT CHARINDEX('@', Email) AS at_position FROM Sales.Customer;
-- Position of '@' in the email.

SELECT PATINDEX('%[0-9]%', 'order 123') AS first_digit_pos;
-- 7. Pattern match like LIKE, returns position.

Returns 0 if not found. Useful for splitting strings manually when STRING_SPLIT isn’t enough.

Common string operations

-- Convert case
SELECT UPPER(Name), LOWER(Email) FROM Sales.Customer;

-- Substring
SELECT SUBSTRING(Email, 1, CHARINDEX('@', Email) - 1) AS localpart,
       SUBSTRING(Email, CHARINDEX('@', Email) + 1, 4000) AS domain
FROM Sales.Customer;

-- Replace
SELECT REPLACE('Apples, oranges, bananas', ',', ';') AS csv_semi;

-- Length
SELECT LEN(Name) AS length_char,      -- character count (trims trailing spaces)
       DATALENGTH(Name) AS length_bytes  -- byte count (2x for NVARCHAR)
FROM Sales.Customer;

-- Reverse (rarely useful, occasionally handy)
SELECT REVERSE('goblin');  -- 'nilbog'

-- Padding
SELECT RIGHT(REPLICATE('0', 10) + CAST(OrderId AS NVARCHAR(10)), 10) AS padded_id
FROM Sales.Orders;
-- Zero-pad to 10 chars.

-- Or FORMAT, slower but cleaner
SELECT FORMAT(OrderId, '0000000000') AS padded_id
FROM Sales.Orders;

LEN vs DATALENGTH matters: LEN trims trailing spaces and returns character count; DATALENGTH returns bytes (so 2× for Unicode). If you care about storage or you’re working with VARBINARY, use DATALENGTH. Otherwise LEN.

FORMAT is convenient but known to be slow — it calls into .NET’s System.Globalization framework on every row. For dashboards and light queries it’s fine; for hot queries, use CONVERT + explicit patterns.

Regular expressions (the bad news)

SQL Server doesn’t have native regex. LIKE is limited pattern matching, not regex.

Your options:

  1. Simulate common patterns with LIKE. Works for 80% of cases.
  2. Write a CLR function — possible, rare, adds deployment complexity.
  3. SQL Server 2025+ has native REGEXP_LIKE, REGEXP_REPLACE, REGEXP_INSTR, REGEXP_SUBSTR. If you’re on 2025, celebrate and use them.
  4. Parse in the application — for truly complex regex needs, the application code is often the right place.
-- SQL Server 2025+
SELECT Email FROM Sales.Customer WHERE REGEXP_LIKE(Email, '^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}$');

If you inherit a database with user-defined CLR functions called RegexMatch, RegexReplace, etc. — that was the pre-2025 workaround. When you migrate to 2025, replace them with the native functions.

Collation and case sensitivity in strings

Remember collation from lesson 4: it affects sorting and equality comparisons for strings.

-- Case-sensitive check on a case-insensitive column
SELECT Email FROM Sales.Customer
WHERE Email = 'Anne@Example.COM' COLLATE Latin1_General_CS_AS;
-- This will match 'Anne@Example.COM' exactly, not 'anne@example.com'.

COLLATE <name> overrides the collation for a single comparison. Useful for forcing case-sensitivity in an otherwise case-insensitive database.

Useful-but-overused: ISJSON and JSON extraction

T-SQL since 2016 has ISJSON, JSON_VALUE, JSON_QUERY, OPENJSON for parsing JSON stored in NVARCHAR columns. SQL Server 2025 adds a native JSON type.

-- Product attributes stored as JSON
SELECT ProductId,
       JSON_VALUE(Attributes, '$.color')        AS color,
       JSON_VALUE(Attributes, '$.material')     AS material,
       JSON_VALUE(Attributes, '$.dimensions.cm') AS dim_cm
FROM Catalog.Product
WHERE JSON_VALUE(Attributes, '$.color') = 'black';

Useful when your schema has flexible attributes. Warning: the JSON_VALUE call isn’t SARGable on that column unless you create a computed column + index. For filter-heavy queries, extract into a real column.

Run this on your own machine

USE Runehold;
GO

-- 1. CONCAT and CONCAT_WS
SELECT CustomerId,
       CONCAT_WS(' - ', Name, Country, Email) AS label
FROM Sales.Customer;

-- 2. STRING_AGG: product list per order
-- (needs OrderLine and Product tables; here's a lightweight version)
SELECT CustomerId,
       STRING_AGG(CAST(OrderId AS NVARCHAR(20)), ', ')
           WITHIN GROUP (ORDER BY OrderId) AS orders_csv,
       COUNT(*) AS order_count
FROM Sales.Orders
GROUP BY CustomerId
ORDER BY order_count DESC;

-- 3. STRING_SPLIT: filter by a CSV parameter
DECLARE @countries NVARCHAR(100) = 'NL,IT,DE';

SELECT OrderId, CountryCode, Total
FROM Sales.Orders
WHERE CountryCode IN (SELECT value FROM STRING_SPLIT(@countries, ','))
ORDER BY CountryCode;

-- 4. Extract email parts with SUBSTRING + CHARINDEX
SELECT Email,
       SUBSTRING(Email, 1, CHARINDEX('@', Email) - 1) AS local_part,
       SUBSTRING(Email, CHARINDEX('@', Email) + 1, 4000) AS domain
FROM Sales.Customer
WHERE CHARINDEX('@', Email) > 0;

-- 5. Domain-frequency report
SELECT SUBSTRING(Email, CHARINDEX('@', Email) + 1, 4000) AS domain,
       COUNT(*) AS customers
FROM Sales.Customer
WHERE CHARINDEX('@', Email) > 0
GROUP BY SUBSTRING(Email, CHARINDEX('@', Email) + 1, 4000)
ORDER BY customers DESC;

-- 6. TRIM with a set of characters
SELECT TRIM(' ,.;' FROM '  ;Piotr Nowak,  ') AS cleaned;

Run each. Look at the outputs. Notice how much shorter and cleaner these are than before the modern functions existed.

Next lesson: dates and times. The timezone swamp. Why DATETIMEOFFSET exists, what AT TIME ZONE does, and the bugs that come from storing local time naively.

Search