DuckDB URLPattern Extension¶
A DuckDB extension that implements the WHATWG URLPattern API for powerful URL matching and extraction directly in SQL.
Overview¶
The URLPattern API provides a web-standard way to match URLs against patterns, similar to how routing works in web frameworks. This extension brings that capability to DuckDB, enabling you to:
- Match URLs against patterns with wildcards and named groups
- Extract components from URLs using named capture groups
- Inspect patterns to retrieve their individual components
Quick Start¶
-- Load the extension
LOAD urlpattern;
-- Test if a URL matches a pattern
SELECT urlpattern_test(
'https://example.com/users/:id',
'https://example.com/users/123'
);
-- Returns: true
-- Extract a named group from a URL
SELECT urlpattern_extract(
'https://example.com/users/:id',
'https://example.com/users/123',
'id'
);
-- Returns: '123'
Features¶
| Feature | Description |
|---|---|
| URLPATTERN Type | Custom type for storing and validating URL patterns |
| Pattern Matching | Test URLs against patterns with urlpattern_test() |
| Full Execution | Get complete match results with urlpattern_exec() |
| Group Extraction | Extract named groups with urlpattern_extract() |
| Component Accessors | Get pattern components like pathname, hostname, etc. |
| WHATWG Compliant | Follows the official URLPattern specification |
| High Performance | RE2 regex engine with pattern caching (~320k matches/sec) |
Use Cases¶
API Route Analysis¶
Analyze API access logs to understand usage patterns:
SELECT
urlpattern_extract(
'https://api.example.com/:version/:resource/:id',
request_url,
'resource'
) as resource,
COUNT(*) as requests
FROM access_logs
WHERE urlpattern_test('https://api.example.com/:version/:resource/:id', request_url)
GROUP BY resource
ORDER BY requests DESC;
URL Classification¶
Classify URLs into categories based on their structure:
SELECT
url,
CASE
WHEN urlpattern_test('https://example.com/blog/*', url) THEN 'blog'
WHEN urlpattern_test('https://example.com/products/:id', url) THEN 'product'
WHEN urlpattern_test('https://example.com/users/:id/*', url) THEN 'user'
ELSE 'other'
END as category
FROM urls;
Multi-tenant URL Parsing¶
Extract tenant information from subdomain patterns:
SELECT
urlpattern_extract(
'https://:tenant.example.com/*',
url,
'tenant'
) as tenant_id,
url
FROM requests;
Storing Patterns in Tables¶
Use the URLPATTERN type to store validated patterns:
CREATE TABLE routes (
name VARCHAR,
pattern URLPATTERN
);
INSERT INTO routes VALUES
('users', 'https://api.example.com/users/:id'),
('posts', 'https://api.example.com/posts/:slug');
-- Find matching route for a URL
SELECT name
FROM routes
WHERE urlpattern_test(pattern, 'https://api.example.com/users/123');
Next Steps¶
- Installation Guide - How to install the extension
- Pattern Syntax - Learn the URLPattern syntax
- API Reference - Complete function documentation
- Examples - More usage examples