Lua bindings to hyperscan, a high-performance regular expression matching library
You need hyperscan and luarocks installed.
$ luarocks make
You can provide HS_DIR if hyperscan is installed in a non-standard location.
$ luarocks make HS_DIR=/usr/local
This rock has two modules:
- luahs has all functions, including compilation of patterns
- luahs_runtime has all functions except compile, expressionInfo and currentPlatform
luahs is linked against libhs, and luahs_runtime is linked against libhs_runtime.
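Since the runtime module lacks compile, expressionInfo and currentPlatform, code that may run with either module can check which one it got. The following is a minimal sketch, not part of the rock: loadHyperscan is a hypothetical helper, and testing hs.compile for nil assumes that luahs_runtime simply omits that function.

```lua
-- Hypothetical helper: try the full module first, then the runtime-only one.
local function loadHyperscan()
  local lastError
  for _, name in ipairs({'luahs', 'luahs_runtime'}) do
    local ok, mod = pcall(require, name)
    if ok then
      return mod, name
    end
    lastError = mod
  end
  return nil, lastError
end

local hs, nameOrError = loadHyperscan()
if not hs then
  print('hyperscan bindings are not available: ' .. tostring(nameOrError))
elseif hs.compile == nil then
  -- Assumption: luahs_runtime has no compile field at all.
  print(nameOrError .. ' loaded, scanning only')
else
  print(nameOrError .. ' loaded, compilation available')
end
```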
Require the luahs module from Lua:
luahs = require 'luahs'
You can find unit tests in the directory spec.
All constants used by hyperscan are available in sub-tables of luahs:
luahs.errors -- error codes
luahs.compile_mode -- compilation mode (block, stream, etc)
luahs.pattern_flags -- pattern flags (case-insensitive, etc)
luahs.extended_parameters -- extended parameters of pattern
luahs.cpu_features -- CPU feature support flags
luahs.cpu_tuning -- CPU tuning flags
Example:
> print(luahs.errors.HS_SUCCESS)
0
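When an error code needs to be reported, it can help to map the number back to its name. The sketch below assumes that the sub-tables are plain Lua tables mapping constant names to integers, which is not stated explicitly here; constantName is a hypothetical helper, and a stand-in table is used so the snippet runs without hyperscan.

```lua
-- Hypothetical helper: find the name of a constant by its numeric value.
-- Assumes the sub-table is a plain Lua table mapping names to integers.
local function constantName(constants, value)
  for name, v in pairs(constants) do
    if v == value then
      return name
    end
  end
  return nil
end

-- Stand-in for luahs.errors so the sketch runs without hyperscan installed;
-- HS_SUCCESS = 0 is taken from the example above.
local errors = {HS_SUCCESS = 0}
print(constantName(errors, 0)) -- prints HS_SUCCESS
print(constantName(errors, 1)) -- prints nil
```

With the real module loaded, the call would be constantName(luahs.errors, code).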
Compilation is done with function luahs.compile.
It takes a regular expression (or several regular expressions)
and compilation parameters, and returns a database.
db = luahs.compile {
expression = 'aaa',
mode = luahs.compile_mode.HS_MODE_BLOCK,
}
Provide pattern flags:
db = luahs.compile {
expression = 'aaa',
mode = luahs.compile_mode.HS_MODE_BLOCK,
flags = luahs.pattern_flags.HS_FLAG_CASELESS,
}
Provide multiple flags:
db = luahs.compile {
expression = 'aaa',
mode = luahs.compile_mode.HS_MODE_BLOCK,
flags = {
luahs.pattern_flags.HS_FLAG_CASELESS,
luahs.pattern_flags.HS_FLAG_DOTALL,
},
}
mode can also be a list, as in the case of Start-Of-Match (SOM):
db = luahs.compile {
expression = 'aaa',
mode = {
luahs.compile_mode.HS_MODE_STREAM,
luahs.compile_mode.HS_MODE_SOM_HORIZON_LARGE,
},
flags = luahs.pattern_flags.HS_FLAG_SOM_LEFTMOST,
}
Compile multiple patterns:
db = luahs.compile {
expressions = {
'aaa',
'bbb',
},
mode = luahs.compile_mode.HS_MODE_BLOCK,
}
If you compile multiple patterns and need to provide flags, identifiers or extended parameters for a pattern, pass a table with the following fields in place of the pattern string:
- (required) expression - pattern itself
- (optional) flags - flags, an integer or a list of integers
- (optional) id - identifier of a pattern, defaults to 0
- (optional) min_offset - the minimum end offset in the data stream at which this expression should match successfully
- (optional) max_offset - the maximum end offset in the data stream at which this expression should match successfully
- (optional) min_length - minimum match length (from start to end) required to successfully match this expression
Example:
db = luahs.compile {
expressions = {
{
expression = 'aaa',
id = 1,
flags = luahs.pattern_flags.HS_FLAG_CASELESS,
min_offset = 100,
max_offset = 140,
},
{
expression = 'b.{1,20}b.{1,20}b',
id = 2,
flags = {
luahs.pattern_flags.HS_FLAG_CASELESS,
luahs.pattern_flags.HS_FLAG_DOTALL,
},
min_offset = 200,
max_offset = 800,
min_length = 20,
},
},
mode = luahs.compile_mode.HS_MODE_BLOCK,
}
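Because id defaults to 0, hits from several patterns cannot be told apart unless each pattern gets its own identifier. A minimal sketch of one way to assign them follows; withIds is a hypothetical helper, not a function of the rock, and the compile call only runs if luahs is installed.

```lua
-- Hypothetical helper: wrap plain pattern strings into pattern tables,
-- numbering them from 1 so that the id of a hit identifies the pattern.
local function withIds(patterns, flags)
  local expressions = {}
  for i, pattern in ipairs(patterns) do
    expressions[i] = {
      expression = pattern,
      id = i,
      flags = flags, -- may be nil, an integer or a list of integers
    }
  end
  return expressions
end

local expressions = withIds({'aaa', 'bbb'})
print(#expressions, expressions[2].expression, expressions[2].id)

-- Compile only if luahs is actually installed.
local ok, luahs = pcall(require, 'luahs')
if ok then
  local db = luahs.compile {
    expressions = expressions,
    mode = luahs.compile_mode.HS_MODE_BLOCK,
  }
  print(db ~= nil)
end
```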
You can provide the platform on which the database will run:
db = luahs.compile {
expression = 'aaa',
mode = luahs.compile_mode.HS_MODE_BLOCK,
platform = {
tune = luahs.cpu_tuning.HS_TUNE_FAMILY_GENERIC,
}
}
The platform table has the following fields, all of them optional:
- cpu_features - CPU feature support flags
- tune - CPU tuning flags
Each value can be an integer or a list of integers.
Function luahs.currentPlatform() returns such a table for the current platform.
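Since currentPlatform returns a table of the same shape as the platform parameter, one plausible use is to pass its result straight to compile. This is a sketch under that assumption; compileForHost is a hypothetical helper, and the module is passed in as an argument so the helper does not depend on a global.

```lua
-- Hypothetical helper: compile a block-mode database tuned for this machine.
-- hs is the loaded luahs module, passed in explicitly.
local function compileForHost(hs, expression)
  return hs.compile {
    expression = expression,
    mode = hs.compile_mode.HS_MODE_BLOCK,
    -- Assumption: the table returned by currentPlatform can be used as is.
    platform = hs.currentPlatform(),
  }
end

-- Run the helper only if luahs is actually installed.
local ok, luahs = pcall(require, 'luahs')
if ok then
  local db = compileForHost(luahs, 'aaa')
  print(db ~= nil)
end
```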
Function luahs.expressionInfo returns information about
the expression instead of a database:
> info = luahs.expressionInfo('a?a?a?b')
> print(info.min_width)
1
> print(info.max_width)
4
Optionally, pattern flags can be provided as an integer or as a table:
info = luahs.expressionInfo(
'a?a?a?',
luahs.pattern_flags.HS_FLAG_ALLOWEMPTY
)
info = luahs.expressionInfo(
'a?a?a?',
{
luahs.pattern_flags.HS_FLAG_ALLOWEMPTY,
luahs.pattern_flags.HS_FLAG_CASELESS,
}
)
See the fields of the table info.
To scan a text against a database, you need a scratch object.
It can be created using method db:makeScratch():
db = luahs.compile {
expression = 'aaa',
mode = luahs.compile_mode.HS_MODE_BLOCK,
}
scratch = db:makeScratch()
Then you can scan a text:
hits = db:scan('aaa', scratch)
-- hits is {{id=0, from=0, to=3}}
Method scan returns hits. Each hit is a table with
the following fields:
- id - identifier of a pattern
- from - start of a hit, 0-based. If the Start-Of-Match flag is not set, from is always equal to 0.
- to - end of a hit, 0-based index of the first byte after the hit.
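The offsets are 0-based with an exclusive end, while Lua strings are 1-based with an inclusive end, so converting a hit into a substring takes a small adjustment. A minimal sketch follows; hitText is a hypothetical helper, not part of the rock.

```lua
-- Hypothetical helper: extract the bytes covered by a hit.
-- hit.from is a 0-based start and hit.to is the 0-based index of the first
-- byte after the hit, so the 1-based inclusive Lua range is from + 1 .. to.
local function hitText(text, hit)
  return text:sub(hit.from + 1, hit.to)
end

print(hitText('aaa', {id = 0, from = 0, to = 3}))   -- prints aaa
print(hitText('xxaaa', {id = 0, from = 2, to = 5})) -- prints aaa
```

Without the Start-Of-Match flag from is 0, so the helper returns everything from the beginning of the text up to the end of the match.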
If a database is compiled in vectored mode
(mode = luahs.compile_mode.HS_MODE_VECTORED), you have to pass
a table of strings to the scan method:
hits = db:scan({'a', 'aa'}, scratch)
To scan in stream mode, you need a stream object:
db = luahs.compile {
expression = 'abc',
mode = luahs.compile_mode.HS_MODE_STREAM,
}
scratch = db:makeScratch()
stream = db:makeStream()
Apply method scan to the stream object:
hits1 = stream:scan('a', scratch) -- hits1 is {}
hits2 = stream:scan('b', scratch) -- hits2 is {}
hits3 = stream:scan('c', scratch) -- hits3 is {{id=0, from=0, to=3}}
You have to close a stream using method close:
hits = stream:close(scratch)
Method close can also return some hits.
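Because both scan and close may return hits, a caller that wants every hit for a sequence of chunks has to collect from both. The sketch below shows one way to do that; scanChunks is a hypothetical helper, and it assumes that hits are returned as array-like tables, as in the examples above.

```lua
-- Hypothetical helper: feed chunks to a stream, close it and return all hits.
local function scanChunks(stream, scratch, chunks)
  local all = {}
  local function collect(hits)
    for _, hit in ipairs(hits) do
      all[#all + 1] = hit
    end
  end
  for _, chunk in ipairs(chunks) do
    collect(stream:scan(chunk, scratch))
  end
  -- close may report matches that are only known at the end of the data.
  collect(stream:close(scratch))
  return all
end

-- Run against a real database only if luahs is actually installed.
local ok, luahs = pcall(require, 'luahs')
if ok then
  local db = luahs.compile {
    expression = 'abc',
    mode = luahs.compile_mode.HS_MODE_STREAM,
  }
  local hits = scanChunks(db:makeStream(), db:makeScratch(), {'a', 'b', 'c'})
  print(#hits)
end
```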
You can reset a stream object:
hits = stream:reset(scratch)
A call to reset has the same effect as a call to close followed by
creating a new stream for the same database.
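This makes reset convenient for scanning several independent messages with a single stream object. The following is a sketch only; scanMessages is a hypothetical helper, and it leaves the stream open, so the caller still has to close it afterwards.

```lua
-- Hypothetical helper: scan each message as its own logical stream,
-- reusing one stream object and resetting it between messages.
local function scanMessages(stream, scratch, messages)
  local results = {}
  for i, message in ipairs(messages) do
    local hits = {}
    for _, hit in ipairs(stream:scan(message, scratch)) do
      hits[#hits + 1] = hit
    end
    -- reset returns the hits that close would have returned
    -- and prepares the stream for the next message.
    for _, hit in ipairs(stream:reset(scratch)) do
      hits[#hits + 1] = hit
    end
    results[i] = hits
  end
  return results
end

-- Run against a real database only if luahs is actually installed.
local ok, luahs = pcall(require, 'luahs')
if ok then
  local db = luahs.compile {
    expression = 'aaa$',
    mode = luahs.compile_mode.HS_MODE_STREAM,
  }
  local scratch = db:makeScratch()
  local stream = db:makeStream()
  local results = scanMessages(stream, scratch, {'aaa', 'bbb'})
  print(#results)
  stream:close(scratch)
end
```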
Stream objects of the same database can be assigned to each other:
db = luahs.compile {
expression = 'aaa$',
mode = luahs.compile_mode.HS_MODE_STREAM,
}
scratch = db:makeScratch()
stream1 = db:makeStream()
stream2 = db:makeStream()
stream1:scan('a', scratch) -- returns {}
stream1:scan('a', scratch) -- returns {}
stream1:scan('a', scratch) -- returns {}
stream2:assign(stream1, scratch) -- returns {}
-- stream2 := stream1;
stream2:reset(scratch) -- returns {{id=0, from=0, to=3}}
You can also clone a stream: clone = stream:clone().
You can get the database back from a stream using
method stream:database().
TODO