Posted 02 August 2018 - 09:57 PM
I was wondering how I would go about building a regex engine.
In ComputerCraft, this was the file containing the regex engine. To me, it looks like a million lines of gibberish.
So far, this is the code I have for my version:
--rules
--"?": optional
--"+": repeating
--"*": interrupting
-- Shallow copy of a table.
local function copy(t)
  local rtn = {}
  for k, v in pairs(t) do rtn[k] = v end
  return rtn
end
-- Returns true if `char` occurs anywhere in `str`.
local function find(str, char)
  for i = 1, #str do
    -- Compare the i-th character of str.
    if string.sub(str, i, i) == char then
      return true
    end
  end
  return false
end
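-- Reads the flag characters from the first entry of `token` and returns
-- the flags plus a copy of the table with that first entry removed.
local function resolveRuleData(token)
  token = copy(token)
  local function resolve(c)
    return not not find(token[1], c)
  end
  local dat = {
    optional = resolve("?"),
    repeating = resolve("+"),
    interuppting = resolve("*"),
    multi = #token > 2
  }
  table.remove(token, 1)
  return dat, token
end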
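local match
match = function(rule, token)
  rule = copy(rule)
  token = copy(token)
  local dat = resolveRuleData(rule)
  if dat.optional then
    -- Strip the "?" flag ("%?" escapes it in a Lua pattern) and try the plain rule.
    rule[1] = string.gsub(rule[1], "%?", "")
    local s, nrule, ntoken = match(rule, token)
    if s then
      rule, token = nrule, ntoken
    end
    -- An optional rule succeeds whether or not it consumed anything.
    return true, rule, token
  elseif dat.interuppting then
    -- TODO: "*" handling is not written yet
  else
    -- TODO: plain and "+" rules are not written yet
  end
end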
It already looks like gibberish.
I have multiple reasons to write my own regex engine. For example, I need to compare tables of strings rather than single letters. And I need to sneak in additional logic.
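To make those two requirements concrete, here is a minimal sketch, assuming each rule element can be a plain Lua function that accepts or rejects one whole token. The names `literal`, `oneOf`, `isNumber` and `matchSequence` are hypothetical and only for illustration; flags such as "?" and "+" are left out here.

```lua
-- Hypothetical rule elements: each one is a function that takes a whole
-- token (a string, not a single letter) and returns true or false.
local function literal(s)
  return function(tok) return tok == s end
end

local function oneOf(list)
  local set = {}
  for _, s in ipairs(list) do set[s] = true end
  return function(tok) return set[tok] == true end
end

-- "Additional logic" can be any function, for example a numeric check.
local function isNumber(tok)
  return tonumber(tok) ~= nil
end

-- Matches a table of tokens against a flat list of predicates,
-- exactly one token per predicate.
local function matchSequence(preds, tokens)
  if #preds ~= #tokens then return false end
  for i, p in ipairs(preds) do
    if not p(tokens[i]) then return false end
  end
  return true
end

local rule = { oneOf({ "local", "global" }), literal("x"), literal("="), isNumber }
print(matchSequence(rule, { "local", "x", "=", "42" }))  --> true
print(matchSequence(rule, { "local", "x", "=", "abc" })) --> false
```

Because every element is just a function, extra checks can be slipped in without changing the matcher itself.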
What I really can't wrap my head around is repeating characters/symbols, because the "+" token can be used more than once in the same rule list.
In addition, it would be nice to do this without a recursive function, maybe with a tree or something.
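One way to cope with several "+" tokens is plain backtracking: each rule gets its own loop that tries one occurrence, then two, and so on, and after each attempt hands the remaining tokens to the next rule. The sketch below is only an illustration under some assumptions: the flags "?" and "+" are taken to sit at the end of each rule string, "*" is not handled, and `parseRule`, `parseRules`, `matchAt` and `matchTokens` are hypothetical names.

```lua
-- Hypothetical helper: split a rule string such as "num+" into name and flags.
-- Assumption: the flag characters "?" and "+" sit at the end of the string.
local function parseRule(rule)
  local name, flags = rule:match("^(.-)([%?%+]*)$")
  return {
    name = name,
    optional = flags:find("?", 1, true) ~= nil,
    repeating = flags:find("+", 1, true) ~= nil,
  }
end

local function parseRules(list)
  local out = {}
  for i, rule in ipairs(list) do out[i] = parseRule(rule) end
  return out
end

-- Returns true when tokens[ti..] match rules[ri..] with nothing left over.
local function matchAt(rules, ri, tokens, ti)
  if ri > #rules then return ti > #tokens end
  local r = rules[ri]
  -- Zero occurrences are only allowed for optional rules.
  if r.optional and matchAt(rules, ri + 1, tokens, ti) then return true end
  -- Try one occurrence, and more if the rule is repeating.
  local n = ti
  while tokens[n] == r.name do
    n = n + 1
    if matchAt(rules, ri + 1, tokens, n) then return true end
    if not r.repeating then break end
  end
  return false
end

local function matchTokens(ruleStrings, tokens)
  return matchAt(parseRules(ruleStrings), 1, tokens, 1)
end

print(matchTokens({ "a+", "b?", "a+" }, { "a", "a", "a" })) --> true
print(matchTokens({ "a+", "b?", "a+" }, { "a" }))           --> false
```

Each "+" rule keeps its own counter `n`, so two repeating rules in a row do not interfere: if the first one takes too many tokens, the loop simply falls through and the caller tries a different split.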
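A non-recursive option, swapped in here for the tree idea, is to track a set of positions in the rule list and advance the whole set one token at a time (a simplified NFA-style simulation). The sketch below assumes rules are already tables with `name`, `optional` and `repeating` fields; `addState` and `matchAll` are hypothetical names, and "*" is not handled.

```lua
-- Position i means "rule i is the next one to match";
-- position #rules + 1 means every rule has been satisfied.

-- Adds position i to the set, plus every position reachable from it
-- by skipping optional rules.
local function addState(set, rules, i)
  while i <= #rules + 1 and not set[i] do
    set[i] = true
    local r = rules[i]
    if r and r.optional then i = i + 1 else break end
  end
end

-- Returns true when the whole token table matches the whole rule list.
local function matchAll(rules, tokens)
  local current = {}
  addState(current, rules, 1)
  for _, tok in ipairs(tokens) do
    local nextSet = {}
    for i in pairs(current) do
      local r = rules[i]
      if r and r.name == tok then
        addState(nextSet, rules, i + 1)                     -- move on
        if r.repeating then addState(nextSet, rules, i) end -- or stay
      end
    end
    current = nextSet
  end
  return current[#rules + 1] == true
end

local rules = {
  { name = "a", repeating = true },
  { name = "b", optional = true },
  { name = "a", repeating = true },
}
print(matchAll(rules, { "a", "b", "a" })) --> true
print(matchAll(rules, { "a", "b" }))      --> false
```

There is no backtracking and no recursion here: every token is looked at once, and the set can never hold more than `#rules + 1` positions.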
Actually, I'm clueless on how to build a regex engine…
Edited on 02 August 2018 - 08:06 PM