This is a read-only snapshot of the ComputerCraft forums, taken in April 2020.

Patterns (Regex in Lua) Tutorial

Started by 1lann, 20 June 2013 - 10:19 PM
1lann #1
Posted 21 June 2013 - 12:19 AM
Foreword:
Alright, I know some of you might think Patterns (Regex) are hard and confusing. They may look that way, but today I decided to look into them, since until now I had been using string.find and then string.sub to pull data out of things like XML. I've noticed that it's not only me; others do this kind of stuff too. I was surprised to see how easy Patterns actually are, and that there was no Patterns tutorial I could find on the CCForums (although I was really sure there was one)! So here it is: a Patterns tutorial written by me.

Introduction and an example of when to use patterns:
Right, now some of you are probably wondering what patterns are. Patterns are like keyholes, and text is like keys. Many, many keys. You specify the keyhole, then Lua scans the text and tries to fit pieces of it into the keyhole, and when a piece fits, it returns the text you want! I can't really explain it so well, so here's an example:
In this example I'll show what I and others used to do, and how much simpler it gets with Patterns. I'll be using a dummy piece of weather data XML:

<city>London</city>
<temperature current="20" low="19" high="25"/>
So here we have it. From this, we want the name of the city, the current temperature, and the lowest and highest temperatures. First, I'll do it the way many of you probably do it currently, using string.find and string.sub.

local data = [[<city>London</city>
<temperature current="20" low="19" high="25"/>]]
local _, pos1 = data:find("<city>")
local pos2 = data:find("</city>")
local city = data:sub(pos1+1, pos2-1)
_, pos1 = data:find([[<temperature current%="]])
-- Note that I added a % in front of the =. It isn't a magic character, so the % is optional,
-- but escaping any non-alphanumeric character with % is always safe. I do the same for a leading ".
local pos2, pos3 = data:find([[%" low%="]])
local current = data:sub(pos1+1, pos2-1)
local pos4, pos1 = data:find([[%" high%="]])
local low = data:sub(pos3+1, pos4-1)
pos2 = data:find([[%"/>]])
local high = data:sub(pos1+1, pos2-1)
*phew* That was a lot of work, wasn't it? Isn't there a much easier way of doing this? Well, there is, and that's where patterns come in! Here's the exact same thing, but with patterns:

local city = data:match("<city>(.+)</city>")
local current, low, high = data:match([[<temperature current%="(%d+)" low%="(%d+)" high%="(%d+)"/>]])
(thanks to kingdaro for that fix)

Wow, wasn't that a LOT shorter? And not only is it shorter, it's also safer: because the pattern describes the whole tag, Lua can't get tricked when text matching one of the :find search terms above shows up somewhere else in the data, which can happen with the find-and-sub approach. So this is short, easy, saves time, and prevents confusion in the code!
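
To make the "safer" point concrete, here is a minimal sketch. The extra <wind> line is made-up data, added only so that one of the search terms appears twice; the variable names are illustrative too.

```lua
-- Hypothetical data: a <wind> tag with the same attribute names comes first.
local data = [[<wind current="5" low="3" high="9"/>
<temperature current="20" low="19" high="25"/>]]

-- find/sub approach: every search starts from the beginning of the string.
local _, curEnd = data:find([[<temperature current="]])
local lowStart = data:find([[" low="]])   -- hits the <wind> tag, not <temperature>
local viaFind = data:sub(curEnd + 1, lowStart - 1)

-- Pattern approach: the whole tag is the template, so only <temperature> fits.
local viaMatch = data:match([[<temperature current="(%d+)" low="%d+" high="%d+"/>]])

print(viaFind == "", viaMatch)   --> true    20
```

The find/sub version quietly returns an empty string here, because the second search lands before the first one ended.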

How do you use it?
Well, here's a step-by-step explanation of how it works. The pattern "terms" are listed here: http://www.lua.org/m...nual.html#6.4.1. To get the city, all I did was search for .+ in between <city> and </city>. The . represents any character, and the + means "one or more of them", matching as many as possible. So it picks up all of the characters up to </city>. But wait, what are the () for? Well, the () are used to say "I only want to return this section". Without the ()s it would return
<city>London</city>
See? With the rest of the code, I basically did the same thing! As you can see, when finding current, low and high, the ()s are only around the things I want, and string.match can return multiple values, which makes it extremely handy! And by using the entire line as a template, I can be sure that I only get what I want and not some other conflicting thing.
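
Here is a small sketch of what the parentheses change, using the same dummy XML. The second part uses a made-up string with two cities to show that + grabs as much as it can, while the - modifier (from the Lua manual) grabs as little as it can.

```lua
local data = [[<city>London</city>
<temperature current="20" low="19" high="25"/>]]

-- No captures: the whole matched text comes back.
local whole = data:match("<city>.+</city>")
-- One capture: only the part inside the parentheses comes back.
local city = data:match("<city>(.+)</city>")
-- Several captures: one return value per pair of parentheses (all strings).
local current, low, high = data:match([[current="(%d+)" low="(%d+)" high="(%d+)"]])

print(whole)                 --> <city>London</city>
print(city)                  --> London
print(current, low, high)    --> 20      19      25

-- Hypothetical input with two cities: .+ is greedy, .- stops at the first </city>.
local two = "<city>London</city><city>Paris</city>"
local greedy = two:match("<city>(.+)</city>")
local lazy = two:match("<city>(.-)</city>")
print(greedy)                --> London</city><city>Paris
print(lazy)                  --> London
```

Captures are always returned as strings, so wrap them in tonumber if you want to do maths with them.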

Note that ^$()%.[]*+-? are special ("magic") characters and must be escaped with a % in front of them when they are meant literally in the text you want to match! Any other non-alphanumeric character may be escaped with % as well; it isn't required, but it never hurts.
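
A short sketch of escaping in practice. The sample strings and the helper name escapePattern are made up for illustration; string.find's fourth argument (plain) is part of the standard library.

```lua
-- An unescaped . matches any character; %. matches only a literal dot.
print(("5x00"):match("5.00"))     --> 5x00
print(("5x00"):match("5%.00"))    --> nil

-- Escaping $, ( and ) so they are taken literally.
local price = "Total: $5.00 (approx.)"
local amount = price:match("%$(%d+%.%d+)")
local note = price:match("%((.-)%)")
print(amount, note)               --> 5.00    approx.

-- Hypothetical helper: put a % in front of every magic character.
local function escapePattern(s)
  return (s:gsub("[%^%$%(%)%%%.%[%]%*%+%-%?]", "%%%0"))
end

local text = "is 1+1=2? yes"
print(text:find("1+1=2?"))                  --> nil (the + and ? act as magic)
print(text:find(escapePattern("1+1=2?")))   --> 4       9
-- Alternative: ask string.find for a plain search, with no patterns at all.
print(text:find("1+1=2?", 1, true))         --> 4       9
```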

Don't worry! Patterns won't bite. This is just a taste of what Patterns can do. See more at http://lua-users.org...atternsTutorial and http://www.lua.org/m...nual.html#6.4.1. Note that patterns can be used with several string functions: string.find, string.match, string.gmatch and string.gsub.
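
As a rough sketch, here is one pattern idea run through each of those four functions. The input string and variable names are made up for the example.

```lua
local line = "low=19 high=25"

-- string.find: where does the first match start and end?
local s, e = line:find("%d+")
-- string.match: what is the first match?
local first = line:match("%d+")
-- string.gsub: replace every match; also returns how many were replaced.
local masked, count = line:gsub("%d+", "?")
-- string.gmatch: loop over every match (here with two captures each).
local values = {}
for key, value in line:gmatch("(%a+)=(%d+)") do
  values[key] = tonumber(value)
end

print(s, e)               --> 5       6
print(first)              --> 19
print(masked, count)      --> low=? high=?    2
print(values.low, values.high)   --> 19      25
```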

Alright, I probably suck at explaining, and this tutorial isn't completely finished. I'll add more stuff. Please suggest examples/topics about Patterns for this tutorial to cover. Thanks! :D
MudkipTheEpic #2
Posted 21 June 2013 - 12:38 AM
This is a great patterns tutorial. Patterns are often used without knowing anything about them, AKA the split function. It's always good to know something before you just slap it in your code. It makes debugging MUCH easier when you know what your code does.

But it could use a more widely used example, like command forming (getting the parts of a command, or any string).
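
For what it's worth, a split function of that kind might look roughly like this. It is only a sketch: splitWords and the sample command are made-up names, and %S+ simply means "one or more non-space characters".

```lua
-- Hypothetical helper: split a string into words on whitespace.
local function splitWords(input)
  local words = {}
  for word in input:gmatch("%S+") do
    words[#words + 1] = word
  end
  return words
end

-- Made-up command line; the double space is deliberate.
local parts = splitWords("give  1lann diamond 64")
local command = table.remove(parts, 1)   -- first word is the command name

print(command)                    --> give
print(#parts, parts[1], parts[3]) --> 3       1lann   64
```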

Edit: You may want to also explain magic characters and how to escape them.
Edited on 20 June 2013 - 10:39 PM
Kingdaro #3
Posted 21 June 2013 - 12:41 AM

local current = data:match([[<temperature current%="(%d+)" low%="%d+" high%="%d+">]])
local low = data:match([[<temperature current%="%d+" low%="(%d+)" high%="%d+">]])
local high = data:match([[<temperature current%="%d+" low%="%d+" high%="(%d+)">]])

This is painfully unnecessary.


local current, low, high = data:match([[<temperature current="(%d+)" low="(%d+)" high="(%d+)"/>]])

That, and the equal signs don't need escaping, as they aren't magic characters.

But yeah, great idea and a well-written tutorial.
1lann #4
Posted 21 June 2013 - 01:13 AM

Oh you can do it like that? Oh ok then, thanks! Yeah I know they're not magic characters, but for some reason without the % they tend to mess up when using string.find. Idk why.
ElvishJerricco #5
Posted 21 June 2013 - 10:41 AM
I love patterns. They're so freakin useful. But they're not Regex, just fyi. They're similar to Regex but much more like a lite version of it. Anyway yea Lua's string library is my absolute favorite because of its pattern matching functions.


local sX = str:match("%d+%.?%d*") -- %. is a literal decimal point; a bare . would match any character
local x = tonumber(sX)

Number parser in two lines =P
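
A slightly expanded sketch of the same idea, wrapped in a function so it can be tried on a few inputs. The name parseNumber and the sample strings are made up; the dot is written as %. so that it only matches a literal decimal point.

```lua
-- Hypothetical helper: pull the first (non-negative) number out of a string.
local function parseNumber(str)
  local digits = str:match("%d+%.?%d*")
  -- No digits found: return nil instead of calling tonumber on nothing.
  return digits and tonumber(digits)
end

print(parseNumber("temp: 20.5C"))   --> 20.5
print(parseNumber("x=12a5"))        --> 12
print(parseNumber("no digits"))     --> nil
```

With an unescaped . the second call would grab "12a5", and tonumber would turn that into nil.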
H4X0RZ #6
Posted 21 June 2013 - 12:09 PM
Thx for this awesome tutorial!
Engineer #7
Posted 21 June 2013 - 06:24 PM
I also really like this:
http://www.gammon.com.au/scripts/doc.php?lua=string.find