This is a read-only snapshot of the ComputerCraft forums, taken in April 2020.

Extracting data between tags from a string - Help!

Started by rich73, 16 August 2012 - 06:14 AM
rich73 #1
Posted 16 August 2012 - 08:14 AM
I have written a program which retrieves information using the HTTP API and puts it into one long string. I need a way of parsing the string so that every substring which lies between <title>...</title> is saved into a table. I believe the answer is to be found in using string.gmatch or string.gsub, but I can't quite figure it out. Any help would be much appreciated!
KaoS #2
Posted 16 August 2012 - 08:18 AM
I have never used those functions, but I am a huge believer in serialized tables. Take a look at the documentation; I think that works better than making a long string and splitting it. I admit that I really should learn those functions if I ever plan on making an OS, so that it can separate entered commands from their parameters, split each step in a path, etc., but maybe I will get to that.
Ponder #3
Posted 16 August 2012 - 04:23 PM
To extract text from tags one can do something like this:

tag = "<title>foobar</title>" -- your incoming document
pattern = "<title>(.*)</title>" -- a Lua pattern (similar to a regular expression, but simpler); a string matches if it has this shape
str = string.match(tag, pattern) -- str now holds "foobar", the text captured by the parentheses

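As you can see, tag and pattern are similar to one another. The bit which is different, "(.*)", only says: match any character (.) any number of times (*) and return what was matched (the parentheses). Note that * is greedy: if the string contains several <title> tags, "(.*)" captures everything from the first <title> to the last </title>. Use "(.-)" instead to match as few characters as possible.
If you want to learn more about it, read this.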
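For the original question (saving every title into a table), a minimal sketch using string.gmatch could look like the following. The function name extractTitles and the sample string are made up for illustration, and it assumes the tags are written exactly as <title> and </title> with no attributes:

```lua
-- Sketch: collect the text between every <title>...</title> pair.
-- extractTitles and sample are hypothetical names, not part of any API.
local function extractTitles(document)
  local titles = {}
  -- (.-) captures as few characters as possible, so each match
  -- stops at the nearest </title> instead of the last one.
  for title in string.gmatch(document, "<title>(.-)</title>") do
    titles[#titles + 1] = title
  end
  return titles
end

local sample = "<title>first</title> other text <title>second</title>"
local titles = extractTitles(sample)
for i, title in ipairs(titles) do
  print(i, title)
end
```

string.gmatch returns an iterator that yields the capture of each successive match, so the loop body runs once per tag pair. If the string has no title tags, the returned table is simply empty.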
rich73 #4
Posted 17 August 2012 - 12:30 AM
Thank you! I had researched it a bit more since I posted the question and found these functions and the like in the official Lua documentation, but your explanation is very clear and easy to follow. I didn't fully understand how the capture parentheses worked and how the pattern was put together, but this makes a lot of sense to me now. Thanks once again. (And the wiki link is great too!)