This is a read-only snapshot of the ComputerCraft forums, taken in April 2020.

HTML and XML parser

Started by FuuuAInfiniteLoop(F.A.I.L), 29 March 2013 - 02:23 PM
FuuuAInfiniteLoop(F.A.I.L) #1
Posted 29 March 2013 - 03:23 PM
HTML parser
I have recoded an HTML parser that I found so that it is compatible with CC Lua.

It returns a table with the HTML tree.
For example, if the following input is given:

<html><body>
<p>
Click <a href="http://example.com/">here!</a>
<p>
Hello
</p>
</body></html>

Then, the parser produces the following table:

{
  _tag = "#document",
  _attr = {},
  {
    _tag = "html",
    _attr = {},
    {
      _tag = "body",
      _attr = {},
      "\n",
      {
        _tag = "p",
        _attr = {},
        "\n Click ",
        {
          _tag = "a",
          _attr = {href = "http://example.com/"},
          "here!",
        },
        "\n",
      },
      {
        _tag = "p",
        _attr = {},
        "\n Hello\n",
      },
      "\n",
    }
  }
}
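
In this layout, _tag and _attr are named fields, while the children sit in the array part of each node: strings are text and tables are nested elements. A minimal sketch of reading that shape, assuming the tree follows the layout shown above (getText is a made-up helper name, not part of the html API):

```lua
-- Hypothetical helper, not part of the html API: concatenates every text
-- child found under a node, in document order.
local function getText(node)
  if type(node) == "string" then
    return node                      -- text nodes are plain strings
  end
  local parts = {}
  for _, child in ipairs(node) do    -- ipairs only visits the numeric children
    parts[#parts + 1] = getText(child)
  end
  return table.concat(parts)
end

-- Hand-written nodes in the same shape as the parser output above.
local link = { _tag = "a", _attr = { href = "http://example.com/" }, "here!" }
local para = { _tag = "p", _attr = {}, "Click ", link }

print(getText(para))      -- Click here!
print(link._attr.href)    -- http://example.com/
```
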
Usage:

os.loadAPI("html")
a = html.getTable(filename)
-- code to use that table

Alternatively, you can read from io.stdin with html.parse(io.stdin) instead of html.getTable, or parse a single string with html.parsestr(str).

This was modified from https://github.com/v...ree/master/html
The license:
Copyright (c) 2007 T. Kobayashi


Permission is hereby granted, free of charge, to any person obtaining a
copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be included
in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

pastebin: pastebin get hzYsY0wW html

I'm currently working on the next version, which will let you render everything into a GUI, but I can't find a way to read the table and obtain all the elements, nor a GUI API that is complete enough to do it.
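
Reading the table and obtaining all the elements can be done with a short recursive walk. A minimal sketch, assuming the node layout from the example above (_tag, _attr, and children in the array part); collectElements is a hypothetical name, not a function of the html API:

```lua
-- Hypothetical helper: appends every element node under `node` to `out`,
-- in document order, together with its nesting depth.
local function collectElements(node, out, depth)
  out = out or {}
  depth = depth or 0
  for _, child in ipairs(node) do
    if type(child) == "table" then   -- tables are elements, strings are text
      out[#out + 1] = { node = child, depth = depth }
      collectElements(child, out, depth + 1)
    end
  end
  return out
end

-- Hand-written tree in the parser's output shape.
local doc = {
  _tag = "#document", _attr = {},
  { _tag = "html", _attr = {},
    { _tag = "body", _attr = {},
      { _tag = "p", _attr = {}, "Hello" },
    },
  },
}

-- Print each tag name indented by its depth.
for _, entry in ipairs(collectElements(doc)) do
  print(string.rep("  ", entry.depth) .. entry.node._tag)
end
```

A renderer could loop over the returned list and decide what to draw for each _tag.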
XML parser
pastebin: pastebin get k9qGgNne testxml (needed if you want the example to work)
example: pastebin get hDEsi2ZE test (it creates an XML string and parses it)
README
  • Create a local variable: local xml = require("xmlSimple.lua").newParser()
  • Read xml using xml:ParseXmlText(xmlString) or xml:loadFile(xmlFilename, base)
Parsing XML


<test one="two">
  <three four="five" four="six"/>
  <three>eight</three>
  <nine ten="eleven">twelve</nine>
</test>

You can access values in two ways:

Using the simple method:


xml.test["@one"] == "two"
xml.test.nine["@ten"] == "eleven"
xml.test.nine:value() == "twelve"
xml.test.three[1]["@four"][1] == "five"
xml.test.three[1]["@four"][2] == "six"
xml.test.three[2]:value() == "eight"

or if your XML is a little bit more complicated you can do it like this:


xml:children()[1]:name() == "test"
xml:children()[1]:children()[2]:value() == "eight"
xml:properties()[1] == {name = "one", value = "two"}
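
In the simple method, the "@four" entry holds a list because the attribute appears twice on the same element, while "@one" and "@ten" are plain strings. One way such behaviour could be implemented is sketched below; this is an illustration under that assumption, not the module's actual code, and addAttribute is a made-up name:

```lua
-- Illustration only: store an attribute under "@name", promoting the
-- value to a list when the same attribute is seen again.
local function addAttribute(node, name, value)
  local key = "@" .. name
  local existing = node[key]
  if existing == nil then
    node[key] = value                   -- first occurrence: plain string
  elseif type(existing) == "table" then
    existing[#existing + 1] = value     -- already a list: append
  else
    node[key] = { existing, value }     -- second occurrence: make a list
  end
end

local three = {}
addAttribute(three, "four", "five")
addAttribute(three, "four", "six")
print(three["@four"][1])   -- five
print(three["@four"][2])   -- six
```
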

Limitations

There's no support for namespaces. When I see namespaces I immediately start to remember days when I worked at corporate. We had to use namespaces only because XML was so convoluted we would not be able to handle it without them. In the end, XML parsing took longer for some APIs than the actual logic of the API. If you're in this situation, it is better to step back and do something about it rather than asking for namespace support. I am using this module to read fairly simple XML. Even if it is a large XML string, the structure is still simple, so I was not able to test it properly. Please create a new Issue if you spot a problem. Please take a look at xmlTest.lua for an example of use.

Final notes

This is a version, modified to work on CC Lua, of the modified version of Corona-XML-Module by Jonathan Beebe, which in turn is based on Alexander Makeev's Lua-only XML parser found here.
Post suggestions/bugs!
oeed #2
Posted 29 March 2013 - 03:47 PM
This could be useful, I guess. I'm waiting for someone (possibly myself) to make an HTML renderer.
FuuuAInfiniteLoop(F.A.I.L) #3
Posted 29 March 2013 - 04:00 PM
I added some more information and the download link, which I forgot :wacko:
FuuuAInfiniteLoop(F.A.I.L) #4
Posted 29 March 2013 - 04:13 PM
Thinking of testing it with the CC forum...
Mads #5
Posted 29 March 2013 - 09:46 PM
Very nice! But I think it's more of an XML parser than HTML parser ;)
FuuuAInfiniteLoop(F.A.I.L) #6
Posted 30 March 2013 - 04:11 AM
It parses an HTML file and outputs a table with all the tags and attributes, so I think it's an HTML parser.
Mads #7
Posted 30 March 2013 - 06:01 AM
Ah, sorry, didn't actually look at the source, just at the readme. So, naturally, this question rises: why not XML?
remiX #8
Posted 30 March 2013 - 06:29 AM
Why not both?

Just tested it on gravityscore's thunderhawk website and it got everything :P
FuuuAInfiniteLoop(F.A.I.L) #9
Posted 30 March 2013 - 08:47 AM
I will work on an XML parser and I will put it here!
FuuuAInfiniteLoop(F.A.I.L) #10
Posted 30 March 2013 - 09:06 AM
Done!
FuuuAInfiniteLoop(F.A.I.L) #11
Posted 21 April 2013 - 03:39 AM
Nobody using it?
Shazz #12
Posted 01 May 2013 - 09:57 PM
You literally copied the programs and not only that, you also copied the descriptions.
FuuuAInfiniteLoop(F.A.I.L) #13
Posted 01 May 2013 - 10:11 PM
First, I didn't copy the programs; I modified them to work with ComputerCraft (the license allows it), and I posted the link and said in the FIRST LINE that I had modified it.
Second, the description is also modified: it describes a bit more, including the changed functions, so look first and then comment.

EDIT: And I added functions to the HTML API!
theoriginalbit #14
Posted 02 May 2013 - 12:48 AM
The only difference between your XML code and the source code you got it from is that you removed the module function call and the comment block. Also, nowhere in the git repo you got this from do I see a license saying that you can modify and distribute it.
Lyqyd #15
Posted 03 May 2013 - 01:36 PM
What does your HTML parser output for this page?


<html>
<body>
<b>
This text is bold
<i>
This text is bold and italic
</b>
This text is italic
</i>
</body>
</html>
TableCraft0R #16
Posted 04 June 2013 - 01:56 AM
Maybe the key to my new OS that is NOT working?
xcrafter_40 #17
Posted 12 January 2017 - 04:03 AM
How exactly do I use this?
I put in:

<html>
  <body>
    <p>hi!</p>
  </body>
</html>
And I got:

{1.0={1.0=
  , 2.0={1.0=
    , 2.0={1.0=hi!, _tag=p, _attr={}}, _tag=body, 3.0=
  , _attr={}}, _tag=html, 3.0=
, _attr={}}, _tag=#document, _attr={}}
Now how would I index this data? In Python it would be like:

>>> print(htmlvar["html"]["body"]["p"])
"hi!"