Lex is a Lua lexer I wrote from scratch specifically for code editors, so it is designed to handle syntax errors without choking. The input string can always be reconstructed from the returned tokens alone. Here's a screenshot from ShEdit, my customized edit program, which uses Lex for syntax highlighting instead of weird Lua patterns:



Honestly, I think it's pretty cool. It was also a fun project to work on. And I'm releasing it publicly for all to use!

You can find it here. Use the following command to install it:


pastebin get edyuQ5xY lex

Then, you can just use os.loadAPI('lex') to load it. From the docs in the code:


-- Lex, by LoganDark
-- Can be loaded using os.loadAPI, has only a single function: lex.lex('code here')
-- It returns a list of lists, where each list is one line.
-- Each line contains tokens (in the order they are found), where each token is formatted like this:
-- {
--  type = one of the token types below,
--  data = the source code that makes up the token,
--  posFirst = the position (inclusive) within THAT LINE where the token starts,
--  posLast = the position (inclusive) within THAT LINE where the token ends
-- }

-- Possible token types:
--  whitespace: Self-explanatory. Can match spaces, newlines, tabs, and carriage returns (although I don't know why anyone would use those... WINDOWS)
--  comment: Either multi-line or single-line comments.
--  string: A string. Usually the part of the string that is not an escape.
--  escape: Can only be found within strings (although they are separate tokens)
--  keyword: Keywords. Like "while", "end", "do", etc.
--  value: Special values. Only true, false, and nil.
--  ident: Identifier. Variables, function names, etc.
--  number: Numbers!
--  symbol: Symbols, like brackets, parentheses, ., .., ..., etc.
--  operator: Operators, like =, ==, >=, <=, ~=, etc.
--  unidentified: Anything that isn't one of the above tokens. Consider them ERRORS.
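To make the token format a bit more concrete, here's a rough sketch of how an editor might consume it. The sample table is hand-built to match the format described above; it is not real output from lex.lex. The helper names lineText and findErrors are made up for this example and are not part of Lex. It rebuilds a line's text from its tokens and lists every unidentified token with its position.

```lua
-- Hand-built sample in the documented token format (NOT real lex.lex output).
-- It represents the single line: local x = 1 $
local sampleLines = {
  {
    { type = 'keyword',      data = 'local', posFirst = 1,  posLast = 5 },
    { type = 'whitespace',   data = ' ',     posFirst = 6,  posLast = 6 },
    { type = 'ident',        data = 'x',     posFirst = 7,  posLast = 7 },
    { type = 'whitespace',   data = ' ',     posFirst = 8,  posLast = 8 },
    { type = 'operator',     data = '=',     posFirst = 9,  posLast = 9 },
    { type = 'whitespace',   data = ' ',     posFirst = 10, posLast = 10 },
    { type = 'number',       data = '1',     posFirst = 11, posLast = 11 },
    { type = 'whitespace',   data = ' ',     posFirst = 12, posLast = 12 },
    { type = 'unidentified', data = '$',     posFirst = 13, posLast = 13 },
  },
}

-- Hypothetical helper: rebuild one line's text by joining each token's data.
local function lineText(line)
  local parts = {}
  for i, token in ipairs(line) do
    parts[i] = token.data
  end
  return table.concat(parts)
end

-- Hypothetical helper: collect every 'unidentified' token with its location.
local function findErrors(lines)
  local errors = {}
  for lineNumber, line in ipairs(lines) do
    for _, token in ipairs(line) do
      if token.type == 'unidentified' then
        errors[#errors + 1] = {
          line = lineNumber,
          column = token.posFirst,
          data = token.data,
        }
      end
    end
  end
  return errors
end

print(lineText(sampleLines[1]))
for _, err in ipairs(findErrors(sampleLines)) do
  print(('line %d, column %d: unexpected %q'):format(err.line, err.column, err.data))
end
```

In ComputerCraft you would pass the result of lex.lex('code here') to these helpers instead of the sample table. A highlighter would work the same way, just picking a colour per token.type instead of collecting errors.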

Windows-style line endings (CRLF) are not supported because I absolutely despise them. Have a nice day.
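If you're stuck with a file that uses CRLF anyway, one possible workaround is to convert it to plain newlines before handing it to Lex. This is only a sketch: normalizeNewlines is a name made up for this example and is not part of Lex.

```lua
-- Hypothetical helper: turn every CRLF pair into a single LF.
-- string.gsub returns two values (the new string and a match count),
-- so the result is stored in a local first to return only the string.
local function normalizeNewlines(code)
  local normalized = code:gsub('\r\n', '\n')
  return normalized
end

-- In ComputerCraft, after os.loadAPI('lex'), usage might look like:
--   local lines = lex.lex(normalizeNewlines(source))
print(#normalizeNewlines('local x = 1\r\nprint(x)\r\n'))
```

Note that the token positions will then refer to the converted text, not the original file.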