This is a read-only snapshot of the ComputerCraft forums, taken in April 2020.

Checking for an escaped quote

Started by Engineer, 21 September 2014 - 09:24 PM
Engineer #1
Posted 21 September 2014 - 11:24 PM
Hello folks,

I'm stuck on a parser I am writing. It is basically a recursive descent parser that works through the input letter by letter. That part is going well, but I have run into a problem: parsing a string like this:

blabla bla "I need this part \" and this part" blablal
I know to start parsing the string when I hit the first double quote. Then I need to get to the closing quote, because a string is enclosed in those. In this particular case it will simply stop parsing at the second double quote, because as far as I can see there is no real difference between a quote and an escaped quote.

To clarify my rather vague description, I need this output (when I print it to the console):

I need this part " and this part

I really consider patterns a last resort, because they would screw up the tokenizer and leave me with a lot of work to do. So how would I detect an escaped quote versus one which isn't?
theoriginalbit #2
Posted 22 September 2014 - 03:54 AM
What have you got so far? I assume it is thinking that the escaped quote is the end of the string?
Engineer #3
Posted 22 September 2014 - 08:04 AM
What have you got so far? I assume it is thinking that the escaped quote is the end of the string?
I technically have nothing yet for this part, because I had already done some tests and everything stopped at the quoted string. So officially I have nothing, but it would behave the way you said. It's not that hard to do with a tokenizer; it simply stops at any escaped quote (it will tell " and ' apart, but that's obvious).
theoriginalbit #4
Posted 22 September 2014 - 08:14 AM
Well, if you are doing it letter by letter, you could simply keep a boolean flag for when you discover a \ and then just ignore the next character. Alternatively, validate that the next character is a valid escape, since this is also a valid string:

'hello \" world'
An escaped double quote doesn't have to sit inside a double-quoted string.
Bomb Bloke #5
Posted 22 September 2014 - 08:27 AM
I'd go with something along these lines:

local myData = "blabla bla \"I need this part \\\" and this part\" blablal"

local foundInQuotes, inQuotes, i = "", false, 1

while i <= #myData do
	if myData:sub(i,i) == "\\" then
		i = i + 1
		
		if inQuotes then
			if myData:sub(i,i) == "n" then
				foundInQuotes = foundInQuotes .. "\n"
			else foundInQuotes = foundInQuotes .. myData:sub(i,i) end
		end
		
	elseif myData:sub(i,i) == "\"" then
		inQuotes = not inQuotes

	elseif inQuotes then
		foundInQuotes = foundInQuotes .. myData:sub(i,i)
	end
	
	i = i + 1
end

print(foundInQuotes)

Though this doesn't deal with the single-quote matter BIT points out, I'm sure you can see how to modify it if you want that functionality.
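The single-quote modification mentioned above could look roughly like this. It is only a sketch: `extractQuoted` is a made-up name, and it assumes, as the loop above does, that only \n needs translating and that any other escaped character is taken literally.

```lua
-- Sketch: collect the contents of every quoted section, honouring backslash escapes.
-- extractQuoted is a hypothetical helper, not part of any library.
local function extractQuoted(data)
	local found = {}
	local current     -- nil while outside quotes, otherwise the text collected so far
	local quoteChar   -- the quote character that opened the current string
	local i = 1

	while i <= #data do
		local c = data:sub(i, i)

		if c == "\\" then
			-- Skip the backslash and take the next character literally.
			i = i + 1
			if current then
				local escaped = data:sub(i, i)
				current = current .. (escaped == "n" and "\n" or escaped)
			end
		elseif current == nil and (c == "\"" or c == "'") then
			-- Opening quote: remember which kind it was.
			quoteChar, current = c, ""
		elseif current and c == quoteChar then
			-- Only the same kind of quote closes the string.
			found[#found + 1] = current
			current, quoteChar = nil, nil
		elseif current then
			current = current .. c
		end

		i = i + 1
	end

	return found
end

local parts = extractQuoted("a \"one \\\" two\" b 'three \" four' c")
print(parts[1])
print(parts[2])
```

Remembering which quote character opened the string is what lets a ' inside "..." (or a " inside '...') pass through as ordinary text.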
Edited on 22 September 2014 - 06:29 AM
Engineer #6
Posted 22 September 2014 - 08:42 AM
Well, if you are doing it letter by letter, you could simply keep a boolean flag for when you discover a \ and then just ignore the next character. Alternatively, validate that the next character is a valid escape, since this is also a valid string:

'hello \" world'
An escaped double quote doesn't have to sit inside a double-quoted string.
Thing is, \" is only one character long, so I cannot detect the backslash. I will get back to this topic once I get to a computer and can test some more.

Both of you, Bomb Bloke and BIT, thanks for your help already.
Engineer #7
Posted 22 September 2014 - 02:03 PM
I have got the following code:

function readString( self )
	local char = self.reader:peek()

	local doubleQuote = char == "\""
	local singleQuote = char == "'"
	if doubleQuote or singleQuote then
		self.reader:pop()
		
		local s = ""
		local backslash = false
		while doubleQuote or singleQuote do
			if backslash then
				backslash = false
				s = s .. self.reader:pop()
			else
				local char = self.reader:peek()
				if char == ( doubleQuote and "\"" or "'" ) then
					doubleQuote = false
					singleQuote = false
					self.reader:pop()
					return s
				elseif char == "\\" then
					backslash = true
					-- consume the backslash itself, so the next pass appends the escaped character
					s = s .. self.reader:pop()
				else
					s = s .. self.reader:pop()
				end
			end
			
		end
	end
end
However, this is not suitable for a parser yet: when the closing quote is missing, the loop never ends and the program errors with "too long without yielding" (TLWY). That is one point I will have to fix, but this is mostly for testing purposes.

I did the following test with this function in mind:

local t = {}
t.reader = StringReader.new( '"bla bla bla \" blabla"' )
print(readString(t))
This does what I thought initially, it prints the following to the console (without quotes):

"bla bla bla "

I hope you guys have more ideas or improvements, but remember that this is mostly a test function. Also, some information on the StringReader object:
StringReader.peek(): Returns the current character without moving the position.
StringReader.pop(): Returns the current character and moves the position by one.
When both of those functions return an empty string, the end of the string has been reached.
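For anyone who wants to run this, here is a minimal stand-in for the reader described above, together with a variant of the string reader that gives up at the end of the input instead of looping. All of it is a sketch: the `text` and `pos` fields, the name `readQuoted`, and the error messages are assumptions, not the original StringReader.

```lua
-- Hypothetical stand-in for the StringReader described above.
local StringReader = {}
StringReader.__index = StringReader

function StringReader.new(text)
	return setmetatable({ text = text, pos = 1 }, StringReader)
end

-- Current character without advancing; "" once the end is reached.
function StringReader:peek()
	return self.text:sub(self.pos, self.pos)
end

-- Current character, then advance by one; "" once the end is reached.
function StringReader:pop()
	local char = self:peek()
	if char ~= "" then self.pos = self.pos + 1 end
	return char
end

-- Returns the string contents, or nil plus a message if the quote never closes.
local function readQuoted(reader)
	local quote = reader:peek()
	if quote ~= "\"" and quote ~= "'" then
		return nil, "not at a quote"
	end
	reader:pop()

	local parts = {}
	while true do
		local char = reader:pop()
		if char == "" then
			-- End of input before the closing quote: stop instead of spinning.
			return nil, "unterminated string"
		elseif char == quote then
			return table.concat(parts)
		elseif char == "\\" then
			-- Take the character after the backslash literally.
			local escaped = reader:pop()
			if escaped == "" then return nil, "unterminated string" end
			parts[#parts + 1] = escaped
		else
			parts[#parts + 1] = char
		end
	end
end

print(readQuoted(StringReader.new('"bla bla \\" blabla" rest')))
print(readQuoted(StringReader.new('"never closed')))
```

Note the doubled backslash in the test input: that is what puts a real backslash into the data the reader sees.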

Thanks in advance and for the effort
MKlegoman357 #8
Posted 22 September 2014 - 03:48 PM
Just an idea, but I think it's not completely your fault. The way you define this string:


'"bla bla bla \" blabla"'

Well, the Lua parser converts it to this, if I'm not mistaken:


'"bla bla bla " blabla"'

That is probably why your other attempts failed when they tried to find the backslash. To produce a string with a backslashed double quote, inside double quotes you should use this:


"bla bla \\\" bla"

…and then the Lua parser will convert it to this:


"bla bla \" bla"
Engineer #9
Posted 22 September 2014 - 04:48 PM
Just an idea, but I think it's not completely your fault. The way you define this string:


'"bla bla bla \" blabla"'

Well, the Lua parser converts it to this, if I'm not mistaken:


'"bla bla bla " blabla"'

That is probably why your other attempts failed when they tried to find the backslash. To produce a string with a backslashed double quote, inside double quotes you should use this:


"bla bla \\\" bla"

…and then the Lua parser will convert it to this:


"bla bla \" bla"
That's obvious, because \\ escapes the backslash, so there is effectively a single backslash. But what I'm trying to do is detect an escaped quote, which is: "\""
For the record, I don't define the string myself; this is actually a JSON parser. Of course you could ask why someone would use "djsfnhk\" ldejfnhkdf" as a key, but I want it to be perfect; it should work as long as it's a valid string.

Thanks for pointing it out though, any effort is appreciated! :)
Engineer #10
Posted 22 September 2014 - 06:50 PM
I have found a possible lead on a solution. If I used patterns for this, it could possibly work, because the following works like a charm:


local s = "\"This is \"a string\" with escaped\" quotes\""
print((s:gsub( "\"(.*)\"", "%1"))) -- extra parentheses drop gsub's second return value (the match count)
--> This is "a string" with escaped" quotes
However, this would completely defeat the purpose of my tokenizer, although I could work around that and make it work properly. My question now is: given this information, can I still use my tokenizer properly and detect escaped quotes?
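One caveat with the pattern above is that .* is greedy, so it runs to the last quote in the input and would merge two separate strings. A middle ground that still fits a tokenizer is to use string.find only to jump to the next interesting character. The sketch below assumes the tokenizer knows the position of the opening quote; `scanString` and its return values are made up for illustration.

```lua
-- Hypothetical helper: given the position of an opening quote, return the
-- decoded contents and the position just past the closing quote.
local function scanString(text, start)
	local quote = text:sub(start, start)
	-- Character class matching a backslash or the opening quote character.
	-- (Backslash is not special in Lua patterns; % is the escape character.)
	local class = "[\\" .. quote .. "]"
	local parts, pos = {}, start + 1

	while true do
		local hit = text:find(class, pos)
		if not hit then
			return nil, "unterminated string"
		end
		parts[#parts + 1] = text:sub(pos, hit - 1)   -- plain run before the hit
		if text:sub(hit, hit) == quote then
			return table.concat(parts), hit + 1
		end
		-- Backslash: keep the following character literally and continue after it.
		parts[#parts + 1] = text:sub(hit + 1, hit + 1)
		pos = hit + 2
	end
end

local text = 'x = "a \\" b" y "c"'
local first, nextPos = scanString(text, text:find('"', 1, true))
print(first, nextPos)
print(text:match('"(.*)"'))   -- the greedy pattern, for comparison
```

Because the function returns the position after the closing quote, the tokenizer can carry on from there character by character as before.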
Bomb Bloke #11
Posted 23 September 2014 - 03:27 AM
One problem with that string is that it doesn't contain escaped quotes. You used escape characters to put the quotes in there, but the escape characters themselves aren't inserted.
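A quick way to check this in plain Lua is to search the finished string for a backslash. The variable names below are only for illustration.

```lua
-- The string from the previous post: the \" escapes exist only in the source code.
local s = "\"This is \"a string\" with escaped\" quotes\""
print(s:find("\\", 1, true))      --> nil (plain search finds no backslash)

-- To keep a real backslash in the data, the backslash itself must be escaped...
local escaped = "say \\\"hi\\\""
-- ...or the text written in long brackets, which do no escape processing.
local raw = [[say \"hi\"]]

print(escaped == raw)             --> true
print(#"\"", #"\\\"")             --> 1	2
```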

I'm confused as to where you're getting your strings from, and the exact format they should be in. Let's say you were reading them from a text file - what, verbatim, would you put in that file?
Engineer #12
Posted 23 September 2014 - 03:00 PM
I just realized something which makes this unnecessary to even try. If one escapes a quote it is for the compiler/parser, not the string itself.

I'm a fool sometimes xD
MKlegoman357 #13
Posted 23 September 2014 - 05:40 PM
Lol, that was exactly what I mentioned to you in my first post :D
Engineer #14
Posted 23 September 2014 - 09:30 PM
Lol, that was exactly what I mentioned to you in my first post :D
I'm assuming I read it too quickly or something, but I completely missed it. I should think and read more thoroughly next time.

Thank you Bomb Bloke, MKlegoman357 and theoriginalbit for the help you guys provided.
It really, really is appreciated!
theoriginalbit #15
Posted 24 September 2014 - 12:36 AM
I just realized something which makes this unnecessary to even try. If one escapes a quote it is for the compiler/parser, not the string itself.
I wish I'd've come back and read this thread sooner. I assumed you already knew this, hence my suggestion of the boolean flag.
Engineer #16
Posted 24 September 2014 - 08:20 PM
I wish I'd've come back and read this thread sooner. I assumed you already knew this, hence my suggestion of the boolean flag.
I should have known that, but I just didn't realize it. It's worth saying I literally did this:
Spoiler