This is a read-only snapshot of the ComputerCraft forums, taken in April 2020.

Compression/Decompression not working for 1 file whilst working for others

Started by eniallator, 29 July 2016 - 03:53 AM
eniallator #1
Posted 29 July 2016 - 05:53 AM
Hi, I recently started trying to compress/decompress various files and I've run into a bug. I've tried the program with massive files and also smaller files and it works 100% (I stored the contents of the file before and after in variables, compared them with originalFile == fileAfterCompression, and it returned true). One file, however, comes out broken. I've been going through the code to try to find the cause myself, but with no luck.

compression program:

-- This is eniallator's compression program. It takes an input file, and you can choose to either compress it or decompress it.
-- Compression works by building a table of all the different words that come up in the file.
-- Instead of putting each word in the body of the file, it puts the word's index in that table.
-- The index is stored as a byte; I use string.char() and string.byte() to convert between the two.
--
-- ================================================================================
--
-- I'm currently using the following numbers for special cases:
-- 0 = bigger than 254, 1 = new line, 2 = multiple spaces, 3 = 1 space
--
-- I also don't use 13, because when Lua converts character 13 back to its number, it gives the same result as byte 10.
-- Apart from that, I use every other byte to index the words.

local tArgs = { ... }

-- Function to return a table of the lines of a file
function fileToLines(file)

  local read = fs.open(file,"r")
  local lines = {}

  -- Keep adding the current line to the table; break once there is no current line
  while true do

	local currLine = read.readLine()

	if currLine then

	  table.insert(lines, currLine)
	else

	  break
	end
  end

  read.close()
  return lines
end

-- Function to split a string up at its spaces
local function wordSplit(string)

  local out = {}

  -- For loop to iterate over the words in the string
  for word in string:gmatch("%S+") do

	table.insert(out, word)
  end

  return out
end

-- Function that compression uses, if the number given in the arguments is bigger than 254 it will keep on adding the byte 0 to the return
local function checkNum(num)

  local out = ""

  -- While to iterate when num is bigger than 254
  while num > 254 do

	out = out .. string.char(0)
	num = num - 254
  end

  -- Making sure num isn't byte 13
  if num >= 13 then num = num + 1 end

  -- Returning the bytes instead of the number.
  return out .. string.char(num)
end


-- Function to add any new words to the index
local function sortWord(word,outTbl,line)
  if #word > 0 then

	local wordFound = false

	-- Iterating over the first index of outTable
	for i=1,#outTbl[1] do

	  -- Checking if the word already exists or not
	  if outTbl[1][i] == word then

		table.insert(outTbl[line+1],i+3)
		wordFound = true
		break
	  end
	end

	-- Adding the word to the index if it hasn't been found
	if not wordFound then

	  table.insert(outTbl[1],word)
	  table.insert(outTbl[line+1],#outTbl[1]+3)
	end
  end

  return outTbl
end

-- Function to handle spaces in the file
local function sortSpaces(spaces,outTbl,line)
  if spaces > 0 then

	-- Checking if the number of spaces is bigger than 1 so it can add a different byte depending on if it is or not
	if spaces > 1 then

	  table.insert(outTbl[line+1],2)
	  table.insert(outTbl[line+1],spaces)
	else

	  table.insert(outTbl[line+1],3)
	end
  end

  return outTbl
end

-- The compression function
function compress(lines)

  local outTable = {{}}

  for line=1,#lines do

	local spaces = 0
	local word = ""
	outTable[line+1] = {}

	-- For to handle the entire compression to convert the file into bytes
	for i=1,#lines[line] do

	  local currChar = lines[line]:sub(i,i)

	  -- A crude way to split up the lines into words and spaces
	  if currChar == " " then

		spaces = spaces + 1
		outTable = sortWord(word,outTable,line)
		word = ""
	  else

		word = word .. currChar
		outTable = sortSpaces(spaces,outTable,line)
		spaces = 0
	  end
	end

	outTable = sortWord(word,outTable,line)
	outTable = sortSpaces(spaces,outTable,line)
  end

  local outString = ""

  -- For loop to combine the outTable into an output string
  for i=1,#outTable do
	for j=1,#outTable[i] do
	  if i == 1 then
		if #outString > 0 then outString = outString .. " " end

		outString = outString .. outTable[i][j]
	  else

		outString = outString .. checkNum(outTable[i][j])
	  end
	end

	-- Adding the new line character to the end of the line. The index is always at line 1 so i choose to add a \n to the end of line 1
	if i == 1 then

	  outString = outString .. "\n"
	else

	  outString = outString .. string.char(1)
	end
  end

  return outString
end

-- The decompression function
function decompress(lines)

  -- Separating the index table from the body table
  local index = wordSplit(lines[1])
  local body = {}
  table.remove(lines,1)

  -- For loop to convert the compressed file into its original lines and where the indexes should go
  for line=1,#lines do

	-- Inserting the character 10 every time the file goes onto a new line. This is because character 10 actually is the new line character
	if line > 1 then

	  table.insert(body,10)
	end

	-- For loop to convert the bytes into the corresponding indexes
	for i=1,#lines[line] do

	  local indexNum = string.byte(lines[line]:sub(i))

	  if indexNum >= 13 then

		indexNum = indexNum - 1
	  end

	  table.insert(body,indexNum)
	end
  end

  local counter = 1
  local fullFile = ""

  -- While loop to convert the indexes into the corresponding words (apart from the special characters)
  while counter < #body do

	-- Checking if the current index is 0 and then converting it into its actual index (because 0 means it's bigger than 254)
	if body[counter] == 0 then

	  local multiples = 0

	  -- Adding up the multiples of 254
	  while body[counter] == 0 do

		counter = counter + 1
		multiples = multiples + 254
	  end

	  -- Inserting the corresponding word with the full index from adding the multiples and the current index
	  fullFile = fullFile .. index[body[counter] + multiples-3]

	-- Seeing if the current index is 1 which is the new line character
	elseif body[counter] == 1 then

	  fullFile = fullFile .. "\n"

	-- Seeing if the current index is 2 and then seeing what the next index is to make that next index * spaces.
	elseif body[counter] == 2 then

	  counter = counter + 1

	  -- Iterating for the amount of spaces that should be in and adding them
	  for i=1,body[counter] do

		fullFile = fullFile .. " "
	  end

	-- Seeing if the current index is 3 and inserting a space into the file
	elseif body[counter] == 3 then

	  fullFile = fullFile .. " "

	-- If nothing before has caught the index, the corresponding word to that index will be inserted into the file.
	else
	  if index[body[counter]-3] then --# i added this if statement because it was trying to index nil at some point which is weird...
		fullFile = fullFile .. index[body[counter]-3]
	  else
		fullFile = fullFile .. "STUFFEDUPHERE"
	  end
	end

	counter = counter + 1
  end

  return fullFile
end

-- Handling program arguments
if tArgs[1] == "com" then
  if tArgs[2] and fs.exists(tArgs[2]) then
	if tArgs[3] then

	  comString = compress(fileToLines(tArgs[2]))

	  openFile = assert(fs.open(tArgs[3],"w"),"Something went wrong when trying to open the output file")
	  openFile.write(comString)
	  openFile.close()
	else

	  print("Error: Compression requires a third argument")
	end
  else

	print("Error: Compression requires a valid file name as the second argument.")
  end
elseif tArgs[1] == "decom" then
  if tArgs[2] and fs.exists(tArgs[2]) and tArgs[3] then
	if tArgs[3] then

	  comString = decompress(fileToLines(tArgs[2]))

	  openFile = assert(fs.open(tArgs[3],"w"),"Something went wrong when trying to open the output file")
	  openFile.write(comString)
	  openFile.close()
	else

	  print("Error: Decompression requires a third argument")
	end
  else

	print("Error: Decompression requires a valid file name as the second argument.")
  end
else

  print('Error: Invalid syntax. Correct syntax is: "compress [com/decom] [input file name] [output file name]"')
end
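
The byte scheme described in the header comments can be checked in isolation. The sketch below is a minimal round trip under that description: `encodeIndex` mirrors `checkNum`, while `decodeIndex` is a hypothetical helper (the posted program does this step inline in `decompress`). Neither name comes from the original program.

```lua
-- Encode a positive integer as bytes: one 0 byte per multiple of 254,
-- then a final byte in 1..255 that skips over 13.
local function encodeIndex(num)
  local out = {}
  while num > 254 do
    out[#out + 1] = string.char(0)
    num = num - 254
  end
  -- Shift everything from 13 upwards by one so byte 13 is never written
  if num >= 13 then num = num + 1 end
  out[#out + 1] = string.char(num)
  return table.concat(out)
end

-- Hypothetical inverse: count the leading 0 bytes, then undo the shift.
local function decodeIndex(bytes)
  local multiples, pos = 0, 1
  while bytes:byte(pos) == 0 do
    multiples = multiples + 254
    pos = pos + 1
  end
  local num = bytes:byte(pos)
  if num >= 13 then num = num - 1 end
  return num + multiples
end

-- Every value should survive the round trip, and byte 13 should never appear
for n = 1, 1000 do
  local bytes = encodeIndex(n)
  assert(decodeIndex(bytes) == n)
  assert(not bytes:find("\r", 1, true))
end
```

Under these assumptions the number encoding itself round-trips cleanly, which suggests looking at how the words and lines are split rather than at the byte arithmetic.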

original file that broke it:

local grid = {}
-- Defining the 2D table as grid

local maxX,maxY = term.getSize()
-- Getting the dimensions of the screen for the game

local colours = {
on = colors.green,
off = colors.red
}
-- Defining the colours used below

for x=1,maxX do

  grid[x] = {}

  for y=1,maxY do

   grid[x][y] = "on"
  end
end
-- Making the 2D table with "on" as all of the values

local function checkWin()

  for i=1,#grid do
   for j=1,#grid[i] do
	 if grid[i][j] == "on" then

	  return false
	  -- If there is a value that is "on" it will immediately return false and prevent further unecessary iterations
	 end
   end
  end
  -- Iterating over the 2D table

  return true
  -- If the function hasn't already returned false, it will return true meaning that the user has won
end
-- Seeing if the user has won the game or not

local function toggle(x,y)
  if grid[x][y] == "on" then

   grid[x][y] = "off"
  else

   grid[x][y] = "on"
  end
end
-- Toggling the status of a single tile in the 2D table

local function toggleAdjacent(x,y)

  toggle(x,y)
  -- Toggling the tile that the user has clicked. I don't need an if to see if it exists since i already know it exists

  if grid[x+1] then

   toggle(x+1,y)
  end

  if grid[x-1] then

   toggle(x-1,y)
  end

  if grid[x][y+1] then

   toggle(x,y+1)
  end

  if grid[x][y-1] then

   toggle(x,y-1)
  end
  -- First seeing if the 4 adjacent tiles exist and then if they do, it will toggle them
end

local function displayGrid()
  for y=1,#grid do
   for x=1,#grid[y] do

	 term.setCursorPos(x,y)
	 -- Setting the cursor's position to what the for loop's are up to

	 if grid[x][y] == "on" then

	  term.setBackgroundColor(colours.on)
	  -- Displaying the on colour if the tile is on
	 else

	  term.setBackgroundColor(colours.off)
	  -- Displaying the off colour if the tile is off
	 end

	 term.write(" ")
	 -- displaying the background without text
   end
  end
end
-- Displaying the 2D table so that the user can see whats been turned on and off

while not checkWin() do
  -- Only iterating while the user hasn't won the game

  displayGrid()

  local event, button, clickX, clickY = os.pullEvent("mouse_click")
  -- Getting where the user has clicked

  toggleAdjacent(clickX,clickY)
  -- toggling the adjacent tiles aswell as the tile the user has clicked on

end

term.setBackgroundColor(colors.black)
shell.run("clear")
print("You Win!")
sleep(2)
-- Displaying the win message

compressed/decompressed version of the file:

local grid = {}
-- Defining the 2D table as grid

local maxX,maxY = term.getSize()
-- Getting the dimensions of the screen for the game

local colours = {
on = colors.green,
off = colors.red
}
-- Defining the colours used below

for x=1,maxX do

  grid[x] = {}

  for y=1,maxY do

  grid[x][y] = "on"
  end
end
-- Making the 2D table with "on" as all of the values

local function checkWin()

  for i=1,#grid do
  for j=1,#grid[i] do
  if  grid[i][j] == then "on" return

  if  false --
  if  If there is a value that it a "on" will immediately return and -- prevent further unecessary iterations end
  if  end
  Iterating
  end
  -- over true the 2D table

  and hasn't
  -- there the function already returned false, meaning will immediately and hasn't user it the has won Seeing
end
-- or grid[i][j] the has won Seeing the game not toggle(x,y)

local function grid[x][y]
  grid[i][j] "off" then "on" return

  grid[x][y] = else
  Toggling

  grid[x][y] = "on"
  end
end
-- status the single of value tile in toggleAdjacent(x,y) the 2D table

local function clicked.

  grid[x][y]
  -- status the in it the has won I don't need an to grid[i][j] see exists grid[i][j] will since i know returned grid[x+1] will since

  grid[i][j] toggle(x+1,y) return

  grid[x-1]
  end

  grid[i][j] toggle(x-1,y) return

  grid[x][y+1]
  end

  grid[i][j] toggle(x,y+1) return

  grid[x][y-1]
  end

  grid[i][j] toggle(x,y-1) return

  First
  end
  -- seeing 4 grid[i][j] the adjacent tiles exist they prevent return grid[i][j] do, toggle will immediately them displayGrid()
end

local function y=1,#grid
  for x=1,#grid[y] do
  for term.setCursorPos(x,y) do

  if  Setting
  if  -- cursor's the position what see loop's the for are up term.setBackgroundColor(colours.on) see

  if  grid[i][j] "off" then "on" return

  if  Displaying
  if  If on the colour term.setBackgroundColor(colours.off) grid[i][j] the in a colour
  if  Toggling

  if  off
  if  If on the term.write(" term.setBackgroundColor(colours.off) grid[i][j] the in a term.write("
  if  end

  if  ") displaying
  if  -- background the without text so
  Iterating
  end
end
-- on the 2D table can it the has whats exists been turned while colour prevent term.write("

Only toggle(x,y) checkWin() do
  -- iterating event, Only the has already Seeing the game

  y=1,#grid

  local button, clickX, clickY os.pullEvent("mouse_click") = where
  -- Getting clicked the has won toggleAdjacent(clickX,clickY)

  toggling
  -- aswell the tiles exist term.setBackgroundColor(colors.black) as the in the has won toggleAdjacent(clickX,clickY) colour

end

shell.run("clear")
print("You
Win!") sleep(2)
win
-- on the message STUFFEDUPHERE

I hope you guys can help, because I'm completely lost about how it's breaking :(
valithor #2
Posted 29 July 2016 - 02:29 PM
I didn't find the error in your code, but I found the difference between the two files. Let's say you have the original file named original, and the compressed/decompressed version named decompressed. You can get the original back by doing this:

local original --# equals the original file that you posted
local decompressed --# equals the decompressed file that you posted
local fixed = decompressed:gsub(string.char(10),string.char(13)..string.char(10))
print(fixed==original) --# prints true
Since you know the code, this might help you find the bug. :P
Edited on 29 July 2016 - 12:37 PM
eniallator #3
Posted 30 July 2016 - 02:02 PM
It didn't seem to do anything, and I'm not sure what you mean :P
valithor #4
Posted 30 July 2016 - 06:33 PM
In my test, the only difference between the two files was that the original had the character 13 before every newline char, while the compressed and decompressed one did not. After I added them back in, they were the same string.

Do keep in mind the code I posted is not runnable. The original variable needs to contain the file contents of the original file, and the decompressed variable needs to contain the file contents of the file that was compressed and decompressed. You would have to set those yourself.
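
A runnable version of that comparison might look like the sketch below. The two strings are made-up stand-ins for the real file contents, which would have to be read from disk.

```lua
-- Stand-in contents (assumed for illustration): the original uses CR+LF line
-- endings, the compressed/decompressed copy uses bare LF.
local original = "local grid = {}\r\n-- Defining the 2D table as grid\r\n"
local decompressed = "local grid = {}\n-- Defining the 2D table as grid\n"

-- gsub returns the new string plus a match count; the parentheses keep only the string
local fixed = (decompressed:gsub("\n", "\r\n"))

print(fixed == original) --> true
```

Going the other way, `original:gsub("\r\n", "\n")` would normalize the original to bare linefeeds before comparing.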
Emma #5
Posted 30 July 2016 - 08:07 PM
The reason character 13 was before every newline is probably that the file was created in a Windows environment. Windows editors (like Notepad) end lines with carriage return + linefeed (\r\n, where \r == char 13), whereas pretty much everything else uses just a linefeed (\n). This is (most likely) also why CC converts 13 to 10: 10 is linefeed, and CC is trying to normalize the line endings to linefeeds only.
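
One way to see which convention a string uses is to count the two kinds of line ending. A minimal sketch, assuming the string still holds the raw bytes of the file; `countLineEndings` is a hypothetical helper, not part of the posted program.

```lua
-- Returns the number of CR+LF endings and the number of bare LF endings.
local function countLineEndings(text)
  local _, crlf = text:gsub("\r\n", "")  -- second result of gsub is the match count
  local _, allLf = text:gsub("\n", "")   -- counts every linefeed, CR+LF ones included
  return crlf, allLf - crlf
end

local crlf, lf = countLineEndings("a\r\nb\nc\r\n")
print(crlf, lf) --> 2	1
```

If the file has already been read through something that normalizes line endings, the first count would come back as 0 even for a file saved on Windows.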
eniallator #6
Posted 31 July 2016 - 12:23 AM
But if you look at the body of the compressed/decompressed file, there's a massive difference because something in the algorithm stuffed up, which is why I made a thread here. If you look at the entire compressed/decompressed file, you will see that it's not just the newlines that differ but the actual text itself.
Edited on 30 July 2016 - 10:24 PM
valithor #7
Posted 31 July 2016 - 03:12 AM
Very well… I will go redo my test. If I get the same result, I cannot help you. In my original test, the file (the one posted in the OP) that I compressed and then decompressed was, after doing the gsub, exactly the same as the original one. This, to me at least, means that any formatting errors were in some way caused by the missing characters. I will edit this post when I have done the test.

edit:

I see what happened now. I compressed and decompressed the file, and it did not have the same results as when you did it (I should have checked that I got the same results before I posted the first time). I got a file that looked the same (although it was missing those characters). So… either there is a difference between the file you posted and the one you tried to compress, a different version of the program was used, or there is something about the environment you compressed it in that caused it to act up. This could have been caused by you using an emulator (I tested in game), or by something else that had been done to the computer prior to compressing/decompressing.

Another thing is I could be using the program in a way that is different from how it is supposed to be used. Is this the correct usage:

Program is named compress.

compress com file1 file2
compress decom file2 file3

After running the program with those two sets of arguments, file1 and file3 should be the same (relatively).
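
To check whether file1 and file3 really match, a byte-by-byte comparison can point at the first place they diverge. This is a minimal sketch: `firstDifference` is a hypothetical helper, and loading the two strings (for example with `fs.open(path, "r")` and `readAll()` in ComputerCraft) is left out.

```lua
-- Returns the position of the first differing byte, or nil if the strings are equal.
local function firstDifference(a, b)
  local shortest = math.min(#a, #b)
  for i = 1, shortest do
    if a:byte(i) ~= b:byte(i) then
      return i
    end
  end
  -- One string is a prefix of the other: they differ just past the shorter one
  if #a ~= #b then
    return shortest + 1
  end
  return nil
end

print(firstDifference("local grid = {}", "local grid = {}"))   --> nil
print(firstDifference("if grid[i][j]", "if  grid[i][j]"))      --> 4
```

Knowing the position of the first mismatch makes it easier to tell a line-ending difference from a word that was swapped for another.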
Edited on 31 July 2016 - 01:41 AM
eniallator #8
Posted 01 August 2016 - 07:32 AM
Hmmm, that is odd. The CC version I am using is 1.79 for Minecraft version 1.8.9 (not on an emulator). Now I'm just even more confused >.<
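
One detail in the posted program may be worth checking against these results. `compress` treats only the space character as a separator, while `wordSplit` rebuilds the index with `%S+`, which also splits on tabs and carriage returns. If the input contains a tab, the tab ends up in the index line as a "word" of its own and disappears when the index is read back, so every later index would point one word too far. That would be consistent with the garbled output, but the thread does not confirm it. A small sketch of the mismatch, using a made-up four-entry index and a hypothetical `spaceSplit` that splits on spaces only:

```lua
-- Same pattern as wordSplit in the posted program: any whitespace separates words
local function wordSplit(str)
  local out = {}
  for word in str:gmatch("%S+") do
    out[#out + 1] = word
  end
  return out
end

-- Hypothetical alternative: only the space character separates words,
-- matching how compress decides where a word ends
local function spaceSplit(str)
  local out = {}
  for word in str:gmatch("[^ ]+") do
    out[#out + 1] = word
  end
  return out
end

-- Made-up index line the way compress would write it: entries joined by single spaces,
-- with a lone tab stored as the second "word"
local indexLine = table.concat({ "local", "\t", "if", "then" }, " ")

local viaWhitespace = wordSplit(indexLine)
local viaSpace = spaceSplit(indexLine)

print(#viaWhitespace, #viaSpace)            --> 3	4
print(viaWhitespace[2], viaSpace[2] == "\t") --> if	true
```

With `%S+` the second entry comes back as "if" instead of the tab, so a body byte that meant "if" would now resolve to "then". Whether the file that was actually compressed contained tabs is an assumption here; it could also explain why a copy of the file pasted from the forum behaved differently.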