This is a read-only snapshot of the ComputerCraft forums, taken in April 2020.

[1.52]fs.getSize() returns unexpected values

Started by Espen, 17 April 2013 - 11:45 PM
Espen #1
Posted 18 April 2013 - 01:45 AM
ComputerCraft Version Information: 1.52

Description of Bug:
  • If a file is <= 508 bytes, then fs.getSize() will always show its size as 512 bytes.
  • If a file is > 508 bytes, then fs.getSize() will always show 4 bytes too much (e.g. 627 bytes will be shown as 631 bytes).
Steps to Reproduce Bug:
  1. Create a file <= 508 bytes.
  2. Call fs.getSize() on the file.
  3. => Result: "512"
  1. Create a file > 508 bytes.
  2. Call fs.getSize() on the file.
  3. => Result: Real file size + 4 bytes.
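
For anyone who wants to try the steps above, here is a minimal sketch. It assumes the usual fs.open(), handle write()/close() and fs.getSize() calls; the helper name reportedSizeFor, the fsApi parameter and the file name "sizetest" are made up for illustration.

```lua
-- Writes n bytes to path, then returns whatever getSize reports for it.
-- fsApi is expected to be ComputerCraft's fs table; it is passed in as a
-- parameter (an assumption of this sketch) instead of using the global.
local function reportedSizeFor(fsApi, path, n)
  local h = fsApi.open(path, "w")
  if not h then
    error("could not open " .. path .. " for writing")
  end
  h.write(string.rep("x", n))
  h.close()
  return fsApi.getSize(path)
end

-- Inside ComputerCraft the global fs table exists; the guard keeps the
-- snippet harmless when it is pasted into a plain Lua interpreter.
if fs then
  print(reportedSizeFor(fs, "sizetest", 100))   -- a file <= 508 bytes
  print(reportedSizeFor(fs, "sizetest", 627))   -- a file > 508 bytes
end
```

Comparing the printed numbers with the size shown by the host operating system should make both effects visible.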



User-Story:
While trying to make use of fs.getSize() I noticed the two peculiarities mentioned above.

The second one isn't too much of a problem, because as long as it remains consistent (always 4 bytes too much) we can account for it programmatically.
But the first one always showing 512 bytes for smaller files is a bit more problematic.

The way I wanted to use fs.getSize() was to make sure two files had the same size.
My reason for that is that I wanted to process the files byte-by-byte instead of reading them in all at once first.
I don't want to read them in all at once because the files could potentially be quite large.

As it is right now I'll just have to check if a file is <= 512 according to fs.getSize().
If it is, I just read it in all at once, since it isn't that big really.
If it is bigger than 512 bytes, then I can read it byte-by-byte and just keep in mind that it shows 4 bytes too many.
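
A byte-by-byte comparison along these lines could look like the sketch below. It is only one possible approach: it notices a length mismatch when one file ends before the other, so it does not rely on fs.getSize() at all. It assumes that a handle opened with "rb" returns one byte per read() call and nil at the end of the file; the name sameContent and the fsApi parameter are made up for illustration.

```lua
-- Returns true if both files have identical content, false otherwise
-- (including when either file cannot be opened).
-- fsApi is expected to be ComputerCraft's fs table.
local function sameContent(fsApi, pathA, pathB)
  local a = fsApi.open(pathA, "rb")
  local b = fsApi.open(pathB, "rb")
  if not a or not b then
    if a then a.close() end
    if b then b.close() end
    return false
  end

  local equal = true
  while true do
    local x = a.read()
    local y = b.read()
    if x ~= y then
      -- Different byte, or one file ended before the other
      equal = false
      break
    end
    if x == nil then
      -- Both files ended at the same position
      break
    end
  end

  a.close()
  b.close()
  return equal
end
```

Only two bytes are held in memory at a time, which is the point when the files could be large.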

So it's not really a problem for what I'm using it for right now, but I thought I'd mention it in case it wasn't known yet, and because I could imagine situations where you'd want to know the exact size of files < 508 bytes.
If this is intended behaviour though, or if I'm missing something, then I'd appreciate someone pointing it out to me all the same! :)


Edit: Changed "3 bytes too much" into "4 bytes too much" and "509 bytes" into "508 bytes".
Edited on 18 April 2013 - 07:40 PM
superaxander #2
Posted 18 April 2013 - 02:52 AM
How could such a problem exist… Hmm…Weird
BigSHinyToys #3
Posted 18 April 2013 - 03:10 AM
The first behaviour is there to stop people from keeping hundreds of small files: every small file is counted as that size.
Cloudy #4
Posted 18 April 2013 - 03:41 AM
There is a minimum file size. This is intended.

As for 3 bytes too much - I'll have to look into that. Not sure how it is happening.
Espen #5
Posted 18 April 2013 - 03:53 AM
Cloudy wrote: "There is a minimum file size. This is intended."
You mean minimum as far as fs.getSize() is concerned, or that there cannot be files in CC smaller than 512 bytes?
Because when I write a file of, say, 15 bytes and look at its size from outside of Minecraft, then it shows 15 bytes.
fs.getSize() will tell me it's 512 bytes though.

Edit: Changed "3 bytes too much" into "4 bytes too much" and "509 bytes" into "508 bytes" in the OP.
Edited on 18 April 2013 - 04:11 AM
Espen #6
Posted 18 April 2013 - 06:59 AM
Ok, I investigated this a bit more and it's becoming ever weirder:
  • When I write a file with a text editor outside of Minecraft and the file turns out to be 840 bytes, fs.getSize() tells me: 844
  • When I use CC to read that file and write it byte-by-byte into a new file, fs.getSize() on that new file tells me: 845
  • But I can confirm that both files have the same size outside of Minecraft: 840
!? :blink: !?

I even counted the number of bytes using a hex editor and the numbers match up.
So I guess that means (barring any faux-pas on my part) fs.getSize() is unreliable at the moment?



EDIT:
The length of the file name is added to the file size that fs.getSize() returns!
Wow, I thought I was losing my mind when the size seemed to be different despite the exact same content of the files.

So for anyone interested, fs.getSize() seems to work with the following ruleset:
  • If [content of file] + [length of file name] <= 512 bytes then fs.getSize() == 512
  • Else fs.getSize() == [content of file] + [length of file name]
All is good again (with my brain) ^_^

EDIT #2: Refined ruleset after some further tests. Should be airtight now.
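
The ruleset above can be written down as a small pure function, which makes it easy to check against observed values. This is only a model of the behaviour described in this thread, not ComputerCraft's actual implementation; the name expectedReportedSize is made up, and the file name lengths in the examples (4, 5 and 8 characters) are assumptions chosen to match the numbers reported earlier.

```lua
-- Smallest value fs.getSize() was observed to report.
local MIN_REPORTED = 512

-- Models the observed rule: content length plus file name length,
-- but never less than MIN_REPORTED.
local function expectedReportedSize(contentBytes, nameLength)
  local total = contentBytes + nameLength
  if total <= MIN_REPORTED then
    return MIN_REPORTED
  end
  return total
end

print(expectedReportedSize(840, 4))  -- 844 (840-byte file, 4-character name)
print(expectedReportedSize(840, 5))  -- 845 (same content, 5-character name)
print(expectedReportedSize(15, 8))   -- 512 (small file, minimum applies)
```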
Edited on 18 April 2013 - 05:53 AM
immibis #7
Posted 18 April 2013 - 06:20 PM
It returns the size used for counting towards the disk space limit - not the length of the contents of the file.
Espen #8
Posted 18 April 2013 - 09:38 PM
immibis wrote: "It returns the size used for counting towards the disk space limit - not the length of the contents of the file."
But surely the way it works is not intended? I mean, even if small files count as at least 512 bytes, or even if the file names are added to the size of larger files when computing the quota, shouldn't one expect fs.getSize() to return the actual size of the file?
Apropos, I think I need to change the topic title from "inconsistent" to "unexpected", seeing as it actually is behaving consistently, just not as expected IMO.^^

Ah well, in the meantime I've helped myself with some code which counts all bytes for files reported as <= 512 bytes and simply subtracts the length of the filename from files reported as > 512 bytes. It's not optimized yet, but whoever is interested in a possible solution for getting the real file-sizes, here you go:
--[[ Helper function for getSize( _file ):              ]]
--[[   Returns the number of bytes in the given file,   ]]
--[[   or nil if the file cannot be opened.             ]]
local function getNumBytes( _file )
  local byteCount = 0

  -- Open file in binary mode, so that read() returns one byte per call
  local hFile = fs.open( _file, "rb" )
  if not hFile then return nil end

  -- Count bytes until read() returns nil at the end of the file
  while hFile.read() ~= nil do
    byteCount = byteCount + 1
  end

  -- Close file handle
  hFile.close()

  -- Return size
  return byteCount
end

--[[ Returns the exact size of files ]]
function getSize( _file )
  -- Get the filename part of the _file path (everything after the last "/").
  local name = string.match( _file, "[^/]+$" )
  -- Kindly ask fs.getSize for its opinion about the supposed file-size ^^
  local fileSize = fs.getSize( _file )

  if fileSize > 512 then
    -- Minus the filename, as fs.getSize() counts that with large files as well.
    return fileSize - #name
  else
    -- Need to count the bytes due to behaviour of fs.getSize()
    return getNumBytes( _file )
  end
end