Posted 07 February 2013 - 07:42 PM
I am trying to design a database engine to be used in ComputerCraft. As far as I can tell, CC has no file seeking capability, so to read data at a certain position in a file I would first need to read all the data before that position, which would incur a performance penalty. I believe I have come up with a novel solution to this, but I don't know enough about NTFS (I'm on Windows) to know whether it would actually be better or worse.
In my design, a table would be represented as a folder, and each record would be represented as a file inside the table folder. The record files' names would be the record ID (1, 2, 3, etc.). I would persist indexes as one file per index, but keep them in memory for speed. Changes to an index would be written to a log, which could be used to restore the index to a valid state after a crash; during normal operation I would save the in-memory version of the index to disk at a chosen interval and clear the log.
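To make the record-per-file idea concrete, here is a rough sketch of what I'm picturing, using CC's fs and textutils APIs. The db table and recordPath helper are just names I made up for illustration:

```lua
-- Sketch of the table-as-folder, record-as-file layout described above.
-- "db" and "recordPath" are hypothetical names; fs and textutils are
-- ComputerCraft's built-in APIs.
local db = {}

-- Each record lives at <table folder>/<record id>.
local function recordPath(tbl, id)
  return fs.combine(tbl, tostring(id))
end

function db.write(tbl, id, record)
  if not fs.exists(tbl) then fs.makeDir(tbl) end
  local f = fs.open(recordPath(tbl, id), "w")
  f.write(textutils.serialize(record))
  f.close()
end

function db.read(tbl, id)
  local path = recordPath(tbl, id)
  if not fs.exists(path) then return nil end
  local f = fs.open(path, "r")
  local record = textutils.unserialize(f.readAll())
  f.close()
  return record
end
```

Deleting a record would then just be fs.delete on its path, and listing a table would be fs.list on the folder.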
I chose this design because I can use the fs api to go directly to the record without having to do a bunch of read calls on a file to find it. I have no idea how the directory tree is stored in NTFS or other file systems, so this may not give me the performance gains I hope for. On its surface, though, it seems like it could work out well. At the very least I can safely assume it would cost more disk space than a single-file table/db design. I would never design a database engine like this outside of ComputerCraft, but with the file IO restrictions that are in place it seems like a decent idea. What do you guys think?
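For the index side, the checkpoint-and-log cycle I described might look something like this. The paths and function names are hypothetical; only fs and textutils are CC's:

```lua
-- Hypothetical sketch of the index log/checkpoint cycle. The index is
-- kept in memory; each change is appended to a log file before it is
-- applied, and a periodic checkpoint rewrites the full index file and
-- clears the log.
local index = {}                        -- in-memory index: key -> record id
local LOG_PATH = "mydb/name_index.log"  -- made-up paths for illustration
local INDEX_PATH = "mydb/name_index"

-- Log the change first, then apply it in memory, so a crash can be
-- recovered by replaying the log over the last checkpointed index.
local function indexSet(key, id)
  local log = fs.open(LOG_PATH, "a")
  log.writeLine(textutils.serialize({ key = key, id = id }))
  log.close()
  index[key] = id
end

-- Checkpoint: persist the whole in-memory index, then clear the log.
local function checkpoint()
  local f = fs.open(INDEX_PATH, "w")
  f.write(textutils.serialize(index))
  f.close()
  fs.delete(LOG_PATH)
end
```

Recovery on startup would load INDEX_PATH if it exists, then replay any entries left in LOG_PATH.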