| Author |
ANN: ZPack File Format, spec and tools
|
|
| cr88192 2006-05-10, 6:55 pm |
| ok, finally got around to uploading this in case anyone cares.
the tools/source can be downloaded here:
http://bgb-sys.sourceforge.net/zpack.zip
contents are the source files, and a few precompiled exe's (cygwin),
basically, a command line tool and a shell-based tool.
any comments (about the format or tools) could be helpful.
of note, some code from zlib (for crc combining) was used, but no mention
was made in the license (some credit is given in the source however). I
didn't notice this until after I uploaded it, but I am not going to upload
it again (ssh server slow...).
otherwise, I have created a BWT based algo that gets close to the ratios of
bzip2 in my tests, but this algo is not implemented in the format/tools. I
may add it later, if I feel it makes sense (it compresses better than
deflate, but is slower, and is not a common algo in the same way deflate
is...).
or such...
| |
| iBBiS@gmx.de 2006-05-10, 6:55 pm |
| > ok, finally got around to uploading this in case anyone cares.
>
> the tools/source can be downloaded here:
> http://bgb-sys.sourceforge.net/zpack.zip
> ...
> any comments (about the format or tools) could be helpful.
You should not redefine "ulong" (especially because you want it to
become "unsigned long long"). I renamed it to "ullong" and it compiles
on Linux too. BTW: it isn't a good idea to expect those basic types to
have a special bit size - use e.g. "u_int_{8,16,32,64}_t" from
"sys/types.h" to make sure to get what you need.
$ ./zpack -c image *.txt
works fine and I can list and test it too. Unfortunately, extraction
seems to be flawed because it doesn't finish... :-(
I'd like to see some kind of documentation which describes the
advantages/disadvantages compared to existing solutions. This might
also be helpful to convince more people ...
Christian
| |
| cr88192 2006-05-10, 6:55 pm |
|
<iBBiS@gmx.de> wrote in message
news:1147278100.075629.320760@i40g2000cwc.googlegroups.com...
>
> You should not redefine "ulong" (especially because you want it to
> become "unsigned long long"). I renamed it to "ullong" and it compiles
> on Linux too. BTW: it isn't a good idea to expect those basic types to
> have a special bit size - use e.g. "u_int_{8,16,32,64}_t" from
> "sys/types.h" to make sure to get what you need.
>
note that my init function checks these types to make sure that the sizes
are correct, and complains if they aren't. the assumption is that on pretty
much all modern computers (primarily x86 and x86_64, possibly others, eg,
PPC) the sizes are as expected (and I suspect the 32 bit definition of
'long' will likely disappear in not too 'long' anyways, eg when os'es and
code generally migrate to 64 bits).
my code won't work on 16 bit machines anyways, and I don't really care much
about oddball crap...
hell, if I wanted I could even go and assume little-endian and the typical
padding rules (assuming that even something as rare as PPC won't be used, x86
being the one true architecture), instead however, I opted to make it
padding-rules resistant, and also to deal with endianness issues.
also note that on my computer (windows building with either mingw or
cygwin):
these types were often not defined, so there was no problem.
I have not tested any linux builds thus far.
also note that this code is in early stages, and has not been all that well
tested in general (eg: I have not yet tested it with larger filesets, under
general read/write conditions, ...).
> $ ./zpack -c image *.txt
> works fine and I can list and test it too. Unfortunately, extraction
> seems to be flawed because it doesn't finish... :-(
>
yeah, that is odd (and does not occur on my computer).
you know where it stalls exactly?...
then again, when you renamed ulong, did you also rename all the places it
occurred in the source?... a lot of code depends on it, and generally assumes
it is 64 bits...
> I'd like to see some kind of documentation which describes the
> advantages/disadvantages compared to existing solutions. This might
> also be helpful to convince more people ...
>
the main point is partly that it is designed to be read/write friendly, vs
zip, which doesn't really handle the read-write case. for larger files it
does "fragmentation", which is intended to help wrt latency but not really
wrt compression (since each fragment is compressed independently of the
others, context is lost between fragments). fragmentation is currently kind
of an ugly kludge wrt the implementation though...
most other archive formats I have looked at have similarly been
stream-based, and thus not really suitable for read/write (in any case, it
needs to be possible to re-organize the contents of the image, which is not
entirely possible with a stream). this is part of why nearly everything is
based on offsets and lengths.
also, the format and code is designed to attempt to much more accurately
emulate a filesystem than zip (the zip code in my main project basically
ends up processing and rebuilding a directory tree from the supplied paths).
as such, it is designed such that a relatively thin wrapper should be needed
to integrate with my main project's filesystem api...
then again, someone else may have done similar, but I have not found any
good examples thus far...
> Christian
>
| |
| Thomas Richter 2006-05-11, 7:55 am |
| cr88192 wrote:
>
> note that my init function checks these types to make sure that the sizes
> are correct, and complains if they aren't.
That should really be the job of a compile time assertion rather than a run
time problem. During compile time, you can switch to alternative types
of the right size. The right tool for exactly that is "autoconf". You
write a configure.in that tests for the system and picks what is needed.
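A compile-time check does not strictly need autoconf. The following is a minimal sketch of the classic pre-C11 trick, in which a false condition produces an array of negative size and so stops the build. The names `zp_ullong` and `ZP_STATIC_ASSERT` are made up for illustration and do not come from the zpack sources.

```c
#include <limits.h>

/* Hypothetical name; the zpack sources discussed here use "ulong". */
typedef unsigned long long zp_ullong;

/* Pre-C11 compile-time assertion: when cond is false the array size
   is -1, which is a compile error, so a wrong size stops the build. */
#define ZP_STATIC_ASSERT(cond, name) \
    typedef char zp_static_assert_##name[(cond) ? 1 : -1]

ZP_STATIC_ASSERT(CHAR_BIT == 8, char_is_8_bits);
ZP_STATIC_ASSERT(sizeof(zp_ullong) == 8, ullong_is_64_bits);
```

On a platform where either condition fails, the error message will mention the typedef name, which at least hints at what went wrong.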
> the assumption is that on pretty
> much all modern computers (primarily x86 and x86_64, possibly others, eg,
> PPC) the sizes are as expected (and I suspect the 32 bit definition of
> 'long' will likely disappear in not too 'long' anyways, eg when os'es and
> code generally migrate to 64 bits).
long is 32 bit on Win64, but 64 bit on GNU g++ even on the same
architecture. The C standard does not make any assumptions, except that
sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long) <=
sizeof(long long), and int must be able to represent at least numbers in
range -32767...32767 IIRC, and the int_xy types are optional. Thus, you
better check what you got.
> hell, if I wanted I could even go and assume little-endian and the typical
> padding rules (assuming that even something as rare as PPC won't be used, x86
> being the one true architecture), instead however, I opted to make it
> padding-rules resistant, and also to deal with endianness issues.
It shouldn't be hard to code endian-independent actually.
> most other archive formats I have looked at have similarly been
> stream-based, and thus not really suitable for read/write (in any case, it
> needs to be possible to re-organize the contents of the image, which is not
> entirely possible with a stream). this is part of why nearly everything is
> based on offsets and lengths.
Did you check cpio? Not that I know all the details, but it might just
do what you need, probably.
So long,
Thomas
| |
| cr88192 2006-05-11, 6:55 pm |
|
"Thomas Richter" <thor@math.TU-Berlin.DE> wrote in message
news:4cgchjF15tudlU1@news.dfncis.de...
> cr88192 wrote:
>
>
> That should really be the job of a compile time assertion rather than a run
> time problem. During compile time, you can switch to alternative types
> of the right size. The right tool for exactly that is "autoconf". You
> write a configure.in that tests for the system and picks what is needed.
>
I seriously dislike autoconf, personally.
in my experience, it has regularly had quite a few problems on windows.
often, I would much rather edit things manually than be faced with autoconf
when it doesn't feel like working right...
>
> long is 32 bit on Win64, but 64 bit on GNU g++ even on the same
> architecture. The C standard does not make any assumptions, except that
> sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long) <=
> sizeof(long long), and int must be able to represent at least numbers in
> range -32767...32767 IIRC, and the int_xy types are optional. Thus, you
> better check what you got.
>
that is why I chose 'long long', as this is 64 bits in both the 32 and 64
bit case, and with both gcc and msvc. in my case, I was also partly assuming
gcc as the primary compiler.
personally, I don't care that much about the looseness of the c standards,
what matters is more what is done on the particular architectures in
question...
>
> It shouldn't be hard to code endian-independent actually.
>
yeah. the main annoyance though is when dealing with on-disk data, which
needs either:
to have possible conversion done when reading or writing data from disk
(this may be implicit, depending on how the data is represented on-disk);
to have conversion done when getting or setting slots.
either case has limits, and is not purely automatic.
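The "conversion when getting or setting slots" option can be sketched with a pair of byte-wise accessors. This is an illustration only: it assumes the on-disk order is little-endian, and the function names are hypothetical rather than taken from the zpack code.

```c
/* Read a 32-bit little-endian value from an on-disk byte buffer.
   Works the same on big- and little-endian hosts because it never
   reinterprets the bytes as a native integer. */
static unsigned long zp_get_u32le(const unsigned char *p)
{
    return  (unsigned long)p[0]
         | ((unsigned long)p[1] << 8)
         | ((unsigned long)p[2] << 16)
         | ((unsigned long)p[3] << 24);
}

/* Store the low 32 bits of v in little-endian order. */
static void zp_put_u32le(unsigned char *p, unsigned long v)
{
    p[0] = (unsigned char)(v & 0xFF);
    p[1] = (unsigned char)((v >> 8) & 0xFF);
    p[2] = (unsigned char)((v >> 16) & 0xFF);
    p[3] = (unsigned char)((v >> 24) & 0xFF);
}
```

The cost is a function call (or macro) at every access, which is the "not purely automatic" part.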
>
> Did you check cpio? Not that I know all the details, but it might just do
> what you need, probably.
>
dunno the details, but cpio doesn't really sound like anything even close...
cpio is also a tool, and what is needed in my case is code, rather, the tool
is ancillary.
from what I can gather, cpio is generally similar, eg, to something like
tar. that being: you can extract archives, or you can make archives. nothing
special there from what I can gather.
can you incrementally rewrite parts of the archive in a compressed form?
thus far, it doesn't sound like it...
the tool I have written is not actually the intended use of the format (it
is not actually intended for archiving). rather the tool was created as I
needed to have something where I could actually implement and test the code
(implementing directly in my main project would be problematic), and a tool
would be needed, eg, for tasks related to archive/image maintenance.
the actual intended use is actually hardly even that of storing files (in
the conventional sense). more, it is intended to operate as a file-based
persistent store (the file-system notion being particularly general for
storing varying manner of heterogeneous data, and also can be usefully
unpacked into the host filing system).
the objects themselves are more likely to be stored in various forms of
text-based formats (xml, line-based formats, and formats resembling
quake-style entity trees), for which algorithms like deflate are well
suited.
read-write uses are important, as data in the store may change, and complete
reserialization may not be practical. I am also assuming some abstraction
between the in-memory representation and the on-disk representation (thus,
this is not for "heap dump" style stores, as would be expected in most
vm's).
another possible use is holding scripts or compiled object code (though zip
also makes some sense here).
it is likely to be rare that a user will need to manually unpack or
repack an archive, but there may be cases where this is useful (for example,
for making a "stock" image, for examining an image, ...).
or such...
a lot of this, however, depends on "other" things I have not gotten around
to...
| |
| iBBiS@gmx.de 2006-05-11, 6:55 pm |
| >> You should not redefine "ulong" (especially because you want it to
> note that my init function checks these types to make sure that
> the sizes are correct, and complains if they aren't.
Your program terminates if the sizes are not as expected. This might
prevent failures but doesn't help the normal user who simply can't use
your tool.
> the assumption is that on pretty much all modern computers ...
But this is only an assumption. See the posting of Thomas. If you don't
like autoconf and don't have sys/types.h then you might try limits.h
which is available in every ANSI C compatible environment.
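As a rough sketch of the limits.h approach, the preprocessor can pick a type of the right width instead of merely complaining about the wrong one. The `zp_u32` and `zp_u64` names are hypothetical, and the 64-bit test assumes a preprocessor that evaluates `#if` expressions in at least 64 bits (true of C99 compilers).

```c
#include <limits.h>

/* Pick an unsigned type that is exactly 32 bits wide. */
#if UINT_MAX == 0xFFFFFFFFu
typedef unsigned int zp_u32;
#elif ULONG_MAX == 0xFFFFFFFFu
typedef unsigned long zp_u32;
#else
#error "no 32-bit unsigned integer type found"
#endif

/* Pick an unsigned type that is exactly 64 bits wide. */
#if ULONG_MAX == 0xFFFFFFFFFFFFFFFFu
typedef unsigned long zp_u64;
#elif defined(ULLONG_MAX) && ULLONG_MAX == 0xFFFFFFFFFFFFFFFFu
typedef unsigned long long zp_u64;
#else
#error "no 64-bit unsigned integer type found"
#endif
```

Code written against `zp_u32`/`zp_u64` then builds unchanged whether `long` is 32 or 64 bits.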
> hell, if I wanted I could even go and assume little-endian ...
no, no - keep your endian-handling ;-)
> you know where it stalls exactly?...
No. It simply doesn't finish extraction while eating 98% of the CPU
time. So there seems to be an endless loop somewhere ...
> then again, when you renamed ulong, did you also rename all the places
> it occurred in the source?... a lot of code depends on it, and generally
> assumes it is 64 bits...
$ rm *.exe
$ sed -i -e "s/ulong/ullong/g" *
Yes, I'm sure. ;-)
> <explanation>
Hey, don't just write some sentences here in comp.compression! Write a
README or a proper documentation or an html page. Don't expect people
to intensively test your tool and read its source code to understand
how it works and what it supports or not.
> that is why I chose 'long long', as this is 64 bits in both the 32
> and 64 bit case, and with both gcc and msvc. in my case, I was
> also partly assuming gcc as the primary compiler.
It wouldn't surprise me to see a 128 bit long long type, especially
since 64-bit longs are becoming usual. Try to avoid special targets and
try to keep as portable as possible. [You know: "the Linux kernel is
not written in C anymore - it is written in GCC".]
> personally, I don't care that much about the looseness of the c standards
OK. But many users DO care because they want to use software on _their_
architecture with _their_ chosen environment and _their_ preferred
compiler!
Christian
| |
|
|
| cr88192 2006-05-11, 6:55 pm |
|
<iBBiS@gmx.de> wrote in message
news:1147369526.169886.89960@j73g2000cwa.googlegroups.com...
>
> Your program terminates if the sizes are not as expected. This might
> prevent failures but doesn't help the normal user who simply can't use
> your tool.
>
pardon the incoming stream of arrogance...
the tool isn't really aimed at "normal users" anyways, it being primarily
not an archiver or similar tool, but rather intended to allow maintenance of
said file format, which will likely be used for backend purposes in my
projects.
likely, that said, your normal user will be going and getting precompiled
binaries anyways, and said binaries will have presumably been built such
that they work (or they don't, and said user is SOL).
all this is assuming that said users can understand such complexities as the
shell and make to begin with (vs thinking they can use the program from
within ms word or something).
recent example: some female couldn't get her document to open. it was an
open-office file. apparently the problem was it had been opened in something
else (wordpad or such), she saw garbage, apparently pressed enter a few
times, and saved. result is a corrupted document.
that said, I don't hold in high regard the "normal user", especially for a
tool/code that is not really intended to be used by "users" anyways...
they want an archiver, for archiving files, there is always winzip or 7-zip.
they want command line, there is infozip or tar.
my tool will not compete with these, as it is intended for a different usage
area.
> But this is only an assumption. See the posting of Thomas. If you don't
> like autoconf and don't have sys/types.h then you might try limits.h
> which is available in every ANSI C compatible environment.
>
pretty much all modern computers are x86 or x86_64, and run windows. some
people run linux, a few others run mac (on PPC, but PPC is now being
replaced by x86 here as well...).
this leaves game consoles using PPC, but anyone developing on a console is
likely able to figure how to deal with type size issues.
thus my thought is: this problem is a minor annoyance, if that.
> no, no - keep your endian-handling ;-)
>
yeah.
>
> No. It simply doesn't finish extraction while eating 98% of the CPU
> time. So there seems to be an endless loop somewhere ...
>
yeah, there are several places where this could occur.
went and looked, it was actually an obvious error.
fix is in the function:
ZPACK_ExtractFiles_R, line 190 or so.
loop there should be something like:
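while(1)
{
    i=ZPACK_Read(zfd, buf, 1024);  /* read up to 1024 bytes from the archive */
    if(!i)break;                   /* nothing read: end of this file's data */
    fwrite(buf, 1, i, fd);         /* append the bytes to the output file */
}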
but instead involved a broken loop (looking for the eof of the output file,
and reading from both files). this was because the recent extract function
was based on my export function (zpacksh.c, ZPSH_Export_R, begins on line
511 or so), which was seemingly broken in the same way.
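For comparison, here is a self-contained version of the same copy loop with error handling added. It is only a sketch: `fread()` stands in for `ZPACK_Read()`, whose exact return conventions are not given in this thread, and `zp_copy_stream` is a made-up name.

```c
#include <stdio.h>

/* Copy everything from 'in' to 'out'. Returns the number of bytes
   copied, or -1 on a read or write error. The loop ends on the
   reader returning 0, never on the state of the output file. */
static long zp_copy_stream(FILE *in, FILE *out)
{
    unsigned char buf[1024];
    long total = 0;
    size_t n;

    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n)
            return -1;          /* short write, e.g. disk full */
        total += (long)n;
    }
    return ferror(in) ? -1 : total;
}
```

Terminating only on the reader's result avoids the kind of endless loop described above, where the exit condition depended on the output file.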
this points out how much this was tested...
note:
the recent extract function was actually a replacement for the original
extract function, which had dealt with the archive internals directly rather
than using the readdir interface (as such, it was likely sensitive to
possible changes to the internals of the api).
apparently I hadn't tested either the newer export or the extract
functions...
> $ rm *.exe
> $ sed -i -e "s/ulong/ullong/g" *
> Yes, I'm sure. ;-)
>
ok.
>
> Hey, don't just write some sentences here in comp.compression! Write a
> README or a proper documentation or an html page. Don't expect people
> to intensively test your tool and read its source code to understand
> how it works and what it supports or not.
>
well, the point is, the tool is not the final form anyways. rather, the
format will be used with my main project (and probably "never see the light
of day" after that).
then again, can do probably...
the reasons I am writing the code as I am (largely standalone) is that there
could be uses for it outside my main project (several things have been this
way). standalone code is a good alternative for when entirely self-contained
code is problematic...
as a result, any such documentation is going to focus some on the c-side
api, more so than just the tool itself.
> It wouldn't surprise me to see a 128 bit long long type, especially
> since 64-bit longs are becoming usual. Try to avoid special targets and
> try to keep as portable as possible. [You know: "the Linux kernel is
> not written in C anymore - it is written in GCC".]
>
ok, that makes sense. someone could then, prompted by the error message, go
and change the header...
> OK. But many users DO care because they want to use software on _their_
> architecture with _their_ chosen environment and _their_ preferred
> compiler!
>
but, there are only a few architectures one needs to care about :
x86, x86_64, and ppc.
and only a few os'es:
windows, linux, and possibly mac.
someone off using much else is probably long used to stuff not working
anyways...
> Christian
>
| |
| cr88192 2006-05-11, 6:55 pm |
|
"Fulcrum" <werner.bergmans@gmail.com> wrote in message
news:1147371175.130314.233390@j73g2000cwa.googlegroups.com...
>
> This version crashes on compressing the a10.jpg, mso97.dll and
> flashmx.pdf files found on http://www.maximumcompression.com
>
ok, this is helpful.
I can't find where to download these files, but looking at the list
description, the program crashing makes sense. most are larger, and thus
likely to incur use of "fragmented" mode (which is currently hackishly
implemented in the lib, so I wouldn't be too surprised if major bugs
remain), along with a lot of the caching mechanisms not yet being well
tested either (my test sets usually not being large enough to really incur
much caching activity).
my testing thus far had mostly focused on sets of smaller files (typically a
few hundred bytes to a few kB each, vs larger ones, eg, 100's of kB and
up...).
so yeah, I may need to test on larger sets as well...
> Regards,
> Werner
>
| |
| Brian Raiter 2006-05-11, 9:55 pm |
| > hell, if I wanted I could even go and assume little-endian and the
> typical padding rules (assuming that even something as rare as PPC
> won't be used, x86 being the one true architecture), instead however,
> I opted to make it padding-rules resistant, and also to deal with
> endianness issues.
Wise move. x86 is about the only major processor that is purely
little-endian. Nearly everyone else is big-endian or configurable. And
padding rules are not set in stone but differ from compiler to
compiler (and even invocation to invocation).
There's something to be said for ignoring portability in favor of
getting work done. But you might be surprised how little effort it
really takes, most of the time, to write portable code. And it's
nearly always easier to write code portably to begin with than it is
to go back and add portability to existing code.
b
| |
| cr88192 2006-05-11, 9:55 pm |
|
"Brian Raiter" <breadbox@muppetlabs.com> wrote in message
news:e40o2c$uii$1@cascadia.drizzle.com...
>
> Wise move. x86 is about the only major processor that is purely
> little-endian. Nearly everyone else is big-endian or configurable. And
> padding rules are not set in stone but differ from compiler to
> compiler (and even invocation to invocation).
>
yes.
however, although x86 is in a minority wrt arch's using little endian, it is
by far in the majority in terms of systems using it (enter a public
building, what does one see? well, a whole lot of windows boxes, and
sometimes a few macs off by themselves...).
as for padding, personally all I have ever really seen is the packed style
and the "align everything on to its natural size" style (others are
possible, eg, sorting fields by element size, ..., but I have not seen them
used in practice). usually if one takes these into account when laying out
structs, there is unlikely to be much difference between compilers.
for extra safety (usually when reading arrays of structs from disk or
similar) I usually define the fields in terms of byte arrays.
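A sketch of that byte-array style follows. The record layout is invented for illustration and is not the actual ZPack on-disk format. Because every member is an array of `unsigned char`, compilers in practice insert no padding, though the C standard does not strictly promise this.

```c
#include <stddef.h>

/* Hypothetical on-disk record, NOT the real ZPack layout. */
struct zp_disk_entry {
    unsigned char offset[8];   /* 64-bit file offset, little-endian */
    unsigned char size[4];     /* 32-bit length, little-endian */
    unsigned char flags[2];    /* 16-bit flags, little-endian */
};

/* Decode an n-byte little-endian field (n <= 8) by walking from the
   most significant byte (the last one) down to the first. */
static unsigned long long zp_get_le(const unsigned char *p, size_t n)
{
    unsigned long long v = 0;
    while (n > 0)
        v = (v << 8) | p[--n];
    return v;
}
```

A whole array of such records can be read with a single `fread()` and decoded field by field, e.g. `zp_get_le(e.size, sizeof e.size)`.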
> There's something to be said for ignoring portability in favor of
> getting work done. But you might be surprised how little effort it
> really takes, most of the time, to write portable code. And it's
> nearly always easier to write code portably to begin with than it is
> to go back and add portability to existing code.
>
yes, ok.
but for a project of only a few kloc, it is hardly a big deal (if this were,
say, likely to turn out being a 25 or 50kloc effort, then I would put quite
a bit more thought into portability, partly because any large-scale changes
would be quite painful).
oh well, at least I have defeated some major bugs, partly now by testing
against a slightly larger dataset (about 80MB of assorted files...).
fragment support currently doesn't work right in this case, but things work
ok if fragmentation is disabled...
| |
|
|
| cr88192 2006-05-13, 7:55 am |
|
"Fulcrum" <werner.bergmans@gmail.com> wrote in message
news:1147512831.989120.67390@y43g2000cwc.googlegroups.com...
> files can be found here: www.maximumcompression.com/data/files/
>
ok.
for now (when I had some time at least), I ended up testing using the
contents of the Gimp tarball (about 80MB).
this allowed me to fix a number of bugs, but I suspect somewhere (most
likely in the caching code somewhere) there is a bug that is causing memory
corruption (it was most notable because it was corrupting the memory used
for the avl trees used in free space management...).
some features were also disabled for debugging reasons (in particular file
fragmentation).
I may soon release a version incorporating a lot of changes people here have
suggested (renaming types, further documentation work, additional debugging,
...).
of these, debugging is imo important, as for my uses stability is needed,
and it is just lame if the thing suffers from memory corruption problems.
or such...
| |
| Brian Raiter 2006-05-13, 6:55 pm |
| > but for a project of only a few kloc, it is hardly a big deal (if
> this were, say, likely to turn out being a 25 or 50kloc effort, then
> I would put quite a bit more thought into portability, partly
> because any large-scale changes would be quite painful).
No, it's not a big deal. But it's a good habit to fall into, and once
you've done so it becomes second nature. And a good place to start is
with small projects. It's like quitting smoking. It's hard work mostly
for the first few weeks or months, but you reap the benefits for years
afterwards.
b
| |
| cr88192 2006-05-13, 9:55 pm |
|
"Brian Raiter" <breadbox@muppetlabs.com> wrote in message
news:e45ed9$377$1@cascadia.drizzle.com...
>
> No, it's not a big deal. But it's a good habit to fall into, and once
> you've done so it becomes second nature. And a good place to start is
> with small projects. It's like quitting smoking. It's hard work mostly
> for the first few weeks or months, but you reap the benefits for years
> afterwards.
>
in my case, I usually loosen restrictions for small projects.
note, for example, that instead of compiling to separate object files, I was
just including all the files from a single set of files.
some of my original projects were done this way, but I had eventually
stopped as problems had become apparent that at the time I was unsure how to
resolve (mostly interdependency issues and similar, many of which could be
resolved easily enough by using the preprocessor or by putting some things
in headers).
often I do things I would not find acceptable in larger projects (little or
no adherence to naming conventions, lots of use of globals, direct interface
to external apis, ...), which a lot of times requires largely rewriting
things during integration.
as for my main codebase it is "portable enough", then again, I still have
not tested whether it still works ok if built on a 64 bit os (had started
messing with linux on x86_64 before, but didn't get around to testing how
well my codebase holds up...).
part of it was that, at the time, things were crashy. eventually I noted
that the problem was my hard drives:
I have 6 of them, all stacked up;
on windows, I generally have them set to spin down after a while (would like
it more if I could control which drives spin down, as for varying reason I
would like it if I could keep my "primary" 2 drives always spun up);
on linux, they all stay spinning the whole time.
the result was that they would somewhat overheat and take out the general
stability of my computer. all this was somewhat resolved via use of fans and
box-tape (for creating a setup where air is blown between the drives, thus
dissipating heat...).
groan:
after searching for it, it turns out what I had thought was a memory
corruption bug was actually just a normal bug (related to the spans-tree
buffer size) annoyingly, in the same general area of code I was originally
looking (spans-tree node allocation...).
it turns out, there was a minor problem in that, after expanding the buffer,
a single line would modify the variable holding the buffer size (in error)
to its previous value (thus making it think the buffer had not been
expanded to begin with).
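The class of bug described here is easy to reproduce, so a minimal sketch of a growable table may help. The struct and function names are hypothetical, and `int` stands in for whatever the real spans-tree node type is. The point is that the recorded capacity is updated exactly once, right after the buffer is replaced.

```c
#include <stdlib.h>

struct zp_node_table {
    int    *nodes;   /* stand-in for the real spans-tree node type */
    size_t  count;   /* slots in use */
    size_t  cap;     /* slots allocated */
};

/* Return the index of a fresh slot, growing the table if needed.
   Returns (size_t)-1 if memory runs out. */
static size_t zp_node_alloc(struct zp_node_table *t)
{
    if (t->count == t->cap) {
        size_t new_cap = t->cap ? t->cap * 2 : 16;
        int *p = realloc(t->nodes, new_cap * sizeof *p);
        if (!p)
            return (size_t)-1;  /* the old table is still valid */
        t->nodes = p;
        t->cap = new_cap;       /* size and buffer change together */
    }
    t->nodes[t->count] = 0;
    return t->count++;
}
```

If `cap` were reset to its old value after the `realloc`, the next call would grow from the stale size and the table would appear never to have expanded.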
I also modified my deflate code such that, likely, it will not call
malloc/free as often (before, some memory would be allocated for each block,
used while compressing, and then freed). now, said buffers will be reused so
long as they are sufficiently large (in general, it seems, compression
performance has increased, but I am not entirely sure...).
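The buffer reuse described above might look roughly like this. It is a sketch under assumptions: the names are invented, and the real deflate code may keep several such buffers.

```c
#include <stdlib.h>

struct zp_scratch {
    unsigned char *buf;
    size_t         cap;
};

/* Return a buffer of at least 'need' bytes, reusing the previous
   allocation when it is already large enough. Old contents are not
   preserved. Returns NULL if memory runs out. */
static unsigned char *zp_scratch_get(struct zp_scratch *s, size_t need)
{
    if (need > s->cap) {
        unsigned char *p = malloc(need);
        if (!p)
            return NULL;
        free(s->buf);
        s->buf = p;
        s->cap = need;
    }
    return s->buf;
}
```

With a fixed block size, the first block pays for the allocation and every later block gets the same pointer back.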
note that because my code creates a new block at regular intervals
(currently 64kB, originally figured based on testing speed and compression
with varying block sizes, 64kB generally doing "best") the upper size of
these buffers is effectively bounded (to about 128kB, due to the fact that,
in general, I assume a 2:1 possible worst case). note that typically the
sliding window is preserved between blocks, so more so this just resets the
huffman tables, emits new headers, and similar...
actually, as a thought, given how the lz77 pre-step is handled (min run is 3
bytes, and runs cost 3 bytes, literal/run indicator held in a separate
buffer, and the max number of literals/runs being less than or equal to the
input buffer size) actual expansion is not possible anyways, so I can get by
with a 64kB buffer (and a smaller 8kB buffer holding the literal/run
flags...).
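The arithmetic behind that claim can be written down directly. In this sketch a literal is assumed to cost one byte and each token one flag bit, which matches the 64kB and 8kB figures above but is otherwise an assumption; the helper names are hypothetical.

```c
#include <stddef.h>

#define ZP_MIN_RUN   3   /* shortest match that is emitted as a run */
#define ZP_RUN_COST  3   /* bytes a run occupies in the token buffer */

/* Bytes of token buffer used by a block of literals and runs,
   assuming each literal costs one byte. */
static size_t zp_token_bytes(size_t literals, size_t runs)
{
    return literals + runs * ZP_RUN_COST;
}

/* Fewest input bytes that could have produced those tokens. */
static size_t zp_min_input_bytes(size_t literals, size_t runs)
{
    return literals + runs * ZP_MIN_RUN;
}

/* Flag buffer size at one bit per token, rounded up to whole bytes. */
static size_t zp_flag_bytes(size_t tokens)
{
    return (tokens + 7) / 8;
}
```

Since a run never costs more than the input it replaces, the token buffer never exceeds the input size, and a 64kB block needs at most 65536 flag bits, i.e. 8kB.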
if I wanted, I could just move the buffers to bss and not call malloc at
all...
or such...
| |
| Jasen Betts 2006-05-16, 3:55 am |
| On 2006-05-13, cr88192 <cr88192@NOSPAM.hotmail.com> wrote:
> on windows, I generally have them set to spin down after a while (would like
> it more if I could control which drives spin down, as for varying reason I
> would like it if I could keep my "primary" 2 drives always spun up);
> on linux, they all stay spinning the whole time.
a linux prog called hdparm can tune the spin-down time of each disk
individually. (among other hard drive related things)
Bye.
Jasen
| |
| Stefan Monnier 2006-05-16, 6:56 pm |
| > my code won't work on 16 bit machines anyways, and I don't really care much
> about oddball crap...
Actually, compression code is one of those things that are actually used on
those "oddball crap" machines such as wireless routers and stuff.
They're not necessarily that oddball, admittedly, but they rarely use the
x86 architecture.
Stef
| |
| cr88192 2006-05-16, 6:56 pm |
|
"Jasen Betts" <jasen@free.net.nz> wrote in message
news:3dfe.44687784.d5e39@clunker.homenet...
> On 2006-05-13, cr88192 <cr88192@NOSPAM.hotmail.com> wrote:
>
>
> a linux prog called hdparm can tune the spin-down time of each disk
> individually. (among other hard drive related things)
>
ok.
personally I hadn't realized that they could be spun down...
(misc: I may be nearing the point of doing another update, but a lot of
other things have been going on recently so I am not sure when exactly...).
> Bye.
> Jasen
| |
| cr88192 2006-05-16, 6:56 pm |
|
"Stefan Monnier" <monnier@iro.umontreal.ca> wrote in message
news:jwviro5u709.fsf-monnier+comp.compression@gnu.org...
>
> Actually, compression code is one of those things that are actually used on
> those "oddball crap" machines such as wireless routers and stuff.
> They're not necessarily that oddball, admittedly, but they rarely use the
> x86 architecture.
>
ok, maybe.
about the only real use I can imagine right now for my code on said
architectures would likely be, eg, as a filesystem replacement (assuming
they don't have much better already).
the main limitation for "tiny" or 16 bit architectures is currently my
memory management. eg: each file, when in cache, is kept in a single buffer.
likewise, a lot of other similar buffers are used.
in the case of a 16 bit-only architecture (like the 8088, or 80286), my code
will not play well with segmented addressing (really, if needed, I would
think it would make more sense to write a custom version for those archs).
so, an arch with flat addressing of some sort is needed (32 or 64 bit x86,
m68k, ...). no big deal...
likewise, sufficient memory would be needed (as set up right now, a minimum
of about 32MB/mount makes sense).
and, even on x86, there are some bugs I have yet to resolve. in some cases,
I am still getting noticeable damage to the spans tree, but as of yet have
not been able to locate why. a lot has to do with how and when I resize the
tree's spans-node table, and behaves like a memory corruption bug. a
problem, however, is that it will still occur in cases where, basically,
nothing else is going on (suggesting an internal bug). in particular, one
would suspect the code to allocate nodes/resize the table, however, I have
looked at that a lot already...
other possible suspicions include that, possibly, cygwin's malloc
implementation has bugs, but why then would it be acting up primarily in
this case (excepting maybe the occurrence of performing mallocs from within
quite possibly 20 or so levels of recursive function calls).
otherwise, bizarre malloc behavior is nothing new in my experience...
actually, I think I remember that at one point on linux I would experience a
lot of buggy behavior from malloc (I forget if this was before or after the
1.0 kernel, I suspect this was prior to inclusion of X11 in any case). in
more modern cases, my experience has been that usually malloc on linux will
notice various problems with memory access and throw signals, but this
doesn't necessarily occur on cygwin (where it may get messed up, and start
doing strange things it seems...).
at the time, in response, I had ended up doing my own allocator that I would
use in place of malloc. similar could be possible in this case (more so, I
could keep a more strict limit on heap size, at the cost of using more...).
then again, I am not sure if it is justified in this case. could maybe write
one and make it preprocessor-controlled as to whether or not it is actually
used...
for now, it is patched over by making the table overly large (it should take
storing more than about 32000 files at present before the bug shows up).
>
> Stef
|