Re: "HTAR" archive format idea
|
|
| Claudio Grondi 2005-03-30, 8:55 pm |
| How does it compare to the format used by 7zip?
Claudio
"cr88192" <cr88192@NOSPAM.hotmail.com> wrote in message
news:3KA2e.188$HA6.18@fe07.usenetserver.com...
> I beat together an idea for an archive format.
> I have yet to get to writing an archiver for it, and it may change, but I
> figured I would post what I have come up with.
>
> note: in a lot of ways this is likely similar to http, but I have varied it
> in numerous subtle ways, and there is the whole issue that it is not http, so
> I don't need to be bound that much to the spec anyways...
>
> but, anyways, if anyone feels like commenting on the general idea, that would
> be nice...
>
> ---
> Simplistic vaguely HTTP-like archive format.
>
> Considering extension 'HTAR'.
>
> Structure will consist of a number of headers, interspersed with "content".
>
> Each header will take the form of a number of key/value pairs.
>
> Each pair will have the syntax:
> <key> ': ' <value>
>
> 1 space is to be present after the colon, any more will be interpreted as
> part of the value.
>
> eg:
> File-Name: foobar.txt
>
> Each value may be continued over multiple lines by having each subsequent
> line indented by 1 tab. The tab is not included, and no extra characters are
> to be inserted.
>
> File-Name: foo
> 	bar.txt
>
> Each line should be limited to 80 characters.
> Either a single newline or a carriage-return newline pair is allowed as a
> line separator (hmm, probably I guess CRLF is preferred, but either should
> be accepted).
>
> Each header is terminated by a single blank line.
>
> Content will be defined as some amount of data directly following the
> header, with both its presence and size given within the header.
> In particular, 'Content-Length' will indicate the presence and size of the
> content.
>
> Any blank lines within the inter-header space are to be ignored.
>
>
> Values
>
> Within values, C-style escapes are to be used if needed, eg: \\ \t \n \r ...
> Numbers will be represented either in decimal or hex (C-style 0x
> convention).
> Commas are to be used as the general separator.
>
> Times will have the format:
> YYYY-MM-DD hh:mm:ss [TZ]
>
> 'hh' is a 24 hour clock ranging from 00 to 23.
>
> Where TZ is +hhmm, -hhmm, or some timezone mnemonic (eg: GMT).
> If omitted, TZ should be interpreted as local time.
>
> Example:
> 2005-03-31 02:11:20 +1000
> 2005-03-31 01:11:20 +0900
>
>
> General Fields
>
> Header-Type: <typename>
> The type of a particular header:
>   File        A single file;
>   FileGroup   A group of files (packed end to end and encoded together);
>   Directory   A directory.
>
> Any unknown header types should probably be ignored.
>
> File-Name: <filename> (',' <filename> )*
> The name of one or more files.
> File-Size: <size> (',' <size> )*
> Uncompressed size of one or more files.
> File-ATime: <time> (',' <time> )*
> File-MTime: <time> (',' <time> )*
> File-CTime: <time> (',' <time> )*
> Optional: file access, modification, and creation times.
>
> Other OS-specific fields could be included here, eg:
> File-Linux-Type: <mode>
> File-Linux-Mode: <mode>
> File-Linux-Dev-Major: <number>
> File-Linux-Dev-Minor: <number>
> File-Linux-UID: <uid>
> File-Linux-GID: <gid>
> ...
>
> Or even implementation-specific fields:
> File-libfoo-bar: <string>
> File-libfoo-baz: <number>
> ...
>
> Content-Encoding: <algoname>
> Algorithm used for encoding the content.
> Content-Length: <size>
> Size of the header's content in the archive.
> Content-Type: <typename>
> Mime type of content (optional and likely irrelevant).
>
>
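A rough sketch of how the header rules quoted above (key/value pairs, tab continuation, LF or CRLF, blank-line terminator, Content-Length) could be read. The function names `read_header` and `read_entries` are made up for illustration, C-style escapes inside values are not decoded here, and the content is returned as stored, with no Content-Encoding handling.

```python
import io

def read_header(stream):
    """Read one header as a dict of fields; return None at end of input."""
    fields = {}
    last_key = None
    while True:
        raw = stream.readline()
        if not raw:                       # end of archive
            return fields or None
        line = raw.rstrip(b"\r\n")        # accept both LF and CRLF
        if not line:
            if fields:                    # a blank line terminates the header
                return fields
            continue                      # blank lines between headers are ignored
        if line.startswith(b"\t") and last_key is not None:
            # Continuation line: drop the tab, insert nothing extra.
            fields[last_key] += line[1:].decode("utf-8")
            continue
        key, sep, value = line.partition(b": ")
        if not sep:
            raise ValueError("malformed header line: %r" % line)
        last_key = key.decode("utf-8")
        fields[last_key] = value.decode("utf-8")

def read_entries(stream):
    """Yield (fields, content) pairs until the stream is exhausted."""
    while True:
        fields = read_header(stream)
        if fields is None:
            return
        # Base 0 accepts decimal as well as the C-style 0x convention.
        size = int(fields.get("Content-Length", "0"), 0)
        yield fields, stream.read(size)

archive = (b"Header-Type: File\r\nFile-Name: foo\r\n\tbar.txt\r\n"
           b"Content-Length: 0x05\r\n\r\nhello"
           b"\n\nHeader-Type: Directory\nFile-Name: docs\n\n")
entries = list(read_entries(io.BytesIO(archive)))
```

Because only the header is parsed and the content is skipped by length, a tool could list file names without decoding any content.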
>
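The time format quoted above can be sketched as follows. `parse_time` is a hypothetical helper; it only recognises numeric offsets plus the mnemonics GMT and UTC (an assumption, since the post does not list which mnemonics are allowed) and returns a naive datetime when TZ is omitted so the caller can treat it as local time.

```python
from datetime import datetime, timezone

def parse_time(text):
    """Parse 'YYYY-MM-DD hh:mm:ss [TZ]' as described in the post."""
    parts = text.split()
    stamp = datetime.strptime(" ".join(parts[:2]), "%Y-%m-%d %H:%M:%S")
    if len(parts) < 3:
        return stamp                                  # no TZ: local time (naive)
    if parts[2] in ("GMT", "UTC"):
        return stamp.replace(tzinfo=timezone.utc)
    # +hhmm / -hhmm offsets are handled by strptime's %z directive.
    return datetime.strptime(" ".join(parts[:3]), "%Y-%m-%d %H:%M:%S %z")

a = parse_time("2005-03-31 02:11:20 +1000")
b = parse_time("2005-03-31 01:11:20 +0900")
same_instant = (a == b)
```

The two example times from the post turn out to name the same instant, which is presumably why they were chosen.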
| |
| cr88192 2005-03-31, 3:56 am |
|
"Claudio Grondi" <claudio.grondi@freenet.de> wrote in message
news:3b0ggpF69bvisU1@individual.net...
> How does it compare to the format used by 7zip?
>
well, first off, it is completely different...
7zip uses a binary tv style format apparently consisting of byte prefixes
followed by prefix specific data, and encodes larger numbers with a vli
scheme.
imo, it is almost an exact opposite:
mine mostly text, 7z binary;
mine open tag/value structure, 7z has a fairly fixed structure;
mine should be easy to extend independently, 7z will likely require
centralized activity;
mine has minimal concern for format overhead, 7z has massive concern for
overhead (eg: 7z uses individual bytes and bitpacking often, whereas mine
represents numbers in plaintext...);
....
my main reasoning is primarily that the headers are likely to be smaller
than the files anyways, so a little bloat is probably no big deal.
the file is likely to be read/written in binary mode anyways, so things like
seeking are expected (including, eg, possibly space padding numbers so it is
possible to seek back to them and fill them in later or such).
otherwise, I might not want writer seeking, so chunking may make sense
instead. I guess it would depend on the writer.
may as well keep the footer (defined as another header).
dunno, could put cumulative compressed crc's there or something.
Example of which would be, eg, allowing:
Content-Type: chunked
12
Hello All,
13
The Next Part
Content-Adler32: *
or whatever...
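One way the chunked idea with a checksum footer might look, as a sketch only. The post does not say how the chunk list ends or how sizes are counted, so this assumes decimal byte counts, a newline after each chunk, a zero-length chunk as terminator (as in HTTP), and a footer holding the Adler-32 of the unchunked data. `write_chunked` and `read_chunked` are invented names.

```python
import io
import zlib

def write_chunked(out, chunks):
    """Write chunks as '<decimal size>\\n<data>\\n', then a checksum footer."""
    checksum = 1                                  # Adler-32 initial value
    for chunk in chunks:
        if not chunk:
            continue                              # size 0 is reserved as terminator
        out.write(b"%d\n" % len(chunk))
        out.write(chunk + b"\n")
        checksum = zlib.adler32(chunk, checksum)  # running checksum, no buffering
    out.write(b"0\n")
    out.write(b"Content-Adler32: 0x%08x\n\n" % checksum)

def read_chunked(stream):
    """Reassemble the chunks and verify the footer checksum."""
    data = bytearray()
    while True:
        size = int(stream.readline().decode("ascii"))
        if size == 0:
            break
        data += stream.read(size)
        stream.readline()                         # consume newline after the chunk
    key, _, value = stream.readline().rstrip(b"\r\n").partition(b": ")
    if key != b"Content-Adler32":
        raise ValueError("missing checksum footer")
    if int(value.decode("ascii"), 0) != zlib.adler32(bytes(data)):
        raise ValueError("checksum mismatch")
    return bytes(data)

buf = io.BytesIO()
write_chunked(buf, [b"Hello All,", b"The Next Part"])
buf.seek(0)
payload = read_chunked(buf)
```

The writer only ever appends, so it never needs to go back and fill in a total length.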
| |
| Vicente Werner 2005-03-31, 3:55 pm |
| "cr88192" <cr88192@NOSPAM.hotmail.com> wrote in
news:gUH2e.218$HA6.71@fe07.usenetserver.com:
> my main reasoning is primarily that the headers are likely to be
> smaller than the files anyways, so a little bloat is probably no big
> deal.
Actually I disagree with your reasoning, as you're adding additional bloat
to the code to manage the archive. It is much easier to work with a binary
file with a fixed structure or a dynamic one than with a text one, since
you need to parse the headers back into a format usable from your program.
If you're looking for a chunkable format, take a look at the old amiga IFF
format or the PNG one as examples.
See ya
| |
| cr88192 2005-04-01, 3:55 am |
|
"Vicente Werner" <Nothin@nothing.com> wrote in message
news:Xns962ACE7A12A11notasinglethingofmy
i@216.196.109.144...
> "cr88192" <cr88192@NOSPAM.hotmail.com> wrote in
> news:gUH2e.218$HA6.71@fe07.usenetserver.com:
>
>
> Actually I disagree with your reasoning, as you're adding aditional bloat
> to the code to manage the archive, is much easier to work with a binary
> file with a fixed structure or a dynamic one, than with a text one, since
> you need to parse the headers back into a format useable from your
> program.
>
yes, I know, it is more awkward, but it should be far more extensible
without risking breaking existing tools...
I am now considering adding simplistic compression to reduce the header
bloat, at the cost of a lot more code bloat.
anyways, I was never saying text was a "convenient" way of doing the
archives, only that the structure should be tolerable, and is in most ways
almost exactly the opposite of the 7z format...
a more persuasive argument would have been that the headers were
unreasonably large, which I might have dealt better with, this is partly why
I am considering the compression (at the cost of losing the format being
mostly textual...).
this should not significantly hurt extensibility.
values <128: passed through clean.
values >=128: interpreted as lz values
  128 (run 0): interpreted as an escape, 1 byte escaped value
  129 (run 1): reserved, possibly length-prefixed escape
  130 (run 2): reserved
  131..255: sane run values
    next 2 bytes are offset
(note: format most likely to do poorly on binary data given mass
escaping...).
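A minimal linear decoder for the byte scheme listed above, as a sketch. Several details are not given in the post and are assumed here: the run length is the byte value minus 128 (so 131 means a run of 3), the 2-byte offset is little-endian, and it counts backwards from the end of the decoded output. The ring-buffer variant is not attempted.

```python
def lz_decode(data):
    """Decode the escape/run scheme into plain bytes (linear, not ring-based)."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        i += 1
        if b < 128:
            out.append(b)                      # literal, passed through clean
        elif b == 128:
            out.append(data[i])                # escape: next byte is a raw value
            i += 1
        elif b <= 130:
            raise ValueError("reserved code %d" % b)
        else:
            run = b - 128                      # assumed: 131..255 -> runs of 3..127
            offset = data[i] | (data[i + 1] << 8)   # assumed little-endian
            i += 2
            if not 0 < offset <= len(out):
                raise ValueError("offset outside decoded data")
            for _ in range(run):
                out.append(out[-offset])       # byte-wise copy, overlap allowed
    return bytes(out)

decoded = lz_decode(b"abc" + bytes([134, 3, 0]))
```

Telling a literal from a run is a single test of the top bit, which is the "single mask" property the post relies on; every byte of 128 or above in the input costs two bytes, which is why binary data would do poorly.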
I have put a little bit of thought into how to try for a fast stream-centric
decoder. buffer would be circular with 2 pointers (current read position and
current end of decoded data).
wrapping is a difficult issue performance-wise.
encoder, should be able to get a reasonably fast one.
for a 64k dictionary, I am imagining needing approx 320kB of memory (256k of
which would be related to hashing). I put a little thought into figuring how
to do wrap-around with the hash data and efficiently handling the wraparound
issues.
slower than doing nothing, but might be acceptable.
note: I am now thinking this approach may be relevant to xml, at the cost of
being slower than a true binary xml. given I am considering an algo which
does not use entropy coding, encoding/decoding should be faster than with one
that does (and faster than my other decoders, since it will be possible to
detect runs with a single mask and the run structure is fixed).
a plain linear decoder would likely be faster than the ring-based one I am
imagining, but a ring-based one reduces memory-related concerns, and should
be better to handle the mixing of encoded and non-encoded data...
not like I need that much speed in accessing the headers anyways though...
> If you're looking for a chunkable format, take a look at the old amiga IFF
> format or the PNG one as examples.
>
I looked at those, but decided against them on the grounds that they make
extensibility more complicated (yes, IFF and PNG have generally extensible
formats, but imo, there are likely a lot more issues than there would be,
eg, with something closer to http...).
or such...
| |
| Niels Fröhling 2005-04-01, 3:55 pm |
| Hy;
You should take a look at a DICOM implementation
if you want to know how to handle proprietary
Archive-information fitting market needs.
DICOM's amount of Archive-information is horribly
large. Maybe you can detect the heaviest flaws and
find better ways.
Ciao
Niels
| |
| Vicente Werner 2005-04-01, 3:55 pm |
| "cr88192" <cr88192@NOSPAM.hotmail.com> wrote in
news:Qm13e.15$A71.1@fe07.usenetserver.com:
> yes, I know, it is more awkward, but it should be far more extensible
> without risking breaking existing tools...
A proper binary format also does the same, take a look at IFF for
example, or PNG.
> anyways, I was never saying text was a "convinient" way of doing the
> archives, only that the structure should be tolerable, and is in most
> ways almost exactly the opposite of the 7z format...
I don't think it's going to be tolerable, at the end the chunk of code
needed to deal with just the headers will be HUGE regarding the size of
it.
> a more persuasive argument would have been that the headers were
> unreasonably large, which I might have dealt better with, this is
> partly why I am considering the compression (at the cost of losing the
> format being mostly textual...).
The overhead I'm worried about is not that one, it's the one at the code.
> I looked at those, but decided against them on the grounds that they
> make extensibility more complicated (yes, IFF and PNG have generally
> extensible formats, but imo, there are likely a lot more issues than
> there would be, eg, with something closer to http...).
Of course there's always a limit on how much you can expand a format or
how much you can do with it, no matter how you design it. In the end
there's always a point where something needs to be changed, breaking
compatibility to do it.
For example I do not think your system will be realistic to deal with
delete operations on very large archives, with 100000+ of compressed
items, or adding error recovery records intra item will likely impose
heavy overheads.
| |
| cr88192 2005-04-01, 8:55 pm |
|
"Vicente Werner" <Nothin@nothing.com> wrote in message
news:Xns962BBDB2AC15Enotasinglethingofmy
i@216.196.109.144...
> "cr88192" <cr88192@NOSPAM.hotmail.com> wrote in
> news:Qm13e.15$A71.1@fe07.usenetserver.com:
>
>
> A properly binary format does also the same, take a look at IFF for
> example, or PNG.
>
I know both formats.
they are extensible, but one has to worry a little more about behavior by
tools upon encounter of unknown chunks (png specifies this a little more
than iff does), one also has to worry more about fourcc clash, whereas with
plaintext one can generate much longer names.
> I don't think it's going to be tolerable, at the end the chunk of code
> needed to deal with just the headers will be HUGE regarding the size of
> it.
>
yes, I know...
> The overhead I'm worried about is not that one, it's the one at the code.
>
ok.
I wrote a basic parser/dumper already, and it would not be too hard to
modify it into a decompressor.
at present, vars are parsed and stuffed into locals.
mostly I am thinking of having a struct which would hold all the known
parsed vars, and dispatching using the struct (header-type and whatever)
to perform the decode.
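The struct-plus-dispatch idea could be sketched like this. The field names come from the spec quoted earlier in the thread; the class `Header`, the handler functions and the `HANDLERS` table are invented for illustration, names are assumed to contain no commas, and the content is assumed to be already decoded.

```python
from dataclasses import dataclass, field

@dataclass
class Header:
    header_type: str = ""
    names: list = field(default_factory=list)
    sizes: list = field(default_factory=list)
    content_length: int = 0
    extra: dict = field(default_factory=dict)   # unknown fields are kept, not rejected

def build_header(fields):
    """Stuff the known parsed vars into a struct; park the rest in `extra`."""
    hdr = Header()
    for key, value in fields.items():
        if key == "Header-Type":
            hdr.header_type = value
        elif key == "File-Name":
            hdr.names = [n.strip() for n in value.split(",")]
        elif key == "File-Size":
            hdr.sizes = [int(s, 0) for s in value.split(",")]
        elif key == "Content-Length":
            hdr.content_length = int(value, 0)
        else:
            hdr.extra[key] = value
    return hdr

def handle_file(hdr, content):
    return [(hdr.names[0], content)]

def handle_group(hdr, content):
    # Files are packed end to end, so split the content by the listed sizes.
    files, pos = [], 0
    for name, size in zip(hdr.names, hdr.sizes):
        files.append((name, content[pos:pos + size]))
        pos += size
    return files

def handle_directory(hdr, content):
    return [(hdr.names[0] + "/", b"")]

HANDLERS = {"File": handle_file, "FileGroup": handle_group,
            "Directory": handle_directory}

def dispatch(fields, content):
    hdr = build_header(fields)
    handler = HANDLERS.get(hdr.header_type)
    return handler(hdr, content) if handler else []   # unknown types are ignored

group = {"Header-Type": "FileGroup", "File-Name": "a.txt, b.txt",
         "File-Size": "3, 0x02", "Content-Length": "5",
         "File-libfoo-bar": "x"}
result = dispatch(group, b"abcde")
```

A single file is then just the one-element case of the comma-separated lists, and adding a new header type means adding one entry to the table.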
>
> Of course there're always a limit on how much you can expand a format or
> how much you can do with it, no matter how you design it, at the end
> there's allways a point where something needs to be changed and break
> compatibility to do it.
>
yes, I know as well, just afaik, with an IFF or PNG style format, this
threshold is likely to be a little lower.
of course, one could use plaintext for the file-info, but then again, same
problem.
I started designing a format like this already, and had realized that
compound entries would be a significant design issue with such a format, but
not so big of a deal with text.
> For example I do not think your system will be realistic to deal with
> delete operations on very large archives, with 100000+ of compressed
> items, or adding error recovery records intra item will likely impose
> heavy overheads.
>
nope, it probably won't...
I aim low up front, but I still hope for a flexible format (eg: one that can
possibly be easily customized for "experimental" uses or whatever), or
allowing patching in 3rd party tools (eg: want bzip2 support, just add an
entry in the config file, and hope the person decompressing did similar).
as a result, likely I am going to be calling external tools for compression
and decompression.
most generic archive use, though, consists of just archiving a directory
or unpacking files into a directory.
it is likely to beat out tar though, as it will be possible to read file
lists without decompressing the whole file.
it will also be a lot more extensible than either tar or zip.
or such...
| |
| Niels Fröhling 2005-04-06, 12:40 pm |
| Hy;
> err, somehow I get the idea this is not a file archiver...
The general solution you try to reach with your open text
attributes is not (only) specific to an archiver.
In your concept you tag files, and DICOM graphic files
may give you a good idea about that.
Ciao
Niels
| |
| cr88192 2005-04-06, 12:40 pm |
|
"Niels Fröhling" <niels.froehling@seies.de> wrote in message
news:d2mnbo$alt$1@domitilla.aioe.org...
> Hy;
>
>
> The general solution you try reach with your open text
> attributes is not (only) specific to an archiver.
> In your concept you tag files, and DICOM graphic files
> may give you a good idea about that.
>
oh, ok then.
| |
| Vicente Werner 2005-04-06, 12:40 pm |
| "cr88192" <cr88192@NOSPAM.hotmail.com> wrote in
news:Sgl3e.3062$A71.493@fe07.usenetserver.com:
> they are extensible, but one has to worry a little more about behavior
> by tools upon encounter of unknown chunks (png specifies this a little
> more than iff does), one also has to worry more about fourcc clash,
> wheras with plaintext one can generate much longer names.
The first argument is not a fault of the format, it's a failure of the
applications dealing with them. As for the second, if you use 4 bytes,
you have 2^32 possibilities... hard to believe you'll get into clashes.
> yes, I know as well, just afaik, with an IFF or PNG style format, this
> threashold is likely to be a little lower.
You still haven't shown a point backing that argument.
> I started designing a format like this allready, and had realized that
> compound entries would be a signifigant design issue with such a
> format, but not so big of a deal with text.
Why?
> I aim low up front, but I still hope for a flexible format (eg: one
> that can possibly be easily customized for "experimental" uses or
> whatever), or allowing patching in 3rd party tools (eg: want bzip2
> support, just add an entry in the config file, and hope the person
> decompressing did similar).
Adding human intervention will only make your system less usable
> it is likely to beat out tar though, as it will be possible to read
> file lists without decompressing the whole file.
Please don't compare apples with oranges, they're different ! Tar was
designed long time ago as just an archival format, they didn't even think
of file by file compression, so it's not a valid reference, nor benchmark.
A valid reference will be any of the current fileformats without
compression.
| |
| cr88192 2005-04-06, 12:40 pm |
|
"Vicente Werner" <Nothin@nothing.com> wrote in message
news:Xns962EA17A16D94notasinglethingofmy
i@216.196.109.144...
> "cr88192" <cr88192@NOSPAM.hotmail.com> wrote in
> news:Sgl3e.3062$A71.493@fe07.usenetserver.com:
> The first argument is not a fault of the format, it's a failure of the
> applications dealing with them, as for the second, if you do use 4 bytes,
> you've 2^32 posibilities... hard to belive you'll get into clashes.
>
yes, but remember, fourcc's are generated by humans, and humans tend to like
mnemonics, especially those most closely resembling the intent of the tag.
so, if you have 2 people independently adding a certain feature, there is
likely to be a clash...
this is less so with strings only because you can motivate people to
generate longer ones and insert some "uniqueness" (eg: an implementation or
os name) into the tag.
guids are a typical solution to the clash problem, but I would rather have a
string than a guid (even though the guid is far likelier to be unique...).
imo, strings are far closer to self documenting than either fourccs or
guids.
> You still haven't show a point backing that argument.
>
I don't have a clear point, more it is just my "experience" telling me this.
I have seen what typically happens to both older textual and binary formats.
textual formats tend more to just "mutate". all their extra tokens may allow
some backwards compatibility, but the format typically remains at a stable
level of complexity and just drifts ever further from the original (and may
break compatibility in minor ways over time).
binary formats tend to just be kludged over, and increase in complexity over
time, with often much older binary formats being quite ugly and scary
looking.
in the same way as most other binary formats, iff and png based formats tend
to get kludged over, though less so than many others.
eg, wave and avi are still tolerable, as opposed to turning into something
more like zip...
likewise, the more verbose a text format is, the more it seems to drift
rather than just being kludged. every so often someone changes some tokens
and makes the data a little different, and the mutation continues.
this is by no means a rule though.
another important difference:
iff and png tend to favor more tree-like data layouts, whereas an http-like
format is a lot closer to being flat.
tree vs. flat list is another issue I have seen come up repeatedly in file
formats, with flat-list formats often being more "general" than tree-based
ones. of course, more importantly is that typically, because of data
semantics, they are not interchangeable.
in particular, the "globs of key/value pairs with possibly attached data"
design I have seen come up fairly often (http and smtp are just a few
examples, another biggie is the quake-style 'map' format, which has been
used in a wide variety of games in different variations...).
> Why?
>
well, consider if each file needs an info header of some sort.
a single file, a single data and header chunk, no big deal.
compound file, still a single data chunk, but likely multiple headers.
one can separate these, but it is a little uglier.
in text, one can use a different tag/"header type", and just define the name
and size fields to be comma separated lists. binary formats tend to lack a
similar notion. one typically lacks convenient concepts like "comma" and
"linebreak" to ease processing, instead one is just left with atomic and
list values, and the fact that the difference tends to be important (whereas
a comma separated list can be treated more as a special case of a single
item).
> Adding human intervention will only make your system less usable
>
maybe.
it depends on how one defines "usable".
> Please don't compare apples with oranges, they're different ! Tar was
> designed long time ago as just an archival format, they didn't even think
> of file by file compression, so it's not a valid reference, nor benchmark.
> A valid reference will be any of the current fileformats without
> compression.
dunno, imo, all general archive formats are fair game as they are within a
similar domain (pack files together and add compression of some sort).
tar just lacks internal compression is all, which is the default for most
other formats.
conceivably, a version of tar could be made with internal compression, but I
don't know if anyone has. in any case there is still the issue that tar
tends to have a lot of empty space in its headers.
but it doesn't matter really imo.
I have already lost interest some, I wrote a util but now I am off working
on a rather different line of projects (eg: a geometry and physics api).
| |
| Vicente Werner 2005-04-06, 12:40 pm |
| "cr88192" <cr88192@NOSPAM.hotmail.com> wrote in
news:zJc4e.50$Kr2.15@fe07.usenetserver.com:
> yes, but remember, fourcc's are generated by humans, and humans tend
> to like nmonics, especially those most closely resembling the intent
> of the tag.
Ok, make room for 128 char strings, and you'll get the same risk of
collision as with your text approach
> I don't have a clear point, more it is just my "experience" telling me
> this.
Mine says you'll be likely to encounter the same issues, many text formats
change radically too.
> in the same way as most other binary formats, iff and png based
> formats tend to get kludged over, though less so than many others.
> eg, wave and avi are still tolerable, as opposed to turning into
> something more like zip...
Why do you insist on comparing apples with potatoes? Wave is a sound format,
designed very specifically to hold certain data, avi is an audio/video
container, not an archival format! Please, if you want to show your point,
try to compare the same kind of formats (rar vs zip vs tar... for example)
> likewise, the more verbose a text format is, the more it seems to
> drift rather than just being kludged. every so often someone changes
> some tokens and makes the data a little different, and the mutation
> continues.
And the complexity increases...
> another important difference:
> iff and png tend to favor more tree-like data layouts, wheras an
> http-like format is a lot closer to being flat.
Of course, because they're formats for storing 1 file with additional
data, they're good as models to design chunk based flat structures.
> in particular, the "globs of key/value pairs with possibly attached
> data" design I have seen come up fairly often (http and smtp are just
> a few examples, another biggie is the quake-style 'map' format, which
> has been used in a wide variety of games in different variations...).
WAD ones?
> compound file, still a single data chunk, but likely multiple headers.
You're sure?
> one can seperate these, but it is a little uglier.
> in text, one can use a different tag/"header type", and just define
> the name and size fields to be comma seperated lists. binary formats
> tend to lack a similar notion. one typically lacks convinient concepts
> like "comma" and "linebreak" to ease processing, instead one is just
> left with atomic and list values, and the fact that the difference
> tends to be important (wheras a comma seperated list can be treated
> more as a special case of a single item).
>
?????????
So, if you're using a binary file, you can't have those chars and treat
them the way you like on a given chunk of data?
> maybe.
> it depends on how one defines "usable".
This is beginning to degenerate...
> dunno, imo, all general archive formats are fair game as they are
> within a similar domain (pack files together and add compression of
> some sort). tar just lacks internal compression is all, which is the
> default for most other formats.
So it's ok to compare a ceramic brick with a plastic one, after all
they're all bricks ....
> conceivably, a version of tar could be made with internal compression,
> but I don't know if anyone has. in any case there is still the issue
> that tar tends to have a lot of empty space in its headers.
Of course you're comparing a 20+ year old format
| |
| cr88192 2005-04-06, 12:40 pm |
|
"Vicente Werner" <Nothin@nothing.com> wrote in message
news:Xns962EBDE958ED4notasinglethingofmy
i@216.196.109.144...
> "cr88192" <cr88192@NOSPAM.hotmail.com> wrote in
> news:zJc4e.50$Kr2.15@fe07.usenetserver.com:
>
> Ok, make room for 128 char strings, and you'll get the same risk of
> collision than with your text approach
>
ok.
just fixed 128 char strings are likely to risk wasting a lot more space.
> Mine says you'll be likely to encouter the same issues, many text formats
> change radically too.
>
they change, but they don't end up becoming quite so complex usually...
> Why you insist on comparing apples with potatoes? Wave is a sound format,
> designed very specifically to hold certain data, avi is an audio vid
> container, not an archival format! please if you wanna show your point,
> try to compare same kind of formats, (rar vs zip vs tar... for example)
err, all file formats tend to be designed under at least vaguely similar
principles and tend to exhibit similar behaviors and mutation paths.
> And the complexity increases...
> Of course, because they're formats for storing 1 file with additional
> data, they're good as models to design chunk based flat structrures.
>
yes, but traditionally these formats are used for tree structures.
yes, there are avi's with multiple toplevels, but most are more tree-like.
> WAD ones?
>
err, not wad.
wad was used for maps primarily in doom and friends.
by quake, wad had the format changed some (16 char names, internal
compression, type field, tag became 'WAD2', vs 'IWAD' and 'PWAD', ...) but
was relegated primarily to storing textures and tile graphics (this was used
in quake-1 and half-life, generally dying out after that, like the pack
format, dying out in favor of using zip files for game resources in
quake-3).
the map format, however, is textual.
it consists of lots of entities, each with the format as below:
{
"classname" "light"
"origin" "-2112 -1904 -32"
"light" "1024"
"_color" "1 1 1"
"spawnflags" "0"
}
{
"classname" "light"
"origin" "-2112 2160 0"
"light" "1024"
"_color" "1 1 1"
}
{
"classname" "monster_spidbot0"
"origin" "672 0 -96"
"angle" "0"
}
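Reading the simple entities shown above takes very little code, which is part of the appeal of this kind of format. The sketch below handles only flat blocks of quoted key/value pairs; `parse_entities` is a made-up name, and nested blocks (inline brush data) are deliberately rejected rather than parsed.

```python
import re

# A token is either a quoted string or a single brace.
TOKEN = re.compile(r'"([^"]*)"|([{}])')

def parse_entities(text):
    """Parse brace-delimited blocks of "key" "value" pairs into dicts."""
    entities = []
    current = None        # entity being filled, or None between entities
    pending_key = None    # key still waiting for its value
    for match in TOKEN.finditer(text):
        string, brace = match.groups()
        if brace == "{":
            if current is not None:
                raise ValueError("nested block (inline brush data) not handled")
            current = {}
        elif brace == "}":
            if current is None or pending_key is not None:
                raise ValueError("unbalanced entity")
            entities.append(current)
            current = None
        elif current is None:
            raise ValueError("string outside of an entity")
        elif pending_key is None:
            pending_key = string
        else:
            current[pending_key] = string
            pending_key = None
    if current is not None:
        raise ValueError("unterminated entity")
    return entities

SAMPLE = '''
{
"classname" "light"
"origin" "-2112 2160 0"
"light" "1024"
"_color" "1 1 1"
}
{
"classname" "monster_spidbot0"
"origin" "672 0 -96"
"angle" "0"
}
'''
entities = parse_entities(SAMPLE)
origin = tuple(int(v) for v in entities[1]["origin"].split())
```

Every value stays a string until some consumer decides what it means, so a tool that does not know a key can simply carry it along.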
geometries are stored in entities with attached models.
often the models are either referenced via a key, eg, "model".
sometimes they are inlined, eg, in what are called "brushmodels".
{
"classname" "func_door"
"angle" "-1"
{
( 128 128 -96 ) ( 256 128 -96 ) ( 128 256 -96 ) cs/oldmetal [ 1.00000 0.00000 0.00000 -128.00000 ] [ 0.00000 -1.00000 0.00000 128.00000 ] 0 1.00000 1.00000 0 0 0
( 128 128 0 ) ( 128 256 0 ) ( 256 128 0 ) cs/oldmetal [ 1.00000 0.00000 0.00000 -128.00000 ] [ 0.00000 -1.00000 0.00000 128.00000 ] 0 1.00000 1.00000 0 0 0
( 128 64 -32 ) ( 128 64 96 ) ( 256 64 -32 ) cs/oldmetal [ 1.00000 0.00000 0.00000 -128.00000 ] [ 0.00000 0.00000 -1.00000 -32.00000 ] 0 1.00000 1.00000 0 0 0
( 128 192 -32 ) ( 256 192 -32 ) ( 128 192 96 ) cs/oldmetal [ 1.00000 0.00000 0.00000 -128.00000 ] [ 0.00000 0.00000 -1.00000 -32.00000 ] 0 1.00000 1.00000 0 0 0
( 64 128 -32 ) ( 64 256 -32 ) ( 64 128 96 ) cs/oldmetal [ 0.00000 1.00000 0.00000 -128.00000 ] [ 0.00000 0.00000 -1.00000 -32.00000 ] 0 1.00000 1.00000 0 0 0
( 192 128 -32 ) ( 192 128 96 ) ( 192 256 -32 ) cs/oldmetal [ 0.00000 1.00000 0.00000 -128.00000 ] [ 0.00000 0.00000 -1.00000 -32.00000 ] 0 1.00000 1.00000 0 0 0
}
}
note, this is not the quake1 variety, but a variation of the valve variety
(used in half-life).
these inlined brushmodels are the main thing that has mutated between games,
changing notably wrt syntax.
note: the sets of 3 points are not actually vertices but instead are used
for defining planes, and the resultant figure is a result of the
intersection of all these planes (and a union with all similar constructs in
the same entity).
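To make the "three points define a plane" remark concrete, here is a small sketch that turns each point triple of the brush quoted above into a plane and tests whether a point lies inside the intersection. The orientation rule (cross product of p0-p1 and p2-p1) is an assumption; it happens to give outward-facing normals for this particular brush. All function names are invented.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def plane_from_points(p0, p1, p2):
    """Return (unit normal, distance) for the plane through three points."""
    normal = cross(sub(p0, p1), sub(p2, p1))   # assumed orientation convention
    length = math.sqrt(dot(normal, normal))
    if length == 0:
        raise ValueError("points are collinear")
    unit = tuple(c / length for c in normal)
    return unit, dot(unit, p1)

def inside(point, planes, eps=1e-6):
    """A point is in the brush if it is behind (or on) every plane."""
    return all(dot(n, point) - d <= eps for n, d in planes)

# The six point triples of the func_door brush from the post.
FACES = [
    ((128, 128, -96), (256, 128, -96), (128, 256, -96)),
    ((128, 128, 0), (128, 256, 0), (256, 128, 0)),
    ((128, 64, -32), (128, 64, 96), (256, 64, -32)),
    ((128, 192, -32), (256, 192, -32), (128, 192, 96)),
    ((64, 128, -32), (64, 256, -32), (64, 128, 96)),
    ((192, 128, -32), (192, 128, 96), (192, 256, -32)),
]
planes = [plane_from_points(*face) for face in FACES]
```

Under that assumption the six planes bound the box x in [64, 192], y in [64, 192], z in [-96, 0], even though none of the listed points is a corner of it.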
one is left reading tokens from the file, and keeping note of what variety
one has encountered.
another text based model format used in half-life is smd, but this is not so
much chunk based. instead, it works more by "changing modes" using various
special strings, and parsing whatever follows them until another important
string pops up.
these formats are often not used directly by the game, but are more used by
tools and converted into the game's native format (typically a
single-purpose binary format, often with such names as "mdl" and "bsp"...).
at one point in a modified version of quake1 I had ended up using the wad
format as the basis of the bsp's.
> You're sure?
>
well, that is a straightforward way to do it, and seems to mix better with
other iff-based formats I have seen.
> ?????????
> So, if you're using a binary file, you can't have those chars and treat
> them the way you like on a given chunk of data?
>
well, it is traditional in binary formats to either use fixed-width fields,
or a variation of TLV structures.
one can use concepts similar to text, but those are "weird" for binary.
> This is begining to degenerate...
>
ok.
> So it's ok to compare a ceramic brick with a plastic one, after all
> they're all bricks ....
>
maybe.
a plastic one might have problems with mortar sticking to it though, and
might squish under too much weight.
>
> Of course you're comparing a 20+ year old format
ok, from a timeframe where octal was in style...
| |
| Vicente Werner 2005-04-06, 12:40 pm |
| "cr88192" <cr88192@NOSPAM.hotmail.com> wrote in
news:6Cl4e.90$Kr2.46@fe07.usenetserver.com:
> just fixed 128 char strings are likely to risk wasting a lot more
> space.
Way less than your approach, and a lot easier to deal with.
> they change, but they don't end up becomming quite so complex
> usually...
Well I've had a lot of experience with mutated text formats, at the end
the mutations make them unuseable and prone to loose data due to
incoherences on the headers.
> err, all file formats tend to be designed under at least vaguely
> similar principles and tend to exhibit similar behaviors and mutation
> paths.
Vaguely similar systems aren't comparable. In order to do a real
comparison or back your argument with a solid basis, you need to
compare not just similar, but functionally equivalent formats like ace vs
rar (both support the same feature set); in all other cases your argument
will be biased.
> yes, but traditionally these formats are used for tree structures.
> yes, there are avi's with multiple toplevels, but most are more
> tree-like.
Avi is not tree like, more like a multiple track one, no hierarchy is
available.
> wad was used for maps primarily in doom and friends.
wad alike formats have been in use since then by many games and systems,
they're just resource archival formats.
> the map format, however, is textual.
>
> it consists of lots of entities, each with the format as below:
> {
> "classname" "light"
> "origin" "-2112 -1904 -32"
> "light" "1024"
> "_color" "1 1 1"
> "spawnflags" "0"
> }
> {
> "classname" "light"
> "origin" "-2112 2160 0"
> "light" "1024"
> "_color" "1 1 1"
> }
> {
> "classname" "monster_spidbot0"
> "origin" "672 0 -96"
> "angle" "0"
> }
They're more like preprocess systems , internally they're parsed to a
binary format and If I recall correctly they're just a step in the
generation of the game files.
Any way you're drifting and drifting from your target functionality in
order to get examples, they're not relevant to this discussion. Let's leave
descriptive maps to their field of work, and get back to archival.
> well, it is traditional in binary formats to either use fixed-width
> fields, or a variation of TLV structures.
>
> one can use concepts similar to text, but those are "weird" for
> binary.
Weird? Techniques are to be used where they're useful or required, btw in
that context you're dealing with textual data, so they really fit in.
> maybe.
>
> a plastic one might have problems with mortar sticking to it though,
> and might squish under too much weight.
That's the point: they are designed for different situations and it's not
correct to treat them the same way, and that's what you've been doing
throughout this discussion.
| |
| cr88192 2005-04-06, 12:40 pm |
|
"Vicente Werner" <Nothin@nothing.com> wrote in message
news:Xns962F8FE5058E6notasinglethingofmyi@216.196.109.144...
> "cr88192" <cr88192@NOSPAM.hotmail.com> wrote in
> news:6Cl4e.90$Kr2.46@fe07.usenetserver.com:
>
> Way less than your approach, and a lot easier to deal with.
>
why do you say that exactly? text may waste space, but typically a lot less
than fixed-width strings, since one usually just needs enough space to
store the string.
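
To put rough numbers on this, here is a minimal C sketch. It assumes one header line of the form `<key>: <value>` terminated by CRLF, and a fixed-width name field of 128 bytes (the figure mentioned earlier in the thread). The function names are made up for illustration and are not part of any proposed format.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: a fixed-width record reserves 128 bytes for the name. */
enum { FIXED_NAME_FIELD = 128 };

/* Bytes used by one textual header line: "<key>: <value>" plus CRLF. */
size_t text_field_size(const char *key, const char *value)
{
    return strlen(key) + 2 /* ": " */ + strlen(value) + 2 /* CRLF */;
}

/* Bytes left unused when `value` is stored in the fixed-width field. */
size_t fixed_field_waste(const char *value)
{
    size_t n = strlen(value);
    return n < FIXED_NAME_FIELD ? FIXED_NAME_FIELD - n : 0;
}
```

For `File-Name: foobar.txt` the text line takes 23 bytes against 128 for the fixed field, but every additional header line carries its own key and line ending, so the overall comparison depends on how many fields each file has.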
>
> Well, I've had a lot of experience with mutated text formats; in the end
> the mutations make them unusable and prone to loose data due to
> inconsistencies in the headers.
>
maybe, but loose data isn't necessarily always a problem...
> Vaguely similar systems aren't comparable. In order to do a real
> comparison, or to back your argument with a solid foundation, you need to
> compare not just similar but functionally equivalent formats, like ace vs
> rar (both support the same feature set); in all other cases your argument
> will be biased.
>
maybe...
> AVI is not tree-like; it's more of a multi-track format, and no hierarchy
> is available.
>
RIFF 'AVI ' {
    LIST 'hdrl' {
        ...
    }
    LIST 'movi' {
        ...
    }
}
imo, this counts as a tree.
yes, it is not a very branchy tree, but both the RIFF and LIST chunks need
to be worried about, so it is essentially still a tree.
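
A rough sketch of why both chunk types need handling: the walker below recurses into RIFF and LIST chunks and reports the nesting depth of an in-memory buffer. It assumes the usual RIFF layout (4-byte id, 32-bit little-endian size, word-aligned bodies, and a 4-byte type at the start of RIFF/LIST bodies). `riff_depth` and `rd_u32le` are names invented for this example, not taken from any tool in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit little-endian value, as RIFF stores chunk sizes. */
static uint32_t rd_u32le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns the maximum nesting depth of the chunks in buf[0..len),
   or -1 if a chunk claims more data than is present.
   A flat run of plain chunks has depth 1. */
int riff_depth(const unsigned char *buf, size_t len)
{
    int best = 0;
    size_t pos = 0;

    while (pos + 8 <= len) {
        uint32_t size = rd_u32le(buf + pos + 4);
        const unsigned char *body = buf + pos + 8;
        int d = 1;

        if (size > len - pos - 8)
            return -1;                      /* truncated chunk */

        if (!memcmp(buf + pos, "RIFF", 4) || !memcmp(buf + pos, "LIST", 4)) {
            int sub;
            if (size < 4)
                return -1;                  /* no room for the type code */
            sub = riff_depth(body + 4, size - 4);
            if (sub < 0)
                return -1;
            d += sub;
        }
        if (d > best)
            best = d;
        pos += 8 + (size_t)size + (size & 1);   /* bodies are word-aligned */
    }
    return best;
}
```

For the AVI skeleton above (a RIFF chunk holding two LIST chunks) this gives a depth of 2, so the structure is nested, though only shallowly.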
> WAD-like formats have been in use since then by many games and systems;
> they're just resource archival formats.
>
yes.
but I thought you were talking about wad in particular, and in the context of
map storage, vs just wad-like formats (eg: pack would be classified here, as
pack is basically like wad with a much bigger name field).
afaik, wad, in the IWAD and PWAD varieties, was used primarily in
doom-related games (doom, doom2, heretic, hexen, ...).
I can't answer for games outside of this realm.
actually, something like pack would make sense as an archival format if one
added some features (eg: internal compression).
normal pack entry:
{
    char name[56];
    u32 offs; //data offset
    u32 size; //data size
}
the fairly trivial change would be adding compression:
{
    char name[48];
    byte flag; //file flags
    byte enc; //content encoding
    u16 prefix; //length of a "prefix", which could eg, contain file info
    byte resv[4]; //some padding or whatever
    u32 offs; //data offset
    u32 size; //data size
}
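
For reference, the two layouts above can be written as compilable C. This is only a sketch: it assumes `u32`, `u16` and `byte` correspond to `uint32_t`, `uint16_t` and `uint8_t`, and the type names `pack_entry` and `pack_entry_ext` are invented here. On common ABIs both come out at 64 bytes, which the static assertions check at compile time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Classic pack directory entry: 56 + 4 + 4 = 64 bytes. */
typedef struct {
    char     name[56];
    uint32_t offs;      /* data offset */
    uint32_t size;      /* data size */
} pack_entry;

/* Sketched variant: the name shrinks to 48 bytes to make room for flags. */
typedef struct {
    char     name[48];
    uint8_t  flag;      /* file flags */
    uint8_t  enc;       /* content encoding */
    uint16_t prefix;    /* length of a "prefix" that could hold file info */
    uint8_t  resv[4];   /* padding / reserved */
    uint32_t offs;      /* data offset */
    uint32_t size;      /* data size */
} pack_entry_ext;

/* Both layouts are meant to keep the 64-byte directory stride. */
_Static_assert(sizeof(pack_entry) == 64, "pack entry should be 64 bytes");
_Static_assert(sizeof(pack_entry_ext) == 64, "extended entry should be 64 bytes");
```

Keeping the same entry size means the directory stride is unchanged, although, as noted below, existing tools would still not understand the new fields.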
but then comes a thought:
the only clear reason that comes up to use pack would be that it is
conveniently supported in quake-related tools, but changing the format would
screw up support (eg: quake related tools could not decompress anything).
I can easily imagine exceeding a 56 byte name length (eg: that normally used
by pack...). I would be more comfortable, eg, with like 224 chars..
this could cost, eg, 256 bytes/file, vs like 64 bytes/file with pack...
I guess it doesn't matter much.
I could also just go the newer game route, and either not bother packing
files, or use zip...
hmm...
> They're more like preprocessing inputs: internally they're parsed into a
> binary format and, if I recall correctly, they're just a step in the
> generation of the game files.
> Anyway, you keep drifting away from your target functionality in order to
> find examples. They're not relevant to this discussion; let's leave
> descriptive maps to their own field and get back to archival.
>
dunno.
> Weird? Techniques are to be used where they're useful or required, btw in
> that context you're dealing with textual data, so they really fit in.
>
ok.
> That's the point: they are designed for different situations and it's not
> correct to treat them the same way, and that's what you've been doing
> throughout this discussion.
>
I am not good with either keeping on topic or maintaining a point.
| |
| Vicente Werner 2005-04-06, 12:40 pm |
| "cr88192" <cr88192@NOSPAM.hotmail.com> wrote in
news:nUw4e.2205$Kr2.374@fe07.usenetserver.com:
> why do you say that exactly, text may waste space, but typically a lot
> less than fixed-width strings due to the fact that one usually just
> needs enough space to store the string.
Because with your tags, and the info you provide in your headers, you're
likely to end up consuming much more space than a fixed string.
> maybe, but loose data isn't necessarily always a problem...
I'll say, after some 14 years of professional programming experience, that
lost data is almost always a BIG problem.
> imo, this counts as a tree.
> yes, it is not a very branchy tree, but both the RIFF and LIST chunks
> need to be worried about, so it is essentially still a tree.
Avi is not hierarchical, you have just a barely structured list of streams,
but not a true hierarchy.
> the only clear reason that comes up to use pack would be that it is
> conveniently supported in quake-related tools, but changing the format
> would screw up support (eg: quake related tools could not decompress
> anything).
The reason for pack/wad is that they're easily implemented and fast enough
systems for resource archival.
> I can easily imagine exceeding a 56 byte name length (eg: that
> normally used by pack...). I would be more comfortable, eg, with like
> 224 chars..
Most resources are named numerically, not verbally, hence the need for
fewer chars.
| |
| cr88192 2005-04-06, 12:40 pm |
|
"Vicente Werner" <Nothin@nothing.com> wrote in message
news:Xns962FABCCFA95Fnotasinglethingofmyi@216.196.109.144...
> "cr88192" <cr88192@NOSPAM.hotmail.com> wrote in
> news:nUw4e.2205$Kr2.374@fe07.usenetserver.com:
>
> Because with your tags, and the info you provide in your headers, you're
> likely to end up consuming much more space than a fixed string.
>
maybe, though doubtful...
> I'll say, after some 14 years of professional programming experience,
> that lost data is almost always a BIG problem.
>
I had thought you had said "loose data", as in, the format may vary some but
is sort-of compatible.
as for lost data, I say it depends on the data. some data can be lost and
regenerated later without problems, or was unnecessary to begin with, thus
in that case loss is tolerable...
>
> Avi is not hierarchical, you have just a barely structured list of
> streams, but not a true hierarchy.
>
maybe, dunno...
> The reason for pack/wad is that they're easily implemented and fast enough
> systems for resource archival.
>
yes.
> Most resources are named numerically, not verbally, hence the need for
> fewer chars.
>
pack was used generally for storing directory trees, and as a result had
longer name requirements.
however, quake tended to stick to dos naming limits and shallow hierarchy
depths, but my stuff doesn't necessarily...
| |
| Vicente Werner 2005-04-06, 12:40 pm |
| "cr88192" <cr88192@NOSPAM.hotmail.com> wrote in
news:B%y4e.2216$Kr2.1733@fe07.usenetserver.com:
> maybe, though doubtful...
Come on, your example headers are already past 60 chars! A few mutations
and you'll be over.
> I had thought you had said "loose data", as in, the format may vary
> some but is sort-of compatible.
Even that is a BIG issue: data mismatched, incorrectly interpreted or
simply lost is always something you really want to avoid, and only deal with
when it's strictly and absolutely necessary.
> as for lost data, I say it depends on the data. some data can be lost
> and regenerated later without problems, or was unnecessary to begin
> with, thus in that case loss is tolerable...
That's not tolerable; it's barely acceptable as a solution, and not something
you really want to do (overhead, extra time).
| |
| cr88192 2005-04-06, 12:40 pm |
|
"Vicente Werner" <Nothin@nothing.com> wrote in message
news:Xns96306E43D6952notasinglethingofmyi@216.196.109.144...
> "cr88192" <cr88192@NOSPAM.hotmail.com> wrote in
> news:B%y4e.2216$Kr2.1733@fe07.usenetserver.com:
>
> Come on, your example headers are already past 60 chars! A few mutations
> and you'll be over.
>
ok.
> Even that is a BIG issue: data mismatched, incorrectly interpreted or
> simply lost is always something you really want to avoid, and only deal with
> when it's strictly and absolutely necessary.
>
maybe.
imo, if the data "matters" then loss may not be acceptable, but for basic
crap, eg, the various dates get messed up, or some os-specific semantics get
dropped, then I don't think it is a big deal.
in the simplest sense, for an archive the file is just the name and the
data. so long as those are preserved, then things are at least ok (though it
is often preferable to maintain a little more).
> That's not tolerable; it's barely acceptable as a solution, and not something
> you really want to do (overhead, extra time).
imo, it is no big deal.
one example I can bring up is the case with present geometric formats, which
in the path between tools may lose damn near everything
(texturemapping/colors lost, or even the faces being merged or split, or
other such transformations). in the path, things may be changed
(retexturing, other transforms, ...).
after 3 or 4 apps the file may only vaguely resemble the original, and no
longer have any strict correspondence with the original, just this case is
common with these formats.
sanely, people would have been able to agree on things by now, but even with
raw triangle meshes there are issues (eg: because someone or another feels
like parsing by token vs. per line).
eg, the 'tri' or 'raw' format, with each line having the form:
x0 y0 z0 x1 y1 z1 x2 y2 z2
one would expect at least this format to be consistent...
at least it is almost vaguely common.
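
As an illustration of the per-line reading of that format, here is a minimal C sketch. It assumes each line holds exactly nine whitespace-separated numbers and rejects anything else; `parse_tri_line` is a hypothetical helper, not code from any of the tools mentioned. A tokenizer that ignores line breaks would instead accept triangles split across lines, which is one way two readers of the "same" format can disagree.

```c
#include <assert.h>
#include <stdio.h>

/* Parse one line "x0 y0 z0 x1 y1 z1 x2 y2 z2" into v[9].
   Returns 1 on success, 0 if the line does not hold exactly nine numbers. */
int parse_tri_line(const char *line, float v[9])
{
    char extra;   /* catches any non-whitespace left after the 9th number */
    int n = sscanf(line, "%f %f %f %f %f %f %f %f %f %c",
                   &v[0], &v[1], &v[2], &v[3], &v[4], &v[5],
                   &v[6], &v[7], &v[8], &extra);
    return n == 9;
}
```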
| |
| Vicente Werner 2005-04-06, 12:40 pm |
| "cr88192" <cr88192@NOSPAM.hotmail.com> wrote in
news:R3R4e.4735$Kr2.129@fe07.usenetserver.com:
> imo, if the data "matters" then loss may not be acceptable, but for
> basic crap, eg, the various dates get messed up, or some os-specific
> semantics get dropped, then I don't think it is a big deal.
You haven't had much experience then, because believe me, not losing or
misinterpreting data is a BIG issue.
> in the simplest sense, for an archive the file is just the name and
> the data. so long as those are preserved, then things are at least ok
> (though it is often preferable to maintain a little more).
You're being overly simplistic with that statement. Of course, in the
strictest sense that's true, but the need to maintain a certain set of
metadata about the file itself, and in a correct fashion, is very
important: how can an archiver know whether the file is older or not, or
even whether the unpacking was done correctly?
>
> imo, it is no big deal.
OK, you have a couple of archives with the same archived files on two
different media, with the same timestamp (the archived files don't have
timestamps, because the tool didn't implement them), and they differ
in content. How do you tell which is the most current one?
Another example: there are a couple of bad sectors inside your file, so
you have 1024 bytes full of zeroes in a data chunk of a file. How do you
detect it without the original?
You can say "imo is no big deal" but I think you're absolutely flat
wrong.
> one example I can bring up is the case with present geometric formats,
> which in the path between tools may lose damn near everything
> (texturemapping/colors lost, or even the faces being merged or split,
> or other such transformations). in the path, things may be changed
> (retexturing, other transforms, ...).
Please don't drift away; you're talking about a very particular set of
problems and constraints... that certainly doesn't back up your
claim.
| |
| cr88192 2005-04-07, 3:55 am |
|
"Vicente Werner" <Nothin@nothing.com> wrote in message
news:Xns96309BD195557notasinglethingofmyi@216.196.109.144...
> "cr88192" <cr88192@NOSPAM.hotmail.com> wrote in
> news:R3R4e.4735$Kr2.129@fe07.usenetserver.com:
>
> You haven't had much experience then, because believe me, not losing or
> misinterpreting data is a BIG issue.
>
imo, this comment is a personal attack and not really related to the
argument either...
I know what data I care about in files...
I can tolerate some loss, I am not expecting a world of perfect data
consistency. a bigger deal imo is maintaining at least some compatibility,
where the number of incompatible formats far exceeds the number of vaguely
similar but not quite compatible formats (eg: those that can be processed
with a little glossing).
> You're being overly simplistic with that statement. Of course, in the
> strictest sense that's true, but the need to maintain a certain set of
> metadata about the file itself, and in a correct fashion, is very
> important: how can an archiver know whether the file is older or not, or
> even whether the unpacking was done correctly?
>
as for file is older: you can't, unless one defines older as !=, in which
case one can start decoding a file and compare it with the one on-disk,
noting any differences.
some of my tools (for generating delta archives and such) work this way.
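
The tools themselves are not shown in the thread, so the following is only a guess at the general idea: decode a member into memory, then compare it byte for byte with the file on disk and treat any difference as "changed". `file_matches` is a name invented for this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if the file at `path` holds exactly the `len` bytes in `data`,
   0 if it differs in content or length, and -1 if it cannot be opened. */
int file_matches(const char *path, const unsigned char *data, size_t len)
{
    unsigned char buf[4096];
    size_t off = 0, n;
    int same = 1;
    FILE *f = fopen(path, "rb");

    if (!f)
        return -1;
    while (same && (n = fread(buf, 1, sizeof buf, f)) > 0) {
        /* the on-disk file is longer, or this block differs */
        if (n > len - off || memcmp(buf, data + off, n) != 0)
            same = 0;
        else
            off += n;
    }
    if (same && off != len)
        same = 0;               /* the on-disk file is shorter */
    fclose(f);
    return same;
}
```

This says only that the two copies differ, not which one is newer, which is the limitation being argued about here.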
> OK, you have a couple of archives with the same archived files on two
> different media, with the same timestamp (the archived files don't have
> timestamps, because the tool didn't implement them), and they differ
> in content. How do you tell which is the most current one?
>
you don't, except maybe by manual checking...
> Another example: there are a couple of bad sectors inside your file, so
> you have 1024 bytes full of zeroes in a data chunk of a file. How do you
> detect it without the original?
>
likewise, you don't, however, I like keeping crc's in archives, even if they
are not technically necessary most of the time.
one could just assume that no data will be corrupted. this is typically also
an acceptable solution for many (typically non-archive) formats.
this is also sane on modern disks, given the rarity of bad sectors...
likewise, network packets are crc checked, so damage going over the network
can also be assumed not to occur.
the user could also assume, eg, that if the archive itself is damaged, that
the contents can no longer be trusted.
> You can say "imo is no big deal" but I think you're absolutely flat
> wrong.
>
I think you are being pedantic...
all this is half-assed, but the world is not perfect, and loss of
less-needed information is imo tolerable in most cases.
> Please don't drift away; you're talking about a very particular set of
> problems and constraints... that certainly doesn't back up your
> claim.
I am just trying to show that not all information is always necessary, and
in some cases even a crude approximation is sufficient.
this whole conversation has imo gotten fairly uninteresting anyways...
| |
| Vicente Werner 2005-04-07, 8:55 am |
| "cr88192" <cr88192@NOSPAM.hotmail.com> wrote in
news:OV_4e.7033$Kr2.3772@fe07.usenetserver.com:
> imo, this comment is a personal attack and not really related to the
> argument either...
Saying you obviously lack real world experience is an attack?
> as for file is older: you can't, unless one defines older as !=, in
> which case one can start decoding a file and compare it with the one
> on-disk, noting any differences.
>
> some of my tools (for generating delta archives and such) work this
> way.
Of course you can't and that's the point of the example.
> you don't, except maybe by manual checking...
You don't because you lack a reference point.
> likewise, you don't, however, I like keeping crc's in archives, even
> if they are not technically necessary most of the time.
For archival purposes they are necessary; without them you can't guarantee
file integrity, and that's one of the points of archival.
> one could just assume that no data will be corrupted. this is
> typically also an acceptable solution for many (typically non-archive)
> formats.
That's absurd to assume for an archival format.
> this is also sane on modern disks, given the rarity of bad sectors...
All HDs have a hidden area used to relocate failing sectors. This is
done transparently to the user (only noticeable as a slowdown in disk
operations) and to the computer hardware; you don't even notice them
until this relocation area is full and they start appearing in
surface scans. Until that moment, you're likely to have several
bad sectors and not even know it.
> likewise, network packets are crc checked, so damage going over the
> network can also be assumed not to occur.
Not applicable, and in fact data corruption does occur in network
environments.
> the user could also assume, eg, that if the archive itself is damaged,
> that the contents can no longer be trusted.
Trust is out of the question, but the problem starts because you might
not even know the archive was damaged.
> I think you are being pedantic...
Don't like being told that you're wrong? Grow up!
> all this is half-assed, but the world is not perfect, and loss of
> less-needed information is imo tolerable in most cases.
That's your personal opinion and someday you'll learn in a real world
environment that that's not true.
>
> I am just trying to show that not all information is always
> necessary, and in some cases even a crude approximation is sufficient.
You can't prove anything by bringing up scenarios that aren't comparable.
| |
| cr88192 2005-04-07, 3:55 pm |
|
"Vicente Werner" <Nothin@nothing.com> wrote in message
news:Xns96315FF08E1EEnotasinglethingofmyi@216.196.109.144...
> "cr88192" <cr88192@NOSPAM.hotmail.com> wrote in
> news:OV_4e.7033$Kr2.3772@fe07.usenetserver.com:
>
> Saying you obviously lack real world experience is an attack?
>
in a sense, yes...
> Of course you can't and that's the point of the example.
>
ok.
> You don't because you lack a reference point.
>
you can't, unless you are human and manually examining the files and the
files have something in them to give away the relative timeframe.
if one looks at 2 versions of a piece of text for example, often one can
predict which is newer and which is older, eg, based on things like
stylistic/content changes, tendency of text to inflate and become more
regular, ...
> For archival purposes they are necessary; without them you can't guarantee
> file integrity, and that's one of the points of archival.
>
errm, afaik the point was more packing the files together for distribution
purposes...
why am I zipping crap otherwise? I could just leave it on my damn hard
drive. distribution seems like a much more common use.
backup is a lesser use, and is typically done with more specialized software
anyways...
an additional (but less common) use is deltas, eg, files that when unpacked
in the right order will generate the status of the content up until the most
recent archive.
>
> That's absurd to assume for an archival format.
>
for what purpose does one expect an archival format?...
clearly it can't be magnetic tapes or backups?... (ok, yeah, this one is
more of a joke).
but, yeah, I think there is some conceptual mismatch here...
> All HDs have a hidden area used to relocate failing sectors. This is
> done transparently to the user (only noticeable as a slowdown in disk
> operations) and to the computer hardware; you don't even notice them
> until this relocation area is full and they start appearing in
> surface scans. Until that moment, you're likely to have several
> bad sectors and not even know it.
>
yes, but they are still quite rare...
I remember back in the 90's having bad sectors regularly, and them getting
progressively rarer until now they are quite infrequent (if a disk starts
getting any bad sectors, it usually implies it is time to change the disk,
as often they will start popping up a lot more frequently after this point).
even this is rare however.
> Not applicable, and in fact data corruption does occur in network
> environments.
yes, but not that frequently, presumably the corruption has to get through
several levels of checksums (eg: in the ethernet packets, then in the tcp
layer, and possibly in the protocol itself).
if sufficiently rare, it is imo acceptable to assume it doesn't occur.
> Trust is out of the question, but the problem starts because you might
> not even know the archive was damaged.
>
usually disk or transmission level errors are detected.
if not, then risk exists, but imo we can't expect complete consistency
anyways, just things as they are have been doing quite a good job.
everything severely beats my days back when I was using an 8088, when the hd
was not too much more reliable than the 5.25's, and everything crashed
often. it is one's own fault if they expect things to be completely
reliable.
yet, I probably put more trust than I should in things, but my experience
has shown errors to be rare, and imo there is an upper bound to what I
expect and the steps needed to attempt recovery.
> Don't like being told that you're wrong? Grow up!
>
maybe, maybe I should get a job or move out on my own as well, but for now I
am not...
> That's your personal opinion and someday you'll learn in a real world
> environment that that's not true.
people, always expecting certainty and reliability.
how do we know the user wont die before the data becomes damaged, for
example?...
> You can't prove anything by bringing scenarios not comparable
>
yeah, and I tire of this...
| |
| Vicente Werner 2005-04-07, 3:55 pm |
| "cr88192" <cr88192@NOSPAM.hotmail.com> wrote in
news:zRb5e.9019$Kr2.3027@fe07.usenetserver.com:
> in a sense, yes...
You're way too sensitive.
> you can't, unless you are human and manually examining the files and
> the files have something in them to give away the relative timeframe.
Of course that's the point.
> errm, afaik the point was more packing the files together for
> distribution purposes...
Even more, installers and other applications that pack files for
distribution do a pretty good job on making sure you don't unpack
corrupted files.
> yes, but they are still quite rare...
Not so. If you could take a look at the hidden area, you'd see it starts
to add entries pretty fast; of course, it depends on the use you give it.
> yes, but not that frequently, presumably the corruption has to get
> through several levels of checksums (eg: in the ethernet packets, then
> in the tcp layer, and possibly in the protocol itself).
It does and in each step a resend is requested. But again that's drifting
from archival.
> usually disk or transmission level errors are detected.
> if not, then risk exists, but imo we can't expect complete consistency
> anyways, just things as they are have been doing quite a good job.
No, but it's necessary to know when something fails, and where.
> it is one's own fault if they expect things to be
> completely reliable.
That's ridiculous. An app should be reliable within reasonable limits
(you can't expect an app to deal with an EMP, but you must prepare
it to deal with dead sectors and corrupted data, or at least to fail in a
meaningful way).
> how do we know the user wont die before the data becomes damaged, for
> example?...
As a joke it's bad; it's not even funny.
| |
| cr88192 2005-04-08, 3:55 am |
|
"Vicente Werner" <Nothin@nothing.com> wrote in message
news:Xns9631BF3F55698notasinglethingofmyi@216.196.109.144...
> "cr88192" <cr88192@NOSPAM.hotmail.com> wrote in
> news:zRb5e.9019$Kr2.3027@fe07.usenetserver.com:
>
> You're way too sensitive.
>
maybe, but afaik one is not supposed to make any personal-ish comments on
usenet...
> Of course that's the point.
>
this itself may or may not be a problem depending on context.
for distribution purposes, knowing the relative ages of files is
unimportant, one typically doesn't even need to know the relative ages of
archives...
> Even more, installers and other applications that pack files for
> distribution do a pretty good job on making sure you don't unpack
> corrupted files.
>
ok.
that is a good point, albeit imo one can still get by without checksums (if
needed), albeit, yes, even my format contained dates and checksums, just, it
was textual.
I found no significant cost in parsing or such.
the headers were sufficient wrt size (albeit, I never did add any header
compression).
> Not so. If you could take a look at the hidden area, you'd see it starts
> to add entries pretty fast; of course, it depends on the use you give it.
>
ok.
but I mean bad sectors that start showing up in the filesystem, which are
typically pretty rare. usually, when they do start showing up, I guess it
implies that the disk is hitting its limit (I usually like to stop using
disks at this point).
it has been several years, however, since this has happened (annoyingly, a
few bad sectors were able to trash a notable portion of a reiserfs volume,
which seems to deal with bad sectors fairly poorly...).
> It does and in each step a resend is requested. But again that's drifting
> from archival.
>
ok.
but it also shows, transmission errors are likely rare because of this.
> No, but it's necessary to know when something fails, and where.
>
ok.
> That's ridiculous. An app should be reliable within reasonable limits
> (you can't expect an app to deal with an EMP, but you must prepare
> it to deal with dead sectors and corrupted data, or at least to fail in a
> meaningful way).
it depends on what one defines as "reasonable".
I have used some pretty buggy software in my life, and still do (eg: crap
that often when used is able to crash and reboot winxp after a short while).
I am not surprised when this happens though.
likewise, a user of some format lacking any kind of error
detection/correction could expect, eg, that if anything bad happens the
files are going to be trashed, and that it wont necessarily be obvious.
it is a risk inherent in such a format.
imo, the only reason it might be acceptable, is because most formats don't
use any such mechanisms, and work just fine (partly because disk and network
protocols work fairly hard in keeping everything going well...).
as for checksums, usually I am lazy and just use the adler-32 algo.
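
For readers unfamiliar with it, a straightforward (unoptimized) Adler-32 looks roughly like this in C. It is a sketch of the published algorithm, not the poster's own code, and the name `adler32_simple` is chosen here to avoid clashing with zlib's `adler32`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Adler-32: two running sums modulo 65521 (the largest prime below 2^16).
   `a` sums the bytes (starting from 1), `b` sums the successive values of `a`. */
uint32_t adler32_simple(const unsigned char *data, size_t len)
{
    uint32_t a = 1, b = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        a = (a + data[i]) % 65521u;
        b = (b + a) % 65521u;
    }
    return (b << 16) | a;
}
```

Storing this value in the header at pack time and recomputing it at unpack time is enough to notice a zeroed-out block like the one in the bad-sector example, although Adler-32 is a weak check on very short inputs.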
> As a joke it's bad; it's not even funny.
ok.
all this reminds me of a long standing argument with some other dude about
my lack of "design patterns"/"comprehensive unit and systemwide
testing"/"good abstraction/encapsulation"/...
I am a c head, and if my code is buggy and a horrible mess, oh well, that is
a risk of using c. I don't expect my code to be reliable, I just do what I
can...
likewise, I typically design bottom up (though for a recent effort have had
to go up a little and look down to try to avoid some problems).
people, however, can't objectively compare the bugginess of my code,
however, without using or at least looking at it...
or such...
| |
| Simon Jackson, BEng. 2005-04-08, 3:55 am |
| Vicente Werner <Nothin@nothing.com> wrote in message
news:<Xns9631BF3F55698notasinglethingofmyi@216.196.109.144>...
> "cr88192" <cr88192@NOSPAM.hotmail.com> wrote in
> news:zRb5e.9019$Kr2.3027@fe07.usenetserver.com:
>
> You're way too sensitive.
personal judgement
>
> Of course that's the point.
an auto url re-download feature would save time, keep back ups.
>
> Even more, installers and other applications that pack files for
> distribution do a pretty good job on making sure you don't unpack
> corrupted files.
>
> Not so. If you could take a look at the hidden area, you'd see it starts
> to add entries pretty fast; of course, it depends on the use you give it.
>
> It does and in each step a resend is requested. But again that's drifting
> from archival.
>
> No, but it's necessary to know when something fails, and where.
>
but not always possible in a given amount of time.
true MTBF statistics demonstrate this lack of completeness.
> That's ridiculous. An app should be reliable within reasonable limits
> (you can't expect an app to deal with an EMP, but you must prepare
> it to deal with dead sectors and corrupted data, or at least to fail in a
> meaningful way).
> As a joke it's bad; it's not even funny.
the os should use 'free' sectors to transparently provide back-up of
critical functioning.
| |
| Vicente Werner 2005-04-11, 8:55 pm |
| jackokring@yahoo.com (Simon Jackson, BEng.) wrote in
news:51f66f7e.0504071856.20f77c76@posting.google.com:
> an auto url re-download feature would save time, keep back ups.
That's not always workable; many homes still rely on dial-up modem connections.
> the os should use 'free' sectors to transparently provide back-up of
> critical functioning.
And an archival app must be able to fail correctly when its data has
become corrupted.