Re: compressing a text file
| Jasen Betts 2006-04-21, 7:55 am |
On 2006-04-20, junky_fellow@yahoo.co.in <junky_fellow@yahoo.co.in> wrote:
> Hi guys,
>
> I am new to the field of data compression. I want to write an
> algorithm to compress a text file. One way I thought of is replacing
> frequently occurring words with a smaller symbol. Say, for example,
> if "the" is repeated in the text file 1000 times, I would replace
> "the" with a new symbol "@" at all 1000 places.
> But there is a possibility that the new symbol "@" is already present
> at some places in the text file, so I may mistake it for "the". Can
> anyone suggest how to solve this problem?
replace @ with the
replace @ with ~@ and replace ~ with ~~
--
Bye.
Jasen
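
For anyone reading the archive, here is a minimal sketch of the escape idea from the post above, assuming "~" as the escape character, "@" as the symbol and "the" as the word (the function names are made up for illustration). The order matters on the way in: "~" is escaped first, then literal "@", and only then is the word replaced.

```python
ESC = "~"     # escape character
SYM = "@"     # symbol that stands for the word
WORD = "the"  # word being replaced

def escape_compress(text):
    # Escape the escape character first, then literal symbols,
    # so that only an unescaped "@" can mean the word.
    text = text.replace(ESC, ESC + ESC)
    text = text.replace(SYM, ESC + SYM)
    return text.replace(WORD, SYM)

def escape_decompress(data):
    out = []
    i = 0
    while i < len(data):
        ch = data[i]
        if ch == ESC and i + 1 < len(data):
            out.append(data[i + 1])  # escaped character, taken literally
            i += 2
        elif ch == SYM:
            out.append(WORD)         # unescaped symbol expands to the word
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Note that every literal "@" or "~" in the input grows by one character, so this only pays off when those characters are rare compared to the word.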
| cr88192 2006-04-21, 6:56 pm |
"Jasen Betts" <jasen@free.net.nz> wrote in message
news:60f5.4448a72e.7ddcc@clunker.homenet...
> On 2006-04-20, junky_fellow@yahoo.co.in <junky_fellow@yahoo.co.in> wrote:
>
> replace @ with the
>
> replace @ with ~@ and replace ~ with ~~
>
still, as it stands, imo, specific word replacement (in this form) is
largely useless in general.
much more effective (assuming a lot of text files are involved and a shared
dictionary is ok) would be a kind of external dictionary approach (ime,
often termed "vector quantization").
assuming the dictionary is good and the files are consistent, this would
likely give the largest possible payoff...
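
One rough way to try the shared-dictionary idea is zlib's preset dictionary, which Python exposes through the zdict argument. This is a swapped-in technique, not something named in the post, and the dictionary contents and function names below are assumptions for illustration only. Both sides must hold the same dictionary bytes.

```python
import zlib

# Hypothetical shared dictionary: text that is expected to show up
# often in the files. Compressor and decompressor must agree on it.
SHARED_DICT = b"the and that with this from have which compression text file "

def dict_compress(data, zdict=SHARED_DICT):
    c = zlib.compressobj(level=9, zdict=zdict)
    return c.compress(data) + c.flush()

def dict_decompress(blob, zdict=SHARED_DICT):
    d = zlib.decompressobj(zdict=zdict)
    return d.decompress(blob) + d.flush()
```

How much this helps depends entirely on how well the dictionary matches the files; it tends to matter most for many small files with similar content.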
personally, I would much prefer this over trying to match and replace
certain words...
then again, specific word replacement may be simplest (conceptually). in
this case, I would at least recommend using the upper 128 byte values
(128-255) for stored words, as then there is little or no clash with
printable characters.
> --
>
> Bye.
> Jasen