Home > Archive > Compression > July 2006 > Is it possible to use second column instead of last column in BWT?
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
Is it possible to use second column instead of last column in BWT?
|
|
| jondelac@gmail.com 2006-07-11, 9:55 pm |
| Since BWT relies on order 1 statistics anyway, can we use the second
column in the sorted BWT matrix instead of the last column and still a)
do the reverse transform and b) get almost the same results?
Jon.
| |
| igor.grobarcik@gmail.com 2006-07-12, 3:55 am |
|
jondelac@gmail.com nap=EDsal(a):
> Since BWT relies on order 1 statistics anyway, can we use the second
> column in the sorted BWT matrix instead of the last column and still a)
> do the reverse transform and b) get almost the same results?
>
> Jon.
Szip uses a limited order Sort transform. This transformation is
related to
the BWT used in blocksorting compression methods.
See page
http://www.compressconsult.com/szip/
| |
| Willem 2006-07-12, 3:55 am |
| jondelac@gmail.com wrote:
) Since BWT relies on order 1 statistics anyway, can we use the second
) column in the sorted BWT matrix instead of the last column and still a)
) do the reverse transform and b) get almost the same results?
No.
b) The compression results would be much better, but
a) You can not do the reverse transform any more.
To see this more extremely, think about a bitwise BWT:
The second column would consist of four runs, of resp. 0, 1, 0 and 1.
Even coded naively, you could encode that in 5*log2(N) bits.
And there, the counting theorem comes in. :-)
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
| |
| michael 2006-07-12, 7:55 am |
|
jondelac@gmail.com wrote:
> Since BWT relies on order 1 statistics anyway, can we use the second
> column in the sorted BWT matrix instead of the last column and still a)
> do the reverse transform and b) get almost the same results?
>
> Jon.
No, this won't work. And the BWT relies on far more than order one
statistics also.
By the way, if you were to encode using the second column, (ignoring
that
it can not be reversed) then every signal could be compressed to at
most
(N^2) * 4 bytes or so in size where N is the alphabet size. (assuming
the signal was limited to 2^32 in length).
So for DNA, every possible signal could be compressed to, at most, 1 of
64 possible
outputs. This alone suggests that it doesn't work. (^:
- Michael Maniscalco
| |
| jondelac@gmail.com 2006-07-12, 9:55 pm |
|
Willem wrote:
> jondelac@gmail.com wrote:
> ) Since BWT relies on order 1 statistics anyway, can we use the second
> ) column in the sorted BWT matrix instead of the last column and still a)
> ) do the reverse transform and b) get almost the same results?
>
> No.
>
> b) The compression results would be much better, but
> a) You can not do the reverse transform any more.
>
>
> To see this more extremely, think about a bitwise BWT:
>
> The second column would consist of four runs, of resp. 0, 1, 0 and 1.
> Even coded naively, you could encode that in 5*log2(N) bits.
> And there, the counting theorem comes in. :-)
>
>
> SaSW, Willem
> --
> Disclaimer: I am in no way responsible for any of the statements
> made in the above text. For all I know I might be
> drugged or something..
> No I'm not paranoid. You all think I'm paranoid, don't you !
> #EOT
Hi Willem, in your iplementation of BWC, I thought you were infact
transmitting the second column... The first column is obviously the
sorted column but in constructing the BWT matrix, and comparing the
last column to the what is output by your bwc implementation, I see
different results. The data going into the arithmetic encoder seems to
be the second column to me... Can you explain this discrepancy?
Thanks
Jon
| |
| Willem 2006-07-13, 3:55 am |
| jondelac@gmail.com wrote:
) Hi Willem, in your iplementation of BWC, I thought you were infact
) transmitting the second column... The first column is obviously the
) sorted column but in constructing the BWT matrix, and comparing the
) last column to the what is output by your bwc implementation, I see
) different results. The data going into the arithmetic encoder seems to
) be the second column to me... Can you explain this discrepancy?
I think I can, yes.
Right before sorting, the block gets reversed. There is a very good reason
for this, which I don't seem to recall at the moment. I think that the
unbwt step gets easier because of it, or something like that. Or maybe
the sort itself is also in reverse, not sure. Didn't I document that in
the code ? Maybe it was so blatantly obvious that I never thought of that.
;-)
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
| |
| jondelac@gmail.com 2006-07-13, 6:55 pm |
|
Willem wrote:
> jondelac@gmail.com wrote:
> ) Hi Willem, in your iplementation of BWC, I thought you were infact
> ) transmitting the second column... The first column is obviously the
> ) sorted column but in constructing the BWT matrix, and comparing the
> ) last column to the what is output by your bwc implementation, I see
> ) different results. The data going into the arithmetic encoder seems to
> ) be the second column to me... Can you explain this discrepancy?
>
> I think I can, yes.
> Right before sorting, the block gets reversed. There is a very good reason
> for this, which I don't seem to recall at the moment. I think that the
> unbwt step gets easier because of it, or something like that. Or maybe
> the sort itself is also in reverse, not sure. Didn't I document that in
> the code ? Maybe it was so blatantly obvious that I never thought of that.
>
> ;-)
>
>
> SaSW, Willem
> --
> Disclaimer: I am in no way responsible for any of the statements
> made in the above text. For all I know I might be
> drugged or something..
> No I'm not paranoid. You all think I'm paranoid, don't you !
> #EOT
Thanks for that post! I though I was getting dumber by the second. It
is not documented why you reverse the block. Do you remember how you
reverse it? For example, given a sequence like "disila" as the input
string:
the rotated BWT matrix will be
disila
isilad
siladi
iladis
ladisi
adisil
The sorted BWT Matrix will be
adisil
disila
iladis
isilad
ladisi
siladi
And the actual last column L of the BWT Matrix is "lasdii". However,
the BWC program you wrote outputs "dilsai", which is the second column.
By the way, I may note that I am able to perfectly reconstruct the
original string with this column as well, since they are adjacent to
the first column. Given a start index, one can always find the
"proceeding" char just as easily as the "preceeding" char, and the
matchup works teh same way... No?
Jon.
| |
| Willem 2006-07-13, 6:55 pm |
| jondelac@gmail.com wrote:
) Thanks for that post! I though I was getting dumber by the second. It
) is not documented why you reverse the block. Do you remember how you
) reverse it? For example, given a sequence like "disila" as the input
) string:
)
) the rotated BWT matrix will be
)
) disila
) isilad
) siladi
) iladis
) ladisi
) adisil
)
) The sorted BWT Matrix will be
)
) adisil
) disila
) iladis
) isilad
) ladisi
) siladi
)
) And the actual last column L of the BWT Matrix is "lasdii". However,
) the BWC program you wrote outputs "dilsai", which is the second column.
Let's see...
The rotated (and reversed) matrix would be:
alisid
lisida
isidal
sidali
idalis
dalisi
sorted: (with preceding characters)
alisid
dalisi
idalis
isidal
lisida
sidali
So I think it should be outputting "dislai", not "dilsai".
NB: there is also an EOF issue, but by pure chance, that doesn't
come into play in this example.
) By the way, I may note that I am able to perfectly reconstruct the
) original string with this column as well, since they are adjacent to
) the first column. Given a start index, one can always find the
) "proceeding" char just as easily as the "preceeding" char, and the
) matchup works teh same way... No?
Yes, unless there are more than one of the same character.
When that happens, it could be that you end up at the wrong one,
and the order gets screwed.
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
|
|
|
|
|