For Programmers: Free Programming Magazines  


Home > Archive > Compression > July 2006 > Is it possible to use second column instead of last column in BWT?









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author Is it possible to use second column instead of last column in BWT?
jondelac@gmail.com

2006-07-11, 9:55 pm

Since BWT relies on order 1 statistics anyway, can we use the second
column in the sorted BWT matrix instead of the last column and still a)
do the reverse transform and b) get almost the same results?

Jon.

igor.grobarcik@gmail.com

2006-07-12, 3:55 am


jondelac@gmail.com nap=EDsal(a):
> Since BWT relies on order 1 statistics anyway, can we use the second
> column in the sorted BWT matrix instead of the last column and still a)
> do the reverse transform and b) get almost the same results?
>
> Jon.


Szip uses a limited order Sort transform. This transformation is
related to
the BWT used in blocksorting compression methods.

See page
http://www.compressconsult.com/szip/

Willem

2006-07-12, 3:55 am

jondelac@gmail.com wrote:
) Since BWT relies on order 1 statistics anyway, can we use the second
) column in the sorted BWT matrix instead of the last column and still a)
) do the reverse transform and b) get almost the same results?

No.

b) The compression results would be much better, but
a) You can not do the reverse transform any more.


To see this more extremely, think about a bitwise BWT:

The second column would consist of four runs, of resp. 0, 1, 0 and 1.
Even coded naively, you could encode that in 5*log2(N) bits.
And there, the counting theorem comes in. :-)


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
michael

2006-07-12, 7:55 am


jondelac@gmail.com wrote:
> Since BWT relies on order 1 statistics anyway, can we use the second
> column in the sorted BWT matrix instead of the last column and still a)
> do the reverse transform and b) get almost the same results?
>
> Jon.


No, this won't work. And the BWT relies on far more than order one
statistics also.

By the way, if you were to encode using the second column, (ignoring
that
it can not be reversed) then every signal could be compressed to at
most
(N^2) * 4 bytes or so in size where N is the alphabet size. (assuming
the signal was limited to 2^32 in length).

So for DNA, every possible signal could be compressed to, at most, 1 of
64 possible
outputs. This alone suggests that it doesn't work. (^:

- Michael Maniscalco

jondelac@gmail.com

2006-07-12, 9:55 pm


Willem wrote:
> jondelac@gmail.com wrote:
> ) Since BWT relies on order 1 statistics anyway, can we use the second
> ) column in the sorted BWT matrix instead of the last column and still a)
> ) do the reverse transform and b) get almost the same results?
>
> No.
>
> b) The compression results would be much better, but
> a) You can not do the reverse transform any more.
>
>
> To see this more extremely, think about a bitwise BWT:
>
> The second column would consist of four runs, of resp. 0, 1, 0 and 1.
> Even coded naively, you could encode that in 5*log2(N) bits.
> And there, the counting theorem comes in. :-)
>
>
> SaSW, Willem
> --
> Disclaimer: I am in no way responsible for any of the statements
> made in the above text. For all I know I might be
> drugged or something..
> No I'm not paranoid. You all think I'm paranoid, don't you !
> #EOT


Hi Willem, in your iplementation of BWC, I thought you were infact
transmitting the second column... The first column is obviously the
sorted column but in constructing the BWT matrix, and comparing the
last column to the what is output by your bwc implementation, I see
different results. The data going into the arithmetic encoder seems to
be the second column to me... Can you explain this discrepancy?

Thanks
Jon

Willem

2006-07-13, 3:55 am

jondelac@gmail.com wrote:
) Hi Willem, in your iplementation of BWC, I thought you were infact
) transmitting the second column... The first column is obviously the
) sorted column but in constructing the BWT matrix, and comparing the
) last column to the what is output by your bwc implementation, I see
) different results. The data going into the arithmetic encoder seems to
) be the second column to me... Can you explain this discrepancy?

I think I can, yes.
Right before sorting, the block gets reversed. There is a very good reason
for this, which I don't seem to recall at the moment. I think that the
unbwt step gets easier because of it, or something like that. Or maybe
the sort itself is also in reverse, not sure. Didn't I document that in
the code ? Maybe it was so blatantly obvious that I never thought of that.

;-)


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
jondelac@gmail.com

2006-07-13, 6:55 pm


Willem wrote:
> jondelac@gmail.com wrote:
> ) Hi Willem, in your iplementation of BWC, I thought you were infact
> ) transmitting the second column... The first column is obviously the
> ) sorted column but in constructing the BWT matrix, and comparing the
> ) last column to the what is output by your bwc implementation, I see
> ) different results. The data going into the arithmetic encoder seems to
> ) be the second column to me... Can you explain this discrepancy?
>
> I think I can, yes.
> Right before sorting, the block gets reversed. There is a very good reason
> for this, which I don't seem to recall at the moment. I think that the
> unbwt step gets easier because of it, or something like that. Or maybe
> the sort itself is also in reverse, not sure. Didn't I document that in
> the code ? Maybe it was so blatantly obvious that I never thought of that.
>
> ;-)
>
>
> SaSW, Willem
> --
> Disclaimer: I am in no way responsible for any of the statements
> made in the above text. For all I know I might be
> drugged or something..
> No I'm not paranoid. You all think I'm paranoid, don't you !
> #EOT


Thanks for that post! I though I was getting dumber by the second. It
is not documented why you reverse the block. Do you remember how you
reverse it? For example, given a sequence like "disila" as the input
string:

the rotated BWT matrix will be

disila
isilad
siladi
iladis
ladisi
adisil

The sorted BWT Matrix will be

adisil
disila
iladis
isilad
ladisi
siladi

And the actual last column L of the BWT Matrix is "lasdii". However,
the BWC program you wrote outputs "dilsai", which is the second column.

By the way, I may note that I am able to perfectly reconstruct the
original string with this column as well, since they are adjacent to
the first column. Given a start index, one can always find the
"proceeding" char just as easily as the "preceeding" char, and the
matchup works teh same way... No?

Jon.

Willem

2006-07-13, 6:55 pm

jondelac@gmail.com wrote:
) Thanks for that post! I though I was getting dumber by the second. It
) is not documented why you reverse the block. Do you remember how you
) reverse it? For example, given a sequence like "disila" as the input
) string:
)
) the rotated BWT matrix will be
)
) disila
) isilad
) siladi
) iladis
) ladisi
) adisil
)
) The sorted BWT Matrix will be
)
) adisil
) disila
) iladis
) isilad
) ladisi
) siladi
)
) And the actual last column L of the BWT Matrix is "lasdii". However,
) the BWC program you wrote outputs "dilsai", which is the second column.

Let's see...

The rotated (and reversed) matrix would be:

alisid
lisida
isidal
sidali
idalis
dalisi

sorted: (with preceding characters)

alisid
dalisi
idalis
isidal
lisida
sidali

So I think it should be outputting "dislai", not "dilsai".

NB: there is also an EOF issue, but by pure chance, that doesn't
come into play in this example.

) By the way, I may note that I am able to perfectly reconstruct the
) original string with this column as well, since they are adjacent to
) the first column. Given a start index, one can always find the
) "proceeding" char just as easily as the "preceeding" char, and the
) matchup works teh same way... No?

Yes, unless there are more than one of the same character.
When that happens, it could be that you end up at the wrong one,
and the order gets screwed.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
Sponsored Links







Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive

Copyright 2008 codecomments.com