









 

Sport: Hex Strings
Bruce Badger

2007-09-10, 7:13 pm

leandros's test >>testHexStringFromByteArray exercises the methods:

I have the test failing in VisualWorks because the ByteArray #[0] is
represented in a string as '00' not as '0' which the test expects.
Equally the test expects #[15] to be represented as 'F', but the VW
implementation produces '0F'.

I think that #[15] -> 'F' is not as safe as #[15] -> '0F', so I am
inclined to suggest that the test should be modified to explicitly check
for '00' and '0F'.

IOW, I think:

(SpEnvironment hexStringFromByteArray: (ByteArray with: 0)) = '00'
(SpEnvironment hexStringFromByteArray: (ByteArray with: 15)) = '0F'

Should both be true and that this is what the test should require.

What do you think?
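
A minimal sketch of the behaviour proposed above. It is written in Python rather than Smalltalk purely to stay dialect-neutral; the function name is hypothetical and is not part of Sport. Each byte is rendered as exactly two uppercase hex digits, so the leading zero is never dropped:

```python
def hex_string_from_byte_array(data: bytes) -> str:
    """Render each byte as exactly two uppercase hex digits."""
    return "".join(f"{byte:02X}" for byte in data)


print(hex_string_from_byte_array(bytes([0])))      # 00
print(hex_string_from_byte_array(bytes([15])))     # 0F
print(hex_string_from_byte_array(bytes([0, 15])))  # 000F
```

With this rule the output length is always twice the number of bytes, which is what makes the reverse conversion unambiguous.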

Normand Mongeau

2007-09-10, 7:13 pm

Actually I had a problem with the opposite method, i.e.

SpEnvironment byteArrayFromHexString: '0'.

in both VA and VW returns an empty ByteArray, because the implementation
expects each hex value to have 2 characters; thus the SUnit test below
failed.

testByteArrayFromHexString
    | array |
    array := SpEnvironment byteArrayFromHexString: '0'.
    self assert: array = (ByteArray with: 0).

I had already changed my implementation to pad with a 0 in order to have
an even number of chars, thus passing the SUnit test. Maybe not the best
solution though.

Normand
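
A rough sketch of the padding approach described above, in Python for dialect neutrality. The function name is hypothetical, and this is only one possible reading of "pad with a 0" (prepending it); it is not the actual VA or VW code:

```python
def byte_array_from_hex_string_lenient(text: str) -> bytes:
    """Decode hex text, prepending '0' when the number of digits is odd."""
    if len(text) % 2 == 1:
        text = "0" + text
    return bytes.fromhex(text)


print(list(byte_array_from_hex_string_lenient("0")))    # [0]
print(list(byte_array_from_hex_string_lenient("F1A")))  # [15, 26]
```

Prepending keeps the numeric value of the string unchanged, but it silently chooses one of two possible interpretations of an odd-length input.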

"Bruce Badger" <bbadger@openskills.com> wrote in message
news:ybCdnQroQotA7HjbnZ2dnUVZ_s6mnZ2d@totallyobjects.com...
> leandros's test >>testHexStringFromByteArray exercise the methods:
>
> I have the test failing in VisualWorks because the ByteArray #[0] is
> represented in a string as '00' not as '0' which the test expects. Equally
> the test expects #[15] to be represented as 'F', but the VW implementation
> produces '0F'.
>
> I think that #[15] -> 'F' is not as safe as #[15] -> '0F', so I am
> inclined to suggest that the test should be modified to explicitly check
> for '00' and '0F'.
>
> IOW, I think:
>
> (SpEnvironment hexStringFromByteArray: (ByteArray with: 0)) = '00'
> (SpEnvironment hexStringFromByteArray: (ByteArray with: 15)) = '0F'
>
> Should both be true and that this is what the test should require.
>
> What do you think?
>



Bruce Badger

2007-09-10, 7:13 pm

Normand,

I agree with you, and I'm not sure how I missed the reverse case. Mind
you, I have just got back in from the London Smalltalk pub meet, so I
may not be firing on all cylinders here. Anyway ...

I think that a hex string *must* contain an even number of characters,
that a stray character at the end will simply be ignored and, the other
way around, that all created hex strings must contain an even number of
characters.

So the two tests:

testHexStringFromByteArray
testByteArrayFromHexString

Should be modified to use and expect hex strings with even numbers of
characters, i.e. '00' and '0F' rather than '0' and 'F'.

All the best,
Bruce
Normand Mongeau

2007-09-10, 7:13 pm

Hmm, after giving it some thought, I don't really agree.

'F1A' is a perfectly valid hex string, as is 'C'. Maybe not in the context
of an HTTP server, but they are nonetheless valid.

I think we should change the implementation to accept an odd number of
characters.

Normand


"Bruce Badger" <bbadger@openskills.com> wrote in message
news:9qKdnUyMq-jNNXjbnZ2dnUVZ_sejnZ2d@totallyobjects.com...
> Normand,
>
> I agree with you, and I'm not sure how I missed the reverse case. Mind
> you, I have just got back in from the London Smalltalk pub meet, so I may
> not be firing on all cylinders here. Anyway ...
>
> I think that a hex string *must* contain an even number of characters,
> that a stray character at the end will simply be ignored and, the other
> way around, that all created hex strings must contain an even number of
> characters.
>
> So the two tests:
>
> testHexStringFromByteArray
> testByteArrayFromHexString
>
> Should be modified to use and expect hex strings with even numbers of
> characters, i.e. '00' and '0F' rather than '0' and 'F'.
>
> All the best,
> Bruce



nice

2007-09-10, 7:13 pm

'000F' and '0F' are perfectly valid hex strings too, but they would not be
interpreted the same: they would give byte arrays #[0 15] and #[15].

So the leading zeros are significant.
In this context Bruce's strict rule appears safer to me.

Since the second byte will need the leading zero anyway, I don't see the
point of dropping the leading zero of the first byte.

Nicolas
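
The point about leading zeros can be checked with a short sketch (Python, used here only as a dialect-neutral illustration). Read as numbers the two strings are equal; read as byte sequences they are not:

```python
# As numbers, leading zeros change nothing.
print(int("000F", 16) == int("0F", 16))  # True

# As byte sequences, every pair of digits is one byte.
print(list(bytes.fromhex("000F")))  # [0, 15]
print(list(bytes.fromhex("0F")))    # [15]
```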


Normand Mongeau wrote:
> Hmm, after giving some thought, I don't really agree.
>
> 'F1A' is a perfectly valid hex string, as is 'C'. Maybe not in the context
> of an HTTP server, but they are nonetheless valid.
>
> I think we should change the implementation to accept odd number of
> characters.
>
> Normand
>
>
> "Bruce Badger" <bbadger@openskills.com> wrote in message
> news:9qKdnUyMq-jNNXjbnZ2dnUVZ_sejnZ2d@totallyobjects.com...
>
>

Normand Mongeau

2007-09-11, 4:24 am

I'm not sure I get your point.

Bruce is talking about changing the test code, while I'm worried about the
implementation.

Current implementation yields an empty ByteArray if the hex string has only
one character. How do you suggest we fix that?

Normand


"nice" <ncellier@ifrance.com> wrote in message
news:auydnd3KNKePWXjbnZ2dnUVZ_tGonZ2d@totallyobjects.com...
> '000F' and '0F' are perfectly valid hex strings too, but they would not be
> interpreted the same: they would give byte arrays #[0 15] and #[15].
>
> So the leading zeros are significant.
> In this context Bruce's strict rule appears safer to me.
>
> Since the second byte will need the leading zero anyway, I don't see the
> point of dropping the leading zero of the first byte.
>
> Nicolas
>
>
> Normand Mongeau wrote:

nice

2007-09-11, 4:24 am

Normand Mongeau wrote:
> I'm not sure I get your point.
>
> Bruce is talking about changing the test code, while I'm worried about the
> implementation.
>
> Current implementation yields an empty ByteArray if the hex string has only
> one character. How do you suggest we fix that?
>


Raise an Exception

> Normand
>

Bruce Badger

2007-09-11, 4:24 am

On Tue, 11 Sep 2007 00:19:23 -0400, Normand Mongeau wrote:
> Bruce is talking about changing the test code, while I'm worried about the
> implementation.


Well, I'm suggesting that the test code reflects the way the earliest
versions of Sport were implemented, i.e. that hex strings must have an
even number of characters.

> Current implementation yields an empty ByteArray if the hex string has only
> one character. How do you suggest we fix that?


The tension lies here:

If I have just 'F' is that to be taken as '0F' or 'F0'? You seem to be
saying 0F.

OK, is '00F' to be taken as '00F0' or '000F'? The latter is
consistent with 'F' being taken as '0F' but looks odd to me (and perhaps a
bit too clever?).

By requiring an even number of characters in the hex string, (i.e.
requiring that every byte is unambiguously defined) we avoid the issue.

As for what to do if given a string of odd length, I agree with Nicolas
and think throwing an exception would be the most intention revealing.
Right now the VW and GS versions silently drop data, which is not good.

All the best,
Bruce
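
A minimal sketch of the strict rule argued for above, in Python for dialect neutrality. The exception class and function name are hypothetical stand-ins, not Sport's actual API. An odd-length string is rejected rather than padded or truncated, so no data is silently dropped and no guess is made between the two readings of 'F':

```python
class HexStringError(ValueError):
    """Hypothetical stand-in for an 'invalid hex string' exception."""


def byte_array_from_hex_string(text: str) -> bytes:
    """Decode hex text, insisting on two hex digits per byte."""
    if len(text) % 2 != 0:
        raise HexStringError(f"odd number of hex digits: {text!r}")
    try:
        return bytes.fromhex(text)
    except ValueError as exc:
        # Non-hex characters are reported with the same exception type.
        raise HexStringError(str(exc)) from exc


# The two possible readings of 'F' give different bytes.
print(list(byte_array_from_hex_string("0F")))  # [15]
print(list(byte_array_from_hex_string("F0")))  # [240]

try:
    byte_array_from_hex_string("00F")
except HexStringError as error:
    print("rejected:", error)
```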
Paolo Bonzini

2007-09-11, 4:24 am

From a completely different point of view, I don't fully understand
the need for these methods. I think that Sport should first try to
set a minimal set of features that Smalltalk implementations should
support -- and class extensions should be one of these (it is in the
newest Gemstone, for example). Then, hexadecimal string <-> ByteArray
conversion should be done by the user who needs it, in their code,
using an extension or a support class: *in terms of Sport* (if
necessary) and not *in Sport*.

We should also decide what dialect to base our extensions on. For the
implementation, Squeak and GNU both have a liberal enough license
(MIT and LGPL respectively).

For the naming, I expect some wars to burst out. Personally, I'd
prefer VA names (because it is quite similar to GNU for historical
reasons). Squeak and VW have some similarities too, but I found some
inconsistencies (witness #upTo: vs. #upToAll: that was pointed out a
short while ago) and some absurdities (such as #tokensBasedOn: vs. the
simpler name #subStrings: used by ANSI). Don't know about Dolphin,
Visual Smalltalk, Gemstone, etc.

Paolo

Bruce Badger

2007-09-11, 4:24 am

Paolo,

> the need for these methods.


The "reason" is that Sport(A) is an evolved thing, not a designed thing.
It evolved out of work I did for OpenSkills and so it contains all
kinds of things it should not given the goal to keep Sport as small as
possible.

SportB can start to address this.

> I think that Sport should first try to
> set a minimal set of features that Smalltalk implementations should
> support -- and class extensions should be one of these (it is in the
> newest Gemstone, for example). Then, hexadecimal string <-> ByteArray
> conversion should be done by the user who needs it, in their code,
> using an extension or a support class: *in terms of Sport* (if
> necessary) and not *in Sport*.


In the specific case of the hex string stuff, this was a mistake on my
part. It would probably be better done as a stream thing like the UTF8
stream - but no matter how it is done, as you say, it should be done *in
terms of sport* and should not be part of Sport.

But it is currently part of SportA and so I think we need to live with
it and just try to make it work in a consistent way.

> We should also decide what dialect to base our extensions on. For the
> implementation, Squeak and GNU both have a liberal enough license
> (MIT and LGPL respectively).


My inclination is to avoid making these kinds of choices up front. I
would prefer to leave the hard discussion to the ANSI committee because
they are set up to form a consensus. Sport is first and foremost a
band-aid.

To this end I suggest that, barring any extreme ugliness or impossible
to implement ideas, we go with the solution that is first shown to work.
For example, with the GC Estoban says he has this working in Sport for
Dolphin so I am tempted to just go with the selector he has used (once
we know what it is).

> For the naming, I expect some wars to burst out.


Quite :-)

What we need is a tie-breaker. We need someone who can, when no
consensus can be reached, look at the options and just pick one that
works. I can do that, but I think it may be better to pick someone
else, someone who is not involved with Sport development but who is
respected in the Smalltalk community.

When resolving contentious issues I would prefer to end up with a
solution I was a bit uncomfortable with than no solution at all. A
danger for Sport is that it gets lost in subtlety, when it should be
just good enough until ANSI defines how things really shall be.

All the best,
Bruce
Paolo Bonzini

2007-09-11, 4:24 am


> What we need is a tie-breaker. We need someone who can, when no
> consensus can be reached, look at the options and just pick one that
> works. I can do that, but I think it may be better to pick someone
> else, someone who is not involved with Sport development but who is
> respected in the Smalltalk community.


And who is not involved with dialect development. I'd choose VA, but
that's as biased as if I chose GNU.

Anyway, the problem is IMO that yes, ANSI is the solution in theory
but in practice who is going to restart it? Is the committee going to
formalize extensions or just the status quo? In the latter case, does
Sport have a role in that? What are the dialect developers going to
do?

The more I think of it, the more I believe that the solution to the
dialect problem is more like "there can be only one" (not in an
optimistic sense, but in a "Highlander" sense). Whoever survives
dictates the rules. Not ANSI, not Sport. :-(

Paolo

Janko Mivšek

2007-09-11, 4:24 am

Paolo Bonzini wrote:

> Anyway, the problem is IMO that yes, ANSI is the solution in theory
> but in practice who is going to restart it? Is the committee going to
> formalize extensions or just the status quo? In the latter case, does
> Sport have a role in that? What are the dialect developers going to
> do?


I would say that Sport can be proof that it can be done, and it will
be done. Maybe not as cleanly as some would want, but it will be done. And
this is, at least for now, more important IMHO: that we have something,
even if not perfect. A pragmatic solution, that is. And that solution will
definitely carry weight later on when deciding on a standard solution.
Definitely more than if there were no solution at all.

> The more I think of it, the more I believe that the solution to the
> dialect problem is more like "there can be only one" (not in an
> optimistic sense, but in a "Highlander" sense). Whoever survives
> dictates the rules. Not ANSI, not Sport. :-(


I think that there will always be room for differences; just look at the
GUI front. So there is still room for many Smalltalks and healthy competition.

Best regards
Janko

--
Janko Mivšek
AIDA/Web
Smalltalk Web Application Server
http://www.aidaweb.si
Bruce Badger

2007-09-11, 4:24 am

Paolo Bonzini wrote:
>
> And who is not involved with dialect development. I'd choose VA, but
> that's as biased as if I chose GNU.


Indeed. Can you think of anyone?

> Anyway, the problem is IMO that yes, ANSI is the solution in theory
> but in practice who is going to restart it?


I don't know. I have sent an email message to ANSI asking just that,
but as yet I have had no answer. If anyone knows how to get this
information I would very much like to know.

> Is the committee going to
> formalize extensions or just the status quo? In the latter case, does
> Sport have a role in that? What are the dialect developers going to
> do?


Here is how I would like to see this work:

The ANSI committee is restarted and sets a goal of regularly producing
an updated version of the standard. I think one version every 18 months
or 2 years would be a good goal, i.e. the process should be time-boxed.

The ANSI committee selects the most pressing areas for standardisation,
and this is where I think Sport can help by being a barometer of what
issues are indeed the most pressing. With each version of the ANSI
standard we should be able to produce a new Sport API with recently
standardised things removed.

Over time, Sport will change in size but really should trend towards
ceasing to exist.

> The more I think of it, the more I believe that the solution to the
> dialect problem is more like "there can be only one" (not in an
> optimistic sense, but in a "Highlander" sense). Whoever survives
> dictates the rules. Not ANSI, not Sport. :-(


Oh, I think we can make good things happen :-)

The highlander path leads to silos where each dialect has a closed set
of libraries. Porting between the silos does happen, but it's a
one off and essentially once ported you have a permanent fork in the
project.

Sport, for all its many faults, breaks the barrier between the silos
and means that projects can have a single code base that can work
in many dialects. So in that regard it should be attractive even to
silo owners.

All Sport needs is consistency. It does not need to be pretty, it just
needs to work and it needs to be small. Given these things I think that
the pressure for resumption of the ANSI process will increase and once
restarted the ANSI committee can worry about beauty and truth.

Well, that's what I think, anyway.

All the best,
Bruce
Bruce Badger

2007-09-11, 4:24 am

Bruce Badger wrote:
> Paolo Bonzini wrote:


>
> I don't know. I have sent an email message to ANSI asking just that,
> but as yet I have had no answer. If anyone knows how to get this
> information I would very much like to know.


I *just* got a response from ANSI. In it they "reaffirmed" the current
publication (and tried to sell me a copy), but they also gave the name
and number of someone at ANSI I can talk to.

I'll call later, when the US wakes up.
Normand Mongeau

2007-09-11, 8:10 am


"Bruce Badger" <bbadger@no.spam.openskills.com> wrote in message
news:pan.2007.09.11.06.33.49.752046@no.spam.openskills.com...
> On Tue, 11 Sep 2007 00:19:23 -0400, Normand Mongeau wrote:
>
> Well, I'm suggesting that the test code reflects the way the earliest
> versions of Sport were implemented, i.e. that hex strings must have an
> even number of characters.
>
>
> The tension lies here:
>
> If I have just 'F' is that to be taken as '0F' or 'F0'? You seem to be
> saying 0F.
>
> OK, is '00F' to be taken as '00F0' or '000F'? The latter is
> consistent with 'F' being taken as '0F' but looks odd to me (and perhaps a
> bit too clever?).


0F and 000F respectively, because they do not modify the inherent value of
the expression. Remember that we are looking at a conversion to a
ByteArray, and prepending an extra 0 when it is missing gives the correct
results. I'm a pragmatist...

>
> By requiring an even number of characters in the hex string, (i.e.
> requiring that every byte is unambiguously defined) we avoid the issue.
>
> As for what to do if given a string of odd length, I agree with Nicolas
> and think throwing an exception would be the most intention revealing.
> Right now the VW and GS versions silently drop data, which is not good.
>


I don't agree, but I'll bow to the majority. An exception it will be then.


Normand


Bruce Badger

2007-09-11, 8:10 am

Normand Mongeau wrote:

> I don't agree, but I'll bow to the majority. An exception it will be then.


Thank you :-]

I need to modify the VW and GS implementations to throw the exception
and I guess we need to modify the test to check that.

Thanks again,
Bruce
Bruce Badger

2007-09-11, 8:10 am

Here is the test that checks that odd numbers of characters cause

testByteArrayFromHexStringError
    self should: [SpEnvironment byteArrayFromHexString: '00F']
        raise: SpHexStringError

If this is OK then it should be in version 2 of the Sport(A) tests.

All the best,
Bruce
cstb

2007-09-13, 4:21 am

Normand Mongeau wrote:
> Hmm, after giving some thought, I don't really agree.
>
> 'F1A' is a perfectly valid hex string, as is 'C'. Maybe not in the context
> of an HTTP server, but they are nonetheless valid.


Sure are.

> I think we should change the implementation to accept odd number of
> characters.


For the record, and .02,
I agree.

Signalling an exception in response
to valid input bites, and in the wrong place.

A different argument could be made for signaling
an exception for 'F',
if 'FF' signaled the same exception.

(On grounds of no leading digit being ambiguous).

Regards,

-cstb



> Normand
>
>
> "Bruce Badger" <bbadger@openskills.com> wrote in message
> news:9qKdnUyMq-jNNXjbnZ2dnUVZ_sejnZ2d@totallyobjects.com...
>
>

Bruce Badger

2007-09-13, 4:21 am

cstb wrote:

So what value is 'F1A'? Depends on the context, as you say.

If you inspect 16rF1A you'll get the Integer 3866. So we can say that
'F1A' is indeed a reasonable string representation of a hex number.

But converting a hex string to a byte array is different. To
unambiguously define all 8 bits of a byte (OK, octet) you need to have
two hex digits - no more and no less.

The method in question here is: byteArrayFromHexString:

So while "F1A" may be a fine hex number in string form, it is not an
unambiguous argument to the above method. The exception makes clear
that the use of an odd length string as an argument is ambiguous - thus
we have revealed our intent.

Anyway, Sport is a band aid and it will be up to the ANSI committee to
work out the Truth. As long as sport works in a consistent and obvious
way, that's good enough I reckon.

All the best,
Bruce
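
The distinction drawn above between a hex number and a hex-encoded byte sequence can be sketched as follows (Python, used only as a dialect-neutral illustration; Python's built-in decoder happens to apply the same even-length rule):

```python
# As a hex number, 'F1A' is fine: the same value as 16rF1A in Smalltalk.
print(int("F1A", 16))  # 3866

# As a byte sequence it is one digit short of two whole bytes.
try:
    bytes.fromhex("F1A")
except ValueError as error:
    print("not a whole number of bytes:", error)
```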