Code Comments
On Mar 6, 10:11 am, phil chastney <phil.hates.s...@amadeus.munged.eclipse.co.uk> wrote:
> Dyalog have taken the right decision, but instead of simply releasing
> enhanced Unicode versions without enhanced "classic" versions, it might
> be better received by their customer base if Dyalog were to announce an
> end-date for ASCII-only enhancements first

Phil, thanks for the first bit! I am not sure I understand the rest of the comment, or what the perceived problem is with the Unicode/Classic split - can you elaborate?

The Classic and Unicode versions are built using the same source code and are identical in "functionality". They ONLY differ in the way that they translate data as it goes in and out of the system, that character arrays in the Classic version are restricted to the 256-element selection from Unicode known as the "Atomic Vector", and that the byte which represents a character in the Classic version is an index into "Quad-AV". The arguments and results of monadic upgrade and Quad-DR differ from one version to the other.

Note that BOTH versions have a single-byte character data type, but that the Unicode version will also use 2- and 4-byte internal representations if you use code points beyond 255 (the Unicode product views characters in the same way as integers, which also come in 1-, 2- and 4-byte flavours, depending on the range of the data).

We have NOT set a date for the end of the Classic version because we want to give our customers some time to think about the issue and how it will impact applications first. Some applications will just load and run in the Unicode version; others, which use a lot of casting ([]DR) and clever tricks when importing and exporting data, may need work. We do not yet know what would be a "reasonable" deadline.

Actually taking advantage of Unicode in applications (as opposed to just being able to run on the Unicode version) may be a massive effort involving rewriting all your interfaces, converting all your SQL databases and other forms of external media, your applications' understanding of sorting and searching, etc. We don't expect many of our customers to go all the way down that path, but it is very important that those who want to CAN do it easily.

Note that the Classic and Unicode versions are fully inter-operable: they can share workspaces, component files and TCP socket connections - with the "obvious" limitation that a Classic version will choke if it encounters a character not in its Quad-AV (Quad-AV can be defined by the user - so Russians can define a different 256-element subset than English Dyalog users). The Unicode version can be configured so that it knows which subset a Classic "partner system" is working with and translates data accordingly.

v12 Unicode can be instructed to continue to write non-Unicode component files (on a per file basis) and give an error if this is not possible, so that it does not "accidentally" write files that Classic (or v10 and v11) cannot read.

Version 12 is carefully designed to avoid big bang conversion events and make the transition to Unicode as smooth as possible: We want to encourage our users to move to Unicode as quickly as possible but will not force anyone to move hastily.

We are NOT setting a deadline at this time. My advice would be to install the Unicode version as soon as possible and start up a project team to evaluate how hard it will be to move, and how (if) your applications can benefit from extending the range of characters handled. I would plan to start a move to the Unicode version within 3 years and try to complete it in 5. But DON'T PANIC, we do not leave people "up the cr": So long as there is any significant use of the Classic version it will continue to be supported.

Come to our User Group conference in October to talk to colleagues about what they are doing and put pressure on Dyalog to do the right thing :-).

Morten
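The two mechanisms described in the message above (a character width chosen from the range of the data, and Classic characters stored as one-byte indices into a 256-element table) can be sketched in Python. This is only an illustration, not Dyalog's implementation: the function names, the 0xFFFF threshold for the 2-byte case, and the Latin-1 stand-in table `DEMO_AV` are all assumptions made for the example, and `DEMO_AV` is not the real Quad-AV.

```python
def storage_width(text: str) -> int:
    """Smallest of 1, 2 or 4 bytes per character that can hold every
    code point in `text` (assumed thresholds: 0xFF and 0xFFFF)."""
    top = max((ord(ch) for ch in text), default=0)
    if top <= 0xFF:
        return 1
    if top <= 0xFFFF:
        return 2
    return 4


def to_classic_bytes(text: str, atomic_vector: str) -> bytes:
    """Translate `text` to one byte per character, each byte being the
    index of that character in a 256-element table."""
    if len(atomic_vector) != 256:
        raise ValueError("the table must have exactly 256 elements")
    index = {ch: i for i, ch in enumerate(atomic_vector)}
    out = bytearray()
    for ch in text:
        if ch not in index:
            # The "choke" case: the character is outside the 256-element subset.
            raise ValueError(f"U+{ord(ch):04X} is not in the table")
        out.append(index[ch])
    return bytes(out)


# Hypothetical stand-in table: the first 256 code points (NOT the real Quad-AV).
DEMO_AV = "".join(chr(i) for i in range(256))

print(storage_width("abc"))            # 1
print(storage_width("\u234B"))         # 2  (APL grade-up symbol)
print(storage_width(chr(0x1D11E)))     # 4  (code point above 0xFFFF)
print(to_classic_bytes("AB", DEMO_AV)) # b'AB'
```

A user-defined table for a different language community would simply be a different 256-character string passed as `atomic_vector`.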
Morten Kromberg wrote:
> <snip>

OK, that's fine. If there's no significant overhead in maintaining both versions, then continue to support both versions

and thanks for the invitation, but I'm sure you get more than enough pressure as it is, and the pressure that you're getting is probably much more commercially oriented

best regards . . . /phil
Gosi wrote:
> On Mar 6, 2:53 pm, phil chastney
> <phil.hates.s...@amadeus.munged.eclipse.co.uk> wrote:
>
> It was you who suggested that technical symbols could be used in
> Unicode.
>
> That is why I asked if those symbols could be used to translate to APL
> functionality

I'm still baffled

Unicode is an encoding, it maps abstract characters and symbols to hexadecimal values, nothing more (actually, there is a lot of ancillary stuff as well, but we'll ignore that, pro tem)

most known characters and quite a lot of technical symbols can be mapped to hexadecimal values using Unicode

the technical symbols include every published APL symbol ever used in any way

it also includes just about every mathematical symbol the American Mathematical Society could think of

so a stream of mathematical symbols could be encoded using Unicode values, and (if one exists) a semantically equivalent stream of APL symbols could also be encoded using Unicode values

I'm sure you knew that

Unicode has no functional power, and in particular, it doesn't have markup, so I should amend that earlier statement to "a _linear_ stream of mathematical symbols..."

a stream of mathematical symbols, with embedded markup (presumably ASCII-only), could also be encoded using Unicode values

conversion to a semantically equivalent stream of APL symbols could be done by hand, or using a programming language[1], but I doubt very much if a general solution exists

so, yes, some 2-D mathematical expressions could be translated into executable APL using only Unicode values, which is my present best guess at what you were asking

and if I've misunderstood you, my apologies

all the best . . . /phil

[1] note that there is no requirement for the programming language to know anything about Unicode -- Unicode 1.0 has a copyright date (c) 1990, 1991 and at that time (before the introduction of wchar_t) some people were already using unsigned ints to store Unicode values
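The point that Unicode only assigns numbers to symbols can be illustrated with a few lines of Python. The choice of symbols is arbitrary and made for this sketch. The list of plain integers plays the role of the "unsigned ints" mentioned in the footnote: nothing in it carries layout or meaning beyond the identity of each character.

```python
import unicodedata

# One mathematical symbol and three APL symbols, written as escapes
# so the example does not depend on the file's own encoding.
symbols = "\u2211\u2373\u2374\u234B"

# The encoding step: each symbol becomes a plain integer (its code point).
code_points = [ord(ch) for ch in symbols]

# The character names come from the Unicode character database.
names = [unicodedata.name(ch) for ch in symbols]

# Going back from integers to characters loses nothing.
round_trip = "".join(chr(cp) for cp in code_points)

for cp, name in zip(code_points, names):
    print(f"U+{cp:04X}  {name}")
```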
On Mar 7, 8:58 pm, phil chastney
<phil.hates.s...@amadeus.munged.eclipse.co.uk> wrote:
> Gosi wrote:
> <snip>

For example:
\sum_{k=1}^{n}{a_k} means a1 + a2 + ... + an.
That is the Unicode sign U+2211 (the summation sign), with k=1 below it and n above it
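As a rough sketch of what translating such an expression might involve, the Python below recognises only this one linear markup pattern and evaluates it. The function name `eval_sum`, the regular expression, and the convention that `a_1` is the first element of the list are assumptions made for the example; it is nowhere near a general translator.

```python
import re

# Matches exactly the shape \sum_{k=LO}^{HI}{a_k}; nothing more general.
SUM_PATTERN = re.compile(
    r"\\sum_\{([A-Za-z])=(\w+)\}\^\{(\w+)\}\{([A-Za-z]+)_([A-Za-z])\}"
)


def eval_sum(markup: str, env: dict) -> int:
    """Evaluate a markup string of the form \\sum_{k=LO}^{HI}{a_k}.

    `env` supplies values for named bounds and for the sequence being
    summed; a_1 is taken to be the first element of that sequence.
    """
    match = SUM_PATTERN.fullmatch(markup)
    if match is None:
        raise ValueError("unsupported markup: " + markup)
    index, low, high, array, subscript = match.groups()
    if subscript != index:
        raise ValueError("subscript does not match the summation index")

    def bound(token: str) -> int:
        return int(token) if token.isdigit() else env[token]

    values = env[array]
    return sum(values[k - 1] for k in range(bound(low), bound(high) + 1))


print(eval_sum(r"\sum_{k=1}^{n}{a_k}", {"n": 4, "a": [10, 20, 30, 40]}))  # 100
```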
Gosi wrote:
> On Mar 7, 8:58 pm, phil chastney
> <phil.hates.s...@amadeus.munged.eclipse.co.uk> wrote:
>
> For example:
>
> \sum_{k=1}^{n}{a_k} means a1 + a2 + ... + an.
>
> Unicode sign 2211 and below it k=1 and above it n

it's important to keep your levels of detail clear here

"below it" and "above it" are layout specifications, and for that you
need markup, which is not part of an encoding scheme

there is another layout convention, which places "k=1" level with the
lower horizontal of the Sigma, and the "n" level with the upper horizontal

one convention is (normally) used for displayed formulae, the other for
inline formulae (i.e., embedded in plain text)

as Sam Sirlin has pointed out, there are various markup schemes

your chosen markup language may or may not support both conventions, but
your hypothetical translator surely must...?!

does that help? . . . /phil
On Mar 8, 1:33 am, phil chastney <phil.hates.s...@amadeus.munged.eclipse.co.uk> wrote:
> Gosi wrote:
> <snip>

In the case of the Sigma we need to interpret three lines together
First line with the n
Second Sigma
Third k=1

In general the formulas would be written in a grid like fashion and operations grouped together
Interesting to interpret formulas like

               1
    -----------------------  =  1 % (y + 3) * ( x - 2) % 10
     (y + 3) * ( x - 2)
                ---------
                   10

The formula written in 5 lines
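The linear form on the right-hand side can be checked with a small evaluator. The sketch below assumes that `%` denotes division and that the expression is read strictly right to left with no operator precedence, as in J and APL; the post itself does not state this. The name `evaluate_rtl` and the sample values for x and y are invented for the example, and malformed input is not validated.

```python
import re
from fractions import Fraction
from operator import add, mul, sub, truediv

# Assumed meaning of the operators; "%" is taken to be division.
OPS = {"+": add, "-": sub, "*": mul, "%": truediv}
TOKEN = re.compile(r"\d+|[A-Za-z_]\w*|[-+*%()]")


def evaluate_rtl(source: str, env: dict) -> Fraction:
    """Evaluate `source` right to left: each operator takes everything
    to its right as its right-hand argument."""
    tokens = TOKEN.findall(source)
    pos = 0

    def operand() -> Fraction:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = expression()
            pos += 1  # step over the closing ")"
            return value
        if tok.isdigit():
            return Fraction(int(tok))
        return Fraction(env[tok])

    def expression() -> Fraction:
        nonlocal pos
        left = operand()
        if pos < len(tokens) and tokens[pos] in OPS:
            op = OPS[tokens[pos]]
            pos += 1
            return op(left, expression())
        return left

    return expression()


values = {"x": 7, "y": 2}
linear = evaluate_rtl("1 % (y + 3) * ( x - 2) % 10", values)

# The five-line layout, read as 1 over ((y + 3) * (x - 2) / 10).
x, y = values["x"], values["y"]
two_dimensional = Fraction(1) / (Fraction((y + 3) * (x - 2), 10))

print(linear, two_dimensional)  # 2/5 2/5
```

Exact fractions are used so that the two readings can be compared without floating-point rounding.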
Gosi wrote:
> <snip>
>
> In the case of the Sigma we need to interpret three lines together
> First line with the n
> Second Sigma
> Third k=1
>
> In general the formulas would be written in a grid like fashion and
> operations grouped together
> Interesting to interpret formulas like
>
>                1
>     -----------------------  =  1 % (y + 3) * ( x - 2) % 10
>      (y + 3) * ( x - 2)
>                 ---------
>                    10
>
> The formula written in 5 lines

true -- and you will need to apply precedence rules to expressions like sigma-x-squared

the difficulties you will face are dependent on the markup language being used, but independent of encoding

I wish you and yours the best of luck with this endeavour, but you can count me out -- I hate the inconsistencies of mathematical notation

all the best . . . /phil
On Mar 8, 9:59 am, phil chastney <phil.hates.s...@amadeus.munged.eclipse.co.uk> wrote:
> Gosi wrote:
> <snip>
>
> true -- and you will need to apply precedence rules to expressions
> like sigma-x-squared
>
> the difficulties you will face are dependent on the markup language
> being used, but independent of encoding

I was thinking that it might be interesting to do it the other way round.
Take APL lines of code and display them in mathematical terms.
After a short consideration I do not think that would be easy.
A small subset in either direction might be easy and interesting.
Probably more as an exercise to teach APL