CJK Unified unicode translator
Dennis Roesler 2006-03-22, 6:58 pm
Does anyone know of a translator, preferably one implemented in Perl,
that will translate CJK Unified code points to their respective language
code points? If I understand the concept of the CJK Unified code
points correctly, these code points render as glyphs that are basically
the same in Chinese, Japanese or Korean.
My problem is that the target application I'm working with can't render
the CJK Unified code points because it is expecting, and can only handle,
JIS code points, but some of the data being fed to it is in CJK Unified
code points.
My search through CPAN didn't show anything obvious, at least to me.
Any pointers or suggestions would be appreciated.
Dennis
d underscore roesler at agilent dot com
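A quick way to see what "translating" CJK Unified code points to JIS involves: Perl's Encode module can report whether a given ideograph has a Shift_JIS mapping at all. This is a minimal sketch assuming the text is already decoded into Perl character strings; `fits_shiftjis` is a hypothetical helper name, and the two sample code points are illustrative choices (U+65E5 is in JIS X 0208, while U+8BF4 is a simplified Chinese form with no JIS X 0208 equivalent).

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: returns 1 if every character in $text has a mapping
# in Encode's "shiftjis" table, 0 otherwise. FB_CROAK makes encode() die on
# the first unmappable character; LEAVE_SRC keeps it from consuming $text.
sub fits_shiftjis {
    my ($text) = @_;
    my $ok = eval {
        Encode::encode('shiftjis', $text, Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    };
    return $ok ? 1 : 0;
}

# Check one Japanese-compatible ideograph and one Chinese-only ideograph.
for my $cp (0x65E5, 0x8BF4) {
    printf "U+%04X: %s\n", $cp,
        fits_shiftjis(chr $cp) ? 'has a Shift_JIS mapping' : 'no Shift_JIS mapping';
}
```

Characters that fail this check have no JIS equivalent to translate to, so any converter has to substitute something else for them.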
harryfmudd [AT] comcast [DOT] net 2006-03-23, 3:58 am
Have you looked at the Encode module? It might be as simple as opening
an input file specifying the CJK encoding, an output file specifying
JIS, and reading and writing. See "Encoding via PerlIO" for this
particular slant on things.
Tom Wyant
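The PerlIO approach described above can be tried without touching real files. This is a minimal sketch that uses in-memory file handles (an assumption made only to keep it self-contained); `copy_lines` is a hypothetical helper, and all of the transcoding is done by the `:encoding(...)` layers on the two handles, not by the copy loop.

```perl
use strict;
use warnings;

# Hypothetical helper: copy lines from one handle to another. Decoding and
# re-encoding happen in the PerlIO :encoding() layers on the handles.
sub copy_lines {
    my ($in, $out) = @_;
    print {$out} $_ while <$in>;
}

# In-memory "files": UTF-8 bytes in, Shift_JIS bytes out.
my $utf8_bytes = "\xE6\x97\xA5\xE6\x9C\xAC\n";    # U+65E5 U+672C as UTF-8
my $sjis_bytes = '';

open my $in,  '<:encoding(UTF-8)',    \$utf8_bytes or die "in: $!";
open my $out, '>:encoding(shiftjis)', \$sjis_bytes or die "out: $!";
copy_lines($in, $out);
close $in;
close $out;    # flush the encoding layer before inspecting the buffer

# Show the resulting bytes in hex.
print join(' ', map { sprintf '%02X', ord } split //, $sjis_bytes), "\n";
```

This works only as long as every character read has a Shift_JIS mapping; a character without one triggers the same kind of complaint described in the next post.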
Dennis Roesler 2006-03-23, 7:58 am
harryfmudd [AT] comcast [DOT] net wrote:
> Dennis Roesler wrote:
I found Unicode::Unihan after more research introduced me to the term "Unihan" :-(.
>
> Have you looked at the Encode module? It might be as simple as opening
> an input file specifying the CJK encoding, an output file specifying
> JIS, and reading and writing. See "Encoding via PerlIO" for this
> particular slant on things.
I've looked at this, but there doesn't seem to be an encoding that is
CJK Unified specific.
I've tried the following, based on an example from the Encode docs, but
when I write the data out it complains about the CJK characters that don't
map to shiftjis. I discard the XML declaration line and rewrite it with the
encoding set to Shift_JIS, but besides the above errors, XML::Simple
complains that it can't find the Shift_JIS encoding and won't parse the file.
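use strict;
use warnings;
# Input is UTF-8, output is Shift_JIS; the :encoding layers do the conversion.
my ($infile, $outfile) = @ARGV;
open my $in,  "<:encoding(UTF-8)",    $infile  or die "In $infile: $!";
open my $out, ">:encoding(shiftjis)", $outfile or die "Out $outfile: $!";
my $fline = <$in>;    # skip the original XML declaration line
print $out qq~<?xml version="1.0" encoding="Shift_JIS"?>\n~;
while (<$in>) { print $out $_; }
close $in;
close $out or die "Out $outfile: $!";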
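Since the errors come from characters that have no Shift_JIS mapping at all, one possible workaround (a sketch, not something proposed in the thread) is to encode each line explicitly with a fallback and write to a `:raw` handle. With `Encode::FB_XMLCREF`, unmappable characters are written as XML numeric character references, which are valid in an XML document whatever its declared encoding. Whether the target application resolves such references is an assumption that would need checking; `to_sjis_xml` and `convert_file` are hypothetical names.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: encode a line of Unicode text as Shift_JIS bytes.
# Characters with no shiftjis mapping become XML character references
# (&#x....;) instead of causing an error.
sub to_sjis_xml {
    my ($text) = @_;
    return Encode::encode('shiftjis', $text, Encode::FB_XMLCREF());
}

# Hypothetical driver: read a UTF-8 XML file and write a Shift_JIS copy.
sub convert_file {
    my ($infile, $outfile) = @_;
    open my $in,  '<:encoding(UTF-8)', $infile  or die "In $infile: $!";
    open my $out, '>:raw',             $outfile or die "Out $outfile: $!";
    my $old_decl = <$in>;    # assumes line 1 is the original XML declaration
    print {$out} qq{<?xml version="1.0" encoding="Shift_JIS"?>\n};
    print {$out} to_sjis_xml($_) while <$in>;
    close $in;
    close $out or die "Out $outfile: $!";
}

# Run as: perl script.pl in.xml out.xml
convert_file(@ARGV) if @ARGV == 2;
```

The output handle is `:raw` because `to_sjis_xml` already returns encoded bytes; stacking an `:encoding(shiftjis)` layer on top would encode them a second time.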
I could change the workflow and have XML::Simple handle the UTF-8 file
and then use Encode's from_to function, or Unicode::Unihan, to do the
conversion.
Dennis
d underscore roesler at agilent dot com
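On the from_to idea: `Encode::from_to` works on byte strings, converting them in place and returning the new length in bytes (undef on failure). Data that an XML parser has already decoded into Perl character strings (which is what XML::Parser-based modules such as XML::Simple typically return, an assumption worth verifying for the backend in use) would go through `encode` instead. A minimal sketch of both, using a sample character chosen for illustration:

```perl
use strict;
use warnings;
use Encode ();

# from_to() rewrites a string of *bytes* in place, here from UTF-8 to
# Shift_JIS, and returns the length of the result in bytes.
my $bytes = "\xE6\x97\xA5";    # U+65E5 encoded as UTF-8
my $len   = Encode::from_to($bytes, 'UTF-8', 'shiftjis');
die "conversion failed\n" unless defined $len;
printf "%d bytes: %s\n", $len, join ' ', map { sprintf '%02X', ord } split //, $bytes;

# Already-decoded character data has no "from" encoding any more,
# so it is converted with encode() instead.
my $chars = "\x{65E5}";
my $sjis  = Encode::encode('shiftjis', $chars);
printf "%d bytes: %s\n", length $sjis, join ' ', map { sprintf '%02X', ord } split //, $sjis;
```

Either route still has to deal with characters that have no Shift_JIS mapping; the fourth argument of `from_to` and the third of `encode` accept the same fallback constants.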