Code Comments

a unicode question?
Hello,
I have a Unicode string that I want to convert to an ANSI string, but
the conversion raises an exception.
Could you help me?

##  I want to change s1 to s2.

s1 = u'\xd6\xd0\xb9\xfa\xca\xaf\xbb\xaf(600028) '

s2 = '\xd6\xd0\xb9\xfa\xca\xaf\xbb\xaf(600028) '


zdwang@xinces.com
04-10-06 03:03 AM


Re: a unicode question?
What do you mean by "ansi string"?

Here is a superficially not-unreasonable answer to your more specific
question:

# >>> s1 = u'\xd6\xd0\xb9\xfa\xca\xaf\xbb\xaf(600028) '
# >>> s2 = '\xd6\xd0\xb9\xfa\xca\xaf\xbb\xaf(600028) '
# >>> s3 = s1.encode('latin1')
# >>> s2 == s3
# True

But what are you really trying to achieve? Where does your Unicode data
come from? What ranges of characters do you expect it to contain? You
need to crunch it into an 8-bit representation because ... what?
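
For readers following along in Python 3, here is a minimal sketch of the same round trip. It assumes the original exception came from an implicit ASCII conversion (the first post does not say which call failed), and it uses Python 3 naming, where `str` is Unicode text and `bytes` is the 8-bit form.

```python
# Python 3 sketch: str is Unicode text, bytes is the 8-bit representation.
s1 = '\xd6\xd0\xb9\xfa\xca\xaf\xbb\xaf(600028) '    # code points U+00D6, U+00D0, ...
s2 = b'\xd6\xd0\xb9\xfa\xca\xaf\xbb\xaf(600028) '   # the byte string the OP wants

# ASSUMPTION: the OP's exception was an ASCII encoding failure.
# ASCII only covers code points 0x00-0x7F, so this raises UnicodeEncodeError.
try:
    s1.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# latin1 maps code points 0x00-0xFF one-to-one onto byte values 0x00-0xFF,
# which is why it reproduces s2 exactly.
s3 = s1.encode('latin1')
print(s3 == s2)
```

This only works because every code point in s1 is below 0x100; it says nothing about whether latin1 is the right encoding for the data.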


John Machin
04-10-06 03:03 AM


Re: a unicode question?
Mr. John Machin, Thank you very much!


zdwang@xinces.com
04-10-06 03:03 AM


Re: a unicode question?
Mr. John Machin

This question comes from the following code. I use PyXML to build a DOM
tree.

from xml.dom.ext.reader import HtmlLib
doc = HtmlLib.FromHtmlUrl('http://stock.business.sohu.com/q/nbcg.php?code=600028')
title_elem = doc.documentElement.getElementsByTagName("TITLE")[0]
title_string = title_elem.firstChild.data
print title_string

# title_string is unicode, but it is not "latin1" encoded, so I want to
# change it.


zdwang@xinces.com
04-10-06 03:03 AM


Re: a unicode question?
zdwang@xinces.com wrote:
> Mr. John Machin
>
> This question comes from the following code. I use PyXML to build a DOM
> tree.
>
> from xml.dom.ext.reader import HtmlLib
> doc = HtmlLib.FromHtmlUrl('http://stock.business.sohu.com/q/nbcg.php?code=600028')
> title_elem = doc.documentElement.getElementsByTagName("TITLE")[0]
> title_string = title_elem.firstChild.data
> print title_string
>
> # title_string is unicode, but it is not "latin1" encoded, so I want to
> # change it.

Errr, but the title of the page is written in Chinese and it is not
supposed to be crammed into latin1 encoding. What are you trying to do
with the string after you squeezed Chinese into latin1?
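
A short Python 3 sketch of why Chinese text cannot be crammed into latin1. The title text here is taken from the decoded page quoted later in the thread; the variable names are only illustrative.

```python
# The first part of the page title as real Unicode text (Python 3 str).
title = '\u4e2d\u56fd\u77f3\u5316(600028)'

# latin1 has no byte values for code points above U+00FF, so this fails.
try:
    title.encode('latin1')
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# An encoding designed for Chinese, such as gb2312, handles it fine.
gb_bytes = title.encode('gb2312')
print(latin1_ok)
```

If an 8-bit form of the title is needed at all, it has to be in an encoding that actually contains these characters.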


Serge Orlov
04-10-06 09:06 AM


Re: a unicode question?
Errrrrrrr, it gets worse: not only is the title written in Chinese, it
is encoded as gb2312 -- here is the repr() of the first few chunks:

"<html>\n<head>\n    <title> \xd6\xd0\xb9\xfa\xca\xaf\xbb\xaf(600028) :
\xc4\xda\xb2\xbf\xc8\xcb\xd4\xb1\xb3\xd6\xb9\xc9 -
\xcb\xd1\xba\xfc\xb9\xc9\xc6\xb1</title>\n<meta http-equiv='Content-Type'
content='text/html; charset=gb2312'>\n"

and here is what you get after that_guff.decode('gb2312')

u"<html>\n<head>\n    <title>\u4e2d\u56fd\u77f3\u5316(600028) :
\u5185\u90e8\u4eba\u5458\u6301\u80a1 - \u641c\u72d0\u80a1\u7968</title>\n<meta
http-equiv='Content-Type' content='text/html; charset=gb2312'>\n"

The first 2 characters of the title are recognisable both visually on
the browser title and in the unicode as "zhong guo" i.e. China.
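
The decode step above can be sketched in Python 3 roughly as follows. `decode_html` is a hypothetical helper, not part of PyXML or the standard library; it assumes the page declares its charset in a meta tag, as this page does, and falls back to latin1 otherwise. An unrecognised charset name would raise LookupError.

```python
import re

# Raw page bytes as quoted in the post above (only the first few chunks).
raw = (b"<html>\n<head>\n    <title> "
       b"\xd6\xd0\xb9\xfa\xca\xaf\xbb\xaf(600028) : "
       b"\xc4\xda\xb2\xbf\xc8\xcb\xd4\xb1\xb3\xd6\xb9\xc9 - "
       b"\xcb\xd1\xba\xfc\xb9\xc9\xc6\xb1</title>\n"
       b"<meta http-equiv='Content-Type' "
       b"content='text/html; charset=gb2312'>\n")


def decode_html(raw_bytes, fallback='latin1'):
    """Hypothetical helper: decode with the charset the page itself declares."""
    match = re.search(rb"charset=['\"]?([A-Za-z0-9_-]+)", raw_bytes)
    encoding = match.group(1).decode('ascii') if match else fallback
    return raw_bytes.decode(encoding)


text = decode_html(raw)
# Pull out the title, trimming the whitespace around it.
title = re.search(r"<title>\s*(.*?)\s*</title>", text).group(1)
print(ascii(title))
```

Decoding the bytes before any parsing means the title arrives as proper Unicode, with each Chinese character as a single code point.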

BUT the OP's first message is interpreting that gb2312-encoded stuff as
Unicode:
s1 = u'\xd6\xd0\xb9\xfa\xca\xaf\xbb\xaf(600028) '

*SOMEBODY* is seriously deluded, and it ain't me, and it ain't Serge
:-)

... and yes Peter, info also travels faster from China than it does
from Armenia :-())
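
If a string has already been mis-decoded the way s1 was (each gb2312 byte treated as a separate Unicode code point), a possible repair in Python 3 is to go back to bytes via latin1 and decode again with the real encoding. This is a sketch that assumes every code point in the damaged string is below U+0100 and that the bytes really are gb2312.

```python
# The mis-decoded string from the first post: gb2312 bytes read as code points.
s1 = '\xd6\xd0\xb9\xfa\xca\xaf\xbb\xaf(600028) '

# Step 1: latin1 turns each code point 0x00-0xFF back into the same byte value.
raw = s1.encode('latin1')

# Step 2: decode those bytes with the encoding the page actually declared.
fixed = raw.decode('gb2312')

# fixed is now '\u4e2d\u56fd\u77f3\u5316(600028) ', four Chinese characters
# followed by the stock code.
print(len(s1), len(fixed))
```

The cleaner fix is upstream: decode the page bytes as gb2312 before they ever reach the DOM, so that this repair is never needed.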


John Machin
04-10-06 01:15 PM

