Using Regular Expressions to process strings with unicode
I know that this is a very common problem (but I am pretty new to
regular expressions), but I just can't seem to figure it out. I am
trying to use a regular expression to match a string that
contains a Unicode escape. A really simplified example of the kind of
string I am trying to match is:
<HTML><CENTER>Wandless ZIP\u2122
I have tried a bunch of different things, like separating the Unicode
escape in my regex into separate brackets, parentheses, etc., but it
either fails to match or throws an "Unclosed character class"
exception (after the "\u" part). I think that when the regex is
being matched, it translates the Unicode escape to the actual character,
and so does not match the text in the string.
Can anyone offer me some help?
Thanks in advance.
Oliver Wong 2006-05-25, 7:06 pm
"hust6" <mhust6@gmail.com> wrote in message
news:1148584725.807574.146810@u72g2000cwu.googlegroups.com...
> I know that this is a very common problem (but I am pretty new to
> regular expressions), but I just can't seem to figure it out. I am
> trying to use a regular expression to match a string that
> contains a Unicode escape. A really simplified example of the kind of
> string I am trying to match is:
>
> <HTML><CENTER>Wandless ZIP\u2122
>
> I have tried a bunch of different things, like separating the Unicode
> escape in my regex into separate brackets, parentheses, etc., but it
> either fails to match or throws an "Unclosed character class"
> exception (after the "\u" part). I think that when the regex is
> being matched, it translates the Unicode escape to the actual character,
> and so does not match the text in the string.
>
> Can anyone offer me some help?
To clarify, you have some strings, one example of which is
"<HTML><CENTER>Wandless ZIP\u2122", and you're trying to match it against
some regular expressions which you haven't shown. Am I right so far?
If so, then it might help if you told us what you're trying to check.
That the string is composed of all uppercase letters? That it is composed of
only numerals? Something else?
- Oliver
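
Since the regular expressions were never posted, the following is only a sketch of the two readings the question allows, not the original poster's code. It assumes the goal is to check that a string ends with "ZIP" followed by the trademark sign, and the class and method names (UnicodeRegexSketch, endsWithTrademarkChar, endsWithLiteralEscape) are made up for illustration. Case 1 assumes the string holds the real character U+2122; case 2 assumes it holds the six literal characters backslash, u, 2, 1, 2, 2 (for example, text read from a file).

```java
import java.util.regex.Pattern;

public class UnicodeRegexSketch {

    // Case 1: the string holds the real trademark character (U+2122).
    static boolean endsWithTrademarkChar(String s) {
        // The Java literal "\\u2122" reaches the regex engine as
        // backslash, u, 2122, which the engine reads as the single
        // character U+2122.
        return Pattern.compile("ZIP\\u2122$").matcher(s).find();
    }

    // Case 2: the string holds six literal characters:
    // backslash, u, 2, 1, 2, 2.
    static boolean endsWithLiteralEscape(String s) {
        // Pattern.quote makes the regex engine treat every character
        // literally, so the backslash is not read as an escape.
        String quoted = Pattern.quote("ZIP\\u2122");
        return Pattern.compile(quoted + "$").matcher(s).find();
    }

    public static void main(String[] args) {
        // The compiler turns this escape into the real character.
        String real = "<HTML><CENTER>Wandless ZIP\u2122";
        // The doubled backslash keeps the escape as literal text.
        String literal = "<HTML><CENTER>Wandless ZIP\\u2122";

        System.out.println(endsWithTrademarkChar(real));     // true
        System.out.println(endsWithTrademarkChar(literal));  // false
        System.out.println(endsWithLiteralEscape(real));     // false
        System.out.println(endsWithLiteralEscape(literal));  // true
    }
}
```

Which case applies depends on where the string comes from: a Java string literal written with a single backslash before the u is converted by the compiler before the regex is ever involved, while text read at run time keeps the backslash as an ordinary character.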