Code Comments
How do I write a pattern for removing roman numerals? The first 10 is enough.
Thanks,
Siegfried
On Jun 2, Siegfried Heintze said:
> How do I write a pattern for removing roman numerals? The first 10 is
> enough.
Well, the first ten roman numerals are:
I, II, III, IV, V, VI, VII, VIII, IX, X
Just put those in a regex.
s/\b(I|II|...)\b//g;
would remove roman numerals, provided they aren't touching any word
characters.
--
Jeff "japhy" Pinyan % How can we ever be the sold short or
RPI Acacia Brother #734 % the cheated, we who for every service
http://japhy.perlmonk.org/ % have long ago been overpaid?
http://www.perlmonks.org/ % -- Meister Eckhart
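For illustration, here is a minimal sketch of that idea with the alternation spelled out in full. It assumes the "..." in the substitution above stands for the rest of the listed numerals; the variable names and the sample text are made up for the example.

```perl
use strict;
use warnings;

# The first ten roman numerals, as listed in the post above.
my @numerals = qw(I II III IV V VI VII VIII IX X);

# Build the alternation from the list. Because \b follows the group,
# the engine backtracks until a whole numeral is matched, so the order
# of the alternatives does not matter here.
my $alternation = join '|', @numerals;

# Hypothetical sample text, not taken from the thread.
my $text = 'Chapter III and IX, not Ishmael';

(my $stripped = $text) =~ s/\b(?:$alternation)\b//g;

print "$stripped\n";
```

Only the numerals themselves are removed, so the surrounding spaces and punctuation stay behind. A standalone "I" used as a pronoun would be removed as well.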
On 6/3/05, Jeff 'japhy' Pinyan <japhy@perlmonk.org> wrote:
> On Jun 2, Siegfried Heintze said:
>
>
> Well, the first ten roman numerals are:
>
> I, II, III, IV, V, VI, VII, VIII, IX, X
>
> Just put those in a regex.
>
> s/\b(I|II|...)\b//g;
>
> would remove roman numerals, provided they aren't touching any word
> characters.
This isn't going to get them all; it says to match (between word
boundaries) "I" or "II" or any three non-newlines. So it will catch
"I", "II", "III", and "VII". It will also catch "I" where it's a
pronoun (assuming this is an English text file), and any three-letter
words/constructs.
I would try something like this:
s/\bI(?:I+|V|X)?|VI*|XI*\b//
Note that this will also match "I". You may want to go through and get
those by hand instead if there is any chance of "I" having another
function.
If you can identify the context where the numerals appear, you can make
it easier on yourself.
HTH,
-- jay
--------------------
daggerquill [at] gmail [dot] com
http://www.engatiki.org
On Jun 3, Jay Savage said:
> On 6/3/05, Jeff 'japhy' Pinyan <japhy@perlmonk.org> wrote:
>
>
> This isn't going to get them all; it says to match (between word
> boundaries) "I" or "II" or any three non-newlines. So it will catch
> "I", "II", "III", and "VII". It will also catch "I" where it's a
> pronoun (assuming this is an english text file), and any three-letter
> words/constructs.
I'm sorry, that regex wasn't meant to be taken literally. I just didn't
feel the need to reproduce the alternations *again*.
> I would try something like this:
>
> s/\bI(?:I+|V|X)?|VI*|XI*\b//
This will get rid of the "I" in "Ishmael". Your \b anchors aren't
effective on the *entire* pattern. You're matching
\bI(?:I+|V|X)?
or
VI*
or
XI*\b
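To make the precedence point concrete, here is a small sketch comparing the pattern as posted with a variant where all three alternatives sit inside one (?:...) group, so that both \b anchors apply to every branch. The sample sentence and variable names are invented for the demonstration, and the grouped variant is only an illustration of the anchoring fix, not a recommended final pattern.

```perl
use strict;
use warnings;

# Hypothetical sample text, not taken from the thread.
my $text = 'Call me Ishmael. Chapter IV follows.';

# As posted: the leading \b belongs only to the first alternative and
# the trailing \b only to the last one.
(my $ungrouped = $text) =~ s/\bI(?:I+|V|X)?|VI*|XI*\b//;

# Grouped: \b(?:...)\b anchors the whole alternation on both sides.
(my $grouped = $text) =~ s/\b(?:I(?:I+|V|X)?|VI*|XI*)\b//;

print "$ungrouped\n";   # Call me shmael. Chapter IV follows.
print "$grouped\n";     # Call me Ishmael. Chapter  follows.
```

Without the group, the first alternative has no trailing \b, so it is satisfied by the capital "I" at the start of "Ishmael" and never reaches "IV".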
The regex I would use would probably be
/\b(?:I{1,3}|IV|VI{0,3}|I?X)\b/
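As a quick check of that pattern, the sketch below runs it against the ten numerals from the start of the thread and against a line that mixes numerals with ordinary words. The variable names and the sample line are assumptions made for the example.

```perl
use strict;
use warnings;

# The pattern from the post above, compiled once with qr//.
my $roman = qr/\b(?:I{1,3}|IV|VI{0,3}|I?X)\b/;

# Each of the first ten numerals should match in full.
my @numerals = qw(I II III IV V VI VII VIII IX X);
my @matched  = grep { /^$roman$/ } @numerals;
print scalar(@matched), " of ", scalar(@numerals), " numerals match\n";

# Hypothetical sample line: names starting with I, V or X must survive.
my $line = 'Part VII: Ishmael, Vivian and Xavier (see section IX)';
(my $cleaned = $line) =~ s/$roman//g;
print "$cleaned\n";   # Part : Ishmael, Vivian and Xavier (see section )
```

The pattern still removes a standalone "I", so the earlier caveat about the English pronoun applies to it as well.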
--
Jeff "japhy" Pinyan % How can we ever be the sold short or
RPI Acacia Brother #734 % the cheated, we who for every service
http://japhy.perlmonk.org/ % have long ago been overpaid?
http://www.perlmonks.org/ % -- Meister Eckhart