Home > Archive > PERL Beginners > October 2006 > spaced text
| Norbert_L. 2006-10-30, 7:03 pm |
| Would someone please guide me with this problem, for which I have not found
the slightest hint on the web:
I have texts where some words are s p a c e d by adding blanks between
characters. I am looking for a regular expression to get rid of those
spaces.
If this is not the right group for this question, could someone please
guide me to a better one?
Thank you very much in advance.
| |
| Paul Lalli 2006-10-30, 7:03 pm |
| Norbert_L. wrote:
> Would someone please guide me with this problem for which I found not
> the slightest hint on the web:
>
> I have texts where some words are s p a c e d by adding blanks between
> characters. I am looking for a regular expression to get rid of those
> spaces.
>
> If this is no good group to ask this question, could someone please
> guide me to a better one?
How do you determine which sequences of characters are "spaced words" and
which are just one-letter words? How do you determine where one
"spaced word" ends and another begins?
H o w m a n y w o r d s d o I h a v e i n a s e n t e n c e l i k e t h
i s ?
Until you better define your problem set, there's no way to answer your
question.
Paul Lalli
| |
| Norbert_L. 2006-10-30, 7:03 pm |
| Oh, I apologize - I use a language that has no one-letter words
(German, that is). So I thought about looking for two letters or a
punctuation mark, then at least one whitespace, ... and at the end a
whitespace and an end-of-line, end-of-page, or printable character. My
problem is how to match the unknown number of [alpha blank] sequences.
Paul Lalli wrote:
> Norbert_L. wrote:
>
> How do you determine which sequences of characters are "spaced words" and
> which are just one-letter words? How do you determine where one
> "spaced word" ends and another begins?
>
> H o w m a n y w o r d s d o I h a v e i n a s e n t e n c e l i k e t h
> i s ?
>
> Until you better define your problem set, there's no way to answer your
> question.
>
> Paul Lalli
| |
| nobull67@gmail.com 2006-10-30, 7:03 pm |
|
On Oct 25, 3:14 pm, "Norbert_L." <n...@web.de> top-posted:
[ please don't top post, it's rude ]
> Oh, I apologize - I use a language that has no one-letter words
> (German, that is). So I thought about looking for two letters or a
> punctuation mark, then at least one whitespace, .... and at the end a
> whitespace and an end-of-line | -page | printable character. My problem
> is to access the unknown number of sequences [alpha blank].
Isn't your question just how to remove all the spaces that are between
single letters?
s{                                  # look for...
    (?<![[:alpha:]])                # Not preceded by an alpha
    ([[:alpha:]])                   # One letter
    \                               # One space
    (?=[[:alpha:]](?![[:alpha:]]))  # One and _only_ one letter
}{$1}xg;
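
For anyone who wants to try this out, here is a minimal sketch that wraps the substitution in a small subroutine. The name despace and the sample strings are only illustrative assumptions, not anything from the original question. It also shows the limitation Paul raised: when every word in a line is spaced, the words run together.

```perl
use strict;
use warnings;

# Hypothetical helper (the name "despace" is an assumption for this sketch):
# removes each single space that sits between two isolated letters.
sub despace {
    my ($text) = @_;
    $text =~ s{
        (?<![[:alpha:]])                # not preceded by a letter
        ([[:alpha:]])                   # one letter
        \                               # one space
        (?=[[:alpha:]](?![[:alpha:]]))  # one and only one letter follows
    }{$1}xg;
    return $text;
}

# A spaced word inside normal text is collapsed; its neighbours are untouched.
print despace("some words are s p a c e d by adding blanks"), "\n";

# Several spaced words separated by single blanks cannot be told apart.
print despace("H o w m a n y w o r d s"), "\n";
```

The first line should print "some words are spaced by adding blanks" and the second "Howmanywords". For German text with umlauts, the input probably needs to be decoded into Perl character strings first (for example by opening the file with an encoding layer), otherwise [[:alpha:]] may not treat those letters as letters.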
| |