Convert plain text to html formatted text
| Justin C 2006-04-25, 9:57 pm |
|
To keep users away from the dirty end of html/perl-cgi I want to
automate some web-pages. The easiest way I can see of doing this is that
the users submit their data for this web-site in plain text. I know the
first line will always be <h1> and the second line onwards will be just
plain <p>, each paragraph will be separated by an extra CR (these text
files will be generated on OS X clients so I shouldn't have trouble with
MS CR/LF combinations).
I can see how I can get the first line to format it accordingly but I'm
not sure how to read the rest of the text, specifically: how to spot two
CRs (must remember not to chomp!).
So, I suppose the question is, how do I spot two consecutive CRs in a
string?
I have to do this without modules as the site will be hosted on a server
that has unknown modules and also on which I can't add or request
modules.
If external modules would make it all simpler then I can, probably,
pre-process the text files with a perl script locally before uploading
so they can just be dropped into the CGI output.
Thanks for any suggestions you care to make.
Justin.
--
Justin C, by the sea.
| |
| Gunnar Hjalmarsson 2006-04-25, 9:57 pm |
| Justin C wrote:
> how do I spot two consecutive CRs in a string?
/\n\n/
> I have to do this without modules as the site will be hosted on a server
> that has unknown modules and also on which I can't add or request
> modules.
You can always add modules, at least many of the pure Perl modules.
--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
| |
| Matt Garrish 2006-04-25, 9:57 pm |
|
"Justin C" <justin.0604@purestblue.com> wrote in message
news:107d.444e9f5b.ecbaa@stigmata...
>
> To keep users away from the dirty end of html/perl-cgi I want to
> automate some web-pages. The easiest way I can see of doing this is that
> the users submit their data for this web-site in plain text. I know the
> first line will always be <h1> and the second line onwards will be just
> plain <p>, each paragraph will be separated by an extra CR (these text
> files will be generated on OS X clients so I shouldn't have trouble with
> MS CR/LF combinations).
>
my @paras = split(/\r\r/, $userinput);   # assumes CR-only line endings; use /\n\n/ if the text has LFs
my $h1 = shift @paras;                    # the first block is the heading
foreach my $para (@paras) {
    # do whatever
}
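As a rough sketch of where this could lead, the split can be wrapped into a small converter that needs no modules. The names escape_html and text_to_html are made up for illustration, and the HTML escaping is an added assumption (nothing in the question asks for it). Line endings are normalised first so the same code copes with CR, LF or CR/LF input.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the characters that are special in HTML.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Hypothetical helper: first line becomes <h1>, each later block becomes <p>.
sub text_to_html {
    my ($input) = @_;
    $input =~ s/\r\n?/\n/g;                        # CR/LF and lone CR become LF
    my ($heading, $rest) = split /\n/, $input, 2;  # peel off the first line
    $heading = '' unless defined $heading;
    $rest    = '' unless defined $rest;

    my $html = '<h1>' . escape_html($heading) . "</h1>\n";
    for my $para (split /\n{2,}/, $rest) {         # blank line(s) separate paragraphs
        $para =~ s/^\s+|\s+$//g;                   # trim stray whitespace
        next unless length $para;                  # skip empty leftovers
        $html .= '<p>' . escape_html($para) . "</p>\n";
    }
    return $html;
}

print text_to_html("My Title\n\nFirst paragraph.\n\nSecond & last.\n");
```

Single line breaks inside a paragraph are left alone here; whether they should become <br> is a design choice the thread does not settle.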
Matt
| |
| Jim Gibson 2006-04-25, 9:57 pm |
| In article <107d.444e9f5b.ecbaa@stigmata>, Justin C
<justin.0604@purestblue.com> wrote:
> To keep users away from the dirty end of html/perl-cgi I want to
> automate some web-pages. The easiest way I can see of doing this is that
> the users submit their data for this web-site in plain text. I know the
> first line will always be <h1> and the second line onwards will be just
> plain <p>, each paragraph will be separated by an extra CR (these text
> files will be generated on OS X clients so I shouldn't have trouble with
> MS CR/LF combinations).
>
> I can see how I can get the first line to format it accordingly but I'm
> not sure how to read the rest of the text, specifically: how to spot two
> CRs (must remember not to chomp!).
>
> So, I suppose the question is, how do I spot two consecutive CRs in a
> string?
>
> I have to do this without modules as the site will be hosted on a server
> that has unknown modules and also on which I can't add or request
> modules.
>
> If external modules would make it all simpler then I can, probably,
> pre-process the text files with a perl script locally before uploading
> so they can just be dropped into the CGI output.
>
> Thanks for any suggestions you care to make.
If you are reading one line at a time, then two CRs will result in a
blank line being read for the second one, whether or not you use chomp.
You can also read more than one line at a time to detect the end of a
paragraph signified by two CRs: perldoc -q paragraphs "How can I read
in a file by paragraphs?"
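A minimal sketch of the paragraph-at-a-time approach, assuming the text uses LF line endings (paragraph mode does not recognise bare CRs). The helper name read_paragraphs and the sample text are invented for the example; an in-memory filehandle stands in for a real file.

```perl
use strict;
use warnings;

# Hypothetical helper: return the paragraphs found on an open filehandle.
sub read_paragraphs {
    my ($fh) = @_;
    local $/ = '';            # paragraph mode: a record ends at one or more blank lines
    my @paras;
    while (my $para = <$fh>) {
        chomp $para;          # in paragraph mode chomp strips all trailing newlines
        push @paras, $para;
    }
    return @paras;
}

# Demonstrate with an in-memory filehandle instead of a real file.
my $text = "Heading\n\nFirst paragraph,\nsecond line.\n\n\nLast paragraph.\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my @paras = read_paragraphs($fh);
close $fh;

print scalar(@paras), " paragraphs read\n";
```

Using local keeps the change to $/ confined to the helper, so other reads in the program still work line by line.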
If you already have the entire text in a string, then you can use index
to locate two CRs in the string, or use split on two CRs to split the
string into paragraphs.
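For the whole-string case, a small sketch of the index approach might look like the following. It assumes LF line endings, and paragraph_breaks is a made-up name; it simply reports where each pair of newlines starts.

```perl
use strict;
use warnings;

# Hypothetical helper: return the offset of every "\n\n" in the text.
sub paragraph_breaks {
    my ($text) = @_;
    my @offsets;
    my $pos = 0;
    while ((my $found = index($text, "\n\n", $pos)) != -1) {
        push @offsets, $found;
        $pos = $found + 2;    # resume searching just after this break
    }
    return @offsets;
}

my @breaks = paragraph_breaks("Heading\n\nFirst.\n\nSecond.\n");
print "paragraph breaks at offsets: @breaks\n";
```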
| |
| Joe Smith 2006-04-26, 3:58 am |
| Justin C wrote:
> each paragraph will be separated by an extra CR (these text
> files will be generated on OS X clients so I shouldn't have trouble with
> MS CR/LF combinations).
Are you sure the data being read by the CGI will have only CR
and not LF? The fact that Macs use \r inside files is somewhat
irrelevant when it comes to data posted from an HTTP client.
Will you be handling multipart/form-data and application/x-www-form-urlencoded
in Content-Type headers?
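One way to sidestep the question, sketched here with a made-up helper name, is to normalise whatever arrives to LF before looking for paragraph breaks:

```perl
use strict;
use warnings;

# Hypothetical helper: turn CR/LF pairs and lone CRs into plain LF.
sub normalize_newlines {
    my ($text) = @_;
    $text =~ s/\r\n?/\n/g;    # "\r\n" or a bare "\r" both become "\n"
    return $text;
}

my $clean = normalize_newlines("one\r\ntwo\rthree\n");
print $clean;
```

After this step a blank line is always "\n\n", regardless of which client produced the text.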
-Joe
| |
| Justin C 2006-04-26, 6:58 pm |
| On 2006-04-26, Jim Gibson <jgibson@mail.arc.nasa.gov> wrote:
>
[snip]
> If you are reading one line at a time, then two CRs will result in a
> blank line being read for the second one, whether or not you use chomp.
> You can also read more than one line at a time to detect the end of a
> paragraph signified by two CRs: perldoc -q paragraphs "How can I read
> in a file by paragraphs?"
>
That would do it exactly, the trouble I have is that I don't know what I
should search for in perldoc!
> If you already have the entire text in a string, then you can use index
> to locate two CRs in the string, or use split on two CRs to split the
> string into paragraphs.
I can see from 'index' how to locate parts of the text but not how to
replace. I've gone with putting it all into one string and then doing
basic substitution - it's a bit ugly but doesn't appear too slow.
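The code itself was not posted, so the following is only a guess at what such a substitution might look like. It assumes LF line endings, at least one paragraph after the heading, and text that needs no HTML escaping; markup_by_substitution is a made-up name.

```perl
use strict;
use warnings;

# Hypothetical helper: mark up a whole document held in one string.
sub markup_by_substitution {
    my ($text) = @_;
    $text =~ s/\s+\z//;                              # drop trailing newlines
    $text =~ s{\A([^\n]*)\n*}{<h1>$1</h1>\n<p>};     # first line is the heading
    $text =~ s{\n{2,}}{</p>\n<p>}g;                  # blank lines become paragraph breaks
    $text .= "</p>\n";                               # close the last paragraph
    return $text;
}

print markup_by_substitution("Title\n\nOne.\n\nTwo.\n");
```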
Thanks to all who replied. I still feel I'm new to Perl, I read the
Llama book two or three years ago now but don't get to write much code.
I've had to write a whole bunch recently for various automated web-pages
and I've learned a lot from doing so. When I look back over my early
code I find so many ways of improving it - especially its readability.
Perhaps, in a few more years I'll look at this lot I've just written and
be able to improve it to the same extent. I like it when you can review
your code and you can find a way of making that 20 lines of ugly code
into 5 lines of clean, readable code.
Thanks again for your time and suggestions.
Justin.
--
Justin C, by the sea.
| |
| Justin C 2006-04-26, 6:58 pm |
| On 2006-04-25, Gunnar Hjalmarsson <noreply@gunnar.cc> wrote:
> Justin C wrote:
>
> /\n\n/
>
>
> You can always add modules, at least many of the pure Perl modules.
Do you mean instead of using CPAN to install modules I can just install
them to somewhere I have control and then call them in the script? I'm
guessing I'd have to specify the path too but that's no problem.
That's an interesting idea if I have understood you correctly.
It seems with Perl I just keep learning!
Justin.
--
Justin C, by the sea.
| |
| Justin C 2006-04-26, 6:58 pm |
| On 2006-04-25, Matt Garrish <matthew.garrish@sympatico.ca> wrote:
>
> "Justin C" <justin.0604@purestblue.com> wrote in message
> news:107d.444e9f5b.ecbaa@stigmata...
>
> my @paras = split(/\r\r/, $userinput);
>
> my $h1 = shift @paras;
>
> foreach my $para (@paras) {
> # do whatever
> }
That's interesting, thanks for posting. I'm going with another poster's
suggestion of getting the text as one whole string; the substitutions
seem quite trivial after that.
Justin.
--
Justin C, by the sea.
| |
| Justin C 2006-04-26, 6:58 pm |
| On 2006-04-26, Joe Smith <joe@inwap.com> wrote:
> Justin C wrote:
>
> Are you sure the data being read by the CGI will have only CR
> and not LF? The fact that Macs use \r inside files is somewhat
> irrelevant when it comes to data posted from an HTTP client.
>
> Will you be handling multipart/form-data and application/x-www-form-urlencoded
> in Content-Type headers?
Turns out they're \n's anyway. The files aren't being uploaded by
users, they're being saved to a specific place on a server and they're
being sourced from there by another perl script that then
uploads/mirrors our local web-site on that hosted by our ISP.
Thanks anyway.
Justin.
--
Justin C, by the sea.
| |
| Gunnar Hjalmarsson 2006-04-26, 6:58 pm |
| Justin C wrote:
> On 2006-04-25, Gunnar Hjalmarsson <noreply@gunnar.cc> wrote:
>
> Do you mean instead of using CPAN to install modules I can just install
> them to somewhere I have control and then call them in the script?
Yes.
> I'm guessing I'd have to specify the path too but that's no problem.
See the docs of a module I wrote for an example:
http://search.cpan.org/perldoc? CGI...allation
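Independently of those docs, the general pattern can be sketched as below. The directory name "lib" next to the script is an assumption, and Some::Module is a placeholder, not a real module.

```perl
use strict;
use warnings;

# Assumption: pure Perl modules were unpacked into a "lib" directory
# that sits next to this script.
use FindBin qw($Bin);
use lib "$Bin/lib";

# With the path in place a privately installed module loads as usual, e.g.
#   use Some::Module;    # placeholder name, not a real module

# Directories added with "use lib" are searched before the system ones.
print "$_\n" for @INC;
```

FindBin and lib both ship with Perl, so this works on a host where nothing extra can be installed.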
--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
| |
| Justin C 2006-04-27, 9:57 pm |
| On 2006-04-26, Gunnar Hjalmarsson <noreply@gunnar.cc> wrote:
> Justin C wrote:
>
> Yes.
>
>
> See the docs of a module I wrote for an example:
> http://search.cpan.org/perldoc? CGI...allation
That's very interesting, thank you. I may have use for this... at least,
I did have a use for this! I'll have to try and remember what it was.
Justin.
--
Justin C, by the sea.
| |
| Jim Gibson 2006-04-27, 9:57 pm |
| In article <2792.444fef2e.40b77@stigmata>, Justin C
<justin.0604@purestblue.com> wrote:
> On 2006-04-26, Jim Gibson <jgibson@mail.arc.nasa.gov> wrote:
> [snip]
>
> That would do it exactly, the trouble I have is that I don't know what I
> should search for in perldoc!
Yes, I know the problem! The Perl Cookbook is a good source of complete
programs doing common tasks. If you can find one there that is close,
you can adapt it to your needs.
>
> I can see from 'index' how to locate parts of the text but not how to
> replace. I've gone with putting it all into one string and then doing
> basic substitution - it's a bit ugly but doesn't appear too slow.
The substr function can be used in conjunction with index to do a
substitution and avoid regular expressions. You can use substr on the
left-hand-side of an assignment statement or use the 4-argument form
with a replacement string. It is not as elegant or as concise as using
regular expressions and the substitute operator, but it can be quicker.
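A short sketch of the combination, using the 4-argument form of substr. The name replace_all and the sample strings are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every occurrence of $from with $to
# using index and substr instead of a regular expression.
sub replace_all {
    my ($text, $from, $to) = @_;
    return $text unless length $from;       # nothing to search for
    my $pos = 0;
    while ((my $found = index($text, $from, $pos)) != -1) {
        # 4-argument substr replaces in place; the lvalue form
        #   substr($text, $found, length($from)) = $to;
        # would do the same job.
        substr($text, $found, length($from), $to);
        $pos = $found + length($to);        # continue after the inserted text
    }
    return $text;
}

print replace_all("One.\n\nTwo.\n\nThree.", "\n\n", "</p>\n<p>"), "\n";
```

Advancing $pos past the replacement keeps the loop from rescanning text it has just inserted.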
| |
| Justin C 2006-04-30, 6:59 pm |
| On 2006-04-28, Jim Gibson <jgibson@mail.arc.nasa.gov> wrote:
> In article <2792.444fef2e.40b77@stigmata>, Justin C
> <justin.0604@purestblue.com> wrote:
>
>
> Yes, I know the problem! The Perl Cookbook is a good source of complete
> programs doing common tasks. If you can find one there that is close,
> you can adapt for your needs.
Dammit! I've got that book too! I've hardly picked it up. I should
really use it from time to time... it cost enough.
>
>
>
> The substr function can be used in conjunction with index to do a
> substitution and avoid regular expressions. You can use substr on the
> left-hand-side of an assignment statement or use the 4-argument form
> with a replacement string. It is not as elegant or as concise as using
> regular expressions and the substitute operator, but it can be quicker.
Oooh. That looks messy (combining index and substr). Still, I think I'll
set myself a task and give it a try, then compare the solutions. If
nothing else I'll learn more perl.
Thank you for your reply.
Justin.
--
Justin C, by the sea.