Home > Archive > PERL Beginners > October 2006 > Cleaning smart quotes, etc from data pasted to a form










Cleaning smart quotes, etc from data pasted to a form
Kevin Old

2006-10-05, 7:57 am

Hello everyone,

I have a set of web based admin tools that users in my company use to
update various pieces of a website. I've never been able to write
enough regexes, "clean routines", etc. to clean out all of the "bad
characters" that users put in. The big culprit is, of course, good ole
cut and paste.

Like I said, I have several "sanitize" routines that clean control
characters, etc. out of the input fields. Just wondering if others
have found "the solution" for stuff like this.

Thanks for any help,
Kevin
--
Kevin Old
kevinold@gmail.com
Uri Guttman

2006-10-05, 6:58 pm

>>>>> "KO" == Kevin Old <kevinold@gmail.com> writes:

KO> I have a set of web based admin tools that users in my company use
KO> to update various pieces of a website. I've never been able to
KO> write enough regexes, "clean routines", etc. to clean out all of
KO> the "bad characters" that users put in. The big culprit is of
KO> course, good ole cut and paste.

KO> Like I said, I have several "sanitize" routines that clean control
KO> characters, etc. out of the input fields. Just wondering if
KO> others have found "the solution" for stuff like this.

search google for the 'demoronizer'. it is a perl script written a while
ago that cleans up moronic and broken html generated by winblows users
and applications that think they know better but don't.

uri

--
Uri Guttman ------ uri@stemsystems.com -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
Mumia W.

2006-10-05, 6:58 pm

On 10/05/2006 08:47 AM, Kevin Old wrote:
> Hello everyone,
>
> I have a set of web based admin tools that users in my company use to
> update various pieces of a website. I've never been able to write
> enough regexes, "clean routines", etc. to clean out all of the "bad
> characters" that users put in. The big culprit is of course, good ole
> cut and paste.
>
> Like I said, I have several "sanitize" routines that clean control
> characters, etc. out of the input fields. Just wondering if others
> have found "the solution" for stuff like this.
>
> Thanks for any help,
> Kevin


Perhaps you could look at the problem in reverse. Strip out all
characters that are not in a certain set; e.g., you might take anything
that is not an alphanumeric character, space, tab, period, or comma
and delete it.
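
A minimal sketch of that whitelist idea, not a tested production routine. The
sub name keep_allowed is made up for illustration, and it assumes that
"alphanumeric" means ASCII letters and digits only; accented letters would be
deleted too, which may or may not be what you want.

```perl
use strict;
use warnings;

# Hypothetical helper: delete every character that is NOT an ASCII
# letter, digit, space, tab, period, or comma.
sub keep_allowed {
    my ($text) = @_;
    # /c complements the listed set, /d deletes whatever matches.
    (my $clean = $text) =~ tr/A-Za-z0-9 \t.,//cd;
    return $clean;
}

print keep_allowed("Hello, world! <b>42</b>\x00"), "\n";
# prints: Hello, world b42b
```

Note that the markup and the NUL byte are gone, but so is the exclamation
mark: anything you forget to put in the set is silently dropped.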


Chad Perrin

2006-10-05, 6:58 pm

On Thu, Oct 05, 2006 at 09:06:11AM -0500, Mumia W. wrote:
> On 10/05/2006 08:47 AM, Kevin Old wrote:
>
> Perhaps you could look at the problem in reverse. Strip out all
> characters that are not in a certain set; e.g., you might take anything
> that is not an alphanumeric character, space, tab, period, or comma
> and delete it.


That won't work so well for characters that are garbage versions of good
characters that are actually needed. Generally, quotes are there for a
reason, for instance -- so just throwing away "smart quotes" rather than
replacing them with standard vertical ASCII quotes might not be
desirable.
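
A rough sketch of replacing rather than deleting, under some loud
assumptions: the input has already been decoded into a Perl character string
(so the curly quotes show up as U+2018, U+2019, U+201C and U+201D rather than
as raw Windows-1252 bytes), and the sub name straighten_quotes and the choice
of extra characters (dashes, ellipsis) are only illustrative.

```perl
use strict;
use warnings;

# Assumed mapping from common "smart" punctuation to plain ASCII.
my %ascii_for = (
    "\x{2018}" => "'",    # left single quote
    "\x{2019}" => "'",    # right single quote / apostrophe
    "\x{201C}" => '"',    # left double quote
    "\x{201D}" => '"',    # right double quote
    "\x{2013}" => '-',    # en dash
    "\x{2014}" => '--',   # em dash
    "\x{2026}" => '...',  # ellipsis
);

# Hypothetical helper: swap each mapped character for its ASCII stand-in.
sub straighten_quotes {
    my ($text) = @_;
    $text =~ s/([\x{2018}\x{2019}\x{201C}\x{201D}\x{2013}\x{2014}\x{2026}])/$ascii_for{$1}/g;
    return $text;
}

print straighten_quotes("\x{201C}It\x{2019}s fine\x{201D} \x{2014} really\x{2026}"), "\n";
# prints: "It's fine" -- really...
```

Running something like this first and a whitelist pass afterwards would keep
the quotes while still dropping whatever is left over.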

--
CCD CopyWrite Chad Perrin [ http://ccd.apotheon.org ]
"A script is what you give the actors. A program
is what you give the audience." - Larry Wall
Mumia W.

2006-10-05, 6:58 pm

On 10/05/2006 09:48 AM, Chad Perrin wrote:
> On Thu, Oct 05, 2006 at 09:06:11AM -0500, Mumia W. wrote:
>
> That won't work so well for characters that are garbage versions of good
> characters that are actually needed. Generally, quotes are there for a
> reason, for instance -- so just throwing away "smart quotes" rather than
> replacing them with standard vertical ASCII quotes might not be
> desirable.
>


You're right, and figuring out what is truly garbage and what are garbled
bytes that need to be converted is not trivial. Maybe there's a module
on CPAN...



Chad Perrin

2006-10-05, 6:58 pm

On Thu, Oct 05, 2006 at 10:35:09AM -0500, Mumia W. wrote:
> On 10/05/2006 09:48 AM, Chad Perrin wrote:
>
> You're right, and figuring out what is truly garbage and what are garbled
> bytes that need to be converted is not trivial. Maybe there's a module
> on CPAN...


If so, that'd definitely be the way to go. If not, there's potential
for a new module out there.

--
CCD CopyWrite Chad Perrin [ http://ccd.apotheon.org ]
Ben Franklin: "As we enjoy great Advantages from the Inventions of
others we should be glad of an Opportunity to serve others by any
Invention of ours, and this we should do freely and generously."