PHP Language archive, May 2004

Re: RegEx to delete // comments NOT in quotes: ( ' ) OR (")???
W. D.

2004-05-14, 12:30 pm

Thanks, Bill!

I am going to have to chew on this a while so I understand it
better--it seems fairly complicated.

Bill wrote:
>
> W. D. wrote:
>
> Derived from perlfaq on regex:
>
> #!/usr/bin/perl
>
> my $lines = <<ENDTEXT;
> // Some comments that should be trashed
> Code goes here;
> \$SomeVar = '// These comments should be left alone!';
> /* Some more comments that will remain */
> \" // These comments should also be left alone \"
> # Hash/Pound sign comments should remain for now
> More Code goes here
> { This code should stay // This comment should go
> }
> ENDTEXT
>
> print $lines, "\n\n\n";
>
> # this needs whitespace reformatting, but you're using PHP :{
> $lines =~
> s#/\*[^*]*\*+([^/*][^*]*\*+)*/|//[^\n]*|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#$2#gs;
> print $lines, "\n";
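To see what Bill's substitution does without a Perl setup, here is a rough Python transliteration. It is a sketch only. It assumes Python 3.5 or later and assumes that Python's `re` treats these constructs the way Perl does. The names `BILL`, `strip` and `SAMPLE` are illustrative and do not come from the thread. Running it on a few lines of Bill's sample shows one thing worth knowing: the first alternative matches `/* ... */` comments, so the line labelled "Some more comments that will remain" is removed as well.

```python
import re

# Assumption: Python's re handles these constructs the way Perl does.
# Same alternatives and group numbering as Bill's Perl pattern.
BILL = re.compile(
    r'/\*[^*]*\*+([^/*][^*]*\*+)*/'                         # /* block comment */
    r'|//[^\n]*'                                            # // comment up to the newline
    r'''|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)''',  # group 2: text to keep
    re.DOTALL,                                              # Perl's /s: "." also matches "\n"
)


def strip(code):
    # Perl's $2 is \2 here. A comment match leaves group 2 unset, which
    # Python 3.5+ replaces with ''. re.sub replaces every match, like /g.
    return BILL.sub(r'\2', code)


# A few lines taken from Bill's sample text.
SAMPLE = (
    "// Some comments that should be trashed\n"
    "$SomeVar = '// These comments should be left alone!';\n"
    "/* Some more comments that will remain */\n"
    "{ This code should stay // This comment should go\n"
    "}\n"
)

print(strip(SAMPLE))
```

If block comments should stay, one option is to drop the first alternative and use `\1` (Perl `$1`) as the replacement, because the kept text then becomes group 1. With that change, a `//` written inside a block comment would be treated as a line comment.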


--
Start Here to Find It Fast!™ ->
http://www.US-Webmasters.com/best-start-page/
$8.77 Domain Names -> http://domains.us-webmasters.com/
W. D.

2004-05-14, 5:30 pm

Hi Bill, et al.,

Am I interpreting this correctly?

s#/\*[^*]*\*+([^/*][^*]*\*+)*/|//[^\n]*|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#$2#gs;

RegExPattern = '
# # Opening delimiter
/ # A forward slash
\* # An escaped *
[^*] # Any character (except: *)
* # Zero or more of the previous character
\* # An escaped *
+ # One or more of the previous
( # Begin a logical grouping
[^/*] # Any character (except: / * )
[^*] # Any character (except: * )
* # Zero or more of the previous
\* # An escaped *
+ # One or more of the previous
) # End logical grouping
* # Zero or more of the previous
/ # A slash
| # OR
// # 2 forward slashes
[^\n] # Can't be a newline
* # Zero or more of the previous
| # OR
( # Begin logical grouping
" # A double quote mark
( # Begin a nested logical grouping
\\ # An escaped backslash
. # Any character (except newline \n)
| # OR
[^"\\] # Any character (except " or \ )
) # End logical grouping
* # Zero or more of the previous
" # A quote mark
| # OR
' # A single quote mark
( # Begin logical grouping
\\ # An escaped backslash
. # Any character (except newline \n)
| # OR
[^'\\] # Any character except ' or \
) # End logical grouping
* # Zero or more of the previous
' # A single quote mark
| # OR
. # Any character (except newline \n)
[^/"'\\] # Any character (except these: / " ' \ )
* # Zero or more of the previous
) # End logical grouping
# # Closing delimiter
$2 # ?
#gsx # Modifiers? Global? String?
';
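A few notes on the open questions in that breakdown. In Bill's original the modifiers are `gs`; there is no `x`. `g` means global (replace every match), and `s` is Perl's single-line mode, in which `.` also matches a newline. `$2` is the text captured by the second pair of parentheses, the one that starts with `("`. `\\.` means a literal backslash followed by any single character. The sketch below shows `$2`, `/g` and `/s` with their nearest Python counterparts: `\2`, `re.sub` replacing every match by default, and `re.DOTALL`. The mapping to Python is an assumption made for illustration, and the empty-group behavior needs Python 3.5+.

```python
import re

# Assumption: these Python calls are the closest equivalents of the Perl pieces.

# $2: the replacement is capture group 2. Where the other alternative matched,
# group 2 is unset and Python 3.5+ substitutes an empty string.
keep_second = re.sub(r'(x)|(y)', r'\2', 'xyx')          # 'y'

# /g: replace every match. re.sub does that by default;
# count=1 behaves like s/// without /g.
all_matches = re.sub('o', '0', 'foo')                   # 'f00'
first_only = re.sub('o', '0', 'foo', count=1)           # 'f0o'

# /s: lets "." match a newline as well (re.DOTALL in Python).
without_s = re.sub(r'a.b', 'X', 'a\nb')                 # 'a\nb', no match
with_s = re.sub(r'a.b', 'X', 'a\nb', flags=re.DOTALL)   # 'X'
```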



W. D.

2004-05-15, 2:30 am

Thanks, Abigail, for your reply.

New RegEx below...

Abigail wrote:
>
> W. D. (NewsGroups@US-Webmasters.com) wrote on MMMCMIX September MCMXCIII
> in <URL:news:40A4579F.7D4@US-Webmasters.com>:
> }} Hi Folks,
> }}
> }} I am about to ship myself to a mental hospital! Can't figure
> }} out a regular expression to strip out comments that begin
> }} with double slashes "//" but are not contained in quotation
> }} marks, either single (') or double (").
>
> Assuming that quoted strings can contain backslashed quotes, as in:
>
> "This is a \"string\" that ends here --->"
>
> one can use Regexp::Common (well, if it's not allowed, it's still
> possible without Regexp::Common): (untested):
>
> use Regexp::Common;
>
> s/($RE{delimited}{-delim => q {'"}})|$RE{comment}{Portia}/$1||""/ge;
>
> Abigail
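The `(\\.|[^"\\])*` part of Bill's pattern is what handles Abigail's point about backslashed quotes. Below is a small Python comparison of a naive `"[^"]*"` against that escape-aware form, using her example string. The names are illustrative, and Python's `re` is assumed to behave like Perl's here.

```python
import re

# Naive string pattern: ends at the first " it finds, even a backslashed one.
NAIVE = re.compile(r'"[^"]*"')

# Escape-aware form used in Bill's pattern: either a backslash plus any
# character, or any character that is neither " nor a backslash.
ESCAPE_AWARE = re.compile(r'"(?:\\.|[^"\\])*"')

quoted = r'"This is a \"string\" that ends here --->"'

naive_match = NAIVE.match(quoted).group()         # stops right after the first \"
full_match = ESCAPE_AWARE.match(quoted).group()   # the whole quoted string
```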


Here's what I've come up with using a trial and error process:

$TheCode = preg_replace('#^([^"\'\/]*)//[^"\']*[\n\r]$#mU', '$1',
$TheCode);

Applying this RegEx to the following text,
========================================
===================================


// 1a. Some comments that should be trashed
// 2a. Some comments that should be trashed
// 1b. Some comments that should be trashed
// 2b. Some comments that should be trashed
// 1c. Some comments that should be trashed
// 2c. Some comments that should be trashed





3.
Code goes here;
5.
6. $SomeVar = '// These comments should be left alone!';
7. $SomeVar = ' // These comments should be left alone!';
8.
/* 9. Some more comments that will remain */
10.
"// 11. These comments should also be left alone "
" // 12. These comments should also be left alone "
" // 13. These comments should also be left alone "
# 14. Hash/Pound sign comments should remain for now
15.
16. More Code goes here
17.
{ 18. This code should stay // 19. This comment should go
20.
} 21.
"This is a \"string\" that ends here 22. --->"
" 23. This is a \"string\" that ends here --->"
// 24. Some comments that should be trashed
// 25. Some comments that should be trashed
// 26. Some comments that should be trashed
// 27. Some comments that should be trashed
// 28. Some comments that should be trashed
// 29. Some comments that should be trashed

========================================
===================================

Produces:
========================================
===================================












3.
Code goes here;
5.
6. $SomeVar = '// These comments should be left alone!';
7. $SomeVar = ' // These comments should be left alone!';
8.
/* 9. Some more comments that will remain */
10.
"// 11. These comments should also be left alone "
" // 12. These comments should also be left alone "
" // 13. These comments should also be left alone "
# 14. Hash/Pound sign comments should remain for now
15.
16. More Code goes here
17.
{ 18. This code should stay
20.
} 21.
"This is a \"string\" that ends here 22. --->"
" 23. This is a \"string\" that ends here --->"








========================================
===================================

========================================
================================
// Here is the breakdown of how this works:
// ' // Opening quote to contain the RegEx String
// # // Opening RegEx delimiter. Using # because // is used in the match
// ^ // Start of a line (the m modifier makes ^ match after every newline)
// ( // Begin capture section 1
// [^"'\/] // Any character except these 3: " ' /
// * // Zero or more of the previous
// ) // Close capture section 1
// // // Must have the double slashes // on the line
// [^"'] // Any character except these 2: " '
// * // Zero or more of the previous
// [\n\r] // A newline or carriage-return character
// $ // End of a line (the m modifier makes $ match before every newline)
// # // Closing RegEx delimiter
// m // Multi-line string modifier
// U // Ungreedy modifier
// ' // Closing quote for RegEx string
========================================
=====================================
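The breakdown above can be checked with a rough Python transliteration. Python's `re` has no `U` flag. In PCRE, `U` makes quantifiers lazy by default, so this sketch writes each `*` as `*?` instead. `WD` and `strip_wd` are hypothetical names, and the inputs are single lines taken from the test text.

```python
import re

# W. D.'s PHP pattern  '#^([^"\'\/]*)//[^"\']*[\n\r]$#mU'  rewritten for Python.
# Python's re has no U flag; U makes quantifiers lazy, so each * becomes *?.
WD = re.compile(r'''^([^"'/]*?)//[^"']*?[\n\r]$''', re.MULTILINE)


def strip_wd(code):
    # PHP's '$1' is \1 here: keep whatever came before the //.
    # [\n\r] is part of the match, so the line-ending character goes too.
    return WD.sub(r'\1', code)


print(repr(strip_wd("{ 18. This code should stay // 19. This comment should go\n")))
print(repr(strip_wd("6. $SomeVar = '// These comments should be left alone!';\n")))
```

Because `[\n\r]` is consumed along with the comment, the result depends on the file's line endings. With a CRLF file only the `\r` is removed; with LF-only lines the newline itself disappears.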

This appears to strip out all // comments that are NOT contained in
quote marks, whether single or double.

However, it doesn't remove the line completely if there are only
blank spaces remaining after the RegEx operation. Oh, well. I guess
these can be removed with a subsequent RegEx.
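One possible follow-up pass for the leftover blanks is sketched below. It is written in Python with a hypothetical `tidy` helper and was not proposed in the thread. It first trims trailing spaces and tabs, then deletes lines that have become empty. It cannot tell a line emptied by comment removal from one that was blank to begin with.

```python
import re


def tidy(code):
    # Trim spaces and tabs at the end of each line (an optional \r may follow).
    code = re.sub(r'[ \t]+(?=\r?$)', '', code, flags=re.MULTILINE)
    # Delete lines that are now empty. Caution: this also removes lines
    # that were already blank before the comments were stripped.
    return re.sub(r'^\r?\n', '', code, flags=re.MULTILINE)


print(repr(tidy("x = 1; \n   \n\ny\n")))
```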

Any comments or suggestions?

Abigail

2004-05-15, 7:30 am

W. D. (NewsGroups@US-Webmasters.com) wrote on MMMCMX September MCMXCIII
in <URL:news:40A5B018.6932@US-Webmasters.com>:

I've no idea which version of Perl you are using, but I've never heard
of an 'ungreedy' modifier. Anyway, your description suggests that you
don't strip out comments containing quotes. That is, you leave

// This is a "comment"

as is.
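Abigail's observation can be reproduced with a Python transliteration of W. D.'s pattern, using lazy quantifiers in place of `U`; the names are hypothetical. For contrast, the sketch also includes a quote-aware alternation in the spirit of Bill's pattern. It matches either a quoted string, which is kept, or a `//` comment, which is dropped, so a quote inside a comment no longer matters. It assumes quotes in the code are balanced.

```python
import re

# W. D.'s pattern again, with lazy quantifiers standing in for PHP's U flag.
WD = re.compile(r'''^([^"'/]*?)//[^"']*?[\n\r]$''', re.MULTILINE)

# Quote-aware alternative: group 1 is a quoted string (kept);
# the other branch is a // comment up to the newline (dropped).
STRING_OR_COMMENT = re.compile(
    r'''("(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')|//[^\n]*'''
)


def strip_quote_aware(code):
    # A comment match has no group 1, so it is replaced with ''.
    return STRING_OR_COMMENT.sub(lambda m: m.group(1) or '', code)


LINE = '// This is a "comment"\n'

print(repr(WD.sub(r'\1', LINE)))      # unchanged: [^"']* cannot cross the quote
print(repr(strip_quote_aware(LINE)))  # the comment is gone, the newline stays
```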


Abigail
--
perl -e '* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
/ / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / / /
% % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % %;
BEGIN {% % = ($ _ = " " => print "Just Another Perl Hacker\n")}'

Copyright 2008 codecomments.com