Code Comments
Hi,

I am using the following grep/sed combination to extract hyperlinks from a document on Windows 2000. This is a one-line command:

grep -o -E "href=\"[^:;+?/\.0-9A-Za-z_-]*\"" Contents.html|sed -e "s/href=\"//" -e "s/\"//"

Please could someone tell me how to pass the regular expression part (the argument to grep), to Pattern.compile in Java. I have tried putting \\ before characters which I think may cause problems, but Java still reports errors within the argument passed to Pattern.compile.

Note: I realised that I had to remove the ^ before the & as this was only included to make the command valid to the Windows 2000 command interpreter.

Any help would be much appreciated.

Regards,
Jonny

On Sat, 04 Jun 2005 10:20:50 +0000, Jonny wrote:
> Note: I realised that I had to remove the ^ before the & as this was
> only included to make the command valid to the Windows 2000 command
> interpreter.

man grep. ;)

The ^ inverts the selection; this means you inverted the logic by removing the ^. Don't do this.

--
In pioneer days they used oxen for heavy pulling, and when one ox couldn't budge a log, they didn't try to grow a larger ox. We shouldn't be trying for bigger computers, but for more systems of computers.
--- Rear Admiral Grace Murray Hopper

Stefan Schulz wrote:
> On Sat, 04 Jun 2005 10:20:50 +0000, Jonny wrote:
>
> man grep. ;)
>
> The ^ inverts the selection; this means you inverted the logic by
> removing the ^. Don't do this.

Thanks for your reply, Stefan.

The problem I am having is not how to use grep, but how to pass the regular expression to Pattern.compile without getting Java compilation errors. Do certain characters need to be escaped?

Regards,
Jonny

In message <6gfoe.7317$%21.3303@newsfe2-gui.ntli.net>, Jonny <www.mail@ntlworld.com> writes
>Hi,
>
>I am using the following grep/sed combination to extract hyperlinks from
>a document on Windows 2000. This is a one-line command:
>
>grep -o -E "href=\"[^:;+?/\.0-9A-Za-z_-]*\"" Contents.html|sed -e
>"s/href=\"//" -e "s/\"//"
>
>Please could someone tell me how to pass the regular expression part
>(the argument to grep), to Pattern.compile in Java. I have tried
>putting \\ before characters which I think may cause problems, but Java
>still reports errors within the argument passed to Pattern.compile.

In Java, you can combine the grep and the 2 sed commands.

Try something like:

href=\"(.*?)\"

i.e. look for href=" then collect all characters up to but not including the first ".

The ? makes the .* non-greedy.

HTH

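To illustrate the non-greedy point above, here is a minimal sketch comparing (.*) with (.*?) on a line that contains two links. The class name, the helper firstGroup and the sample HTML are made up for this illustration; only Pattern and Matcher come from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original thread.
class GreedyVsReluctant {

    // Returns capture group 1 of the first match, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Made-up sample input with two hyperlinks on one line.
        String html = "<a href=\"one.html\">1</a> <a href=\"two.html\">2</a>";

        // Greedy: .* runs to the LAST double quote on the line.
        System.out.println(firstGroup("href=\"(.*)\"", html));
        // prints: one.html">1</a> <a href="two.html

        // Reluctant: .*? stops at the FIRST double quote after href=".
        System.out.println(firstGroup("href=\"(.*?)\"", html));
        // prints: one.html
    }
}
```

With the greedy form, a line holding several links collapses into one oversized match, which is why the ? matters for link extraction.
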
Jonny wrote:
> Stefan Schulz wrote:
>
> Thanks for your reply, Stefan.
>
> The problem I am having is not how to use grep, but how to pass the
> regular expression to Pattern.compile without getting Java compilation
> errors. Do certain characters need to be escaped?

The string to compile is just a string. The only thing that has to be escaped is the backslash. So every backslash in the pattern must be doubled.

--
Dale King

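As a small sketch of the doubling rule, take the regex \. (a literal dot). In Java source it has to be written "\\." because the Java compiler consumes one level of backslashes before the regex compiler ever sees the string. The class and constant names below are invented for the example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original thread.
class BackslashDoubling {

    // Java source "\\." is the two-character string \. which the regex
    // compiler reads as "a literal dot".
    static final Pattern LITERAL_DOT = Pattern.compile("\\.");

    // Without the backslash, "." is the regex wildcard for any character.
    static final Pattern ANY_CHAR = Pattern.compile(".");

    public static void main(String[] args) {
        System.out.println("\\.".length());                      // 2
        System.out.println(LITERAL_DOT.matcher(".").matches());  // true
        System.out.println(LITERAL_DOT.matcher("a").matches());  // false
        System.out.println(ANY_CHAR.matcher("a").matches());     // true
    }
}
```
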
Nemo wrote:
> In message <6gfoe.7317$%21.3303@newsfe2-gui.ntli.net>, Jonny
> <www.mail@ntlworld.com> writes
>
> In Java, you can combine the grep and the 2 sed commands.
>
> Try something like:
>
> href=\"(.*?)\"
>
> i.e. look for href=" then collect all characters up to but not including
> the first ".
>
> The ? makes the .* non-greedy.

Thanks Nemo. That's exactly what I was looking for. I didn't realise Java used the advanced regexp syntax. The statement I finally used, is:

regexp_pattern = Pattern.compile("href=\\\"(.*?)\\\"", Pattern.CASE_INSENSITIVE);

Your help is appreciated.

Regards,
Jonny

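For anyone wanting to see that statement in use, here is a minimal sketch that runs the same pattern over a string and collects group 1 of every match, which replaces both the grep and the two sed steps. The class name HrefExtractor, the method extract and the sample input are assumptions for illustration; the thread does not show how Contents.html was actually read, so the sketch takes the HTML as a String.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around the Pattern.compile call quoted above.
class HrefExtractor {

    // Same pattern and flag as in the post; CASE_INSENSITIVE lets it
    // match HREF= as well as href=.
    private static final Pattern HREF =
            Pattern.compile("href=\\\"(.*?)\\\"", Pattern.CASE_INSENSITIVE);

    // Returns the text between the quotes of every href="..." in html.
    static List<String> extract(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1)); // group 1 is the part captured by (.*?)
        }
        return links;
    }

    public static void main(String[] args) {
        // Made-up input standing in for the contents of Contents.html.
        String html = "<A HREF=\"a.html\">x</A><a href=\"b.html\">y</a>";
        System.out.println(extract(html)); // [a.html, b.html]
    }
}
```
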
Dale King wrote:
> Jonny wrote:
>
> The string to compile is just a string. The only thing that has to be
> escaped is the backslash. So every backslash in the pattern must be doubled.

There's more to it than that. Characters also have to be escaped within the regular expression in order to be compiled by the regexp compiler. These characters are also escaped by using a backslash.

See my response to Nemo's post, where I have to use three backslashes to escape a double-quote character. A regexp compiler would expect a double-quote to be passed as \", but so would the Java compiler, which in addition expects the backslash itself to be escaped. Hence the need for \\\"

Thanks for your reply.

Regards,
Jonny

Jonny wrote:
> Dale King wrote:
>
> There's more to it than that. Characters also have to be escaped within
> the regular expression in order to be compiled by the regexp compiler.
> These characters are also escaped by using a backslash.

I assume he knew that. I was referring to what had to be escaped on top of the normal regular expression escapes.

> See my response to Nemo's post, where I have to use three backslashes to
> escape a double-quote character. A regexp compiler would expect a
> double-quote to be passed as \", but so would the Java compiler, which
> in addition expects the backslash itself to be escaped. Hence the need
> for \\\"

D'oh. Yes forgot to mention the quote. My point was that the thing that really trips people up is the fact that backslashes must be escaped.

I really wish they would adopt an alternate way to allow you to specify strings that would let you get around the escaping for things like regular expressions and path names. Such a thing has been proposed here before:

<http://groups-beta.google.com/group...ae847810dfcb8e1>

--
Dale King

In message <fazoe.1861$K5.16@newsfe4-win.ntli.net>, Jonny <www.mail@ntlworld.com> writes
>See my response to Nemo's post, where I have to use three backslashes to
>escape a double-quote character. A regexp compiler would expect a
>double-quote to be passed as \", but so would the Java compiler, which

Are you sure that you need to quote " in an RE? However, it won't do any harm, if not required.

I now realise my previous posting was a bit ambiguous/incomplete. I quoted the " but left off the enclosing " and " - all of which are needed to turn it into a Java String. I should have written:

String pattern="href=\"(.*?)\"";

OK?

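A quick way to check the question above is to compile both spellings and compare what they match. The sketch below assumes nothing beyond java.util.regex; the class name, constant names and sample input are invented. The double quote is not a regex metacharacter, and java.util.regex accepts a backslash in front of any non-alphabetic character, so the extra escape is harmless but not required.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original thread.
class QuoteEscaping {

    // Escaped for the Java compiler only: the regex seen is href="(.*?)"
    static final String JAVA_ONLY = "href=\"(.*?)\"";

    // Escaped for Java and again for the regex: the regex seen is href=\"(.*?)\"
    static final String JAVA_AND_REGEX = "href=\\\"(.*?)\\\"";

    // Returns group 1 of the first match, or null if nothing matches.
    static String firstHref(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "<a href=\"x.html\">x</a>"; // made-up sample input
        System.out.println(firstHref(JAVA_ONLY, html));      // x.html
        System.out.println(firstHref(JAVA_AND_REGEX, html)); // x.html
    }
}
```

Both strings find the same link, so the single-escaped form String pattern="href=\"(.*?)\""; is enough.
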