Re: Help: String search in Windows 2000 doesn't find text in Windows XP: MS Word doc
| Tad McClellan 2005-11-27, 6:58 pm |
| Barry Millman <millmanbarry@hotmail.com> wrote:
> The format for the HYPERLINK that I am
> searching for in the document is:
>
> HYPERLINK "mydoc.doc"
> PROBLEM: The program works on the Windows 2000 machine, but does not
> find the files on the Win Xp machine.
I don't think I can help with that part, but the code is too hokey
to just let it pass...
> ----------- start actual code segment --------------------
> while (/HYPERLINK(\s+.{1,80}?\.doc)/gim) # the "g" causes multiple matches
The //m does not do anything, so why is it there?
It changes the meaning of ^ and $, but you don't use those
anchors in your pattern, so you don't need //m.
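.{1,80}?
is a non-greedy match of 1 to 80 characters. The ? makes the
quantifier lazy, it does not make the match optional, and the
dot will happily match a space or a quote.
Do you really want to match '  .doc' ?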
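A minimal sketch of what the trailing ? changes, assuming a made-up sample string (it is not the poster's data, which was never shown). The lazy form stops at the first ".doc" it can reach; the greedy form runs on to the last one within 80 characters:

```perl
use strict;
use warnings;

# Hypothetical sample text, not taken from the original poster's document.
my $text = 'HYPERLINK "a.doc" and HYPERLINK "b.doc"';

# Lazy quantifier: take as few characters as possible before ".doc".
my ($lazy) = $text =~ /HYPERLINK(\s+.{1,80}?\.doc)/i;

# Greedy quantifier: take as many characters as possible before ".doc".
my ($greedy) = $text =~ /HYPERLINK(\s+.{1,80}\.doc)/i;

print "lazy:   [$lazy]\n";
print "greedy: [$greedy]\n";
```

Both captures still begin with the space and the opening quote, which is why the original code had to strip them afterwards.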
We can't help you analyse why the match is failing because we
need two things to do that: the pattern and the string that
the pattern is to be matched against.
We have only one of those two things...
>
> {
> $fndxx = $1;
>
> $fndxx =~ s/\"//; # remove leading quote
> $fndxx =~ s/\s+//; # remove leading spaces
Why capture them only to strip them out of the captured string?
Why not just leave them out of the capture in the first place?
while (/HYPERLINK\s+"(.{1,78}\.doc)"/gi)
or, probably better:
while (/HYPERLINK\s+"([^"]{1,78}\.doc)"/gi)
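As a rough, runnable illustration of keeping the quotes and whitespace outside the capture, here is a sketch against an invented string. The sample text and the @found array are assumptions for the example; real Word field codes may look different:

```perl
use strict;
use warnings;

# Hypothetical field text; real Word field codes may differ.
my $text = 'x HYPERLINK "first.doc" y HYPERLINK   "Second.DOC" z';

my @found;
while ($text =~ /HYPERLINK\s+"([^"]{1,78}\.doc)"/gi) {
    push @found, $1;    # $1 holds the filename only, no quotes or spaces
}

print "$_\n" for @found;
```

Because [^"] cannot cross a quote, each match stays inside one pair of quotes, so nothing needs to be removed from $1 afterwards.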
> $dir="C:\\IGINproducts\\UserDocuments\\";
>
Use single quotes unless you want to make use of one of the two
extra things that double quotes give you (interpolation
and backslash escapes).
Use forward slashes instead of silly slashes unless the path
is going to be fed to the "command interpreter".
$dir='C:/IGINproducts/UserDocuments/';
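A small sketch of the difference between the two quoting styles. The C:\temp path and the variable names are made up for the illustration:

```perl
use strict;
use warnings;

# Hypothetical path, chosen because \t means "tab" inside double quotes.
my $single  = 'C:\temp';     # backslash kept as-is: 7 characters
my $escaped = "C:\\temp";    # same 7 characters, but needs doubling
my $oops    = "C:\temp";     # \t became a tab: 6 characters

printf "single=%d escaped=%d oops=%d\n",
    length($single), length($escaped), length($oops);

# With forward slashes there is nothing to escape in either style.
my $dir  = 'C:/IGINproducts/UserDocuments/';
my $name = 'mydoc.doc';      # hypothetical filename
my $path = "$dir$name";      # double quotes used because interpolation is wanted
print "$path\n";
```

The third assignment is the kind of slip that single quotes or forward slashes avoid.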
> print(OUTFILE $fndxx,",",$date_string,", in: ",basename($file),
> "\n") ;
Gak!
Use interpolation in a double quoted string to build your output string:
print(OUTFILE "$fndxx,$date_string, in: ", basename($file), "\n") ;
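To check that the interpolated form produces the same line as the comma-separated list, a sketch with stand-in values (the three values below are invented, and the strings are built in memory rather than printed to OUTFILE):

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical values standing in for the poster's variables.
my ($fndxx, $date_string, $file) =
    ('mydoc.doc', '2005-11-27', 'C:/docs/report.doc');

# Original style: many small pieces in a list.
my $list = join '', $fndxx, ",", $date_string, ", in: ", basename($file), "\n";

# Suggested style: let the double quoted string do the work.
my $interp = "$fndxx,$date_string, in: " . basename($file) . "\n";

print $interp;
print "same output\n" if $list eq $interp;
```

The function call stays outside the quotes because only variables are interpolated.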
> If I try this with a test program (the string to test is in the program
> itself ) it works fine on the XP machine.
If you had shown us your complete test program, then we could
have helped you debug it.
But you didn't, so we can't. (hint)
> I would really appreciate any comments or suggestions about what I am
> doing wrong.
Not posting a short and complete program that we can run that
illustrates your problem.
Have you seen the Posting Guidelines that are posted here frequently?
--
Tad McClellan SGML consulting
tadmc@augustmail.com Perl programming
Fort Worth, Texas
| |
| Purl Gurl 2005-11-27, 6:58 pm |
| Tad McClellan wrote:
(snipped)
> I don't think I can help with that part, but the code is too hokey
> to just let it pass...
Have you helped the author resolve his problem?
Purl Gurl
| |
| Tad McClellan 2005-11-27, 6:58 pm |
| Purl Gurl <purlgurl@purlgurl.net> wrote:
> Tad McClellan wrote:
>
> (snipped)
>
>
> Have you helped the author resolve his problem?
Have you?
--
Tad McClellan SGML consulting
tadmc@augustmail.com Perl programming
Fort Worth, Texas
| |
| Purl Gurl 2005-11-27, 6:58 pm |
| Tad McClellan wrote:
> Purl Gurl wrote:
(snipped)
> Have you?
I have. You have not.
Clearly you are trolling, as is your habit. Before posting this troll
article of yours, you knew that I have been helping the
author reach a resolution of his problem.
This troll article of yours again affirms you are the troll many
of us know you to be, a very persistent troll at that.
Purl Gurl
| |
| Purl Gurl 2005-11-27, 6:58 pm |
| Barry Millman wrote:
(snipped)
> If I examine the MS Word file using a Hex editor, I get the following
> values for bytes 5 through 7 (calling the first byte as zero):
> B1 1A E1
> The 1A is the seventh byte of the file.
> The PERL program (above) seems to stop at this character.
Possible false end of file (EOF) signal: byte 1A is Ctrl-Z, which Perl
on Windows treats as end of file when a filehandle is read in text mode
rather than binary mode.
Give binmode a try.
binmode (STDOUT);
open (TEST ....
binmode (TEST);
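A fuller sketch of the binmode suggestion, under stated assumptions: the file written here is a stand-in built from the three bytes quoted above followed by some text, not a real Word document, and the lexical filehandles and temp file are choices made for the example. On Windows, without binmode the read can stop at the 1A byte; with binmode that byte is ordinary data:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build a small stand-in file: bytes B1 1A E1, then some text.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "\xB1\x1A\xE1", 'HYPERLINK "mydoc.doc"';
close $out or die "close failed: $!";

open my $in, '<', $path or die "Cannot open $path: $!";
binmode $in;                          # read raw bytes; 0x1A is just data
my $data = do { local $/; <$in> };    # slurp the whole file
close $in;

# The regex works on the raw bytes like on any other string.
my @links = $data =~ /HYPERLINK\s+"([^"]{1,78}\.doc)"/gi;
printf "read %d bytes, found: %s\n", length($data), "@links";
```

binmode has to be called after open and before the first read on that handle.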
My sincere suggestion is you pursue your binary files for fun, only.
Should you need to accomplish your task, soon, use your RTF format
or convert your Word documents to plaintext.
Working with binary files via Perl is challenging. Perl itself can read and
write binary data once binmode is set, and substr, index, and regexes work on
raw bytes as well as on "plaintext". The difficulty is the Word format: text in
a .doc file is not guaranteed to be stored as contiguous single-byte characters,
so a pattern that matches in one file can fail in another.
I have not looked at CPAN for binary handling modules. Have a look. You might
find a module which can be adapted for your needs.
If not, I suggest you stop mucking around with binary data and get your task done. =)
Purl Gurl
| |
| Tad McClellan 2005-11-27, 6:58 pm |
| Barry Millman <millmanbarry@hotmail.com> wrote:
> OK. Sorry about the bad code.
Please do not send stealth Cc's.
That is considered a rude practice, so I'm moving on to
someone else's post...
--
Tad McClellan SGML consulting
tadmc@augustmail.com Perl programming
Fort Worth, Texas
| |
| A. Sinan Unur 2005-11-27, 6:59 pm |
| Tad McClellan <tadmc@augustmail.com> wrote in
news:slrndok4h1.o2e.tadmc@magna.augustmail.com:
> Barry Millman <millmanbarry@hotmail.com> wrote:
>
>
>
> Please do not send stealth Cc's.
>
> That is considered a rude practice, so I'm moving on to
> someone else's post...
Well, he seems to have found a good match (see elsethread) ;-)
Sinan
--
A. Sinan Unur <1usa@llenroc.ude.invalid>
(reverse each component and remove .invalid for email address)
comp.lang.perl.misc guidelines on the WWW:
http://mail.augustmail.com/~tadmc/c...guidelines.html