Printing substrings of a line ?
| mmichaelz@gmail.com 2007-03-07, 6:57 pm |
| Hi, I have one or several long lines in a file.
I'd like to print out substrings in these lines,
preferably each as a single line. The strings are,
<a href="http...pdf"> where ... can vary.
Ideally I'd get out,
http...pdf (string 1)
http...pdf (string 2)
....
http...pdf (string 3)
Thanks for any help and tips.
m
| Vassilis 2007-03-07, 6:57 pm |
| mmichaelz@gmail.com wrote:
> Hi, I have one or several long lines in a file.
> I'd like to print out substrings in these lines,
> preferably each as a single line. The strings are,
> <a href="http...pdf"> where ... can vary.
> Ideally I'd get out,
>
> http...pdf (string 1)
> http...pdf (string 2)
> ...
> http...pdf (string 3)
>
> Thanks for any help and tips.
> m
Try
awk -F\" '{ gsub(/<a +href=/, ""); gsub(/>/, "")
            for (i = 1; i <= NF; i++) if ($i) print $i }' file.html
| Ed Morton 2007-03-08, 6:57 pm |
| mmichaelz@gmail.com wrote:
> Hi, I have one or several long lines in a file.
> I'd like to print out substrings in these lines,
> preferably each as a single line. The strings are,
> <a href="http...pdf"> where ... can vary.
> Ideally I'd get out,
>
> http...pdf (string 1)
> http...pdf (string 2)
> ...
> http...pdf (string 3)
>
> Thanks for any help and tips.
> m
>
a) Can there be multiple occurrences of these substrings on one line?
b) Can the text of a substring be split across lines?
c) Can any of the text that delimits the substrings (e.g. "<a href=")
appear within the quoted parts (e.g. <a href="http <a href= .pdf"> )?
d) Can the quotation characters appear escaped within the quoted parts
(e.g. <a href="this is a \" character.pdf"> )?
I assume the text inside the quotes always ends in an explicit
"<dot>pdf" rather than "<anychar>pdf".
Regards,
Ed.
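
A minimal sketch of one possible approach, not taken from any reply in
the thread. It assumes the answers to the questions above are (a) yes,
(b) no, (c) no and (d) no, and that every wanted string ends in a
literal ".pdf" immediately before the closing quote. The function name
extract_pdf_links and the example.com URLs are made up for illustration.
It walks along each line with match() and prints one URL per output line:

```shell
# Hypothetical helper: print every http...pdf target found in
# <a href="http...pdf"> tags, one per line. Reads files or stdin.
extract_pdf_links() {
    awk '{
        line = $0
        # Repeatedly find the next <a href="http...pdf"> on this line.
        while (match(line, /<a +href="http[^"]*\.pdf">/)) {
            url = substr(line, RSTART, RLENGTH)
            sub(/^<a +href="/, "", url)   # strip the opening tag text
            sub(/">$/, "", url)           # strip the closing quote and >
            print url
            # Continue searching after the tag just handled.
            line = substr(line, RSTART + RLENGTH)
        }
    }' "$@"
}

# Made-up sample input: two PDF links on one line, one non-PDF link.
printf '%s\n' \
    '<p><a href="http://example.com/a.pdf">A</a> <a href="http://example.com/b.pdf">B</a></p>' \
    '<a href="http://example.com/index.html">home</a>' |
    extract_pdf_links
```

For the sample input this prints http://example.com/a.pdf and
http://example.com/b.pdf on separate lines and skips the .html link.
If any of the assumptions (b) to (d) does not hold, this sketch will
miss or mangle matches.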