compacting '..' path segments using File::Spec
ofer@netapt.com

2005-02-01, 4:00 am

Here's the scenario:

I am given a path to a file, which is actually a relative symlink.
Example:

/foo/bar/somelink -> ../somefile

I tried the following code to follow the symlink and then elegantly
combine the two into a final absolute path to the real file:

use File::Spec;
my $symlink = '/foo/bar/somelink';
my $realfile = readlink( $symlink );
unless ( File::Spec->file_name_is_absolute( $realfile ) ) {
    my ( $volume, $directories, $file ) = File::Spec->splitpath( $symlink );
    $realfile = File::Spec->rel2abs( $realfile, $directories );
}

It works... but instead of producing '/foo/somefile', which is what I
want, it produces '/foo/bar/../somefile'. It is technically correct,
but not as elegant as I would like, and makes the final result
unnecessarily depend on the continued existence of the 'bar'
subdirectory in order for the path to remain valid (this data is going
into a long-term database).
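A self-contained version of the snippet above reproduces the result on a Unix-like system. As an assumption for illustration, the value readlink() would return is hard-coded, so nothing needs to exist on disk. rel2abs() only joins strings and calls canonpath(), which is why the '..' survives:

```perl
use strict;
use warnings;
use File::Spec;

# Stand-in for readlink('/foo/bar/somelink'); hard-coded (assumption)
# so the sketch runs without creating the symlink.
my $symlink  = '/foo/bar/somelink';
my $realfile = '../somefile';

unless ( File::Spec->file_name_is_absolute($realfile) ) {
    # splitpath returns (volume, directories, file); only the directory
    # part, '/foo/bar/', is needed as the base for the relative target.
    my ( undef, $directories, undef ) = File::Spec->splitpath($symlink);
    $realfile = File::Spec->rel2abs( $realfile, $directories );
}

print "$realfile\n";    # /foo/bar/../somefile
```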

Any ideas?

-ofer

Josef Moellers

2005-02-01, 3:59 pm

ofer@netapt.com wrote:
> Here's the scenario:
>
> I am given a path to a file, which is actually a relative symlink.
> Example:
>
> /foo/bar/somelink -> ../somefile
>
> I tried the following code to follow the symlink and then elegantly
> combine the two into a final absolute path to the real file:
>
> use File::Spec;
> my $symlink = '/foo/bar/somelink';
> my $realfile = readlink( $symlink );
> unless ( File::Spec->file_name_is_absolute( $realfile ) ) {
> my ( $volume, $directories, $file ) = File::Spec->splitpath( $symlink );
> $realfile = File::Spec->rel2abs( $realfile, $directories );
> }
>
> It works... but instead of producing '/foo/somefile', which is what I
> want, it produces '/foo/bar/../somefile'. It is technically correct,
> but not as elegant as I would like, and makes the final result
> unnecessarily depend on the continued existence of the 'bar'
> subdirectory in order for the path to remain valid (this data is going
> into a long-term database).
>
> Any ideas?


Well, since you haven't shown us what you have tried in order to
canonicalize the path name, we have few ideas.

1. You could try a regex which replaces each /./ with / and each
/<anyname>/../ with /.
2. You could split the pathname and work on the components.

There's even a CPAN module iirc.
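Option 2 might look like the sketch below. The helper name collapse_dotdot is hypothetical, and it assumes an absolute Unix path. It is purely lexical: it never looks at the filesystem, so, as later posts in this thread point out, it gives the wrong answer whenever a directory before a '..' is a symlink.

```perl
use strict;
use warnings;

# Hypothetical helper: lexical cleanup of an absolute Unix path.
# It does not consult the filesystem, so symlinks are ignored.
sub collapse_dotdot {
    my ($path) = @_;
    my @out;
    for my $part ( split m{/}, $path ) {
        next if $part eq '' || $part eq '.';    # skip empty and '.' segments
        if ( $part eq '..' ) {
            pop @out;    # '..' drops the previous segment; a no-op at the root
        }
        else {
            push @out, $part;
        }
    }
    return '/' . join( '/', @out );
}

print collapse_dotdot('/foo/bar/../somefile'), "\n";    # /foo/somefile
```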

--
Josef Möllers (penguin keeper at FSC)
If failure had no penalty success would not be a prize
-- T. Pratchett

A. Sinan Unur

2005-02-01, 3:59 pm

Josef Moellers <josef.moellers@fujitsu-siemens.com> wrote in news:ctncm2$til$1@nntp.fujitsu-siemens.com:

> ofer@netapt.com wrote:

....

I think you need canonpath:

D:\Home\asu1\UseNet\clpmisc> cat fs.pl
#! perl

use strict;
use warnings;

use File::Spec;

my $p = '../../../asu1/../';
$p = File::Spec->rel2abs($p);
$p = File::Spec->canonpath($p);

print "$p\n";
__END__

D:\Home\asu1\UseNet\clpmisc> fs
D:\Home

Sinan.

Brian McCauley

2005-02-01, 3:59 pm



ofer@netapt.com wrote:
> Here's the scenario:
>
> I am given a path to a file, which is actually a relative symlink.
> Example:
>
> /foo/bar/somelink -> ../somefile
>
> I tried the following code to follow the symlink and then elegantly
> combine the two into a final absolute path to the real file:
>
> use File::Spec;
> my $symlink = '/foo/bar/somelink';
> my $realfile = readlink( $symlink );
> unless ( File::Spec->file_name_is_absolute( $realfile ) ) {
> my ( $volume, $directories, $file ) = File::Spec->splitpath( $symlink );
> $realfile = File::Spec->rel2abs( $realfile, $directories );
> }
>
> It works... but instead of producing '/foo/somefile', which is what I
> want, it produces '/foo/bar/../somefile'. It is technically correct,
> but not as elegant


So elegant is more important than correct?

IIRC File::Spec works only with the file spec in the abstract. It does not
assume the file spec refers to a file system to which it currently has
access. As such, it cannot tell whether bar is a directory or a symlink.

Cwd::abs_path, on the other hand, will give you the canonical absolute
path with no symbolic references.
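A minimal sketch of the Cwd approach, assuming a Unix-like system that supports symlinks. It rebuilds the thread's layout under a throwaway directory (the directory and file names are illustrative) and lets abs_path() follow the link and resolve the '..' by consulting the filesystem:

```perl
use strict;
use warnings;
use Cwd qw(abs_path);
use File::Temp qw(tempdir);

# Recreate the layout from the original question in a scratch directory:
#   $root/foo/somefile
#   $root/foo/bar/somelink -> ../somefile
my $root = tempdir( CLEANUP => 1 );
mkdir "$root/foo"     or die "mkdir: $!";
mkdir "$root/foo/bar" or die "mkdir: $!";
open my $fh, '>', "$root/foo/somefile" or die "open: $!";
close $fh;
symlink '../somefile', "$root/foo/bar/somelink" or die "symlink: $!";

# abs_path follows the symlink and removes '..' using the real directory
# structure rather than string manipulation.
my $resolved = abs_path("$root/foo/bar/somelink");
print "$resolved\n";
```

The printed path ends in /foo/somefile. The full prefix depends on where the temporary directory lives, which may itself be reached through a symlink.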

> as I would like, and makes the final result
> unnecessarily depend on the continued existence of the 'bar'
> subdirectory in order for the path to remain valid (this data is going
> into a long-term database).
>
> Any ideas?


Yes, don't do it.

Seriously, if the user chooses to specify the path to the file using one
or more symlinks it is quite possibly because that is what they consider
to be the (logically) canonical location of the data and the absolute
(physically) canonical path is subject to change.

ofer@netapt.com

2005-02-01, 3:59 pm

I'll ignore the two morons and reply to the one who seems to have
understood what I'm trying to accomplish.

By the description of canonpath ('a logical cleanup of a path'), and
your example, it would seem to be what I'm looking for. I threw it
into my test script... and it didn't change the path at all. It still
returns /foo/bar/../somefile instead of /foo/somefile.

Of course, I'm running on Linux, and I see you're running on DOS or
Windows. So I copied my test script over to my windows desktop,
tweaked it a bit, and tried it. It works!

So it appears canonpath does what I want on DOS/Windows, but not on
Linux.

How bizarre.
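The Unix behaviour is intentional. The File::Spec::Unix documentation notes that canonpath() does not collapse x/../y into y because of symlinks. The sketch below shows what it does clean up without touching the filesystem (repeated slashes, '/./', a trailing slash) and what it leaves alone:

```perl
use strict;
use warnings;
use File::Spec::Unix;

# Unix canonpath performs only cleanups that are safe without looking
# at the filesystem; the 'bar/..' pair is deliberately kept.
my $clean = File::Spec::Unix->canonpath('/foo//bar/./../somefile/');
print "$clean\n";    # /foo/bar/../somefile
```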

Anyways, thanks for the tip. It was valid, even if it doesn't work for
me.

A. Sinan Unur

2005-02-01, 3:59 pm

ofer@netapt.com wrote in news:1107283019.666326.273640@c13g2000cwb.googlegroups.com:

[ Please provide some context when you are posting. ]

> I'll ignore the two morons


OK, you got me curious, let me look up who those are ... Hmmmm ... It's
Brian McCauley and Josef Moellers who have provided insight and help to
countless people. In fact, I just learned something from reading Brian's
post. Thank you Brian.

Oh, getting back to the topic at hand ...

* PLONK *

I hope you'll enjoy Xahzilla's company.

Sinan.

Villy Kruse

2005-02-02, 8:56 am

On 1 Feb 2005 10:36:59 -0800,
ofer@netapt.com <ofer@netapt.com> wrote:


> I'll ignore the two morons and reply to the one who seems to have
> understood what I'm trying to accomplish.
>
> By the description of canonpath ('a logical cleanup of a path'), and
> your example, it would seem to be what I'm looking for. I threw it
> into my test script... and it didn't change the path at all. It still
> returns /foo/bar/../somefile instead of /foo/somefile.
>


Some news message in this group said quite a while ago that eliminating
/../ sequences might not be the right thing to do if symbolic links
are involved. If /foo/bar is a symbolic link, then .. won't get you back
to /foo but to the parent of the directory the symbolic link is
pointing to. That is, unless you interpret the .. logically, as is often
done by the shell.
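The trap can be demonstrated directly. The sketch below assumes a Unix-like system with symlink support and uses a hypothetical scratch layout in which foo/bar is a symlink to elsewhere/sub. The kernel resolves '..' relative to the link's target, so the lexically "simplified" path names a file that does not exist:

```perl
use strict;
use warnings;
use Cwd qw(abs_path);
use File::Temp qw(tempdir);

# Hypothetical layout in a scratch directory:
#   $root/foo/                          (real directory)
#   $root/elsewhere/sub/                (real directory)
#   $root/elsewhere/target.txt          (real file)
#   $root/foo/bar -> ../elsewhere/sub   (symlink)
my $root = tempdir( CLEANUP => 1 );
mkdir "$root/$_" or die "mkdir $_: $!" for qw(foo elsewhere elsewhere/sub);
open my $fh, '>', "$root/elsewhere/target.txt" or die "open: $!";
close $fh;
symlink '../elsewhere/sub', "$root/foo/bar" or die "symlink: $!";

my $path = "$root/foo/bar/../target.txt";

# Physically, foo/bar/.. is $root/elsewhere, so this path exists...
my $exists = -e $path ? 1 : 0;

# ...but the lexical rewrite to $root/foo/target.txt does not.
my $lexical_exists = -e "$root/foo/target.txt" ? 1 : 0;

print "physical: $exists, lexical: $lexical_exists\n";    # physical: 1, lexical: 0
```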


Villy
Darren Dunham

2005-02-03, 8:57 pm

ofer@netapt.com wrote:
> By the description of canonpath ('a logical cleanup of a path'), and
> your example, it would seem to be what I'm looking for. I threw it in
> to my test script... and it didn't change the path at all. It still
> returns /foo/bar/../somefile instead of /foo/somefile.


That's because on Unix filesystems, /foo/bar/../somefile cannot
usually be determined to be the same as /foo/somefile.

> Of course, I'm running on Linux, and I see you're running on DOS or
> Windows. So I copied my test script over to my windows desktop,
> tweaked it a bit, and tried it. It works!


> So it appears canonpath does what I want on DOS/Windows, but not on
> Linux.


> How bizarre.


On Windows/DOS (no symlinks), the two paths refer to the same file.

Even on Unix, you can call the Windows canonpath directly if you don't
care about it breaking in some cases.
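Calling the Windows rules from Unix might look like the following sketch. File::Spec::Win32 can be loaded explicitly on any platform. Its canonpath() removes 'dir/..' pairs lexically and returns backslashes, so they are converted back here. This is exactly the "breaking in some cases" trade-off: the result is wrong if any directory before a '..' is a symlink.

```perl
use strict;
use warnings;
use File::Spec::Win32;

# Windows canonpath collapses 'bar\..' because it assumes no symlinks.
my $win = File::Spec::Win32->canonpath('/foo/bar/../somefile');

# It emits backslash separators; translate them for use on Unix.
( my $unix = $win ) =~ tr{\\}{/};
print "$unix\n";    # /foo/somefile
```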

--
Darren Dunham ddunham@taos.com
Senior Technical Consultant TAOS http://www.taos.com/
Got some Dr Pepper? San Francisco, CA bay area
< This line left intentionally blank to confuse you. >

John W. Krahn

2005-02-03, 8:57 pm


Darren Dunham wrote:
> ofer@netapt.com wrote:
>
>
> That's because on Unix filesystems, /foo/bar/../somefile cannot
> usually be determined to be the same as /foo/somefile.


Sure it can. lstat() both files and if the device numbers and inode numbers
are the same then they are the same file.
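A sketch of the device/inode comparison, assuming a Unix-like system. The helper name same_file is hypothetical. It uses stat(), which follows symlinks in every component. lstat(), as suggested above, differs only in not following a symlink in the final component, which matters only when the last name is itself a link.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: two names denote the same file when they have the
# same device number ([0]) and inode number ([1]).
sub same_file {
    my ( $x, $y ) = @_;
    my @sx = stat $x or return 0;    # a missing file is never "the same"
    my @sy = stat $y or return 0;
    return $sx[0] == $sy[0] && $sx[1] == $sy[1];
}

# Scratch layout with no symlinks: $root/foo/somefile and $root/foo/bar/.
my $root = tempdir( CLEANUP => 1 );
mkdir "$root/foo"     or die "mkdir: $!";
mkdir "$root/foo/bar" or die "mkdir: $!";
open my $fh, '>', "$root/foo/somefile" or die "open: $!";
close $fh;

my $same = same_file( "$root/foo/bar/../somefile", "$root/foo/somefile" ) ? 1 : 0;
print "same file: $same\n";    # same file: 1
```

Note that this only works for files that exist; it answers "are these the same file right now", not "will these paths stay equivalent".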


John
--
use Perl;
program
fulfillment

ofer@netapt.com

2005-02-19, 3:57 am

Since I asked the question, it's only fair that I share the answer when
I find it. Ken Williams, author of the wonderful Path::Class, had this
to say on the topic:

---
The reason File::Spec and Path::Class don't handle this case is that,
for instance, /data/current could be a symlink to somewhere else. If
/data/current points to /foo/bar, then
/data/current/../backup/thefile-20050211.tgz really points to
/foo/backup/thefile-20050211.tgz , not
/data/backup/thefile-20050211.tgz .

The "correct" way to do this is to use the Cwd.pm module, which has a
realpath() function that will resolve any '.' and '..' components in
the path.
---

Excellent point. In that case, splitting directories or fancy regexes
would have yielded a blatantly incorrect answer. After trying the
Cwd::realpath() function on some hairy tests, it seems it perfectly
follows the twisty maze of symlinks and returns the most beautiful,
canonical path you could ever ask for.

-ofer

Brian McCauley

2005-02-19, 8:56 am



ofer@netapt.com wrote:

> Since I asked the question, it's only fair that I share the answer when
> I find it.


> The "correct" way to do this is to use the Cwd.pm module, which has a
> realpath() function that will resolve any '.' and '..' components in
> the path.


You'll note that I mentioned this in my response dated 2005-02-01.

> Excellent point. In that case, splitting directories or fancy regexes
> would have yielded a blatantly incorrect answer. After trying the
> Cwd::realpath() function on some hairy tests, it seems it perfectly
> follows the twisty maze of symlinks and returns the most beautiful,
> canonical path you could ever ask for.


But as I explained before, the correct thing to do is often to do nothing.

On Unix the _physically_ canonical path to a file is typically subject
to change as filesystems are expediently reorganised. If the user
chooses to specify their preferred _logically_ canonical path that
resolves via a twisty maze of symlinks, then it is counterproductive for
a program to second-guess the user.

Indeed I have oft put forward this very assertion as a test to
distinguish between someone who speaks Unix as a foreign language and
one who speaks Unix as a native. You can tell you are really at
home in Unix when the idea of canonicalizing a file path as you
describe ceases to seem intuitively attractive and starts to appear
intuitively unattractive.

Note: I'm not saying that there are never any situations where physical
canonicalization is appropriate. I'm just saying that it is
inappropriate in the vast majority of cases.

ofer@netapt.com

2005-02-20, 8:57 pm

> You'll note that I mentioned this in my response dated 2005-02-01.

Yes, you did. I didn't respond to that post because it didn't contain
anything resembling a solution, but you deserve credit for informing us
about that trap (and it is a nasty one).

> On Unix the _physically_ canonical path to a file is typically subject
> to change as filesystems are expediently reorganised. If the user
> chooses to specify their preferred _logically_ canonical path that
> resolves via a twisty maze of symlinks, then it is counterproductive for
> a program to second-guess the user.


(sigh) Why do you keep assuming that I'm writing some app that will be
released to users? Why do you assume the situation is such that I
probably shouldn't be canonicalizing paths? I know you're trying to
be helpful, but I know my application, and I know what I'm trying to
accomplish and why. The only thing I asked for help with was how.

I'm writing back-end code that works off other back-end systems in my
company. I need to canonicalize paths in order to log which exact
file was processed in a run, which would otherwise be hidden by the
symlink, which is not version specific. This is done on purpose to
make it easier to figure out what file to process. Something like
this:

/somedir/thefile.txt -> thefile.2005021802.txt
/somedir/thefile.2005021801.txt (old)
/somedir/thefile.2005021802.txt (current)
/somedir/thefile.2005021803.txt (in the middle of being created)

So the script processes /somedir/thefile.txt, which is wonderful. But
then I log the fact that it was pointing to
/somedir/thefile.2005021802.txt at the time, so we can investigate
problems later.
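The logging scenario above might be sketched as follows, assuming a Unix-like system. A scratch directory stands in for /somedir (hypothetical), holding one versioned file and the unversioned symlink the script is pointed at. The script keeps processing the symlink, but records what it resolved to at run time:

```perl
use strict;
use warnings;
use Cwd qw(realpath);
use File::Basename qw(basename);
use File::Temp qw(tempdir);

# Hypothetical stand-in for /somedir.
my $somedir = tempdir( CLEANUP => 1 );
open my $fh, '>', "$somedir/thefile.2005021802.txt" or die "open: $!";
close $fh;
symlink 'thefile.2005021802.txt', "$somedir/thefile.txt" or die "symlink: $!";

my $input = "$somedir/thefile.txt";

# Process $input as usual, but log the concrete file it pointed to.
my $actual = realpath($input);
print "processing $input (really ", basename($actual), ")\n";
```

For a single hop within the same directory, readlink() alone would also give the versioned name. realpath() additionally copes with chains of links and '..' components.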

The moral of the story is... if you feel someone might be approaching a
problem from the wrong angle, it's extremely thoughtful of you to
mention this, but phrase it like "if the situation is thus, then know
that this is probably not the way to go...", and then preferably also
answer the question. Don't simply assume the situation IS thus, and
say "you're wrong".

-ofer

Martien Verbruggen

2005-02-21, 3:58 am

On 20 Feb 2005 13:57:05 -0800,
ofer@netapt.com <ofer@netapt.com> wrote:
>
> Yes, you did. I didn't respond to that post because it didn't contain
> anything resembling a solution, but you deserve credit for informing us
> about that trap (and it is a nasty one).


It did have the same solution that you posted in your own followup. To
quote from Brian's post:

> Cwd::abs_path, on the other hand, will give you the canonical
> absolute path with no symbolic references.


Martien
--
|
Martien Verbruggen | Blessed are the Fundamentalists, for they
| shall inhibit the earth.
|

ofer@netapt.com

2005-02-21, 3:58 am

(blink) (blink)
How the heck did I miss that...

Maybe I should just shut up now.

-ofer

David Combs

2005-02-23, 3:59 pm

In article <1108795536.702248.313500@o13g2000cwo.googlegroups.com>,
<ofer@netapt.com> wrote:
>Since I asked the question, it's only fair that I share the answer when
>I find it. Ken Williams, author of the wonderful Path::Class, had this
>to say on the topic:
>
>---
>The reason File::Spec and Path::Class don't handle this case is that,
>for instance, /data/current could be a symlink to somewhere else. If
>/data/current points to /foo/bar, then
>/data/current/../backup/thefile-20050211.tgz really points to
>/foo/backup/thefile-20050211.tgz , not
>/data/backup/thefile-20050211.tgz .
>
>The "correct" way to do this is to use the Cwd.pm module, which has a
>realpath() function that will resolve any '.' and '..' components in
>the path.


I'm surely doing something wrong, or maybe you have a newer emacs
than mine, but I can't find a 'sub.*realpath' anywhere:

egrep -in 'sub.*realpath' `gauf Cwd.pm`

(gauf is an alias that greps for its argument in a huge, batch-created
(via find) list of all files on my standalone computer.)

David




Brian McCauley

2005-02-24, 8:57 pm



David Combs wrote:
> In article <1108795536.702248.313500@o13g2000cwo.googlegroups.com>,
> <ofer@netapt.com> wrote:
>
>
>
> I'm surely doing something wrong, or maybe you have a newer emacs
> than mine,


What does emacs have to do with it?

> I can't find a 'sub.*realpath' anywhere


So what?

It just so happens that realpath() is implemented as an alias to another
subroutine.
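A quick way to check this from Perl itself, as a sketch. %INC shows which Cwd.pm was actually loaded, which is the file worth searching. In the Cwd versions I am aware of, the alias is a glob assignment (along the lines of `*realpath = \&abs_path;`), which a search for 'sub.*realpath' will not match. Whatever the internal wiring, both names should return the same answer:

```perl
use strict;
use warnings;
use Cwd qw(abs_path realpath);

# Which Cwd.pm is in use (the file to grep for 'realpath').
print "Cwd loaded from $INC{'Cwd.pm'}\n";

# Both names resolve the current directory to the same canonical path.
my $via_abs  = abs_path('.');
my $via_real = realpath('.');
print "abs_path: $via_abs\nrealpath: $via_real\n";
```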

Copyright 2008 codecomments.com