Home > Archive > PERL Beginners > April 2008 > sort without ignoring hyphens









Author sort without ignoring hyphens
tc314@hotmail.com

2008-03-29, 7:12 pm

When I do string comparisons in perl the strings seem to ignore the
embedded hyphens.
I want to sort strings assuming the 'dictionary' order of the chars is
ASCII order: hyphen, 0-9, A-Z.
It appears linux sort also has the problem (LC_ALL is blank).
Any ideas? I want to avoid a brute force char by char sort if
possible.
Thanks

Rob Dixon

2008-03-29, 7:12 pm

tc314@hotmail.com wrote:
>
> When I do string comparisons in perl the strings seem to ignore the
> embedded hyphens.
>
> I want to sort strings assuming the 'dictionary' order of the chars is
> ASCII order: hyphen, 0-9, A-Z.
>
> It appears linux sort also has the problem (LC_ALL is blank).
>
> Any ideas? I want to avoid a brute force char by char sort if
> possible.


It appears that your problem is more complex than you have diagnosed.
Look:

use strict;
use warnings;

use List::Util qw/shuffle/;

my @list = shuffle ('-', '0' .. '9', 'A' .. 'Z', 'a' .. 'z');

print join(',', @list), "\n";
print join(',', sort @list), "\n";

**OUTPUT**

9,L,c,1,M,2,m,J,5,t,8,y,W,N,k,h,Y,b,f,E,q,P,X,-,Z,B,I,K,4,V,e,F,x,g,3,H,u,v,R,w,r,T,d,O,G,7,U,l,z,6,a,s,A,p,0,o,C,i,n,Q,j,D,S
-,0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z,a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z


Can you please show us the code and data that is causing you problems?

Rob

John W. Krahn

2008-03-29, 7:12 pm

tc314@hotmail.com wrote:
> When I do string comparisons in perl the strings seem to ignore the
> embedded hyphens.
> I want to sort strings assuming the 'dictionary' order of the chars is
> ASCII order: hyphen, 0-9, A-Z.
> It appears linux sort also has the problem (LC_ALL is blank).
> Any ideas? I want to avoid a brute force char by char sort if
> possible.


Please provide an *example* of your data, what it would look like if
sorted "properly", and what it actually looks like after being sorted.


John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall

tc314@hotmail.com

2008-03-31, 4:39 am

On Mar 29, 4:19 pm, kra...@telus.net (John W. Krahn) wrote:
> tc...@hotmail.com wrote:
>
> Please provide an *example* of your data, what it would look like if
> sorted "properly", and what it actually looks like after being sorted.
>
> John
> --
> Perl isn't a toolbox, but a small machine shop where you
> can special-order certain sorts of tools at low cost and
> in short order. -- Larry Wall



unsorted:
22
2-2
2-3
23
21

linux sort produces:
21
22
2-2
23
2-3

desired sort:
2-2
2-3
21
22
23
(in ASCII order: hyphen (ascii 45) then 0-9 (ascii 48-57) then A-Z
(ascii 65-90))

TIA

Uri Guttman

2008-03-31, 4:41 am

>>>>> "t" == tc314 <tc314@hotmail.com> writes:

t> unsorted:
t> 22
t> 2-2
t> 2-3
t> 23
t> 21

t> linux sort produces:

with what options? i don't get that result

t> 21
t> 22
t> 2-2
t> 23
t> 2-3

i got this which makes more sense as - is earlier in ascii than the
digits. sort defaults to a text sort (same as perl).

2-2
2-3
21
22
23

t> desired sort:
t> 2-2
t> 2-3
t> 21
t> 22
t> 23
t> (in ASCII order: hyphen (ascii 45) then 0-9 (ascii 48-57) then A-Z
t> (ascii 65-90))

that is what linux/unix/gnu sort returns. and perl sort should return
the same thing if you keep it a text sort.

so show a runnable short example of your perl sort with input and output
that shows your results. what you claim above doesn't make sense.

uri

--
Uri Guttman ------ uri@stemsystems.com -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Free Perl Training --- http://perlhunter.com/college.html ---------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------

John W. Krahn

2008-03-31, 4:42 am

tc314@hotmail.com wrote:
>
> On Mar 29, 4:19 pm, kra...@telus.net (John W. Krahn) wrote:
>
> unsorted:
> 22
> 2-2
> 2-3
> 23
> 21
>
> linux sort produces:
> 21
> 22
> 2-2
> 23
> 2-3
>
> desired sort:
> 2-2
> 2-3
> 21
> 22
> 23


It appears to work in Perl:

$ perl -le'@x = qw[22 2-2 2-3 23 21]; print for sort @x'
2-2
2-3
21
22
23



John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall

Dr.Ruud

2008-03-31, 7:56 pm

tc314@hotmail.com wrote:

> unsorted:
> 22
> 2-2
> 2-3
> 23
> 21
>
> linux sort produces:
> 21
> 22
> 2-2
> 23
> 2-3


$ echo '
21
22
2-4
2-2
23
2-3
' |sort -n


2-2
2-3
2-4
21
22
23

--
Affijn, Ruud

"Gewoon is een tijger."

tc314@hotmail.com

2008-03-31, 8:00 pm

On Mar 30, 10:57 pm, kra...@telus.net (John W. Krahn) wrote:
> tc...@hotmail.com wrote:
>
> It appears to work in Perl:
>
> $ perl -le'@x = qw[22 2-2 2-3 23 21]; print for sort @x'
> 2-2
> 2-3
> 21
> 22
> 23
>
> John
> --
> Perl isn't a toolbox, but a small machine shop where you
> can special-order certain sorts of tools at low cost and
> in short order. -- Larry Wall


I'm looking for the perl way of comparing strings.

Your posted code 'diff says memory exhausted need help with perl
Options' is the basis I want to build from.
But it appears the gt,eq,lt comparisons ignore the hyphens in the
strings.

I really want to avoid slowing things to a crawl by doing brute force
byte comparisons rather than string compares.

(I'm really looking for a minimal patch to your code to produce the
desired results.)

Thanks


tc314@hotmail.com

2008-03-31, 8:01 pm

On Mar 31, 11:44 am, rvtol+n...@isolution.nl (Dr.Ruud) wrote:
> tc...@hotmail.com wrote:
>
> $ echo '
> 21
> 22
> 2-4
> 2-2
> 23
> 2-3
> ' |sort -n
>
> 2-2
> 2-3
> 2-4
> 21
> 22
> 23
>
> --
> Affijn, Ruud
>
> "Gewoon is een tijger."


That's exactly my point.

The standard sort doesn't act like the sort -n.
So when I do string compares in perl (gt, eq, lt)
they compare like the standard sort.

I want to do string compares like the sort -n.
I presume a byte comparison of each char in the string
will work (at the expense of speed).

Is there a way to do a string comparison in perl
so that the relationship is identical to the sort -n?

TIA

Uri Guttman

2008-04-01, 4:04 am

>>>>> "t" == tc314 <tc314@hotmail.com> writes:

t> On Mar 31, 11:44 am, rvtol+n...@isolution.nl (Dr.Ruud) wrote:

t> That's exactly my point.

t> The standard sort doesn't act like the sort -n.

this is the first time you mentioned sort -n. i asked you about any sort
options and you didn't answer. nor did you show your perl code. how can
we read your mind without the PSI::ESP module??

t> So when I do string compares in perl (gt, eq, lt)
t> they compare like the standard sort.

perl sort defaults to string compares. we said that already. and it uses
the equivalent of cmp, not gt, eq and lt.
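
To illustrate the point about cmp, here is a minimal sketch using the sample values posted earlier in the thread (the variable names are only for illustration). A bare sort and a sort with an explicit cmp block should give the same order, provided `use locale` is not in effect:

```perl
use strict;
use warnings;

my @x = qw(22 2-2 2-3 23 21);

# A bare sort uses string comparison, the same as an explicit cmp block.
my @default  = sort @x;
my @with_cmp = sort { $a cmp $b } @x;

print "default:  @default\n";
print "with cmp: @with_cmp\n";
```

Both lines print the values with the hyphenated entries first, because the hyphen has a lower character code than any digit.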

t> I want to do string compares like the sort -n.

and your requested order IS not the output of sort -n:

echo '
21
22
3-4
2-2
23
2-3
' |sort -n


2-2
2-3
3-4
21
22
23

what sort -n is doing is sorting the initial number string. note that
3-4 sorts before 21, 22 and 23 since 3 is less than those. it is not
looking at the -4 at all
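
The behaviour described above can be approximated in Perl. This is only a rough sketch: `leading_number` is a hypothetical helper, and it looks at leading digits only, whereas the real sort -n also handles leading blanks, a minus sign and decimal fractions.

```perl
use strict;
use warnings;

# Hypothetical helper: return the leading run of digits as a number,
# or 0 when the line does not start with a digit.
sub leading_number {
    my ($line) = @_;
    return $line =~ /^(\d+)/ ? $1 + 0 : 0;
}

my @lines = qw(21 22 3-4 2-2 23 2-3);

# Compare by leading number first; fall back to string order on ties.
my @sorted = sort {
    leading_number($a) <=> leading_number($b) or $a cmp $b
} @lines;

print "$_\n" for @sorted;
```

With this comparison 3-4 lands before 21 because only the 3 is compared numerically, which is the effect pointed out above.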

there are ways to do it but you need to be more specific on how to do
the sort. you keep changing the specs. first it was unix sort (which is
a string sort like perl's). then it is sort -n but you aren't specifying
if you really want to ignore the - or all the stuff after the -. a
simple addition of 3-4 breaks your specification.

t> Is there a way to do a string comparison in perl so that the
t> relationship is identical to the sort -n?

but is that what you want? please make a cleaner spec with numbers other
than 2- or 21. and perl can sort numerically easily and it is an FAQ
(perldoc -q sort). but if you really want the full number to be sorted
with the - ignored or sorted below digits you need to specify the rules
carefully and create a sort to do it. also doable but not until you
clean up your specs.
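
As an illustration of how one such rule could be written once it is pinned down, here is a sketch that assumes the rule is "compare with hyphens removed, then break ties with a plain string comparison". That rule and the helper name `without_hyphens` are assumptions made for the example, not something the original poster specified.

```perl
use strict;
use warnings;

# Hypothetical rule: compare with hyphens removed, then break ties
# with a plain string comparison so the order is deterministic.
sub without_hyphens {
    my ($s) = @_;
    (my $key = $s) =~ tr/-//d;
    return $key;
}

my @x = qw(22 2-2 2-3 23 21);
my @sorted = sort {
    without_hyphens($a) cmp without_hyphens($b) or $a cmp $b
} @x;

print "@sorted\n";
```

A different rule, such as sorting the hyphen below the digits, needs no custom block at all, since that is what the default string sort already does.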

uri

--
Uri Guttman ------ uri@stemsystems.com -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Free Perl Training --- http://perlhunter.com/college.html ---------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------

John W. Krahn

2008-04-01, 4:04 am

tc314@hotmail.com wrote:
> On Mar 30, 10:57 pm, kra...@telus.net (John W. Krahn) wrote:
>
> I'm looking for the perl way of comparing strings.
>
> Your posted code 'diff says memory exhausted need help with perl
> Options'
> is the basis I want to build from.
> But it appears the gt,eq,lt comparisons ignore the hyphens in the
> strings.


No they do not.

> I really want to avoid slowing things to a crawl by doing brute force
> byte comparisons rather than string compares.


Perl's string comparison operators *do* compare each byte.
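
A short sketch that makes this visible, assuming `use locale` is not in effect (with it, the operators follow the locale's collation rules instead):

```perl
use strict;
use warnings;

# The hyphen (45) has a lower code than any digit (48-57).
printf "ord('-') = %d, ord('1') = %d\n", ord('-'), ord('1');

# So '2-2' sorts before '21': the strings first differ at the second
# character, where '-' is compared with '1'.
print "'2-2' lt '21': ", ('2-2' lt '21' ? "true" : "false"), "\n";
print "'2-2' cmp '21': ", ('2-2' cmp '21'), "\n";
```

The hyphen takes part in the comparison like any other character, so no hand-written character-by-character loop is needed.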

> (I'm really looking for a minimal patch to your code to produce the
> desired results.)


perldoc -q "How do I handle binary data correctly"
perldoc perllocale


John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall

Dr.Ruud

2008-04-01, 7:05 pm

tc314@hotmail.com wrote:
> John W. Krahn:


>
> I'm looking for the perl way of comparing strings.
>
> Your posted code 'diff says memory exhausted need help with perl
> Options'


Maybe your shell requires dquotes?

--
Affijn, Ruud

"Gewoon is een tijger."

tc314@hotmail.com

2008-04-01, 10:02 pm

On Mar 29, 3:19 pm, kra...@telus.net (John W. Krahn) wrote:
> tc...@hotmail.com wrote:
>
> Please provide an *example* of your data, what it would look like if
> sorted "properly", and what it actually looks like after being sorted.
>
> John
> --
> Perl isn't a toolbox, but a small machine shop where you
> can special-order certain sorts of tools at low cost and
> in short order. -- Larry Wall

Given my data: (echo.txt)
21A
22A
2-4A
2-2A
23A
2-3A
08E
08F
08G
08GA
08H
08-J

I want perl (and linux sort) to see its order as:
08-J
08E
08F
08G
08GA
08H
2-2A
2-3A
2-4A
21A
22A
23A

However (on my system):
1) linux sort: sort echo.txt
produces the undesired result:
08E
08F
08G
08GA
08H
08-J
21A
22A
2-2A
23A
2-3A
2-4A

2) linux sort with -n: sort -n echo.txt
produces the undesired result:
2-2A
2-3A
2-4A
08E
08F
08G
08GA
08H
08-J
21A
22A
23A

3) perl's sort produces the DESIRED result
perl -le'@x = qw[21A 22A 2-4A 2-2A 23A 2-3A 08E 08F 08G 08GA 08H 08-J]; print for sort @x'
08-J
08E
08F
08G
08GA
08H
2-2A
2-3A
2-4A
21A
22A
23A


So, how do I write a perl script to use in place of linux sort since
perl's sort produces the desired results?

I want perlsort to accept input from STDIN or a filename as an argv.
I don't care about any command line options since perl seems to do
exactly what I desire. (I thought perl's comparisons were wrong because I
used files sorted by linux sort, my mistake.)
My files are significantly larger than physical memory.

Any help is appreciated.

Matthew Whipple

2008-04-02, 7:18 pm

LC_ALL=C sort echo.txt
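
With LC_ALL=C, sort compares raw byte values, which matches Perl's default string comparison. GNU sort also spills to temporary files, so it copes with input larger than physical memory. For input that does fit in memory, a Perl equivalent could be sketched as follows; the sub name `sorted_lines` is hypothetical, and the in-memory filehandle is there only to make the example self-contained.

```perl
use strict;
use warnings;

# Read every line from a filehandle and return the lines in byte order.
# Assumption: the whole input fits in memory.
sub sorted_lines {
    my ($fh) = @_;
    my @lines = <$fh>;
    return sort @lines;
}

# Demonstration on an in-memory filehandle holding the sample data.
my $data = join '', map { "$_\n" }
    qw(21A 22A 2-4A 2-2A 23A 2-3A 08E 08F 08G 08GA 08H 08-J);
open my $in, '<', \$data or die "cannot open in-memory handle: $!";
print sorted_lines($in);
close $in;
```

As a standalone script the body reduces to `print sort <>;`, which reads the files named on the command line, or STDIN when none are given. It is not suitable for files larger than memory, which is where the LC_ALL=C form above is the better fit.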

tc314@hotmail.com wrote:
> On Mar 29, 3:19 pm, kra...@telus.net (John W. Krahn) wrote:
>
>
> Given my data: (echo.txt)
> 21A
> 22A
> 2-4A
> 2-2A
> 23A
> 2-3A
> 08E
> 08F
> 08G
> 08GA
> 08H
> 08-J
>
> I want perl (and linux sort) to see its order as:
> 08-J
> 08E
> 08F
> 08G
> 08GA
> 08H
> 2-2A
> 2-3A
> 2-4A
> 21A
> 22A
> 23A
>
> However (on my system):
> 1) linux sort: sort echo.txt
> produces the undesired result:
> 08E
> 08F
> 08G
> 08GA
> 08H
> 08-J
> 21A
> 22A
> 2-2A
> 23A
> 2-3A
> 2-4A
>
> 2) linux sort with -n: sort -n echo.txt
> produces the undesired result:
> 2-2A
> 2-3A
> 2-4A
> 08E
> 08F
> 08G
> 08GA
> 08H
> 08-J
> 21A
> 22A
> 23A
>
> 3) perl's sort produces the DESIRED result
> perl -le'@x = qw[21A 22A 2-4A 2-2A 23A 2-3A 08E 08F 08G 08GA 08H 08-J]; print for sort @x'
> 08-J
> 08E
> 08F
> 08G
> 08GA
> 08H
> 2-2A
> 2-3A
> 2-4A
> 21A
> 22A
> 23A
>
>
> So, how do I write a perl script to use in place of linux sort since
> perl's sort produces the desired results?
>
> I want perlsort to accept input from STDIN or a filename as an argv.
> I don't care about any command line options since perl seems to do
> exactly what I desire. (I thought perl's comparisons were wrong because I
> used files sorted by linux sort, my mistake.)
>
> My files are significantly larger than physical memory.
>
> Any help is appreciated.
>
>
>


Copyright 2008 codecomments.com