For Programmers: Free Programming Magazines  


Home > Archive > SQL Server Programming > February 2006 > How to compare strings for equality in Transact SQL









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author How to compare strings for equality in Transact SQL
Bill

2006-02-16, 7:04 pm

I just learned that if a VARCHAR column contains 'ABC ' (three letters
and a space) then

SELECT * FROM ATABLE WHERE THECOLUMN = 'ABC'

will return the record since SQL Server incorrectly pads the shorter
value with spaces to the length of the longer before comparing them. Is
there a way to compare strings for equality that works correctly?

--
..Bill.
Grant

2006-02-16, 7:04 pm

trim?

SELECT * FROM ATABLE WHERE trim(THECOLUMN) = 'ABC'



"Bill" <no@no.com> wrote in message
news:uhKlUmxMGHA.3728@tk2msftngp13.phx.gbl...
>I just learned that if a VARCHAR column contains 'ABC ' (three letters
> and a space) then
>
> SELECT * FROM ATABLE WHERE THECOLUMN = 'ABC'
>
> will return the record since SQL Server incorrectly pads the shorter
> value with spaces to the length of the longer before comparing them. Is
> there a way to compare strings for equality that works correctly?
>
> --
> .Bill.



JT

2006-02-16, 7:04 pm

THECOLUMN like 'ABC'

or

THECOLUMN+'$' = 'ABC'+'$'

"Bill" <no@no.com> wrote in message
news:uhKlUmxMGHA.3728@tk2msftngp13.phx.gbl...
>I just learned that if a VARCHAR column contains 'ABC ' (three letters
> and a space) then
>
> SELECT * FROM ATABLE WHERE THECOLUMN = 'ABC'
>
> will return the record since SQL Server incorrectly pads the shorter
> value with spaces to the length of the longer before comparing them. Is
> there a way to compare strings for equality that works correctly?
>
> --
> .Bill.



Jack Vamvas

2006-02-16, 7:04 pm

In SQL Server use RTRIM

--
Jack Vamvas
________________________________________
__________________________
Receive free SQL tips - register at www.ciquery.com/sqlserver.htm
New article by Jack Vamvas - Improper Use of indexes on MS SQL: Server
2000 - www.ciquery.com/articles/useofindexes.asp
"Bill" <no@no.com> wrote in message
news:uhKlUmxMGHA.3728@tk2msftngp13.phx.gbl...
> I just learned that if a VARCHAR column contains 'ABC ' (three letters
> and a space) then
>
> SELECT * FROM ATABLE WHERE THECOLUMN = 'ABC'
>
> will return the record since SQL Server incorrectly pads the shorter
> value with spaces to the length of the longer before comparing them. Is
> there a way to compare strings for equality that works correctly?
>
> --
> .Bill.



Bill

2006-02-16, 7:04 pm

Grant wrote:

> trim?
>
> SELECT * FROM ATABLE WHERE trim(THECOLUMN) = 'ABC'


I must be missing something. Using trim gets the same incorrect result
as not using trim. The values 'ABC ' and 'ABC' are NOT equal. If I have
a table with a VARCHAR column that contains two rows with the values
'ABC ' and 'ABC', how do I write a SELECT statement that will return
the row that contains 'ABC' but not the row that contains 'ABC '?


--
..Bill.

JT

2006-02-16, 7:04 pm

I think this is the problem he is having:

if 'ABC ' = 'ABC' print 'True' else print 'False'

Result: True

I am assuming he wants this condition to be False. In that base, he actually
doesn't want to trim the spaces.

"Jack Vamvas" <DELETE_BEFORE_REPLY_jack@ciquery.com> wrote in message
news:dt2c11$m5j$1@nwrdmz01.dmz.ncs.ea.ibs-infra.bt.com...
> In SQL Server use RTRIM
>
> --
> Jack Vamvas
> ________________________________________
__________________________
> Receive free SQL tips - register at www.ciquery.com/sqlserver.htm
> New article by Jack Vamvas - Improper Use of indexes on MS SQL: Server
> 2000 - www.ciquery.com/articles/useofindexes.asp
> "Bill" <no@no.com> wrote in message
> news:uhKlUmxMGHA.3728@tk2msftngp13.phx.gbl...
>
>



Aaron Bertrand [SQL Server MVP]

2006-02-16, 7:04 pm

Grant and Jack, I think the OP wants to consider 'ABC' and 'ABC ' to be
*different* (in most configurations, SQL Server considers them equal because
the trailing spaces on varchar columns are ignored).

So, using RTRIM() does not help because it yields the same result the OP is
already getting.




"Jack Vamvas" <DELETE_BEFORE_REPLY_jack@ciquery.com> wrote in message
news:dt2c11$m5j$1@nwrdmz01.dmz.ncs.ea.ibs-infra.bt.com...
> In SQL Server use RTRIM
>
> --
> Jack Vamvas
> ________________________________________
__________________________
> Receive free SQL tips - register at www.ciquery.com/sqlserver.htm
> New article by Jack Vamvas - Improper Use of indexes on MS SQL: Server
> 2000 - www.ciquery.com/articles/useofindexes.asp
> "Bill" <no@no.com> wrote in message
> news:uhKlUmxMGHA.3728@tk2msftngp13.phx.gbl...
>
>



Aaron Bertrand [SQL Server MVP]

2006-02-16, 7:04 pm

T-SQL has RTRIM() and LTRIM() but no TRIM().

For other info, please see my reply to Jack.


"Grant" <email@nowhere.com> wrote in message
news:ejQpurxMGHA.3144@TK2MSFTNGP11.phx.gbl...
> trim?
>
> SELECT * FROM ATABLE WHERE trim(THECOLUMN) = 'ABC'



JT

2006-02-16, 7:04 pm

The LIKE comparison operator does not have this quirk.

"Bill" <no@no.com> wrote in message
news:uVunH4xMGHA.1032@TK2MSFTNGP11.phx.gbl...
> Grant wrote:
>
>
> I must be missing something. Using trim gets the same incorrect result
> as not using trim. The values 'ABC ' and 'ABC' are NOT equal. If I have
> a table with a VARCHAR column that contains two rows with the values
> 'ABC ' and 'ABC', how do I write a SELECT statement that will return
> the row that contains 'ABC' but not the row that contains 'ABC '?
>
>
> --
> .Bill.
>



Bill

2006-02-16, 7:04 pm

JT wrote:

> THECOLUMN like 'ABC'


Tried that. LIKE returns the same incorrect result as = does. It thinks
that 'ABC ' and 'ABC' are equal, which they most certainly are not.

>
> or
>
> THECOLUMN+'$' = 'ABC'+'$'


That will work but since an index cannot be used it will be slow on a
large table.

--
..Bill.

Bill

2006-02-16, 7:04 pm

Jack Vamvas wrote:

> In SQL Server use RTRIM


RTRIM will make the two values equal which is not what I want. Given a
table that contains two rows with the values 'ABC ' and 'ABC' in a
VARCHAR column I want to know how to select just the row whose value is
'ABC' and not get the row that contains 'ABC '.

--
..Bill.
Bill

2006-02-16, 7:04 pm

JT wrote:

> I am assuming he wants this condition to be False.


That is correct.

--
..Bill.
Bill

2006-02-16, 7:04 pm

Aaron Bertrand [SQL Server MVP] wrote:

> Grant and Jack, I think the OP wants to consider 'ABC' and 'ABC ' to
> be different


That is exactly what I want.

--
..Bill.
Aaron Bertrand [SQL Server MVP]

2006-02-16, 7:04 pm

> will return the record since SQL Server incorrectly pads the shorter
> value with spaces to the length of the longer before comparing them.


Actually, SQL Server is ignoring the trailing spaces, but the result is the
same.

There are a few ways to solve this, JT's solution is pretty simple to
implement...


Aaron Bertrand [SQL Server MVP]

2006-02-16, 7:04 pm

Sure it does...

IF 'ABC ' LIKE 'ABC' PRINT 'True' ELSE PRINT 'False';





"JT" <someone@microsoft.com> wrote in message
news:umoMP6xMGHA.2560@TK2MSFTNGP09.phx.gbl...
> The LIKE comparison operator does not have this quirk.



Aaron Bertrand [SQL Server MVP]

2006-02-16, 7:04 pm

> That will work but since an index cannot be used it will be slow on a
> large table.


Do you actually have an index on this column? Can you explain the business
rule that dictates where a trailing space changes the meaning of this value?
Maybe show some examples of where ABC and ABC_ really mean different things?

Anyway, here is a slightly different approach that will use an index s on
bar if possible, but has an extra computation (datalength() does not ignore
trailing spaces):






use tempdb;
go

create table #foo (bar varchar(5));
create index b ON #foo(bar);

set nocount on;
insert #foo(bar) select 'ABC';
insert #foo(bar) select 'ABC ';

SELECT bar, DATALENGTH(bar)
FROM #foo
WHERE bar = 'ABC'
AND DATALENGTH(bar) = DATALENGTH(RTRIM(bar));

drop table #foo;


KBuser

2006-02-16, 7:04 pm

I would go with like. Trimming the column will produce the same effect
he is having now, as it would simply cut the space off the ABC and then
compare it, returning another false match.

Aaron Bertrand [SQL Server MVP]

2006-02-16, 7:04 pm

>I would go with like.

Can you show an example that produces the correct results and uses the index
in a positive way?


Bill

2006-02-16, 7:04 pm

Aaron Bertrand [SQL Server MVP] wrote:

> Do you actually have an index on this column?


No.

> Can you explain the
> business rule that dictates where a trailing space changes the
> meaning of this value? Maybe show some examples of where ABC and ABC_
> really mean different things?


Suffice it to say that I did not create this mess.<g> The database is
part of a vertical market application. My daddy told me, "if you can't
say something nice about someone, don't say anything" and I have
absolutely nothing to say about the folks that developed this mess. :)

That aside, the values are different and as far as I am concerned this
is a bug in SQL Server. This behavior certainly does not conform to any
rules of string comparison that I have every heard of. If two strings
have different lengths they are not equal.

>
> Anyway, here is a slightly different approach that will use an index
> s on bar if possible, but has an extra computation (datalength()
> does not ignore trailing spaces):


Thanks for the solutions. I doubt that I would ever have thought of
that approach.

--
..Bill.
Aaron Bertrand [SQL Server MVP]

2006-02-16, 7:04 pm

>> Do you actually have an index on this column?
>
> No.


Then you are going to see negligible differences between =, LIKE, and doing
any kind of conversion on the lhs or rhs.

> That aside, the values are different and as far as I am concerned this
> is a bug in SQL Server.


I believe most will disagree with you, and at least part of it has to do
with ANSI standards. See the topic SET ANSI_PADDING in Books Online.


Bill

2006-02-16, 7:04 pm

Aaron Bertrand [SQL Server MVP] wrote:

>
> Can you show an example that produces the correct results and uses
> the index in a positive way?


One problem with the LIKE solution is that it requires a detailed
comment everywhere it is used so anyone that looks at the SQL will
understand that the left/right relationship is crucial to get the
expected behavior. Because it will use an index and is clear on its own
I prefer Aaron's solution using DataLength.

My thanks to everyone. This has been a valuable education.

--
..Bill.
Aaron Bertrand [SQL Server MVP]

2006-02-16, 7:04 pm

> Because it will use an index

But you don't even have an index to use!


JT

2006-02-16, 7:04 pm

Too late, he's already out the door.

"Aaron Bertrand [SQL Server MVP]" <ten.xoc@dnartreb.noraa> wrote in message
news:%23hzbtkyMGHA.740@TK2MSFTNGP12.phx.gbl...
>
> But you don't even have an index to use!
>



Aaron Bertrand [SQL Server MVP]

2006-02-16, 7:04 pm

> Too late, he's already out the door.

His choice of which door to leave through was likely based on whether or not
his suitcase* fit through it.

* [He didn't bring his suitcase.]


Bill

2006-02-16, 7:04 pm

Aaron Bertrand [SQL Server MVP] wrote:

> I believe most will disagree with you, and at least part of it has to
> do with ANSI standards. See the topic SET ANSI_PADDING in Books
> Online.


My understanding of ANSI_PADDING is that it only affects the storage of
trailing spaces in VARCHAR fields and of trailing zeros in VARBINARY
fields. The following is from BOL.

----------------
When set to ON, trailing blanks in character values inserted into
varchar columns and trailing zeros in binary values inserted into
varbinary columns are not trimmed. Values are not padded to the length
of the column. When set to OFF, the trailing blanks (for varchar) and
zeros (for varbinary) are trimmed. This setting affects only the
definition of new columns.

Char(n) and binary(n) columns that allow nulls are padded to the length
of the column when SET ANSI_PADDING is set to ON, but trailing blanks
and zeros are trimmed when SET ANSI_PADDING is OFF. Char(n) and
binary(n) columns that do not allow nulls are always padded to the
length of the column.
-----------------

This does not deal with string comparison. AFAIK, the SQL standard does
not include sting comparison rules. If it does I would appreciate a
reference so I can read it.

All the rules for string comparison that I have ever seen say that
strings of unequal length are not equal and the longer string is
considered to be greater than the shorter string.

--
..Bill.
Aaron Bertrand [SQL Server MVP]

2006-02-16, 7:04 pm

> All the rules for string comparison that I have ever seen say that
> strings of unequal length are not equal and the longer string is
> considered to be greater than the shorter string.


You win. Yay. File a bug. Let us know how they respond.


ML

2006-02-16, 7:04 pm

Would an accurate memory of the suitcase suffice in this case?


ML

---
http://milambda.blogspot.com/
ML

2006-02-16, 7:04 pm

And some German developers sometimes wish Mueller was equal to Müller. Add
that to the bug report. :)


ML

---
http://milambda.blogspot.com/
Aaron Bertrand [SQL Server MVP]

2006-02-16, 7:04 pm

> Would an accurate memory of the suitcase suffice in this case?

I think a more accurate analogy to deriving benefit from a non-existant
index would have been that he doesn't even know what a suitcase is.


ML

2006-02-16, 7:04 pm

Maybe he prefers pockets. Big bulging pockets, where the only way to find
anything is to empty them on the floor and then pick up what you need, and
just stuff everything else back in. Heaps of it.


ML

---
http://milambda.blogspot.com/
JT

2006-02-16, 7:04 pm

Well, at least he doesn't have to defrag his padded extent.

I'm tired, it's time to go home...

"ML" <ML@discussions.microsoft.com> wrote in message
news:EEDBC45C-A2A0-49CB-A211-D9EAE38352BA@microsoft.com...
> Maybe he prefers pockets. Big bulging pockets, where the only way to find
> anything is to empty them on the floor and then pick up what you need, and
> just stuff everything else back in. Heaps of it.
>
>
> ML
>
> ---
> http://milambda.blogspot.com/



ML

2006-02-16, 7:04 pm

Squatting causes page splits while jumping up and down constitutes order by
newid.


ML

---
http://milambda.blogspot.com/
Hugo Kornelis

2006-02-16, 7:05 pm

On Thu, 16 Feb 2006 11:40:11 -0800, Bill wrote:

(snip)
>This does not deal with string comparison. AFAIK, the SQL standard does
>not include sting comparison rules. If it does I would appreciate a
>reference so I can read it.


Hi Bill,

ISO/IEC 9075-2:2003(E), chapter 4.2.2 Comparison of character strings:

"When values of unequal length are compared. if the collation for the
comparison has the NO PAD characteristic and the shorter value is equal
to some prefix of the longer value, then the shorter value is considered
less than the longer value. If the collation for the comparison has the
PAD SPACE characteristic, for the purpose of the comparison, the shorter
value is effectively extended to the length of the longer by
concatenation of <space>s on the right."

The behaviour that you expect (and that I would consider the most
logical as well) would conform to the behaviour for a NO PAD collation.
But the collations in SQL Server appaarently all have PAD SPACE as their
characteristic. :-(

--
Hugo Kornelis, SQL Server MVP
Hugo Kornelis

2006-02-16, 7:05 pm

On Thu, 16 Feb 2006 12:38:16 -0500, Aaron Bertrand [SQL Server MVP]
wrote:

>
>Actually, SQL Server is ignoring the trailing spaces, but the result is the
>same.


Hi Aaron,

The result is not the same. And SQL Server is not ignoring trailing
spaces, but adding spaces to the shorter string. This can be easily
shown by using < and > comparisons or an ORDER BY.

declare @a varchar(5), @b varchar(5), @c varchar(5)
set @a = 'a'
set @b = 'a' + char(10) -- less then 'a '
set @c = 'a' + char(40) -- more then 'a '
if @a < @b print '@a < @b'
if @a < @c print '@a < @c'
if @b < @a print '@b < @a'
if @b < @c print '@b < @c'
if @c < @a print '@c < @a'
if @c < @b print '@c < @b'
select variable
from (select @a AS value,'@a' AS variable
union all
select @b, '@b'
union all
select @c, '@c') AS x
order by value

@a < @b
@a < @c
@b < @c
variable
--------
@a
@b
@c



--
Hugo Kornelis, SQL Server MVP
Hugo Kornelis

2006-02-16, 7:05 pm

On Thu, 16 Feb 2006 12:37:33 -0800, ML wrote:

>And some German developers sometimes wish Mueller was equal to Müller.


Hi ML,

declare @a varchar(20), @b varchar(20)
set @a = 'Mueller'
set @b = 'Müller'
if @a = @b
print 'Equal'
else
print 'Not equal'
if @a collate German_PhoneBook_CI_AI = @b collate German_PhoneBook_CI_AI
print 'Equal'
else
print 'Not'


--
Hugo Kornelis, SQL Server MVP
ML

2006-02-16, 7:05 pm

Brilliant! I learn something new every day.

Awhile ago there was a post in this newsgroup and the OP was having this
exact problem. Have you by any chance had time to help him?


ML

---
http://milambda.blogspot.com/
Hugo Kornelis

2006-02-16, 7:05 pm

On Thu, 16 Feb 2006 15:43:27 -0800, ML wrote:

>Brilliant! I learn something new every day.
>
>Awhile ago there was a post in this newsgroup and the OP was having this
>exact problem. Have you by any chance had time to help him?


Hi ML,

I think I missed it. I try to read all messages in this group, but if
I've been away for a few days I usually just mark everything as read.

Feel free to echo my suggestion or post a link to this discussion in the
other thread (if you still know to find it).

--
Hugo Kornelis, SQL Server MVP
ML

2006-02-17, 4:00 am

In fact, Razvan already answered the question:
http://msdn.microsoft.com/newsgroup...5756&sloc=en-us


ML

---
http://milambda.blogspot.com/
--CELKO--

2006-02-17, 7:01 pm

>> SQL Server incorrectly pads the shorter value with spaces to the length of the longer before comparing them. <<

Of course; that is how Standard SQL defines it. Duh!

What did you want to do?

Hugo Kornelis

2006-02-17, 7:01 pm

On 17 Feb 2006 09:38:05 -0800, --CELKO-- wrote:

>
>Of course; that is how Standard SQL defines it. Duh!
>
>What did you want to do?


Hi Joe,

Can you quote the part of the standard that states this?

The only relevant snippet I found is this, which doesn;t quite say that
this is the only correct way:

ISO/IEC 9075-2:2003(E), chapter 4.2.2 Comparison of character strings:

"When values of unequal length are compared. if the collation for the
comparison has the NO PAD characteristic and the shorter value is equal
to some prefix of the longer value, then the shorter value is considered
less than the longer value. If the collation for the comparison has the
PAD SPACE characteristic, for the purpose of the comparison, the shorter
value is effectively extended to the length of the longer by
concatenation of <space>s on the right."

(This is a quote from SQL-2003, but it's an almost verbatim copy of a
paragraph in the SQL-92 standard you participated in)

--
Hugo Kornelis, SQL Server MVP
--CELKO--

2006-02-18, 6:59 pm

Same thing in SQL-92, but nobody used it because it would destroy
SQL-86 and SQL-89 programs. The worst situations are mixes of pad/no
pad inthe same schema, so people leave the default no pad on
collations.

Hugo Kornelis

2006-02-18, 6:59 pm

Xref: TK2MSFTNGP08.phx.gbl microsoft.public.sqlserver.programming:585697

On 18 Feb 2006 11:59:02 -0800, --CELKO-- wrote:

>Same thing in SQL-92, but nobody used it because it would destroy
>SQL-86 and SQL-89 programs. The worst situations are mixes of pad/no
>pad inthe same schema, so people leave the default no pad on
>collations.


Hi Joe,

Thanks for the clarification - I always believed it was as you wrote it:
"padding is defined in the standard". But when I looked it up, I saw
that the standard allowed for two versions, and I wondered why the one
was chosen that makes the least amount of logical sense.

Apparently, it's for historical reasons.

--
Hugo Kornelis, SQL Server MVP
--CELKO--

2006-02-18, 9:57 pm

Actually, I think paddding makes more sense. The Xbase rule is that
"Smith" and "Smithsonian" are equal, but the SQL rule says that they
are different, which I agree with.

Hugo Kornelis

2006-02-27, 7:13 pm

On 18 Feb 2006 17:37:09 -0800, --CELKO-- wrote:

>Actually, I think paddding makes more sense. The Xbase rule is that
>"Smith" and "Smithsonian" are equal, but the SQL rule says that they
>are different, which I agree with.


Hi Joe,

Sorry for the late reply; I've been away from anything with more than
five keys for a w.

I don't think it's a choice between padding or the nonsensical Xbase
rule you quote. Actually, the description of string comparisons for
collations with NO PAD in the ANSI standard is exactly the version that
I would like to see implemented.

--
Hugo Kornelis, SQL Server MVP
Sponsored Links







Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive

Copyright 2009 codecomments.com