Home > Archive > SQL Server Programming > June 2005 > Moving GROUP BY Clauses
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
Moving GROUP BY Clauses
|
|
| Nikola Milic 2005-06-08, 4:02 am |
| Hi,
At link
http://msdn.microsoft.com/library/d...l/queryproc.asp
chapter Query Optimization there is description of "Moving GROUP BY
Clauses".
How it can be controlled programmatically?
I'm using SS2000 Enterprise edition SP3 on Win2000 Advanced Server SP4
Thanks in advance
Nikola Milic
| |
| Mike Epprecht \(SQL MVP\) 2005-06-08, 4:02 am |
| By ensuring that your where is on a column that is indexed (clustered), and
your group by uses the name column.
Regards
--------------------------------
Mike Epprecht, Microsoft SQL Server MVP
Zurich, Switzerland
IM: mike@epprecht.net
MVP Program: http://www.microsoft.com/mvp
Blog: http://www.msmvps.com/epprecht/
"Nikola Milic" <hotmnikola@hotmail.com> wrote in message
news:udseVv%23aFHA.1040@TK2MSFTNGP10.phx.gbl...
> Hi,
> At link
> http://msdn.microsoft.com/library/d...l/queryproc.asp
> chapter Query Optimization there is description of "Moving GROUP BY
> Clauses".
>
> How it can be controlled programmatically?
>
>
> I'm using SS2000 Enterprise edition SP3 on Win2000 Advanced Server SP4
>
> Thanks in advance
> Nikola Milic
>
| |
| Nikola Milic 2005-06-08, 4:02 pm |
| Thanks for reply,
I don't understand you, what is "the name column"?
Example in mentioned article does not show any clustered indexes, but joins
with three tables. Do you have more examples?
Regards
Nikola
"Mike Epprecht (SQL MVP)" <mike@epprecht.net> wrote in message
news:OGHH81%23aFHA.3040@TK2MSFTNGP14.phx.gbl...
> By ensuring that your where is on a column that is indexed (clustered),
> and your group by uses the name column.
>
> Regards
> --------------------------------
> Mike Epprecht, Microsoft SQL Server MVP
> Zurich, Switzerland
>
> IM: mike@epprecht.net
>
> MVP Program: http://www.microsoft.com/mvp
>
> Blog: http://www.msmvps.com/epprecht/
>
> "Nikola Milic" <hotmnikola@hotmail.com> wrote in message
> news:udseVv%23aFHA.1040@TK2MSFTNGP10.phx.gbl...
>
>
| |
| Gert-Jan Strik 2005-06-08, 4:02 pm |
| Nikola,
It cannot be "controlled" programmatically. The article explains how the
query optimizer works, and that the query optimizer can choose to move
the GROUPing operator as long as the end result will still be correct.
There are situations where you can help the query optimizer, and this is
probably what you are asking. Let's assume the example query from the
article:
SELECT c_name, c_custkey, count (*), sum (l_tax)
FROM customer, orders, lineitem
WHERE c_custkey = o_custkey and o_orderkey = l_orderkey and
o_orderdate between '9/1/1994' and '12/31/1994'
GROUP BY c_name, c_custkey
One thing you could do (and should do) is specify all Primary Key
constraint, Unique constraints and indexes whenever they are unique.
This is the basis for the example in the article and it provides
SQL-Server with the necessary information to deduct that a lookup of the
c_name for a c_custkey is sufficient.
The other thing you could do is to remove unnecessary columns from the
GROUP BY clause. You will need to add aggregates to the select list
columns that are not part of the GROUP BY clause. For example:
SELECT MIN(c_name) AS c_Name, c_custkey, count (*), sum (l_tax)
FROM customer, orders, lineitem
WHERE c_custkey = o_custkey and o_orderkey = l_orderkey and
o_orderdate between '9/1/1994' and '12/31/1994'
GROUP BY c_custkey
Hope this helps,
Gert-Jan
Nikola Milic wrote:
>
> Hi,
> At link
> http://msdn.microsoft.com/library/d...l/queryproc.asp
> chapter Query Optimization there is description of "Moving GROUP BY
> Clauses".
>
> How it can be controlled programmatically?
>
> I'm using SS2000 Enterprise edition SP3 on Win2000 Advanced Server SP4
>
> Thanks in advance
> Nikola Milic
| |
| Tibor Karaszi 2005-06-08, 4:02 pm |
| The article just show you that SQL Server can internally transform your query to another
semantically same query plan. The optimizer is smart and does this for you, but only if they are
semantically the same.
--
Tibor Karaszi, SQL Server MVP
http://www.karaszi.com/sqlserver/default.asp
http://www.solidqualitylearning.com/
"Nikola Milic" <hotmnikola@hotmail.com> wrote in message
news:udseVv%23aFHA.1040@TK2MSFTNGP10.phx.gbl...
> Hi,
> At link http://msdn.microsoft.com/library/d...l/queryproc.asp
> chapter Query Optimization there is description of "Moving GROUP BY Clauses".
>
> How it can be controlled programmatically?
>
>
> I'm using SS2000 Enterprise edition SP3 on Win2000 Advanced Server SP4
>
> Thanks in advance
> Nikola Milic
>
| |
| Nikola Milic 2005-06-09, 3:59 am |
| Thank you all for your replies,
I understood that article is about how optimizer works. What I asked is it
possible to force optimizer with hints or sql syntax to use moving GROUP BY
clause.
Gert-Jan,
I think that moving GROUP BY clause can be used to solve my problem which I
posted here on June 1st under subject "Subqueries with DISTINCT" (see
below). You took part in conversation and you asked to see my queries and
DDL. You can make your sample on any of your large table if it has column
with a few values. It is common performance problem which I see very often.
Thanks again
Nikola
"Subqueries with DISTINCT"
--------------------------------
How can I force queries below FIRST to execute statements S and T and then
to make join of S and T?
It makes me a performance problem on real data with a few million rows -
second query will run much, much longer. On this sample from pubs database
you can see that execution plan for second query is much slower. My notion
is first to find all distinct values to narrow data sets, and then to join
two small data sets.
--1st query
SELECT *
FROM (
SELECT DISTINCT title_id, qty
FROM pubs.dbo.sales
)S
JOIN(
SELECT DISTINCT title_id, royalty
FROM pubs.dbo.titles
)T
ON S.title_id = T.title_id
--2nd query
SELECT *
FROM (
SELECT DISTINCT title_id, qty
FROM pubs.dbo.sales
)S
JOIN(
SELECT DISTINCT title_id, royalty
FROM pubs.dbo.titles
)T
ON S.title_id = T.title_id
--HERE IS THE ONLY DIFFERENCE FROM 1st query
OR S.qty = T.royalty -- meaningless, just for test
"Gert-Jan Strik" <sorry@toomuchspamalready.nl> wrote in message
news:42A72280.55F54504@toomuchspamalready.nl...[color=darkred]
> Nikola,
>
> It cannot be "controlled" programmatically. The article explains how the
> query optimizer works, and that the query optimizer can choose to move
> the GROUPing operator as long as the end result will still be correct.
>
> There are situations where you can help the query optimizer, and this is
> probably what you are asking. Let's assume the example query from the
> article:
>
> SELECT c_name, c_custkey, count (*), sum (l_tax)
> FROM customer, orders, lineitem
> WHERE c_custkey = o_custkey and o_orderkey = l_orderkey and
> o_orderdate between '9/1/1994' and '12/31/1994'
> GROUP BY c_name, c_custkey
>
> One thing you could do (and should do) is specify all Primary Key
> constraint, Unique constraints and indexes whenever they are unique.
> This is the basis for the example in the article and it provides
> SQL-Server with the necessary information to deduct that a lookup of the
> c_name for a c_custkey is sufficient.
>
> The other thing you could do is to remove unnecessary columns from the
> GROUP BY clause. You will need to add aggregates to the select list
> columns that are not part of the GROUP BY clause. For example:
>
> SELECT MIN(c_name) AS c_Name, c_custkey, count (*), sum (l_tax)
> FROM customer, orders, lineitem
> WHERE c_custkey = o_custkey and o_orderkey = l_orderkey and
> o_orderdate between '9/1/1994' and '12/31/1994'
> GROUP BY c_custkey
>
> Hope this helps,
> Gert-Jan
>
>
> Nikola Milic wrote:
| |
| Nikola Milic 2005-06-09, 4:02 pm |
| Thanks for reply Tibor,
That is what I asked: how to make semantically the same query? Is there more
examples?
Regards
Nikola
"Tibor Karaszi" <tibor_please.no.email_karaszi@hotmail.nomail.com> wrote in
message news:O$UIyxEbFHA.3132@TK2MSFTNGP09.phx.gbl...
> The article just show you that SQL Server can internally transform your
> query to another semantically same query plan. The optimizer is smart and
> does this for you, but only if they are semantically the same.
>
> --
> Tibor Karaszi, SQL Server MVP
> http://www.karaszi.com/sqlserver/default.asp
> http://www.solidqualitylearning.com/
>
>
> "Nikola Milic" <hotmnikola@hotmail.com> wrote in message
> news:udseVv%23aFHA.1040@TK2MSFTNGP10.phx.gbl...
>
| |
| Gert-Jan Strik 2005-06-10, 8:58 pm |
| Let me start by saying that you should use hints only if the situation
really needs it, and it is a good idea to check periodically whether the
hints are still necessary. The situation (table sizes, data
distribution) may have changed, or the query optimizer may have become
smarter (with a hotfix or service pack).
A very drastic hint to use is the FORCE ORDER hint. This hint could help
in some situations, but it doesn't specify when to apply the GROUP BY,
so it may still not do what you want.
If you are running Enterprise Edition, you could create an indexed view.
See example below. It uses your "S" and "T" example 2:
create view DistinctSales with schemabinding as
select title_id, qty,count_big(*) as countbig
from dbo.sales
group by title_id, qty
create unique clustered index PK_DistinctSales on
DistinctSales(title_id, qty)
set showplan_text on
go
SELECT *
FROM (
SELECT DISTINCT title_id, qty
FROM pubs.dbo.sales
)S
JOIN(
SELECT DISTINCT title_id, royalty
FROM pubs.dbo.titles
)T
ON S.title_id = T.title_id
OR S.qty = T.royalty -- meaningless, just for test
go
set showplan_text off
StmtText
-------------------------------------------------------------------------------------------------------------------------------------------
|--Nested Loops(Inner Join,
WHERE:([DistinctSales].[title_id]=[titles].[title_id] OR
Convert([DistinctSales].[qty])=[titles].[royalty]))
|--Clustered Index
Scan(OBJECT:([pubs].[dbo].[titles].[UPKCL_titleidind]))
|--Clustered Index
Scan(OBJECT:([pubs].[dbo].[DistinctSales].[PK_DistinctSales]))
(3 row(s) affected)
Gert-Jan
Nikola Milic wrote:
>
> Thank you all for your replies,
> I understood that article is about how optimizer works. What I asked is it
> possible to force optimizer with hints or sql syntax to use moving GROUP BY
> clause.
>
> Gert-Jan,
> I think that moving GROUP BY clause can be used to solve my problem which I
> posted here on June 1st under subject "Subqueries with DISTINCT" (see
> below). You took part in conversation and you asked to see my queries and
> DDL. You can make your sample on any of your large table if it has column
> with a few values. It is common performance problem which I see very often.
>
> Thanks again
> Nikola
>
> "Subqueries with DISTINCT"
> --------------------------------
> How can I force queries below FIRST to execute statements S and T and then
> to make join of S and T?
>
> It makes me a performance problem on real data with a few million rows -
> second query will run much, much longer. On this sample from pubs database
> you can see that execution plan for second query is much slower. My notion
> is first to find all distinct values to narrow data sets, and then to join
> two small data sets.
>
> --1st query
> SELECT *
> FROM (
> SELECT DISTINCT title_id, qty
> FROM pubs.dbo.sales
> )S
> JOIN(
> SELECT DISTINCT title_id, royalty
> FROM pubs.dbo.titles
> )T
> ON S.title_id = T.title_id
>
> --2nd query
> SELECT *
> FROM (
> SELECT DISTINCT title_id, qty
> FROM pubs.dbo.sales
> )S
> JOIN(
> SELECT DISTINCT title_id, royalty
> FROM pubs.dbo.titles
> )T
> ON S.title_id = T.title_id
> --HERE IS THE ONLY DIFFERENCE FROM 1st query
> OR S.qty = T.royalty -- meaningless, just for test
<snip>
|
|
|
|
|