sql local data: Location of a condition in a join

Hi,
I often do something like this:
SELECT ...
FROM Table1
LEFT OUTER JOIN Table2 ON (Table2.Id = Table1.Id AND Table2.Deleted<>0)
LEFT OUTER JOIN Table3 ON (Table3.Id = Table2.Id AND
Table3.Deleted<>0)
WHERE
Table1.Deleted<>1
I've been suggested on here to break out the "Table2.Deleted<>1" and
"Table3.Deleted<>1" conditions out of the "ON" clause like this:
SELECT ...
FROM Table1
LEFT OUTER JOIN Table2 ON (Table2.Id = Table1.Id)
LEFT OUTER JOIN Table3 ON (Table3.Id = Table2.Id)
WHERE
Table1.Deleted<>1
AND Table2.Deleted<>1
AND Table3.Deleted<>1
But I don't think these are equal.. in my test, the latter version gives
smaller result set than the prior version.
I think this is because in the first version, only 1 record in Table2 could
match with Table1, but by joining Table3 with Table2, the number of Table2
records returned can be more than the number of records in Table2 alone
(when multiple records in Table3 match those of Table2).. in return,
multiple records in joining Table3 with Table2, will lead to more records
matching with Table1, so the number of records returned is not equal.
But I am not sure about this theory as I don't think the order in which
these joins are executed should matter?
LisaI think Steve has already confirmed this for me in the previous thread "VERY
ODD (LEFT OUTER JOIN)"
Any further comments are always educational and welcome though.
Lisa
"Lisa Pearlson" <no@.spam.plz> wrote in message
news:OdgXWiL5FHA.4076@.tk2msftngp13.phx.gbl...
> Hi,
> I often do something like this:
>
> SELECT ...
> FROM Table1
> LEFT OUTER JOIN Table2 ON (Table2.Id = Table1.Id AND Table2.Deleted<>0)
> LEFT OUTER JOIN Table3 ON (Table3.Id = Table2.Id AND
> Table3.Deleted<>0)
> WHERE
> Table1.Deleted<>1
> I've been suggested on here to break out the "Table2.Deleted<>1" and
> "Table3.Deleted<>1" conditions out of the "ON" clause like this:
> SELECT ...
> FROM Table1
> LEFT OUTER JOIN Table2 ON (Table2.Id = Table1.Id)
> LEFT OUTER JOIN Table3 ON (Table3.Id = Table2.Id)
> WHERE
> Table1.Deleted<>1
> AND Table2.Deleted<>1
> AND Table3.Deleted<>1
>
> But I don't think these are equal.. in my test, the latter version gives
> smaller result set than the prior version.
> I think this is because in the first version, only 1 record in Table2
> could match with Table1, but by joining Table3 with Table2, the number of
> Table2 records returned can be more than the number of records in Table2
> alone (when multiple records in Table3 match those of Table2).. in return,
> multiple records in joining Table3 with Table2, will lead to more records
> matching with Table1, so the number of records returned is not equal.
> But I am not sure about this theory as I don't think the order in which
> these joins are executed should matter?
> Lisa
>|||How about this one though?
I used to do this:
SELECT ...
FROM People
LEFT OUTER JOIN Addresses ON (Addresses.ParentId = People.Id)
LEFT OUTER JOIN Countries ON (Countries.Id = Addresses.CountryId)
And lately I have started doing this:
SELECT ...
FROM People
LEFT OUTER JOIN Addresses
LEFT OUTER JOIN Countries ON (Countries.Id = Addresses.CountryId)
ON (Addresses.ParentId = People.Id)
By nesting it, it's easier to see that "Countries" is only important within
the context of Address and not People.
But do these 2 layouts always lead to exactely the same resultsets?
Lisa
"Lisa Pearlson" <no@.spam.plz> wrote in message
news:OdgXWiL5FHA.4076@.tk2msftngp13.phx.gbl...
> Hi,
> I often do something like this:
>
> SELECT ...
> FROM Table1
> LEFT OUTER JOIN Table2 ON (Table2.Id = Table1.Id AND Table2.Deleted<>0)
> LEFT OUTER JOIN Table3 ON (Table3.Id = Table2.Id AND
> Table3.Deleted<>0)
> WHERE
> Table1.Deleted<>1
> I've been suggested on here to break out the "Table2.Deleted<>1" and
> "Table3.Deleted<>1" conditions out of the "ON" clause like this:
> SELECT ...
> FROM Table1
> LEFT OUTER JOIN Table2 ON (Table2.Id = Table1.Id)
> LEFT OUTER JOIN Table3 ON (Table3.Id = Table2.Id)
> WHERE
> Table1.Deleted<>1
> AND Table2.Deleted<>1
> AND Table3.Deleted<>1
>
> But I don't think these are equal.. in my test, the latter version gives
> smaller result set than the prior version.
> I think this is because in the first version, only 1 record in Table2
> could match with Table1, but by joining Table3 with Table2, the number of
> Table2 records returned can be more than the number of records in Table2
> alone (when multiple records in Table3 match those of Table2).. in return,
> multiple records in joining Table3 with Table2, will lead to more records
> matching with Table1, so the number of records returned is not equal.
> But I am not sure about this theory as I don't think the order in which
> these joins are executed should matter?
> Lisa
>|||You're right: with Outer Join, there is usually a difference in the
resultsets returned which depends on where you put additional filtering:
inside the ON clause or in the WHERE clause.
When it's inside the ON clause, the filtering is done *before* making the
Outer Join and when it's inside the WHERE clause, it's done *after* the
join. Then, for the WHERE clause, the number of records will be equal or
less than the result that you will get with the ON clause. Some or all of
these records might also be different.
This is a big difference between OUTER JOIN and INNER JOIN.
Sylvain Lafontaine, ing.
MVP - Technologies Virtual-PC
E-mail: http://cerbermail.com/?QugbLEWINF
"Lisa Pearlson" <no@.spam.plz> wrote in message
news:exAvkmL5FHA.2092@.TK2MSFTNGP12.phx.gbl...
>I think Steve has already confirmed this for me in the previous thread
>"VERY ODD (LEFT OUTER JOIN)"
> Any further comments are always educational and welcome though.
> Lisa
> "Lisa Pearlson" <no@.spam.plz> wrote in message
> news:OdgXWiL5FHA.4076@.tk2msftngp13.phx.gbl...
>|||These should produce the same results. The only difference is the order
in which the joins will be performed. As I understand it in the first
example the Addresses table will be joined to the People table and then
the Countries table joined to that result. In the second example the
Countries table will be joined to the Addresses table and then that
result joined to the People table. Depending on the size and indexing
of the respective tables this could have potential performance implications.
This next one is simply a matter of opinion, but I personally find the
first example easier to read. In the second format trying to match up
the ON to the JOIN can get cumbersome in larger queries.
Lisa Pearlson wrote:
> How about this one though?
> I used to do this:
> SELECT ...
> FROM People
> LEFT OUTER JOIN Addresses ON (Addresses.ParentId = People.Id)
> LEFT OUTER JOIN Countries ON (Countries.Id = Addresses.CountryId)
> And lately I have started doing this:
> SELECT ...
> FROM People
> LEFT OUTER JOIN Addresses
> LEFT OUTER JOIN Countries ON (Countries.Id = Addresses.CountryId)
> ON (Addresses.ParentId = People.Id)
> By nesting it, it's easier to see that "Countries" is only important withi
n
> the context of Address and not People.
> But do these 2 layouts always lead to exactely the same resultsets?
> Lisa
> "Lisa Pearlson" <no@.spam.plz> wrote in message
> news:OdgXWiL5FHA.4076@.tk2msftngp13.phx.gbl...
>
>
>|||Here is how OUTER JOINs work in SQL-92. Assume you are given:
Table1 Table2
a b a c
====== ======
1 w 1 r
2 x 2 s
3 y 3 t
4 z
and the outer join expression:
Table1
LEFT OUTER JOIN
Table2
ON Table1.a = Table2.a <== join condition
AND Table2.c = 't'; <== single table condition
We call Table1 the "preserved table" and Table2 the "unpreserved table"
in the query. What I am going to give you is a little different, but
equivalent to the ANSI/ISO standards.
1) We build the CROSS JOIN of the two tables. Scan each row in the
result set.
2) If the predicate tests TRUE for that row, then you keep it. You also
remove all rows derived from it from the CROSS JOIN
3) If the predicate tests FALSE or UNKNOWN for that row, then keep the
columns from the preserved table, convert all the columns from the
unpreserved table to NULLs and remove the duplicates.
So let us execute this by hand:
Let @. = passed the first predicate
Let * = passed the second predicate
Table1 CROSS JOIN Table2
a b a c
=========================
1 w 1 r @.
1 w 2 s
1 w 3 t *
2 x 1 r
2 x 2 s @.
2 x 3 t *
3 y 1 r
3 y 2 s
3 y 3 t @.* <== the TRUE set
4 z 1 r
4 z 2 s
4 z 3 t *
Table1 LEFT OUTER JOIN Table2
a b a c
=========================
3 y 3 t <= only TRUE row
--
1 w NULL NULL Sets of duplicates
1 w NULL NULL
1 w NULL NULL
--
2 x NULL NULL
2 x NULL NULL
2 x NULL NULL
3 y NULL NULL <== derived from the TRUE set - Remove
3 y NULL NULL
--
4 z NULL NULL
4 z NULL NULL
4 z NULL NULL
the final results:
Table1 LEFT OUTER JOIN Table2
a b a c
=========================
1 w NULL NULL
2 x NULL NULL
3 y 3 t
4 z NULL NULL
The basic rule is that every row in the preserved table is represented
in the results in at least one result row.
There are limitations and very serious problems with the extended
equality version of an outer join used in some diseased mutant
products. Consider the two Chris Date tables
Suppliers SupParts
supno supno partno qty
========= ==============
S1 S1 P1 100
S2 S1 P2 250
S3 S2 P1 100
S2 P2 250
and let's do an extended equality outer join like this:
SELECT *
FROM Supplier, SupParts
WHERE Supplier.supno *= SupParts.supno
AND qty < 200;
If I do the outer first, I get:
Suppliers LOJ SupParts
supno supno partno qty
=======================
S1 S1 P1 100
S1 S1 P2 250
S2 S2 P1 100
S2 S2 P2 250
S3 NULL NULL NULL
Then I apply the (qty < 200) predicate and get
Suppliers LOJ SupParts
supno supno partno qty
===================
S1 S1 P1 100
S2 S2 P1 100
Doing it in the opposite order
Suppliers LOJ SupParts
supno supno partno qty
===================
S1 S1 P1 100
S2 S2 P1 100
S3 NULL NULL NULL
Sybase does it one way, Oracle does it the other and Centura (nee
Gupta) lets you pick which one -- the worst of both non-standard
worlds! In SQL-92, you have a choice and can force the order of
execution. Either do the predicates after the join ...
SELECT *
FROM Supplier
LEFT OUTER JOIN
SupParts
ON Supplier.supno = SupParts.supno
WHERE qty < 200;
.. or do it in the joining:
SELECT *
FROM Supplier
LEFT OUTER JOIN
SupParts
ON Supplier.supno = SupParts.supno
AND qty < 200;
Another problem is that you cannot show the same table as preserved and
unpreserved in the extended equality version, but it is easy in SQL-92.
For example to find the students who have taken Math 101 and might
have taken Math 102:
SELECT C1.student, C1.math, C2.math
FROM (SELECT * FROM Courses WHERE math = 101) AS C1
LEFT OUTER JOIN
(SELECT * FROM Courses WHERE math = 102) AS C2
ON C1.student = C2.student;|||> You're right: with Outer Join, there is usually a difference in the
> resultsets returned which depends on where you put additional filtering:
> inside the ON clause or in the WHERE clause.
Well, that really depends on whether the additional clause is on the "inner"
or "outer" table in the join, doesn't it?
A

Wednesday, March 21, 2012

Location of a condition in a join

No comments:

Post a Comment

sql local data

Blog Archive

About Me