Louis's profileThe SQL Doctor is In (Re...PhotosBlogListsMore ![]() | Help |
|
January 25 NOT IN and NULL valuesFellow MVP Roji P Thomas (http://toponewithties.blogspot.com/) the other day in the sqlserver.programming newsgroup pointed out something about using NOT IN in a query where NULL values may exist that I hadn't really thought about. He also answered the question of the person, where I failed to. So, I wanted to share with you this little tidbit, all the while reinforcing it in my mind for the next time (oops, did I give away the reason that blogging is useful to the blogger...)
The idea here is that you want to find all rows in one table (the parent) where the key value does not exist in another (the child table.) So quite logically, you say SELECT from parent where value not in (select value from child). The problem is, if there exists a value of NULL in the child value, only two possible results can be returned: FALSE, or UNKNOWN. FALSE, the value exists, or UNKNOWN because any value = NULL is unknown.
For example, take the following tables:
create table parent
( parentId int not null primary key ) create table child ( childId int not null primary key, parentId int null references parent(parentId) ) go insert into parent(parentId)
select 1 union all select 2 union all select 3 union all select 4 go insert into child(childId, parentId) select 10,1 union all select 20,1 union all select 30,2 union all select 40,null go So, you write your query as:
select *
from parent where parentId not in (select parentId from child) Clearly parentId values 3 and 4 are not in the child table:
parentId
----------- Logically, the IN operation does a series of compares. Like for the first row in parent, where parentId = 1, you get the following comparison:
(1=1 or 1=1 or 1=2 or 1=NULL) = (TRUE or TRUE or FALSE or UNKNOWN) = TRUE. NOT(TRUE) = FALSE, so the row where parentId = 1 is not returned.
Now, when you get to the row where parentId = 3, the following happens:
(3=1 or 3=1 or 3=2 or 3=NULL) = (FALSE or FALSE or FALSE or UNKNOWN) = UNKNOWN (check the truth tables here if this is confusing to you) The value here is NOT(UNKNOWN) = UNKNOWN. Since UNKNOWN is not TRUE, 3 is not returned.
Hence, no value could ever be returned if a single null value is in the set. So you can rewrite as:
select *
from parent where parentId not in (select parentId from child where parentId is not null) or use what is probably a better syntax in any case, as works nulls or not, since not every value need be compared, since it is looking for the existence of a value in a join, not the entire set:
select *
from parent where not exists (select 1 from child where child.parentId = parent.parentId) These queries both return:
parentId
----------- 3 4 Which is the correct answer, after all... Comments (4)
Louis Davidson
has turned off comments on this page.
TrackbacksWeblogs that reference this entry
|
|
|