Louis's profileThe SQL Doctor is In (Re...PhotosBlogListsMore Tools Help

Blog


    January 25

    NOT IN and NULL values

    Fellow MVP Roji P Thomas (http://toponewithties.blogspot.com/) the other day in the sqlserver.programming newsgroup pointed out something about using NOT IN in a query where NULL values may exist that I hadn't really thought about.  He also answered the question of the person, where I failed to.  So, I wanted to share with you this little tidbit, all the while reinforcing it in my mind for the next time (oops, did I give away the reason that blogging is useful to the blogger...)
     
    The idea here is that you want to find all rows in one table (the parent) where the key value does not exist in another (the child table.)  So quite logically, you say SELECT from parent where value not in (select value from child).  The problem is, if there exists a value of NULL in the child value, only two possible results can be returned:  FALSE, or UNKNOWN.  FALSE, the value exists, or UNKNOWN because any value = NULL is unknown.
     
    For example, take the following tables:
     
    create table parent
    (
            parentId int not null primary key
    )
    create table child
    (
            childId  int not null primary key,
            parentId int null references parent(parentId)
    )
    go
    insert into parent(parentId)
    select 1
    union all
    select 2
    union all
    select 3
    union all
    select 4
    go
    insert into child(childId, parentId)
    select 10,1
    union all
    select 20,1
    union all
    select 30,2
    union all
    select 40,null
    go
     
    So, you write your query as:
     
    select *
    from parent
    where parentId not in (select parentId from child)
     
    Clearly parentId values 3 and 4 are not in the child table:
     
    parentId
    -----------
     
    Logically, the IN operation does a series of compares.  Like for the first row in parent, where parentId = 1, you get the following comparison:
     
    (1=1 or 1=1 or 1=2 or 1=NULL) = (TRUE or TRUE or FALSE or UNKNOWN) = TRUE.  NOT(TRUE) = FALSE, so the row where parentId = 1 is not returned.
     
    Now, when you get to the row where parentId = 3, the following happens:
     
    (3=1 or 3=1 or 3=2 or 3=NULL) = (FALSE or FALSE or FALSE or UNKNOWN) = UNKNOWN (check the truth tables here if this is confusing to you) The value here is NOT(UNKNOWN) = UNKNOWN.  Since UNKNOWN is not TRUE, 3 is not returned. 
     
    Hence, no value could ever be returned if a single null value is in the set. So you can rewrite as:
     
    select *
    from parent
    where parentId not in (select parentId from child where parentId is not null)
     
    or use what is probably a better syntax in any case, as works nulls or not, since not every value need be compared, since it is looking for the existence of a value in a join, not the entire set:
     
    select *
    from parent
    where not exists (select 1
                                from   child
                                where  child.parentId = parent.parentId)

    These queries both return:
     
    parentId
    -----------
    3
    4
     
    Which is the correct answer, after all...

    Comments (4)

    Please wait...
    Sorry, the comment you entered is too long. Please shorten it.
    You didn't enter anything. Please try again.
    Sorry, we can't add your comment right now. Please try again later.
    To add a comment, you need permission from your parent. Ask for permission
    Your parent has turned off comments.
    Sorry, we can't delete your comment right now. Please try again later.
    You've exceeded the maximum number of comments that can be left in one day. Please try again in 24 hours.
    Your account has had the ability to leave comments disabled because our systems indicate that you may be spamming other users. If you believe that your account has been disabled in error please contact Windows Live support.
    Complete the security check below to finish leaving your comment.
    The characters you type in the security check must match the characters in the picture or audio.
    Louis Davidson has turned off comments on this page.
    Picture of Anonymous
    Denis10504 wrote:
    Louis,

    I know you wrote about it that is the reason I put 'up for debate of course' in parentheses

    here is more NULL fun when using where field = NULL instead of IS NULL

    -- = null doesn't work with ansi null setting on
    set ansi_nulls on -- default for most people
    select * from child where parentId = null
    set ansi_nulls off
    go
    select * from child where parentId = null
    set ansi_nulls on

    -- works no matter what the setting
    set ansi_nulls on -- default
    select * from child where parentId IS null
    set ansi_nulls off
    go
    select * from child where parentId IS null
    set ansi_nulls on

    Yep nulls are a blog by itself

    Denis
    Jan. 26
    Picture of Anonymous
    Louis_Davidson wrote:
    Denis, your first comment is an article in and of itself, but yeah, aggregates and nulls are interesting.

    The second comment I don't agree with really (in fact I wrote a rant about it a week or so ago
    http://spaces.msn.com/members/drsql/Blog/cns!1pqLLlQgw9hZKLV1K4eLatKw!766.entry)

    Beause it is a join, it looks like what you are trying to do is get parent and child rows. Plus, it turns out that it can be less efficient, especially when indexing is scarce on the tables.
    Jan. 26
    Picture of Anonymous
    Denis10504 wrote:
    And of course a left join would probably be the 'best' choice (up for debate of course)

    select p.* from parent p left join child c on p.parentid =c.parentid
    where c.parentid is null
    Jan. 26
    Picture of Anonymous
    Denis10504 wrote:
    I am sure you know this but NULLS have also a strange effect with count(), take a look at the following 2 queries

    select count(*) from child --4
    select count(parentId) from child --3

    when you do a count(*) on the field it only counts the non NULL values

    Of course you could do
    select count(coalesce(parentId,0)) from child --4

    This has always taken a lot of people by suprise

    Denis
    Jan. 26

    Trackbacks

    Weblogs that reference this entry
    • None