Louis's profileThe SQL Doctor is In (Re...PhotosBlogListsMore Tools Help

Blog


    July 10

    Blogging the DMV's

    Well, I am working on a project that is going to be a book on dynamic management views (and functions, but DMF is an interesting acronym, and objects, as in DMO, has a well known other meaning.  What I will post will be one object in the following format:

    Name of object

    Type: Function or View

    Parameters: If it is a function

    Scope: Really denotes how long the data's life is for.  Many of the dynamic management functions are used to record some event, like an index being use for a query.  There is not usually (well, I haven't finished cataloging them, but so far there is never) any way to reset the values without rebooting the server.  There are tools to use to mitigate the issue, (like Tom Davidson's DMVStats tool that Kalen mentions here,) but the first time you look at the data in these objects it will be a bit confusing if you don't know how long the data has been gathered.

    Columns: A listing for all of the columns that are output by the DMV, including a description

    Example Usage: A query or two regarding possible uses

    Then finally, there will be a section of the book that has a list of ways you can use the DMV's to diagnose common things about your server.

    The book will be very reasonably priced (or not priced) and you will hear more about that over the next few weeks.  What I am asking of you (and you over there to the left) is for your comments/likes, dislikes, and especially query ideas.  I will include your query in the book (or coalesce it with others) and everyone who comments constructively will be mentioned in the acknowledgements of the book.  I will also include any links you might have to DMV related queries in the Bibliography (which already has 16 items in it from just getting started.)

    I will be posting these to my http://sqlblog.com/blogs/louis_davidson/default.aspx site only, since comments there can be entered without a passport.  I will make mention of it on my http://drsql.spaces.live.com blog and provide more loose commentary about the writing process. 

    September 12

    SSMS find and replace option, and a confession (of sorts)

    Much thanks to Peter DeBetta for his blog on Replacing Quoted Identifiers with Bracketed Identifiers.  It is just another of those little features that I have not noticed.  It is absolutely marvelous of course, because if you ever have to work with quoted identifiers, they are ugly (Maybe not ugly, but they seem out of place, like English class gone awry).  Replacing them is even uglier because they start and finish with the same character.  In his blog, it just replaces them very nicely.
     
    It also highlights a weakness of mine that kind of hit me today when I was trying to introduce 2005 to a group of developers and engineers. (the confession...of sorts.)
     
    "I have a very narrow focus".  There I said it.  I tend to focus almost all of my energy on database design and T-SQL, and techniques using T-SQL.  I barely use SSMS except as more than an editor, and any managment tasks I have to do, I always find a way to encapsulate them into a stored procedure so they are easy.  Or play a little dumb and get someone else to do it for me (hopefully no coworkers are reading this blog.)  I have really been dying to learn so many things, including regular expressions, managing servers, clustering, and so many other things.
     
    It's not out of desire, it is just a matter of time.  I already am spending 4 or 5 hours a day writing, testing, and basically goofing around trying to find out more stuff about T-SQL, and all of the management stuff kind of requires hardware, none of which I have readily available (and I have no plans to start working on hardware because it is too darn expensive, and T-SQL, well it is FREE in the Express Edition (ok, so I also have an MSDN subscription, but it is free to you too, even if you dont: http://www.microsoft.com/sql/editions/express/default.mspx)
     
    If you have read this blog for long (I will hit two years in January!  Maybe a little celebration giveaway?  Perhaps, I have a contest starting next month for the end of the year, so that would be good), you know what I do.  I work during the day, and whenever I come across a nifty idea, much like if I had come up with the thing that Peter did, I scribble it down electronically on my blog, email, or somewhere.  Then at night I write it up so I will have it later.  I am always amazed at how much I go to my blog index to remember a technique that I did even a few days ago.  I do the same thing with the msdn forums.  Answer questions, learn something new from other's needs/mistakes. 
     
    And I just heard some FANTASTIC news that I am dying to tell you.   Perhaps not earth shattering, but it will put a smile on your face.  Darn NDA's.  Sometimes I am not sure if I can tell my family that I love them without checking first.  I have added this to the other several things that I am bursting to tell you.  Oh well patience is a virtue I suppose.
    August 15

    Collation Change, this time it's for real

    I had written a quickie article () back in February about changing table collations, and it has actually come in handy (though it was a bit naive, not really thinking about having to change the change the collations of every table in the database, plus all of the code.)  So I figured I would update my article for doing it for real.  Edit: performance wise, the costs are: dropping clustered indexes, rebuilding indexes, and the updates that are done in the background to change the collation. 
     
    The scenario is that our new primary database was created with a Windows collation Latin1_General_CI_AS, while all of our server defaults are "old faithful" SQL_Latin1_General_CP1_CI_AS.  Collations are one of those annoyances back when I wrote the stuff about collations in my book.  Back before 2005 shipped, they mentioned that they were going to start suggesting a Windows collation, but all of the install defaults were in fact to SQL collations. 
     
    This is a good article that covers collations and why Windows collations are better: http://support.microsoft.com/?id=322112.
     
    So I have to do some work changing the databases. To do this, I am using a simple process, well simple might be a misstatement.  Perhaps straightforward:
     
    1.  Make a copy/script of the database, preferrably in a manner that does not specify collations.  I will be using Red-Gate SQL Compare 5 for this task.  I use their snapshot capabilities to create a code only copy of the database.  If you have build scripts for your database that don't include table drops/creates, but have all constraints, that is as good too.  This database is not under any good source control, so this method works best.  Currently I don't know of a good tool to selectively script parts of a database (like all checks, defaults, etc without tables.  And I am a fan of the Red-Gate tool anyhow.  For full disclosure, the copy I am using was given to me by Red-Gate as an MVP.  I am not however being paid for an endorsement in any way. )
     
    2.  Remove most code and constraints from the database.  I remove all of them using a set of procedures that you can download from http://drsql.org/Documents/utility.drop_objects_procs.zip that you can use to remove most everything from the databse.  (I drop CLR objects, but not assemblies and have not done any testing along these lines yet. I have not implemented a procedure for full text objects, but will add one at some time.  We only have a single full text index in use at this point, so I haven't looked into it fully yet.)
     
    The procs are in a utility schema, and they are:
     
    • utility.relationships$remove
    • utility.checkConstraints$remove
    • utility.defaultConstraints$remove
    • utility.indexes$remove
    • utility.uniqueConstraints$remove
    • utility.primaryKeyConstraints$remove
    • utility.codedObjects$remove --triggers, stored procedures, the lot can be removed
    • utility.columns$changeCollation
    • utility.computedColumns$remove --removes computed columns because they are based on columns that we have to change
    Each of these has a parameter combination (generally defaults or '%' wildcards) that will let you slap get rid of everything in the database other than tables (I might build one of those too one day.  I have a query in a draft post that order tables using a tree algorithm that would let me order the tables in the right order.  Of course a loop with a Try...Catch would work too, what was I posting on again?.) 
     
    exec utility.checkConstraints$remove @table_schema = '%'
    exec utility.defaultConstraints$remove @table_schema = '%'
    exec utility.indexes$remove @table_name = '%',@index_name = '%'
    exec utility.uniqueConstraints$remove @table_schema = '%', @table_name = '%', @index_name = '%'
    exec utility.codedObjects$remove @object_name = '%'
    exec utility.computedColumns$remove @table_name = '%', @column_name = '%', @table_schema = '%'

    The script I ran to remove objects was basically very much a "dumb" process.  For example,

    I chose to just drop all indexes rather than only dropping indexes that involved columns with char datatypes.  In the future I am going to upgrade my index, unique and primary key constraint dropping code to discriminate between rows of different types.  Certainly if I have to do a larger database that is in production (something I am assuming I will have to do at some point in the near future.)
     
    It will not drop any objects used in a computed column, so you might have to do this drop manually.
     
    Note, I didn't do foreign keys or primary keys as I use identities for all keys in this database.  Hence they will not be affected. 
     
    3.  Change the collations.  To do this, you simply alter each character column setting the new collation in the ALTER COLUMN statement.  This is built into my procedure, so I just run this:
     
    exec utility.columns$changeCollation @fromCollation = 'SQL_Latin1_General_CP1_CI_AS',
                                         @toCollation = 'Latin1_General_CI_AS'
                                        
    If you have missed anything (like you didn't remove coded objects, or you had full-text indexes that you needed to drop), the error handling of the utility procs basically returns the error as a result set, and in that message I include the statement that was tried to be executed.  If I didn't screw something up (like the first time I ran it I had messed up the quotename() usage to return [varchar(20)] instead of [varchar](20).  That looks funny, but obviously the other way was thinking it was a user-defined type of "varchar(20)"
     
    The procedure basically is a cursor on the following statement
     
    Declare @fromCollation sysname,
                @toCollation sysname

    select   @fromCollation = 'SQL_Latin1_General_CP1_CI_AS' --or whatever
    select   @toCollation = 'Latin1_General_CI_AI'  --or whatever

    SELECT 'ALTER TABLE ' + quotename(TABLE_NAME) +
                ' ALTER COLUMN ' + quotename(COLUMN_NAME) + ' ' + quotename(DATA_TYPE) +
                CASE WHEN CHARACTER_MAXIMUM_LENGTH = -1 then '(max)'
                         WHEN DATA_TYPE in ('text','ntext') then ''
                         WHEN CHARACTER_MAXIMUM_LENGTH IS NOT NULL
                                    THEN '('+(CONVERT(VARCHAR,CHARACTER_MAXIMUM_LENGTH)+')' )
                      ELSE isnull(CONVERT(VARCHAR,CHARACTER_MAXIMUM_LENGTH),' ') END
                 +' COLLATE ' + @toCollation+ ' ' + CASE IS_NULLABLE
                                                                               WHEN 'YES' THEN 'NULL'
                                                                               WHEN 'No' THEN 'NOT NULL'
                                                                    END
    FROM   INFORMATION_SCHEMA.COLUMNS
    WHERE DATA_TYPE IN ('varchar' ,'char','nvarchar','nchar','text','ntext')
        and  COLLATION_NAME like @fromCollation

     
    4.  Change the database collation (or this will be all for naught when people try to create objects and you will have to try again.)
     
    alter database <databaseName>
         set single_user with rollback immediate
    go
    alter database <databaseName>
         collate Latin1_General_CI_AS
    go
    alter database <databaseName>
         set multi_user

     
    5. Now I just run the scripts to recreate all of the objects, or compare the RedGate snapshot, etc.  Make sure that you don't include the old collation in any of your scripts/comparisons, otherwise you will lose all of your hard work :)
     
    6. Finally I drop the utility objects, and the database is back to where it stared, with different collations for all of the text data.
     
    Edit 9/28/2006:  added a call to drop the computed columns.

    August 10

    Getting the schema name from only the object_id

    One of my very biggest influences Erland Sommarskag *, who has the website with some of the most frequently referred to articles on the newsgroups/forums that I partake in (I refer to it weekly at least) gave me this tidbit the other day and I have already used it in several utility procedures that I have built recently created (and plan to post this weekend, more on that in a minute. **)
     
    The tip was basically in response to my suggestion that the object_name function should return schema, possibly by a parameter, or even a new function. I use schemas quite extensively in my new projects, and as such I have to be very concious of the fact that s1.table is not s2.table. It isn't too oftent that there is overlap, but it can happen.
     
     
    Anyhow, the workaround, which is pretty cool is to use the objectpropertyex function and get the schemaId:
     
    objectpropertyex(object_id, 'SchemaId')
     
    then cast it to an integer, since objectpropertyex returns a variant
     
    cast(objectpropertyex(object_id, 'SchemaId'))
     
    and finally use schema_name to get the schema name:
     
    schema(cast(objectpropertyex(object_id, 'SchemaId')))
     
    Ugly?  Sure, workarounds almost always are.  But when you need to write a query like this to get the columns of tables, it is nice to be able to not have to join to the sys.objects view.  I should also note that using the INFORMATION_SCHEMA you will get this information, but sometimes it is more expedient to use the catalog views to get more implementation information.
     
    select schema_name(cast(objectpropertyex(object_id, 'SchemaId') as int))
           + '.' + object_name(object_id), name
    from sys.columns
     
    * If you havent read the articles on Erland's site: http://www.sommarskog.se/, do yourself a favor and do so.  They are quite excellent and well worth your time.
     
    ** I have been working far more than my normal amount lately on several projects, day job, night job, family thing, etc and haven't had much time for blogging or working on my book addendum.  I have a brief respite this weekend (I hope) from anything other than a litle day job work so I plan to catch up on a few changes that I need to add.  One of them was just raised by a reader in this thread: http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=620460&SiteID=1.  My wording wasn't all that great, so clarification is definitely in order, but the final point of the thread about included columns in a unique index where you might have had a constraint is a ponderer that is interesting. It is going to make me change some stuff all around.
    August 07

    Deploying with SQLCMD, Part 2, From a DOS Coding Standpoint

    Two days ago, I posted an article (more or less) on using SQLCMD variables to support coding objects that are going to be deployed to multiple environments.  Today, I will finish this thought by discussing a method for deployment.
    Say you have a file "procedure.schema.object.sql" that contains the following:
     
    -------------------------------------------------------
    use $(databaseName)
    go
    if object_id('schema.object') is not null
        drop procedure schema.object
    go
    create procedure schema.object
    as
        select name
        from   $(referencedb).dbo.tablename
    go
    -------------------------------------------------------
     
    Now, I want to execute this script on server1, database devObjects, referencedb crmDatabase.  Using DOS variables, I build the following CMD file:
     
    -------------------------------------------------------
    set databaseName=devObjects
    set referencedb=crmDatabase
    sqlcmd -SMyServer -E -iprocedure.schema.object.sql
    -------------------------------------------------------
     
    SQLCMD will use the dos variables and replace them in the file and then execute it.  This in and of itself is a decent thing, but you can take it a step further by ecapsulating the sqlcmd calls on their own, and the set commands on their own so you can easily vary the target server, database, reference, etc.
    First, I create a file, call it build.sql.  I will add another variable called sqlserver use DOS variable and use it in the -S parameter of the sqlcmd call:
    -------------------------------------------------------
    sqlcmd -S%sqlserver% -E -iprocedure.schema.object.sql
    -------------------------------------------------------
     
    Now, I create another file called myserver_devObjects.sql
     
    -------------------------------------------------------
    set sqlserver=MyServer
    set databaseName=devObjects
    set referencedb=crmDatabase
    call build.sql
    -------------------------------------------------------
     
    Executing this sets the variables, then calls the build script to actually start the build.  Next, when I need to deploy to a second server:
     
    -------------------------------------------------------
    set sqlserver=MyOtherServer
    set databaseName=devObjects
    set referencedb=crmDatabase2
    call build.sql
    -------------------------------------------------------
     
    Now, to add files to the build, all you have to do is change the build.sql file.  Now you can run the other two batch files to update both servers quite easily:
     
    -------------------------------------------------------
    sqlcmd -S%sqlserver% -E -iprocedure.schema.object.sql
    sqlcmd -S%sqlserver% -E -iprocedure.schema.object2.sql
    sqlcmd -S%sqlserver% -E -iprocedure.schema.object3.sql
    -------------------------------------------------------
     
    And now three object will be created on the different servers.  Obviously there are lots more possibilities for using sqlcmd, for example, -i parm can take comma delimited values. However, I use excel or SQL to build the build.sql file, depending on how formal the application is. 
    Finally, I have been considering a method of taking the list of files in the project to build the files and put the project into a SQL Server database and then use using xp_cmdshell to execute the sqlcmd scripts, but that is a long ways off.
    August 05

    Deploying with SQLCMD, Part 1, From a T-SQL Coding Standpoint

    This past week I have been struggling with deploying code to several different servers for multiple development servers, in multiple different databases, and for a different client.  So I had to figure a way to push the tsql code around to the different locations pretty easily. 
     
    What is even more complicated is that some of the code requires access to a different database via code like:
     
    FROM  databaseName.schema.table
     
    And the databaseName for the database that has the code in it can change, as well as the references to the external database.
    I settled on using DOS/SQLCMD variables to replace hard coded database names, and the result has been excellent.  So if I was building a procedure, function, or even a table, the first line of the file would be:
     
    use $(databaseName)
    go
     
    And when I had to hard code a reference to a different database, I set it like:
     
    FROM $(referencedDatabase).schema.table
     
    Then, when I am editing the file during design, I use SSMS's SQLCMD mode, and paste in:
     
    :setvar databaseName dbName
    :setvar referencedDatabase otherDbName
     
    Then, when you run the script, SSMS replaces your tags with these values where it finds the tags.  This is really useful because it means that I can work with hardcoded values from a code standpoint, but not from a coding standpoint.
     
    One thing you have to be really careful about is to make sure and remove the :setvar commands when saving the final values.  When I do this in DOS, you will want to to use the values that we will set in the DOS variable, (which does the same thing, effectively as the :setvar command)
     
    What is really cool is that I have files for database creation, table creation, and object creation.  Each of these is highly parameterized to use the different variables.
     
    Another cool thing I do with SQLCMD is generating code from T-SQL that will then be executed on the same server (not that this is the coolest thing, but this is :) or other servers.  Hence, I tokenize the output so it can be reexecuted using SQLCMD, and then run it using the variables.  However, the annoying thing with this is you have to be really careful not to put code like:
     
    select '$(name)'
     
    In your code that generates the query file.  Otherwise you will get an error because that token doesn't exist.  You will have to do something like: 
     
    select '$' + '(name)'
     
    This will put $(name) in the output, but it won't be interpreted as such in the statement.

    Note: In Part 2 I will discuss the DOS .cmd files I used. to build and deploy the files to different servers.
    August 02

    UTC Date conversion functions using the CLR

    A new system we are building is storing all data as UTC datetime values which is generally speaking a best practice, but can be a real pain to work with, especially since the data that is being converted is in Eastern Time (Daylight Saving OR Regular, depending on when it was stored) so you can''t just use the same offset (and we are working with millions of rows, many of which have several dates and time values.
    It is is easy enough to get the offset for the current time by doing:
     
    select datediff(hour,getutcdate(),getdate())
     
    Returns:
     
    -----------
    -5
     
    Which is true, since I am in the Central Time Zone right now.  Then use this offset to add to date values. But what if you need to convert the date on January the first?  There are lots of algorithms of how to do this, but this page (http://www.aspfaq.com/show.asp?id=2218) got me started thinking, hmm, if javascript has it built in, will the CLR?
     
    So I and my coworker (the programmer formerly known as Junior DBA) went a hunting and bam, there is was. The date object had methods for converting to and from the server time zone and UTC.  This was perfect for our needs, since all data being converted was in Eastern and we needed it in UTC.  Plus, all of the system reporting is done in Eastern, so from what I can tell this covers it.
     
    My version, which I wrote (last night) after work in VB.NET (the production code is actually in C#) is really simple:
     
    Option Strict On
    Imports System
    Imports System.Data
    Imports System.Data.SqlClient
    Imports System.Data.SqlTypes
    Imports Microsoft.SqlServer.Server
     
    Partial Public Class UserDefinedFunctions
        <SqlFunction(IsDeterministic:=True, DataAccess:=DataAccessKind.None, Name:="date$serverTimeZoneToUTC", IsPrecise:=True)> _
        Public Shared Function serverTimeZoneToUTC(ByVal inputDate As SqlDateTime) As SqlDateTime
            Dim dateValue As Date
            dateValue = inputDate.Value
            Return New SqlDateTime(dateValue.ToUniversalTime)
        End Function
        <SqlFunction(IsDeterministic:=True, DataAccess:=DataAccessKind.None, Name:="date$UTCToserverTimeZone", IsPrecise:=True)> _
        Public Shared Function UTCToserverTimeZone(ByVal inputDate As SqlDateTime) As SqlDateTime
            Dim dateValue As Date
            dateValue = inputDate.Value
            Return New SqlDateTime(dateValue.ToLocalTime)
        End Function
    End Class
     
    I am not a functional programmer, so there might be a more stylish way to do this, and if so please hit the comments :)  Build it, deploy it to a database of your choice, and then test it:
     
    Current date time,then convert UTC date to server time zone, should be equivalent

    select  getdate(),dbo.date$UTCToserverTimeZone(getutcdate())
     
    ----------------------- -----------------------
    2006-08-01 20:18:07.690 2006-08-01 20:18:07.690
     
    current UTC date time,then convert server time zone datetime to UTC again should match

    select  getutcdate(), dbo.date$serverTimeZoneToUTC(getdate())
     
    ----------------------- -----------------------
    2006-08-02 01:18:07.690 2006-08-02 01:18:07.690

    As Mr Burns would say, penting his fingers..."Excellent"

    There is however a "glitch" currently when working with future dates and times.  Starting next year, we Americans will have a larger amount of time to have light in the evening, as our daylight saving time period grows for over a month.  It will now start on the second Sunday in March and end on the First Sunday in November.  I am personally glad, as I like longer nights (and I don't get up until the sun does in any time zone!) but the Windows API doesn't yet know about it yet. 
     
    This really bothered me, so I started coding a fix tonight, when I realized that:
     
    a.  All of the dates we deal with that include a time element will be in the current realm of time
    b.  Microsoft (or someone) must be doing something about it
    c.  I was at home and there were far better things to be doing with my time than writing code that will never be used
     
    Well, frankly c is sort of a lie because I do this all of the time, but I then started surfing the web for a solution that didn't require me to work out the code, and I found a few articles of merit:
     
    Designing "Time Zone conversions"

    Preparing for Daylight Saving Time changes in 2007
    http://www.microsoft.com/windows/timezone/dst2007.mspx
    "The update required to modify the Windows time zone behavior and properties will be made available to businesses in the mid-August 2006 timeframe. "  It continues on to say: "These updates are currently planned to be available for production deployment for all customer starting in November 2006 through..."
     
    So there should be a patch a'coming before it is important to me.  But if you have coded a fix (or know of one) just let me know and I will use your code :)
     
     
    July 27

    Getting previous month/years/days from your calendar table

    It is a rare thing indeed when a person like me comes up with an independent thought. 99% of everything I do is based almost entirely on the work of others.  Today I had an independent thought that I had not seen anywhere else.  To be quite clear, the idea does piggyback on the common idea of a calendar table.
     
    The problem I wanted to solve was finding the last 90 days, or the last 4 months, or the next 10 months.  My calendar table has month, year, etc, but how do you find what month was 12 months ago?  For year it is easy to figure out the previous year (current_year - 1), but what about 5 months ago?  Unfortunately, the people that came up with the whole calendar thing thought it would be funny to number dates 1-12. 
     
    So I added to my calendar table:
     
    create table calendar
    (
        date_value datetime primary key,
     
        day_count int,
        month_count int,
        week_count int
    )
    Then I load the data (well, I wrote an update in the real situation) for the years from 2000 to 2008. Note that all of the data is loaded sequentially with no breaks so you now can use basic math operations to find date ranges.
     
    ;with digits (i) as(
                         select 1 as i union all  select 2 as i union all select 3 union all
                         select 4 union all select 5 union all select 6 union all select 7
                         union all select 8 union all select 9 union all select 0)
    ,sequence (i) as (
    SELECT  D1.i + (10*D2.i) + (100*D3.i) + (1000*D4.i)
    FROM digits AS D1 CROSS JOIN digits AS D2 CROSS JOIN digits AS D3 CROSS JOIN digits AS D4
    )
    insert into calendar (date_value, day_count, week_count, month_count)
    select dateadd(day,i,'20000101') as date_value,
           datediff(day,'20000101',dateadd(day,i,'20000101')) as day_count,
           --use 20000102, as this is a sunday
           datediff(week,'20000102',dateadd(day,i,'20000101')) as week_count,
           datediff(month,'20000101',dateadd(day,i,'20000101')) as month_count
    from sequence
    where dateadd(day,i,'20000101') <= '20081231'
    order by i
     
    Now, using the calendar table I can get the dates of the previous week, the last 10 weeks, or any combination.  For example, for the previous week:
     
    --the previous week
    select min(calendar.date_value) as minDate,
              max(calendar.date_value) as maxDate
    from   calendar
                    join (select * from calendar where date_value = '20060727') as currentDate
                           on   calendar.week_count = currentDate.week_count - 1
     
    This returns:
     
    minDate                 maxDate
    ----------------------- -----------------------
    2006-07-16 00:00:00.000 2006-07-22 00:00:00.000
     
    The basics of the query is simply to take a derived table that fetches the calendar row for the current day, then join that to the full calendar table using a range of dates in the join.  In the previous week example, I used the following:
     
    on   calendar.week_count = currentDate.week_count - 1
     
    So just the previous week only.  Now, what if you want the previous 12 months:
     
    select min(calendar.date_value) as minDate,
           max(calendar.date_value) as maxDate
    from   calendar
               join (select * from calendar where date_value = '20060727') as currentDate
                    on   calendar.month_count < currentDate.month_count
                         and  calendar.month_count >= currentDate.month_count - 12
     
    This returns:
     
    minDate                 maxDate
    ----------------------- -----------------------
    2005-07-01 00:00:00.000 2006-06-30 00:00:00.000
     
    --not the current, month, but all earlier
    calendar.month_count < currentDate.month_count
     
    Notice that the results did not include the current month.  Change the < to <= and it will include the current month. 
     
    --and months greater than the monts 26 months ago
    and  calendar.month_count >= currentDate.month_count - 26

    For a 24 month window, 12 months +/- the current month:
     
    select min(calendar.date_value) as minDate,
              max(calendar.date_value) as maxDate
    from   calendar
                   join (select * from calendar where date_value = '20060727') as currentDate
                         on   calendar.month_count < currentDate.month_count + 12
                                and  calendar.month_count >= currentDate.month_count - 12
     
    This returns:
     
    minDate                 maxDate
    ----------------------- -----------------------
    2005-07-01 00:00:00.000 2007-06-30 00:00:00.000
     
    Now you can use these dates in other criteria either by assigning these values to a variable, or (of you are one of the cool kids) you can use the tables in a join to other tables.  For example, let's hop in the wayback machine and go to the Adventureworks database:
     
    select *
    from   sales.salesOrderHeader
               join calendar
                   join (select * from calendar where date_value = '20040727') as currentDate
                        on   calendar.week_count = currentDate.week_count - 1
                   on calendar.date_value = salesOrderHeader.ShipDate
    order by shipDate desc
     
    You will see a set of rows from 2004-07-18 00:00:00.000 to 2004-07-24 00:00:00.000.

    If you have come up with this before, please let me know so I can link to/direct readers to your materials.  Like I said, I haven't seen this yet.  If it turns out there was a better way to do this, let me know that too :)
    July 22

    July refresh of Books Online

    I am traveling today, so I just love these kind of easy announcement articles :)
     
    They have released a new update to books online.  You can retrieve it from here:
     
    A nice touch is the topic "New and Updated Books Online Topics (17 July 2006) " that will give you links to the new and updated topics in this release.
    Just paste this into the books online URL:
     
    ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/sqlwhtn9/html/be97726e-e501-4b4a-b40d-86682bb7a73b.htm

    A lot has changed.  This topic is really cool (ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/udb9/html/ca202580-c37e-4ccd-9275-77ce79481f64.htm)
     
    The MSDN online topics will be updated this weekend.
     
    As another bonus, there are some new and updated samples available for download at:
    http://www.microsoft.com/downloads/details.aspx?FamilyId=E719ECF7-9F46-4312-AF89-6AD8702E4E6E&displaylang=en
     
    July 14

    Generating a sequence table

    A thing I find myself constantly needing is a table of numbers, especially to demonstrate some other concept for other people.  The technique I will present is not technically my own, as I find it here in this post by Joe Celko.  If this is your technique, or you have an article where you present this technique, let me know and I will link you in. (I just want the script handy :)
     
    ;with digits (i) as(
                                 select 1 as i union all  select 2 as i union all select 3 union all
                                 select 4 union all select 5 union all select 6 union all select 7 union all 
                                 select 8 union all select 9 union all select 0)
    ,sequence (i) as (
    SELECT  D1.i
            + (10*D2.i)
            + (100*D3.i)
            + (1000*D4.i)
            + (10000*D5.i)
            + (100000*D6.i)
    FROM digits AS D1
         ,digits AS D2
         ,digits AS D3
         ,digits AS D4
         ,digits as D5
         ,digits As D6
    )
    select i from sequence order by i
     
    Without the order by, it will be ordered funny, which should be fine for most queries, but terrible for you checking things out.
     
    This will produce a table of numbers between 0 and 999999.  The concept is to first take a table of digits, in this case the digits CTE:
     
    ;with digits (i) as(
    select 1 as i union all  select 2 as i union all select 3 union all
    select 4 union all select 5 union all select 6 union all select 7 union all
    select 8 union all select 9 union all select 0)
    select * from digits
     
    This returns:
     
     i
    -----------
    1
    2
    3
    4
    5
    6
    7
    8
    9
    0
     
    Next, this is used in cross joins, with each additional cross join increased by a power of 10.  Then each match is added together:
     
    ,sequence (i) as (
    SELECT  D1.i
            + (10*D2.i)
            + (100*D3.i)
            + (1000*D4.i)
            + (10000*D5.i)
            + (100000*D6.i)
    FROM digits AS D1
         ,digits AS D2
         ,digits AS D3
         ,digits AS D4
         ,digits as D5
         ,digits As D6
    )
     
    This is really ingenious, and I wish I had thought of it. Looking at just two sets of digits, you can see how this works:
     
    ;with digits (i) as(
                                  select 1 as i union all  select 2 as i union all select 3 union all
                                  select 4 union all select 5 union all select 6 union all select 7 union all
                                  select 8 union all select 9 union all select 0)
    ,sequence as (
    SELECT  D1.i as d1
            , (10*D2.i) as d2
    FROM digits AS D1
         ,digits AS D2
    )
    select * from sequence
     
    You will see a set like this:
     
    d1          d2
    ----------- -----------
    1           10
    1           20
    1           30
    1           40
    1           50
    1           60
    1           70
    1           80
    1           90
     
    So for each value in d1, you will add each value in d2.  Really neat, right?  When you add in d3, it adds in each of the hundreds values, including the 0 row, which leaves the set from the first two cross joins.
     
    I formatted the text above in such a long manner because when I use this, I often want to limit the number of rows returned:
     
    ;with digits (i) as(
                                  select 1 as i union all  select 2 as i union all select 3 union all
                                  select 4 union all select 5 union all select 6 union all select 7 union all
                                  select 8 union all select 9 union all select 0)
    ,sequence (i) as (
    SELECT  D1.i
            + (10*D2.i)
    --        + (100*D3.i)
    --        + (1000*D4.i)
    --        + (10000*D5.i)
    --        + (100000*D6.i)
    FROM digits AS D1
         ,digits AS D2
    --     ,digits AS D3
    --     ,digits AS D4
    --     ,digits as D5
    --     ,digits As D6
    )
    select i from sequence order by i
     
    Of course you can also just use a where clause too.  If you are going to frequently use a sequence table, you really ought to build a permanent table (in fact in my book I suggest everyone build a sequence table for their databases.)
     
    Some additional resources I have found:
     
    SQL Server Mag - Generating Sequences in T-SQL
    By:  Alexander Netrebchenko
    http://www.sqlmag.com/Article/ArticleID/22442/sql_server_22442.html
     
     
    This one uses a technique like the one I presented, but not based on tens.  Very interesting, looks hard to extend though:

    Lewis Bruck's Blog - Generating a table of integers
    http://blogs.msdn.com/lbruck/archive/2005/11/09/491153.aspx
     
    EDIT: From the comments, this is another technique that is pretty awesome:
    I also like his site name.  Codinet.  His explanation: codine (addictive drug, one of my favorite parts of getting hurt, I will admit :) and .NET (addictive programming drug).
    July 11

    Submit feedback and vote!

    I am packing tonight to leave town tomorrow, but I just want to implore you, if you have gripes about SQL Server go here:  http://connect.microsoft.com/SQLServer/.  Sign in with your passport account (or get one!) and submit feedback, or vote for other peoples feedback.  If you find a bug, report it. 

    There are many many great ideas that people mention all of the time in the forums, but so rarely do they seem to go to Connect and let their voice be heard.  Don't be like that. 

    Just today someone asked about how you can turn off delayed name resolution so that when you create a procedure like:

    create procedure test
    as
    select *
    from snurdlet
    go

    It compiles fine. However when it is executed:

    Msg 208, Level 16, State 1, Procedure test, Line 3
    Invalid object name 'snurdlet'.

    You know this annoys you, how could it not?  I know why they enabled this several versions back.  I worked really hard to build tools with pre-scripts and post-scripts to create temp tables because they had to exist before the procedure could be created if they were used.  So in this case, it would be nice to allow delayed name resolution.  Or even when creating tons of procedures, perhaps.  But for the most part, when you are editing a procedure in your development environment, this kind of mistake (my table is really named snurdle :) can hurt you later if testing isn't exactly perfect.

    I have started a page (http://drsql.org/vote.aspx) on my site with some of my more pet-peeve requests that if you feel so inclined, go vote!

    July 05

    RSS from SSIS Part 3 – Preparing and transforming the data, complex case

    I will try to be a bit less detailed in this post, because if you have done the previous two (links) most of this will be quite straightforward.  This is the last of the series. I hope you liked it.  If you did like it, please email me or comment here.  Unless I see a rather sizable response (which could be just a few if they are good enough :) I will probably never do this again.  It was a large amount of work to maintain an example so large, but I am glad I did it.  I learned more about SSIS than I had imagined I would.
     
    Step 8. Download another feed.
     
    Repeat the steps from part one (link) to download another RSS feed. In this case we are going to fetch the RSS for my site: http://drsql.spaces.msn.com/feed.rss.
     
    What I have done is to go back to the control flow page, highlight everything, then right click and disable. 
     
    Then Just highlight the script task and xml task and copy and paste.  You will likely want to name your objects now.  I will name the tasks from the previous steps: "PASS - (existing name)"  and the new ones "DRSQL -".  You can change the name by selecting the object and pressing F2, or going into the edit dialog.
     
    From here, just go in, enable the task script, set the variables, etc.  Enable, the  script task, set the parms, run it, then disable it so you don't download from my feed "too much" and get me all excited that 1.4 million people have started reading.
     
    I will put the xml file in: c:\mssql\ssisarticle\xml\drsql.xml
     
    Finally, attach the output of the "DRSQL - XML Task" to the "Data Flow Task".  Once you have downloaded the file and set up the XML task, disable this task and reenable the "PASS - XML Task" and "Data Flow task" and go to the data flow editor.
     
    Step 9 - Setting up the XML Source
     
    Copy the XML Source and Derived Column tasks, paste them, and rename like in the last step.  Delete the link between "DRSQL - XML Source" and "DRSQL - Derived Column".  We will come back to the derived column in a little bit.
     
    Edit the "DRSQL - XML Source", and change the XML Location to the drsql.xml file you just created in the previous step. Then click generate schema.  I will place it in c:\mssql\ssisarticle\xsd\drsql.xsd.  Then click OK to close the dialog.
     
    Next right click and go to the advanced editor. On the Input and Output Properties tab, choose the item interface, and then the output columns.  Not the same as in the other feed (nor is it like any other I have found and had this not been my site that was annoying, I probably wouldn't have bothered as this was kind of a pain.)  It was however, a very nice learning opportunity, as I learned about how the Merge Join task worked :)
     
    To get everything we need, we will have to work with 4 interfaces (I was fooled by the channel interface once, but it only has one row in it.)  And as an additional step, for each interface we will need to sort the interface on the item_id column.  So for the first parameter, the @publishDate, right click the "DRSQL - XML Source", and choose the advanced editor.
     
    Then go to the Input and Output Parameters and find the pubdate interface.  With it highlighted, change IsSorted to Yes.  Then click on item_id and set the SortKeyPosition to 1.  just leave item_id as an eight-byte unsigned integer.  We will use it later as our join key. Next click on the pubDate column and change the datatype to database timestamp [DT_DBTIMESTAMP].
     
    Then do this for each of the following interfaces for the mentioned parameters, making sure to sort on item_id for each of the interfaces:
     
    * link  --    @permalink nvarchar(300) -- Unicode string [DT_WSTR], length 300
    * title --    @title nvarchar(300) -- Unicode string [DT_WSTR], length 300
    * description  --    @text nvarchar(max) -- Unicode text stream [DT_NTEXT]
     
    These we will derive later, since the spaces feed has no author:
     
        @blogName nvarchar(20),    --derived later
        @author nvarchar(100),     --derived later (no author on spaces)
     
    Once you have set these values, the XML Source is prepared for use.
     
    Step 10 - Assembling the data
     
    Now we need to join together the XML Sources we have just prepared. For this we will use the Merge Join transform.  Drag the Merge Join transformation to the data flow task.  Now attach "DRSQL - XML Source" to the "Merge Join" and a dialog will appear.  For the output, choose pubDate, and for the input, choose Merge Join Left Input (or right, this is not important really.) 
     
    Note that you can pretty much run this task to test it anytime you want.  You should see the different tasks turning green and telling you how many rows are affected (usually that will be 25.)
     
    Click OK to close the dialog.  Then grab the same green line from the "DRSQL - XML source", and this time choose link (the only input left will be the Merge Join Right Input.  Then right click the Merge Join task and choose edit.  Here you will verify that item_id is tied to item_id, then click the checkboxes on the left side of the table boxes for all columns (only one copy of item_id is needed.)  The list should now have item_id, pub_date, and link in it. Click OK to close the dialog now.  Rename this transformation to "DRSQL - Merge Join - link -- pubDate" (Or whatever.  I am naming it like this primarily so I can refer back to it without pictures.  Also, if you want to be neat, you can resize the boxes so the whole name shows.)
     
    Add another Merge Join transformation to the data flow task.  Connect the "DRSQL - Merge Join - link -- pubDate" task to it, and set it as the left input. Click OK.  Then connect the "DRSQL - XML Source" to this new transform.  Choose title as the input.  Again edit the transformation and choose the outputs.  Click OK to close the dialog, and repeat once more for the description interface.  I will name the two merges: "DRSQL - Merge Join - linkPubDate -- title", and finally "DRSQL - Merge Join - linkPubDateTitle -- Description".  The output of this transform should be: pubDate, link, title and description (no need for item_id any longer)

    Step 11 - Add author and blogname
     
    Connect the output from "DRSQL - Merge Join - linkPubDateTitle -- Description" (big name) to the "DRSQL - Derived Column" transform we pasted and renamed earlier.  Then right click and edit. You should still have a BlogName column.  Change it's expression to "DRSQL".  Then add a Creator derived column and set it to "Louis Davidson".  Set the lengths to 20 characters. Click OK to close this dialog.
     
    Step 12 - Set it up to allow multiple flows into the "OLE DB Command" task.
     
    Now our dataset is exactly where it was before.  And we could duplicate the "OLE DB Command" task, but that would not be very elegant.  So instead, we will use a UNION ALL transformation.  So drag a Union All transformation to the data flow task. And then connect The "DRSQL - Derived Column" transform to it.
     
    Next sever the connection between the "PASS - Derived Column" task, and instead connect it to the "UNION ALL" transform.  Right click and edit to see if you have to set any columns manually.  I had to set one because 'creator' isn't immediately recognized as Creator. Click OK to close the dialog.
     
    Then connect the output of the "Union All" to the "OLE DB Command" task.  Then right click the "OLE DB Command" and it will show you a dialog asking you to "Restore Invalid Column References" This should match up the new inputs from the "Union All" to the proper column references.
     
    Now, if all has gone well, when you click run, some number of rows will move from task to task into the blog table.  Run this query in your database:
     
    select *
    from blog
     
    And you should now see rows with DRSQL in the title. So there you have it.  If you have done the steps and poked around you should have a pretty decent idea of the basics of SSIS.  After I did this the first time, I had to build a package to transform a spreadsheet and put the data (with some modification) into a couple of tables.  It was really quite straightforward.
     
    I have uploaded a copy of this package to http://drsql.org/Documents/RSSfromSSISFiles.zip for you to check out.  To use/edit the package(s), add to an SSIS project.  You will need to modify the DatabaseServer connection to match your server.  I have also included a cmd file to create the directories that I used in the articles.
    July 03

    RSS from SSIS Part 2 – Preparing and transforming the data, simple case

    In the previous article, (link) we set up a package that will download an RSS XML file from a site (we used the PASS SIG BLOGS site).  I titled this article "simple case" because not every RSS feed will present such a nice interface as this site does (though most will.)  In this article I will be taking that data and creating it in the database (as well as updating it if the text has changed.)
     
    Step 3 - Create table for the data.
     
    For the data steps, we will need a table to hold the data, which will also make it more clear as to why we are getting certain values from the feeds.  So to this end I will create a database and a table to hold the blog data:
     
    create database testRSSfromSSIS
    GO
    use testRSSfromSSIS
    go
    CREATE TABLE blog
    (
           blogId int IDENTITY(1,1) NOT NULL CONSTRAINT PKblog PRIMARY KEY,
           blogName nvarchar(20)  NOT NULL ,  --name we will give to the feed
           permalink nvarchar(300)  NOT NULL , --link to the article
           publishDate datetime NOT NULL , --date the artice was published
           author nvarchar(100)  NOT NULL , --name of the author of the post
           title nvarchar(300)  NOT NULL , --title of the post
           text nvarchar(max)  NOT NULL , --text of the post
           loadDate datetime NOT NULL CONSTRAINT DFblog_loadDate DEFAULT (getdate()),
           checksumValue as checksum(blogName, permalink, publishdate,author, title, text) persisted,
        CONSTRAINT AKblog UNIQUE NONCLUSTERED (blogName, permalink)
    )

    I chose the blogName and permalink name as the alternate key that we will use to determine if we have downloaded an article previously.  It isn't a perfect key, but it is "good enough" for my purposes. I will insert rows with new links, and update those where the checksum of the columns doesn't match (that will be in the next article)
     
    The following stored procedure will be used to for each row in the input to create a new row, or update an existing row. 
     
    Note: I originally used a lookup transform, but it required more roundtrips and lookups to the server than just using this procedure.
     
    CREATE PROCEDURE blog$insUpd
    (
        @publishDate datetime,
        @permalink nvarchar(300),
        @blogName nvarchar(20),
        @author nvarchar(100),
        @title nvarchar(300),
        @text nvarchar(max)
    )
    AS
    DECLARE @msg nvarchar(4000), @blogId int
    BEGIN TRY
        select @blogId = (select blogId
                                     from    blog
                                     where  blogName = @blogName
                                         and  permalink = @permalink)
        if @blogId is null
          begin
                set @msg = 'Error creating new blog row' --if an error occurs, this is read
                insert blog(publishDate, permalink, blogName, author, title, text)
                values (@publishDate, @permalink, @blogName, @author, @title, @text)
          end
        else
         begin
            set @msg = 'Error updating blog row' --if an error occurs, this is read
            update blog
            set       publishDate = @publishDate,
                        permalink = @permalink,
                        blogName = @blogName,
                        author = @author,
                        title = @title,
                        text = @text
             where blogId = @blogId
                 and checksum(@blogName, @permalink, @publishdate, @author, @title, @text) <> checksumValue
         end
    END TRY
    BEGIN CATCH
        set @msg = @msg + 'The error message was:' + error_message()
        raiserror (@msg,16,1)
    END CATCH
     
    Step 4, Add a data flow task
     
    From the Control Flow Items pane, drag on a Data Flow task.  Make it the next task after the XML task we previously created. Then on the tabs, click on Data Flow.  You are now editing the data flow task. 
     
    Step 5. Setting up the XML Source
     
    Add an XML Source to the Data Flow Task.  Right click, and edit to document. For the XML Location, choose the path to the pass.xml file (I will use "c:\mssql\SSISArticle\XML\pass.xml", from Part 1 (link))  Then, click on Generate XSD, and choose a path.  I will use "c:\mssql\SSISArticle\XSD\pass.xsd". Click OK and exit the editor.
     
    Right click and choose "Show Advanced Editor".  The important part here is the Input and Output Properties tab.  There are several different interfaces listed, and we need to choose one and set it up.  For most cases that I have found, the item interface is what we want for RSS feeds (one of the reasons I am not using my blog for this article is that it has a wierd format that will require some advanced handling due to the lack of this interface.  You will be able to find the data in one or more of thes interfaces, something I will cover in part 3.)
     
    So open the item tree item, and choose Output Columns.  Datatypes are a really tricky business in SSIS (or so I have found) so we need to check/set the datatypes.  All of the character data will be unicode, which is why I used unicode in my table declaration (if you don't want to use Unicode, you can use a Data Conversion Task).  Looking through the outputs, the Item one has everything we will want.  So we want to make sure that the output from the item interface matches.  So, for the following columns, make sure that they are set to the proper datatypes by changing the Data Type Properties.  These size are what I set for my table, even though they might be too large.
     
    DataType and Lengths:
     
    title           Unicode String        300
    link            Unicode String        300
    description     Unicode Text String
    creator         Unicode String        100
    pubdate         Database Timestamp (a datetime value)
     
    Click OK to close this editor.
     
    You will see later when you execute this package warnings about unused outputs.  I have not been able to remove these warnings as when I try to remove the outputs it gives me errors.
     
    Step 6. Add the blog name
     
    To make this blog fetching process valuable, you will likely want to have multiple feeds.  Hence, to make it clear which blog is which, I will add a name column to the feed, calling it blogName.  To add this value to the set of data for the final task, I will use the Derived Column task.  Make it occur after the XML Source task using the green arrow. (the green arrow is for positive results, the red for negative, though red is not always "bad", just negative based on the task.  I won't be using the error output in this article, but it can be used to funnel off data that fails some criteria in your database or task for checking later, rather than failing the SSIS package.)
     
    When you drag the green arrow from XML Source to the Derived Column task, a dialog will pop up asking for you to choose the Output.  Since we set up the Item interface, choose that one.  There is only one input to a Derived  Column task, so it is already chosen. Click OK.
     
    Right click the Derived Column task and edit it.  In the blank for Derived Column Name, use BlogName.  Let it be added as a new column. For the expression, type "PASSBLOGS" (with the double quotes.)  Set it to Unicode string type, length 20 (since our column length is 20.)
     
    At this point you can try out the package by running it. You should see the XML Source and Derived Columns change to green and let you know that 25 rows have been moved (this is the last 25 items feed.)  If you want to view the data, stop the package, right click the line between the tasks and check out Data Viewers.  Just adding a Data Viewer will allow you to pause the task and look at the data that is being moved from task to task.  Very useful when things go wrong.
     
    Step 7 - Create or modify the row
     
    For this step , I will use the OLE DB Command task to execute the stored procedure created back in step 5.  So add an OLE DB Command task to the data flow task and set it to follow on after the derived column task using the green arrow again.
     
    To use this task type, you need to create a connection to the database.  You can go to the SSIS menu and choose New Connection, or there is likely a Connection Managers tab open (there are a lot of tabs, arent there?)  Create this new connection to your SQL Server using an OLE DB Connection type.  Be sure to specify the database where your blog table is (I have forgotten this a few times).
     
    Now, right click the OLE DB Command task.  On the Connection Managers tab, set the Connection Manager to the connection you have just created.  On the Component Properties tab, edit the SqlCommand property.  Set it to:
     
    blog$insUpd  @publishDate =?, @permalink = ?,
                          @blogName  = ?, @author = ?,
                          @title = ?, @text = ?
     
    Next switch to the Column Mappings tab, and match the input value to the parameters:
     
    @publishDate       pubDate
    @permalink         link
    @blogName          blogName
    @author            creator
    @title             title
    @text              description
     
    Click OK to close the dialog.
     
    Now, if everything is set up right, start the task and you will now have 25 rows in your table.
    For a final test, run this update:
     
    update blog
    set   text = ''
     
    This will change the data as well as modify the checksum value as well. Verify that it has changed the data:
     
    select *
    from blog
     
    Then run the package again and you should see updated data.
     
    Rhe next (and final) article in the series entitled:
    Part 3 – Transforming and insert/updating the data, complex case, I will demonstrate an ugly feed that does not include all of the data in such a nice package for us (caused by the namespaces that spaces.msn.com currently uses. )
     
    Also, I have uploaded a copy of this package to http://drsql.org/Documents/RSSfromSSISFiles.zip for you to check out.  To use/edit the package(s), add to an SSIS project.  You will need to modify the DatabaseServer connection to match your server.  I have also included a cmd file to create the directories that I used in the articles.
     
     
    June 29

    RSS from SSIS Part 1 – Downloading and preparing the XML file

    A week ago, I sat down with the intention to learn how the Full-Text Search feature worked, since I had never actually used it.  (oddly enough I did this because I am working on a backup and restore utility and the full text files were giving me mild fits. Both of these articles are forthcoming.)  So I needed some test data, and hence I decided to replicate the trick I had seen using SSIS to fetch RSS feed data. I figured this was an easy task. It is sort of, but there are lots of little tricks that I did not find overly easy to find.
     
    I finally got everything working (including a CLR object to remove HTML markup that I will include in a later article when I make use of this text data I am gathering) and I figured I would document it here (and share in case you want to try the same thing...)
     
    Note:  I am NOT an expert on SSIS, but I have tested this method and it works.  If anyone has better ideas, I am more than willing to change this article.  I would like to have documented my entire process of learning, because I actually did several things differently during my first tries.  I expect that if you are new to SSIS, if you too will take the time to smell the roses along the way and look at the many different pieces of data/controls that are available to you in SSIS, you will learn far more than by simply doing these steps (or even just downloading the package, which will be available to download at a link at the end of each article.
     
    For the first article in the series, I will simply build the bits that download the file for the RSS feed and prepare it to be used in a Data Flow Task. 
     
    Step 1 – Create an SSIS project
     
    Just start BI Development Studio, and click File; New Project.  Then choose Business Intelligence Projects; Integration Services Projects.  Name it and create it. You should have a blank project to work with, and a package named Package.dtsx.   Name it what you want, and save it where you want.
     
    Step 2 – Download the RSS file
    There is no intrinsic control to do this in SSIS, but I found a script on Ashvini Sharma’s blog: (http://sqljunkies.com/WebLog/ashvinis/archive/2005/05/25/15653.aspx)  To execute this script, use the Script Task (not the ActiveX script task).
     
    Drop the Script task from the Control Flow Items and then right click and edit the task.
    On the script section, click Design Script and paste in the script from that site in for the main function.  Move the Imports System.Net into the declarations for the module.  Close the Visual Studio for Applications Window.
    For variables, define 2 by putting the following in the ReadOnlyVariables entry: RemoteUri,LocalFileName.
    Close this window by clicking OK, and go the SSIS; Variables menu (making sure that the script task is still in focus). Create the two variables needed, both string type and scoped to the task (note, the variable names are case sensitive).
    If there is no little red X on your package (signifying that you messed something up, with no blame whatsoever on my article), at this point, you should be able to save and run your package.  You should see something like this in the output from the task:
     
    SSIS package "Package.dtsx" starting.
    Information: 0x0 at Script Task: Downloading 'c:\mssql\ssisarticle\xml\pass.xml' from 'http://sigs.sqlpass.org/DesktopModules/DnnForge%20-%20NewsArticles/Rss.aspx?TabID=46&ModuleID=368&MaxCount=25'
    SSIS package "Package.dtsx" finished: Success.
     
    Also, there should be a file named pass.xml in the directory you chose.  If this fails, it could be because of the site.  Try downloading it to a browser, if that fails, try a different feed.  If that works, it is your package that is messed up.  It should look more or less like this:
     
    Step 3– Remove Namespaces

    Ok, the real spanner in the works for me was the fact that the XML source in SSIS would not accept an XML document with multiple namespaces. For example, at the header of the pass.xml file that was just created, you will find this as the first element:
     
     
    Notice that it has multiple xmlns: values.  These are namespace declarations.  So when you try to generate an XSD for the file, you get the message: Unable to infer the XSD from the XML file. The XML contains multiple namespaces.
     
    I tried and tried to find feeds that would work to very little avail (TV Guide feeds only had the single namespace values in their feeds, but what fun is that?)  Then I tried google, but frankly the problem with any search engine is the same problem you had in 3rd grade spelling.  If you need to spell the name of the musical instrument that you hit and it sounds like bells, even if you had some idea it was a xylophone, looking it up in the dictionary for your list to Santa was next to impossible because even then the word sounds like it starts with a z.  So you could look for years and never find it without just reading the whole dictionary (or ask your parents who probably wouldn’t let you suffer that much!) Finally,after hours of hunting around, I came across this article (ironically from the same person’s blog as the last one):   http://sqljunkies.com/WebLog/ashvinis/archive/2005/05.aspx.
     
    Basically it explains what I have tried to, and he gives an XSLT script and instructions for it’s use.  So I will be brief.
     
    Add an XML Task to your Control Flow pane, and connect it as next after the Script task from Step 3.  Then right click and edit.  On the General pane, choose “XSLT” for the Operation type. 
     
    Set the Source Type to “File connection”.  For source, choose the “pass.xml” connection we previously created.
     
    For the output, set SaveOperationResult to True, then DestinationType to “File Connection” choose a File Connection”. I will use the same destination as before (pass.xml), and set the OverwriteDestination to true, so it will overwrite the file.
     
    Finally, for the Second Operand, choose “Direct input” for SecondOperandType, and for SecondOperand, paste in the XSLT from the aforementioned article (ok, here it is again: http://sqljunkies.com/WebLog/ashvinis/archive/2005/05.aspx)  Get rid of any spaces before the first < character, or you will get an interesting error.
     
    Now go ahead and test it out.  After you are happy that it works, you will want to disable the Script Task if you have to run this more than a few times.  You don’t want to be downloading this feed too much or our SIG Team may think we have crossed some threshold :)
     
    If all goes well, there will be little output from the task. Looking at the pass.xml document now, you will see that it now starts with:
     
    <?xml version="1.0" encoding="utf-8"?><rss version="2.0">
     
    So no namespaces.  Now the XSD can be generated very easily.  There will be three more parts to this series (and they should come very quickly.)  They will be:

    Part 2 – Transforming and insert/updating the data, simple case
    Part 3 – Transforming and insert/updating the data, complex case
     
    Also, I have uploaded a copy of this package to http://drsql.org/Documents/RSSfromSSISFiles.zip for you to check out.  To use/edit the package(s), add to an SSIS project.  You will need to modify the DatabaseServer connection to match your server.  I have also included a cmd file to create the directories that I used in the articles.
    June 28

    Finding the database compatibility level

    A person asked how you do this, and  Umachandar Jayachandran aswered in the forums (http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=513396&SiteID=1)
     
    I always forget this one, so here it is:
     
    -- SQL Server 2005 only
    select compatibility_level
    from  sys.databases
    where name = db_name()
     
    -- 7 and 2000 version
    select cmptlevel from master.dbo.sysdatabases where name = @dbname
     
    June 22

    Article on how to get text of an object in 2005

    So, I have done this a few times already myself, but this is notable because it includes one that I hadn't even noticed.  sys.sql_modules.  Very nice.
    June 17

    So I lost track of time (a little) but SSIS is really cool

    Funny story.  About eight hours ago, I decided that I wanted to learn (and eventually write) about full text indexing.  So I start hacking away and I think "hmmm, I need some sample data."  Well, I had seen enough demos of the person building a simple little SSIS package to consume an RSS feed that I said to myself, "Louis" (I call myself that :) "You need to give it a try and stop being afraid of SSIS. 
     
    Well, it was never as easy as displayed, and there aren't a lot of resources that I could find ( I will cite the people who's blogs got me going when I write up the full article, but a big thanks to Ashvini Sharma for his XSLT to strip out namespaces, since SSIS only allows one, and I only found like one feed that had a single namespace).  The whole process was a real pain, especially when using MSN Spaces data as it materialized several tables of data from the feed that had to be joined together (SQL Junkies RSS data went in smooth.)  I realized now that I left off the author of the article, so I might add that back before I post the entire project to my downloads.  
     
    It isn't really important to my original goal of getting moderately large quantities of text for searching.  The main thing is that I have the text of the blog and the hyperlink to it.  Then I can search for the match using full text indexing and hyperlink to the site to see how well it worked.
     
    Hopefully I will avoid this computer tommorrow for all but fun tasks, but I doubt it.  I have included pictures in the blog so you can see it all lit up in green.  That means its working and needs to have more feeds added to it :)  In all seriousness, this is a really cool little tool that could be really useful for building a website of blog aggregations like we have over on the PASS SIG Site in the blogs section that I manage.  Hmm...
    June 14

    Query to find objects with certain text in 2005

    This is just another of those queries that I get sick of having to remember.  Today I was looking for an object that used a certain column and I had to think, "what do I do again?"   So hence this blog entry.  It was semi-common knowledge that in SQL Server 2000, since objects where stored in 4000 character chunks, searching for object text could be problematic.  They could have stored it in text columns, but they were frankly horrible to work with.  Now with varchar(max) you can get the entire object text in one call to the object_definition() system function.
     
    With this ability, you can just query sys.objects and do a charindex for some value.  I originally used LIKE, but since LIKE '%[Sales].[SalesOrderHeader]%' doesn't return things that contain: '[Sales].[SalesOrderHeader]', it looks for things like s.s, or s.a, etc, since the brackets mean to look for characters within the brackets. I started to add ESCAPE '~' but typing: '~[Sales~].~[SalesOrderHeader~]' just seemed ludicrous. EDIT: Then, a friend of mine and occasional coauthor (Chuck Hawkins) suggested I use a replace to get rid of the goofy brackets.  So now just leave out the brackets and this query still works, regardless of bracket usage.
     
    DECLARE @value nvarchar(128)
    SET @value = 'Sales.SalesOrderHeader'
    SELECT cast(schema_name(schema_id) + '.' + name AS varchar(60)) AS name,
                 cast(type_desc AS varchar(20)) AS type , create_date, 
                 modify_date,
                 char(13) + char(10) 
                               + '--select object_definition(' + cast(object_id as varchar(10)) + ') as [' + name + ']'
    FROM   sys.objects
    WHERE  charindex(@value,replace(replace(object_definition(object_id),'[',''),']','')) > 0 
     
    This returns (in the AdventureWorks db: 
    name                                    type                 (dates removed)
    --------------------------------------- --------------------
    Sales.iduSalesOrderDetail               SQL_TRIGGER         
    --select object_definition(1195151303) as [iduSalesOrderDetail]
    Sales.uSalesOrderHeader                 SQL_TRIGGER         
    --select object_definition(1211151360) as [uSalesOrderHeader]
    Sales.vSalesPersonSalesByFiscalYears    VIEW                
    --select object_definition(1499152386) as [vSalesPersonSalesByFiscalYears]
     
    Now, just paste the output into the query pane, highlight the little select bits to get the text of the query.
     
    select object_definition(1499152386) as [vSalesPersonSalesByFiscalYears]
     
    vSalesPersonSalesByFiscalYears
    -----------------------------------------------------------
    CREATE VIEW [Sales].[vSalesPersonSalesByFiscalYears]
    AS
    SELECT
        pvt.[SalesPersonID]
        ,pvt.[FullName]
        ,pvt.[Title]
    ...
        FROM [Sales].[SalesPerson] sp
            INNER JOIN [Sales].[SalesOrderHeader] soh
            ON sp.[SalesPersonID] = soh.[SalesPersonID]
    ...
     
     
    June 12

    ALTER DATABASE - Access Options

    I restored a database today and when it was finished restoring it said (Restricted User) on the tree in SSMS.  "Restricted User?"  I said to myself.  "Is this an R-rated database?  What have I done?"  I wrote the code in the database, so I was almost sure that there weren't even any swear words, even in the comments (comments?)
     
    So I scripted the database out, and I saw a bunch of ALTER DATABASE commands that I hadn't seen before, or at least not that I stopped to think about.  So as I find myself with a blank mind for more writing, and without the energy to finish all of the other article threads I have started (most of which I am going to put on a list so I can finish them up.)  
     
    SINGLE_USER - Obviously only a single user can be in the database at a time.  I used this one back in this article along with the termination clause (I'll mention in a minute) to drop a database immediately without wait.

    RESTRICTED_USER - This one got me initially, though it is the same as the old dbo use only setting.  When this values is set, only users in the db_owner role (or sysadmin role) can use the database.

    MULTI_USER - Normal access, and any user with any access to the database can use the database.

    For some settings (including these user access options,) you can specify a termination clause which will terminate connections to the database (other than the one executing the command) before changing the setting.

    Also, if you want to set the database to read only (or back to read write,) you can use READ_ONLY and READ_WRITE.