zaterdag 22 maart 2014

Case

At Allianz Global Assistance we sell, among other products, travel insurances. Every day of the year. Once a month our sold policies are extracted from a middleware system in several CSV files and need to be imported in our Datawarehouse. Before doing that, we want to check if there are no days missing in the files. So I needed to create something to check for gaps in the date sequence. The date in the file I need to check is in an ISO format: 20140321

Solution

There are different approaches on this matter. In all cases we need to have a reference "table" where we know the sequence is 100% correct. Then we can compare our real data against this reference list.

I found an interesting post at some website called eidias. There you can find 5 possible ways to address this problem.

I used a query similar to the one of option 2 in the above website. First of all I created a table with one column containing all dates from 20120101 to 20201231. Doing that in Excel cost me 10 minutes. If you are in a Datawarehouse environment, you could probably also make use of your Time dimension.

This is my stored procedure, quite simple. I guess you will need to focus at the bottom query as all of the rest is related to the specific needs of my problem.

ALTER PROCEDURE [dbo].[sp_MCR_CheckIntegrationDatesMissing]
 @CurrPeriod AS NVARCHAR(6)
AS
BEGIN
 -- SET NOCOUNT ON added to prevent extra result sets from
 -- interfering with SELECT statements.
 SET FMTONLY OFF
 SET NOCOUNT ON;

-- DECLARE @CurrPeriod AS NVARCHAR
 DECLARE @CurrYear AS INTEGER
 DECLARE @CurrMonth AS INTEGER
 DECLARE @NumberOfDaysInMonth AS INTEGER


-- SET @CurrPeriod = 201212
 SET @CurrYear = CAST(LEFT(@CurrPeriod,4) AS INT)
 SET @CurrMonth = CAST(RIGHT(@CurrPeriod,2) AS INT)

 --Find the minimum and maximum value of the sequence column of your table
 DECLARE @MinDate AS INTEGER
 DECLARE @MaxDate AS INTEGER

 SET @MinDate = CAST(CAST(@CurrYear AS VARCHAR(4))+CAST(@CurrMonth AS VARCHAR(2))+'00' AS INT)

 SET @NumberOfDaysInMonth =  CASE @CurrMonth
        WHEN 1 THEN 31
        WHEN 2 THEN 28
        WHEN 3 THEN 31
        WHEN 4 THEN 30
        WHEN 5 THEN 31
        WHEN 6 THEN 30
        WHEN 7 THEN 31
        WHEN 8 THEN 31
        WHEN 9 THEN 30
        WHEN 10 THEN 31
        WHEN 11 THEN 30
        WHEN 12 THEN 31
        END
        
 SET @MaxDate = @MinDate + @NumberOfDaysInMonth

 --Query to list the missing sequence numbers
 SELECT  *
 FROM
 (
  SELECT IntegrationDate AS IntegrationDateMissing FROM POL_REF_Integration_Dates WHERE IntegrationDate BETWEEN @MinDate AND @MaxDate
 ) temp
 where IntegrationDateMissing not in
 (
  SELECT [Integration date] FROM PolicyFile
 )
Categories: ,

0 reacties:

Een reactie posten

Subscribe to RSS Feed Follow me on Twitter!