Case
At Allianz Global Assistance we sell, among other products, travel insurances. Every day of the year. Once a month our sold policies are extracted from a middleware system in several CSV files and need to be imported in our Datawarehouse. Before doing that, we want to check if there are no days missing in the files. So I needed to create something to check for gaps in the date sequence. The date in the file I need to check is in an ISO format: 20140321
Solution
There are different approaches on this matter. In all cases we need to have a reference "table" where we know the sequence is 100% correct. Then we can compare our real data against this reference list.
I found an interesting post at some website called eidias. There you can find 5 possible ways to address this problem.
I used a query similar to the one of option 2 in the above website. First of all I created a table with one column containing all dates from 20120101 to 20201231. Doing that in Excel cost me 10 minutes. If you are in a Datawarehouse environment, you could probably also make use of your Time dimension.
This is my stored procedure, quite simple. I guess you will need to focus at the bottom query as all of the rest is related to the specific needs of my problem.
This is my stored procedure, quite simple. I guess you will need to focus at the bottom query as all of the rest is related to the specific needs of my problem.
ALTER PROCEDURE [dbo].[sp_MCR_CheckIntegrationDatesMissing] @CurrPeriod AS NVARCHAR(6) AS BEGIN -- SET NOCOUNT ON added to prevent extra result sets from -- interfering with SELECT statements. SET FMTONLY OFF SET NOCOUNT ON; -- DECLARE @CurrPeriod AS NVARCHAR DECLARE @CurrYear AS INTEGER DECLARE @CurrMonth AS INTEGER DECLARE @NumberOfDaysInMonth AS INTEGER -- SET @CurrPeriod = 201212 SET @CurrYear = CAST(LEFT(@CurrPeriod,4) AS INT) SET @CurrMonth = CAST(RIGHT(@CurrPeriod,2) AS INT) --Find the minimum and maximum value of the sequence column of your table DECLARE @MinDate AS INTEGER DECLARE @MaxDate AS INTEGER SET @MinDate = CAST(CAST(@CurrYear AS VARCHAR(4))+CAST(@CurrMonth AS VARCHAR(2))+'00' AS INT) SET @NumberOfDaysInMonth = CASE @CurrMonth WHEN 1 THEN 31 WHEN 2 THEN 28 WHEN 3 THEN 31 WHEN 4 THEN 30 WHEN 5 THEN 31 WHEN 6 THEN 30 WHEN 7 THEN 31 WHEN 8 THEN 31 WHEN 9 THEN 30 WHEN 10 THEN 31 WHEN 11 THEN 30 WHEN 12 THEN 31 END SET @MaxDate = @MinDate + @NumberOfDaysInMonth --Query to list the missing sequence numbers SELECT * FROM ( SELECT IntegrationDate AS IntegrationDateMissing FROM POL_REF_Integration_Dates WHERE IntegrationDate BETWEEN @MinDate AND @MaxDate ) temp where IntegrationDateMissing not in ( SELECT [Integration date] FROM PolicyFile )
0 reacties:
Een reactie posten