Data source: http://www.planecrashinfo.com/database.htm.
This is an article describing how I have been extracting data from that data source: Power Query - Extracting web data using URL parts - Building functions to fetch web data. (mavaanalytics.com)
I am in the stage where I need to create an Incremental load logic for the new data.
Step 1
Create table "StgAirAccidents" for the data source. This task will be done by SSIS package.
Table will be dropped and re-created each time this package is running. I did not use SSIS variables or parameters for this solution.
Script for the Staging table.
The code is placed inside the Execute SQL Task (see bellow).
Result set: None
Primary key constraint creates a Clustered Index on the table.
BEGIN TRAN
use AirAccidents;
go
IF OBJECT_ID(N'[dbo].[StgAirAccidents]', N'U') IS NOT NULL
DROP TABLE [dbo].[StgAirAccidents];
Create table StgAirAccidents
(
AirAccidentsID int primary key identity(1,1),
Dates date,
Locations nvarchar(255),
Location1 nvarchar(255),
Location2 nvarchar(255),
Location3 nvarchar(255),
Operator nvarchar(255),
Flight nvarchar(255),
[Route] nvarchar(255),
ACType nvarchar(255),
Registration nvarchar(255),
TotalAboard int,
FatalitiesAboard int,
GroundFatalities int,
Summary nvarchar(max)
);
GO
COMMIT TRAN;
go
Step 2
Data Flow (1)
This data flow consists of an Excel Source table, Derived Column transformation and OLE DB Destination.
Excel source: sample data
In the "Locations" column I have data separated by coma. In some rows there is just one entry, other rows having 2-3 entries in that column. My goal is to separate data into 3 independent columns using Derived Column task.
Derived column transformation
Using TOKEN function to split data from Locations column into 3 new columns. RTRIM, LTRIM functions will remove leading and trailing spaces from a string data. You can not use them for data type TEXT or varchar(MAX) in MS SQL Server.
OLE DB Destination
This task will load all data into the staging table. Staging tables are used for data transformation - cleaning and modifying data.
Step 2
Execute SQL Task
This task will retrieve MAX dates from the staging and control table. I will need these values later for finishing the data flow logic.
ResultSet: None.
SQLSourceType: Direct input.
Declare @MaxDateStgAirAccidents date,
@MaxDateControlTable Date
Select @MaxDateStgAirAccidents=MAX(Dates) from StgAirAccidents
Select @MaxDateControlTable=MAX(MaxDate) from ControlTable
Creating a "ControlTable" and inserting default values.
Create table ControlTable
(
LoadID int identity(1,1),
Entity varchar(50),
RowsInserted int,
MaxDate date,
INFO varchar(150),
LastPackageRun datetime
)
INSERT INTO ControlTable(Entity, RowsInserted, MaxDate,INFO, LastPackageRun) VALUES('testing',5009,'2021-09-12','NO INFO',GETDATE())
Step 3
Implementing a logic to clean from the staging table.
I have been cleansing these data using nested CTE's and T-SQL script. I used them over 250 times. First step was to create a new column from Location1, Location2 and Location3 = MapCountry.
In the second step I have been cleaning data from the "MapCountry" column using simple login behind the Nested Case Statement. The whole code is stored inside a View and have a similar functionality as DQS in DW.
Step 4
Creating DimCountry dimension.
After I have extracted the country names from the "Locations" column I placed them into a excel table and thanks to a look up functions and another country data sets I created a "DimCountry" dimension which was joined with the result set of the nested CTE statement.
This script is a part of another SSIS package designed just for loading these two tables from MS Excel source.
BEGIN TRANSACTION
use AirAccidents;
go
BEGIN TRANSACTION
IF OBJECT_ID(N'[dbo].[DimCountry]', N'U') IS NOT NULL
DROP TABLE [dbo].[DimCountry];
Create table DimCountry
(
CountryID INT not null,
Country NVARCHAR(255),
Continent NVARCHAR(255),
ContinentID int not null,
PRIMARY KEY (CountryID)
)
COMMIT TRANSACTION
BEGIN TRANSACTION
IF OBJECT_ID(N'[dbo].[DimContinents]', N'U') IS NOT NULL
DROP TABLE [dbo].[DimContinents];
Create table DimContinents
(
ContinentID int,
Continent nvarchar(255),
PRIMARY KEY (ContinentID)
)
ALTER TABLE DimCountry
ADD FOREIGN KEY (ContinentID) REFERENCES DimContinents(ContinentID);
COMMIT TRANSACTION
COMMIT TRANSACTION
GO
This package loads excel source tables into SQL server
Step 5
After I have finished DimCountry and DimContinent tables I used the package above for loading them into Data Warehouse. This SQL-statement is joining the final result set form CTE (which is cleansing data from the "Staging table") with these two tables. I am retrieving just the foreign keys from each table. Whole script was saved as a View: vAirAccidents and as I said before, the purpose of this view is to clean data from the staging table. Full scrip is listed bellow.
SELECT
s2.Dates,s2.AirAccidentsID, s2.Locations,C1.CountryID,C2.ContinentID,
S2.Country,s2.Operator,s2.Flight,s2.[Route],
s2.ACType,s2.Registration,s2.TotalAboard,
s2.FatalitiesAboard,s2.GroundFatalities,s2.Summary
FROM Staging2 s2
INNER JOIN DimCountry C1
ON S2.Country = C1.Country
INNER JOIN DimContinents C2
ON c1.ContinentID = C2.ContinentID
Create view [dbo].[vAirAccidents]
AS
With Staging1
as
(
SELECT Locations,
Location1,
Location2,
Location3,
Case
When Location2 ='' Then Location1
Else Location2
end MapCountry,
AirAccidentsID, Dates, Operator, Flight, [Route], ACType, Registration, TotalAboard, FatalitiesAboard, GroundFatalities, Summary
FROM StgAirAccidents
), Staging2 as
(
SELECT AirAccidentsID,Dates,Locations,Location1,Location3,MapCountry,
CASE
WHEN MapCountry =': Massachusetts' Then 'Boston'
WHEN MapCountry IN ('Afghanistan','Afghanstan','Centeral Afghanistan Afghanistan','Northern Afghanistan') THEN 'Afghanistan'
WHEN MapCountry ='near Ajaccio' Then 'Ajaccio'
WHEN MapCountry IN ('116 miles WSW of Annette Island','AK','Alaksa','Alakska','Alaska','Off the Alaska coast') THEN 'Alaska'
WHEN MapCountry ='Algiers' Then 'Algeria'
WHEN MapCountry ='Off Algiers' Then 'Algeria'
WHEN MapCountry ='Over the Andaman Sea' Then 'Andaman Sea'
WHEN MapCountry ='off Angola' Then 'Angola'
WHEN MapCountry IN ('Aregntina','En route from Argentina to California') THEN 'Argentina'
WHEN MapCountry ='Over the Atlantic Ocean' Then 'Atlantic Ocean'
WHEN MapCountry IN ('Australila','NSW','off Australia','OLD','QLD','Qld. Australia','UAR','WA') THEN 'Australia'
WHEN MapCountry ='Azerbaijan Bakou' Then 'Azerbaijan'
WHEN MapCountry ='Off Bahrain' Then 'Bahrain'
WHEN MapCountry IN ('Atlantic Ocean between N.Y. and Bermuda','NE of Bermuda','off Bermuda') THEN 'Bermuda'
WHEN MapCountry ='Off Bimini' Then 'Bimini'
WHEN MapCountry IN ('Beni','Boliva') THEN 'Bolivia'
WHEN MapCountry = 'Malinau district' Then 'Borneo'
WHEN MapCountry ='Bosnia' Then 'Bosnia-Herzegovina'
WHEN MapCountry IN ('570 miles northeast of Natal','Para','SC') THEN 'Brazil'
WHEN MapCountry IN ('Bugaria','bulgaria','Bulgeria') THEN 'Bulgaria'
WHEN MapCountry IN ('CA','Cailifornia','California','Californiia','Calilfornia') THEN 'California'
WHEN MapCountry IN ('Argyll','Ayrshire','East Sussex','England','Forest-in-Teesdale. United Kingdom','Glasgow Scotland','Hants','Kent','London','UK','Cheshire',
'Lancashire','Lochgoilhead','Shetland') THEN 'United Kingdom'
WHEN MapCountry ='Cameroons' Then 'Cameroon'
WHEN MapCountry IN ('Alberta','Ontario','Placentia','BC','Northwest Territories','Northwest Territories Canada','Northwest Territory','NWT','ON','PE','PQ','Prov. Quebec',
'QC','Quebec','Quebec Canada','SK','WY','Labrador','Manitoba','Newfoundland and Labrador','Nova Scotia','Nunavut','Saskatchewan','Yukon','Placentia') THEN 'Canada'
WHEN MapCountry ='Cape Verde' Then 'Cape Verde Islands'
WHEN MapCountry IN ('near Chicago','Near Chicago Illinois') THEN 'Chicago'
WHEN MapCountry IN ('Andes','Off Chili') THEN 'Chile'
WHEN MapCountry ='French Indo-China' Then 'China'
WHEN MapCountry ='N of Santander' Then 'Colombia'
WHEN MapCountry ='Coloado' Then 'Colorado'
WHEN MapCountry IN ('Comoro Islands','Comoros') THEN 'Comoros Islands'
WHEN MapCountry ='near Covington' Then 'Covington'
WHEN MapCountry ='Deleware' Then 'Delaware'
WHEN MapCountry IN ('Democratic Republic Cogo','Democratic Republic Congo','Democratic Republic of Congo','Democtratic Republic Congo','DR Congo','Katanga') THEN 'Democratic Republic of the Congo'
WHEN MapCountry ='Near Dioubel' Then 'Dioubel'
WHEN MapCountry IN ('Djbouti','Djibouti Djibouti City') THEN 'Djibouti'
WHEN MapCountry ='Domincan Republic' Then 'Dominican Republic'
WHEN MapCountry ='Azuay' Then 'Ecuador'
WHEN MapCountry ='175 miles off the Egyptian coast' Then 'Egypt'
WHEN MapCountry ='Over the English Channel' Then 'English Channel'
WHEN MapCountry IN ('Atlantic Ocean off Florida','FL - Nassau','Off the Florida coast') THEN 'Florida'
WHEN MapCountry IN ('Ariège','Off the coast of France','Orly Airport','Loire') THEN 'France'
WHEN MapCountry ='near Ft. Collins' Then 'Colorado'
WHEN MapCountry ='East Germany' Then 'Germany'
WHEN MapCountry ='Off Gibraltar' Then 'Gibraltar'
WHEN MapCountry ='near Grants' Then 'New Mexico'
WHEN MapCountry ='Mt. Helmos. Greece' Then 'Greece'
WHEN MapCountry ='Guadaloupe' Then 'Guadeloupe'
WHEN MapCountry ='Off Guam' Then 'Guam'
WHEN MapCountry ='Over the Gulf of Finland' Then 'Gulf of Finland'
WHEN MapCountry ='000 miles west of Honolulu' Then 'Hawaii'
WHEN MapCountry ='Amsterdam' Then 'Netherlands'
WHEN MapCountry IN ('Near Hong Kong','Near Hong Kong International Airport','Off Hong Kong') THEN 'Hong Kong'
WHEN MapCountry ='HI' Then 'Honolulu'
WHEN MapCountry ='Near Houma Louisiana' Then 'Louisiana'
WHEN MapCountry ='Pest Hungary' Then 'Hungary'
WHEN MapCountry ='Ilinois' Then 'Illinois'
WHEN MapCountry IN ('India / Kandahar','Indian','Northern India') THEN 'India'
WHEN MapCountry ='10 mile SE of Tell City' Then 'Indiana'
WHEN MapCountry ='Northern Iraq' Then 'Iraq'
WHEN MapCountry IN ('100 miles W of Galway Bay','110 miles West of Ireland','Northern Ireland','Off Irish coast') THEN 'Ireland'
WHEN MapCountry IN ('Near Irkutsk Russia','Russian','Soviet Union','USSR') THEN 'Russia'
WHEN MapCountry ='Northern Israel' Then 'Israel'
WHEN MapCountry ='Calabria' Then 'Italy'
WHEN MapCountry ='Near Jacquinot Bay New Guinea' Then 'New Guinea'
WHEN MapCountry ='Jamacia' Then 'Jamaica'
WHEN MapCountry ='Jawa' Then 'Java Sea'
WHEN MapCountry ='near Kampung Ladang' Then 'Malaysia'
WHEN MapCountry ='Near Karkov' Then 'Karkov'
WHEN MapCountry ='Kazakastan' Then 'Kazakhstan'
WHEN MapCountry ='near Kedarnath' Then 'India'
WHEN MapCountry ='off Kefallinia' Then 'Greece'
WHEN MapCountry ='Northeast Laos' Then 'Laos'
WHEN MapCountry ='Eastern Libya' Then 'Libya'
WHEN MapCountry ='Off Malaya' Then 'British Malaya'
WHEN MapCountry ='Malaya' Then 'Malaysia'
WHEN MapCountry IN ('Malta International Airport','Off Malta-Luqa') THEN 'Malta'
WHEN MapCountry ='Manatoba' Then 'Manitoba'
WHEN MapCountry ='Massachutes' Then 'Massachusetts'
WHEN MapCountry ='Mauritania' Then 'Mauretania'
WHEN MapCountry IN ('Over the Mediterranean','Over the Mediterranean Sea') THEN 'Mediterranean Sea'
WHEN MapCountry ='Mexic' Then 'Mexico'
WHEN MapCountry ='18 NNW of Benton Harbor' Then 'Michigan'
WHEN MapCountry ='near Milton' Then 'Florida'
WHEN MapCountry ='Minnisota' Then 'Minnesota'
WHEN MapCountry ='Mississipi' Then 'Mississippi'
WHEN MapCountry ='Moldavia' Then 'Moldova'
WHEN MapCountry ='Inner Mongolia' Then 'Mongolia'
WHEN MapCountry IN ('Morroco','Off Morocco') THEN 'Morocco'
WHEN MapCountry ='Burma (Myanmar)' Then 'Myanmar'
WHEN MapCountry ='Near Nag' Then 'Philippines'
WHEN MapCountry IN ('Nambia','South-West Africa (Namibia)') THEN 'Namibia'
WHEN MapCountry ='near Natanz' Then 'Iran'
WHEN MapCountry ='Napal' Then 'Nepal'
WHEN MapCountry ='10 miles SW of Las Vegas' Then 'Nevada'
WHEN MapCountry IN ('New York (Idlewild)','NY','Queens') THEN 'New York'
WHEN MapCountry ='Fox Glacier Airstrip' Then 'New Zealand'
WHEN MapCountry ='800 miles east of Newfoundland' Then 'Canada'
WHEN MapCountry ='Over the North Atlantic' Then 'North Atlantic'
WHEN MapCountry ='Over North Carolina' Then 'North Carolina'
WHEN MapCountry ='Over the North Pacific Ocean' Then 'North Pacific Ocean'
WHEN MapCountry ='Over the North Sea' Then 'North Sea'
WHEN MapCountry ='Off Northern Germany' Then 'Germany'
WHEN MapCountry ='Off the Oregon coast' Then 'Oregon'
WHEN MapCountry ='near Pacific Grove' Then 'California'
WHEN MapCountry IN ('Over the Pacific Ocean','Pacific Ocean between Hong Kong and Macao','Pacific Ocean between Manila and Guam') THEN 'Pacific Ocean'
WHEN MapCountry ='East Pakistan' Then 'Pakistan'
WHEN MapCountry ='Off the Panama coast' Then 'Panama'
WHEN MapCountry ='Papua' Then 'Papua New Guinea'
WHEN MapCountry ='Near Petreasa Romania' Then 'Romania'
WHEN MapCountry ='Off western Denmark' Then 'Denmark'
WHEN MapCountry ='Phillipines' Then 'Philippines'
WHEN MapCountry ='near Plymouth' Then 'Plymouth'
WHEN MapCountry ='near Point Barrow' Then 'Alaska'
WHEN MapCountry IN ('Algarve','Azores','Azores (Portugal)','Marine Base') THEN 'Portugal'
WHEN MapCountry ='Off Puerto Rico' Then 'Puerto Rico'
WHEN MapCountry ='near Roussillon' Then 'France'
WHEN MapCountry IN ('Russian','Soviet Union','USSR','Chechnya') THEN 'Russia'
WHEN MapCountry ='near Santa Barbara' Then 'Santa Barbara'
WHEN MapCountry IN ('Off São Tomé Island','Sao Tomé') THEN 'Sao Tomé & Principe'
WHEN MapCountry ='East Sardinia' Then 'Sardinia'
WHEN MapCountry ='200 miles off Dakar' Then 'Senegal'
WHEN MapCountry ='Sierre Leone' Then 'Sierra Leone'
WHEN MapCountry ='near Silver Plume' Then 'Colorado'
WHEN MapCountry ='Biafra' Then 'Uganda'
WHEN MapCountry IN ('Madrid','Off Spain','Spain Moron AFB','Biscay') THEN 'Spain'
WHEN MapCountry ='near Strasbourg' Then 'France'
WHEN MapCountry ='North Sulawesi' Then 'Sulawesi'
WHEN MapCountry ='Off Tabones Island Philippines' Then 'Philippines'
WHEN MapCountry ='Near Tachikawa Air Base' Then 'Japan'
WHEN MapCountry ='Off Taiwan' Then 'Taiwan'
WHEN MapCountry ='near Tehran' Then 'Iran'
WHEN MapCountry ='Tennesee' Then 'Tennessee'
WHEN MapCountry ='Besar' Then 'Thailand'
WHEN MapCountry ='off the Philippine island of Elalat' Then 'Philippines'
WHEN MapCountry ='Off Turks and Caicos Islands' Then 'Turks and Caicos Islands'
WHEN MapCountry ='near Ueno Village' Then 'Japan'
WHEN MapCountry ='UAE' Then 'United Arab Emirates'
WHEN MapCountry ='off Ustica' Then 'Italy'
WHEN MapCountry ='Uzbekstan' Then 'Uzbekistan'
WHEN MapCountry ='Barquisimeto Venezuela' Then 'Venezuela'
WHEN MapCountry ='Near Villia Greece' Then 'Greece'
WHEN MapCountry ='Virginia.' Then 'Virginia'
WHEN MapCountry ='near Vitacura' Then 'Chile'
WHEN MapCountry IN ('325 miles east of Wake Island','Off Wake Island') THEN 'Wake Island'
WHEN MapCountry IN ('D.C.','Washingon','Washington','Washington DC') THEN 'Washington'
WHEN MapCountry IN('Off West Africa','Off Western Africa','French West Africa') Then 'Ghana'
WHEN MapCountry ='Off western Denmark' Then 'Denmark'
WHEN MapCountry ='North Yorkshire' Then 'United Kingdom'
WHEN MapCountry ='Yugosalvia' Then 'Yugoslavia'
WHEN MapCountry IN('Belgian Congo','Belgian Congo (Zaire)','Belgium Congo') Then 'Zaire'
WHEN MapCountry ='Zimbabwe (Harare' Then 'Zimbabwe'
WHEN MapCountry ='Zabul' Then 'Afghanistan'
WHEN MapCountry ='Aargau' Then 'Switzerland'
WHEN MapCountry ='Aichi Prefecture' Then 'Japan'
WHEN MapCountry IN ('Ajaccio','French Alps','Val-de-Marne') Then 'France'
WHEN MapCountry ='Bavaria' Then 'Germany'
WHEN MapCountry IN ('Bihimi','Inagua') Then 'Bahamas'
WHEN MapCountry ='Buea' Then 'Cameroon'
WHEN MapCountry ='Cerro de Pasco' Then 'Peru'
WHEN MapCountry IN ('Desertores Island Región de Los Lagos','San Vicente de Tagua Tagua') Then 'Chile'
WHEN MapCountry IN ('French Cameroons','French Equatorial Africa') Then 'Cameroon'
WHEN MapCountry IN ('Guantanamo Bay','Santiago de Cuba') Then 'Cuba'
WHEN MapCountry IN ('Hati','Bandar Abbas') Then 'Iran'
WHEN MapCountry ='Huánuco' Then 'Peru'
WHEN MapCountry ='Huila Department' Then 'Colombia'
WHEN MapCountry ='Jeddah' Then 'Saudi Arabia'
WHEN MapCountry ='Kharkiv' Then 'Ukraine'
WHEN MapCountry IN ('Lombardy','Piacenza','Trento') Then 'Italy'
WHEN MapCountry IN ('Maharashtra','Manmar','West Bengal') Then 'India'
WHEN MapCountry ='Miaoli' Then 'Taiwan'
WHEN MapCountry ='Molokai' Then 'Hawaii'
WHEN MapCountry IN ('Mt. Fuji','Okinawa Prefecture','Tokyo','Okinawa') Then 'Japan'
WHEN MapCountry ='Norrbotten' Then 'Sweden'
WHEN MapCountry IN ('Nuevo León','Veracruz') Then 'Mexico'
WHEN MapCountry ='Peshawar' Then 'Pakistan'
WHEN MapCountry ='Providencia Island' Then 'Rhode Island'
WHEN MapCountry IN ('Rhodesia','Rhodesia (Zimbabwe)') Then 'Zimbabwe'
WHEN MapCountry IN ('Rio de Janeiro','São Gabriel da Cachoeira','São Paulo') Then 'Brazil'
WHEN MapCountry ='Ross Ice Shelf' Then 'Antarctica'
WHEN MapCountry ='South Kazakhstan Region' Then 'Kazakhstan'
WHEN MapCountry ='South Yemen' Then 'Yemen'
WHEN MapCountry ='Southeastern Bolivia' Then 'Bolivia'
WHEN MapCountry IN ('Stirling','Staines-upon-Thames','Surrey','Sussex','Middlesex') Then 'United Kingdom'
WHEN MapCountry ='Taiwan (Formosa)' Then 'Taiwan'
WHEN MapCountry ='Tanganyika' Then 'Kenya'
WHEN MapCountry ='Terceira' Then 'Portugal'
WHEN MapCountry ='Territory of New Guinea' Then 'New Guinea'
WHEN MapCountry ='Valle del Cauca Department' Then 'Colombia'
WHEN MapCountry ='Västergötland' Then 'Sweden'
WHEN MapCountry ='West Indies' Then 'Galante Island'
WHEN MapCountry ='West Vlaanderen' Then 'Belgium'
WHEN MapCountry ='Zabul' Then 'Afghanistan'
WHEN MapCountry ='Zulia' Then 'Venezuela'
WHEN MapCountry ='Bias Bay' Then 'China'
WHEN MapCountry ='Central Mozambique' Then 'Mozambique'
WHEN MapCountry ='Lorraine' Then 'Canada'
WHEN MapCountry ='Oltenia' Then 'Romania'
WHEN MapCountry ='Picrdie' Then 'France'
WHEN MapCountry ='Siberia' Then 'Russia'
WHEN MapCountry ='Biscay' Then 'Spain'
WHEN MapCountry ='Zaire' Then 'Angola'
WHEN MapCountry ='Zabul Province' Then 'Afghanistan'
WHEN MapCountry ='West Germany' Then 'Germany'
WHEN MapCountry IS NULL Then 'Unknown'
ELSE MapCountry
END AS Country,
Operator, Flight, [Route], ACType, Registration, TotalAboard, FatalitiesAboard, GroundFatalities, Summary
FROM Staging1
)
SELECT
Dates,AirAccidentsID, Locations,C1.CountryID,C2.ContinentID,
S2.Country,Operator,Flight,[Route],
ACType,Registration,TotalAboard,
FatalitiesAboard,GroundFatalities,Summary
FROM Staging2 s2
INNER JOIN DimCountry C1
ON S2.Country = C1.Country
INNER JOIN DimContinents C2
ON c1.ContinentID = C2.ContinentID
GO
Step 5
Loading data into Data Warehouse
After I have cleaned all data from the staging table I am ready to load them into DW. As you can see from the picture bellow it is a very simple package with a clear logic behind it.
Drop & Re-Create StgAirAccidents creates a staging table
Get Data Into StgAirAccidents loads data from the source into staging table.
Get Dates extracting dates. These dates will be used in OLE DB Source vAirAccidents (2. Data flow)
Load Data Into DW loads cleaned data from view (view has data from the staging table) into the final table - DimAirAccidents.
This is Load Data Into DW, Data Flow task
To load the data into DW I am using a store procedure.
Script:
CREATE PROC [dbo].[spInsertDimAirAccidents]
As
BEGIN
Declare
@MaxDateStgAirAccidents date,
@MaxDateControlTable date,
@EmptyLocation nvarchar(255),
@StageRows int,
@DimRows int,
@NewRows int
SELECT @MaxDateControlTable =MAX(MaxDate) FROM ControlTable
SELECT @MaxDateStgAirAccidents =MAX(Dates) FROM StgAirAccidents
--load data into DimAirAccidents
SELECT * FROM vAirAccidents
WHERE Dates > @MaxDateControlTable
AND Dates <= @MaxDateStgAirAccidents
--new loaded rows calculation
SELECT @StageRows =COUNT(AirAccidentsID) FROM StgAirAccidents
SELECT @DimRows =COUNT(AirAccidentsID) FROM DimAirAccidents
SET @NewRows =@StageRows-@DimRows
--handling empty Location columns
SELECT @EmptyLocation= Locations FROM StgAirAccidents
WHERE AirAccidentsID = (SELECT MAX(AirAccidentsID) FROM StgAirAccidents)
IF(@EmptyLocation is null)
BEGIN
--Insert into ControlTable
INSERT INTO ControlTable(Entity,RowsInserted, MaxDate,INFO,LastPackageRun)
VALUES('DimAirAccidents',@NewRows,@MaxDateStgAirAccidents,'UNKNOWN LOCATION',GETDATE())
END
ELSE
BEGIN
--Insert into ControlTable
INSERT INTO ControlTable(Entity,RowsInserted, MaxDate,INFO,LastPackageRun)
VALUES('DimAirAccidents',@NewRows,@MaxDateStgAirAccidents,'NO PROBLEMS',GETDATE())
END
--delete all records from ControlTable, keep just the most recent one
DELETE FROM ControlTable
WHERE LoadID < (SELECT MAX(LoadID) FROM ControlTable)
IF(SELECT RowsInserted FROM ControlTable)<0
BEGIN
Update ControlTable SET RowsInserted = 0
END
END
Testing the Incremental Load logic of the package.
Right now I have 5009 rows in DimAirAccidents. This is s table in which all the cleaned data from the web site were loaded. On the right site is a ControlTable with just the default data.
MaxDate in the ControlTable is the max date from the staging table.
I am going to run the package. There are no new data on the source-side what means, no data will be loaded into DimAirAccident table and the information in the control table will be updated.
Execution Results
It takes 00:00:03.3 seconds to load 5009 rows of data.
Testing for a new data in the source table.
I have manually added new record into the source table. (London, England). Lets run the package.
I have a new record in DimAirAccident table saying that this accident happened in the United Kingdom. ContinentID = 8 stands for "Europe". Information about the Continent is retrieved from other dimension.
Testing for missing data
Most of the cleaning is done in Power Query. If there are some missing values or crazy characters as a question marks, Power Query can handle them and replace them for a predefined patterns. In this scenario I will be testing the incremental load for a missing value in Location column on the source table.
Result: 1 new row loaded.
ControlTable returns an information "UNKNOWN LOCATION"
The logic in the original package was: do not load Unknown locations into DW. But as I have already couple of them in the table, I change the logic of this package while writing this blog.
I have added one more task at the end.
UPDATE DimAirAccidents
SET Locations='Unknown'
WHERE AirAccidentsID =(SELECT
AirAccidentsID FROM DimAirAccidents
WHERE Locations IS NULL)
Execution Results: 5010 rows, 00:00:03.203 seconds.
Komentarji