Web source: Database index (planecrashinfo.com)
Goal is to extract the accident details for each year and date. That means get a access to level 3 data.
Step 1
Bring the Accident Details page in.
Step 2
Pivot
Go to: Transform > Pivot > Advanced options > Don't Aggregate
When data has been pivoted you can decide which columns (15) you want to keep or remove. You can also do data cleaning transformations. I will not demonstrate that, my goal is just focus on extracting the data.
Step 3
Get years.
Unpivot Columns
Keep just column with years, remove bottom rows (8 empty rows)
By now I have the access to level 1 data. Next task is to get access to level 2. Through level 2 I will be able to reach on data in level 3.
Step 4
Get Level 2 data. Remove all the columns and keep just one, dates. I will use these data to build a function to unlock data in level 3.
Step 5
After I have removed all the other columns I transform this dataset into a function.
Step 6
(Year)=>
let
Source = Web.Page(Web.Contents("http://www.planecrashinfo.com/"&Year&"/"&Year&".htm")),
As the URL is a text, Year variable must be a text too.
Step 7
In step 4 I extracted LEVEL 2 data. I will return them back here using tblHelp function.
Step 8
For year 1920 there are 51 date records in that table. I will add an index to each date.
Step 9
We add an Index to the Date column.
=>Add custom column
=Table.AddIndexColumn([Custom],"Index",1,1)
Expand IndexingCustomColumn column.
Remove unnecessary columns.
Now we have this key to get to access to Level 3 data, AccidentDetails.
Step 10
I convert this dataset into a function.
(year, index)=>
let
Source = Web.Page(Web.Contents("http://www.planecrashinfo.com/"&year&"/"&year&"-"&index&".htm"))
This function needs 2 arguments, which will be supplied form AirAcciddents table.
Step 11
Add Custom column, use AccidentDetails function.
Expand the column names from the FinalTable and bring them in.
Now I have all the accident details in one table.
Conclusion: It takes 5 hours to Power BI Desktop to load all the rows. MS Excel can load these data in 1 hour.
コメント