Excel blog about Cohort analysis: Customer Retention & Cohort Analysis (mavaanalytics.com)
Sample Data
First thing I need to do is to add a column that will display the first purchase by each customer.
Hmm, unfortunately this code is not working as I expected. It returns a MIN(Date) for entire data set. This is completely fucked up, I hate DAX....
Ok, after 2hrs of trying I finally got it. This code returns a date of the first purchase for the individual customer. But we got too many days now. Let's say that this bill is paid by the end of the month and return just EOMonth instead.
This will return 11 unique dates. I will use these dates as rows in the Matrix visual. We only need to use months, eventually years.
Alternatively, you can write a SQL code that will return the same result.
with FirstPurchaseDate as
(
SELECT CustomerID
,SubscriptionPrice
,OrderDate
,MIN(orderdate) over (Partition by customerid order by customerid) MinByCustomer
FROM [Salome].[dbo].[CohortData]
)
,Result as
(
select *
,EOMONTH(MinByCustomer) InitialPurchaseDate_EOM
from FirstPurchaseDate
)
select
CustomerID
,SubscriptionPrice
,OrderDate
,InitialPurchaseDate_EOM
from Result
Next thing we need for CA is to generate months for the column in the Matrix visual, from 0-11.
0 is the month of the first purchase, following by the next 11 months.
This is quite easy. Go to HOME tab, NEW TABLE.
Alternatively, you can create a SQL stored procedure like this (...if you hate DAX)
Create proc GenerateSeries
(@StartValue int, @EndValue Int, @Increment int)
as
with Result as (
select @StartValue as Months
union all
select Months + @Increment from Result
where Months < @EndValue
)
select * from Result
option (maxrecursion 0)
Exec GenerateSeries@StartValue = 0 ,@EndValue =11 ,@Increment = 1
You can add a custom formatted column, add a column to your table, but that column needs to use InitialPurchaseDate values.
Data inside the Matrix visual are populated from DISTINCTCOUNT(CohortData[CustomerID]) measure. We won't need this measure. Values are repeating because there is no relationship between the generated series and the transaction table.
IF YOU HATE DAX USE SQL...
After we have built this sophisticated data model, we can try to write the measure for the Customer Retention Analysis.
I think I'm on the right track. These two variables should retrieve what I need to retrieve.
CohortAnalysis =
var MonthAfterStartMonth = SELECTEDVALUE(StartMonth[Month])
var InitialPurchase = SELECTEDVALUE(CohortData[InitialPurchaseDate])
var Result=
CALCULATE(DISTINCTCOUNT(CohortData[CustomerID]),
FILTER(CohortData,
EOMONTH(CohortData[OrderDate],0) =
EOMONTH(InitialPurchase, MonthAfterStartMonth)))
Return
Result
Month 0 contains a distinct count of customers that made their first purchase in the initial month. Feb_2020 & Month 0 contains 10 distinct customers. 4 out of these 10 made another purchase in the next month. After that, these customers stopped paying for the subscription. These are no longer our customers, we lost them.
To display the proportion of the customers that remain with us we need to divide each Month value (horizontal values) by the corresponding value in Month 0 column. This column is holding the first customer purchase.
CohortAnalysis 2 =
var MonthAfterStartMonth = SELECTEDVALUE(StartMonth[Month])
var InitialPurchase = SELECTEDVALUE(CohortData[InitialPurchaseDate])
var CustomerRetention =
CALCULATE(DISTINCTCOUNT(CohortData[CustomerID]),
FILTER(CohortData,
EOMONTH(CohortData[OrderDate],0) =
EOMONTH(InitialPurchase, MonthAfterStartMonth)))
var InitialCustomers = DISTINCTCOUNT(CohortData[CustomerID])
var Result = DIVIDE(CustomerRetention, InitialCustomers ,0)
Return
Result
What happened in this measure is, that we took the previous measure and divide each month by its initial value.
For example, Feb_2020: (10/10 =1.00) (4/10=0.40)
SUM is not working correctly, and that's it. If you know why, leave a comment:=)
... this would not happen if you used SQL all the time
Kommentarer