30-10-2019 | POINT OF VIEW
The shape of you: how Topological Data Analysis reveals the unknown unknowns

“We are drowning in information and starving for knowledge.” – Rutherford D. Rogers

Data is now king. It is driving digital transformation, heralding the second machine age, ranking among every other firm’s top three strategic priorities, and fuelling the fourth industrial revolution.

Except data itself has no intrinsic value – you need to join up the dots to extract insights and enable better and more timely decisions. And more often than not, the explosion of data has led to firms feeling inundated, rather than informed, unsure of the important patterns among thousands of different measures and views.

Banks are no exception to this. Confronted with data sets that are both wide and long, banks too face the challenge of sieving signal from noise. Adding to the challenge is banks’ need to identify meaningful signal from the rapidly evolving landscape of “unknown unknowns” – think cyber threats to financial systems or new methods of conducting fraud. Regulatory pressures and consumer expectations on resilience are ramping up. The onus is on banks to spot new threats and identify malicious behaviours ahead of time.

Increasingly, banks are turning to unsupervised learning to mine the vast, frequent and granular data to their advantage. Unsupervised learning is a collection of analytics methods that don’t require assigned labels to identify patterns.

There are many unsupervised techniques available. But the specificities of financial services – wide data sets with varied scales (e.g. a bank’s global traded products order book) – lend themselves well to Topological Data Analysis (TDA).

TDA outperforms traditional unsupervised methods by letting data speak for themselves. Traditionally, the outputs of unsupervised learning are discrete groupings of similar data. But by imposing fewer arbitrary assumptions and by producing a continuous map between groupings, TDA allows firms to view the problem in higher resolution.

For example, we recently applied TDA to Open Banking data covering 50 million transactions from 500,000 customer-level bank accounts to build a picture of different incomes and spending habits. Traditional clustering methods could have done part of the job: they would have identified young professionals’ spending habits and placed them in a different cluster from, say, the baby boomers. But they then require manual intervention and domain knowledge to understand the inter-relationships between clusters. Our TDA solution creates a fine-grained, continuous map of segments: segment A, who spend their money on sweets and fizzy drinks, is closely related to segment B, who spend on Lego and comics, but not to segment C, who spend on savings and pension products. The insights now feed customer churn models, giving the client early identification of when a customer starts to “wind down” an account.
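To make the idea of a "continuous map of segments" concrete, here is a minimal sketch of one widely used TDA technique, the Mapper algorithm. The article does not say which TDA method was used, so Mapper is an assumption, and every name, parameter and data point below is hypothetical and purely illustrative. The sketch covers the data with overlapping slices of a "lens" value, clusters each slice, and links clusters that share data points, so related segments end up connected rather than merely separated.

```python
from itertools import combinations
from math import dist


def cluster_by_threshold(indices, points, eps):
    """Group points into connected components where neighbours lie within eps."""
    clusters = []
    unvisited = set(indices)
    while unvisited:
        seed = unvisited.pop()
        component, frontier = {seed}, [seed]
        while frontier:
            current = frontier.pop()
            near = {j for j in unvisited if dist(points[current], points[j]) <= eps}
            unvisited -= near
            component |= near
            frontier.extend(near)
        clusters.append(frozenset(component))
    return clusters


def mapper_graph(points, lens, n_intervals=3, overlap=0.5, eps=1.5):
    """Toy Mapper: returns (nodes, edges); nodes are sets of point indices."""
    values = [lens(p) for p in points]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals
    pad = width * overlap / 2  # how far each slice extends into its neighbours
    nodes = []
    for k in range(n_intervals):
        start = lo + k * width - pad
        end = lo + (k + 1) * width + pad
        members = [i for i, v in enumerate(values) if start <= v <= end]
        nodes.extend(cluster_by_threshold(members, points, eps))
    # Two nodes are linked when they share at least one data point.
    edges = {
        (a, b)
        for a, b in combinations(range(len(nodes)), 2)
        if nodes[a] & nodes[b]
    }
    return nodes, edges


# Hypothetical toy data: a chain of ten customers plus a distant pair.
points = [(float(x), 0.0) for x in range(10)] + [(20.0, 0.0), (21.0, 0.0)]
nodes, edges = mapper_graph(points, lens=lambda p: p[0])
print(len(nodes), sorted(edges))
```

On this toy data the first two nodes overlap and are joined by an edge, while the distant pair forms an isolated node, mirroring the "A is related to B but not to C" picture described above. Production work would normally rely on an established library and a carefully chosen lens and clustering method rather than this sketch.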

There are advantages when it comes to application and implementation too. TDA results are highly visual and interpretable, circumventing some of the “black box” concerns associated with machine learning and providing actionable, intuitive insight. TDA can be easily integrated at different points in the incumbent analytics stack. There is also a range of solutions available, from enterprise-grade vendors to free, open-source ones in Python, R and Java.

TDA is an analytics booster that can be applied to various banking processes that require data summarisation and insight, be that trade execution, customer insight, or internal workflows. We have, for example, helped global banks apply TDA in capital markets to identify anomalous trade execution patterns – by market, symbol, trader and algorithm – using their global trading book data.

TDA’s power resides in its versatility, but it is not a silver bullet. It faces the common pitfalls of unsupervised learning methods: the lack of standard success metrics, the demand for computing power and, above all, the need to understand the business problem at hand.

To date, TDA remains an under-utilised analytics method in financial services. To fully unleash its power and versatility, banks must start with a clear understanding of the business challenge at hand – and give machines the right “exam questions”.

@p_f_g - Parker Fitzgerald