Nor does it begin with the fact that GDPR fines alone exceed $1 billion annually [2]. It starts, for example, when a bank discovers advertisements on the dark web selling access to its customers' personal accounts.
This is just one scenario, and it indicates that something went wrong much earlier in the process.
Money mules, drop accounts, the laundering networks used to obscure the flow of funds between banks, terrorism financing, and other illegal activities are all secondary issues.
The primary issue is that the bank's overall system, and its KYC (Know Your Customer) processes in particular, make it cheaper for bad actors to open accounts there than at competitors. This happens for various reasons: insider threats, bank policies, process vulnerabilities, and more. Those freshly opened accounts then turn into AML (Anti-Money Laundering) cases, complete with transit and cash-out activity.
However, only a minority of banks act wisely by employing Defensive Security and advanced AI-driven monitoring systems.
Is this due to objective reasons? Let’s explore this further.
In this post, we'll examine the challenges banks face in this context and how Federated Machine Learning can offer them solutions.
At Guardora, we've encountered several cases where Privacy-Preserving Machine Learning methods have been highly relevant in the BFSI (Banking, Financial Services, and Insurance) sector:
- Preventing financial crimes.
- Real-time fraud detection and monitoring (e.g., enhancing the detection of criminal networks by reducing false positives).
- Investigating financial crimes and tracing the monetary flow of all types of illicit activities.
- Customer risk scoring across various dimensions (credit risk, sanctions, legal, reputational, blacklist presence, and more).
- Verifying transaction details against high-risk lists across different financial institutions without revealing any user or transaction data.
- Real-time detection of suspicious transactions (anomaly detection).
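To make the last item concrete, here is a toy, purely local anomaly-detection baseline built on scikit-learn's IsolationForest. The transaction features and contamination rate are assumptions made for illustration; in the federated setting discussed below, each institution would train such a detector on its own data.

```python
# Toy anomaly detector over transaction features (amount, hour, merchant risk).
# Feature choice and contamination rate are illustrative assumptions.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
# Simulated "normal" transactions: modest amounts, daytime, low-risk merchants.
normal_tx = rng.normal(loc=[50, 14, 0.1], scale=[20, 4, 0.05], size=(1000, 3))
# Two suspicious ones: a large night-time transfer and a high-risk micro-payment.
odd_tx = np.array([[5000, 3, 0.9], [1, 4, 0.95]])

detector = IsolationForest(contamination=0.01, random_state=0).fit(normal_tx)
print(detector.predict(odd_tx))  # -1 flags an anomaly, 1 means inlier
```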
All these cases are further complicated by the rapid speed of transactions, the advancement of new technologies accessible to criminals, the lack of virtual borders between countries, and, most critically, the absence of synchronization and the conflicting interests among all the legitimate participants in these processes.
Is the usefulness of a meeting inversely proportional to the number of participants?
When it comes to combining data for training shared ML models, the number of stakeholders extends far beyond just a few banks.
Within a bank's control framework, we find:
- Individuals (bank customers) - Their attitudes range from complete indifference to deeply conspiracy-minded distrust. On average, they expect banks to act responsibly (protecting data from bad actors) and not irresponsibly (exposing their personal confidential data to risk).
- Legal entities (corporate clients) - They fear their data might end up with competitors and struggle to see how they would benefit from the process.
- Chief Information Security Officers (CISOs) - They believe that if you can’t destroy something, you don’t control it, and thus prefer not to share data at all.
- Heads of KYC/KYB - They must constantly balance the efficiency of their processes with the inflow of new clients.
- Chief AML Officers - They believe that any data sharing must be strictly regulated and adhere to international standards. Their primary concern is data breaches, which could endanger not only customers but the bank itself, exposing it to fines and regulatory sanctions.
- Chief Compliance Officers - They oversee the adherence to all legal and regulatory requirements. They emphasize the importance of complying with data protection and privacy laws across different jurisdictions. Their main concern is how data sharing will impact the bank’s reputation and its ability to follow rules and regulations, avoiding fines and litigation.
In a bank’s blind spot are:
- Financial institutions and services of varying sizes - Some struggle to see why they should be at the table with major players, while others don't understand how they could benefit from the participation of smaller datasets. How can the value of a payment network's transaction data be weighed against banks' account data to achieve a collective benefit?
- Antitrust authorities - They might not distinguish between a collaboration of banks training ML models together and a cartel agreement.
- Law enforcement agencies - They are concerned that AI might prematurely alert criminals under investigation, potentially disrupting ongoing operations that have taken months to develop. Additionally, they often seek backdoor access to all solutions.
- Financial regulators and their rules - These vary from country to country, often conservatively imposing bans and additional requirements on financial institutions. A common example is the prohibition on processing data outside the country or using cloud services.
- Privacy regulations and their enforcement bodies, such as the GDPR and the EDPB (Europe), the CCPA (California), PIPEDA (Canada), the LGPD (Brazil), and others - They impose strict requirements on data protection and privacy, complicating cross-border data sharing and processing. Their demands for informed consent, the right to be forgotten, and transparency in data processing compel banks to approach any data-sharing initiative with caution, ensuring compliance with national and international standards to avoid severe fines and reputational damage.
As we can see, each party justifiably pursues its own interests, but they all share a common concern: data sharing.
But how can we harness the full power of modern AI, creating sophisticated models capable of detecting the subtlest transaction anomalies and audacious schemes by criminals, without actually transferring the data?
What if, instead of bringing data to the computations, we bring the computations to the data? This is exactly how Federated Learning works.
Federated Learning (FL) is an ML paradigm for training a global model across multiple financial institutions without sharing their local data: training runs on the distributed datasets in place, and raw data is never exchanged between the participating parties (a minimal sketch of the training loop follows the list below).
In brief, Federated Learning enables organizations to:
- Train models locally and then combine them without transferring the data itself.
- Unlock the benefits of data sharing without moving data outside the organization or individual’s device.
- Enhance security by eliminating the need for a centralized data repository, which could be a prime target for hackers.
- Apply this technique for decentralized learning either within a single organization or across different organizations.
- Develop a more robust model, one more effective than a model trained on limited local data alone.
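As promised, here is a minimal federated-averaging (FedAvg) sketch in plain NumPy. Everything in it is illustrative rather than taken from any production system: the model is a simple logistic-regression classifier, each "bank" is an in-memory dataset, and the coordinator is an ordinary loop rather than a real FL framework.

```python
# A minimal federated-averaging (FedAvg) sketch in plain NumPy.
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One bank trains on its own data; X and y never leave this function."""
    w = weights.copy()
    for _ in range(epochs):
        preds = 1.0 / (1.0 + np.exp(-X @ w))   # sigmoid
        grad = X.T @ (preds - y) / len(y)      # logistic-loss gradient
        w -= lr * grad
    return w

def fedavg(bank_datasets, n_features, rounds=20):
    """The coordinator sees only weight vectors, never the datasets."""
    global_w = np.zeros(n_features)
    for _ in range(rounds):
        local_models = [local_update(global_w, X, y) for X, y in bank_datasets]
        sizes = [len(y) for _, y in bank_datasets]
        # FedAvg rule: average local models weighted by sample counts.
        global_w = np.average(local_models, axis=0, weights=sizes)
    return global_w

# Toy data: three "banks" whose labels follow the same hidden rule.
rng = np.random.default_rng(0)
banks = []
for _ in range(3):
    X = rng.normal(size=(200, 4))
    y = (X @ np.array([1.0, -2.0, 0.5, 0.0]) > 0).astype(float)
    banks.append((X, y))

print("global model weights:", fedavg(banks, n_features=4))
```

The point to notice is that `local_update` is the only place raw data is touched; the coordinator only ever receives weight vectors, which the techniques listed below can protect even further.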
Introductory technical articles on Federated Learning and other Privacy-Preserving Machine Learning approaches can be found in the Technology section of our website.
To mitigate the limitations of Federated Learning, additional techniques and protocols can be used in conjunction with it (one of them is sketched after this list), such as:
- Fully Homomorphic Encryption
- Secure Multi-Party Computation
- Synthetic Data
- Differential Privacy
- Zero-Knowledge Proofs
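Of the items above, Differential Privacy is the simplest to illustrate: each participant clips and noises its model update before sharing it, so that no individual customer record can be reliably inferred from what leaves the bank. The sketch below uses the Gaussian mechanism with illustrative, uncalibrated parameters; in practice, `clip_norm` and `noise_std` would be derived from a target (epsilon, delta) privacy budget.

```python
# Gaussian-mechanism sketch: clip, then noise, a local model update
# before it is shared. clip_norm and noise_std are illustrative
# assumptions, not calibrated to a formal privacy budget.
import numpy as np

def privatize_update(update, clip_norm=1.0, noise_std=0.5, rng=None):
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))  # bound sensitivity
    return clipped + rng.normal(scale=noise_std, size=update.shape)

# Each bank would call this on its weight delta before the coordinator
# averages the updates (e.g., inside the FedAvg loop sketched earlier).
local_delta = np.array([0.8, -1.3, 0.2, 0.05])
print(privatize_update(local_delta))
```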
It's often said that evil is a parasite of good, and the more beneficial technologies we create, the more opportunities there will be to misuse them. In addition, the widespread adoption of Federated Learning may face the following challenges that need to be addressed:
- Data processing speed
- Computational costs
- Regulatory frameworks
- Standardization and certification
- Countering new attack technologies
- Scalability
- Development of Plug & Play software
Does this mean we should give up on practical applications and stick only to theory?
Perhaps it makes sense to start implementing Federated Learning with use cases unrelated to financial crimes, and limit the number of participants to legal entities within the same corporate group, such as:
- Forecasting scenarios based on investment outcomes.
- Analyzing transactional activity of bank clients for more effective targeting of banking products.
- Verifying ownership of financial assets.
- Understanding how clients interact with products overall.
- Verifying the consistency of passwords and PIN codes across different banking applications.
- Validating the representativeness of datasets from different participants when testing hypotheses on the potential for collaboration in predictive machine learning.
- Developing new private banking products with a verifiable right to be forgotten or verifiable one-time use of confidential data.
- Sharing information with departments offering accounts receivable management services.
- Determining queue lengths in bank branches using geolocation data from visiting clients.
The eternal battle between armor and the projectile continues.
If malicious actors can freely combine datasets from various sources to commit business fraud and identity theft, launder criminal proceeds, evade sanctions, and finance other illegal activities, then our defenses must likewise leverage the full advantages of collaborative machine learning on shared datasets.
If this topic interests you as a user, developer, or Privacy Enhancing Technologies enthusiast, join our Discord community and take part in discussing these pressing issues.
References