FL Explainer

Banking and Insurance


Federated Machine Learning for Scoring in BFSI

Industry Domains:

Banking, Financial Services, Insurance

Technique:

Vertical Federated Learning (VFL)

Data type:

Tabular (structured) data

ML-model:

Decision Trees (XGBoost)

ML-tasks:

Credit scoring and fraud detection

Challenges:

  • Data privacy (names, addresses, transactions)
  • Data fragmentation between companies (banks, insurance companies)
  • Regulatory restrictions on data transfer

Customers

Typical Federated Learning users are banks and insurance companies that rely on models for risk prediction, borrower assessment, and fraud prevention.

Common attributes of these customers:
  • High demand for compliance with data protection legislation

  • Need for integration of diverse data sources

  • Use of data for predictive analytics and credit assessments

Challenge

The main challenge is training scoring models using confidential data from various sources, such as banks and insurance companies, without disclosing the data itself. The tasks include protecting data from leaks, ensuring anonymity, and complying with privacy laws.

Federated Machine Learning is used to build credit scoring and risk assessment models without revealing personal data.

The issues include:

  • The need to protect sensitive data (transaction history, personal information)
  • Legislative restrictions on data sharing between banks and insurance companies
  • The need to mitigate threats such as model inversion attacks and data poisoning

Federated Learning for a minimum of 2 participants

The data does not leave the owner's perimeter. The ML specialist and the computing resources are also located within the data owner's perimeter.

Secured synchronization of local and global model parameters with the server

Final model quality assessment

Model application within the data owner's perimeter and interpretation of the obtained results

Solution

To enable the extraction of knowledge from the data of both participants, a vertical federated learning infrastructure was required.

The nature of the original data determined the choice of target model as gradient boosting based on decision trees using the XGBoost implementation.

The side with the target class labels is referred to as the Server Side, while the side without targets is called the Client Side.

For the public demonstration of results, a dataset was used that includes:

  1. Bank customer data with assigned scoring levels: low, standard, high.
  2. Auto insurance data.

The scoring levels mean the following:

  • Low: indicates a high risk for lenders. The borrower may have difficulty repaying the loan or may be charged higher interest rates.
  • Standard: an acceptable rating that indicates some risk. Loans are generally expected to be repaid under normal conditions.
  • High: indicates a low risk for lenders. Individuals with this rating can expect better lending terms.

Banking data is on the Server Side, totaling 78,806 records, each containing 12 feature descriptions of a person. Auto insurance data is on the Client Side, with 97,224 records and 9 features for each person.

Each dataset contains an ID field, enabling the matching of data related to the same individual. Each person from the intersection of datasets is described by 21 features, split between the two sides.
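The ID-based matching described above can be illustrated with a minimal sketch. It assumes toy records with invented column names (`income`, `claims`, and so on) and only shows which IDs fall into the intersection and how the feature count adds up across the two sides. In an actual VFL setup the rows would not be physically joined, since each side keeps its own columns.

```python
# Toy records keyed by customer ID. All names and values are invented for illustration.
server_rows = {  # bank side: holds features and the target label
    101: {"income": 52000, "late_payments": 0, "label": "high"},
    102: {"income": 31000, "late_payments": 3, "label": "low"},
    103: {"income": 40000, "late_payments": 1, "label": "standard"},
}
client_rows = {  # insurer side: holds features only
    102: {"claims": 2, "vehicle_age": 9},
    103: {"claims": 0, "vehicle_age": 4},
    104: {"claims": 1, "vehicle_age": 2},
}

def align_ids(server, client):
    """Return the sorted IDs present on both sides (the usable population)."""
    return sorted(server.keys() & client.keys())

def feature_names(server, client, label_key="label"):
    """Feature names describing one matched person, kept separate per side."""
    first = align_ids(server, client)[0]
    server_feats = [k for k in server[first] if k != label_key]
    client_feats = list(client[first])
    return server_feats, client_feats

common = align_ids(server_rows, client_rows)
server_feats, client_feats = feature_names(server_rows, client_rows)
print(common)                                 # IDs present in both datasets
print(len(server_feats) + len(client_feats))  # total features per matched person
```

With the case-study datasets the same count would be 12 + 9 = 21 features per matched person.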

Part of the intersecting data was set aside as a test set of 25,668 records, with the rest used for training.

Two training cycles of an XGBoost model were conducted:

  • Local training, where the Server Side trained a classifier using only its own data.
  • Vertical Federated Learning, utilizing data from both sides to predict credit ratings.

For both cases, identical model parameters were set:

'objective': 'multi:softmax'
'num_class': 3
'eval_metric': 'merror'
'max_depth': 6
'learning_rate': 0.1
'subsample': 0.8
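For readers who want to reuse these settings, the parameters can be written as a Python dict. The sketch below is illustrative only: it does not import XGBoost, and the `merror` helper is a hand-written stand-in that mirrors the metric's definition (wrong predictions divided by all predictions). The integer encoding of the three classes is an assumption.

```python
# Parameters as listed in the case study, written as a Python dict.
params = {
    "objective": "multi:softmax",  # predict the class index directly
    "num_class": 3,                # low, standard, high
    "eval_metric": "merror",       # multiclass classification error rate
    "max_depth": 6,
    "learning_rate": 0.1,
    "subsample": 0.8,
}

def merror(y_true, y_pred):
    """Multiclass error rate: share of samples whose predicted class is wrong."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)

# Hypothetical labels (assumed encoding: 0 = low, 1 = standard, 2 = high).
y_true = [0, 1, 2, 2, 1, 0, 2, 1]
y_pred = [0, 1, 2, 1, 1, 0, 2, 2]
err = merror(y_true, y_pred)
print(err, 1 - err)  # error rate and the corresponding accuracy
```

With the XGBoost Python package, a dict like this is typically passed as the first argument of `xgboost.train`. Accuracy, as reported in this case study, is simply one minus `merror`.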

The result of testing the local model trained only on data from the Server Side:
Accuracy: 0.817

The result of testing the global model, trained on data from both the Server and Client Sides:
Accuracy: 0.975
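The two accuracy figures can be compared in several ways, and the short calculation below makes the difference explicit. It uses only the two numbers reported above. The variable names are illustrative.

```python
local_acc = 0.817      # Server Side only (from the case study)
federated_acc = 0.975  # Server and Client Sides via VFL (from the case study)

# Absolute gain in percentage points.
absolute_gain_pp = round((federated_acc - local_acc) * 100, 1)

# Gain relative to the local model's accuracy.
relative_gain_pct = round((federated_acc - local_acc) / local_acc * 100, 1)

# Reduction of the error rate (1 - accuracy).
local_err = 1 - local_acc
federated_err = 1 - federated_acc
error_reduction_pct = round((local_err - federated_err) / local_err * 100, 1)

print(absolute_gain_pp, relative_gain_pct, error_reduction_pct)  # 15.8 19.3 86.3
```

In other words, the gain is 15.8 percentage points in absolute terms, about 19.3% relative to the local accuracy, and the overall test error falls by roughly 86%.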

The gain from this approach relative to the local model:

From this matrix, it can be seen that, for example, the number of test samples with a low credit score, but classified by the trained model as high rating, decreased by 92.66% when using the global federated learning model.
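A percentage like this is derived by comparing the same confusion-matrix cell for the two models. The sketch below shows the calculation on invented toy labels and predictions. They are not the case-study data, and the class encoding is an assumption.

```python
def confusion_matrix(y_true, y_pred, n_classes=3):
    """matrix[t][p] counts samples with true class t predicted as class p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

def relative_reduction_pct(before, after):
    """Percentage drop of a count between two models."""
    if before == 0:
        raise ValueError("cannot compute a relative change from zero")
    return (before - after) / before * 100

LOW, STANDARD, HIGH = 0, 1, 2  # assumed class encoding

# Invented toy labels and predictions; these are not the case-study data.
y_true      = [LOW,  LOW,  LOW, LOW,  STANDARD, HIGH, HIGH, LOW]
local_pred  = [HIGH, HIGH, LOW, HIGH, STANDARD, HIGH, LOW,  HIGH]
global_pred = [LOW,  HIGH, LOW, LOW,  STANDARD, HIGH, HIGH, LOW]

local_cm = confusion_matrix(y_true, local_pred)
global_cm = confusion_matrix(y_true, global_pred)

# Cell [LOW][HIGH]: truly low-rated customers that the model rated high.
drop = relative_reduction_pct(local_cm[LOW][HIGH], global_cm[LOW][HIGH])
print(local_cm[LOW][HIGH], global_cm[LOW][HIGH], drop)  # 4 1 75.0
```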

It’s worth noting that the distributed training process takes longer than centralized training. The graphs show the time required to train the model with a specified number of trees using CPU and GPU.

Despite the significant time costs, the high convergence speed of the model allows VFL to remain a practically valuable method for generalizing information from accumulated data.

Results

An increase in the accuracy of credit scoring models by more than 15 percentage points (from 0.817 to 0.975 on the demonstration dataset).

A reduction in the risk of loan defaults and fraud.

Better prediction of customer churn and an improved customer experience.

A scoring model was trained based on data from multiple sources (bank and insurance) while fully preserving confidentiality.

How can banks and insurance companies build credit scoring models together without sharing customer data?
Banks and insurance companies hold complementary data about the same customers — banks have payment history and account behavior; insurers have claims, policy, and asset data. Combining these views produces stronger credit scoring models, but data-sharing agreements are restricted by privacy regulations (GDPR, CCPA, 152-FZ) and competitive concerns. Vertical Federated Learning (VFL) solves this: each party keeps its raw data on-premise, and only encrypted intermediate computations cross the network during training. The resulting joint model captures information from both feature sets without either party seeing the other's records.
What is vertical federated learning (VFL) for credit scoring?
VFL is a federated learning variant where parties hold different features about the same customers (versus horizontal FL where parties hold the same features about different customers). For BFSI credit scoring: the bank holds the target variable (creditworthiness label) and one feature set (transactions, account history); the insurer holds a complementary feature set (claims, policies, vehicle data). Customer IDs are aligned via secure set intersection without exposing identities. Training exchanges gradients (optionally encrypted with Paillier homomorphic encryption), and the resulting model uses both feature sets in production.
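To make the ID alignment step more concrete, here is a deliberately simplified sketch. It is not a real secure set intersection protocol: it swaps in keyed hashing (HMAC-SHA256) with a secret that both parties are assumed to have agreed on out of band. The key, the IDs, and the function names are all hypothetical. Production systems use dedicated private set intersection protocols, which give stronger guarantees than this stand-in.

```python
import hashlib
import hmac

def blind_ids(ids, shared_key):
    """Map each keyed hash (HMAC-SHA256) back to the local ID that produced it."""
    return {
        hmac.new(shared_key, str(i).encode(), hashlib.sha256).hexdigest(): i
        for i in ids
    }

def common_ids(my_ids, their_tokens, shared_key):
    """IDs of mine whose token also appears in the other party's token set."""
    mine = blind_ids(my_ids, shared_key)
    return sorted(local_id for token, local_id in mine.items() if token in their_tokens)

key = b"hypothetical-shared-secret"  # assumed to be agreed out of band
bank_ids = [101, 102, 103]
insurer_ids = [102, 103, 104]

# Each party sends only tokens to the other side, never raw IDs.
insurer_tokens = set(blind_ids(insurer_ids, key))
bank_tokens = set(blind_ids(bank_ids, key))

print(common_ids(bank_ids, insurer_tokens, key))   # what the bank learns
print(common_ids(insurer_ids, bank_tokens, key))   # what the insurer learns
```

Both parties arrive at the same list of shared IDs. The weakness of this shortcut is that anyone holding the key can test guesses against the received tokens, which is why it should be read as an illustration of the data flow rather than as a privacy guarantee.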
How much accuracy improvement does VFL give over local-only training in BFSI?
On the public-demonstration dataset used in this case study (78,806 bank records with 12 features + 97,224 insurance records with 9 features, classifying credit into Low / Standard / High), an XGBoost classifier scored 0.817 test accuracy when trained only on bank data and 0.975 when trained jointly via VFL — a 15.8 percentage-point absolute gain. More importantly, the rate of catastrophic misclassification (low-score customers labeled high) dropped by 92.66% — that is the operationally meaningful improvement, since these are the cases that produce defaults.
What machine learning models work best for federated credit scoring?
Gradient-boosted decision trees (XGBoost, LightGBM, CatBoost) dominate production credit scoring because they handle tabular features, missing values, and feature interactions well, and they're interpretable enough for regulatory review. The case study used XGBoost with hyperparameters: `objective='multi:softmax'`, `num_class=3`, `eval_metric='merror'`, `max_depth=6`, `learning_rate=0.1`, `subsample=0.8`. These are standard moderate-depth settings — the federated variant produces a model with the same hyperparameters but uses both parties' feature sets during tree splits.
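How a tree split can use a feature that the label holder never sees is easiest to show with a small sketch. It follows one common design for federated gradient boosting, in which the label holder computes per-sample gradients and the feature holder returns only aggregated sums per candidate split. This is an assumption for illustration, not a description of the platform's internal protocol. For brevity it uses binary logistic loss instead of the three-class softmax from the case study, and it leaves out encryption.

```python
import math

def grad_hess(labels, scores):
    """Logistic-loss gradients and Hessians, computed by the label holder."""
    g, h = [], []
    for y, s in zip(labels, scores):
        p = 1.0 / (1.0 + math.exp(-s))
        g.append(p - y)
        h.append(p * (1.0 - p))
    return g, h

def left_sums(feature_values, threshold, g, h):
    """Feature holder: sum g and h over rows going left (value < threshold)."""
    gl = sum(gi for v, gi in zip(feature_values, g) if v < threshold)
    hl = sum(hi for v, hi in zip(feature_values, h) if v < threshold)
    return gl, hl

def split_gain(gl, hl, g_total, h_total, lam=1.0, gamma=0.0):
    """XGBoost-style gain of a split, computed from aggregated statistics only."""
    gr, hr = g_total - gl, h_total - hl

    def score(g_sum, h_sum):
        return g_sum * g_sum / (h_sum + lam)

    return 0.5 * (score(gl, hl) + score(gr, hr) - score(g_total, h_total)) - gamma

# Toy data (invented). Labels live on the Server Side, the feature on the Client Side.
labels = [1, 1, 0, 0]
scores = [0.0, 0.0, 0.0, 0.0]          # initial raw predictions
client_feature = [1.0, 2.0, 8.0, 9.0]  # a hypothetical insurer-side column

g, h = grad_hess(labels, scores)
G, H = sum(g), sum(h)
good = split_gain(*left_sums(client_feature, 5.0, g, h), G, H)
bad = split_gain(*left_sums(client_feature, 1.5, g, h), G, H)
print(good, bad)  # the threshold that separates the classes scores higher
```

The label holder only needs the two sums per candidate threshold to rank splits, so the raw feature values stay with their owner.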
How is data privacy maintained during federated training in banking?
Three protections combine.
  • Raw data isolation: each party trains on its own server inside its own perimeter; the bank's records and the insurer's records never leave their respective owners.
  • Encrypted gradient exchange: optionally, gradients are encrypted using Paillier (1024-bit) additive homomorphic encryption, so the receiving party cannot reconstruct individual training examples from gradient values.
  • Secure set intersection for ID alignment: the lists of common customers are matched without exposing the full customer list of either party.
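The additive property that makes Paillier useful for gradient exchange can be demonstrated with a toy implementation. The sketch below uses tiny primes and is not secure; it only shows that multiplying two ciphertexts yields an encryption of the sum of the plaintexts. The fixed-point `SCALE` and the helper names are assumptions made for illustration, and a real deployment would use a vetted cryptographic library with 1024-bit or larger keys.

```python
import math
import random

SCALE = 1000  # fixed-point scale for float gradients (an assumption)

def keygen(p, q):
    """Build a Paillier key pair from two primes (toy sizes here, NOT secure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    # With generator g = n + 1, L(g^lam mod n^2) equals lam mod n,
    # so the decryption constant is the inverse of lam modulo n.
    mu = pow(lam, -1, n)
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) with fresh randomness r."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    lam, mu = priv
    l_value = (pow(c, lam, n * n) - 1) // n
    return (l_value * mu) % n

def add_encrypted(n, c1, c2):
    """Ciphertext product corresponds to plaintext sum."""
    return (c1 * c2) % (n * n)

def encode(x, n):
    """Float to fixed-point integer; negatives wrap around modulo n."""
    return round(x * SCALE) % n

def decode(v, n):
    if v > n // 2:
        v -= n
    return v / SCALE

n, priv = keygen(1009, 1013)       # toy primes for illustration only
c1 = encrypt(n, encode(0.125, n))  # two hypothetical gradient values
c2 = encrypt(n, encode(-0.5, n))
total = decode(decrypt(n, priv, add_encrypted(n, c1, c2)), n)
print(total)  # -0.375
```

A party that holds only the public value `n` can add up encrypted gradients without seeing them, and only the holder of the private key can read the aggregated result.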
What regulations make federated learning relevant for banking and insurance?
Several frameworks restrict cross-organization data transfer in financial services.
  • GDPR (EU) requires data minimization and a lawful basis for transfer.
  • GLBA (US, Gramm-Leach-Bliley Act) restricts non-public personal information sharing between financial institutions.
  • PSD2 (EU) requires consent for account data sharing.
  • 152-FZ (Russia) restricts personal data transfer between operators.
  • CCPA / CPRA (California) grants consumer rights to limit data sharing.

Federated learning is compliant by design under all of these because raw personal data stays inside the regulatory perimeter where it was originally collected.
Is VFL slower than centralized training, and is that acceptable for production?
Yes — VFL training takes longer than centralized training because of the cryptographic protocol overhead and the network round-trips between parties for each tree split. On the case study workload, distributed training was meaningfully slower than centralized, but model convergence speed (number of trees needed to reach a quality plateau) was comparable. For credit scoring use cases, model retraining typically happens weekly or monthly — so a multi-hour training pipeline is fully acceptable when production inference (real-time scoring) runs at standard XGBoost speeds.
How does Guardora support credit scoring and fraud detection use cases?
Guardora provides Guardora VFL — a production platform for two-party vertical federated learning in tabular ML scenarios. The platform handles ID alignment via secure intersection, training coordination, optional Paillier homomorphic encryption for gradient protection, and inference serving. Supported models: gradient-boosted trees (XGBoost, GBDT), logistic regression, and other tabular algorithms. Tested workloads include the case shown on this page (BFSI credit scoring, 0.817 → 0.975 accuracy) and a related credit-scoring case where Guardora VFL matched the quality of a stacking ensemble (ROC AUC ≈ 71.3 on 300K records) that requires label transfer.