What Is Federated Learning? How It Works & Use Cases



A New Horizon for Confidential Machine Learning

Let's first consider the most common machine learning setup today, in which a single entity possesses enough data to train a machine learning model.


The model can be arbitrary, ranging from deep neural networks to linear regression.

Training the model on this data yields a solution to a practical problem, such as object detection or audio transcription.

In practice, the data the model works with rarely originates on the machine where training takes place; it is generated elsewhere.

Consequently, to conduct the analysis, data from various sources must be collected on a central server, e.g., in the cloud.

It is easy to imagine a situation where data cannot be transferred from each source to the server, for example, for the following reasons:

Parties are reluctant to share their confidential information with a third party in unencrypted form

The combined volume of data exceeds the capacity of the central storage

Regulatory restrictions: GDPR (Europe), CCPA (California), PIPEDA (Canada), LGPD (Brazil), and other laws restrict the transfer of sensitive data

At the same time, each party's local data alone is insufficient to train a high-quality model. This is the challenge that gave rise to the concept of Federated Learning.

Federated Learning (FL) is a machine learning paradigm that enables multiple clients to jointly train a global model without sharing their local data.

This approach to machine learning not only addresses data privacy issues but also opens new horizons for developing secure and efficient models.

Figure: FL vs Centralized ML. Centralized ML moves data to perform computations; FL moves computations to the data.

The Basic Principles of FL, Step by Step

Step 0

Clients agree on the global model, loss function, and data preprocessing procedures. The central server initializes the global model either randomly or using a pre-trained checkpoint.
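Step 0 amounts to fixing a shared configuration before any training starts. The sketch below is a minimal illustration, assuming a simple linear model; `FLConfig`, its fields, and `init_global_model` are hypothetical names, not part of any particular FL framework.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FLConfig:
    """Hypothetical settings every client agrees on before the first round."""
    num_features: int                # architecture: linear model with this many weights plus a bias
    loss: str = "mse"                # agreed loss function
    standardize_inputs: bool = True  # agreed preprocessing step
    local_epochs: int = 1            # amount of local training per round
    learning_rate: float = 0.1


def init_global_model(cfg: FLConfig, checkpoint: Optional[List[float]] = None,
                      seed: int = 0) -> List[float]:
    """Server-side initialization: load a pre-trained checkpoint or draw small random weights."""
    size = cfg.num_features + 1  # weights + bias
    if checkpoint is not None:
        if len(checkpoint) != size:
            raise ValueError("checkpoint does not match the agreed architecture")
        return list(checkpoint)
    rng = random.Random(seed)
    return [rng.gauss(0.0, 0.01) for _ in range(size)]


cfg = FLConfig(num_features=3)
w_random = init_global_model(cfg)                                        # random start
w_pretrained = init_global_model(cfg, checkpoint=[0.5, -0.2, 0.1, 0.0])  # warm start
```

Declaring the dataclass as frozen makes any accidental attempt to change the agreed settings after the fact raise an error.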

Step 1

The server distributes the parameters of the global model to connected clients. It's important to note that each client starts training using the same model parameters.

Step 2

Clients perform local training of model instances on their own data. Local training ranges from a few steps to several epochs depending on initial agreements.
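A minimal sketch of Step 2, assuming a linear model trained with plain SGD on squared error. The `local_train` name and the choice to also return the number of local examples (which weighted averaging needs later) are illustrative assumptions.

```python
from typing import List, Tuple

Example = Tuple[List[float], float]  # (features, target)


def local_train(global_weights: List[float], data: List[Example],
                epochs: int = 1, lr: float = 0.05) -> Tuple[List[float], int]:
    """Train a local copy of a linear model with SGD on 0.5 * (prediction - target)**2.

    Returns the updated weights [w_1..w_d, bias] and the local example count.
    """
    w = list(global_weights)  # local copy; the received global weights stay untouched
    for _ in range(epochs):
        for x, y in data:
            pred = sum(wi * xi for wi, xi in zip(w[:-1], x)) + w[-1]
            err = pred - y
            for i, xi in enumerate(x):  # gradient step for each feature weight
                w[i] -= lr * err * xi
            w[-1] -= lr * err           # gradient step for the bias
    return w, len(data)


# Hypothetical local dataset on one client: y = 2x + 1 sampled at x = 0.0, 0.1, ..., 1.0
data = [([i / 10], 2 * (i / 10) + 1) for i in range(11)]
global_w = [0.0, 0.0]
local_w, n_local = local_train(global_w, data, epochs=500, lr=0.1)
```

With this many epochs the local model ends up close to slope 2 and bias 1; in a real round, clients would usually train for far fewer steps before reporting back.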

Step 3

After local training, clients have models with differing parameters due to variations in their local datasets. Clients return these model parameters to the central server.

Step 4

The server receives the model parameters from clients and aggregates them to update the parameters of the global version.

There are various approaches to parameter aggregation, but the most widely used is FedAvg (Federated Averaging), which computes a weighted average of the received parameters, with each client's weight proportional to the size of its local dataset.
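The weighted average at the heart of FedAvg takes only a few lines. This minimal sketch treats each model as a flat list of floats; `fedavg` is a hypothetical helper, not a library API.

```python
from typing import List, Sequence


def fedavg(client_weights: Sequence[Sequence[float]],
           num_examples: Sequence[int]) -> List[float]:
    """Average client parameter vectors, weighting each by its local dataset size."""
    if not client_weights or len(client_weights) != len(num_examples):
        raise ValueError("need at least one client and one dataset size per client")
    total = sum(num_examples)
    dim = len(client_weights[0])
    new_global = [0.0] * dim
    for w, n in zip(client_weights, num_examples):
        for i in range(dim):
            new_global[i] += (n / total) * w[i]
    return new_global


# Two hypothetical clients with 100 and 300 local examples.
new_w = fedavg([[1.0, 0.0], [3.0, 4.0]], [100, 300])  # -> [2.5, 3.0]
```

The client with three times as much data gets three times the weight (0.75 versus 0.25), which is what pulls the result toward its parameters.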

Step 5

Steps 1–4 constitute one FL round, and rounds are repeated until the model converges.

It is important to note that the data itself remains local: only model updates are transmitted to the central server and other participants. This reduces privacy risk because raw data never has to be accumulated in one place.
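Putting Steps 0–5 together, a complete training run can be simulated in a single process. The sketch below is a minimal illustration under several assumptions: a linear model, SGD for local training, zero initialization, three hypothetical clients drawing from the same relationship y = 2x + 1, and "converged" taken to mean that no global weight changes by more than `tol` in a round.

```python
import random
from typing import List, Sequence, Tuple

Example = Tuple[List[float], float]


def local_train(w_global: List[float], data: List[Example],
                epochs: int, lr: float = 0.1) -> Tuple[List[float], int]:
    """Steps 2-3: SGD on squared error for a linear model; returns (weights, #examples)."""
    w = list(w_global)
    for _ in range(epochs):
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w[:-1], x)) + w[-1] - y
            for i, xi in enumerate(x):
                w[i] -= lr * err * xi
            w[-1] -= lr * err
    return w, len(data)


def fedavg(updates: Sequence[Tuple[List[float], int]]) -> List[float]:
    """Step 4: average client weights, weighted by local dataset size."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[i] * n / total for w, n in updates) for i in range(dim)]


def run_federated_training(client_datasets: List[List[Example]], num_features: int,
                           max_rounds: int = 200, tol: float = 1e-6,
                           local_epochs: int = 2) -> Tuple[List[float], int]:
    w_global = [0.0] * (num_features + 1)  # Step 0: initialize (zeros in this sketch)
    for rnd in range(1, max_rounds + 1):
        # Step 1: every client receives the same global weights.
        # Steps 2-3: local training; only (weights, dataset size) goes back, never the data.
        updates = [local_train(w_global, data, local_epochs) for data in client_datasets]
        w_new = fedavg(updates)            # Step 4: aggregation on the server
        change = max(abs(a - b) for a, b in zip(w_new, w_global))
        w_global = w_new
        if change < tol:                   # Step 5: stop once the global model stops changing
            return w_global, rnd
    return w_global, max_rounds


# Three hypothetical clients of different sizes, all sampling y = 2x + 1.
rng = random.Random(42)
clients = []
for size in (20, 50, 80):
    xs = [rng.uniform(0.0, 1.0) for _ in range(size)]
    clients.append([([x], 2 * x + 1) for x in xs])

weights, rounds = run_federated_training(clients, num_features=1)
print(f"converged after {rounds} rounds: weights={[round(v, 3) for v in weights]}")
```

Because every client here sees noise-free data from the same function, the global model settles near slope 2 and bias 1; with heterogeneous client data the outcome is less clean.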

Like any technology, FL solves the problem it was designed for but comes with both advantages and drawbacks.

The positive aspects of the FL procedure

Data Confidentiality

Because raw data is never transmitted, FL reduces the risk of leaks or unauthorized access to confidential information.

Scalability

FL can draw on large volumes of data spread across a large number of devices, and because raw data is never uploaded, the central server's storage and compute do not grow in proportion to the total data volume.

Distributed Structure

Local copies of the model and the distribution of data among clients help reduce the vulnerabilities associated with server failures.

Efficient Resource Utilization

FL parallelizes training of the global model across many clients, reducing the need for GPU-equipped central servers.

Reduction of Data Drift Effect

The path from data source to model becomes shorter, reducing the likelihood of data becoming outdated or distorted.

The complexities brought by FL

Coordination Complexity

Managing the training process with multiple clients requires a complex system of coordination and agreement, which can complicate system deployment and support.

Data Consistency Issues

Differences between client datasets (non-IID data, where each client's data follows a different distribution) can make local models diverge and degrade the quality of the aggregated model.
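A toy example makes this concrete. Assume two hypothetical clients whose inputs cover different ranges, a target y = x^2, and a linear model. As a stand-in for extensive local training, each client's model is taken to be its exact least-squares fit (an assumption made for simplicity), and the two fits are then combined with dataset-size weights as in FedAvg.

```python
from typing import List, Tuple


def fit_line(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Closed-form least-squares fit y ~ a*x + b (what long local training would approach)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    a = sxy / sxx
    return a, my - a * mx


# Non-IID split (hypothetical): client A only sees x in [0, 0.5], client B only x in [0.5, 1].
client_a = [(i / 100, (i / 100) ** 2) for i in range(0, 51)]
client_b = [(i / 100, (i / 100) ** 2) for i in range(50, 101)]

a_a, b_a = fit_line(client_a)
a_b, b_b = fit_line(client_b)
n_a, n_b = len(client_a), len(client_b)

# Dataset-size-weighted average of the two local models.
fedavg_model = ((n_a * a_a + n_b * a_b) / (n_a + n_b),
                (n_a * b_a + n_b * b_b) / (n_a + n_b))
# Reference: the same model fitted on the pooled data.
central = fit_line(client_a + client_b)
print("client A:", (a_a, b_a), "client B:", (a_b, b_b))
print("averaged:", fedavg_model, "pooled fit:", central)
```

Client A's slope comes out near 0.5 and client B's near 1.5, and the averaged model's intercept is about 0.125 lower than that of the line fitted on the pooled data, so simply averaging heterogeneous local models does not recover the centralized solution in this case.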

Computational Constraints

Computational resources on user devices may be limited, complicating the training of complex models or requiring additional algorithm optimization.

Security Threats

The possibility of attacks on individual devices or servers storing data or model updates necessitates increased attention to cybersecurity and fraud protection.

Need for Cooperation

It can be difficult to find data owners with compatible data who are willing to work together on the same practical problem.

Nevertheless, FL has found wide application in various fields where sensitive data processing is required, such as medicine (analyzing medical images and patient data), financial services (transaction analysis and fraud detection) and the Internet of Things (processing data from sensors and smart devices).

Thus, FL represents an important technological innovation in the field of PPML (Privacy Preserving Machine Learning), capable of reshaping the landscape of machine learning by making it safer and more accessible across various sectors of the economy.

This approach not only safeguards data confidentiality but also promotes the development of new methods for processing and analyzing information while preserving privacy.

However, successful implementation requires careful consideration and management of the shortcomings and challenges inherent to this method.

What is federated learning?

Federated Learning (FL) is a machine learning paradigm where multiple clients — devices or organizations — collaboratively train a shared global model without sending their raw data to a central server. Each client trains a local copy on its own data, then transmits only the updated model parameters back to an aggregator, which combines them into a new global version. The data itself never leaves the device or institution that owns it, which is why FL is foundational to privacy-preserving machine learning (PPML).

How does federated learning work step by step?

A standard FL workflow has six steps.
Step 0: clients agree on the global model architecture, loss function, and preprocessing; the server initializes weights.
Step 1: server distributes the current global model to all clients.
Step 2: each client trains the model locally on its own data for a few steps or epochs.
Step 3: clients return updated parameters to the server.
Step 4: server aggregates updates (typically via FedAvg) into a new global model.
Step 5: steps 1–4 repeat until convergence.

What is FedAvg (Federated Averaging)?

FedAvg, short for Federated Averaging, is the standard aggregation algorithm in federated learning. After each client trains locally and returns its updated model parameters, the server computes a weighted average of all client updates — weights proportional to each client's local dataset size. The result becomes the new global model. FedAvg was introduced by McMahan et al. (Google, 2017) and remains the baseline aggregation method against which most FL research is compared.
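In symbols (notation chosen here for illustration), write S_t for the set of clients that report back in round t, n_k for the size of client k's local dataset, and w_{t+1}^k for the parameters client k returns after local training. The FedAvg update is then:

```latex
w_{t+1} = \sum_{k \in S_t} \frac{n_k}{n_{S_t}} \, w_{t+1}^{k},
\qquad
n_{S_t} = \sum_{k \in S_t} n_k
```

For example, with two clients holding 100 and 300 examples, the averaging weights are 100/400 = 0.25 and 300/400 = 0.75.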

What are the advantages and disadvantages of federated learning?

Advantages: raw data never leaves the device (privacy); scales to large device fleets without proportional server load; resilient to single-server failures due to distributed copies; parallelizes training across many endpoints, reducing the need for central GPU hardware; reduces data drift by training closer to the source.
Disadvantages: coordination across many clients is complex; non-IID data across clients causes inconsistencies; user devices have limited compute; the protocol introduces new attack surfaces; finding partners with compatible data schemas who are willing to collaborate is non-trivial.

How does federated learning protect data privacy?

FL keeps raw data on each client's device and only transmits model parameters (gradients or weights). This avoids the central honeypot risk that comes with traditional ML pipelines. However, FL alone is not sufficient — research has shown that model updates can leak information about the underlying data through gradient-inversion attacks. Production FL combines the no-raw-data property with cryptographic protection: secure aggregation, homomorphic encryption, differential privacy, or trusted execution environments (TEEs).
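The core idea of secure aggregation can be sketched with pairwise additive masking: each pair of clients shares a random mask that one adds and the other subtracts, so the masks cancel in the sum and the server learns only the total update. This is a simplified illustration under stated assumptions: all masks come from one seeded generator instead of pairwise key agreement, client dropouts are not handled, and the modulus and fixed-point scale are arbitrary choices.

```python
import random
from typing import Dict, List, Tuple

MODULUS = 2 ** 32  # all arithmetic is modulo this value (assumption)
SCALE = 2 ** 16    # fixed-point scale for float updates (assumption)

Masks = Dict[Tuple[int, int], List[int]]


def encode(x: float) -> int:
    return round(x * SCALE) % MODULUS


def decode(v: int) -> float:
    if v >= MODULUS // 2:  # upper half of the range represents negative values
        v -= MODULUS
    return v / SCALE


def pairwise_masks(num_clients: int, dim: int, seed: int = 0) -> Masks:
    """Simulation stand-in for key agreement: one random mask per client pair (i < j).
    In a real protocol each client would derive only the masks for its own pairs."""
    rng = random.Random(seed)
    return {(i, j): [rng.randrange(MODULUS) for _ in range(dim)]
            for i in range(num_clients) for j in range(i + 1, num_clients)}


def mask_update(cid: int, update: List[float], masks: Masks) -> List[int]:
    """Client side: add the pair mask when cid is the lower index, subtract it otherwise."""
    out = [encode(x) for x in update]
    for (i, j), m in masks.items():
        if cid == i:
            out = [(o + mk) % MODULUS for o, mk in zip(out, m)]
        elif cid == j:
            out = [(o - mk) % MODULUS for o, mk in zip(out, m)]
    return out


def aggregate(masked: List[List[int]]) -> List[float]:
    """Server side: the masks cancel in the sum, so only the total update is revealed."""
    return [decode(sum(col) % MODULUS) for col in zip(*masked)]


updates = [[0.5, -1.25], [0.25, 2.0], [-0.125, 0.75]]  # hypothetical client updates
masks = pairwise_masks(len(updates), dim=2)
masked = [mask_update(cid, u, masks) for cid, u in enumerate(updates)]
result = aggregate(masked)
print(result)  # [0.625, 1.5]
```

Each masked vector on its own looks like uniform random noise to the server; the decoded sum is only correct while its magnitude stays below MODULUS / (2 * SCALE).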

What regulations make federated learning relevant?

FL is increasingly relevant under data protection laws that restrict cross-border or cross-organizational transfer of personal information. Notable examples: GDPR (European Union), CCPA (California), PIPEDA (Canada), LGPD (Brazil), and Russia's Federal Law 152-FZ on Personal Data. These regimes require organizations to minimize data collection and prevent unauthorized transfer. Federated learning helps meet both requirements by design, since training data stays within the regulatory perimeter where it was collected.

What is PPML (privacy-preserving machine learning)?

PPML stands for Privacy-Preserving Machine Learning — an umbrella term for techniques that train and run ML models while keeping training data, inference queries, or model parameters confidential from one or more parties. Federated learning is one PPML technique. Other common PPML techniques include homomorphic encryption (computation on encrypted data), differential privacy (statistical noise injection), secure multi-party computation, and trusted execution environments. Production privacy-preserving ML systems typically combine several PPML techniques.
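Differential privacy is the simplest of these to sketch in an FL setting. The example shows the clip-then-add-Gaussian-noise step commonly applied to model updates; `max_norm` and `noise_multiplier` are illustrative parameters, and translating them into a formal (ε, δ) guarantee requires a privacy accountant, which this sketch does not include.

```python
import math
import random
from typing import List


def clip_update(update: List[float], max_norm: float) -> List[float]:
    """Rescale the update so its L2 norm is at most max_norm (bounds any one client's influence)."""
    norm = math.sqrt(sum(u * u for u in update))
    if norm <= max_norm:
        return list(update)
    return [u * (max_norm / norm) for u in update]


def privatize_update(update: List[float], max_norm: float,
                     noise_multiplier: float, rng: random.Random) -> List[float]:
    """Clip, then add Gaussian noise with standard deviation noise_multiplier * max_norm."""
    clipped = clip_update(update, max_norm)
    sigma = noise_multiplier * max_norm
    return [u + rng.gauss(0.0, sigma) for u in clipped]


rng = random.Random(0)
raw = [3.0, 4.0]                    # hypothetical update with L2 norm 5.0
clipped = clip_update(raw, max_norm=1.0)
noisy = privatize_update(raw, max_norm=1.0, noise_multiplier=1.1, rng=rng)
print(clipped, noisy)
```

The noise can be added by each client before sending (local DP) or by a trusted aggregator to the summed updates (central DP); the central variant typically needs less noise for the same privacy level.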

How does Guardora implement federated learning?

Guardora builds commercial federated learning infrastructure with two products: Guardora VFL (vertical federated learning for two-party scenarios, such as a bank and an analytics vendor or a hospital and a wearable maker) and Guardora FFT (federated fine-tuning for adapting large models on distributed sensitive data). Both products combine the no-raw-data property of FL with cryptographic protection: Paillier homomorphic encryption (1024-bit) for gradient confidentiality, encrypted gRPC for communication, and validated performance on real workloads such as credit scoring (ROC AUC ≈ 71.3% on GBDT, 300K records trained in under 9 minutes).
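Paillier encryption suits gradient protection because it is additively homomorphic: multiplying two ciphertexts yields an encryption of the sum of their plaintexts, so an aggregator can add encrypted gradients without seeing any of them. The toy sketch below illustrates only that property. It is not Guardora's implementation; its primes are tiny and insecure (real deployments use far larger keys, such as the 1024-bit setting above, and a vetted library), and the fixed-point `SCALE` is an assumption.

```python
import math
import random
from typing import Tuple

# Toy primes (the 10,000th and 100,000th primes): far too small to be secure.
P, Q = 104729, 1299709
SCALE = 10_000  # fixed-point scale for encoding float gradients (assumption)


def keygen(p: int = P, q: int = Q) -> Tuple[int, Tuple[int, int]]:
    """Return the public modulus n and private key (lambda, mu), using generator g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)                               # modular inverse (Python 3.8+)
    return n, (lam, mu)


def encrypt(n: int, m: int, rng: random.Random) -> int:
    """Enc(m) = g^m * r^n mod n^2; with g = n + 1, g^m mod n^2 equals 1 + m*n."""
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return ((1 + m * n) * pow(r, n, n2)) % n2


def decrypt(n: int, priv: Tuple[int, int], c: int) -> int:
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n) * mu % n


def add_encrypted(n: int, c1: int, c2: int) -> int:
    """Homomorphic addition: Enc(m1) * Enc(m2) mod n^2 decrypts to (m1 + m2) mod n."""
    return (c1 * c2) % (n * n)


def encode(x: float, n: int) -> int:
    return round(x * SCALE) % n  # negative values wrap around modulo n


def decode(m: int, n: int) -> float:
    return (m - n if m > n // 2 else m) / SCALE


rng = random.Random(0)
n, priv = keygen()
client_grads = [[0.12, -0.5], [0.3, 0.25], [-0.02, 0.05]]  # hypothetical gradients

# Each client encrypts its own gradient vector with the public modulus n.
encrypted = [[encrypt(n, encode(g, n), rng) for g in grads] for grads in client_grads]

# The aggregator combines ciphertexts coordinate-wise without decrypting anything.
agg = encrypted[0]
for vec in encrypted[1:]:
    agg = [add_encrypted(n, a, b) for a, b in zip(agg, vec)]

# The private-key holder decrypts only the aggregate, so it sees just the summed gradient.
total = [decode(decrypt(n, priv, c), n) for c in agg]
print(total)
```

Fixed-point encoding is needed because Paillier operates on integers modulo n; the decoded sum is correct as long as it stays well inside the range the modulus can represent.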