
Nothing Personal, It’s Just Insurance

Confidential computing for insurance

How can you offer the best policy without knowing anything personal about the client?

Insurance was invented long before SaaS, "indulgence" memberships to fitness clubs, and the widespread use of business models built on compound interest.

Insurance, as one ad put it, is something better to have and not need than to need and not have.

Put briefly and cynically, the essence of the insurance business is to collect more money in premiums than it pays out in claims, by predicting risks well. None of this denies the usefulness of insurance in individual cases.

The more accurate the forecasts, the more favorable the terms an insurer can offer clients while minimizing its own risk. In other words, an insurer's profit depends on the accuracy of its predictions. And where there are predictions, there are machine learning models, and those models need more data to be accurate.

This includes data that any individual might prefer to keep secret, for example:

  • medical history (tests for toxoplasmosis and a tendency to take unnecessary risks),
  • hobbies (statistics on parachute jumps and equipment rentals over the past year),
  • travel (history of places visited from traditional resorts to flashpoint areas).

Commercial clients hold plenty of sensitive information as well:

  • real data on hazardous industries and the associated environment,
  • statistics on accidents,
  • staff turnover.

All of this data helps build more accurate risk models and better predictions.

At the same time, a growing body of legislation aims to preserve data confidentiality, and ever since the concept of personal data emerged, the list of protected categories has only expanded.

So here is the problem we face:

  1. Insurance companies need better scoring based on their clients' data and on data obtained from external sources.
  2. Personal, confidential, or sensitive data is restricted from free circulation for various reasons.

Questions:

  1. How do you build accurate ML models when there is not enough data?
  2. How do you work with sensitive data, once it is in an insurer's hands, without the risk of sanctions, leaks, attacks, or competitive espionage?
  3. How do you bring in partner data without breaking the law?
  4. How do you send existing data for computation in third-party clouds without bearing the risk of loss?
  5. And, in the end, how do you truly protect the model (the fruit of intellectual labor) and monetize it without fear of model inversion attacks?

At Guardora, we have encountered the cases listed below, in which insurers needed to protect data and then train ML algorithms on it.

  1. Offering customers more personalized insurance plans.
  2. Processing customer medical data to improve the accuracy of risk assessment without compromising personal information.
  3. Integrating data from multiple sources to create a comprehensive scoring model that takes into account both internal and external factors.
  4. Detecting fraud and abuse in submitted reimbursement claims.
  5. Working with insurance claim history to create a model for predicting the probability of an insured event.
  6. Exchanging data between insurance companies to improve models without violating customer privacy.
  7. Processing data on customers' behavior (e.g., via sensors or apps) while maintaining their anonymity.
  8. Pricing.

The following Privacy-Preserving Machine Learning (PPML) methods, protocols, and approaches are the ones most commonly mentioned in the context of insurance and AI:

  • Federated learning: trains a distributed model without transferring the data; the data never leaves the client's perimeter.
  • Homomorphic encryption: the model is trained on encrypted data without decrypting it; the data remains under reliable cryptographic protection at all times.
  • Secure multi-party computation: lets participants train a model jointly without revealing their data to each other; no participant ever gains access to another's data.
  • Differential privacy: the data is used in training with noise added, preventing individual records from being identified afterwards.
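To make this less abstract, here is a minimal, self-contained Python sketch of how two of these techniques can be combined. It is illustrative only, not Guardora's implementation: two hypothetical insurers jointly fit a linear risk model via federated averaging, and each adds Gaussian noise to its update in the spirit of differential privacy. All data, names, and parameters below are synthetic assumptions.

```python
# Illustrative sketch only (not Guardora's product): federated averaging with
# noised updates. All datasets and parameters here are synthetic assumptions.
import numpy as np

rng = np.random.default_rng(0)

def local_step(w, X, y, lr=0.1):
    """One gradient step on a party's private data (mean squared error loss)."""
    grad = 2 * X.T @ (X @ w - y) / len(y)
    return w - lr * grad

def noised(update, sigma=0.05):
    """Add Gaussian noise before the update leaves the party; a real
    deployment would clip updates and calibrate sigma to a privacy budget."""
    return update + rng.normal(0.0, sigma, size=update.shape)

# Each insurer's private dataset: 3 risk features -> claim cost. Never shared.
true_w = np.array([0.5, -1.0, 2.0])
X_a = rng.normal(size=(100, 3)); y_a = X_a @ true_w + rng.normal(0, 0.1, 100)
X_b = rng.normal(size=(100, 3)); y_b = X_b @ true_w + rng.normal(0, 0.1, 100)

w = np.zeros(3)  # global model held by the coordinator
for _ in range(200):
    # Each party trains locally; only noised model updates are exchanged,
    # never the raw records themselves.
    w = np.mean([noised(local_step(w, X_a, y_a)),
                 noised(local_step(w, X_b, y_b))], axis=0)

print("recovered weights:", np.round(w, 2))  # approximately true_w
```

In practice, a skeleton like this is combined with the other items in the list above, for example secure multi-party computation or homomorphic encryption, so that the coordinator sees only an aggregated or encrypted sum of updates rather than any single insurer's contribution.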

If it's so clear, what are the implementation challenges?

Several obstacles still stand in the way of widespread adoption of these technologies:

  • A shortage of qualified specialists and limited awareness of these technologies.
  • The difficulty of integrating new methods into existing infrastructure and processes.
  • The high cost of implementing and maintaining data protection technologies.
  • Scalability: these techniques can require significant computing resources as data volumes grow.
  • Regulatory uncertainty across jurisdictions, which complicates compliance.
  • The lack of universal standards and practices in data protection, which leads to fragmented solutions.

Join our community on Discord to discuss more specific use cases from the insurance industry and combinations of methods and protocols that enhance privacy in machine learning, and to meet Privacy-Preserving Machine Learning enthusiasts from around the world.
