Could something similar be happening with Large Language Models (LLMs)?
Corporate information security teams are increasingly concerned about the growing number of cases in which employees share sensitive data with ChatGPT, Gemini, Copilot, and other, lesser-known large language models.
Lawyers summarize contracts containing counterparty data, technical specifications, prices, and special terms. Programmers edit code that is critical to their systems' architecture. Translators work with texts full of sensitive information.
At the same time, companies are trying to integrate AI assistants and to replace traditional machine learning models with LLMs.
Here are some use cases we've encountered at Guardora:
- Analyzing recordings of sales calls to score and evaluate interactions with clients.
- Gauging the tone of client interactions in real time. For example, if a client becomes irritated, the system can escalate the conversation to a more senior specialist who can address their concerns and defuse the negative sentiment.
- Clustering users based on interests and other criteria.
- Creating document search and analysis systems for tasks such as summarization, keyword exclusion, and personal data protection.
- Generating text, images, music, and other media from sources containing unique proprietary content.
- Localizing videos from the original language into others.
- Developing new product clusters (assistants or copilots) that support development teams, boosting productivity and reducing time-to-market. They minimize errors and take over tasks that are often disliked, such as writing tests, documentation, and training juniors. These tools focus primarily on generating, verifying, and debugging code and tests.
However, the overarching challenge in this vast market is using LLMs while ensuring that the data remains accessible only to its owner. How can we protect data from third parties, cloud service providers, and malicious actors throughout transmission, storage, model training, quality verification, and result retrieval?
This challenge is precisely what Guardora aims to address. For some use cases, our solutions already secure data throughout its entire journey, including ML model training, quality checks, and inference, and sometimes they even safeguard the model itself as intellectual property.
At Guardora, we focus on ensuring the confidentiality of requests to ML models, and we want to do the same for LLMs and generative models.
Here are the current challenges we need to solve before launching our first prototypes:
- Safeguarding the confidentiality of training datasets during the initial training of LLMs and generative models.
- Preventing unauthorized third-party usage of LLMs and generative models trained from scratch.
- Protecting the confidentiality of data used to fine-tune pre-trained LLMs and generative models.
- Securing the confidentiality of queries submitted to LLMs and generative models (see the sketch after this list for why this is hard).
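
To make the last challenge concrete, here is a minimal, hypothetical sketch of the naive baseline many teams rely on today: redacting obvious identifiers on the client side before a prompt ever leaves the user's machine. This is purely an illustration, not Guardora's mechanism; the regular expressions, the `redact` helper, and the placeholder format are assumptions made for this example.

```python
import re

# Hypothetical illustration only: a naive, client-side redaction pass that strips
# obvious identifiers (emails, phone numbers) from a prompt before it is sent to
# any external LLM endpoint. The patterns and helper names are assumptions made
# for this sketch, not part of any Guardora product.

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(prompt: str):
    """Replace detected identifiers with placeholders and return the redacted
    text plus a local mapping so the caller can restore the originals after
    the model's response comes back."""
    mapping = {}

    def substitute(pattern, label, text):
        def repl(match):
            key = f"<{label}_{len(mapping)}>"
            mapping[key] = match.group(0)
            return key
        return pattern.sub(repl, text)

    text = substitute(EMAIL, "EMAIL", prompt)
    text = substitute(PHONE, "PHONE", text)
    return text, mapping


if __name__ == "__main__":
    safe_prompt, secrets = redact(
        "Summarize the contract signed by jane.doe@acme.com, phone +1 415 555 0199."
    )
    print(safe_prompt)  # placeholders instead of identifiers, sent to the model
    print(secrets)      # stays on the client; never leaves the user's machine
```

Anything these patterns miss still reaches the provider in cleartext, and the surrounding text (contract terms, prices, code) is not protected at all, which is exactly why pattern matching alone cannot solve the challenges listed above.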
If this topic interests you as a user or developer, join our Discord community and participate in the discussion of these pressing issues.