Membership Inference Attacks Against Machine Learning Models

1 January 2025 | S.M.Mehrabul Islam

From a privacy point of view, whether a data sample (image, text, audio, etc.) was used to train a model (in the supervised learning setting) is, in many cases, sensitive information. Inferring or extracting this specific piece of information is what we refer to as Membership Inference.

Membership inference can be used both to support and to undermine privacy.

Published in 2017, "Membership Inference Attacks Against Machine Learning Models" was the first work of its kind. Many aspects of it have inevitably been superseded by now (1 January 2025), given how rapidly the AI landscape evolves, but the paper still offers foundational knowledge that helps us navigate the field.

Reference Paper: R. Shokri, M. Stronati, C. Song, and V. Shmatikov, "Membership Inference Attacks Against Machine Learning Models," IEEE Symposium on Security and Privacy (S&P), 2017.


What is Membership Inference Attack?

A Membership Inference Attack is one of the most popular privacy attacks on machine learning models: it tries to determine whether a given input sample was part of the model's training dataset or not.

To demonstrate that the attack is realistic, the authors consider the worst-case scenario from the attacker's perspective.

Assumptions

Objective

Attacking black-box models is more complex than attacking white-box models, whose structure and parameters are known to the adversary. The authors evaluate their techniques against popular MLaaS (Machine Learning as a Service) platforms, which provide black-box access to machine learning models via API calls.
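For concreteness, here is a minimal sketch of what "black-box access" means in this setting: the adversary can only submit inputs and read back prediction (confidence) vectors. The `query_target` wrapper and type alias below are illustrative assumptions, not part of any specific MLaaS API.

```python
from typing import Callable
import numpy as np

# Hypothetical black-box interface: the adversary never sees weights or
# architecture, only the probability vector returned for each input.
BlackBoxModel = Callable[[np.ndarray], np.ndarray]  # (n, d) -> (n, num_classes)

def query_target(model: BlackBoxModel, x: np.ndarray) -> np.ndarray:
    """Query the target model and return its prediction vector(s).

    In a real attack this would be an API call to the MLaaS endpoint;
    here it is just a function call on a stand-in model.
    """
    return model(np.atleast_2d(x))  # each row is a probability vector over classes
```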


Approach

The key observation that guided the authors was that machine learning models often behave differently on the data that they were trained on versus the data that they “see” for the first time.
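As a quick toy illustration of that observation (a sketch only; the scikit-learn random forest and synthetic dataset stand in for a real target model and are not from the paper):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Toy stand-in for a target model; a real target would sit behind an API.
X, y = make_classification(n_samples=2000, n_features=20, n_classes=4,
                           n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
target = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Members (training data) typically receive higher, more peaked confidence
# than non-members (unseen data); this gap is the signal the attack exploits.
member_conf = target.predict_proba(X_train).max(axis=1).mean()
nonmember_conf = target.predict_proba(X_test).max(axis=1).mean()
print(f"mean top confidence: members={member_conf:.3f}, non-members={nonmember_conf:.3f}")
```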

So they use machine learning itself to detect this difference in the target model's behavior, converting the Membership Inference problem into a Binary Classification problem, an adversarial use of machine learning.

To train such an attack model, the authors introduce a novel Shadow Training technique: the attack model is trained against proxy targets (shadow models) whose training datasets the attacker knows.


Shadow Training (Key Innovation)

(Figure: overview of shadow training)

  1. Create multiple Shadow Models to imitate the behavior of the Target Model

  2. Generate training data for the Shadow Models

    An adversary can generate synthetic training data using the Target Model itself (model-based synthesis, sketched after this list) or using statistics about the underlying population (statistics-based synthesis). Alternatively, the adversary might have access to a potentially noisy version of the Target Model's training dataset (real-world data).

    In this work, the authors evaluate the worst-case scenario, in which the shadow training dataset is disjoint from the target model's training dataset. In the real world the two will usually overlap, since the adversary has some idea of the data population, and such overlap can only improve attack performance.

  3. Train (Supervised) the Shadow Models using the generated data

    The shadow models must be trained similarly to the target model. This is feasible when the target model's training algorithm and architecture are known. In the case of MLaaS (Machine Learning as a Service), the attacker can simply train the shadow models with exactly the same service (e.g., Google Prediction API) that was used to train the target model.

  4. Train (Supervised) the Attack Model on the labeled inputs and outputs of the shadow models

    The attack model learns to recognize differences in the shadow models' behavior when they operate on inputs from their own training datasets versus inputs they did not encounter during training (see the sketches after this list).
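Step 2's model-based synthesis can be thought of as a hill-climbing search that keeps a random perturbation only if it raises the target model's confidence in the desired class. The sketch below is a simplified, hypothetical version of that idea; the `target_predict` callable, the proposal scheme, and the thresholds are assumptions, not the paper's exact algorithm.

```python
import numpy as np

def synthesize_record(target_predict, n_features, target_class,
                      conf_threshold=0.9, max_iters=1000, k=3, rng=None):
    """Model-based synthesis (simplified): hill-climb toward inputs that the
    black-box target classifies as `target_class` with high confidence.

    target_predict: function mapping a (1, n_features) array to a (1, C)
    probability vector. All thresholds here are illustrative.
    """
    rng = rng or np.random.RandomState(0)
    x = rng.uniform(size=n_features)                 # random starting record
    best_conf = target_predict(x[None, :])[0][target_class]

    for _ in range(max_iters):
        candidate = x.copy()
        flip = rng.choice(n_features, size=k, replace=False)
        candidate[flip] = rng.uniform(size=k)        # perturb a few features
        conf = target_predict(candidate[None, :])[0][target_class]
        if conf >= best_conf:                        # keep improving proposals
            x, best_conf = candidate, conf
            if (best_conf >= conf_threshold and
                    int(np.argmax(target_predict(x[None, :])[0])) == target_class):
                return x                             # confident enough: accept
    return None                                      # synthesis failed for this class
```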
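And putting the remaining steps together: a minimal sketch that trains shadow models on known splits (step 3) and labels their outputs to form the attack model's training set (step 4), assuming scikit-learn MLPs as shadow models and shadow data already obtained via one of the options in step 2. The helper name and hyperparameters are illustrative, not from the paper's code.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

def build_attack_dataset(shadow_X, shadow_y, n_shadow_models=10, seed=0):
    """Train shadow models on known splits and label their outputs.

    Returns one record per queried point: the shadow model's prediction
    vector, the point's true class, and its in/out membership label.
    """
    rng = np.random.RandomState(seed)
    vectors, classes, membership = [], [], []

    for i in range(n_shadow_models):
        # Each shadow model gets its own train / held-out split, so the
        # attacker knows the ground-truth membership of every record.
        X_in, X_out, y_in, y_out = train_test_split(
            shadow_X, shadow_y, test_size=0.5, random_state=rng.randint(1 << 30))

        shadow = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300,
                               random_state=i).fit(X_in, y_in)

        for X_part, y_part, in_label in ((X_in, y_in, 1), (X_out, y_out, 0)):
            vectors.append(shadow.predict_proba(X_part))
            classes.append(y_part)
            membership.append(np.full(len(y_part), in_label))

    return np.vstack(vectors), np.concatenate(classes), np.concatenate(membership)
```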

Attack Model Training

This binary classification task can be performed using any state-of-the-art machine learning framework or service.
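For instance, continuing the hypothetical `build_attack_dataset` helper above: the paper trains one binary in/out classifier per output class, which a sketch might mirror as follows (the gradient-boosting choice is an assumption, not the paper's exact attack model).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def train_attack_models(vectors, classes, membership):
    """Train one binary membership classifier per output class."""
    models = {}
    for c in np.unique(classes):
        idx = classes == c
        models[c] = GradientBoostingClassifier().fit(vectors[idx], membership[idx])
    return models

def infer_membership(attack_models, target_prob_vector):
    """Given the target model's prediction vector for a candidate record,
    guess whether that record was in the target's training set (1) or not (0)."""
    predicted_class = int(np.argmax(target_prob_vector))
    return int(attack_models[predicted_class]
               .predict(np.asarray(target_prob_vector).reshape(1, -1))[0])
```

In use, the adversary would query the target model for a candidate record (for example via the hypothetical `query_target` wrapper earlier) and pass the returned probability vector to `infer_membership`.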

The proposed techniques are generic and not based on any particular dataset or model type.

Evaluation

Datasets

CIFAR-10, CIFAR-100, Purchase dataset (based on Kaggle's Acquire Valued Shoppers challenge), Location dataset (from a publicly available set of mobile users' location "check-ins"), Texas Hospital Stays dataset, MNIST, UCI Adult (Census Income)

Target Models

Google Prediction API, Amazon ML, Locally Implemented Neural Network

Conditions

Key Findings

  1. The accuracy of the attack varies considerably across classes, because the size and composition of the training data differ from class to class, and it also depends heavily on the dataset.
    (Figure: attack precision by class)
  2. Attack Accuracy on Purchase Dataset
    (Figure: attack accuracy on the Purchase dataset)
  3. Effect of Shadow Training Data
    • Even with a noisy version of the real data (disjoint from the target model's training dataset but sampled from the same population), the attack still outperforms the baseline.
      This shows the attack is robust to inaccurate assumptions by the attacker about the distribution of the target model's training data.
    • The accuracy of the attack using marginal-based (statistic-based) synthetic data is noticeably reduced versus real data, but is nevertheless very high for most classes.
    • The attack using model-based synthetic data exhibits a dual behavior. For most classes its precision is high and close to that of the attacks that use real data for shadow training, but for a few classes precision is very low (less than 0.1). The reason is that those classes are under-represented in the target model's training dataset, which makes it hard for the attacker to synthesize inputs that the target model classifies with high confidence.
    (Figures: real vs. noisy shadow data; effect of the shadow training data source)
  4. Effect of the number of classes and training data per class
    • More classes means more signals about the internal state of the model are available to the attacker.
      In other words, models with more output classes need to remember more about their training data, so they leak more information, which results in higher attack accuracy.
    • The more data that is associated with a given class, the lower the attack precision for that class.
    (Figure: effect of the number of classes)
  5. Effect of Overfitting
    • Overfitting is not the only factor that causes a model to be vulnerable to membership inference. The structure and type of the model also contribute to the problem.
    • For a given type of model, the more overfitted the model, the more it leaks (the train-test accuracy gap, sketched after this list, is a simple way to measure this).
    (Figure: effect of overfitting)
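A common proxy for "how overfitted" a model is, is the gap between its training and test accuracy; a minimal sketch, assuming a scikit-learn-style estimator:

```python
def overfitting_gap(model, X_train, y_train, X_test, y_test):
    """Train-test accuracy gap: a simple proxy for how overfitted a model is.
    Larger gaps generally go hand in hand with more membership leakage."""
    return model.score(X_train, y_train) - model.score(X_test, y_test)
```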

Mitigation Strategies

The authors discuss the following mitigation strategies: restricting the prediction vector to the top k classes, coarsening the precision of the prediction vector, increasing the entropy of the prediction vector (e.g., via a softmax temperature), and using regularization during training.
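The first two are simple output-side filters that the model owner can apply before returning predictions. A minimal sketch, assuming the defender controls the prediction API (the function name and defaults are illustrative):

```python
import numpy as np

def filter_prediction(probs, top_k=3, decimals=2):
    """Output-side mitigations: report only the top-k classes and round
    (coarsen) their probabilities before returning them to the client."""
    probs = np.asarray(probs, dtype=float)
    filtered = np.zeros_like(probs)
    top = np.argsort(probs)[-top_k:]          # keep only the k largest entries
    filtered[top] = np.round(probs[top], decimals)
    return filtered
```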

Conclusion