"NoPeek" to Prevent Reconstruction of Raw Data in Distributed Deep Learning

15 March 2025

Resource 5: NoPeek Paper

Demonstrates that minimizing distance correlation between raw data and intermediary representations reduces leakage of sensitive raw data patterns across client communications while maintaining model accuracy.

Leakage: invertibility/reconstruction of raw data from intermediary representations.

The solution prevents such reconstruction of raw data while retaining the information required to sustain good classification accuracy. The approach is based on minimizing a statistical dependency measure called distance correlation.

Distance Correlation: a powerful measure of non-linear (and linear) statistical dependence between random variables.
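As a minimal sketch, sample distance correlation (following Székely's estimator, not code from the paper) can be computed from double-centered pairwise distance matrices; the function and variable names below are my own:

```python
import numpy as np

def pairwise_dist(x):
    # Euclidean distance matrix between the rows (samples) of x
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))

def double_center(d):
    # subtract row means and column means, add back the grand mean
    return d - d.mean(0, keepdims=True) - d.mean(1, keepdims=True) + d.mean()

def distance_correlation(x, z):
    # sample distance correlation between batches x (n x p) and z (n x q)
    A = double_center(pairwise_dist(x))
    B = double_center(pairwise_dist(z))
    dcov2 = (A * B).mean()                # squared distance covariance
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return np.sqrt(dcov2 / denom) if denom > 0 else 0.0
```

The value is 1 for exact affine dependence (e.g. `z = 3 * x + 1`) and near 0 for independent samples, which is what makes it usable as a leakage penalty: unlike Pearson correlation, it also detects non-linear dependence.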

In the worst-case reconstruction attack setting, the attacker has access to a leaked subset of training samples along with the corresponding activations at a chosen layer. These activations are exposed to the other client/server by design, since sharing them is what makes distributed learning of the deep network possible.
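This worst-case attacker can be sketched as a decoder network trained on the leaked (raw sample, activation) pairs to invert the shared activations; the `Decoder` architecture and `fit_attacker` helper below are hypothetical illustrations, not the paper's exact testbed:

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    # hypothetical attacker model: maps shared activations back to raw inputs
    def __init__(self, act_dim, raw_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(act_dim, 128), nn.ReLU(), nn.Linear(128, raw_dim))

    def forward(self, z):
        return self.net(z)

def fit_attacker(decoder, leaked_raw, leaked_acts, steps=300, lr=1e-3):
    # train the decoder to reconstruct leaked raw samples from their activations
    opt = torch.optim.Adam(decoder.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(decoder(leaked_acts), leaked_raw)
        loss.backward()
        opt.step()
    return decoder
```

Once fitted, the decoder is applied to activations of samples the attacker never saw; NoPeek aims to make this inversion fail by stripping the activations of statistical dependence on the raw data.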

Before applying NoPeek

successful-reconstruction-attack

After applying NoPeek

failed-reconstruction-attack

Two popular distributed learning settings where this attack is highly relevant:

Related attacks in this setting include model extraction, model inversion, malicious training, adversarial examples (evasion attacks), and membership inference.

Existing Solutions

Method

solution-method

The key idea of the proposed method is to reduce information leakage by adding an additional loss term (distance correlation between the raw data and the shared activations) to the commonly used categorical cross-entropy classification loss.
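A minimal PyTorch sketch of this combined objective, assuming a differentiable batch estimate of distance correlation (the weight `alpha` and the function names are my own, not from the paper):

```python
import torch
import torch.nn.functional as F

def dcor(x, z):
    # differentiable sample distance correlation between two batches
    def centered_dist(t):
        d = torch.cdist(t, t)
        return d - d.mean(0, keepdim=True) - d.mean(1, keepdim=True) + d.mean()
    A = centered_dist(x.flatten(1))
    B = centered_dist(z.flatten(1))
    dcov2 = (A * B).mean()
    denom = ((A * A).mean() * (B * B).mean()).sqrt()
    return (dcov2 / denom.clamp_min(1e-12)).clamp_min(0).sqrt()

def nopeek_loss(inputs, activations, logits, labels, alpha=0.1):
    # total loss = alpha * dCor(raw inputs, shared activations) + cross-entropy
    return alpha * dcor(inputs, activations) + F.cross_entropy(logits, labels)
```

Minimizing the first term decorrelates the shared activations from the raw inputs (hindering reconstruction), while the cross-entropy term preserves task-relevant information; `alpha` controls the privacy-utility tradeoff.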

Reconstruction Attack Testbed

reconstruction-attack-testbed

Privacy-Utility Tradeoff on UTKFace

tradeoff-utk

We show the l2 reconstruction error of a baseline strategy that adds uniform noise (in red) to the activations of the layer being protected. This prevents reconstruction but yields a model with no classification utility (it performs at chance accuracy). Our NoPeek approach (in blue) attains a much higher classification accuracy on the downstream task (~0.82) than adding uniform noise (chance accuracy) while still preventing reconstruction of the raw data. Both are compared to regular training (in green), which does not prevent reconstruction.

utk-results

diabetic-retinopathy-results