"NoPeek" to Prevent Reconstruction of Raw Data in Distributed Deep Learning

15 March 2025

Resource 5: NoPeek Paper

Demonstrates that minimizing distance correlation between raw data and intermediary representations reduces leakage of sensitive raw data patterns across client communications while maintaining model accuracy.

Leakage: invertibility/reconstruction of raw data from intermediary representations.

The solution prevents such reconstruction of raw data while retaining the information required to sustain good classification accuracy. The approach is based on minimizing a statistical dependency measure called distance correlation.

Distance Correlation: a powerful measure of non-linear (and linear) statistical dependence between random variables.
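As a minimal sketch, sample distance correlation (following Székely's estimator, not code from the paper) can be computed from double-centered pairwise distance matrices; the function and variable names below are my own:

```python
import numpy as np

def pairwise_dist(x):
    # Euclidean distance matrix between the rows (samples) of x
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))

def double_center(d):
    # subtract row means and column means, add back the grand mean
    return d - d.mean(0, keepdims=True) - d.mean(1, keepdims=True) + d.mean()

def distance_correlation(x, z):
    # sample distance correlation between batches x (n x p) and z (n x q)
    A = double_center(pairwise_dist(x))
    B = double_center(pairwise_dist(z))
    dcov2 = (A * B).mean()                # squared distance covariance
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return np.sqrt(dcov2 / denom) if denom > 0 else 0.0
```

The value is 1 for exact affine dependence (e.g. `z = 3 * x + 1`) and near 0 for independent samples, which is what makes it usable as a leakage penalty: unlike Pearson correlation, it also detects non-linear dependence.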

In the worst-case reconstruction attack setting, the attacker has access to a leaked subset of training samples along with the corresponding activations at a chosen layer. These activations are exposed to the other client/server by design, since sharing them is what makes distributed learning of the deep network possible.
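This worst-case attacker can be sketched as a decoder network trained on the leaked (raw sample, activation) pairs to invert the shared activations; the `Decoder` architecture and `fit_attacker` helper below are hypothetical illustrations, not the paper's exact testbed:

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    # hypothetical attacker model: maps shared activations back to raw inputs
    def __init__(self, act_dim, raw_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(act_dim, 128), nn.ReLU(), nn.Linear(128, raw_dim))

    def forward(self, z):
        return self.net(z)

def fit_attacker(decoder, leaked_raw, leaked_acts, steps=300, lr=1e-3):
    # train the decoder to reconstruct leaked raw samples from their activations
    opt = torch.optim.Adam(decoder.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(decoder(leaked_acts), leaked_raw)
        loss.backward()
        opt.step()
    return decoder
```

Once fitted, the decoder is applied to activations of samples the attacker never saw; NoPeek aims to make this inversion fail by stripping the activations of statistical dependence on the raw data.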

Before applying NoPeek

successful-reconstruction-attack

After applying NoPeek

failed-reconstruction-attack

Two popular distributed learning settings where this attack is highly relevant:

Related attacks in this setting include model extraction, model inversion, malicious training, adversarial examples (evasion attacks), and membership inference.

Existing Solutions

Method

solution-method

The key idea of the proposed method is to reduce information leakage by adding an additional loss term (distance correlation between the raw data and the shared activations) to the commonly used categorical cross-entropy classification loss.
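A minimal PyTorch sketch of this combined objective, assuming a differentiable batch estimate of distance correlation (the weight `alpha` and the function names are my own, not from the paper):

```python
import torch
import torch.nn.functional as F

def dcor(x, z):
    # differentiable sample distance correlation between two batches
    def centered_dist(t):
        d = torch.cdist(t, t)
        return d - d.mean(0, keepdim=True) - d.mean(1, keepdim=True) + d.mean()
    A = centered_dist(x.flatten(1))
    B = centered_dist(z.flatten(1))
    dcov2 = (A * B).mean()
    denom = ((A * A).mean() * (B * B).mean()).sqrt()
    return (dcov2 / denom.clamp_min(1e-12)).clamp_min(0).sqrt()

def nopeek_loss(inputs, activations, logits, labels, alpha=0.1):
    # total loss = alpha * dCor(raw inputs, shared activations) + cross-entropy
    return alpha * dcor(inputs, activations) + F.cross_entropy(logits, labels)
```

Minimizing the first term decorrelates the shared activations from the raw inputs (hindering reconstruction), while the cross-entropy term preserves task-relevant information; `alpha` controls the privacy-utility tradeoff.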

Reconstruction Attack Testbed

reconstruction-attack-testbed

Privacy-Utility Tradeoff on UTKFace

tradeoff-utk

We show the l2 reconstruction error of a baseline strategy that adds uniform noise (in red) to the activations of the layer being protected. This prevents reconstruction but yields a model with no classification utility (it performs at chance accuracy). Our NoPeek approach (in blue) attains a much higher classification accuracy on the downstream task (~0.82) than adding uniform noise (chance accuracy) while still preventing reconstruction of the raw data. Both are compared to regular training (in green), which does not prevent reconstruction.

utk-results

diabetic-retinopathy-results