Under review as a conference paper at ICLR 2017

SEMI-SUPERVISED KNOWLEDGE TRANSFER FOR DEEP LEARNING FROM PRIVATE TRAINING DATA

Nicolas Papernot, Pennsylvania State University ([email protected])
Martín Abadi, Google Brain ([email protected])
Úlfar Erlingsson, Google ([email protected])
Ian Goodfellow, OpenAI ([email protected])
Kunal Talwar, Google Brain ([email protected])
ABSTRACT

Some machine learning applications involve training data that is sensitive, such as the medical histories of patients in a clinical trial. A model may inadvertently and implicitly store some of its training data; careful analysis of the model may therefore reveal sensitive information. To address this problem, we demonstrate a generally applicable approach to providing strong privacy guarantees for training data. The approach combines, in a black-box fashion, multiple models trained with disjoint datasets, such as records from different subsets of users. Because they rely directly on sensitive data, these models are not published, but instead used as “teachers” for a “student” model. The student learns to predict an output chosen by noisy voting among all of the teachers, and cannot directly access an individual teacher or the underlying data or parameters. The student’s privacy properties can be understood both intuitively (since no single teacher, and thus no single dataset, dictates the student’s training) and formally, in terms of differential privacy. These properties hold even if an adversary can not only query the student but also inspect its internal workings. Compared with previous work, the approach imposes only weak assumptions on how teachers are trained: it applies to any model, including non-convex models like DNNs. We achieve state-of-the-art privacy/utility trade-offs on MNIST and SVHN thanks to an improved privacy analysis and semi-supervised learning.

1 INTRODUCTION

Some machine learning applications with great benefits are enabled only through the analysis of sensitive data, such as users’ personal contacts, private photographs or correspondence, or even medical records or genetic sequences (Alipanahi et al., 2015; Kannan et al., 2016; Kononenko, 2001; Sweeney, 1997).
Ideally, in those cases, the learning algorithms would protect the privacy of users’ training data, e.g., by guaranteeing that the output model generalizes away from the specifics of any individual user. Unfortunately, established machine learning algorithms make no such guarantee; indeed, though state-of-the-art algorithms generalize well to the test set, they continue to overfit on specific training examples in the sense that some of these examples are implicitly memorized.

Recent attacks exploiting this implicit memorization in machine learning have demonstrated that private, sensitive training data can be recovered from models. Such attacks can proceed directly, by analyzing internal model parameters, but also indirectly, by repeatedly querying opaque models to gather data for the attack’s analysis. For example, Fredrikson et al. (2015) used hill-climbing on the output probabilities of a computer-vision classifier to reveal individual faces from the training data. Because of those demonstrations—and because privacy guarantees must apply to worst-case outliers, not only the average—any strategy for protecting the privacy of training data should prudently assume that attackers have unfettered access to internal model parameters.

* Work done while the author was at Google.

To protect the privacy of training data, this paper improves upon a specific, structured application of the techniques of knowledge aggregation and transfer (Breiman, 1994), previously explored by Nissim et al. (2007b), Pathak et al. (2010), and particularly Hamm et al. (2016). In this strategy, first, an ensemble (Dietterich, 2000) of teacher models is trained in such a way that each model is based on a disjoint subset of the sensitive data. Then, using auxiliary, unlabeled non-sensitive data, a student model is trained on the aggregate output of the ensemble, such that the student learns to accurately mimic the ensemble.
Intuitively, this strategy ensures that the student does not depend on the details of any single sensitive training data point (e.g., of any single user), and, thereby, the privacy of the training data is protected even if attackers can observe the student’s internal model parameters.

This paper shows how this strategy’s privacy guarantees can be strengthened by restricting student training to a limited number of teacher votes, and by revealing only the topmost vote after carefully adding random noise. Furthermore, we introduce an improved privacy analysis that makes the strategy generally applicable to machine learning algorithms with high utility and meaningful privacy guarantees—in particular, when combined with semi-supervised learning.

To establish strong privacy guarantees, it is important to limit the student’s access to its teachers, so that the student’s exposure to teachers’ knowledge can be meaningfully quantified and bounded. Fortunately, there are many techniques for speeding up knowledge transfer that can reduce the rate of student/teacher consultation during learning. We describe four such techniques in this paper, the most effective of which makes use of generative adversarial networks (Goodfellow et al., 2014) applied to semi-supervised learning. We use the implementation proposed by Salimans et al. (2016). Like all semi-supervised learning methods, it assumes the student has access to additional, unlabeled data, which, in this context, must be public or non-sensitive. This assumption should not greatly restrict our method’s applicability: even when learning on sensitive data, a non-overlapping, unlabeled set of data typically exists, from which semi-supervised methods can extract distribution priors. For instance, large public datasets exist for text and images, and similarly even for medical data.
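To make the ensemble-then-student strategy concrete, here is a minimal sketch of its first step: partitioning the sensitive data into disjoint shards and training one teacher per shard. The `CentroidTeacher` class and all other names are illustrative stand-ins, not the models used in the paper; any classifier can play the teacher role.

```python
import numpy as np

def partition(X, y, n_teachers, seed=0):
    """Split (X, y) into n_teachers disjoint shards, one per teacher."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    shards = np.array_split(order, n_teachers)
    return [(X[idx], y[idx]) for idx in shards]

class CentroidTeacher:
    """Toy stand-in for a teacher model: nearest class centroid."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self
    def predict(self, X):
        # Squared distance from each input to each class centroid.
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(-1)
        return self.classes_[d.argmin(axis=1)]

# Train one teacher per disjoint shard of the (here synthetic) sensitive data.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2)) + np.repeat([[0, 0], [4, 4], [0, 4]], 100, axis=0)
y = np.repeat([0, 1, 2], 100)
teachers = [CentroidTeacher().fit(Xi, yi) for Xi, yi in partition(X, y, n_teachers=5)]
```

Because the shards are disjoint, each individual training record influences exactly one teacher, which is what bounds any single record's effect on the ensemble's vote.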
It seems intuitive, or even obvious, that a student machine learning model will provide good privacy when trained without access to sensitive training data, apart from a few, noisy votes from a teacher quorum. However, intuition is not sufficient because privacy properties can be surprisingly hard to reason about; for example, even a single data item can greatly impact machine learning models trained on a large corpus (Chaudhuri et al., 2011).

Therefore, to limit the effect of any single sensitive data item on the student’s learning, precisely and formally, we apply the well-established, rigorous standard of differential privacy (Dwork & Roth, 2014); for this purpose, we use the state-of-the-art moments accountant technique from Abadi et al. (2016), which tightens the privacy bound when the topmost vote has a large quorum. As a result, for MNIST and similar benchmark learning tasks, our methods allow students to provide excellent utility, while our analysis provides meaningful worst-case guarantees. In particular, we can bound the metric for privacy loss (the differential-privacy ε) to a range similar to that of existing, real-world privacy-protection mechanisms, such as Google’s RAPPOR (Erlingsson et al., 2014).

Finally, it is an important advantage that our learning strategy and our privacy analysis do not depend on the details of the machine learning techniques used to train either the teachers or their student. Therefore, the techniques in this paper apply equally well for deep learning methods, or any such learning methods with large numbers of parameters, as they do for shallow, simple techniques. In comparison, Hamm et al. (2016) guarantee privacy only conditionally, for a restricted class of student classifiers—in effect, limiting applicability to logistic regression with convex loss. Also, unlike the methods of Abadi et al.
(2016), which represent the state-of-the-art in differentially-private deep learning, our techniques make no assumptions about details such as batch selection, the loss function, or whether optimization is by gradient descent or otherwise. Even so, as we show in experiments on MNIST and SVHN, our techniques provide a privacy/utility tradeoff that equals or improves upon bespoke learning methods such as those of Abadi et al. (2016). Appendix A further discusses the related work.

Building on this related work, our contributions are as follows:

Figure 1: Overview of the approach: (1) an ensemble of teachers is trained on disjoint subsets of the sensitive data, (2) a student model is trained on public data labeled using the ensemble.

- We demonstrate a general machine learning strategy that provides differential privacy for training data in a “black-box” manner, i.e., independent of the learning algorithm, as demonstrated by Section 4 and Appendix D.
- We improve upon the strategy outlined in Hamm et al. (2016) for learning machine models that protect training data privacy. In particular, our student only accesses the teachers’ top vote and the model does not need to be trained with a restricted class of convex losses.
- We explore four different approaches for reducing the student’s dependence on its teachers, and show how the application of GANs to semi-supervised learning of Salimans et al. (2016) can greatly reduce the privacy loss by radically reducing the need for supervision.
- We present a new application of the moments accountant technique from Abadi et al.
(2016) for improving the differential-privacy analysis of knowledge transfer, which allows the training of students with meaningful privacy bounds.
- We evaluate our framework on MNIST and SVHN, allowing for a comparison of our results with previous differentially private machine learning methods. Our classifiers achieve an (ε, δ) differential-privacy bound of (2.04, 10^-5) for MNIST and (8.19, 10^-6) for SVHN, respectively with accuracy of 98.00% and 90.66%. In comparison, for MNIST, Abadi et al. (2016) obtain a looser (8, 10^-5) privacy bound and 97% accuracy. For SVHN, Shokri & Shmatikov (2015) report approximately 92% accuracy, but with a much looser privacy guarantee.

2 PRIVATE LEARNING WITH ENSEMBLES OF TEACHERS

2.1 TRAINING THE ENSEMBLE OF TEACHERS

Given a dataset (X, Y), where X denotes the set of inputs, and Y the set of labels, we partition the data in n disjoint sets (X_n, Y_n) and train a model separately on each set. As evaluated in Section 4.1, assuming that n is not too large with respect to the dataset size and task complexity, we obtain n classifiers f_i called teachers. We then deploy them as an ensemble making predictions on unseen inputs x by querying each teacher for a prediction f_i(x) and aggregating these into a single prediction.

Aggregation: The privacy guarantees of this teacher ensemble stem from its aggregation. Let m be the number of classes in our task. The label count for a given class j ∈ {1, ..., m} and an input x is the number of teachers that assigned class j to input x: n_j(x) = |{i : i ∈ 1..n, f_i(x) = j}|. If we simply apply plurality—use the label with the largest count—the ensemble’s decision may depend on a single teacher’s vote. Indeed, when two labels have a vote count differing by at most one, there is a tie: the aggregated output changes if one teacher makes a different prediction. Therefore, we add random noise to the vote counts n_j to introduce ambiguity:

    f(x) = argmax_j { n_j(x) + Lap(1/γ) }    (1)

In this equation, γ is a privacy parameter and Lap(b) is the Laplacian distribution with location 0 and scale b.
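The noisy aggregation of Equation (1) can be sketched in a few lines. This is an illustrative implementation with hypothetical names: count the teachers' votes per class, add Laplacian noise of scale 1/γ to each count, and return the argmax.

```python
import numpy as np

def noisy_aggregate(teacher_preds, num_classes, gamma, rng):
    """Noisy-max aggregation of Equation (1): count the teachers' votes per
    class, perturb each count with Laplace(1/gamma) noise, take the argmax."""
    counts = np.bincount(teacher_preds, minlength=num_classes).astype(float)
    counts += rng.laplace(loc=0.0, scale=1.0 / gamma, size=num_classes)
    return int(np.argmax(counts))

rng = np.random.default_rng(0)
votes = np.array([2, 2, 2, 2, 2, 2, 2, 1, 0, 1])  # 10 teachers; large quorum for class 2
label = noisy_aggregate(votes, num_classes=3, gamma=0.5, rng=rng)
```

When the quorum is large, as in this example, the noise rarely flips the outcome; when two classes are nearly tied, the returned label is close to uniform between them, which is the ambiguity the mechanism is designed to introduce.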
Smaller values of this privacy parameter provide stronger privacy guarantees but degrade the accuracy of the aggregation: the noise added can potentially change the respective order of label counts (see Section 4.1).

To continue providing strong privacy guarantees as successive queries are made, larger noise needs to be added. Hence, the model utility is subject to degradation. Furthermore, privacy guarantees do not hold when an adversary has access to the model parameters, or recovers them by making a large number of queries to the model. Indeed, as each teacher f_i was trained without taking into account privacy, it is conceivable that they have sufficient capacity to retain details of the training data. To address these limitations, we train another model, the student, using a fixed number of labels predicted by the teacher ensemble.

2.2 SEMI-SUPERVISED TRANSFER OF THE KNOWLEDGE FROM AN ENSEMBLE TO A STUDENT

We train a student on nonsensitive and unlabeled data, some of which we label using the aggregation mechanism. This student model is the one deployed, in lieu of the teacher ensemble, so as to fix the privacy loss to a value that is constant with respect to the number of user queries made to the student model. Indeed, the privacy loss is now determined by the number of queries made to the teacher ensemble during student training and does not increase as end-users query the deployed student model. Thus, the privacy of users who contributed to the original training dataset is preserved even if the student’s architecture and parameters are public or reverse-engineered by an adversary.

We considered three classes of techniques to maximize the student’s learning while decreasing the number of labels it needs to access: distillation, active learning, and semi-supervised learning (see Appendix C). Here, we only describe the most successful: semi-supervised learning with GANs.
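The student's limited access to the ensemble can be sketched as follows: a fixed budget of public inputs is labeled through the noisy vote, and only those queries are ever made to the teachers, so the privacy cost is set by the budget rather than by end-user traffic. All names and the toy threshold teachers below are hypothetical stand-ins for trained models.

```python
import numpy as np

def label_with_budget(public_X, teachers, num_classes, budget, gamma, seed=0):
    """Label at most `budget` public inputs via the noisy teacher vote.
    The privacy cost depends only on `budget`, not on later student queries."""
    rng = np.random.default_rng(seed)
    queried = public_X[:budget]  # the fixed number of ensemble queries
    labels = []
    for x in queried:
        votes = np.array([t(x) for t in teachers])
        counts = np.bincount(votes, minlength=num_classes).astype(float)
        counts += rng.laplace(scale=1.0 / gamma, size=num_classes)
        labels.append(int(np.argmax(counts)))
    return queried, np.array(labels)

# Toy 1-D threshold "teachers" standing in for models trained on disjoint shards.
teachers = [lambda x, t=t: int(x > t) for t in (0.4, 0.5, 0.6)]
public_X = np.linspace(0.0, 1.0, 50)
Xs, ys = label_with_budget(public_X, teachers, num_classes=2, budget=20, gamma=1.0)
```

The pairs (Xs, ys) form the student's labeled training set; the remaining public inputs stay unlabeled and are consumed by the semi-supervised methods described next.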
Training the student with GANs: The GAN framework involves two machine learning models, a generator and a discriminator. They are trained in a competing fashion, in what can be viewed as a two-player game (Goodfellow et al., 2014). The generator produces samples from the data distribution by transforming vectors sampled from a Gaussian distribution. The discriminator is trained to distinguish samples artificially produced by the generator from samples that are part of the real data distribution. Models are trained via simultaneous gradient descent steps on both players’ costs with the goal of converging to the Nash equilibrium. In practice, such convergence is complex to achieve when the parameterization of each player’s strategy is a non-convex function of the parameters, like a DNN. In their application of GANs to semi-supervised learning, Salimans et al. (2016) made the following modifications. The discriminator is extended from a binary classifier (data vs. generator sample) to a multi-class classifier (one of k classes of data samples, plus a class for generated samples). This classifier is then trained to classify real samples in one of the k classes, and the generated samples in the additional class. Although no formal results yet explain why, the technique was empirically demonstrated to greatly improve semi-supervised learning of classifiers on several datasets, especially when the classifier is trained with feature matching.
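The (k+1)-class discriminator objective described above can be illustrated with a small numpy sketch. This shows only the loss structure, assuming logits are given: real labeled samples are pushed toward their class among the k real classes, generated samples toward the extra class k. The generator, feature matching, and the full training loop of Salimans et al. (2016) are omitted, and all names are illustrative.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def semi_sup_disc_loss(logits_real, labels_real, logits_fake, k):
    """Discriminator loss for the (k+1)-class formulation: real labeled
    samples target their class in 0..k-1, generated samples target class k."""
    p_real = softmax(logits_real)
    p_fake = softmax(logits_fake)
    loss_real = -np.log(p_real[np.arange(len(labels_real)), labels_real]).mean()
    loss_fake = -np.log(p_fake[:, k]).mean()
    return loss_real + loss_fake

k = 3
logits_real = np.array([[4.0, 0.0, 0.0, 0.0]])  # confident in real class 0
labels_real = np.array([0])
logits_fake = np.array([[0.0, 0.0, 0.0, 4.0]])  # confident "generated" (class k)
loss = semi_sup_disc_loss(logits_real, labels_real, logits_fake, k)
```

Unlabeled real data can additionally be pushed toward the union of the k real classes (i.e., away from class k), which is how the method extracts signal from data without labels.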