Literature

Privacy-Preserving Federated Learning

Federated learning is a machine learning technique that allows independent parties to collaboratively train a global model while keeping their local data decentralized. The approach has recently attracted considerable attention because data is dispersed across many sources, and combining it can produce more efficient and accurate machine learning models. Federated learning offers the possibility of building such complex and comprehensive models from information held by different data providers, without revealing any participant's local data to the remaining parties. Typically, each party trains a local model on its own data and then shares only the model's parameters with the other parties and/or a centralized server, which combines them into a global model (a minimal sketch of this workflow appears at the end of this section). Although the data itself remains decentralized, revealing local model parameters can still expose information about the training data to unauthorized parties. Privacy-preserving federated learning has emerged as a research topic to address such problems. This article shares some work in the field of privacy-preserving federated learning, which could be useful in familiarizing yourself with the current state of the art.

One line of work proposes several secure aggregation protocols for federated learning systems in a single-server setting. These include masking with one-time pads; a secret-sharing mechanism to recover the aggregate when users drop out; double masking to prevent information leakage in the presence of a malicious server; an efficient key-exchange protocol that reduces the overall communication cost; and, finally, a protocol deployable on mobile devices that provides pairwise secure connections, authentication, and forward secrecy (the masking idea is sketched at the end of this section).

XORBoost is a protocol for training and inference with gradient-boosted trees in an MPC setting using the Manticore MPC framework, which operates with an offline trusted dealer and provides full-threshold security across an arbitrary number of players. The XORBoost framework supports training on both vertically and horizontally split datasets (i.e., split by feature space and by data points, respectively).

SecureBoost is a vertical federated tree-boosting protocol that uses homomorphic encryption, specifically the Paillier cryptosystem, to achieve security. The gradient values are calculated by the active party, which owns the label information. The gradients are encrypted and sent to the passive parties, which do not have access to the labels but only to certain feature values. The passive parties aggregate the encrypted gradients according to their feature values, exploiting the additive homomorphism of the Paillier scheme. The aggregated gradients are sent back to the active party, which decrypts them to decide the split point for the given node of the tree (this encrypted aggregation step is sketched at the end of this section).

FederBoost introduces vertical and horizontal federated learning protocols for training gradient-boosted tree models using differential privacy and secure aggregation. The vertical protocol does not require any cryptographic operations. The horizontal protocol uses secure aggregation to aggregate the gradients and find the split point for each node. It also includes a secure quantile look-up technique that relies on secure aggregation for distributed bucket construction, which is a non-trivial operation (a histogram-based split-finding sketch is given at the end of this section).
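As a minimal sketch of the parameter-sharing workflow described above, the following toy example simulates three parties locally training a linear model with NumPy while a server averages their parameters into a global model; the data, model, and hyperparameters are made up purely for illustration.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One party's local training: a few gradient steps on a linear model."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # squared-error gradient
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])

# Each party holds its own private data; only parameters are ever exchanged.
parties = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=50)
    parties.append((X, y))

global_w = np.zeros(2)
for _ in range(10):
    # Parties train locally and share only their updated parameters.
    local_ws = [local_update(global_w, X, y) for X, y in parties]
    # The server averages the parameter vectors to form the new global model.
    global_w = np.mean(local_ws, axis=0)

print("estimated global weights:", global_w)
```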
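The one-time-pad masking idea behind secure aggregation can be illustrated with a toy example in which every pair of users shares a random mask that one adds and the other subtracts, so the masks cancel in the server's sum and only the aggregate is revealed. This is only a sketch: the field size is an arbitrary choice, and the secret-sharing, double-masking, and key-exchange components mentioned above are omitted.

```python
import numpy as np

FIELD = 2**31 - 1  # arithmetic is done modulo a public prime (illustrative choice)

def pairwise_masks(num_users, dim, seed=0):
    """Each pair (i < j) shares a random mask; user i adds it, user j subtracts it."""
    rng = np.random.default_rng(seed)
    return {(i, j): rng.integers(0, FIELD, size=dim, dtype=np.int64)
            for i in range(num_users) for j in range(i + 1, num_users)}

def mask_input(user, x, num_users, masks):
    """Return the user's masked update: x plus/minus all masks shared with others."""
    y = x.astype(np.int64) % FIELD
    for v in range(num_users):
        if user < v:
            y = (y + masks[(user, v)]) % FIELD
        elif v < user:
            y = (y - masks[(v, user)]) % FIELD
    return y

num_users, dim = 4, 5
rng = np.random.default_rng(1)
inputs = [rng.integers(0, 100, size=dim, dtype=np.int64) for _ in range(num_users)]

masks = pairwise_masks(num_users, dim)
masked = [mask_input(u, inputs[u], num_users, masks) for u in range(num_users)]

# The server sums the masked vectors; the pairwise masks cancel out.
server_sum = np.zeros(dim, dtype=np.int64)
for y in masked:
    server_sum = (server_sum + y) % FIELD

assert np.array_equal(server_sum, sum(inputs) % FIELD)
print("aggregate recovered by the server:", server_sum)
```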
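To make the SecureBoost exchange concrete, the sketch below uses the third-party python-paillier package (phe) as a stand-in for the protocol's homomorphic encryption: the active party encrypts per-instance gradients, a passive party sums the ciphertexts per feature bucket using Paillier's additive homomorphism, and the active party decrypts the bucket sums to evaluate candidate splits. The gradient values and bucket assignments here are hypothetical.

```python
from phe import paillier  # third-party package: pip install phe

# Active party owns the labels and therefore computes the gradients.
public_key, private_key = paillier.generate_paillier_keypair(n_length=1024)

gradients = [0.3, -1.2, 0.7, 0.1, -0.5, 0.9]            # per-instance gradients
encrypted = [public_key.encrypt(g) for g in gradients]   # sent to the passive party

# Passive party: it knows which instance falls into which feature bucket,
# but sees only ciphertexts, so it can sum them without learning the values.
buckets = [0, 1, 0, 2, 1, 2]  # hypothetical bucket id per instance
bucket_sums = {}
for enc_g, b in zip(encrypted, buckets):
    if b in bucket_sums:
        bucket_sums[b] = bucket_sums[b] + enc_g  # ciphertext addition
    else:
        bucket_sums[b] = enc_g

# Encrypted bucket sums go back to the active party, which decrypts them
# and uses the aggregated gradients to choose the split point.
for b, enc_sum in sorted(bucket_sums.items()):
    print(f"bucket {b}: aggregated gradient = {private_key.decrypt(enc_sum):.2f}")
```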
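For the horizontal FederBoost protocol, the relevant observation is that split finding only needs per-bucket sums of gradients and Hessians, which can be produced by any secure-aggregation routine (such as the masking sketch above); the gain computation then runs on the aggregate. The sketch below uses the standard second-order (XGBoost-style) gain formula with made-up histograms, and plain summation stands in for the secure-aggregation step.

```python
import numpy as np

LAMBDA = 1.0  # L2 regularization term in the gain formula (illustrative)

def best_split(grad_hist, hess_hist):
    """Pick the bucket boundary with the highest second-order gain,
    given per-bucket sums of gradients and Hessians (already aggregated)."""
    G, H = grad_hist.sum(), hess_hist.sum()
    best_gain, best_idx = -np.inf, None
    g_left = h_left = 0.0
    for i in range(len(grad_hist) - 1):  # candidate split after bucket i
        g_left += grad_hist[i]
        h_left += hess_hist[i]
        g_right, h_right = G - g_left, H - h_left
        gain = (g_left**2 / (h_left + LAMBDA)
                + g_right**2 / (h_right + LAMBDA)
                - G**2 / (H + LAMBDA))
        if gain > best_gain:
            best_gain, best_idx = gain, i
    return best_idx, best_gain

# Per-party bucket histograms (hypothetical); in the protocol these would be
# combined with secure aggregation so that only the totals are revealed.
party_grads = [np.array([0.4, -0.2, 0.9, -1.1]), np.array([0.1, -0.6, 0.5, -0.3])]
party_hess = [np.array([1.0, 1.2, 0.8, 1.1]), np.array([0.9, 1.0, 1.1, 0.7])]

grad_hist = sum(party_grads)  # stand-in for the secure-aggregation step
hess_hist = sum(party_hess)

idx, gain = best_split(grad_hist, hess_hist)
print(f"best split after bucket {idx} with gain {gain:.3f}")
```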