Techniques and Challenges while Applying Machine Learning Algorithms in Privacy Preserving Fashion

Authors

  • Artrim Kjamilji

Abstract

Nowadays many different entities collect data of the same nature, but in slightly different environments. For example, different hospitals collect data about their patients’ symptoms and the corresponding disease diagnoses, different banks collect transactions of their customers’ bank accounts, multiple cyber-security companies collect data about log files and the corresponding attacks, etc. It has been shown that if those entities merged their privately collected data into a single dataset and used it to train a machine learning (ML) model, they would often end up with a trained model that outperforms the human experts of the corresponding fields in terms of accurate predictions. However, there is a drawback. Due to privacy concerns, reinforced by laws and ethical reasons, no entity is willing to share its privately collected data with others. The same problem appears during classification over an already trained ML model. On one hand, a user who has an unclassified query (record) doesn’t want to share with the server that owns the trained model either the content of the query (which might contain private data such as a credit card number, IP address, etc.) or the final prediction (classification) of the query. On the other hand, the owner of the trained model doesn’t want to leak any parameter of the trained model to the user. In order to overcome those shortcomings, several cryptographic and probabilistic techniques have been proposed during the last few years to enable both privacy preserving training and privacy preserving classification schemes.
These include anonymization and k-anonymity, differential privacy, secure multiparty computation (MPC), federated learning, Private Information Retrieval (PIR), Oblivious Transfer (OT), garbled circuits and/or homomorphic encryption, to name a few. Theoretical analyses and experimental results show that the current privacy preserving schemes are suitable for real-case deployment, while the accuracy of most of them differs little or not at all from that of the schemes that work in a non-privacy preserving fashion.

Published

2021-06-07

Section

Keynote Speech