Privacy-Preserving Publishing of Knowledge Graphs

Hoang, Anh Tu

Online social networks (OSNs) attract a huge number of users who share their data every day. These data can be shared with third parties for purposes such as data analytics and machine learning. Unfortunately, adversaries can exploit shared data to infer users’ sensitive information. Various anonymization solutions have been proposed to make it harder for adversaries to infer users’ personal information from shared data. However, although OSNs contain both users’ attributes and relationships, previous work considers anonymizing only one of the two: either attributes, modeled as relational data, or relationships, modeled as directed graphs. To cope with this issue, in this thesis we address the research challenge of anonymizing knowledge graphs (KGs), chosen for their flexibility in representing both the attribute values and the relationships of users. The anonymization of KGs is not trivial, since adversaries can exploit both the attributes and the relationships of their victims. In the era of big data, these solutions are significant because they allow data providers to share attribute values and relationships together. Over the last three years, our research has resulted in the definition of anonymization solutions for KGs in several relevant scenarios, namely anonymization of static KGs, sequential anonymization of KGs, and personalized anonymization of KGs. Since KGs are directed graphs, we started our research by investigating anonymization solutions for directed graphs. As anonymization algorithms proposed in the literature (i.e., the Paired k-degree) cannot always anonymize graphs, we first presented the Cluster-Based Directed Graph Anonymization Algorithm (CDGA). We proved that CDGA can always generate anonymized directed graphs. We then analyzed an attack scenario in which an adversary exploits the attribute values and relationships of his/her victims to re-identify them in anonymized KGs.
To protect users in this scenario, we presented the k-Attribute Degree (k-ad) protection model, which ensures that users cannot be re-identified with a confidence higher than 1/k. We proposed the Cluster-Based Knowledge Graph Anonymization Algorithm (CKGA) to anonymize KGs for this scenario. CKGA was designed for a setting in which KGs are anonymized statically. Unfortunately, the adversary can still re-identify his/her victims if he/she has access to multiple versions of the anonymized KG. To cope with this issue, we further presented the k^w-Time-Varying Attribute Degree, which gives users the same protection as k-ad even if the adversary gains access to w consecutive anonymized KGs. In addition, we proposed the Cluster-Based Time-Varying Knowledge Graph Anonymization Algorithm to anonymize KGs while allowing data providers to insert, re-insert, remove, and update nodes and edges of their KGs. However, these solutions do not allow users to specify their own privacy preferences, which is crucial for users requiring strong privacy protection, such as influencers. To this end, we proposed the Personalized k-Attribute Degree, which allows users to specify their own value of k. The effectiveness of the proposed algorithms has been tested with experiments on real-life datasets.
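
The 1/k bound mentioned above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the adversary's background knowledge about a victim consists of the victim's attribute values together with his/her out-degree and in-degree; the formal k-ad definition in the thesis may differ. All names below (signature, max_reidentification_confidence, satisfies_k, and the dictionary keys) are hypothetical and are not taken from the thesis.

```python
from collections import Counter


def signature(user):
    """What the adversary is assumed to know about a user (illustrative only):
    the attribute values plus the out-degree and in-degree."""
    return (frozenset(user["attrs"].items()), user["out_degree"], user["in_degree"])


def max_reidentification_confidence(users):
    """Worst-case chance of picking the right user among those sharing a signature.

    A group of n indistinguishable users gives the adversary confidence 1/n,
    so the worst case is determined by the smallest group.
    """
    if not users:
        return 0.0
    group_sizes = Counter(signature(u) for u in users)
    return 1 / min(group_sizes.values())


def satisfies_k(users, k):
    """True if every signature is shared by at least k users,
    i.e. the re-identification confidence never exceeds 1/k."""
    group_sizes = Counter(signature(u) for u in users)
    return all(size >= k for size in group_sizes.values())
```

In this sketch, an anonymized graph passes the check for a given k only when no user's combination of attribute values and degrees is shared by fewer than k users.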

Privacy-Preserving Publishing of Knowledge Graphs / Anh-tu Hoang, 2020. 33rd cycle, Academic Year 2019/2020.