ICML Poster Unmasking Vulnerabilities: Cardinality Sketches under Adaptive Inputs

Poster

Unmasking Vulnerabilities: Cardinality Sketches under Adaptive Inputs

Sara Ahmadian · Edith Cohen

[ Abstract ]

Abstract: Cardinality sketches are popular data structures that enhance the efficiency of working with large data sets. The sketches are randomized representations of sets that are only of logarithmic size but can support set merges and approximate cardinality (i.e., distinct count) queries. When queries are not adaptive, that is, do not depend on preceding query responses, the design provides strong guarantees of correctly answering a number of queries that is exponential in the sketch size $k$. In this work, we investigate the performance of cardinality sketches in adaptive settings and unveil inherent vulnerabilities: We design an attack against the ``standard'' estimators that constructs an adversarial input using a number of queries that is linear in the sketch size $k$. Empirically, our attack used only $4k$ queries with the widely used HyperLogLog (HLL++)~\cite{hyperloglog:2007,hyperloglogpractice:EDBT2013} sketch. Notably, our attack is simple and uses queries that are not adaptive with the adversarial input constructed by post-processing responses. This suggests that it can be used with natural workloads. Finally, we demonstrate that the vulnerability is inherent for the sketch structures as any estimator can be attacked using a number of queries that is quadratic in $k$, matching a generic upper bound.

Live content is unavailable. Log in and register to view live content