ICML Poster High-Dimensional Geometric Streaming for Nearly Low Rank Data

Poster

High-Dimensional Geometric Streaming for Nearly Low Rank Data

Hossein Esfandiari · Praneeth Kacham · Vahab Mirrokni · David Woodruff · Peilin Zhong

[ Abstract ]

Abstract: We study streaming algorithms for the $\ell_p$ subspace approximation problem. Given points $a_1, \ldots, a_n$ as an insertion-only stream and a rank parameter $k$, the $\ell_p$ subspace approximation problem is to find a $k$-dimensional subspace $V$ such that $(\sum_{i=1}^n d(a_i, V)^p)^{1/p}$ is minimized, where $d(a, V)$ denotes the euclidean distance between $a$ and $V$. When $p = \infty$, we need to find a subspace $V$ that minimizes $\max_i d(a_i, V)$. For $\ell_{\infty}$ subspace approximation, we give a deterministic strong coreset construction algorithm and show that it can be used to compute a $\text{poly}(k, \log n)$ approximate solution. For $\ell_p$ subspace approximation, we show that suitably scaling the points and then using our $\ell_{\infty}$ coreset construction, we can compute a $\text{poly}(k, \log n)$ approximation. Our algorithms are easy to implement and run very fast on large datasets.We also use our strong coreset construction to improve the results in a recent work of Woodruff and Yasuda (FOCS 2022) which gives streaming algorithms for high-dimensional geometric problems such as width estimation, convex hull estimation, volume estimation etc. Their algorithms require $\Omega(d^2)$ bits of space and have an $\Omega(\sqrt{d})$ multiplicative approximation factor. We show that when the rows are $a_1,\ldots,a_n$ are “almost” spanned by a $k$ dimensional space, our streaming coreset construction algorithm can be used to obtain algorithms that use only $O(d \cdot \text{poly}(k, \log n))$ bits of space and have a multiplicative error of $O(\text{poly}(k, \log n))$. When $k \ll d$, our algorithms use a much smaller amount of space while guaranteeing a better approximation.

Live content is unavailable. Log in and register to view live content