首页> 外文OA文献 >Online Row Sampling
【2h】

Online Row Sampling

机译:在线行抽样

摘要

Finding a small spectral approximation for a tall n x d matrix A is a fundamental numerical primitive. For a number of reasons, one often seeks an approximation whose rows are sampled from those of A. Row sampling improves interpretability, saves space when A is sparse, and preserves row structure, which is especially important, for example, when A represents a graph.However, correctly sampling rows from A can be costly when the matrix is large and cannot be stored and processed in memory. Hence, a number of recent publications focus on row sampling in the streaming setting, using little more space than what is required to store the outputted approximation [Kelner Levin 2013] [Kapralov et al. 2014].Inspired by a growing body of work on online algorithms for machine learning and data analysis, we extend this work to a more restrictive online setting: we read rows of A one by one and immediately decide whether each row should be kept in the spectral approximation or discarded, without ever retracting these decisions. We present an extremely simple algorithm that approximates A up to multiplicative error epsilon and additive error delta using O(d log d log (epsilon ||A||_2^2/delta) / epsilon^2) online samples, with memory overhead proportional to the cost of storing the spectral approximation. We also present an algorithm that uses O(d^2) memory but only requires O(d log (epsilon ||A||_2^2/delta) / epsilon^2) samples, which we show is optimal.Our methods are clean and intuitive, allow for lower memory usage than prior work, and expose new theoretical properties of leverage score based matrix approximation.
机译:为一个高n x d矩阵A找到一个小的光谱近似值是一个基本的数值本原。由于多种原因,人们通常会寻找一种近似值,该近似值是从A的行中进行采样的。行采样提高了可解释性,在A稀疏时节省了空间,并保留了行结构,这尤其重要,例如,当A代表图形时但是,当矩阵很大并且无法在内存中存储和处理时,从A正确采样行可能会耗费大量资金。因此,许多最近的出版物集中于流式设置中的行采样,所使用的空间比存储输出的近似值所需的空间少[Kelner Levin 2013] [Kapralov等。 2014]。受机器学习和数据分析在线算法工作量不断增长的启发,我们将这项工作扩展到更具限制性的在线设置:我们逐一读取A行,并立即决定是否应将每一行保留在频谱近似或丢弃,而无需撤消这些决策。我们提出了一种非常简单的算法,使用在线样本O(d log d log(epsilon || A || _2 ^ 2 / delta / epsilon ^ 2)/ epsilon ^ 2)将A近似为乘积误差epsilon和加性误差delta。存储光谱近似值的成本。我们还提出了一种使用O(d ^ 2)内存但仅需要O(d log(epsilon || A || _2 ^ 2 / delta)/ epsilon ^ 2)样本的算法,我们证明这是最优的。我们的方法是简洁直观,比以前的工作占用更少的内存,并公开了基于杠杆得分的矩阵近似的新理论属性。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号