
Two time-efficient Gibbs sampling inference algorithms for biterm topic model


Abstract

The Biterm Topic Model (BTM) is an effective topic model for short texts. However, its standard Gibbs sampling inference method (StdBTM) takes much more time than the corresponding method (StdLDA) for Latent Dirichlet Allocation (LDA). To address this problem, this paper proposes two time-efficient Gibbs sampling inference methods for BTM, SparseBTM and ESparseBTM, which trade off space against time consumption. SparseBTM reduces the computation in StdBTM by both recycling intermediate results and exploiting the sparsity of the count matrix. Theoretically, SparseBTM reduces the time complexity of StdBTM from O(|B|K) to O(|B|K_w), which scales linearly with the sparsity of the count matrix (K_w) instead of the number of topics (K), where K_w < K is the average number of non-zero topics per word type in the count matrix. Experimental results show that, under favorable conditions, SparseBTM is approximately 18 times faster than StdBTM. ESparseBTM builds on SparseBTM and is more time-efficient still: it saves further computation by rearranging the biterm sequence so that more intermediate results can be recycled. In theory, ESparseBTM reduces the time complexity of SparseBTM from O(|B|K_w) to O(R|B|K_w), where 0 < R < 1 is the ratio of the number of biterm types to the number of biterms. Experimental results show that ESparseBTM improves on the time efficiency of SparseBTM by between 6.4% and 39.5%, depending on the dataset.
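
To make the two quantities in the complexity bounds concrete, the following is a minimal Python sketch, not the authors' implementation. It computes the sparsity term K_w (the average number of non-zero topics per word type) and the ratio R (biterm types over biterms) on made-up toy data, then compares the three per-sweep operation counts named in the abstract. The function names, the toy count matrix, and the toy biterm list are all hypothetical. The sketch also assumes that |B| denotes the number of biterms and that a biterm is an unordered word pair.

```python
def avg_nonzero_topics_per_word(word_topic_counts):
    """K_w: average number of topics with a non-zero count per word type.

    word_topic_counts is a list of rows, one row per word type,
    each row holding K topic counts (hypothetical toy layout).
    """
    nonzero_per_word = [sum(1 for c in row if c > 0) for row in word_topic_counts]
    return sum(nonzero_per_word) / len(nonzero_per_word)


def biterm_type_ratio(biterms):
    """R: number of distinct biterm types divided by the number of biterms.

    Assumption: a biterm is an unordered word pair, so (a, b) and (b, a)
    count as the same type.
    """
    types = {tuple(sorted(b)) for b in biterms}
    return len(types) / len(biterms)


# Toy data (made up for illustration): 4 word types, K = 5 topics.
word_topic_counts = [
    [3, 0, 0, 0, 1],
    [0, 2, 0, 0, 0],
    [0, 0, 0, 4, 0],
    [1, 0, 0, 0, 2],
]
# Toy biterm sequence over word ids 0..3; some biterms repeat.
biterms = [(0, 1), (1, 0), (0, 3), (2, 3), (0, 3)]

K = len(word_topic_counts[0])
num_biterms = len(biterms)  # |B|, assumed to be the biterm count
K_w = avg_nonzero_topics_per_word(word_topic_counts)
R = biterm_type_ratio(biterms)

# Operation counts per sweep, following the bounds quoted in the abstract.
cost_std = num_biterms * K            # O(|B| K)
cost_sparse = num_biterms * K_w       # O(|B| K_w)
cost_esparse = R * num_biterms * K_w  # O(R |B| K_w)

print(f"K={K}, K_w={K_w}, R={R}")
print(f"StdBTM ~ {cost_std}, SparseBTM ~ {cost_sparse}, ESparseBTM ~ {cost_esparse}")
```

These counts only illustrate how the bounds scale with K_w and R; they ignore constant factors and the extra memory the sparse methods use, so they say nothing about the measured speedups reported in the abstract.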
