Workshop on noisy user-generated text

No, you're not alone: A better way to find people with similar experiences on Reddit

Abstract

We present a probabilistic clustering algorithm that helps Reddit users find posts discussing experiences similar to their own. The model is built on the BERT Next Sentence Prediction model and reduces the time complexity of clustering all posts in a corpus from O(n^2) to O(n) in the number of posts. We demonstrate that this probabilistic clustering outperforms baseline clustering methods based on Latent Dirichlet Allocation (Blei et al., 2003) and Word2Vec (Mikolov et al., 2013). Furthermore, our probabilistic clustering is highly coherent with the exhaustive O(n^2) comparison algorithm, in which the similarity between every pair of posts is computed. Because each BERT computation carries a high runtime overhead, this reduction makes the BERT Next Sentence Prediction model more practical for unsupervised clustering tasks.
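
The abstract does not spell out the algorithm, so the following is only an illustrative sketch of how the number of similarity calls can grow linearly rather than quadratically in the number of posts. It is a simple greedy, representative-based scheme, not the paper's probabilistic method: each post is compared against at most `max_clusters` cluster representatives instead of against every other post. The function names, the `threshold` and `max_clusters` parameters, and the token-overlap `jaccard` score (a cheap stand-in for a BERT Next Sentence Prediction score) are all assumptions made for this example.

```python
from typing import Callable, List, Tuple


def jaccard(a: str, b: str) -> float:
    """Toy similarity: token-set overlap. Stand-in for a BERT NSP score."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def cluster_by_representatives(
    posts: List[str],
    similarity: Callable[[str, str], float] = jaccard,
    threshold: float = 0.3,   # hypothetical cut-off for "similar enough"
    max_clusters: int = 10,   # hypothetical cap that bounds work per post
) -> Tuple[List[List[int]], int]:
    """Assign each post to the most similar cluster representative.

    Returns the clusters (as lists of post indices) and the number of
    similarity calls made, which is at most len(posts) * max_clusters.
    """
    representatives: List[str] = []   # first post of each cluster
    clusters: List[List[int]] = []
    calls = 0

    for idx, post in enumerate(posts):
        best_score, best_cluster = -1.0, -1
        for c, rep in enumerate(representatives):
            calls += 1
            score = similarity(post, rep)
            if score > best_score:
                best_score, best_cluster = score, c

        cap_reached = len(representatives) >= max_clusters
        if best_cluster >= 0 and (best_score >= threshold or cap_reached):
            clusters[best_cluster].append(idx)
        else:
            # No representative is close enough: start a new cluster.
            representatives.append(post)
            clusters.append([idx])

    return clusters, calls
```

With a fixed `max_clusters`, the expensive similarity function is called at most `len(posts) * max_clusters` times, whereas the exhaustive approach needs n(n-1)/2 calls. Swapping `jaccard` for a real BERT-based scorer would leave the counting argument unchanged.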