Field independent probabilistic model for clustering multi-field documents

Shanfeng Zhu; Ichigaku Takigawa; Jia Zeng; Hiroshi Mamitsuka

Abstract

We propose a new finite mixture model for clustering multiple-field documents, such as scientific literature with distinct fields: title, abstract, keywords, main text and references. This probabilistic model, which we call the field independent clustering model (FICM), incorporates the distinct word distributions of each field, both to integrate the discriminative abilities of each field and to select the most suitable component probabilistic model for each field. We evaluated the performance of FICM by applying it to the problem of clustering three-field (title, abstract and MeSH) biomedical documents from the TREC 2004 and 2005 Genomics tracks, and two-field (title and abstract) news reports from Reuters-21578. Experimental results showed that FICM outperformed the classical multinomial model and the multivariate Bernoulli model at a statistically significant level on all three collections. These results indicate that, by considering the characteristics of each field, FICM outperformed widely used probabilistic models for document clustering. We further showed that a component model consistent with the nature of the corresponding field achieved better performance, and that considering the diversity of model settings gave a further performance improvement. An extended abstract of parts of the work presented in this paper has appeared in Zhu et al. [Zhu, S., Takigawa, I., Zhang, S., & Mamitsuka, H. (2007). A probabilistic model for clustering text documents with multiple fields. In Proceedings of the 29th European conference on information retrieval, ECIR 2007. Lecture notes in computer science (Vol. 4425, pp. 331-342)].
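
The abstract does not give the model's equations, so the following is only a minimal sketch of the general idea, assuming that fields are conditionally independent given the cluster and that each field may use its own component model. The function names, the toy parameters, and the choice of a Bernoulli model for titles and a multinomial model for abstracts are all illustrative assumptions, not the authors' implementation or settings.

```python
import math

def log_multinomial(counts, probs):
    """Log-likelihood of word counts under a multinomial model
    (the count-only normalising constant is dropped, since it is
    identical for every cluster)."""
    return sum(c * math.log(p) for c, p in zip(counts, probs) if c > 0)

def log_bernoulli(counts, probs):
    """Log-likelihood of word presence/absence under a multivariate
    Bernoulli model; counts are reduced to present (>0) or absent."""
    return sum(math.log(p) if c > 0 else math.log(1.0 - p)
               for c, p in zip(counts, probs))

def responsibilities(doc, weights, params, field_models):
    """Posterior P(cluster | doc) for a mixture in which fields are
    assumed independent given the cluster (hypothetical helper)."""
    log_joint = []
    for k, weight in enumerate(weights):
        total = math.log(weight)
        for field, counts in doc.items():
            # Each field is scored by its own component model.
            total += field_models[field](counts, params[k][field])
        log_joint.append(total)
    # Normalise with the log-sum-exp trick for numerical stability.
    top = max(log_joint)
    log_norm = top + math.log(sum(math.exp(v - top) for v in log_joint))
    return [math.exp(v - log_norm) for v in log_joint]

# Toy setup (all values are made up): two clusters, two fields,
# and a three-word vocabulary per field.
field_models = {"title": log_bernoulli, "abstract": log_multinomial}
weights = [0.5, 0.5]
params = [
    {"title": [0.8, 0.1, 0.1], "abstract": [0.7, 0.2, 0.1]},
    {"title": [0.1, 0.1, 0.8], "abstract": [0.1, 0.2, 0.7]},
]
doc = {"title": [1, 0, 0], "abstract": [3, 1, 0]}

post = responsibilities(doc, weights, params, field_models)
print(post)
```

In a full clustering procedure these posteriors would typically feed an EM-style parameter update; that step is omitted here because the abstract does not describe how the model is fitted.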

Bibliographic details

Source
Information Processing & Management, 2009, Issue 5, pp. 555-570 (16 pages)
Authors
Shanfeng Zhu; Ichigaku Takigawa; Jia Zeng; Hiroshi Mamitsuka
Author affiliations

Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai 200433, China; School of Computer Science, Fudan University, 220 Handan Road, Shanghai 200433, China;

Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan;

Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong;

Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan

Original format: PDF
Language: English
Keywords
document clustering; finite mixture model; multivariate Bernoulli model; multinomial model; field independent clustering model
