Table of contents extraction based on textual similarity and formal aspects
Abstract
An initial organizational table for a document is determined from the textual similarity between entries of the organizational table and target text fragments, without taking text formatting into account. A classifier is then trained to identify text fragment pairs, each consisting of an entry of the organizational table and its corresponding target text fragment, based at least in part on text formatting features. The training uses a set of examples annotated according to the initial organizational table. The initial organizational table is then updated using the trained classifier.
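The two-stage idea in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not taken from the patent: the fragment fields (`text`, `bold`, `font_size`), the function names, both thresholds, the use of `difflib.SequenceMatcher` as the similarity measure, and a simple frequency count standing in for a real trained classifier.

```python
from collections import Counter
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive textual similarity in [0, 1]; formatting is ignored."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def initial_toc(entries, fragments, threshold=0.8):
    """Stage 1: link each entry to its most similar fragment, using text only."""
    links = {}
    for i, entry in enumerate(entries):
        best = max(range(len(fragments)),
                   key=lambda j: similarity(entry, fragments[j]["text"]))
        if similarity(entry, fragments[best]["text"]) >= threshold:
            links[i] = best
    return links


def format_features(fragment):
    """Hypothetical formatting signature of a fragment."""
    return (fragment["bold"], fragment["font_size"])


def train_classifier(fragments, links):
    """Stage 2: learn which formatting signatures look like link targets.

    Training examples are annotated from the initial table: fragments linked
    in stage 1 count as positives, all other fragments as negatives.
    """
    targets = set(links.values())
    pos, neg = Counter(), Counter()
    for j, fragment in enumerate(fragments):
        (pos if j in targets else neg)[format_features(fragment)] += 1

    def looks_like_target(fragment):
        features = format_features(fragment)
        return pos[features] > neg[features]

    return looks_like_target


def update_toc(entries, fragments, links, looks_like_target, loose=0.5):
    """Stage 3: fill unlinked entries, accepting weaker textual matches
    only among fragments whose formatting the classifier accepts."""
    updated = dict(links)
    used = set(links.values())
    for i, entry in enumerate(entries):
        if i in updated:
            continue
        candidates = [j for j, fragment in enumerate(fragments)
                      if j not in used and looks_like_target(fragment)]
        if not candidates:
            continue
        best = max(candidates,
                   key=lambda j: similarity(entry, fragments[j]["text"]))
        if similarity(entry, fragments[best]["text"]) >= loose:
            updated[i] = best
            used.add(best)
    return updated


# Made-up example document: three entries, six text fragments.
entries = ["Introduction", "Methods", "Results and Discussion"]
fragments = [
    {"text": "Introduction", "bold": True, "font_size": 14},
    {"text": "This paper introduces a method.", "bold": False, "font_size": 10},
    {"text": "Methods", "bold": True, "font_size": 14},
    {"text": "We describe the methods used.", "bold": False, "font_size": 10},
    {"text": "Results & Discussion of Findings", "bold": True, "font_size": 14},
    {"text": "The results are discussed here.", "bold": False, "font_size": 10},
]

links = initial_toc(entries, fragments)
classifier = train_classifier(fragments, links)
updated = update_toc(entries, fragments, links, classifier)
print(links)
print(updated)
```

In this toy document the third entry is worded differently from its heading, so the text-only pass leaves it unlinked; the formatting learned from the first two links is what lets the update step recover it.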