
Towards Building a Collection of Web Archiving Research Articles



Abstract

The field of Web Archiving exists in a fluid, fragmented, and heterogeneous state. Part of the problem is that this field is relatively new and its literature is scattered across a wide range of journal and conference venues. This makes the state of Web Archiving as a discipline particularly difficult to ascertain. This paper presents an approach to building a collection of articles about the subject. We begin with a small dataset of articles taken from a Web Archiving Bibliography and then proceed to expand it by crawling the Web and collecting additional documents. The crawled documents are then classified using machine learning classification techniques. We show that by extracting the documents’ titles and abstracts and representing them using the “bag of words” approach, we are able to accurately identify documents from the Web crawler as documents that are about Web Archiving. We also discuss our results in the context of Web Archiving as an emerging field.
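
The classification step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the classifier or the features beyond "bag of words", so everything here is an assumption made for illustration: the multinomial Naive Bayes model, the `bag_of_words` and `NaiveBayes` names, the label names, and the four toy training texts are all hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count word occurrences; word order is discarded."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (an assumed classifier)."""

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)   # label -> number of training documents
        self.word_counts = {}               # label -> Counter of word frequencies
        for text, label in zip(texts, labels):
            self.word_counts.setdefault(label, Counter()).update(bag_of_words(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        bag = bag_of_words(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus smoothed log likelihood of each known word.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for word, n in bag.items():
                if word in self.vocab:      # words never seen in training are ignored
                    score += n * math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical title/abstract snippets standing in for the seed bibliography
# and for unrelated crawled documents.
train_texts = [
    "Web archiving: preserving web pages in web archives",
    "Crawling and replay of archived web sites in the Wayback Machine",
    "Deep learning for image segmentation in medical scans",
    "Protein folding prediction with neural networks",
]
train_labels = ["web_archiving", "web_archiving", "other", "other"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("Temporal coverage of web archives"))  # prints web_archiving
```

In this sketch a crawled document would be reduced to its title and abstract, passed to `predict`, and kept for the collection only when the predicted label is `web_archiving`. A realistic setup would need a far larger labeled seed set and a held-out evaluation before any accuracy claim could be made.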
