
Removing non-substantive content from a web page by removing its text-sparse nodes and removing high-frequency sentences of its text-dense nodes using sentence hash value frequency across a web page collection


Abstract

A method and system for removing chrome (non-substantive content) from a web page are provided. An example system includes a parsing module, a text density analyzer, a content node selector, and a text extractor. The parsing module may be configured to parse a web page into a tree structure. The text density analyzer may be configured to determine a text density score value for each node of the tree structure. The content node selector may be configured to identify one or more nodes of the tree structure as content nodes based on their respective text density score values. The text extractor may be configured to extract text from the content nodes only.
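
The four modules named in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration, not the patented method: the abstract does not say how the text density score is computed, so the sketch uses text characters in a subtree divided by the number of elements in that subtree, and the names `TreeBuilder`, `text_density`, `select_content_nodes`, `extract_text`, `remove_chrome`, and the `threshold` value of 20 are all hypothetical.

```python
from html.parser import HTMLParser


class Node:
    """One element of the tree; children holds child Nodes and text strings in order."""
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Parsing module (illustrative): turns HTML into a simple Node tree."""
    VOID = {"br", "hr", "img", "input", "link", "meta"}  # elements with no end tag
    SKIP = {"script", "style"}                           # never treated as text

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in self.VOID:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace
        if text and self.current.tag not in self.SKIP:
            self.current.children.append(text)


def stats(node):
    """Return (text characters, element count) for the subtree rooted at node."""
    chars, tags = 0, 1
    for child in node.children:
        if isinstance(child, str):
            chars += len(child)
        else:
            c, t = stats(child)
            chars, tags = chars + c, tags + t
    return chars, tags


def text_density(node):
    """Text density analyzer (assumed formula): characters per element."""
    chars, tags = stats(node)
    return chars / tags


def select_content_nodes(node, threshold):
    """Content node selector: topmost nodes whose density meets the threshold."""
    if text_density(node) >= threshold:
        return [node]
    selected = []
    for child in node.children:
        if isinstance(child, Node):
            selected.extend(select_content_nodes(child, threshold))
    return selected


def extract_text(node):
    """Text extractor: concatenate the text of a node in document order."""
    parts = [c if isinstance(c, str) else extract_text(c) for c in node.children]
    return " ".join(p for p in parts if p)


def remove_chrome(html, threshold=20.0):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return [extract_text(n) for n in select_content_nodes(builder.root, threshold)]


# Made-up sample page: a navigation list, an article body, and a footer.
PAGE = """<html><body>
<ul><li><a href="/">Home</a></li><li><a href="/about">About</a></li></ul>
<div><p>The first paragraph holds enough words to count as real article content.</p>
<p>The second paragraph also carries a full sentence of substantive text.</p></div>
<div><a href="/terms">Terms</a> <a href="/privacy">Privacy</a></div>
</body></html>"""

result = remove_chrome(PAGE)
```

With this sample, the navigation list and footer consist mostly of markup around a few short link labels, so their density falls below the assumed threshold, while the article `div` passes and is the only node whose text is extracted. A real threshold would need tuning against actual pages. The sentence hash frequency step mentioned in the title is not covered by this sketch.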


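Bibliographic details

Publication number: US9449114B2

Publication date: 2016-09-20

Original format: PDF

Applicant/Assignee: JOHN ROPER; DANE GLASGOW

Application number: US20100761272

Inventors: JOHN ROPER; DANE GLASGOW

Filing date: 2010-04-15

Classification: G06F17/30; G06F17/21; G06F17/22

Country: US

Date added to database: 2022-08-21 14:31:38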
