SYSTEM AND METHOD FOR BUILDING DIVERSE LANGUAGE MODELS
Abstract
Disclosed herein are systems, methods, and non-transitory computer-readable storage media for collecting web data in order to create diverse language models. A system configured to practice the method first crawls, such as via a crawler operating on a computing device, a set of documents in a network of interconnected devices according to a visitation policy. The visitation policy is configured to focus on novelty regions for a current language model built from previous crawling cycles, by crawling documents whose vocabulary is considered likely to fill gaps in the current language model. A language model from a previous cycle can be used to guide the creation of a language model in the following cycle. The novelty regions can include documents with high perplexity values over the current language model.
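The perplexity-guided visitation policy described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the model type, the scoring formula, or the crawler interface, so everything below is an assumption made for illustration: an add-one smoothed unigram model stands in for the "current language model", and the names `UnigramModel`, `rank_by_novelty`, and `crawl_cycle` are hypothetical. Candidate documents are given as in-memory (url, text) pairs rather than fetched from a network.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; the abstract names none)."""
    return text.lower().split()


class UnigramModel:
    """Add-one smoothed unigram model, a stand-in for the current language model."""

    def __init__(self, documents):
        self.counts = Counter()
        for doc in documents:
            self.counts.update(tokenize(doc))
        self.total = sum(self.counts.values())
        # The extra slot reserves probability mass for unseen words.
        self.vocab_size = len(self.counts) + 1

    def log_prob(self, token):
        return math.log((self.counts[token] + 1) / (self.total + self.vocab_size))

    def perplexity(self, text):
        """exp of the negative mean token log-probability; 0.0 for empty text."""
        tokens = tokenize(text)
        if not tokens:
            return 0.0
        mean_log_prob = sum(self.log_prob(t) for t in tokens) / len(tokens)
        return math.exp(-mean_log_prob)


def rank_by_novelty(model, candidates):
    """Sort (url, text) pairs so the highest-perplexity documents come first."""
    return sorted(candidates, key=lambda c: model.perplexity(c[1]), reverse=True)


def crawl_cycle(corpus, candidates, budget):
    """One hypothetical cycle: build the model from the corpus collected so far,
    pick the `budget` most novel candidates, and return the enlarged corpus
    together with the URLs that were selected."""
    model = UnigramModel(corpus)
    chosen = rank_by_novelty(model, candidates)[:budget]
    new_corpus = corpus + [text for _, text in chosen]
    return new_corpus, [url for url, _ in chosen]
```

In this sketch, the corpus returned by one call to `crawl_cycle` becomes the input to the next call, which mirrors the statement that a language model from a previous cycle guides the following cycle. A real crawler would need to estimate novelty before downloading a page (for example from anchor text or neighboring pages), which this sketch does not attempt.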