DATA CLEANSING SYSTEM, DATA CLEANSING METHOD, AND DATA CLEANSING PROGRAM
Abstract
PROBLEM TO BE SOLVED: To enable accurate data cleansing even when place names with identical notation, abbreviated notations, building names, personal names, and the like coexist.

SOLUTION: A data cleansing system comprises: a candidate adding unit 102a that, with reference to dictionary data 101b defining addresses, adds each divided character string of an address as a basic token, that is, a candidate for an element constituting the address, and that, if no corresponding word for a divided character string is found in the dictionary data 101b, analyzes the character type of that string and adds it as an analysis token corresponding to the analyzed character type; a tree construction unit 102b that branches and connects the basic tokens and analysis tokens added by the candidate adding unit 102a to construct a tree structure; a cost calculation unit 102c that calculates, for each branch pattern included in the tree structure, a cost to which a weight assigned according to the priority of each token is added; and a candidate selection unit 102e that selects a predetermined branch pattern as a candidate for the address according to the calculated cost.

SELECTED DRAWING: Figure 2
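
The flow described in the abstract (dictionary lookup, character-type fallback, branching, cost calculation, selection) can be illustrated with a minimal Python sketch. This is not the patented implementation. The dictionary contents, the weight values, the romanized sample address, and all function names are hypothetical, and the sketch assumes that the branch pattern with the lowest total cost is the one selected, which the abstract does not state.

```python
from dataclasses import dataclass

# Hypothetical stand-in for dictionary data 101b: surface form -> address element.
DICTIONARY = {
    "tokyo": "prefecture",
    "minato": "ward",
    "shiba": "town",
    "shibakoen": "town",
}

# Hypothetical priority weights; this sketch assumes a lower cost is better.
WEIGHTS = {"basic": 1.0, "analysis": 3.0}


@dataclass(frozen=True)
class Token:
    text: str
    kind: str   # "basic" (found in the dictionary) or "analysis" (character type)
    label: str  # address element name, or the analyzed character type


def char_type(ch: str) -> str:
    """Classify a single character into a coarse character type."""
    if ch.isdigit():
        return "digit"
    if ch.isalpha():
        return "alpha"
    return "symbol"


def candidates_at(text: str, start: int) -> list[Token]:
    """Candidate adding step: basic tokens first, analysis token as fallback."""
    found = [
        Token(text[start:end], "basic", DICTIONARY[text[start:end]])
        for end in range(start + 1, len(text) + 1)
        if text[start:end] in DICTIONARY
    ]
    if found:
        return found
    # No dictionary word starts here: take the run of same-type characters.
    kind = char_type(text[start])
    end = start + 1
    while end < len(text) and char_type(text[end]) == kind:
        end += 1
    return [Token(text[start:end], "analysis", kind)]


def branch_patterns(text: str, start: int = 0):
    """Tree construction step: yield every root-to-leaf token sequence."""
    if start == len(text):
        yield []
        return
    for token in candidates_at(text, start):
        for rest in branch_patterns(text, start + len(token.text)):
            yield [token] + rest


def cost(pattern: list[Token]) -> float:
    """Cost calculation step: sum the weight assigned to each token."""
    return sum(WEIGHTS[token.kind] for token in pattern)


def select_candidate(text: str) -> list[Token]:
    """Candidate selection step: pick the cheapest branch pattern (assumption)."""
    return min(branch_patterns(text), key=cost)
```

For the hypothetical input `"tokyominatoshibakoen4"`, the tree branches at `shiba` versus `shibakoen`. The `shiba` branch leaves `koen` to be handled as an analysis token, so under these weights it costs more, and `select_candidate` returns the pattern containing `shibakoen`.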