IEEE Conference on Communications and Network Security

Text mining for security threat detection: discovering hidden information in unstructured log messages

Abstract

The exponential growth of unstructured messages generated by computer systems and applications in modern computing environments poses a significant challenge for managing and using the information those messages contain. Although these data hold a wealth of information that is useful for advanced threat detection, their sheer volume, variety, and complexity make them difficult to analyze, even for well-trained security analysts. Conventional Security Information and Event Management (SIEM) systems provide some capability to collect, correlate, and detect certain events from structured messages, but their rule-based correlation and detection algorithms fall short in utilizing the information within unstructured messages. Our study explores the possibility of using techniques from data mining, text classification, natural language processing, and machine learning to detect security threats by extracting relevant information from various unstructured log messages collected from distributed non-homogeneous systems. The extracted features are used to run a number of experiments on the Packet Clearing House SKAION 2006 IARPA Dataset, and their prediction capability is evaluated. In comparison with the base case without feature extraction, an average performance gain of 16.73% and a time reduction of 84% were achieved using extracted features only, and a performance gain of 23.48% was attained using both unstructured free-text messages and extracted features. The results also show strong potential for further performance gains from increasing the size of the training datasets and extracting more features from the unstructured log messages.
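The idea of turning a free-text log message into features can be illustrated with a minimal sketch. The abstract does not say which features the authors extracted or how, so everything here is an assumption made for illustration: the `extract_features` function, the keyword list, the IPv4 pattern, and the sample message are all hypothetical and are not taken from the paper or its dataset.

```python
import re
from collections import Counter

# Hypothetical patterns and keywords: the abstract does not list
# the features the authors actually extracted.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
KEYWORDS = ("failed", "denied", "invalid", "error")


def extract_features(message: str) -> dict:
    """Turn one free-text log message into a small dictionary of features."""
    lowered = message.lower()
    # Alphabetic tokens only; digits and punctuation are dropped.
    tokens = re.findall(r"[a-z]+", lowered)
    counts = Counter(tokens)
    features = {
        "num_tokens": len(tokens),
        "num_ipv4": len(IPV4.findall(message)),
    }
    # One binary indicator per keyword.
    for word in KEYWORDS:
        features[f"has_{word}"] = int(counts[word] > 0)
    return features


# Invented sample message, for illustration only.
sample = "login failed for user admin from 192.168.0.7"
features = extract_features(sample)
print(features)
```

Feature dictionaries of this kind could then be passed to a text classifier, either alone or alongside the raw message text, which corresponds to the two configurations the abstract compares.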