Conference: IEEE International Conference on Software Maintenance and Evolution

Semantics-Aware Machine Learning for Function Recognition in Binary Code

Abstract

Function recognition in program binaries serves as the foundation for many binary instrumentation and analysis tasks. However, because binaries are usually stripped before distribution, function information is absent from most of them, and identifying functions in stripped binaries remains a challenge to date. Recent research proposes to recognize functions in binary code through machine learning techniques, where the recognition model, including typical function entry point patterns, is constructed automatically through learning. However, we observed that because previous work leverages only syntax-level features to train the model, binary obfuscation techniques can undermine the pre-learned models in real-world usage scenarios. In this paper, we propose FID, a semantics-based method to recognize functions in stripped binaries. We leverage symbolic execution to generate semantic information and learn the function recognition model through well-performing machine learning techniques. Because FID extracts semantic information from binary code, it adapts effectively to different compilers and optimizations. Moreover, we demonstrate that FID has high recognition accuracy on binaries transformed by widely used obfuscation techniques. We evaluate FID with over four thousand test cases. Our evaluation shows that FID is comparable with previous work on normal binaries and notably outperforms existing tools on obfuscated code.
