Who evaluates the evaluators? On automatic metrics for assessing AI-based offensive code generators

Pietro Liguori; Cristina Improta; Roberto Natella; Bojan Cukic; Domenico Cotroneo


Abstract

AI-based code generators are an emerging solution for automatically writing programs from natural-language descriptions using deep neural networks (Neural Machine Translation, NMT). In particular, code generators have been used for ethical hacking and offensive security testing by generating proof-of-concept attacks. Unfortunately, the evaluation of code generators still faces several issues. The current practice uses output similarity metrics, i.e., automatic metrics that compute the textual similarity of generated code with ground-truth references. However, it is not clear which metric to use, or which metric is most suitable for a specific context. This work analyzes a large set of output similarity metrics on offensive code generators. We apply the metrics to two state-of-the-art NMT models using two datasets containing offensive assembly and Python code with their descriptions in English. We compare the estimates from the automatic metrics with human evaluation and provide practical insights into their strengths and limitations.
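
As a rough illustration of what an output similarity metric computes, the following minimal Python sketch compares generated snippets against a ground-truth reference. It is not taken from the paper: the function names `exact_match` and `token_similarity`, the whitespace tokenization, and the example snippets are all assumptions chosen for illustration, and the ratio from the standard library's `difflib` stands in for the metrics the paper actually studies.

```python
from difflib import SequenceMatcher


def exact_match(generated: str, reference: str) -> float:
    """Return 1.0 if the snippets have identical whitespace-separated tokens, else 0.0."""
    return 1.0 if generated.split() == reference.split() else 0.0


def token_similarity(generated: str, reference: str) -> float:
    """Return difflib's similarity ratio in [0, 1] over whitespace-separated tokens."""
    return SequenceMatcher(None, generated.split(), reference.split()).ratio()


# Hypothetical ground-truth reference and candidate outputs (illustrative only).
reference = "xor eax, eax"
candidates = ["xor eax, eax", "xor  eax,  eax", "mov eax, 0"]

for cand in candidates:
    em = exact_match(cand, reference)
    sim = token_similarity(cand, reference)
    print(f"{cand!r}: exact_match={em}, token_similarity={sim:.2f}")
```

In this sketch the last candidate also sets `eax` to zero, yet it scores low because only the text is compared. This is one way a textual metric can disagree with a human judgment of correctness.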

Bibliographic details

Source
Expert Systems with Applications, 2023, Issue 9, pp. 120073.1-120073.12 (12 pages)
Authors
Pietro Liguori; Cristina Improta; Roberto Natella; Bojan Cukic; Domenico Cotroneo
Affiliations

University of Naples Federico Ⅱ, Naples, Italy;

University of North Carolina at Charlotte, NC, United States of America;

Original format: PDF
Language: English
Keywords
AI-based code generators; Offensive code; Neural machine translation; Software security; Output similarity metrics;
