Journal: Digital Investigation

Ranking algorithms for digital forensic string search hits

Abstract

This research proposes eighteen quantifiable characteristics of allocated files, unallocated clusters, and the string search hits they contain, which can be used to rank string search output by relevancy. We executed a 36-term query across four disks in a synthetic case ("M57 Patents" from DigitalCorpora.org), which produced over two million search hits across nearly 50,000 allocated files and unallocated clusters. We sampled 21,400 search hits from the case, extracted the proposed feature values, trained binary-class (relevant/not-relevant) support vector machine (SVM) models, derived two relevancy ranking functions from the resulting model feature weights, and empirically tested the ranking algorithms. We achieved prediction accuracies of 81.02% and 85.97% for the allocated and unallocated models, respectively. Further research is needed to validate these algorithms on a broader set of real-world cases and/or to adapt them to improve their robustness. Nonetheless, this research provides a starting point for work on relevancy ranking of digital forensic search hits: it proposes an initial set of relevancy ranking features and reports promising empirical results. Rank-ordered output for search queries, similar to what web browsing and digital library users enjoy, is important for digital forensic practitioners because it reduces the analytical burden of text string searching, a valuable analytical technique.
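The step of deriving a ranking function from model feature weights can be illustrated with a minimal sketch. It assumes the SVM is linear, so that each hit can be scored as a weighted sum of its feature values plus a bias; the abstract does not state the kernel. The feature names, weights, and sample hits below are hypothetical and are not the eighteen features or the values from the paper.

```python
from dataclasses import dataclass

# Hypothetical weights standing in for those of a trained linear SVM.
# Names and values are illustrative assumptions, not the paper's features.
WEIGHTS = {
    "hits_in_file": 0.8,
    "distinct_terms_in_file": 1.5,
    "file_size_kb": -0.002,
}
BIAS = -1.0


@dataclass
class SearchHit:
    hit_id: str
    features: dict


def relevancy_score(hit: SearchHit) -> float:
    """Linear decision value w . x + b; a larger value ranks higher."""
    return BIAS + sum(w * hit.features.get(name, 0.0) for name, w in WEIGHTS.items())


def rank_hits(hits):
    """Return the hits sorted from highest to lowest score."""
    return sorted(hits, key=relevancy_score, reverse=True)


# Three made-up hits, used only to exercise the ranking function.
hits = [
    SearchHit("a", {"hits_in_file": 1, "distinct_terms_in_file": 1, "file_size_kb": 500}),
    SearchHit("b", {"hits_in_file": 4, "distinct_terms_in_file": 3, "file_size_kb": 100}),
    SearchHit("c", {"hits_in_file": 2, "distinct_terms_in_file": 1, "file_size_kb": 2000}),
]
ranked_ids = [h.hit_id for h in rank_hits(hits)]
print(ranked_ids)  # ['b', 'a', 'c']
```

Because the abstract reports separate models for allocated files and unallocated clusters, a fuller version of this sketch would presumably keep one weight table per model and pick the table according to where the hit was found.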