Ranking algorithms for digital forensic string search hits

Nicole Lang Beebe; Lishu Liu

Abstract

This research proposes eighteen quantifiable characteristics of allocated files, unallocated clusters, and the string search hits they contain, which can be used to relevancy rank string search output. We executed a 36-term query across four disks in a synthetic case ("M57 Patents" from DigitalCorpora.org), which produced over two million search hits across nearly 50,000 allocated files and unallocated clusters. We sampled 21,400 search hits from the case, extracted the proposed feature values, trained binary-class (relevant/not relevant) support vector machine (SVM) models, derived two relevancy ranking functions from the resulting model feature weights, and empirically tested the ranking algorithms. We achieved prediction accuracies of 81.02% and 85.97% for the allocated and unallocated models, respectively. Further research is needed to validate these algorithms on a broader set of real-world cases and/or to adapt them to improve their robustness. Nonetheless, this research provides an important starting point for research into relevancy ranking algorithms for digital forensic search hits: we proposed an initial set of relevancy ranking features and obtained very promising empirical results. Rank-ordered output for search queries, similar to what web search and digital library users enjoy, is extremely important for digital forensic practitioners because it reduces the analytical burden of text string searching, a valuable analytical technique.
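The abstract says the two ranking functions were derived from the feature weights of the trained SVM models. One common way to turn a linear SVM into a ranking function is to use its decision value w·x + b as the relevancy score: the sign gives the relevant/not-relevant prediction and the magnitude orders the hits. The Python sketch below assumes that approach; the paper's exact formula, kernel, and its eighteen features are not given here, so the feature names (`term_density`, `distinct_terms`, `is_system_file`), the weights, the bias, and the `SearchHit`/`LinearRanker`/`accuracy` names are all hypothetical. It also shows how a prediction accuracy (the kind of figure reported as 81.02% and 85.97%) is computed from analyst-labeled hits.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class SearchHit:
    """One string search hit and its feature values (feature names are hypothetical)."""
    hit_id: str
    features: Dict[str, float]


@dataclass
class LinearRanker:
    """Relevancy ranking function built from a linear SVM's weights and bias.

    Assumption: the weights come from an already trained linear SVM; the values
    used in the demo below are made up for illustration, not taken from the paper.
    """
    weights: Dict[str, float]
    bias: float = 0.0

    def score(self, hit: SearchHit) -> float:
        # SVM decision value w.x + b; a feature missing from the hit counts as 0.
        return self.bias + sum(
            w * hit.features.get(name, 0.0) for name, w in self.weights.items()
        )

    def predict(self, hit: SearchHit) -> int:
        # Binary class from the sign of the decision value: +1 relevant, -1 not relevant.
        return 1 if self.score(hit) >= 0 else -1

    def rank(self, hits: Sequence[SearchHit]) -> List[SearchHit]:
        # Highest decision value first; sorted() is stable, so ties keep input order.
        return sorted(hits, key=self.score, reverse=True)


def accuracy(ranker: LinearRanker, hits: Sequence[SearchHit], labels: Sequence[int]) -> float:
    """Share of hits whose predicted class matches the analyst's +1/-1 label."""
    if len(hits) != len(labels) or not hits:
        raise ValueError("need one label per hit and at least one hit")
    correct = sum(ranker.predict(h) == y for h, y in zip(hits, labels))
    return correct / len(hits)


if __name__ == "__main__":
    # Hypothetical model for hits found in allocated files.
    allocated_model = LinearRanker(
        weights={"term_density": 2.0, "distinct_terms": 1.5, "is_system_file": -3.0},
        bias=-1.0,
    )
    hits = [
        SearchHit("h1", {"term_density": 0.2, "distinct_terms": 0.1, "is_system_file": 1.0}),
        SearchHit("h2", {"term_density": 0.9, "distinct_terms": 0.8, "is_system_file": 0.0}),
        SearchHit("h3", {"term_density": 0.5, "distinct_terms": 0.4, "is_system_file": 0.0}),
    ]
    for hit in allocated_model.rank(hits):
        print(hit.hit_id, round(allocated_model.score(hit), 2))
    print("accuracy:", accuracy(allocated_model, hits, [-1, 1, -1]))
```

With these made-up values the hits come out in the order h2, h3, h1. Because the paper derived separate functions for allocated files and unallocated clusters, a natural use of this sketch is one `LinearRanker` per data type, ranking each hit list with its own model, since decision values from different models are not necessarily on a comparable scale.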

Bibliographic details

Source
Digital Investigation | 2014, No. 8 | pp. S124-S132 | 9 pages
Authors
Nicole Lang Beebe; Lishu Liu
Affiliations

The University of Texas at San Antonio, Department of Information Systems and Cyber Security, San Antonio, TX, USA (both authors)

Indexing information
Original format: PDF
Language: English
Keywords
Digital forensics; String search; Ranking; Ranked list; Relevancy

Similar literature

1. Clustering digital forensic string search output [J]. Nicole L. Beebe, Lishu Liu. Digital Investigation, 2014, No. 4.
2. Post-retrieval search hit clustering to improve information retrieval effectiveness: Two digital forensics case studies [J]. Nicole Lang Beebe, Jan Guynes Clark, Glenn B. Dietrich. Decision Support Systems, 2011, No. 4.
3. A Link-click-concept Based Ranking Algorithm for Ranking Search Results [J]. S. Geetha Rani, M. Sorana Mageswari. Indian Journal of Science and Technology, 2014, No. 10.
4. A Term Distribution Visualization Approach to Digital Forensic String Search [C]. Moses Schwartz, L.M. Liebrock. Visualization for Computer Security, 2008.
5. Improving information retrieval effectiveness in digital forensic text string searches: Clustering search results using self-organizing neural networks [D]. Beebe, Nicole L., 2007.
6. SAM: String-based sequence search algorithm for mitochondrial DNA database queries [O]. Alexander Röck, Jodi Irwin, Arne Dür.
7. A New Ranking Algorithm for Ranking Search Results of Search Engine based on Personalized User Profile [O]. S. Geetha Rani, M. Phil, 2014.
