Search problems for speech and audio sequences.

Abstract

The modern proliferation of very large audio and video databases has created a need for effective methods of indexing and searching highly variable or uncertain data. Classical search and indexing algorithms deal with clean input sequences. However, an index created from speech or music transcriptions is marked with errors and uncertainties stemming from the use of imperfect statistical models in the transcription process. This thesis presents novel algorithms, analyses, and general techniques and tools for effective indexing and search that not only tolerate but exploit this uncertainty.

We have devised a new music identification technique in which each song is represented by a distinct sequence of music sounds, called "music phonemes." We learn the set of music phonemes, as well as a unique sequence of music phonemes characterizing each song, using an unsupervised algorithm. We also create a compact mapping of music phoneme sequences to songs. Using these techniques, we construct an efficient and robust large-scale music identification system.

We have further designed new algorithms for compact indexing of uncertain inputs based on suffix and factor automata and given novel theoretical guarantees for their space requirements. We show that the suffix automaton or factor automaton of a set of strings U has at most 2Q - 2 states, where Q is the number of nodes of a prefix-tree representing the strings in U. We also describe matching new linear-time algorithms for constructing the suffix automaton S or factor automaton F of U in time O(|S|).

We have also defined a new quality measure for topic segmentation systems and designed a discriminative topic segmentation algorithm for speech inputs. The new quality measure improves on previously used criteria and is correlated with human judgment of topic-coherence. Our segmentation algorithm uses a novel general topical similarity score based on word co-occurrences. This new algorithm outperforms previous methods in experiments over speech and text streams. We further demonstrate that the performance of segmentation algorithms can be improved by using a lattice of competing hypotheses over the speech stream rather than just the one-best hypothesis as input.
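To make the state bound for suffix and factor automata concrete, here is a minimal sketch of the classical online suffix automaton construction for a single string. It is not the thesis's algorithm, which handles a set of strings U; the function names `build_suffix_automaton` and `accepts_factor` are hypothetical and chosen only for illustration. For a single string of length n, the prefix tree is a chain with Q = n + 1 nodes, so the bound 2Q - 2 from the abstract becomes 2n.

```python
def build_suffix_automaton(s):
    """Classical online construction for ONE string (illustrative only).

    Returns three parallel lists indexed by state id:
      trans[i]  - dict mapping a symbol to the target state
      link[i]   - suffix link of state i (-1 for the initial state)
      length[i] - length of the longest string that reaches state i
    """
    trans, link, length = [{}], [-1], [0]
    last = 0  # state reached by the whole prefix read so far
    for ch in s:
        cur = len(trans)
        trans.append({})
        link.append(-1)
        length.append(length[last] + 1)
        # Add the new transition along the suffix-link chain.
        p = last
        while p != -1 and ch not in trans[p]:
            trans[p][ch] = cur
            p = link[p]
        if p == -1:
            link[cur] = 0
        else:
            q = trans[p][ch]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:
                # Split q: the clone keeps q's transitions but is shorter.
                clone = len(trans)
                trans.append(dict(trans[q]))
                link.append(link[q])
                length.append(length[p] + 1)
                while p != -1 and trans[p].get(ch) == q:
                    trans[p][ch] = clone
                    p = link[p]
                link[q] = clone
                link[cur] = clone
        last = cur
    return trans, link, length


def accepts_factor(trans, w):
    """True if w is a factor (substring) of the indexed string.

    Treating every state as final turns the suffix automaton into a
    recognizer of factors, so it is enough to follow transitions.
    """
    state = 0
    for ch in w:
        if ch not in trans[state]:
            return False
        state = trans[state][ch]
    return True
```

Building the automaton for a nonempty string `s` and comparing `len(trans)` with `2 * (len(s) + 1) - 2` gives a quick check of the bound in the single-string case. Extending the index to a set of strings, or to the uncertain inputs the thesis targets, requires the constructions described in the thesis itself.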

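Bibliographic record

  • Author

    Weinstein, Eugene

  • Author affiliation

    New York University

  • Degree-granting institution: New York University
  • Discipline: Computer Science
  • Degree: Ph.D.
  • Year: 2009
  • Pages: 174 p.
  • Total pages: 174
  • Original format: PDF
  • Language of text: English (eng)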

  • Date added to database: 2022-08-17 11:38:26
