Search problems for speech and audio sequences.


Abstract

The modern proliferation of very large audio and video databases has created a need for effective methods of indexing and searching highly variable or uncertain data. Classical search and indexing algorithms deal with clean input sequences. However, an index created from speech or music transcriptions is marked with errors and uncertainties stemming from the use of imperfect statistical models in the transcription process. This thesis presents novel algorithms, analyses, and general techniques and tools for effective indexing and search that not only tolerate but exploit this uncertainty.

We have devised a new music identification technique in which each song is represented by a distinct sequence of music sounds, called "music phonemes." We learn the set of music phonemes, as well as a unique sequence of music phonemes characterizing each song, using an unsupervised algorithm. We also create a compact mapping of music phoneme sequences to songs. Using these techniques, we construct an efficient and robust large-scale music identification system.

We have further designed new algorithms for compact indexing of uncertain inputs based on suffix and factor automata and given novel theoretical guarantees for their space requirements. We show that the suffix automaton or factor automaton of a set of strings U has at most 2Q - 2 states, where Q is the number of nodes of a prefix-tree representing the strings in U. We also describe matching new linear-time algorithms for constructing the suffix automaton S or factor automaton F of U in time O(|S|).

We have also defined a new quality measure for topic segmentation systems and designed a discriminative topic segmentation algorithm for speech inputs. The new quality measure improves on previously used criteria and is correlated with human judgment of topic-coherence. Our segmentation algorithm uses a novel general topical similarity score based on word co-occurrences. This new algorithm outperforms previous methods in experiments over speech and text streams. We further demonstrate that the performance of segmentation algorithms can be improved by using a lattice of competing hypotheses over the speech stream rather than just the one-best hypothesis as input.
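To make the 2Q - 2 state bound quoted in the abstract concrete, here is a minimal brute-force sketch in Python. It is not the linear-time construction described in the thesis. It counts the states of the minimal deterministic automaton accepting all suffixes of a small string set by enumerating distinct residual languages (the Myhill-Nerode view), and counts the prefix-tree nodes Q so the two can be compared. The function names and toy inputs are assumptions made for illustration, and the sketch assumes a set containing at least one non-empty string.

```python
from typing import List


def prefix_tree_nodes(strings: List[str]) -> int:
    """Q: number of nodes in the prefix tree (trie) of the strings, root included."""
    # Each distinct prefix (including the empty one) is exactly one trie node.
    return len({s[:i] for s in strings for i in range(len(s) + 1)})


def suffix_automaton_states(strings: List[str]) -> int:
    """Number of states of the minimal trim DFA accepting every suffix of every string.

    Brute force: two factors lead to the same state exactly when they have
    the same set of continuations that complete a suffix.
    """
    suffixes = {s[i:] for s in strings for i in range(len(s) + 1)}
    factors = {
        s[i:j]
        for s in strings
        for i in range(len(s) + 1)
        for j in range(i, len(s) + 1)
    }
    residuals = {
        frozenset(y[len(x):] for y in suffixes if y.startswith(x))
        for x in factors
    }
    return len(residuals)


if __name__ == "__main__":
    # Hypothetical toy inputs, chosen only to exercise the bound.
    for u in (["ab", "ac"], ["abab"], ["abc", "abd", "bcd"]):
        q = prefix_tree_nodes(u)
        n = suffix_automaton_states(u)
        print(u, "states:", n, "Q:", q, "bound 2Q - 2:", 2 * q - 2)
```

The enumeration takes time polynomial in the total input length, so it is only suitable for checking small examples by hand, not for indexing.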


Bibliographic details

Author: Weinstein, Eugene
Author affiliation: New York University
Degree-granting institution: New York University
Subject: Computer Science
Degree: Ph.D.
Year: 2009
Pages: 174 p.
Original format: PDF
Language: English (eng)
Date indexed: 2022-08-17 11:38:26

Similar literature


1. No, There Is No 150 ms Lead of Visual Speech on Auditory Speech, but a Range of Audiovisual Asynchronies Varying from Small Audio Lead to Large Audio Lag [J]. Jean-Luc Schwartz, Christophe Savariaux. PLoS Computational Biology. 2014, No. 7
2. The cortical representation of the speech envelope is earlier for audiovisual speech than audio speech [J]. Michael J. Crosse, Edmund C. Lalor. Journal of Neurophysiology. 2014, No. 4
3. Fast Vocabulary-Independent Audio Search Based on Syllable Confusion Network Indexing in Mandarin Spontaneous Speech [C]. Shao, Jian, Zhang, . 2007
4. Analysis of pragmatic speech styles among Korean learners of English: A focus on complaint-apology speech act sequences. [D]. Lee, Jin Sook. 2000
5. No There Is No 150 ms Lead of Visual Speech on Auditory Speech but a Range of Audiovisual Asynchronies Varying from Small Audio Lead to Large Audio Lag [O]. Jean-Luc Schwartz, Christophe Savariaux. 2014
6. No, there is no 150 ms lead of visual speech on auditory speech, but a range of audiovisual asynchronies varying from small audio lead to large audio lag. [O]. Jean-Luc Schwartz, Christophe Savariaux. 2014
7. Preliminary Investigation Into the Impact of Audiovisual Synchronization of Impaired Audiovisual Sequences. [R]. Pinson, M. H., Webster, A., Ingram, W. 2011
