The algorithm proposed in this paper classifies, in real time, not only unusual speech produced when people are angry, surprised, or excited but also unusual noise such as clashing, hitting, or clapping, without depending on particular speakers' voices or utterances. It also requires no prior learning process to construct acoustic models. The algorithm therefore allows a surveillance camera system to monitor quarrels or violent situations effectively, regardless of occluding objects and lighting conditions. To realize this approach, we analyze the variance and change of spectral densities and pitches when unusual speech and noise occur. We then propose new methods (SEBNI, USDF, and UNDF) to classify unusual sounds in real time. Moreover, to improve performance, we apply a noise suppression system based on MMSE-STSA and a statistical model-based VAD so that reliable voice features can be extracted and only voice-related periods are segmented in noisy environments. We confirm that the proposed method achieves 87% accuracy in classifying unusual speech.
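As a rough illustration of the kind of frame-level analysis described above (variance and change of spectral densities and pitches), the following Python sketch computes a per-frame spectral density variance and an autocorrelation-based pitch track. It is not the SEBNI, USDF, or UNDF method, which this abstract does not specify, and it omits the MMSE-STSA noise suppression and VAD stages. All function names, the frame length, the hop size, and the pitch search range are assumptions chosen for illustration.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Slice a 1-D signal into overlapping frames (rows)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def spectral_density(frames):
    """Periodogram-style power spectral density of each Hann-windowed frame."""
    window = np.hanning(frames.shape[1])
    spectrum = np.fft.rfft(frames * window, axis=1)
    return (np.abs(spectrum) ** 2) / frames.shape[1]

def pitch_autocorr(frame, sample_rate, fmin=60.0, fmax=500.0):
    """Estimate pitch (Hz) from the strongest autocorrelation peak.

    The search range fmin..fmax is an assumed range for human voice.
    """
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sample_rate / fmax)
    lag_max = min(int(sample_rate / fmin), len(ac) - 1)
    lag = lag_min + np.argmax(ac[lag_min:lag_max + 1])
    return sample_rate / lag

def frame_features(x, sample_rate, frame_len=1024, hop=512):
    """Return per-frame spectral density variance, pitch, and pitch change."""
    frames = frame_signal(x, frame_len, hop)
    psd = spectral_density(frames)
    psd_var = psd.var(axis=1)  # variance across frequency bins
    pitch = np.array([pitch_autocorr(f, sample_rate) for f in frames])
    pitch_change = np.abs(np.diff(pitch, prepend=pitch[0]))
    return psd_var, pitch, pitch_change

# Example: a steady 250 Hz tone sampled at 16 kHz for one second.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 250.0 * t)
psd_var, pitch, pitch_change = frame_features(tone, sample_rate)
```

In a sketch like this, a steady tone yields a flat pitch track, while agitated speech or impact noise would be expected to show larger frame-to-frame changes. How such features are thresholded or combined into a decision is left open here, since that is the role of the paper's own methods.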