A method and system processes utterances that are acquired either from an automatic speech recognition (ASR) system or text. The utterances have associated identities of each party, such as role A utterances and role B utterances. The information corresponding to utterances, such as word sequence and identity, are converted to features. Each feature is received in an input layer of a neural network(NN). A dimensionality of each feature is reduced, in a projection layer of the NN, to produce a reduced dimensional feature. The reduced dimensional feature is processed to provide probabilities of labels for the utterances.
展开▼