For classification tasks on data whose outputs exhibit sequential statistics, a classifier may be trained in an unsupervised manner from a sequence of input samples and an unaligned sequence of output labels. Training uses a cost function that measures the negative cross-entropy of an N-gram joint probability distribution, derived from the sequence of output labels, with respect to the expected N-gram frequency in a second sequence of output labels predicted by the classifier. In some embodiments, a primal-dual reformulation of the cost function is employed to facilitate optimization.
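
The quantities named in the abstract can be illustrated with a minimal NumPy sketch. It is not the claimed implementation: the function names (`ngram_distribution`, `expected_ngram_frequency`, `ngram_cross_entropy`) are hypothetical, the classifier is represented only by a matrix of per-sample class posteriors, the expected N-gram frequency is assumed to be the time-average of products of consecutive posteriors, and the sign convention (returning the cross-entropy, with lower values meaning better agreement) is an assumption. The primal-dual reformulation is not shown.

```python
import numpy as np

def ngram_distribution(labels, num_classes, n=2):
    """Empirical N-gram joint distribution of an (unaligned) label sequence."""
    counts = np.zeros((num_classes,) * n)
    for t in range(len(labels) - n + 1):
        counts[tuple(labels[t:t + n])] += 1
    return counts / counts.sum()

def expected_ngram_frequency(posteriors, n=2):
    """Expected N-gram frequency in the predicted label sequence.

    posteriors: array of shape (T, C); row t holds the classifier's class
    probabilities for input sample t. Assumption: consecutive predictions
    are treated as independent given their inputs, so the probability of an
    N-gram starting at t is the outer product of N consecutive rows.
    """
    T, C = posteriors.shape
    freq = np.zeros((C,) * n)
    for t in range(T - n + 1):
        joint = posteriors[t]
        for k in range(1, n):
            joint = np.multiply.outer(joint, posteriors[t + k])
        freq += joint
    return freq / (T - n + 1)

def ngram_cross_entropy(p_labels, p_pred, eps=1e-12):
    """Cross-entropy of the label N-gram distribution w.r.t. the predicted one."""
    return float(-np.sum(p_labels * np.log(p_pred + eps)))

# Toy usage: two classes, bigrams (N = 2).
labels = [0, 1, 0, 1, 0, 1]
p_labels = ngram_distribution(labels, num_classes=2)

matched = np.eye(2)[labels]          # predictions that reproduce the label statistics
uniform = np.full((6, 2), 0.5)       # uninformative predictions

cost_matched = ngram_cross_entropy(p_labels, expected_ngram_frequency(matched))
cost_uniform = ngram_cross_entropy(p_labels, expected_ngram_frequency(uniform))
```

In this toy case the predictions whose bigram statistics match those of the label sequence yield the smaller value, which is the behavior a training procedure built on such a cost would exploit.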