
Martins A.F.T., Smith N.A., Xing E.P., Aguiar P.M.Q., Figueiredo M.A.T.

Journal of Machine Learning Research
2009

pp. 935–975

Abstract:

Positive definite kernels on probability measures have recently been applied to classification problems involving text, images, and other types of structured data. Some of these kernels are related to classic information theoretic quantities, such as (Shannon's) mutual information and the Jensen-Shannon (JS) divergence. Meanwhile, there have been recent advances in nonextensive generalizations of Shannon's information theory. This paper bridges these two trends by introducing nonextensive information theoretic kernels on probability measures, based on new JS-type divergences. These new divergences result from extending the two building blocks of the classical JS divergence: convexity and Shannon's entropy. The notion of convexity is extended to the wider concept of q-convexity, for which we prove a Jensen q-inequality. Based on this inequality, we introduce Jensen-Tsallis (JT) q-differences, a nonextensive generalization of the JS divergence, and define a k-th order JT q-difference between stochastic processes. We then define a new family of nonextensive mutual information kernels, which allow weights to be assigned to their arguments, and which includes the Boolean, JS, and linear kernels as particular cases. We also define nonextensive string kernels that generalize the p-spectrum kernel. We illustrate the performance of these kernels on text categorization tasks, in which documents are modeled both as bags of words and as sequences of characters.
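To make the central quantity concrete, below is a minimal Python sketch of the Tsallis q-entropy and a weighted Jensen-Tsallis q-difference between two discrete distributions, following the standard definitions these names usually refer to (Tsallis entropy S_q(p) = (1 - Σ p_i^q)/(q - 1), and the JT q-difference as the q-entropy of the mixture minus a q-weighted combination of the individual entropies); the function names, the toy distributions, and the exact weighting convention are illustrative assumptions, not the paper's code.

```python
import numpy as np

def tsallis_entropy(p, q):
    """Tsallis q-entropy S_q(p) = (1 - sum_i p_i^q) / (q - 1).
    Reduces to Shannon entropy (in nats) as q -> 1."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    if np.isclose(q, 1.0):
        return -np.sum(p * np.log(p))
    return (1.0 - np.sum(p ** q)) / (q - 1.0)

def jensen_tsallis_q_difference(p1, p2, q=1.0, weights=(0.5, 0.5)):
    """Weighted Jensen-Tsallis q-difference (illustrative convention):
    T_q(p1, p2) = S_q(w1*p1 + w2*p2) - (w1**q * S_q(p1) + w2**q * S_q(p2)).
    With q = 1 and equal weights this is the Jensen-Shannon divergence (nats)."""
    w1, w2 = weights
    p1, p2 = np.asarray(p1, dtype=float), np.asarray(p2, dtype=float)
    mixture = w1 * p1 + w2 * p2
    return tsallis_entropy(mixture, q) - (
        w1 ** q * tsallis_entropy(p1, q) + w2 ** q * tsallis_entropy(p2, q)
    )

# Toy example: two word-frequency distributions over a 3-word vocabulary.
p = np.array([0.5, 0.3, 0.2])
r = np.array([0.2, 0.3, 0.5])
print(jensen_tsallis_q_difference(p, r, q=1.0))  # extensive case (JS divergence)
print(jensen_tsallis_q_difference(p, r, q=2.0))  # a nonextensive (Tsallis) case
```

In a kernel-based classifier, such a divergence between document models (e.g., normalized bag-of-words counts) would serve as the building block of a similarity function; the specific kernel constructions and their positive definiteness conditions are developed in the paper itself.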