Cynthia Rudin presents “Simpler Models Exist and How Can We Find Them?” at the AI Seminar (December 4, 2020).

The Artificial Intelligence (AI) Seminar is a weekly meeting at the University of Alberta where researchers interested in AI can share their research. Presenters include both local speakers from the University of Alberta and visitors from other institutions. Topics related in any way to artificial intelligence, from foundational theoretical work to innovative applications of AI techniques to new fields and problems, are explored.

Bio:

Cynthia Rudin is a professor of computer science, electrical and computer engineering, and statistical science at Duke University. Previously, Prof. Rudin held positions at MIT, Columbia, and NYU. Her degrees are from the University at Buffalo and Princeton University. She is a three-time winner of the INFORMS Innovative Applications in Analytics Award. She has served on committees for INFORMS, the National Academies, the American Statistical Association, DARPA, the NIJ, and AAAI. She is a fellow of both the American Statistical Association and Institute of Mathematical Statistics. She was a Thomas Langford Lecturer at Duke University for 2019-2020.

Abstract:

While the trend in machine learning has tended towards more complex hypothesis spaces, it is not clear that this extra complexity is always necessary or helpful for many domains. In particular, models and their predictions are often made easier to understand by adding interpretability constraints. These constraints shrink the hypothesis space; that is, they make the model simpler. Statistical learning theory suggests that generalization may be improved as a result as well. However, adding extra constraints can make optimization (exponentially) harder. For instance, it is much easier in practice to create an accurate neural network than an accurate and sparse decision tree. We address the following question: Can we show that a simple-but-accurate machine learning model might exist for our problem, before actually finding it? If the answer is promising, it would then be worthwhile to solve the harder constrained optimization problem to find such a model. In this talk, I present an easy calculation to check for the possibility of a simpler model. This calculation indicates that simpler-but-accurate models do exist in practice more often than you might think. Time-permitting, I will then briefly overview our progress towards the challenging problem of finding optimal sparse decision trees.

For those interested in interpretable classifiers designed for multi-class imbalance problems, I suggest the following publications:

• L. Cañete-Sifuentes, R. Monroy, M. A. Medina-Pérez, O. Loyola-González, F. Vera Voronisky, "Classification based on multivariate contrast patterns," IEEE Access, vol. 7, pp. 55744-55762, 2019.

• O. Loyola-González, M. A. Medina-Pérez, J. F. Martínez-Trinidad, J. A. Carrasco-Ochoa, R. Monroy, M. García-Borroto, "PBC4cip: A new contrast pattern-based classifier for class imbalance problems," Knowledge-Based Systems, vol. 115, pp. 100-109, 2017.

The following publication introduces and interpretable clustering algorithm:

• O. Loyola-González, A. E. Gutierrez-Rodrígue, M. A. Medina-Pérez, R. Monroy, J. F. Martínez-Trinidad, J. A. Carrasco-Ochoa, M. García-Borroto, "An explainable Artificial Intelligence model for clustering numerical databases," IEEE Access, vol. 8, pp. 52370-52384, 2020.

The following publications combine interpretable classifiers and visualization techniques:

• O. Loyola-González, M. A. Medina-Pérez, R. A. Coronilla Valdez, K.-K. Raymond Choo, "A contrast pattern-based scientometric study of the QS world university ranking," IEEE Access, vol. 8, pp. 206088-206104, 2020.

• B. Cervantes, F. Gómez, R. Monroy, O. Loyola-González, M. A. Medina-Pérez, J. Ramírez-Márquez, "Pattern-Based and Visual Analytics for Visitor Analysis on Websites," Applied Sciences, vol. 9, no. 18, pp. 3840, 2019.

• O. Loyola-González, A. López-Cuevas, M. A. Medina-Pérez, B. Camiña, R. Monroy, J. E. Ramirez-Marquez, "Fusing Approaches of Pattern Discovery and Visual Analytics on Tweet Propagation," Information Fusion, vol. 46, pp. 91-101, 2019.

The following publication studies the relation among features and the performances of decision tree ensembles:

• B. Cervantes, R. Monroy, M. A. Medina-Pérez, M. González-Mendoza, J. Ramirez-Marquez, "Some features speak loud, but together they all speak louder: A study on the correlation between classification error and feature usage in decision-tree classification ensembles," Engineering Applications of Artificial Intelligence, vol. 67, pp. 270-282, 2018.

The dot product is highly statistical being a summary measure. And actually people forget the variance equation for linear combinations of random variables applies to it. Anyway one way to get smaller nets is to swap around what is adjustable. You use fixed dot products (enacted with fast transforms) and adjustable (parametric) activation functions like fi(x)=ai.x x<0, fi(x)=bi.x x>=0, i=0 to m. The fast Walsh Hadamard transform is nice and O(nlogn). As a practical matter you apply a random fixed pattern of sign flips to the input data to stop the first transform from taking the spectrum and use a final transform as a sort of readout layer. Such nets are highly statistical in nature due to the exact nature of the fixed dot products. Neurons have to work together to produce specific output. Anyway there is a blog with code explaining Fast Transform fixed-filter-bank neural networks.