Sachita Nishal

PhD Candidate, Northwestern University

A Short Primer on Mutual Information

Date Updated: Apr 25, 2022

Definitions

Mutual Information as Feature Selection

MI can be used for feature selection, and scikit-learn provides implementations of MI estimation for both regression (`mutual_info_regression`) and classification (`mutual_info_classif`) problems.

MI scores can be used to select the K best features (or the top K-th percentile of features). The scikit-learn documentation gives the explicit instruction that “treating a continuous feature as discrete and vice versa will usually give incorrect results, so be attentive about that.” A comparable method of univariate feature selection (i.e., selection based on univariate statistical tests) is the F-test: a quick linear model is fit to test the effect of each single regressor, sequentially for many regressors, yielding an F-statistic and p-value per feature. However, mutual information can capture any kind of dependency between variables, whereas the F-test captures only linear dependency.
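A minimal sketch of the above, on hypothetical toy data: the target depends nonlinearly on one feature and linearly on another, so the MI scores flag both informative features while the F-test scores only the linear one highly. The data and the choice of `k=2` here are illustrative assumptions, not from the original post.

```python
import numpy as np
from sklearn.feature_selection import (
    SelectKBest, f_regression, mutual_info_regression)

rng = np.random.default_rng(0)

# Toy data (assumption for illustration): 3 features, 500 samples.
# y depends nonlinearly on feature 0, linearly on feature 1;
# feature 2 is pure noise.
X = rng.uniform(-1, 1, size=(500, 3))
y = X[:, 0] ** 2 + X[:, 1] + rng.normal(scale=0.1, size=500)

# MI captures the nonlinear dependence on feature 0 ...
mi = mutual_info_regression(X, y, random_state=0)

# ... while the F-test, being linear, scores feature 0 near zero
# (X0**2 is uncorrelated with X0 on a symmetric interval).
f_stats, p_values = f_regression(X, y)

# Keep the 2 best features as ranked by MI.
selector = SelectKBest(
    lambda X, y: mutual_info_regression(X, y, random_state=0), k=2)
X_reduced = selector.fit_transform(X, y)

print("MI scores:", mi)
print("F-statistics:", f_stats)
print("Selected feature indices:", selector.get_support(indices=True))
```

Note that all three features here are continuous; per the warning quoted above, discrete features should instead be flagged via the `discrete_features` parameter.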

Sources