Can a rational agent learn about a domain from data starting in a condition of ignorance?
This question matters because it is often desirable to start analyzing a problem under very weak initial assumptions, so as to implement an objective-minded approach to learning and inference.
This question has so far been investigated most thoroughly within well-established Bayesian learning theory. There, ignorance is modeled by a so-called noninformative prior distribution (e.g., a flat one), while the data are summarized by a likelihood model. Posterior inferences (e.g., predictions, hypothesis tests) are obtained from the prior and the likelihood through Bayes' rule. However, the use of a single prior, albeit a noninformative one, as a model of ignorance has been criticized by several authors.
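As a concrete illustration of this standard setting, the following is a minimal sketch of a conjugate Bayesian update of a Bernoulli parameter under a flat Beta(1, 1) prior; the numbers are made up and serve only to show the mechanics of Bayes' rule, not any model discussed here.

```python
# Minimal sketch: Bayesian learning about a Bernoulli parameter theta
# under a flat ("noninformative") Beta(1, 1) prior. Illustrative only.

def posterior_beta(a, b, successes, failures):
    """Conjugate update: Beta(a, b) prior + Bernoulli data -> Beta posterior."""
    return a + successes, b + failures

def beta_mean(a, b):
    """Expectation of theta under a Beta(a, b) distribution."""
    return a / (a + b)

# Flat prior: every value of theta in [0, 1] is equally plausible a priori.
a0, b0 = 1.0, 1.0

# Observe 7 successes and 3 failures (made-up data).
a1, b1 = posterior_beta(a0, b0, successes=7, failures=3)

print(beta_mean(a0, b0))  # prior mean: 0.5
print(beta_mean(a1, b1))  # posterior mean: 8/12, roughly 0.667
```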
A more compelling alternative, in our opinion, relies on using a set of priors that generates vacuous lower and upper expectations: that is, an expectation interval whose extreme points coincide with the infimum and the supremum of the values a random variable can take. This expresses one's ignorance about a random variable very faithfully. Such models were first suggested by Walley under the name of "near-ignorance priors". They have been successfully applied to classification problems: by naturally leading to indeterminate (i.e., set-valued) predictions when the information in the data is insufficient to draw stronger conclusions, they yield credible and reliable models. However, we have shown that in some cases near-ignorance makes learning from data impossible: the set of posteriors may coincide with the set of priors. This renders the data useless, which is clearly a critical issue.
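To make near-ignorance concrete, the sketch below computes lower and upper posterior expectations over a set of Beta priors in the spirit of Walley's imprecise Dirichlet model; the parameterization and the data are our own illustrative assumptions, not the specific prior set studied in our work.

```python
# Minimal sketch of near-ignorance with a set of Beta priors, in the spirit
# of Walley's imprecise Dirichlet model (binary case). Illustrative only.

def expectation_bounds(s, successes, n):
    """Lower/upper posterior expectations of theta over the prior set
    {Beta(s*t, s*(1-t)) : 0 < t < 1}, with fixed prior strength s > 0."""
    lower = successes / (n + s)        # limit as t -> 0
    upper = (successes + s) / (n + s)  # limit as t -> 1
    return lower, upper

s = 2.0  # prior strength: a modeling choice, assumed here for illustration

# Before seeing data, the expectation interval for theta is vacuous: (0, 1).
print(expectation_bounds(s, successes=0, n=0))   # (0.0, 1.0)

# After 7 successes in 10 trials the interval shrinks: learning takes place.
print(expectation_bounds(s, successes=7, n=10))  # (0.583..., 0.75)
```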
To tackle this problem, we have established some general principles that should govern a well-defined learning process based on near-ignorance, and have specialized them to the case of the one-parameter exponential family. We have then proven that there is a unique set of priors complying with these principles, and hence guaranteeing that both near-ignorance and learning are attained. We have also shown that the resulting set of priors reduces to other already known models of prior near-ignorance, thus indirectly proving the uniqueness of those models. Moreover, we have derived new models of near-ignorance that were not available before. We regard these results as a basic stepping stone towards properly answering the initial question. This foundational work should now be deepened and extended, to fully address the question of learning under near-ignorance and to bring it closer to applications.
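For reference, the canonical form of the one-parameter exponential family and of its conjugate priors is recalled below; this is textbook background meant only to fix notation, not the specific characterization established in our work.

```latex
% One-parameter exponential family in canonical form:
\[
  p(x \mid \theta) = h(x)\,\exp\{\theta\, t(x) - \psi(\theta)\},
\]
% where \theta is the natural parameter, t(x) the sufficient statistic,
% and \psi the log-partition function. Conjugate priors take the form
\[
  \pi(\theta \mid n_0, y_0) \propto \exp\{n_0\, y_0\, \theta - n_0\, \psi(\theta)\},
\]
% with n_0 > 0 acting as a prior sample size and y_0 as a prior guess for
% E[t(X)]; letting the hyperparameters range over a suitable set yields a
% set of priors, the usual building block for near-ignorance models.
```

We plan to deepen and extend this work through the following tasks: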
- Extension to the multivariate exponential family: this extension is fundamental, as the most important applications rest on it; regression analysis, for instance, is one such case.
- Linear regression: to employ the multivariate model to develop new robust linear regression algorithms. Linear regression is a central area of data analysis, where near-ignorance can provide a principled route to reliable inference.
- Metrics: to develop new performance metrics for comparing a set-valued estimate/prediction with a single-valued one. While standard estimators yield determinate (single-valued) outcomes, those based on near-ignorance lead to set-valued ones. How to fairly compare, through a metric, a set-valued estimate with a single-valued one is thus fundamental for judging the quality of the models in practice; a sketch of one existing candidate metric follows this list.
- Extension to the non-parametric case: this extension is important because non-parametric models are increasingly used in artificial intelligence, thanks to growing computational power. Modeling near-ignorance non-parametrically would allow us to build very general models for statistical inference.
- Understanding the ultimate relationship between near-ignorance and learning: we plan to investigate these questions more deeply, exploiting key results derived in the context of Bayesian inference on the consistency and convergence of posterior distributions.
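As a point of reference for the metrics task above, the sketch below implements discounted accuracy, a simple existing score for set-valued predictions that already permits a rough comparison with single-valued ones; it is a baseline illustration with made-up data, not the new metrics we plan to develop.

```python
# Minimal sketch: discounted accuracy, an existing score for set-valued
# predictions (1/|A| if the truth lies in the predicted set A, 0 otherwise).
# A single-valued predictor is the special case |A| = 1. Illustrative only.

def discounted_accuracy(prediction: set, truth) -> float:
    """Score a set-valued prediction against the true outcome."""
    return 1.0 / len(prediction) if truth in prediction else 0.0

# Made-up test cases: (set-valued prediction, true class).
cases = [({"a"}, "a"), ({"a", "b"}, "b"), ({"b", "c"}, "a")]

scores = [discounted_accuracy(pred, y) for pred, y in cases]
print(scores)                     # [1.0, 0.5, 0.0]
print(sum(scores) / len(scores))  # average score: 0.5
```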