I’m a Bayesian now. But there are caveats.

I originally picked up this book because of Jaynes’ role in formulating the maximum entropy principle (maxent), my love for all things information theory, and the fact that before his passing, Jaynes was a professor at Washington University.

The basic ideas behind the work are that probability theory can be expressed as a generalization of Aristotelian (binary) logic, and that everyone should be a Bayesian.

The first component seems pretty natural to me, but Jaynes goes to great pains in the first five chapters to build up why this makes sense and what the approach buys you. I think it’s one of those things where, if you are not truly a member of the field, you can’t quite understand what the fuss is about, but you can tell it is a matter of great controversy within the community (the same can be said of the topics discussed in a good deal of the book).

As someone who never really loved the Bayesian approach, I was eventually convinced by Jaynes, who painstakingly dismantles the frequentist (what he calls the “orthodox”) approach over the course of about 600 pages.

The two major arguments he gives against the frequentist approach are these: first, the Bayesian approach is a sort of superset of the frequentist one (with certain priors you can always recover exactly the estimates the frequentist gets, but you can also get others, which is more powerful); and second, the frequentist relies on a bag of ad-hoc tricks that sometimes gives strange answers. Treating probability theory as an extension of logic tells us what does and does not make sense to do, leaving the user with a bag of tricks that is cohesive and defensible.

So now I’m drinking the kool-aid, but I don’t think I will actually be espousing any of the ideas here in work I do any time soon.

Why? Because Jaynes basically says that if you have enough samples (the magic number 30 comes up), and are using any prior that is reasonably smooth and broad around the maximum-likelihood region, the Bayesian and frequentist methods will give you pretty much the same answer.

In any of the problems I’m interested in, I want very general solutions, so if I were using the Bayesian approach I would be using either maxent with almost no constraints or some other very broad prior (such as a Cauchy) anyway. So I wouldn’t really be buying much at all: a good deal more complexity for basically the same results, since it is generally easier to do the math for the frequentist answer than for the Bayesian one.
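That convergence is easy to see in a tiny sketch (my own toy numbers, not an example from the book): when estimating the mean of normally distributed data with known variance, the frequentist maximum-likelihood estimate is just the sample mean, and a Bayesian conjugate update under a very broad normal prior lands essentially in the same place.

```python
import random

random.seed(0)

# Jaynes' "magic number": ~30 samples from a normal with true mean 5.0, sd 2.0
n = 30
data = [random.gauss(5.0, 2.0) for _ in range(n)]
sigma2 = 4.0  # assume the sampling variance is known

# Frequentist / maximum-likelihood estimate: the sample mean
ml_estimate = sum(data) / n

# Bayesian conjugate update with a broad normal prior N(0, 100^2)
mu0, tau2 = 0.0, 100.0 ** 2
posterior_mean = (mu0 / tau2 + n * ml_estimate / sigma2) / (1 / tau2 + n / sigma2)

# With a prior this flat, the two estimates differ by a negligible amount
print(abs(ml_estimate - posterior_mean))
```

The broad prior contributes almost no precision relative to the 30 data points, so the posterior mean collapses to the sample mean, which is exactly the point: the extra machinery bought essentially nothing here.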

Jaynes himself was a physicist, and therefore had very useful priors to include in his work (we know how gravity works, so it would be foolish to ignore that if we are trying to solve a particular problem involving gravity). He admits that many of the most prominent frequentists (such as Fisher) were studying fields like biology, where good priors were difficult to come by (especially in the more distant past), so even if you were a Bayesian, you wouldn’t really know what to do.

In the end though, the usefulness of this book will come down to what I recently learned is the difference between statisticians and people who study machine learning: the former want to spend a lot of time coming to a correct answer to a very particular problem, while the latter want to spend their time coming up with a solution that is probably less accurate, but applicable to a wide range of problems. If you think of yourself as the former, this book is probably useful; if not, there is probably not much here to be leveraged.

To put it another way, if you want to be sold on the Bayesian approach, this is a good book. If you want tools to solve a particular problem and have time to crank on the math, this book will give you a good set of them. A warning, though: a good deal of the math was way over my head, so I more read *through* the book than read it.
