Why training takes millions of examples.
Disorder is not chaos. It is the most probable state.
In Zone 01, Q5, you compared entropy in a black hole and entropy in an AI. The same equation governed both. Here is what that equation actually says.
Boltzmann proved it in 1877.
A gas spreads not because it wants to. It spreads because there are vastly more spread-out arrangements than concentrated ones. The math is just counting.
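That counting can be done directly. Boltzmann's equation, S = k log W, says entropy just measures W, the number of arrangements consistent with what you can observe. A minimal sketch (the particle count and the two-halves setup are illustrative choices, not from the text):

```python
from math import comb

# Count arrangements (microstates) of N gas particles split between
# two halves of a box. W(k) = C(N, k) arrangements put k particles
# in the left half. The concentrated state (all N on one side) has
# exactly one arrangement; the evenly spread state has C(N, N/2),
# which is astronomically larger even for modest N. S = k log W
# then says the spread-out state has far higher entropy.
N = 100
concentrated = comb(N, 0)      # all particles on one side: 1 way
spread_out = comb(N, N // 2)   # evenly split: ~1e29 ways

print(concentrated)
print(spread_out)
```

At Avogadro-scale N the ratio becomes a number with trillions of trillions of digits, which is why gases never un-spread in practice: no force forbids it, it is just never the arrangement you land in.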
AI training works by the same principle in reverse: searching a high-dimensional landscape for the rare arrangement of weights that best explains the data. Gradient descent is how we find the one-in-a-trillion arrangement without trying all the others.
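The search can be sketched in a few lines. This is an illustrative toy, not any real model's training loop: the quadratic loss, the dimension, and the learning rate are all stand-in assumptions, chosen so the landscape has one rare optimum that random guessing would essentially never hit.

```python
import random

# Toy gradient descent: find the weight vector w minimizing
# L(w) = sum((w_i - t_i)^2), where t is a hidden "target" arrangement.
# The dimension stands in for a network's parameter count; sampling
# points at random would almost never land near the optimum, but
# following the gradient walks straight toward it.
dim = 1_000
target = [random.gauss(0, 1) for _ in range(dim)]  # the rare optimum
w = [0.0] * dim                                    # disordered starting guess
lr = 0.1                                           # learning rate

for step in range(200):
    # gradient of (w_i - t_i)^2 with respect to w_i is 2 * (w_i - t_i)
    w = [wi - lr * 2 * (wi - ti) for wi, ti in zip(w, target)]

loss = sum((wi - ti) ** 2 for wi, ti in zip(w, target))
print(loss)  # effectively zero after 200 steps
```

Each update shrinks the distance to the target by a constant factor, so 200 steps suffice where exhaustive search over even this toy landscape would be hopeless. Real losses are not convex like this one, but the counting argument for why blind search fails is the same.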
Boltzmann spent his career trying to prove that atoms exist. The scientific establishment — Mach, Ostwald, and others — publicly dismissed his work as metaphysics for thirty years.
He proved that entropy is a statistical property of systems with many parts: disorder dominates not because of force, but because there are vastly more disordered arrangements than ordered ones.
He died by suicide in 1906 while on holiday in Duino, near Trieste. Within a few years, experiments on Brownian motion confirmed he had been right about everything. Einstein later called him one of the most important scientists who ever lived.
Training an AI is finding order in a high-dimensional probability landscape. Boltzmann described the obstacle in 1877. Gradient descent is the solution he didn't live to see.