Non-Ergodicity and Protein Folding

2024-09-03

Ergodicity and non-ergodicity are concepts that were put on my radar by Mark Spitznagel, head of Universa Investments, a ‘black-swan’ hedge fund — a fund that profits off of ‘improbable,’ ‘unforeseen’ market crashes.

Although Universa is called a ‘black-swan’ fund, Spitznagel has firm Austrian-school roots and views the market cycle as an artifact of credit expansion, which causes measurable distortions from the true value of the market. The crashes are therefore expected and foreseen. There is no ‘black-swan’ event, as the crash is inevitable, not improbable.

Ergodicity relates to whether the behavior of one individual, averaged over time, matches the behavior of a whole population, averaged at one moment.

The simplest example to illustrate ergodicity is a game of ‘infinite golf.’

Imagine you have a ball on a windy, empty field, and it rolls forever, pushed by random wind. Sometimes the ball is on the left side of the field, sometimes on the right side. If you watch it forever, the average position of the ball is in the middle of the field, because it moves randomly. If you measure the positions of one million balls that have each moved around randomly for some finite amount of time, their average position is also in the middle of the field.

So, the behavior of one ball (when measuring position) is equivalent to the behavior of a large population of balls. The average position is ‘ergodic.’

Now, imagine we put a hole in the field on the left side. If a ball rolls around forever, it will eventually go into the hole, and it will never leave. So, the average position of a ball over infinite time is inside of the hole. Now, let’s consider the average position of infinitely many balls at a finite time. There will always be some balls that are not yet stuck in the hole, so the average position of this population will be somewhere between the hole and the middle of the field.

Thus, once you add a hole to the field, the average behavior of one ball over time differs from the average behavior of a large population of balls. The average position, with a hole, is ‘non-ergodic.’
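The two versions of the field can be sketched as a toy simulation. This is only a minimal sketch under assumptions of mine: the field is a one-dimensional strip from -10 to 10 with a wall at each edge, the wind is a fair coin flip that moves the ball one unit per step, and the hole sits at -5. The names `roll`, `WIDTH`, and `HOLE` are made up for the illustration.

```python
import random

WIDTH = 10   # assumed field: positions -10..10, with a wall at each edge
HOLE = -5    # assumed hole location, left of the middle (0)

def roll(steps, rng, hole=None):
    """Return the ball's position after every step of a random walk.

    The wind pushes the ball one unit left or right with equal odds.
    If `hole` is set, the ball never moves again once it reaches it.
    """
    pos, path = 0, []
    for _ in range(steps):
        if pos != hole:
            pos += rng.choice((-1, 1))
            pos = max(-WIDTH, min(WIDTH, pos))  # walls keep the ball on the field
        path.append(pos)
    return path

def mean(xs):
    return sum(xs) / len(xs)

rng = random.Random(0)  # fixed seed so the run is repeatable

# One ball watched for a long time (time average).
time_avg_open = mean(roll(100_000, rng))
time_avg_hole = mean(roll(100_000, rng, hole=HOLE))

# Many balls, each looked at once after a short, finite time (population average).
pop_avg_open = mean([roll(200, rng)[-1] for _ in range(2_000)])
pop_avg_hole = mean([roll(200, rng, hole=HOLE)[-1] for _ in range(2_000)])

print(f"no hole:   one ball over time {time_avg_open:+.2f}, population {pop_avg_open:+.2f}")
print(f"with hole: one ball over time {time_avg_hole:+.2f}, population {pop_avg_hole:+.2f}")
```

Without the hole, both averages should land near 0, the middle of the field. With the hole, the single ball's time average should sit almost exactly at -5, while the population average after 200 steps should land somewhere between -5 and 0, because some balls have not fallen in yet.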

There’s also a concept called ‘observational non-ergodicity,’ which describes non-ergodic behavior that shows up only when you observe over a finite timescale.

You could imagine a scenario in which a ball will get stuck in the hole for 10 hours, but then will spontaneously jump out. If you measure the position of one ball over just 9 hours, and it rolls into the hole right as measurements start, the average position of the ball will be inside of the hole. If you measure the positions of 1000 balls over the same 9 hours, some of them are always outside, so the average position will be outside of the hole. So, over 9 hours, the ball’s behavior is ‘observationally non-ergodic.’ If you measured this system forever, the average position would be ergodic, because there is no permanent irreversibility and every ball eventually behaves the same way.
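Here is a rough sketch of that sticky hole, tracking the fraction of time a ball spends in the hole rather than its position. The 10-hour stay comes from the example above. Everything else is my own assumption for illustration: time moves in one-hour ticks, a free ball has a 10% chance per hour of falling in (`P_FALL`), and the population is given a 200-hour `burn_in` so the balls are not all in the same state when measurement starts.

```python
import random

STAY = 10      # hours a ball stays stuck, as in the example above
P_FALL = 0.1   # assumed chance per hour that a free ball falls into the hole

def fraction_in_hole(hours, rng, start_in_hole=False, burn_in=0):
    """Fraction of the observed hours that one ball spends in the hole."""
    remaining = STAY if start_in_hole else 0   # hours left before it jumps out
    in_hole = 0
    for hour in range(burn_in + hours):
        if remaining == 0 and rng.random() < P_FALL:
            remaining = STAY                    # ball falls in this hour
        if remaining > 0:
            remaining -= 1
            if hour >= burn_in:                 # only count the observed window
                in_hole += 1
    return in_hole / hours

rng = random.Random(1)  # fixed seed so the run is repeatable

# One ball that falls in right as the 9-hour measurement starts.
one_ball_9h = fraction_in_hole(9, rng, start_in_hole=True)

# 1000 balls measured over the same 9-hour window.
population = [fraction_in_hole(9, rng, burn_in=200) for _ in range(1_000)]
population_9h = sum(population) / len(population)

# One ball measured for a very long time.
one_ball_long = fraction_in_hole(200_000, rng)

print(f"one ball, 9 hours:       {one_ball_9h:.2f}")
print(f"1000 balls, 9 hours:     {population_9h:.2f}")
print(f"one ball, 200,000 hours: {one_ball_long:.2f}")
```

Under these assumptions a ball spends on average 10 hours in the hole and 9 hours out per cycle, so the long-run fraction is 10/19. The short single-ball measurement reads 1.0 and disagrees with the population, while the long single-ball measurement should agree with it.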

Irreversibility over human-relevant timescales is a source of non-ergodicity in life.

This matters a lot in financial markets and is a core thesis of Spitznagel’s Universa. The amount of wealth you hold at any point is essentially a geometric series. You multiply the wealth at a previous point by a value that represents your returns, and you get a new wealth. This makes it so that losses hurt more than gains help: if you have a 50% drawdown, you need a 100% return (a doubling of what is left) to get back to your original wealth. If you have a 100% drawdown, you cannot get back to your original wealth with any multiplication of your current portfolio. Hence, there is an irreversibility of monetary loss.
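The asymmetry is easy to check with a few lines of arithmetic. After losing a fraction d of your wealth you hold (1 - d) of it, so getting back to where you started takes a multiplier of 1 / (1 - d). The function name `required_gain` is mine, just for illustration.

```python
import math

def required_gain(drawdown):
    """Return needed to get back to the starting wealth after a fractional loss.

    After losing `drawdown` you hold (1 - drawdown) of your wealth, so you need a
    multiplier of 1 / (1 - drawdown), i.e. a gain of 1 / (1 - drawdown) - 1.
    """
    if not 0 <= drawdown <= 1:
        raise ValueError("drawdown must be between 0 and 1")
    if drawdown == 1:
        return math.inf   # nothing left to compound
    return 1 / (1 - drawdown) - 1

for d in (0.10, 0.25, 0.50, 0.90, 1.00):
    print(f"{d:.0%} drawdown needs a {required_gain(d):.0%} gain")
```

A 50% loss needs a doubling of what is left, a 90% loss needs a tenfold multiplier (a 900% gain), and a 100% loss cannot be recovered by any multiplier.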

The ‘expected value’ of a return in finance averages out a ‘population’ of returns.

That is, if a billion people start with $100, and all but one lose all their money, but one person accumulates $1 trillion, the ‘expected value’ of the trade is $1000, or a 10x return ($1 trillion / 1 billion people = $1000 each, which is 10 x $100). It isn’t a good trade to take, though, as a human, because in all likelihood you could take that trade every minute for the rest of your life, only lose money, run out of money, and then die.
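To put a number on "in all likelihood," here is a small sketch. The population figures come from the example above. The 80-year, one-play-per-minute lifetime is my own assumption, as is treating each play as an independent one-in-a-billion chance.

```python
import math

players = 1_000_000_000          # a billion people
stake = 100                      # dollars each person starts with
jackpot = 1_000_000_000_000      # the one winner ends with $1 trillion

expected_value = jackpot / players           # population average: $1000
p_win = 1 / players                          # chance that a given player is the winner

# Assumption: one play per minute, around the clock, for 80 years.
plays = 80 * 365 * 24 * 60
p_never_win = math.exp(plays * math.log1p(-p_win))   # (1 - p_win) ** plays

print(f"expected value per player: ${expected_value:,.0f} ({expected_value / stake:.0f}x)")
print(f"plays in 80 years: {plays:,}")
print(f"chance of never winning once: {p_never_win:.1%}")
```

Under these assumptions, even about 42 million plays leave you with roughly a 96% chance of never winning once, despite the 10x expected value.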

In our golf analogy, this is like saying “I bet my ball will go to the right,” while there’s a big hole on the left. After we run our golf simulation, one billion balls go into a hole one inch to the left, and one escapes 15 million miles to the right. On average, the balls are on the right, not the left, since the rightmost ball is very far away. It seems like a good bet if you only look at the expectation value, but in practice, you will never win the bet.

Instead, it’s better to look at the ‘time average,’ or the behavior of one member.

In practice, you simulate the golf game and see that they all go into the left hole, so you don't bet that they will go right. Or, you put a big hill in front of the hole so that the balls never go far enough left for you to lose. In finance, you can ensure there’s zero ‘risk of ruin,’ or you can hedge properly so large drawdowns never create financial irreversibility.

More practically, using a geometric mean of wealth outcomes rather than an arithmetic mean is better financial practice. This tells you what you should expect *yourself,* one member of the population, to end up with in wealth, since wealth is a geometric series. If most of your wealth outcomes go to 0 with a given strategy (strictly, if even one does), a geometric mean would tell you your expected wealth outcome is 0 and ignore outliers. Using a geometric mean, if the drawdowns are extreme for most of your possible wealth outcomes, outliers where you gain a lot will not matter and will not drive up the mean inappropriately. You live in only one reality, and you want that reality to be as good as possible, so you must elevate the outcomes of all possible realities.
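A short sketch of the difference between the two means. The outcome lists are hypothetical numbers I picked for illustration, not figures from Safe Haven: a coin flip that multiplies wealth by 1.5 or 0.6, and a lottery-like bet that wipes you out three times out of four.

```python
import math

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean(xs):
    """n-th root of the product; 0 if any outcome is 0 (ruin)."""
    if any(x == 0 for x in xs):
        return 0.0
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

# Hypothetical wealth multipliers, each equally likely.
coin_flip = [1.5, 0.6]           # +50% or -40%
lottery = [0, 0, 0, 10]          # ruin three times out of four, 10x once

for name, outcomes in (("coin flip", coin_flip), ("lottery", lottery)):
    print(f"{name}: arithmetic {arithmetic_mean(outcomes):.3f}, "
          f"geometric {geometric_mean(outcomes):.3f}")
```

The coin flip looks like a 5% gain per round on the arithmetic mean, but its geometric mean is below 1, so one player repeating it tends to shrink over time. The lottery has an arithmetic mean of 2.5 and a geometric mean of 0.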

(This is a parallel of the math Spitznagel outlined in his book, Safe Haven. Luca Dellanna has a good book on ergodicity as well.)

Finance is not the only source of non-ergodicity.

Interestingly, proteins can fold in an observationally non-ergodic fashion on human-relevant timescales.

As a brief intro, proteins are the machinery of your cells: they perform chemical reactions, form structures such as the cytoskeleton, contract or expand to make the organism move, and perform myriad other actions. Their action depends on their folding.

You can estimate the time proteins spend in different folding states with a technique called single-molecule FRET (fluorescence resonance energy transfer). Essentially, there are two fluorescent molecules attached to different points on a protein. Energy is transferred between the fluorescent molecules with an efficiency that depends on the distance between them, making them act as a “molecular ruler.” Each folding state has a characteristic ‘FRET’ distance, so you can determine how long a protein spends in a specific folding state.

This has been done multiple times on a single-molecule level (outside of cells). In these experiments, proteins can sometimes spend entire days in specific folding states, and they can often adopt dozens of different folding configurations. When using FRET efficiency (which indicates folding state) to evaluate ergodicity, proteins exhibit observational non-ergodicity over periods of days.

In practical terms, an enzyme can spontaneously inactivate itself for days, before randomly ‘reawakening,’ while a large sample of the protein population remains active.

Some proteins exist at low concentrations inside of cells. If you consider the ‘population’ of all of the proteins in a healthy body, most proteins probably have a functional folding configuration. If there are just a few copies of a specific protein in one cell, though, observational non-ergodicity can win out on short timescales, and every copy of that protein inside the cell can happen to be inactive at the same time.
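A back-of-the-envelope sketch of why copy number matters. It rests on two assumptions of mine that are not measured values: each copy is inactive for some fixed fraction of the time (10% here, a made-up number), and the copies switch independently of one another. Under those assumptions the chance that every copy is inactive at a given moment is that fraction raised to the number of copies.

```python
def p_all_inactive(inactive_fraction, copies):
    """Chance that every copy is inactive at a random moment, assuming independence."""
    return inactive_fraction ** copies

INACTIVE_FRACTION = 0.1   # hypothetical: each copy spends 10% of its time inactive

for copies in (1, 3, 10, 1_000):
    print(f"{copies:>5} copies: P(all inactive) = {p_all_inactive(INACTIVE_FRACTION, copies):.3g}")
```

With a handful of copies the chance of a total blackout is small but real. With a thousand copies it is effectively zero, which is why the population looks reliably active even though any single molecule is not.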

Imagine the case of a tumor-suppressing protein that activates a program to kill the cell whenever it becomes cancerous. If a cell is unlucky, its few copies of that protein could all enter an inactive folding state, even if nothing ‘goes wrong.’ The cell cycle time of some of the fastest growing tumors can be around 10 hours. With nothing wrong in the cancer cell, the tumor suppressor protein can then spontaneously fail for long enough for the cancer to double.

Now, this model ignores ‘active’ control of protein folding state. The systems that measure the non-ergodic folding states of proteins are usually dead systems where a protein is tethered to a surface so you can track the single molecule easily. Gilbert Ling had a model of the cell where ATP was essentially a global allosteric regulator. That is — it binds to most proteins and forcefully changes the folding state. He envisioned two states: a relaxed, resting state (with ATP) and a folded, active state (without ATP). I think there are clearly more folding states than these two, but it still serves as a visionary model, as most scientists don’t consider that a protein can ‘rest,’ and think that abstractions such as resting and active usually only exist on a cellular level.

Many think that protein folding is ‘solved’ with AlphaFold. AlphaFold is a great achievement, but the whole field of protein folding is extremely divorced from reality and has many problems. I’d like to take a side tangent and list what I see as wrong with the field of protein folding below:

1. ‘Protein folding,’ including AlphaFold, is based on X-ray crystallography, which definitively measures a protein folding state that is different from how proteins are folded inside of the cell. There is a technique called ‘circular dichroism’ which can measure the secondary structure (think small, simple folding structures, such as alpha helices or beta sheets) of protein solutions in real time. You need a protein to crystallize to measure the protein structure in high resolution with X-ray crystallography. As a protein crystallizes, the circular dichroism trace changes. This means that the protein must alter its folding to crystallize, indicating that the crystal structure doesn't accurately represent the protein's folding state in solution.

2. Solution NMR can show protein structures of proteins in solution, so it is better than X-ray crystallography, but these are extremely non-native solutions. This irks me the most about protein science. The inside of the cell is high in potassium and low in sodium. It is full of ATP, hormones, and other ligands for proteins. It is crowded, with only 3 layers of water between adjacent biomolecules on average. It is a fairly oxidized environment, having a high NAD+/NADH ratio. A protein scientist often just puts a protein, even if intracellular, in dilute ‘PBS’ with a reducing agent. This is exactly the opposite of a cell: high in sodium, low in potassium, not crowded, and reduced, not oxidized. It would be very hard to measure NMR of intracellular proteins to get true folding states. In my time in drug discovery, I’ve seen someone with a drug that binds very weakly to their protein of interest per standard biophysical methods. They were sad and having a hard time justifying to investors that they had a good drug for their protein target. However, their protein was intracellular, and they ran their assay in a saline solution, so the potassium concentration was non-physiological. I told them to increase the potassium concentration and drop the sodium, and their drug increased in binding strength by 1000-fold. It’s very clear that the solution a protein exists in has a strong influence on how it behaves, and NMR cannot capture proteins in their native environment. Membrane proteins, for example, are also often measured with these techniques in dead lipid bilayers. A physiological membrane has a different solvent on either side of it (rotationally-hindered water intracellularly, more bulk-like water extracellularly), and a dead lipid bilayer preparation has the same solvent on either side. Furthermore, there may be some higher order non-lamellar structures that lipids in biological membranes can form that are not replicated with dead membrane preparations.

3. Simulations of protein structure are very bad for a few reasons. I talked to a leading scientist in the world of ‘intrinsically disordered proteins,’ which are proteins that do not fold, and he told me about the history of protein folding simulations. Originally, proteins would be put into simulations with physics that we think is true, and they would not fold. So, with our prior knowledge that proteins ‘should fold,’ we changed the physics until they did fold. Later, once non-folding proteins were discovered and put into the simulations, they would fold, even though they never would in the real world. So, we changed the physics again to accommodate the disordered proteins. Clearly, the physics is adjusted in post to affirm biases, rather than agnostically generated to make accurate predictions. Unwillingness to simulate a lot of water molecules and ignorance around the inductive effects of adsorption of ligands onto proteins are a couple more reasons why simulations are bad, but those are topics for another day.

Protein folding as a field measures proteins in a dead, dilute, de-energized, non-physiological state, and that’s the data AlphaFold is trained on. Attempts to create useful applications based on ‘protein structure’ often overlook that protein folding is non-ergodic. This non-ergodicity means the structure you work from is merely one example of a dead, dilute, de-energized, non-physiological state, one that a protein can spontaneously avoid.

The proteins in the cancer cell you’re so dearly trying to target could just decide to not respond to your drug by folding into ‘resistant’ protein states, not representative of the rest of the protein population.

In fact, that will be the cancer cell that survives, and the drug you designed will fail every time. Cancers can, and will, use observationally non-ergodic protein folding to escape death.

It doesn’t seem likely that a small molecule binding to a specific protein will serve as a cure to cancer. The 'risk of ruin' is too high. I think we need to hack the operating system, not the apps.

- anabology