The GA found something. We don't trust it yet.

First contact with real market data. 60.3% accuracy, unanimous convergence on a regime engine, and one very suspicious 'consensus' number.

Last week we shipped the dumbest possible genetic algorithm. This week we fed it actual market data.

It came back with a structure.

Result:           60.3% backtest accuracy
Convergence:      10/10 survivors agree
Weight structure: Negative oil, negative VIX
Assessment:       Interesting. Not trustworthy.

Before interpreting any of that, we needed to answer a more basic question:

Why are we allowed to listen to this system at all?

First: proving it can recover signal

Before touching markets, we generated synthetic data with a hidden rule:

direction = sign(gold × -0.3 + oil × -0.5 + sentiment × 0.7 + vix × -0.6 + 0.1)

Then we let the GA search blindly.
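
For readers who want to reproduce the setup, here is a minimal sketch of how such a labeled dataset could be built. The hidden rule is copied from above. Everything else is an assumption for illustration: the module name `SyntheticSketch`, the uniform [-1, 1] signal distribution, the seed, and the treatment of a score of exactly zero as :down.

```elixir
defmodule SyntheticSketch do
  # Hidden rule from the post:
  # sign(gold * -0.3 + oil * -0.5 + sentiment * 0.7 + vix * -0.6 + 0.1)
  @hidden_weights [-0.3, -0.5, 0.7, -0.6]
  @hidden_bias 0.1

  # Label one row of signals, ordered [gold, oil, sentiment, vix].
  def label(signals) do
    score =
      @hidden_weights
      |> Enum.zip(signals)
      |> Enum.map(fn {w, s} -> w * s end)
      |> Enum.sum()

    # Assumption: a score of exactly zero counts as :down.
    if score + @hidden_bias > 0, do: :up, else: :down
  end

  # Assumption: each signal is drawn uniformly from [-1, 1].
  # Seeding makes the dataset reproducible.
  def generate(n, seed \\ {1, 2, 3}) do
    :rand.seed(:exsss, seed)

    for _ <- 1..n do
      signals = for _ <- 1..4, do: :rand.uniform() * 2 - 1
      {signals, label(signals)}
    end
  end

  # Fitness as plain accuracy: share of rows where the candidate
  # prediction function matches the hidden label.
  def accuracy(dataset, predict_fun) do
    hits =
      Enum.count(dataset, fn {signals, direction} ->
        predict_fun.(signals) == direction
      end)

    hits / length(dataset)
  end
end
```

Passing any candidate's prediction function to `accuracy/2` gives the score a search would try to maximize. The hidden rule itself scores 1.0 by construction.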

Results:

Hidden:     gold=-0.3  oil=-0.5  sentiment=0.7  vix=-0.6  threshold=0.1
Discovered: gold=-0.51 oil=-0.53 sentiment=0.73 vix=-0.72 threshold=0.22

96.7% accuracy. Converged in ~20 generations. Same seed, same result.

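The weights aren’t exact, but they don’t need to be. A sign rule only cares which side of the decision boundary a point lands on, so scaling every weight and the threshold by the same positive factor changes no prediction, and on finite data many nearby boundaries classify every sample identically. What matters is that the representation can express the rule, and evolution can find it.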

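This isn’t a warm-up. It’s the license. Now when it tells us something about real markets, we know the search can recover a linear rule when one actually exists. Whether it also reads patterns into noise is a separate question, and the tests further down are aimed squarely at it.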

What it found in real markets

Signals:  Gold, Oil, VIX, Headline Sentiment, Wikipedia "Recession" traffic
Target:   S&P 500 daily direction
Window:   63 trading days
Result:   60.3% accuracy

Ten points above a 50% coin flip, on only 63 days. Interesting, but not the point. The real artifact is the structure it converged on.
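
A quick sanity check on why the number alone is "not the point". This is a rough sketch, assuming each of the 63 days is an independent trial and that 50% is the right baseline (both are simplifications, and the module name is hypothetical):

```elixir
defmodule CoinFlipCheck do
  # Standard error of an accuracy estimate over n trials
  # under a baseline hit rate p (default: fair coin).
  def standard_error(n, p \\ 0.5), do: :math.sqrt(p * (1 - p) / n)

  # How many standard errors the observed accuracy sits above the baseline.
  def z_score(accuracy, n, p \\ 0.5) do
    (accuracy - p) / standard_error(n, p)
  end
end

# 60.3% over a 63-day window
IO.inspect(CoinFlipCheck.z_score(0.603, 63))
```

Under those assumptions the standard error is about 6.3 points, so 60.3% sits roughly 1.6 standard errors above a coin flip. Suggestive, but well within what a lucky window can produce.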

The convergence pattern: it’s a regime engine

All survivors landed on the same shape: negative oil weight, negative VIX weight.

Initially, we thought the VIX sign was a paradox. It isn’t. Because we’re normalizing VIX against a fixed long-term anchor (20), a negative weight means exactly what you’d expect: high volatility regimes push the total score down, making a “DOWN” prediction more likely.

The real surprise isn’t the sign. It’s the dominance.

Out of five signals, evolution very quickly decided that its best bet, in this window, was to read the market as a function of volatility regime and energy pressure. Gold, headlines, and attention proxies were present but subordinate.

This points to at least one of three explanations, and they are not mutually exclusive:

Regime vs. delta. VIX is anchored (absolute state), while gold and oil are windowed (relative movement). The GA is currently “voting” for the absolute regime over relative price action.
Feature noise. Other features (like Wikipedia attention) might be too noisy to compete with the clean signal of the VIX.
The energy gravity. In this specific 63-day window, the S&P 500 was an “energy” story, not a “sentiment” story.
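
To make the regime-vs-delta distinction concrete, here is a small sketch of the two normalization styles. The anchor of 20 comes from the text above. The exact formulas are assumptions, since this post doesn't give them: dividing by the anchor, and a z-score for the windowed version. The module name is hypothetical.

```elixir
defmodule NormalizeSketch do
  # Anchored: distance from a fixed long-term level.
  # Assumption: expressed as a fraction of the anchor.
  def anchored(value, anchor \\ 20.0), do: (value - anchor) / anchor

  # Windowed: z-score of the latest value against its own recent window.
  # Assumption: population standard deviation; a flat window maps to 0.0.
  def windowed(window) do
    n = length(window)
    mean = Enum.sum(window) / n

    variance =
      (window
       |> Enum.map(fn x -> (x - mean) * (x - mean) end)
       |> Enum.sum()) / n

    std = :math.sqrt(variance)
    latest = List.last(window)

    if std == 0.0, do: 0.0, else: (latest - mean) / std
  end
end

# A VIX that has been sitting at 30 for the whole window:
IO.inspect(NormalizeSketch.anchored(30.0))              # elevated regime
IO.inspect(NormalizeSketch.windowed([30.0, 30.0, 30.0])) # no relative move
```

The same calm-but-elevated VIX reads as a strong signal when anchored and as nothing at all when windowed. That asymmetry alone could explain why the anchored feature wins.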

The system generated a hypothesis we did not start with. That is exactly what it’s supposed to do.

The fake consensus problem

A live run produced:

CONSENSUS: ▲ UP (10/10 models agree)

This is suspicious.

If evolution drove all survivors to near-identical weights, we don’t have ten opinions. We have one opinion repeated. This is “population collapse.” Before “consensus” can mean confidence, we need to measure structural diversity. Until then, agreement is an artifact, not evidence.
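
One simple way to put a number on structural diversity is the mean pairwise distance between survivors' weight vectors. This is a sketch of one possible metric, not the one we have settled on. The module and function names are hypothetical.

```elixir
defmodule DiversitySketch do
  # Euclidean distance between two weight vectors of equal length.
  def distance(a, b) do
    a
    |> Enum.zip(b)
    |> Enum.map(fn {x, y} -> (x - y) * (x - y) end)
    |> Enum.sum()
    |> :math.sqrt()
  end

  # Mean distance over all unordered pairs of survivors.
  # 0.0 means every survivor carries identical weights.
  def mean_pairwise_distance(population) do
    indexed = Enum.with_index(population)

    pairs =
      for {a, i} <- indexed, {b, j} <- indexed, i < j, do: {a, b}

    case pairs do
      [] ->
        0.0

      _ ->
        total = pairs |> Enum.map(fn {a, b} -> distance(a, b) end) |> Enum.sum()
        total / length(pairs)
    end
  end
end
```

A value near zero would mean a 10/10 vote is one model counted ten times. One caveat: because a sign rule is scale-invariant, two genomes with very different raw weights can still make identical predictions, so rescaling each vector to unit length before comparing may be the fairer measure.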

Three ways this dies

The next phase is not “improve accuracy.” It’s “subject the discovered structure to hostile conditions.”

Out-of-sample. Train early, test late.
Cross-window. Train on different regimes (2021 vs 2024) and check whether the same structure reappears.
Calibration. Does unanimity actually correlate with correctness?
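
Two of these checks are easy to sketch. Below, a chronological split for the out-of-sample test and a per-agreement hit rate for calibration. The names, the 70/30 default, and the record shape `{votes_for_majority, correct?}` are all assumptions for illustration.

```elixir
defmodule ValidationSketch do
  # Out-of-sample: the earliest share of days trains, the rest tests.
  # No shuffling, so the test set is strictly later than the training set.
  def chronological_split(days, ratio \\ 0.7) do
    Enum.split(days, floor(length(days) * ratio))
  end

  # Calibration: group days by how many models voted with the majority,
  # then compute the hit rate inside each group.
  # Each record is {votes_for_majority, correct?}.
  def hit_rate_by_agreement(records) do
    records
    |> Enum.group_by(
      fn {votes, _correct?} -> votes end,
      fn {_votes, correct?} -> correct? end
    )
    |> Map.new(fn {votes, outcomes} ->
      {votes, Enum.count(outcomes, & &1) / length(outcomes)}
    end)
  end
end
```

If unanimity means anything, the 10/10 bucket should show a visibly higher hit rate than the 6/10 bucket. If the buckets look the same, the consensus number carries no information.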

What this already establishes

The GA externalizes patterns, and on synthetic data it does so reproducibly. Accuracy is secondary to structure. Most importantly, this machine is already doing the one thing it was built to do: turning market behavior into inspectable objects we can argue with.

Technical appendix: prediction logic

For those following the Elixir implementation:

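# Genome representation
defmodule JLMoney.Evolution.Genome do
  defstruct [:weights, :threshold, :fitness, :lineage]
end

# Prediction function: negative weights on VIX/Oil lower the score.
# The enclosing module name is illustrative; `def` must live inside a module.
defmodule JLMoney.Evolution.Predictor do
  alias JLMoney.Evolution.Genome

  # `signals` must be in the same order as the genome's weights.
  def predict(%Genome{weights: w, threshold: t}, signals) do
    score =
      w
      |> Enum.zip(signals)
      |> Enum.map(fn {weight, signal} -> weight * signal end)
      |> Enum.sum()

    # Weighted sum above the threshold predicts :up, otherwise :down.
    if score > t, do: :up, else: :down
  end
end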

Initialization: Rule 30 CA generates deterministic bit patterns for the starting population. Selection: tournament (k=3). Crossover: single-point. Mutation: Gaussian (σ=0.1).
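
A rough sketch of those operators, for orientation only. The module name, the wraparound boundary for Rule 30, sampling the tournament without replacement, and mutating every weight are assumptions. How the Rule 30 bits are turned into starting weights isn't covered here, so the sketch stops at the bit pattern.

```elixir
defmodule OperatorsSketch do
  # One Rule 30 step on a list of 0/1 cells:
  # new cell = left XOR (center OR right).
  # Assumption: the row wraps around at the edges.
  def rule30_step(cells) do
    n = length(cells)
    t = List.to_tuple(cells)

    for i <- 0..(n - 1) do
      left = elem(t, rem(i - 1 + n, n))
      center = elem(t, i)
      right = elem(t, rem(i + 1, n))
      # max/2 acts as OR on bits, inequality acts as XOR.
      if left != max(center, right), do: 1, else: 0
    end
  end

  # Tournament selection: draw k genomes at random, keep the fittest.
  # Assumption: drawn without replacement.
  def tournament(population, k \\ 3) do
    population |> Enum.take_random(k) |> Enum.max_by(& &1.fitness)
  end

  # Single-point crossover: head of parent A, tail of parent B.
  # Needs at least two weights so a cut point exists.
  def crossover(weights_a, weights_b) do
    point = :rand.uniform(length(weights_a) - 1)
    Enum.take(weights_a, point) ++ Enum.drop(weights_b, point)
  end

  # Gaussian mutation: add zero-mean noise with standard deviation sigma.
  # Assumption: every weight is perturbed on every call.
  def mutate(weights, sigma \\ 0.1) do
    Enum.map(weights, fn w -> w + :rand.normal() * sigma end)
  end
end
```

Because Rule 30 is deterministic, the same starting row always yields the same sequence of bit patterns, which fits the "same seed, same result" behavior reported above.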

Next step: we’re going to force the GA to use a rolling-window VIX (Delta-VIX) to see if it can still find the signal without the “regime” crutch.
