Building JLMoney: a macro synthesis engine that shows its work

Phase 0. The scaffolding. Why Elixir, why CA-initialized genetic algorithms, and what success looks like before we touch real markets.

We’re building something different in the market analysis space.

Not another black box that says “buy X” with no explanation. Not another LLM wrapper that hallucinates confidence. A system that generates hypotheses, tests them, and explains its reasoning in plain language.

Why we’re building this

We’ve run trading robots since 2019. Sentiment analysis, balance sheet parsing, historical pattern matching, 12-month price projections. They worked — outperformed manual trading by a wide margin.

But we never trusted them enough to automate. Two bad paper trades in Interactive Brokers and we shut off automation entirely. Couldn’t stomach watching a robot lose fake money.

The real problem: they couldn’t explain themselves. Just “this stock, this price, this confidence level.” A black box with good grades isn’t a system you can trust at 6am.

JLMoney is the rebuild. Same analytical ambition, but with visible reasoning. If the system can’t show its work, the system doesn’t ship.

The core bet

Here’s what we believe: we can’t compete on data access against Rentech or Jane Street, and we can’t compete on clever LLM use against 800 million ChatGPT users.

So what’s left?

Different cognitive machinery applied to commodity inputs.

The edge isn’t what you know — it’s how you think about what everyone already knows.

The vision

JLMoney is a macro synthesis engine. It will:

  • Generate testable market hypotheses using genetic algorithms
  • Validate those hypotheses against historical data
  • Explain its reasoning in human-readable narratives
  • Earn operator trust through interpretability, not mysticism

The target output isn’t “SPY will go up 2.3% tomorrow with 87% confidence.” That’s fortune-telling.

The target is: “Gold and oil moved inversely last week, which historically precedes volatility compression. Here’s why that pattern mattered in 2018, why it might matter now, and what would invalidate this reading.”

Less prophecy. More epistemology.

The technical approach

Why Elixir?

Not JavaScript. Not Python. Elixir.

Fault tolerance matters when you’re orchestrating multiple API calls. The supervision tree model means when Tavily chokes or OpenRouter rate-limits you, the system degrades gracefully instead of exploding.

Runtime characteristics > ecosystem size.

The data strategy

No data lake. No multi-terabyte Parquet files. Just-in-time retrieval.

  • Tavily for search and context gathering
  • OpenRouter for LLM orchestration across multiple models
  • Public market APIs for price data

If you need historical context, you fetch it. If you need current sentiment, you search for it. Storage is expensive and stale. Retrieval is cheap and fresh.

The core innovation: CA-initialized genetic algorithms

Here’s where it gets interesting.

Genetic algorithms can evolve solutions to complex problems, but they’re usually initialized randomly. Random initialization means:

  • Non-reproducible experiments
  • No way to explain why you started where you started
  • Black box on top of black box

Instead, we’re using cellular automata (specifically Rules 30 and 110) to generate initial populations deterministically. Same seed = same starting population = fully reproducible experiments.

The CA generates a bit pattern. We chunk that into weight vectors. Those become the founding population. Evolution takes over from there.
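The bit-pattern-to-population step is concrete enough to sketch. Here is a minimal Python version (the real system will be Elixir; `width`, `bits_per_weight`, and the [-1, 1] scaling are illustrative assumptions, not the shipped parameters):

```python
def ca_step(cells, rule):
    """Advance a 1-D cellular automaton one step (wrap-around edges)."""
    n = len(cells)
    return [
        (rule >> (cells[(i - 1) % n] * 4 + cells[i] * 2 + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]

def ca_bits(seed, rule, width, steps):
    """Run the CA from an integer seed and concatenate each row's bits."""
    cells = [(seed >> i) & 1 for i in range(width)]
    bits = []
    for _ in range(steps):
        cells = ca_step(cells, rule)
        bits.extend(cells)
    return bits

def bits_to_weights(bits, genome_len, bits_per_weight=8):
    """Chunk the bit stream into floats in [-1, 1]."""
    weights = []
    for g in range(genome_len):
        chunk = bits[g * bits_per_weight:(g + 1) * bits_per_weight]
        value = sum(b << i for i, b in enumerate(chunk))
        weights.append(2.0 * value / (2 ** bits_per_weight - 1) - 1.0)
    return weights

# Same seeds, same founding population, every run.
population = [
    bits_to_weights(ca_bits(seed=s, rule=30, width=16, steps=8), genome_len=5)
    for s in range(1, 21)
]
```

Swapping `rule=30` for `rule=110` changes the texture of the starting population without touching anything else, which is exactly the kind of knob a reproducible experiment wants.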

Why this matters:

When a weight vector survives 100 generations and explains market movement with 65% accuracy, we can trace its lineage back to a specific CA seed. We can re-run the experiment. We can mutate the seed and watch how the survivors change. We can explain the methodology to someone who asks “where did this model come from?”

The lineage is inspectable. The experiments are reproducible. The search process becomes part of the model.

The roadmap

Phase 0: Project scaffold (starting now)

Basic Elixir project structure. Supervision tree. Config management. The boring stuff that makes everything else possible.

jlmoney/
├── lib/
│   ├── jlmoney/
│   │   ├── application.ex
│   │   ├── signals/        # Data fetching
│   │   ├── evolution/      # GA + CA
│   │   ├── synthesis/      # LLM orchestration
│   │   └── output/         # Formatting results
├── test/
├── config/
├── mix.exs
└── README.md

Phase 1: The dumbest possible GA

Prove the core loop works. Evolve weight vectors that explain historical S&P 500 movement.

Signals (hardcoded for now):

  • Gold price (daily close)
  • Oil price (daily close)
  • VIX (daily close)
  • Basic sentiment score (placeholder)

Genome: five floats. That’s it.

[w_gold, w_oil, w_vix, w_sentiment, threshold]

Prediction:

def predict(genome, normalized_signals):
    *weights, threshold = genome
    score = sum(w * s for w, s in zip(weights, normalized_signals))
    return "UP" if score > threshold else "DOWN"

Fitness: percentage of days where prediction matches reality over a 90-day window.
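Spelled out as a Python sketch (the `(signals, actual)` window shape and signal names are assumptions for illustration; Phase 1 would run this over 90 entries):

```python
def fitness(genome, window):
    """Fraction of days where the genome's call matched the actual direction.

    `window` is a list of (normalized_signals, actual) pairs, e.g.
    ([gold, oil, vix, sentiment], "UP" or "DOWN") for each day.
    """
    *weights, threshold = genome
    hits = 0
    for signals, actual in window:
        score = sum(w * s for w, s in zip(weights, signals))
        call = "UP" if score > threshold else "DOWN"
        hits += call == actual
    return hits / len(window)
```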

Success criteria:

  • GA runs and converges
  • At least one survivor beats random (>50% accuracy)
  • Same seed produces same results
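A minimal seeded loop that satisfies the third criterion might look like this Python sketch (random initialization shown for brevity; the CA-initialized variant would replace the starting population line; the selection and mutation parameters are placeholder choices, not tuned values):

```python
import random

def evolve(fitness_fn, genome_len=5, pop_size=30, generations=50, seed=42):
    """Truncation selection plus Gaussian mutation, fully seeded."""
    rng = random.Random(seed)            # same seed -> same run, bit for bit
    pop = [[rng.uniform(-1, 1) for _ in range(genome_len)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness_fn, reverse=True)
        parents = pop[: pop_size // 2]   # keep the top half
        pop = parents + [
            [w + rng.gauss(0, 0.1) for w in rng.choice(parents)]
            for _ in range(pop_size - len(parents))
        ]
    return max(pop, key=fitness_fn)
```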

If this doesn’t work, nothing else matters. If it does work, we have a foundation.

Phase 2: Real signal ingestion

Replace mock data with actual market data. Tavily integration for news sentiment. Yahoo Finance or Alpha Vantage for prices. Historical data loader.

Phase 3: Richer genomes

Only if Phase 1 survivors are interesting but limited.

Explore:

  • Conditional rules: if gold > X then weight_oil = Y
  • Multiple thresholds (regime detection)
  • Time-lagged signals
  • Decision tree genomes
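One possible shape for the first of these, sketched in Python with hypothetical gene names, to show how a conditional rule can stay readable:

```python
# Hypothetical conditional genome: the oil weight switches on a gold trigger.
# Genome: [w_gold, w_oil_low, w_oil_high, gold_trigger, w_vix, w_sent, threshold]
def predict_conditional(genome, signals):
    gold, oil, vix, sentiment = signals
    w_gold, w_oil_low, w_oil_high, gold_trigger, w_vix, w_sent, threshold = genome
    # "if gold > X then weight_oil = Y" as a single readable gene interaction
    w_oil = w_oil_high if gold > gold_trigger else w_oil_low
    score = w_gold * gold + w_oil * oil + w_vix * vix + w_sent * sentiment
    return "UP" if score > threshold else "DOWN"
```

A survivor of this shape can still be printed and read aloud, which is the whole constraint.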

Constraint: keep interpretability. If we can’t read the survivor, we’ve failed.

Phase 4: LLM synthesis layer

Use LLMs to explain what the GA found.

  • Take surviving weight vectors
  • Pull relevant context via Tavily (why did gold move on date X?)
  • Generate human-readable narrative explaining the model’s logic
  • Multi-model setup: one for context gathering, one for synthesis, one for critique

This is where commodity LLM access becomes an advantage. We’re not asking AI to generate the insight. We’re asking it to explain what the evolution discovered.
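The synthesis step can stay out of the insight path entirely: the prompt just restates what evolution found. A Python sketch of prompt construction (function name, wording, and figures are illustrative; the actual LLM call via OpenRouter is omitted):

```python
def explanation_prompt(genome, signal_names, accuracy, context_snippets):
    """Build a synthesis prompt for one surviving model.

    The LLM is asked to explain the evolved weighting, not to invent a view.
    `context_snippets` would come from Tavily searches.
    """
    *weights, threshold = genome
    lines = [f"  {name}: weight {w:+.2f}" for name, w in zip(signal_names, weights)]
    return (
        "A genetic algorithm evolved this market model "
        f"({accuracy:.0%} directional accuracy over its test window):\n"
        + "\n".join(lines)
        + f"\n  decision threshold: {threshold:+.2f}\n\n"
        + "Recent context:\n"
        + "\n".join(f"- {s}" for s in context_snippets)
        + "\n\nExplain in plain language what this weighting implies, "
        "why it may have worked historically, and what would invalidate it."
    )
```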

Phase 5: Daily output pipeline

Automated morning briefing:

  • Pull latest signals
  • Run inference with best surviving models
  • LLM synthesis of current positioning
  • Markdown output → email/newsletter
  • Confidence indicators based on model agreement
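The agreement-based confidence indicator reduces to a vote across survivors. A Python sketch (the `predict` rule mirrors the Phase 1 genome; all names are illustrative):

```python
from collections import Counter

def predict(genome, signals):
    *weights, threshold = genome
    return "UP" if sum(w * s for w, s in zip(weights, signals)) > threshold else "DOWN"

def agreement_confidence(survivors, todays_signals):
    """Majority direction plus the share of surviving models that agree."""
    calls = [predict(g, todays_signals) for g in survivors]
    direction, votes = Counter(calls).most_common(1)[0]
    return direction, votes / len(calls)
```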

The product is not a number. It’s a structured argument.

Principles

  1. Interpretability over accuracy. A 60% accurate model we understand beats a 70% black box.
  2. Start dumb. Let the results tell us when we need more complexity.
  3. Reproducibility. Deterministic initialization means we can always explain how we got here.
  4. Fail fast. If an idea doesn’t show promise in Phase 1, kill it.

Open questions

We don’t have all the answers. That’s the point. Some things we’re actively unsure about:

  • What’s the right signal granularity? Daily close? Intraday snapshots?
  • How do we score sentiment without building a whole NLP pipeline? (Maybe LLM one-shot classification?)
  • What’s the minimum historical window for meaningful fitness testing?
  • When do we know a Phase 1 survivor is “interesting” enough to proceed?

These questions will answer themselves as we build.

Why build this in public?

Two reasons:

Accountability. If we say we’re going to ship Phase 1 next week, we have to ship Phase 1 next week.

Feedback. Maybe you see a fatal flaw we’re missing. Maybe you’ve tried something similar and know where the bodies are buried. Tell us now, not after we’ve wasted three months.

What’s next

This week: Phase 0. Getting the scaffolding right.

Next week: Phase 1 implementation. The dumbest GA that could possibly work.

We’ll document progress, dead ends, and surprises here as they happen. This is a public build log — failures included.

Let’s find out if there’s signal in the evolutionary noise.
