2026

Jane Street Neolithic reverse engineering

Independent reverse-engineering project

Reverse-engineered a massive sparse integer neural network into an interpretable hash pipeline, reducing the problem to a preimage search.

Public research repo
Python, PyTorch, NumPy, SymPy, PySAT, Matplotlib

Decompiled a 2721-layer sparse network into explicit equations and recognized its MD5-style structure.

Built several solver paths including symbolic analysis, SAT encoding, and hash-level validators.

Recovered the winning phrase and validated the model hypothesis against large randomized checks.

Context

This project focused on Jane Street's Neolithic puzzle: given a very large sparse integer ReLU network, recover the unique input string that makes the model output 1.

The network was large enough that treating it as a black box was not a realistic strategy. The key was to reverse-engineer its structure and turn the puzzle into something more interpretable.

What I built

I explored the network as a program rather than as a generic model.
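To illustrate why a ReLU network over integers can be read as a program, the sketch below builds Boolean gates from nothing but integer sums and ReLU, then composes them into the bitwise selection function F used in MD5's first round. The gate names and the single-bit encoding are assumptions made for illustration; the actual network may represent bits and carries differently.

```python
import itertools


def relu(x: int) -> int:
    """Integer ReLU: max(x, 0)."""
    return x if x > 0 else 0


# For inputs restricted to {0, 1}, each gate is an affine map plus one ReLU.
def and_gate(a: int, b: int) -> int:
    return relu(a + b - 1)


def or_gate(a: int, b: int) -> int:
    return a + b - relu(a + b - 1)


def xor_gate(a: int, b: int) -> int:
    return a + b - 2 * relu(a + b - 1)


def not_gate(a: int) -> int:
    return 1 - a  # purely affine, no ReLU needed


def md5_f_bit(b: int, c: int, d: int) -> int:
    """Single-bit version of MD5's F(b, c, d) = (b AND c) OR (NOT b AND d).

    The two AND terms can never both be 1, so the OR reduces to a sum.
    """
    return and_gate(b, c) + and_gate(not_gate(b), d)


# Exhaustively compare the ReLU gates with Python's bitwise operators.
for a, b in itertools.product((0, 1), repeat=2):
    assert and_gate(a, b) == (a & b)
    assert or_gate(a, b) == (a | b)
    assert xor_gate(a, b) == (a ^ b)

for b, c, d in itertools.product((0, 1), repeat=3):
    assert md5_f_bit(b, c, d) == ((b & c) | ((1 - b) & d))
```

Seen this way, decompiling means recognizing small gate patterns like these and lifting them back into word-level operations.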

Figure: Overview of the competing reverse-engineering and solving approaches explored while narrowing the problem down.

Main insight

The breakthrough was recognizing that the network was effectively implementing an MD5-style checker over null-terminated ASCII input rather than a learned model in the usual sense.

Once that structure became clear, the task changed from “invert a giant network” to “validate and solve a constrained preimage search problem.”
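A minimal sketch of what the reduced problem looks like, assuming the checker hashes the ASCII bytes that precede the first null terminator in a fixed-width buffer and compares the digest with a stored target. The function names, the buffer width, the word list, and the demo target are all hypothetical; the real target digest and candidate space are not reproduced here.

```python
import hashlib
from typing import Iterable, Optional


def message_from_buffer(buffer: bytes) -> bytes:
    """Return the bytes before the first null terminator (assumed convention)."""
    end = buffer.find(b"\x00")
    return buffer if end == -1 else buffer[:end]


def check(buffer: bytes, target_hex: str) -> bool:
    """Hash-level stand-in for the network: True when the digest matches."""
    return hashlib.md5(message_from_buffer(buffer)).hexdigest() == target_hex


def search(candidates: Iterable[str], target_hex: str, width: int = 32) -> Optional[str]:
    """Try each candidate phrase and return the first one the checker accepts."""
    for phrase in candidates:
        # Pad with nulls to mimic a fixed-width, null-terminated input.
        buffer = phrase.encode("ascii").ljust(width, b"\x00")
        if check(buffer, target_hex):
            return phrase
    return None


# Demo with a made-up target derived from a placeholder phrase.
words = ["stone", "flint", "axe"]
candidates = [f"{a} {b}" for a in words for b in words]
demo_target = hashlib.md5(b"flint axe").hexdigest()
found = search(candidates, demo_target)
print(found)
```

Because MD5 has no practical preimage shortcut, a search like this only works when the candidate space is small and well chosen, which is why narrowing the hypothesis mattered more than raw compute.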

Result

The repo documents the recovered phrase as “bitter lesson” and includes large-scale validation showing that the reduced checker matched the neural-network interpretation on randomized samples.
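The randomized validation can be pictured as a differential test: feed identical random inputs to two implementations and stop at the first disagreement. The sketch below uses toy stand-ins for both sides, since the real network and the reduced checker are not reproduced here; every function name and parameter is an assumption for illustration.

```python
import random
from typing import Callable, Optional


def random_buffer(rng: random.Random, width: int = 16) -> bytes:
    """Random printable-ASCII string, null-padded to a fixed width."""
    length = rng.randrange(width + 1)
    body = bytes(rng.randrange(32, 127) for _ in range(length))
    return body.ljust(width, b"\x00")


def find_mismatch(
    f: Callable[[bytes], int],
    g: Callable[[bytes], int],
    trials: int,
    seed: int = 0,
) -> Optional[bytes]:
    """Return the first input where f and g disagree, or None if none is found."""
    rng = random.Random(seed)
    for _ in range(trials):
        buffer = random_buffer(rng)
        if f(buffer) != g(buffer):
            return buffer
    return None


# Toy stand-ins: both decide whether the null-terminated length is even.
def slow_checker(buffer: bytes) -> int:
    n = 0
    for byte in buffer:
        if byte == 0:
            break
        n += 1
    return 1 if n % 2 == 0 else 0


def fast_checker(buffer: bytes) -> int:
    end = buffer.find(b"\x00")
    n = len(buffer) if end == -1 else end
    return 1 if n % 2 == 0 else 0


def buggy_checker(buffer: bytes) -> int:
    # Deliberately wrong: ignores the terminator and uses the padded length.
    return 1 if len(buffer) % 2 == 0 else 0


print(find_mismatch(slow_checker, fast_checker, trials=1000))          # None
print(find_mismatch(slow_checker, buggy_checker, trials=1000) is not None)
```

For a hash checker, random inputs almost always land on the rejecting side, so a test of this shape is most convincing when paired with a check on the known accepting input.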

Figure: Network-scale visualization used while tracing the structure of the sparse model, which helped make it less opaque during analysis.

Why it matters

This project is a good example of the kind of work I enjoy: understanding an opaque system deeply enough to simplify it, prove what it is doing, and then choose the right solving strategy instead of the most obvious one.

Technical highlights

The most important step was not brute-force solving. It was recognizing that the right abstraction level was much higher than the raw network itself. Once the MD5-like structure was clear, the remaining work became a much more tractable validation and search problem.
