Jane Street Neolithic reverse engineering

Context

This project focused on Jane Street's Neolithic puzzle: a very large sparse integer ReLU network had to be understood well enough to recover the unique input string that makes the model output 1.

The network was large enough that treating it as a black box was not a realistic strategy. The key was to reverse-engineer its structure and turn the puzzle into something more interpretable.

What I built

I explored the network as a program rather than as a generic model.

Overview of the competing reverse-engineering and solving approaches explored during the project.

A high-level view of the different solving paths explored while narrowing the problem down.

Decompiled sparse layers into explicit equations and gate-like operations
Analyzed the output logic from the end of the network backward
Identified repeated substructure matching an MD5-style hashing process
Built multiple inversion paths including symbolic manipulation, SAT solving, CSP-style propagation, and direct hash validation

Main insight

The breakthrough was recognizing that the network was effectively implementing an MD5-style checker over null-terminated ASCII input rather than a learned model in the usual sense.

Once that structure became clear, the task changed from “invert a giant network” to “validate and solve a constrained preimage search problem.”

Result

The repo documents the recovered phrase as bitter lesson and includes large-scale validation showing the reduced checker matched the neural-network interpretation on randomized samples.

Network-scale visualization used while tracing the structure of the sparse model.

A visualization of the larger network structure that helped make the model less opaque during analysis.

Why it matters

This project is a good example of the kind of work I enjoy: understanding an opaque system deeply enough to simplify it, prove what it is doing, and then choose the right solving strategy instead of the most obvious one.

Technical highlights

The most important step was not brute-force solving. It was recognizing that the right abstraction level was much higher than the raw network itself. Once the MD5-like structure was clear, the remaining work became a much more tractable validation and search problem.