This post is a living document. Last updated on 2026-04-12.
Computational research and AI
At the Wolfram Institute, I explore how ideas from mathematical physics can be carried into the framework of Wolfram models. My work focuses on designing the least complex discrete model capable of reproducing a macro phenomenon, with the potential to branch out and converge to known continuous structures in the limit of many computational steps. After implementing such a model and setting up a computational experiment, the empirical phase begins: making statistical observations, searching for emergent patterns, and identifying laws relating well-designed observables as the complexity of the computational substrate generated by the experiment increases. It is exciting to think about how the seemingly continuous, unlimited, and unique world might emerge from a discrete one characterized by indivisibility, boundedness, and ambiguity.
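As a minimal illustration of this kind of experiment (a standard rule from the Wolfram Physics Project, not one of my actual research models), a hypergraph rewriting system can be evolved and measured with the WolframModel resource function:

```wolfram
(* A minimal discrete model: repeatedly rewrite a hypergraph with one rule.  *)
(* This is the well-known rule from arXiv:2004.08210, shown here only as a   *)
(* sketch of the workflow, not a model from my own research.                 *)
evolution = ResourceFunction["WolframModel"][
  {{x, y}, {x, z}} -> {{x, z}, {x, w}, {y, w}, {z, w}},
  {{0, 0}, {0, 0}}, (* initial hypergraph *)
  10                (* number of rewriting steps *)
];

(* Empirical phase: track an observable as the substrate grows, e.g. the *)
(* hyperedge count after each generation, then inspect the final state.  *)
evolution["EdgeCountList"]
evolution["FinalStatePlot"]
```

The design choice here is that the model itself stays tiny; all the scientific content lives in which observables (edge counts, dimension estimates, causal-graph statistics) are measured as the evolution proceeds.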
Before committing to a concrete project, I brainstorm ideas, gather papers and other resources, distill suitable constructions, implement them in Wolfram Language, and create notebooks for experimentation and preliminary presentation to colleagues. If a direction looks promising, I zoom in and develop it further, eventually contributing to Wolfram Institute projects, paclets, or independent writing. In practice, three parts of this workflow are especially suitable for AI assistance: the exploratory phase, the polishing phase, and repository maintenance and organization.
My use of AI
For two years I used AI in a basic way: for chat, reformatting, and corrections. Working in a corporate environment, I was wary of sharing unwanted context and of taking responsibility for the correctness of generated code. At its peak, my AI usage amounted to taking advantage of most of CodeCompanion’s functionality for Neovim, except agents.
A turning point came with the release of Claude Opus 4.5. I began experimenting with agentic workflows and got used to offloading tasks to Claude using only guidance and specifications, without touching the code directly. The results were surprisingly good. My first “wow” moment was vibe-coding a Swift app to manage and time-track my weekly tasks—generated entirely from a screenshot of my wife’s Excel table. I have never programmed in Swift and have never looked at the generated code, yet the app works perfectly. It feels like we are approaching a world where anyone can define the app they need and let the AI build it, like selecting a song on a jukebox.
Rapid prototyping—getting from A to B without deep dives—also seems essentially solved. When improvements are needed, the simplest approach is to direct the AI to reuse fragments of earlier code or templates. When I have time or need specific architecture, I still refine things myself, but AI gives me the option to strategically postpone that work.
Given this shift, I think it is time to consider agentic workflows and researcher best practices for computational research in the AI era. Since general standards and guidelines still seem to be missing, and I don’t have time to dive too deep, I decided to stay within the Claude ecosystem and began writing a plugin that captures my workflow.
Computational research plugin
The plugin (GitHub) is installed from the WolframInstitute marketplace. Core principles:
- Computational engine. Link to the Wolfram kernel for validation and Wolfram functions.
- Knowledge base. A plain-markdown wiki readable by both humans and LLMs.
- Resource management. Resources stored as Markdown with recovery instructions, regenerated on demand.
- Repo organization. Scaffolding for research projects, paclet repos, and LaTeX papers.
- Tour. Guide the user from the simplest parts to the most advanced, with revision at each step.
- Formal verification. (TBD) Creation of Lean projects for validating results.
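As a sketch of how the resource-management and knowledge-base principles fit together (the file layout and field names below are illustrative, not the plugin's exact format), a downloaded paper might be stored as a small Markdown stub with recovery instructions, so a fresh clone can regenerate it on demand:

```markdown
<!-- resources/papers/wolfram-2020-class-of-models.md (hypothetical layout) -->
# A Class of Models with the Potential to Represent Fundamental Physics

- source: arXiv:2004.08210
- local copy: resources/papers/pdf/2004.08210.pdf (not committed)
- recovery: re-download via the arXiv MCP server if the PDF is missing

## Notes
Distilled constructions live here, with wiki-style cross-references such as
[[causal-invariance]] that the LLM follows instead of scanning every file.
```

The point of this shape is that only the lightweight, human-readable part is version-controlled, while heavy artifacts remain recoverable from their stated sources.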
See the LLM-updated version history below for a changelog after each major release.
Some philosophy and future direction
AI will surely make certain hard skills obsolete. Paradigm shifts are exciting, and letting things go and starting anew deepens life experience. But several human roles remain essential:
- Curiosity — the drive to explore a direction and push it as far as possible through questions and tasks.
- Ideas — identifying an AI-unsolvable problem that matters to your group of humans and challenging yourself to solve it.
- Coordination — defining the workflow for agents and deciding when humans intervene.
- Communication — spreading enthusiasm among humans about your problem and convincing them it is worth their time and resources.
A few ideas for missing pieces:
- Central database for both verified and human math. It could combine Lean formalizations with a collection of human-written proofs. A future mathematician could input a new result in any form—even blackboard photos—and the AI would place it correctly and add it to a queue for revision and formalization. Readers could search it with the help of LLMs and generate a study-ready paper tailored to their needs. This would remove the need to stitch together dry math papers and allow researchers to focus on novel contributions. Perhaps Lean and Mathlib, which I have to get more familiar with, already point in this direction. As we are demonstrating in our next project, UniversalityDB, LLMs equipped with a knowledge base, a computational engine, and human guidelines can help with auto-formalization in Lean and lower the threshold for using it.
- A clear definition of an agent. What exactly is an agent? Can an agent create other agents? Can such agents be reused across contexts?
- An orchestration graph. A workflow graph including agents, tools, and humans, ideally compilable into a minimal version where most tasks are performed by tools—a more capable successor to LLMGraph.
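The orchestration-graph idea can be sketched directly in Wolfram Language; every vertex and edge below is hypothetical and only illustrates the intended structure, not an existing implementation:

```wolfram
(* A hypothetical orchestration graph: agents, tools, and humans as typed *)
(* vertices, directed edges as task handoffs. All names are illustrative. *)
workflow = Graph[{
    "human:researcher" -> "agent:explorer",
    "agent:explorer" -> "tool:arxiv-search",
    "agent:explorer" -> "tool:wolfram-kernel",
    "agent:explorer" -> "agent:writer",
    "agent:writer" -> "human:reviewer"},
  VertexLabels -> "Name",
  GraphLayout -> "LayeredDigraphEmbedding"];

(* Since the graph is acyclic, a topological sort gives one valid *)
(* execution order for the handoffs. *)
TopologicalSort[workflow]
```

"Compiling" such a graph into a minimal version could then mean contracting agent vertices whose work is fully covered by deterministic tools, leaving humans only at genuine decision points.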
Version history (LLM updated)
Version 3 (2026-04-12)
Version 3.0.0 expands the plugin from a wiki-centric research tool into a full project lifecycle manager: scaffolding for research projects, paclet repos, and standalone paclets; skills for building and publishing paclets to Wolfram Cloud; LaTeX paper scaffolding with amsart and biblatex; and search across the Wolfram ecosystem (documentation, Function Repository, Community, writings, Physics Project).
Version 2 (2026-04-05)
Version 2.0.0 introduces a plain-markdown wiki as the central knowledge base — readable by both LLMs and humans, version-controlled, with cross-references the LLM navigates instead of scanning every file. Resources and notebooks are stored as Markdown with recovery instructions, built on demand (idea by sw1sh). A revision protocol prevents the LLM from overwriting human-edited content. A tour skill walks through the project from simple to advanced.
The core MCP servers remain Wolfram MCP (or the unofficial wolfram-mcp with LSP support) and arXiv-mcp (plus arxiv-latex-mcp for reading LaTeX source).
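For reference, a Claude Code project wires such servers up in a `.mcp.json` file at the repository root; the format below is Claude Code's standard one, but the command lines are placeholders, since the exact invocation depends on how each server is installed:

```json
{
  "mcpServers": {
    "wolfram": {
      "command": "wolfram-mcp-server",
      "args": []
    },
    "arxiv": {
      "command": "uvx",
      "args": ["arxiv-mcp-server"]
    }
  }
}
```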
Version 1 (2026-03-04)
The first version of the plugin bundled a few Claude skills: wolfram-notebook for creating Wolfram notebooks from prompts via Markdown import (an idea by sw1sh), and computational-exploration for scaffolding a structured research project.
The latter skill searched arXiv and Wolfram Community for papers, downloaded them, and produced organized notes with citations. Planned skills included notes-to-article, list-topics, setup-experiment, and polish-research.
After using this on several projects, I found the design too broad and not goal-oriented enough. Exploration, resource management, and knowledge organization were tangled together. Knowledge was spread across CLAUDE.md, notebooks, LaTeX notes, and resources. Generated notebooks were redundant to store, as they can be imported from Markdown anyway. Resources had no recovery mechanism, so a fresh clone lost all downloaded papers. And there was no revision protocol, so the LLM could overwrite user-edited content.