Thank you for tuning in to week 213 of the Lindahl Letter publication. A new edition arrives every Friday. This week the topic under consideration for the Lindahl Letter is, “Why a ‘combiner model’ might someday work.”
Open models abound. Every week, new open-weight large language models appear on Hugging Face, adding to a massive archive of fine-tuned variants and experimental checkpoints. Together, they form a kind of digital wasteland of stranded intelligence. These models aren’t all obsolete; they’re simply sidelined because the community lacks effective open source tools to combine their specialized insights efficiently. The concept of a “combiner model” offers one powerful path to reclaim this lost potential. Millions of hours of training, billions of dollars in compute, and so much electricity have already been spent. Sure, you can use distillation to capture the outputs of one model inside another, but a combiner model would be different: it overlays knowledge rather than extracting it.
A combiner model represents a critical shift away from the assumption that AI progress requires ever-larger single systems. Instead of training another trillion-parameter monolith, we can learn to combine many smaller, specialized models into a coherent whole. The central challenge lies in making these models truly interoperable: how do you merge or align their parameters, embeddings, or reasoning traces without degrading performance? The combiner model would act as a meta-learner, adapting, weighting, and reconciling information across independently trained systems, unlocking the latent knowledge already encoded in thousands of open weights. Somebody at some point is going to make an agent that works on this problem and grows stronger by essentially eating other models.
This vision can be realized through at least three technical routes. The first involves weight-space merging. Techniques such as Model Soups show that when models share a common base, their weights can be effectively averaged or blended, and tools like Mergekit make those recipes practical. More advanced methods, like TIES-Merging, resolve conflicts between task-specific parameter updates, and adaptive variants learn merging coefficients that vary across layers, turning model blending into a trainable optimization process rather than a static recipe. In this view, the combiner model becomes a universal optimizer for reuse, synthesizing the gradients of many past experiments into a single, functioning network.
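To make that first route concrete, here is a minimal sketch of weight-space merging in the spirit of a model soup, assuming two fine-tuned checkpoints that share the same architecture and base model. The file names and blending coefficients are placeholders for illustration, not an actual Mergekit recipe.

```python
# Minimal sketch of weight-space merging (a "model soup" style average).
# Assumes the checkpoints share the same architecture and base model;
# the paths and coefficients are illustrative placeholders.
import torch

def merge_state_dicts(state_dicts, weights=None):
    """Average (or weighted-blend) parameter tensors key by key."""
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    merged = {}
    for key in state_dicts[0]:
        merged[key] = sum(w * sd[key].float() for w, sd in zip(weights, state_dicts))
    return merged

# Illustrative usage: blend a legal and a biomedical fine-tune of the same base.
sd_a = torch.load("legal_finetune.pt", map_location="cpu")
sd_b = torch.load("biomed_finetune.pt", map_location="cpu")
soup = merge_state_dicts([sd_a, sd_b], weights=[0.6, 0.4])
torch.save(soup, "combined_checkpoint.pt")
```

The whole trick rests on that shared base; once architectures or tokenizers diverge, simple averaging stops working, which is where the next approach comes in.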
The second approach focuses on latent-space alignment. When models differ in architecture or tokenizer, their internal representations diverge. Even so, a smaller alignment bridge can learn to translate between their embedding spaces, creating a shared semantic layer, a kind of semantic superposition. This allows, for example, a legal-domain model and a biomedical model to exchange information while their original weights remain frozen. The combiner learns the translation rules, effectively building a common interlingua for neural representations that connects thousands of isolated domain experts.
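As a rough illustration of that second route, the sketch below trains a small projection to map embeddings from one frozen model’s space into another’s. The dimensions, the random placeholder batches, and the AlignmentBridge name are assumptions made for demonstration, not an established tool.

```python
# Minimal sketch of a latent-space alignment bridge: a small trainable
# projection that maps embeddings from one frozen model's space into
# another's. Dimensions, data, and names are illustrative assumptions.
import torch
import torch.nn as nn

class AlignmentBridge(nn.Module):
    def __init__(self, dim_src, dim_tgt):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(dim_src, dim_tgt),
            nn.GELU(),
            nn.Linear(dim_tgt, dim_tgt),
        )

    def forward(self, src_embeddings):
        return self.proj(src_embeddings)

# Train the bridge on paired texts embedded by both frozen models.
bridge = AlignmentBridge(dim_src=768, dim_tgt=1024)
optimizer = torch.optim.AdamW(bridge.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

src_batch = torch.randn(32, 768)   # embeddings from model A (placeholder data)
tgt_batch = torch.randn(32, 1024)  # embeddings of the same texts from model B

for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(bridge(src_batch), tgt_batch)
    loss.backward()
    optimizer.step()
```

In practice the paired batches would come from embedding the same texts with both frozen models, so only the bridge learns the translation while the domain experts themselves never change.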
The third approach treats the combiner not as a merger but as a controller or orchestrator. In this design, the combiner dynamically decides which expert models to invoke, evaluates their outputs, and fuses the results through its own learned inference layer. This idea already appears in robust multi-agent frameworks. A true combiner model, or maybe a combiner agent, would internalize this orchestration as a core part of its reasoning process. Instead of running one model at a time, it would simultaneously select and synthesize outputs from many experts, producing complex, context-aware intelligence assembled on demand. This approach is the most immediately viable and is already being used in sophisticated production systems today.
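A toy version of that orchestration idea might look like the sketch below, where a combiner routes a query to specialist models and fuses their answers. The expert functions and the keyword-based routing are illustrative stand-ins for what would really be learned components or calls into an existing multi-agent framework.

```python
# Minimal sketch of the orchestration view: a combiner that routes a query
# to specialist models and fuses their answers. The expert callables and
# the keyword heuristic are illustrative assumptions, not a real framework.
from typing import Callable, Dict, List

def legal_expert(query: str) -> str:
    return f"[legal answer to: {query}]"

def biomed_expert(query: str) -> str:
    return f"[biomedical answer to: {query}]"

EXPERTS: Dict[str, Callable[[str], str]] = {
    "legal": legal_expert,
    "biomed": biomed_expert,
}

KEYWORDS = {
    "legal": {"contract", "liability", "statute"},
    "biomed": {"protein", "dosage", "clinical"},
}

def route(query: str) -> List[str]:
    """Pick every expert whose keywords appear in the query; default to all."""
    tokens = set(query.lower().split())
    chosen = [name for name, kws in KEYWORDS.items() if tokens & kws]
    return chosen or list(EXPERTS)

def combine(query: str) -> str:
    """Invoke the selected experts and fuse their outputs into one response."""
    answers = {name: EXPERTS[name](query) for name in route(query)}
    return "\n".join(f"{name}: {text}" for name, text in answers.items())

print(combine("What liability does a clinical trial sponsor carry?"))
```

A production system would swap the keyword router for a learned policy and the string concatenation for a fusion layer, but the control flow is the same: select, invoke, reconcile.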
If such systems mature, the economics of AI will fundamentally change. Rather than concentrating resources on a few massive, proprietary models, research will shift toward modular ecosystems built from reusable parts. Each fine-tuned checkpoint on Hugging Face will become a potential building block, not an obsolete artifact. The combiner would turn the open-weight landscape into an evolving lattice of knowledge, where specialization and reuse replace the endless cycle of frontier retraining. This vision is demanding, but the promise remains compelling: a world where intelligence is assembled, not hoarded; where the fragments of past experiments contribute directly to future understanding. The combiner model might not exist yet, but its underlying logic already dictates the future of open source AI.
What’s next for the Lindahl Letter? New editions arrive every Friday. If you are still listening at this point and enjoyed this content, then please take a moment and share it with a friend. If you are new to the Lindahl Letter, then please consider subscribing. Make sure to stay curious, stay informed, and enjoy the week ahead!
Links I’m sharing this week!
This is the episode with Sam Altman that everybody was talking about.