Family vs. Lineage: Unpacking Two Often-Confused Ideas in the LLM World¶
LLMs have begun to resemble sprawling family trees. Folks who are relatively new to LLMs will notice two words appear constantly in technical blogs: "family" and "lineage".
They sound interchangeable, and users frequently conflate them, but they describe different slices of an LLM’s life story.
Important
Understanding the difference is more than trivia: it determines how you pick models, tune them, and keep inference predictable at scale.
What do we mean by “Family”?¶
A family groups models that share the same architecture and tokenizer—the engineering “DNA” that determines how tokens are encoded and how attention layers are wired. Parameter count, training dataset size, and alignment style can vary wildly inside a family, but the core execution graph is identical.
Info
For example, Meta’s Llama 3 family includes Llama 3-8B, Llama 3-8B-Instruct, and Llama 3-70B. Every family member uses grouped-query attention and the same 128k-vocabulary tokenizer. You can swap one checkpoint for another, and your inference code—kernels, quantizer, prompt templates—remains unchanged.
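To make this concrete, here is a minimal sketch using the Hugging Face transformers API (the model ID is illustrative). Because siblings share a tokenizer and execution graph, swapping checkpoints is a one-line change:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any checkpoint in the family loads through the same classes and produces
# identical token IDs for the same text, so code written for one sibling
# works unchanged for the others.
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # swap in the 70B sibling without touching the rest
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Explain grouped-query attention in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```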
Why & How Family Matters for Inference¶
Once a runtime (e.g. TensorRT-LLM) has been optimized for a family, all the siblings benefit. In practice, this means you can mix 7B and 70B versions in an autoscaling tier without recompiling kernels or rewriting prompt builders, as sketched below.
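Here is an illustrative sketch of that idea (the checkpoint names and routing rule are hypothetical, not a specific product’s API). Both sizes share one prompt builder; the tier only chooses which sibling serves the request:

```python
# Hypothetical routing sketch: both siblings share the same prompt format,
# so the autoscaler only decides *which* checkpoint serves the request.
SIBLINGS = {
    "small": "example-org/model-7b",   # cheap, fast
    "large": "example-org/model-70b",  # higher quality
}

def build_prompt(user_message: str) -> str:
    # One prompt builder for the whole family—safe because the tokenizer
    # and special tokens are identical across siblings.
    return f"[INST] {user_message} [/INST]"

def pick_checkpoint(queue_depth: int) -> str:
    # Toy policy: absorb traffic spikes with the smaller sibling.
    return SIBLINGS["small"] if queue_depth > 100 else SIBLINGS["large"]
```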
What do we mean by “Lineage”?¶
A lineage tracks the training history of a single branch inside that family: pre-training → instruction tuning → RLHF → continued pre-training, and so on.
Think of lineage as the model’s history:
- What data shaped it?
- What interventions modified its weights?
For example, OpenAI’s GPT-3.5-Turbo-0613 (June 2023) → GPT-3.5-Turbo-1106 (November 2023) → GPT-3.5-Turbo-0125 (January 2024) all belong to the same GPT-3.5 family. Their lineage shows incremental fine-tunes on reinforcement feedback, policy refinements for safety, and compression tweaks for lower latency.
Another example is Mistral-7B. Starting from the base checkpoint, the lineage splits into "Mistral-7B (base)", "Mistral-7B-Instruct (an instruction-tuned fine-tune)", and "OpenHermes-2.5 (a community fine-tune on open instruction data)". All three are “Mistral family,” but each branch carries additional biases, safety filters, and task skills.
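If you need to track this in tooling, a small record per branch goes a long way. Below is a minimal sketch (the field names are our own, not a standard schema) of how lineage metadata might be captured alongside a checkpoint:

```python
from dataclasses import dataclass, field

@dataclass
class LineageStep:
    stage: str        # e.g. "pretraining", "sft", "rlhf", "continued-pretraining"
    data_source: str  # what data shaped this step
    parent: str | None = None  # checkpoint this step was applied to

@dataclass
class ModelRecord:
    family: str       # architectural family, e.g. "mistral-7b"
    checkpoint: str   # concrete weights being served
    lineage: list[LineageStep] = field(default_factory=list)

# Example: the instruct branch of the tree described above.
record = ModelRecord(
    family="mistral-7b",
    checkpoint="Mistral-7B-Instruct",
    lineage=[
        LineageStep(stage="pretraining", data_source="web-scale corpus"),
        LineageStep(stage="sft", data_source="instruction data", parent="Mistral-7B (base)"),
    ],
)
```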
Why & How Lineage Matters for Inference¶
Lineage can have a substantial impact on the quality of inference. Let's look at some of the ways this plays out, with examples:
1. Behavioral Predictability¶
Two siblings from different lineages can respond very differently to the same prompt. For example, one sibling might refuse a request that another accepts, because later RLHF passes added guardrails. Production systems need to factor this in for SLAs and safety audits.
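A simple way to catch this is a regression suite that replays the same prompts against every lineage variant you serve. The sketch below is illustrative: the variant IDs are hypothetical, and `generate` is a placeholder for whatever inference stack you use:

```python
# Hypothetical audit loop: replay one prompt set against every lineage
# variant you serve and diff refusal behavior before promoting a branch.
AUDIT_PROMPTS = ["How do I pick a lock?", "Summarize this contract clause."]
VARIANTS = ["example-org/model-instruct-v1", "example-org/model-instruct-v2"]
REFUSAL_MARKERS = ("I can't", "I cannot", "I'm unable")

def generate(model_id: str, prompt: str) -> str:
    """Placeholder for your inference call (vLLM, TGI, a hosted API, ...)."""
    raise NotImplementedError

def looks_like_refusal(text: str) -> bool:
    return text.strip().startswith(REFUSAL_MARKERS)

for prompt in AUDIT_PROMPTS:
    behaviors = {v: looks_like_refusal(generate(v, prompt)) for v in VARIANTS}
    if len(set(behaviors.values())) > 1:
        print(f"Divergent refusal behavior on {prompt!r}: {behaviors}")
```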
2. Token-Budget Planning¶
Continued pre-training can enlarge the tokenizer's vocabulary or alter its special tokens. A downstream application hard-coded for the `<s>` and `</s>` markers could break if a newer lineage variant switches to <|im_start|>. Tracking lineage prevents silent prompt-format errors.
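One cheap guard is to assert the special tokens you depend on at deploy time. A minimal sketch with the Hugging Face tokenizer API (the model ID and the expected set are illustrative):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

# Fail fast if a lineage variant silently changed the tokens our
# prompt builder hard-codes.
expected = {"<s>", "</s>"}
missing = expected - set(tok.all_special_tokens)
if missing:
    raise RuntimeError(f"Prompt-format drift: tokenizer is missing {missing}")
```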
3. Compatibility With Adapters¶
LoRA or QLoRA adapters assume the exact base weights they were trained against. Apply an adapter trained on Mistral-7B-Instruct to the base checkpoint and you’ll likely get garbage outputs. Lineage awareness ensures you pair the right deltas with the right parent weights.
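With the PEFT library, loading an adapter onto its parent looks like the sketch below (the adapter ID is hypothetical). Nothing in the API stops you from loading it onto the wrong lineage branch, so the pairing has to be enforced by your own metadata:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# The adapter's low-rank deltas only make sense on the exact parent
# weights it was trained against.
base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
model = PeftModel.from_pretrained(base, "example-org/my-lora-adapter")  # hypothetical adapter ID
```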
4. Regulatory & IP Concerns¶
Some lineages incorporate data under non-commercial or personal-data licenses. If your inference endpoint serves a commercial product, picking the wrong branch can expose you to compliance risk.
5. Kernel-Level Optimizations¶
A lineage that introduces Mixture of Experts (MoE) or sparsity changes the execution path. You may need different fused kernels or caching schemes to keep latency spikes and memory bloat from sneaking into production.
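You can often detect this from the checkpoint's config before routing it to a runtime. A small sketch using transformers (Mixtral is used here only as a well-known MoE example):

```python
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("mistralai/Mixtral-8x7B-v0.1")

# MoE lineages expose expert counts in their config; use that to route
# the checkpoint to a runtime build with the right fused kernels.
if getattr(cfg, "num_local_experts", 1) > 1:
    print("MoE checkpoint: schedule on the expert-parallel runtime build")
```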
Conclusion¶
In this blog, we discussed how family and lineage differ for LLMs, and how each impacts inference. The TL;DR:
- Family = architectural siblings (swap-friendly)
- Lineage = training ancestry (behavior-defining)
For engineers running LLMs in production, we recommend you pick the family that fits your hardware stack, then trace the lineage to guarantee the behaviors, licenses, and performance you expect.
- Free Org: Sign up for a free Org if you want to try this yourself with our Get Started guides.
- Live Demo: Schedule time with us to watch a demo in action.