A common assumption about large language models is that a bigger, more capable model is also a fairer one. More data, more parameters, more computing power. The reasoning goes that all of this adds up to a system that is more accurate and more representative of the world. That assumption is worth questioning. This paper argues that scale is not a solution to bias in large language models. It is the primary mechanism that drives it. Specifically, scaling these models amplifies systemic bias by rewarding statistical majority patterns at the direct expense of minority representation. The bigger the model, the more confidently it reflects whoever dominated the data it learned from.
Before the argument can be made, the central terms need to be defined clearly. Scale in the context of large language models refers to three things growing at the same time: the number of parameters in a model, the volume of text it trains on and the computing power used in training. A parameter is a numerical weight the model adjusts during training to improve its ability to predict text. The more parameters a model has, the more refined its pattern recognition becomes. Bias in this context does not mean obvious prejudice. Explicit bias is the kind that can be caught through straightforward analysis, such as a model consistently associating a specific group with negative descriptors. Implicit bias is different and considerably harder to detect. It shows up not in any single output but in a structural pattern across many outputs, specifically in which perspectives get expressed with confidence and authority and which get consistently underrepresented. Research surveying bias across large language models identifies implicit bias as the more significant and harder-to-measure problem precisely because it does not look like prejudice on the surface. It looks like normal, confident language generation.[1] This is the form of bias this paper is concerned with because scale is the mechanism that makes it both stronger and less visible.
The structural argument begins with the basic mechanics of how these models produce text. Large language models generate outputs through a process called next-token prediction. The model takes a sequence of text and selects the statistically most probable next word. This process is not neutral. Statistically probable means statistically common and statistically common means whatever shows up most often in the training data. The model is not evaluating truth, balance or fairness. It is calculating probability distributions based on whatever corpus it was trained on. Research into the architecture underlying these models, specifically the self-attention mechanism central to transformer-based systems, documents that this architecture creates a structural orientation toward high-frequency patterns in training data.[2] The model does not move toward majority expression by accident. It is oriented that way at a design level. A separate analysis frames this point directly: the bias in these systems is not incidental to what they are but intrinsic to it.[3] Scaling the model does not correct for this tendency. It makes the model more precise at reproducing the same imbalanced picture it started with.
This structural orientation becomes more pronounced at scale specifically because of how implicit bias operates. Larger models do not just repeat biased patterns more often. They express them more fluently, more authoritatively and in a wider range of contexts. A smaller model might produce a detectable or awkward skew. A larger model produces the same skew with the linguistic confidence of a well-informed source. Research examining implicit bias across more than fifty large language models found that bias does not diminish as model size increases and that the compounding effect of building newer models on top of foundational ones tends to carry existing biases forward rather than filter them out.[4] The fluency that comes with scale makes implicit bias harder to identify and therefore harder to address.
The empirical evidence for this argument is specific and measurable. A study introducing the MiJaBench benchmark tested twelve state-of-the-art large language models across forty-four thousand adversarial prompts targeting sixteen distinct minority groups. The results showed that defense rates, meaning the rate at which a model refused to generate harmful outputs targeting a given group, fluctuated by as much as thirty-three percent within the same model depending solely on which demographic group was the target.[5] The study concluded that safety alignment in these models does not reflect a consistent principle of non-discrimination but instead reflects memorized refusal patterns specific to certain groups while leaving others structurally unprotected. The study also found that model scaling made these disparities worse rather than better.[6] A separate study examining social identity bias across large language models found consistent and measurable gaps in how different demographic groups are represented in model outputs and that these gaps did not narrow as model complexity increased.[7] Minority groups are not underrepresented in model outputs because of deliberate design choices. They are underrepresented because the data these models learn from reflects the demographic reality of who produces text at scale on the internet and scaling the model scales that reflection proportionally.
The problem extends beyond any single generation of models. Bias in large language models does not stay fixed. It compounds over time through a phenomenon called model collapse. This occurs when newer models are trained not exclusively on human-generated text but on outputs produced by earlier model generations. When a prior-generation model produces outputs shaped by a majority-skewed training corpus and those outputs are later incorporated into the next generation's training data, the statistical skew of the first generation becomes structurally embedded in the second. Then the third. Research published in Nature found that training generative models recursively on synthetic data leads to measurable collapse in output diversity as the training distribution becomes progressively more homogeneous across generations.[8] A separate empirical study of GPT-2 across iterative synthetic training cycles found that political bias amplification occurred consistently and substantially across generational training loops and that this amplification continued independently of model collapse itself, meaning the two phenomena operate through separate mechanisms and can compound simultaneously.[9] The internet was already skewed toward majority perspectives before any large language model processed it. Model collapse means that skew does not simply carry forward across model generations. It deepens with each one.
A counterargument worth addressing is that some fairness benchmarks show larger models performing better on bias metrics than smaller ones. This observation is accurate and should not be dismissed. Some larger models do score higher on standardized bias evaluations. However benchmark performance and actual representational fairness are not the same measurement. A model that scores well on a fairness benchmark has learned to produce outputs that satisfy the criteria of that benchmark. Those criteria were designed by researchers working within specific institutional and cultural contexts that are not themselves demographically neutral.[10] Research analyzing bias measurement methodology has noted that the benchmarks most commonly used to evaluate fairness in large language models are better at detecting explicit bias than implicit bias, which is the more significant problem at scale.[11] A model that expresses majority perspectives with greater fluency and fewer surface-level markers of prejudice has not become a more equitable model. It has become a model whose structural imbalances are more effectively embedded in the texture of its output rather than sitting on top of them where they can be caught.
The Wang et al. study on bias amplification provides further support for this conclusion. Using GPT-2 as a test case across iterative training cycles, the researchers found consistent and substantial political bias intensification over successive rounds of synthetic training and demonstrated that this amplification persists even when model collapse is effectively controlled for.[12] This finding matters because it separates the bias amplification problem from the performance degradation problem. Even a model that maintains output quality across generations will still intensify its pre-existing biases if it trains recursively on its own outputs. Scale and iteration together create a situation where representational imbalance grows more pronounced over time regardless of whether the model is otherwise improving in capability.
Scale in large language models does not produce representational neutrality. The statistical architecture at the core of these systems favors high-frequency patterns derived from majority-dominated training data. The empirical evidence shows that demographic disparities in model outputs worsen rather than improve as models grow larger. The mechanism of model collapse ensures that these disparities compound across successive model generations rather than self-correct over time. And the fluency that comes with scale conceals implicit bias within outputs that read as authoritative and accurate. Scaling a large language model does not move it toward a more balanced representation of human perspectives. It moves it toward a more precise reproduction of the perspectives that were already statistically dominant to begin with.
- [1] Gallegos, Isabel O. et al. "Bias and Fairness in Large Language Models: A Survey." Computational Linguistics Journal, 2024.
- [2] Li, Yingcong et al. "Mechanics of Next Token Prediction with Self-Attention." AISTATS, 2024.
- [3] "Large Language Models Are Biased Because They Are Large Language Models." arXiv, 2024.
- [4] Gallegos et al. See note 1.
- [5] Brito, Iago A. et al. "MiJaBench: Revealing Minority Biases in Large Language Models via Hate Speech Jailbreaking."
- [6] Brito et al. See note 5.
- [7] "Social Identity Bias in Large Language Models." Nature Computational Science, 2024.
- [8] Shumailov, Ilia et al. "AI Models Collapse When Trained on Recursively Generated Data." Nature, vol. 631, 2024, pp. 755–759.
- [9] Wang, Ze et al. "Bias Amplification: Large Language Models as Increasingly Biased Media." arXiv, 2024.
- [10] Navigli, Roberto et al. "A Systematic Analysis of Biases in Large Language Models." arXiv, 2025.
- [11] Gallegos et al. See note 1.
- [12] Wang et al. See note 9.
Acknowledgments
Header image and visual reference: Welch Labs. "How DeepSeek Rewrote the Transformer [MLA]." YouTube, 2025.
