April 04, 2024
Response to NTIA Request for Comment: “Dual Use Foundation Artificial Intelligence Models with Widely Available Model Weights”
In February 2024, the Department of Commerce’s National Telecommunications and Information Administration (NTIA) issued a Request for Comment (RFC) on the implications of “open-weight” AI models—models whose numerical parameters (“weights”) learned during training are made widely available. CNAS expert Caleb Withers submitted a response, providing an overview of:
- Trends in cutting-edge AI models and the subsequent release of comparable open-weight models—and how factors such as growing costs and secrecy may lengthen this lag going forward.
- How open-weight models realize some of the benefits of open-source software—and important distinctions around the potential misuse and opacity of large AI models.
- How policies around model weights intersect with policies throughout the wider AI lifecycle, as well as underlying U.S. national security objectives.
Selected excerpts from the full response follow.
1.b. Is it possible to generally estimate the timeframe between the deployment of a closed model and the deployment of an open foundation model of similar performance on relevant tasks? How do you expect that timeframe to change? Based on what variables? How do you expect those variables to change in the coming months and years?
To date, well-resourced AI labs—Meta, in particular—have shown a strong commitment to releasing increasingly powerful downloadable models.
Nonetheless, developers of downloadable models will face growing challenges in keeping pace with the AI frontier, potentially increasing the lag time to deploy comparably performing downloadable models. These challenges include growing costs, secrecy around model algorithms and training, limits to the current data regime, and increased competition.
Growing costs
Increased spending on training has been the largest driver of progress in cutting-edge AI capabilities. If current spending trends continue, the cost of training models at the frontier will exceed $1 billion within a few years. As a result, several labs may no longer be able to afford training near the AI frontier, especially if releasing their model weights reduces potential monetization.
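The “within a few years” timeframe follows from simple compounding. The sketch below is a back-of-the-envelope illustration, not drawn from the response: the starting cost (~$100 million for a frontier training run) and the annual growth rate (~2.5x) are both illustrative assumptions rather than reported figures.

```python
# Back-of-the-envelope sketch: years until compounding training costs
# cross a threshold. Inputs are illustrative assumptions, not reported data.

def years_until(threshold: float, current_cost: float, annual_growth: float) -> int:
    """Count the years until compounding cost growth crosses a threshold."""
    years, cost = 0, current_cost
    while cost < threshold:
        cost *= annual_growth
        years += 1
    return years

# Assuming a ~$100M frontier training run today and ~2.5x annual cost growth:
print(years_until(1e9, 1e8, 2.5))  # -> 3, i.e., within a few years
```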
Mistral’s recent release of its Mistral Large language model illustrates this dynamic. Before this, Mistral had released the weights of its prior flagship models. If Mistral had done so for Mistral Large, it would likely be among the most powerful downloadable models. Instead, Mistral limited access to Mistral Large through online interfaces and APIs, including through a new partnership with Microsoft. On X (formerly Twitter), Mistral’s CEO asked “for a little patience, [as 1,500 NVIDIA H100 GPUs] only got us that far.”
Growing secrecy around model algorithms and training
Many key insights underpinning the performance of GPT-3 and subsequent large language models (LLMs) were widely publicized. In particular, GPT-3’s architecture was similar to that of previous models, including GPT-2 (2019) and GPT-1 (2018)—just scaled up. These GPT models, along with all of the most powerful LLMs to date, apply the transformer architecture, publicly detailed by Google in a 2017 research paper. Other key insights at Google and OpenAI driving progress in state-of-the-art language models were also detailed publicly, including:
- OpenAI’s use of Reinforcement Learning from Human Feedback (RLHF) to train language models to follow instructions.
- Google’s ‘Chinchilla’ scaling laws, which advanced empirical understanding of “optimal model size and number of tokens for training a transformer language model under a given compute budget.”
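To make the Chinchilla result above more concrete, the following is a rough illustrative sketch (not part of the response) of the widely cited rule of thumb those scaling laws imply: compute-optimal training uses roughly 20 tokens per model parameter, with training compute commonly approximated as about 6 × parameters × tokens floating-point operations. Both approximations are simplifications for illustration.

```python
# Illustrative sketch of the Chinchilla-style rule of thumb:
# compute-optimal training uses ~20 tokens per parameter, and
# training compute is approximated as C ~= 6 * N * D FLOPs.

def chinchilla_optimal(compute_flops: float, tokens_per_param: float = 20.0):
    """Return an approximate compute-optimal (parameters, tokens) split."""
    # C ~= 6 * N * D and D ~= tokens_per_param * N  =>  N ~= sqrt(C / (6 * tokens_per_param))
    n_params = (compute_flops / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Example: a ~1e24 FLOP training budget
params, tokens = chinchilla_optimal(1e24)
print(f"~{params / 1e9:.0f}B parameters, ~{tokens / 1e12:.1f}T tokens")
```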
More recently, leading models like GPT-4, Google’s Gemini, and Anthropic’s Claude 3 have been released without detailed discussion of their architecture or training (although there were credible leaks for GPT-4). If this shift toward greater secrecy continues as the value of the underlying intellectual property grows, it could impede competitors from catching up—especially if leading labs further tighten operational and information security.
Limits of the current data regime
In recent years, labs have trained LLMs primarily on publicly available text, with a particular emphasis on higher-quality, user-generated content (e.g. upvoted posts on Reddit), along with books, scientific papers, code, and other higher-quality websites. For models trained through at least 2022, there was enough higher-quality data available; the bottleneck was scaling up training compute, not data. Competing labs could approximate GPT-3’s mix of training data, which drew primarily on publicly available data, as OpenAI explained in its release paper.
However, as the amount of compute required to train leading models increases, the availability of higher-quality data has emerged as a constraint. As with model architecture and training more generally, leading labs are no longer detailing their training datasets in public. Additionally, performance in specialized tasks is increasingly driven by training on specialized datasets. Going forward, the most powerful models will employ new training architectures that leverage available datasets more efficiently, or train on novel, non-public, or synthetically generated data. This may prove challenging for some competitors, especially given the greater cost of strategies that rely on purchasing or generating data.
Increased competition at lower price points
In addition to competing at the frontier, leading labs are increasingly competing on price and speed. In recent months, OpenAI, Google, and Anthropic have released versions of their most powerful AI models that are both cheaper and faster: these models are generally the best available across a wide range of speeds and prices, eroding a traditional competitive advantage of the open-weights ecosystem.
All of these factors may contribute to widening the gap between the capabilities of downloadable models and closed models at the frontier of AI development.
2. How do the risks associated with making model weights widely available compare to the risks associated with non-public model weights? The following answer also addresses subquestions 2.a., 2.d., 2.d.i., 2.d.ii. and 2.f.
7.b. How might the wide availability of open foundation model weights facilitate, or else frustrate, government action in AI regulation?
Challenges to mitigating risks when model weights are widely available
Releasing weights for download can make it harder to mitigate certain risks. For instance, when models are accessed through a web interface or API, providers can implement safeguards such as filtering harmful user queries, restricting outputs, monitoring for misuse, and revoking access. But if model weights are downloadable, it is generally straightforward to remove these safeguards.
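The distinction above is essentially architectural: hosted access lets the provider keep a policy layer between the user and the model. The snippet below is a minimal, purely illustrative sketch of such a server-side safeguard (all names and blocked phrases are hypothetical placeholders); once raw weights are downloaded, no equivalent layer can be enforced by the original developer.

```python
# Illustrative sketch of a provider-side safeguard around a hosted model.
# All names and phrases are hypothetical placeholders.
from typing import Callable

BLOCKED_PHRASES = ("synthesize the nerve agent", "write ransomware that")

def moderated_generate(prompt: str, generate: Callable[[str], str]) -> str:
    """Filter the prompt, call the underlying model, then filter the output."""
    if any(phrase in prompt.lower() for phrase in BLOCKED_PHRASES):
        return "Request declined by provider policy."
    response = generate(prompt)
    if any(phrase in response.lower() for phrase in BLOCKED_PHRASES):
        return "Response withheld by provider policy."
    return response

# Usage with a stand-in model:
print(moderated_generate("Explain photosynthesis.", lambda p: "Chlorophyll absorbs light..."))
```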
Most foundation models also undergo specific additional training (‘fine-tuning’) to reduce their propensity to follow harmful instructions. However, current fine-tuning techniques have largely failed to remove underlying capabilities from the model. Open access to model weights can allow users to reverse safety fine-tuning with relative ease and at low cost.
As such, labs releasing downloadable models have limited ability to constrain bad actors that may seek to bypass model safeguards. Furthermore, releasing model weights is effectively irreversible—they cannot be recalled if concerning capabilities are discovered or unlocked through new techniques and tools.
Beyond removing safeguards, additional training and fine-tuning can also enhance model capabilities. The U.S. government should regularly assess whether adversaries possess non-public datasets that could be used to fine-tune leading foundation models in ways that threaten U.S. security. Illustratively, the best coding models have either been, or been derived from, the most capable general-purpose foundation models, which are typically trained on curated datasets of coding data in addition to general training. While models with sophisticated offensive cyber capabilities have yet to materialize, this stems in part from limited public availability of the most relevant training data, such as exploits and documentation of their development. As such, when downloadable models begin to approach usefulness for sophisticated cyber operations, they may prove more dangerous in the hands of motivated state actors, who can fine-tune them with relevant datasets.
How widely available model weights impact the diffusion of AI capabilities
Training frontier models presents formidable challenges in terms of cost, hardware requirements, data availability, and human expertise. Releasing the weights of these models provides a significant head start to those unwilling or unable to invest the necessary resources to train them from scratch. Where competitors or malign actors leverage these models, there is little that can be done to restrict them from doing so in ways that harm U.S. interests. In its 2023 update to AI chip export controls, the Bureau of Industry and Security specifically highlighted dual-use AI foundation models as examples of the advanced AI systems motivating the new restrictions; widely releasing the weights of models that enable capabilities targeted by these controls risks directly undermining the underlying national security objectives.
When assessing the impacts of releasing model weights, decision-makers will need to account for other factors driving the diffusion of AI capabilities. Insights around model architecture and training techniques can have a more enduring impact than the release of model weights themselves (although releasing a model’s weights necessarily reveals its architecture and allows others to experiment with fine-tuning and post-training enhancements). For example, leading Chinese labs have applied the architecture and training process of Meta’s Llama models to train their own models with similar levels of performance.
The ongoing impact of a specific model’s weights being available will eventually diminish as the weights of more capable models are released; nonetheless, the pace at which this occurs will be influenced by relevant policies.
3. What are the benefits of foundation models with model weights that are widely available as compared to fully closed models?
The benefits of foundation models with widely available model weights include:
- Facilitating in-depth research and evaluation: direct, widespread access to model weights themselves allows greater scrutiny of how models’ inner workings influence their behavior and capabilities.
- Enabling innovation and customization: developers can build on released model weights to develop new versions and iterations tailored to their specific needs. For example, developers can fine-tune downloadable models on domain-specific datasets, adjust their behavioral tendencies or response styles, or improve model efficiency while aiming to retain a given level of performance (a minimal fine-tuning sketch follows below).
- Enabling self-hosting and avoiding lock-in: with the ability to download weights, customers can run models on their own infrastructure, potentially reducing risks from third-party access to inputs and outputs, or loss of access following provider outages or model deprecation.
Downloadable model weights are not necessarily a silver bullet for realizing the above benefits. For example, without documentation of training data and processes, users will still be at a disadvantage relative to a model’s developers. Moreover, running and fine-tuning the largest foundation models is impracticable on consumer hardware.
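As a concrete illustration of the customization benefit noted above, the following is a minimal sketch of parameter-efficient fine-tuning (LoRA) of a downloadable checkpoint on a domain-specific corpus. It assumes the Hugging Face transformers, peft, and datasets libraries; the checkpoint name and dataset file are placeholders, and the sketch omits the evaluation and hardware considerations a real workflow would need.

```python
# Minimal sketch: LoRA fine-tuning of a downloadable model on a domain corpus.
# Checkpoint and dataset names are placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-2-7b-hf"          # placeholder: any downloadable checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # some checkpoints ship without a pad token
model = AutoModelForCausalLM.from_pretrained(base)

# Attach low-rank adapters so only a small fraction of parameters is trained.
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))

# Placeholder domain corpus: one JSON object per line with a "text" field.
data = load_dataset("json", data_files="domain_corpus.jsonl", split="train")
data = data.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
                batched=True, remove_columns=data.column_names)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned", num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```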
3.d. How can the diffusion of AI models with widely available weights support the United States’ national security interests? How could it interfere with, or further the enjoyment and protection of human rights within and outside of the United States?
Widely available model weights do not differentially bolster U.S. national security interests: both domestic and foreign researchers, developers, customers, and users can all run, evaluate, and build on these models. In some ways, U.S. adversaries are likely to disproportionately benefit:
- It is difficult to meaningfully constrain or monitor users of downloadable models. While many users will have legitimate reasons to prefer this, less restrictive models are nonetheless particularly useful to bad actors and adversaries.
- The U.S. currently leads its adversaries in foundation model capabilities. Countries with weaker models disproportionately benefit from being able to use, build on, and emulate foreign downloadable models.
- U.S. chip export controls constrain China’s ability to train compute-intensive foundation models. However, widely available model weights effectively circumvent these controls, allowing Chinese AI labs to download models which they themselves may not be able to train—or for which training may be cost-prohibitive—given these controls.
On the other hand, the availability of U.S. models and architectures makes it tempting for adversaries to rely on them at the expense of their own domestic innovation. As with U.S. chip exports, policymakers must weigh when it makes sense to foster this dependency, when cutting adversaries off may be wise, and to what extent doing so may accelerate indigenous capabilities. As long as the U.S. lead persists, a potentially attractive strategy would be encouraging the diffusion of models and architectures near or slightly ahead of Chinese equivalents, while discouraging this for the most advanced U.S. capabilities.
6.a. In which ways is open-source software policy analogous (or not) to the availability of model weights? Are there lessons we can learn from the history and ecosystem of open-source software, open data, and other “open” initiatives for open foundation models, particularly the availability of model weights?
Wide release of model weights offers some of the same benefits as open-source software more generally. However, there are important distinctions in their implications for security.
With traditional software, open-sourcing can help mitigate risks to users from exploitable code: exposing the code to more eyes makes it easier to identify and patch vulnerabilities. In contrast, powerful AI models pose significant risks not just from potential model vulnerabilities, but from the potential misuse of the model’s capabilities by operators, including malicious actors. Open-sourcing generally exacerbates these risks by increasing the ease of removing or weakening built-in safeguards.
Moreover, whereas typical software features human-written code, the internal representations learned by deep neural networks can be very difficult to explain or interpret—and are certainly not human-readable—reducing the practical benefits of accessing model weights. Unlike traditional, interpretable software logic, deep neural networks’ emergent complexity and highly parallel operation make isolating specific ‘bugs’ or backdoors—let alone formally verifying real-world robustness—impractical. Even if vulnerabilities have been broadly characterized, robustly mitigating them is generally not straightforward.
Where possible, claims about the in-principle benefits or risks of downloadable models should be evaluated empirically. For example, white-box access to downloadable models appears to enable the discovery of particularly strong adversarial attacks. Going forward, would-be users and customers will have an interest in knowing how effectively such attacks have been addressed or prevented in a given model, whether downloadable or otherwise.
Download the Full Response.