The Pentagon Partners with an AI Firm to Test New LLMs

February 20, 2024

Scale AI will create a test-and-evaluation (T&E) framework for the Pentagon’s large language models (LLMs). The goal is to ensure they’re safe and reliable for military use.
The T&E process will involve creating “holdout datasets.” DoD insiders will write prompt-response pairs, review them in layers, and check that each response is as good as the response expected from a human in the military.
The goal is to make AI systems stronger and more resilient. This will allow LLM technology to be used in secure places. It will also help the DoD understand the technology’s strengths and limits.

Scale AI is making a test-and-evaluation (T&E) plan for the Pentagon’s large language models (LLMs). The project aims to make sure AI models are safe and reliable for military use.

The Pentagon’s Chief Digital and Artificial Intelligence Office (CDAO) needs a way to test and evaluate AI models for military use. The CDAO wants to use LLMs to support and improve military planning and decision-making. However, LLMs can also disrupt these processes.

The Pentagon has long used T&E processes to ensure its systems, platforms, and technologies work well. But AI safety standards and policies are not yet set, and the complexities and uncertainties of LLMs make T&E even harder for generative AI.

How will it work?

Scale AI will create a framework for the CDAO to test and evaluate LLMs. The T&E process will include creating “holdout datasets,” for which DoD insiders will write prompt-response pairs and review them in layers. The experts will ensure that each response is as good as the response expected from a human in the military.

The process will be iterative. Once the datasets are ready, the experts will evaluate existing LLMs against them. Eventually, the models will send signals to CDAO officials if they start to drift away from the domains they have been tested against.
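To make the idea of evaluating a model against a holdout dataset concrete, here is a minimal sketch in Python. It is not Scale AI's or the CDAO's framework, whose details have not been published. The names `HoldoutPair`, `token_f1`, and `evaluate`, the token-overlap metric, and the 0.5 threshold are all assumptions chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class HoldoutPair:
    """One expert-reviewed prompt-response pair (hypothetical structure)."""
    prompt: str
    reference: str


def token_f1(candidate: str, reference: str) -> float:
    """Token-overlap F1 between a model answer and the reference answer.

    A simple stand-in metric; a real framework would likely rely on
    expert review or a richer scoring method.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def evaluate(model: Callable[[str], str],
             holdout: List[HoldoutPair],
             threshold: float = 0.5) -> Dict[str, object]:
    """Score a model on every holdout pair and list prompts below threshold."""
    scores = [token_f1(model(pair.prompt), pair.reference) for pair in holdout]
    mean_score = sum(scores) / len(scores) if scores else 0.0
    failures = [pair.prompt for pair, s in zip(holdout, scores) if s < threshold]
    return {"mean_score": mean_score, "failures": failures}
```

Here `model` is any function that maps a prompt string to a response string, so the same holdout set can be reused across different LLMs. The `failures` list is one simple way to surface prompts that may need another round of expert review.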

The goal of the Pentagon

The goal is to enhance the robustness and resilience of AI systems in classified environments, which will enable the adoption of LLM technology in secure settings. Scale AI plans to automate as much of the development process as possible. This way, as new models come in, there can be some baseline understanding of how they will perform, where they will perform best, and where they will probably start to fail.
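One way to picture such a baseline is as a per-domain summary of evaluation scores. The sketch below is illustrative only. The function name `baseline_by_domain`, the domain labels, and the 0.6 cutoff are assumptions and do not come from Scale AI or the DoD.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def baseline_by_domain(results: Iterable[Tuple[str, float]],
                       fail_below: float = 0.6) -> Dict[str, object]:
    """Summarize (domain, score) results into a per-domain baseline.

    Returns the mean score per domain, the best-scoring domain, and the
    domains whose mean falls below `fail_below` (an assumed cutoff).
    """
    grouped: Dict[str, List[float]] = defaultdict(list)
    for domain, score in results:
        grouped[domain].append(score)

    means = {d: sum(s) / len(s) for d, s in grouped.items()}
    best = max(means, key=means.get) if means else None
    weak = sorted(d for d, m in means.items() if m < fail_below)
    return {"means": means, "best": best, "weak": weak}


# Hypothetical scores for a newly arrived model.
example = baseline_by_domain([
    ("logistics", 1.0), ("logistics", 0.5),
    ("planning", 0.5), ("planning", 0.25),
])
```

In this made-up example, `example["best"]` is `"logistics"` and `example["weak"]` contains `"planning"`, the kind of quick read on strengths and likely failure areas that the article describes.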

Benefits of the partnership

The partnership between Scale AI and the DoD is a significant step towards ensuring the safe and responsible deployment of LLMs and generative AI within the military. The T&E framework will help the DoD understand the strengths and limitations of the technology. It will also ensure that the models are reliable, safe, and effective for military applications.

Scale AI’s CEO, Alexandr Wang, said, “Testing and evaluating generative AI will help the DoD understand the strengths and limitations of the technology, so it can be deployed responsibly. Scale is honored to partner with the DoD on this framework.”

Apart from the CDAO, Scale AI has partnered with Meta, Microsoft, the U.S. Army, the Defense Innovation Unit, OpenAI, General Motors, Toyota Research Institute, Nvidia, and others. These partnerships show Scale AI’s commitment to ensuring the safe and responsible deployment of AI technology.

The partnership between Scale AI and the Pentagon is a big step towards ensuring the safe use of LLMs and generative AI in the military. The T&E framework will help the DoD understand the technology’s strengths and limits, and it will help make sure the models are reliable, safe, and effective for military use. With Scale AI’s expertise and the Pentagon’s need for T&E, the partnership benefits both parties.

Randa Moses

Randa Moses is an editor and reporter at Cryptopolitan covering tech, AI, robotics, crypto, scams, and hacks. She has worked in the crypto space since 2017. She held roles at Forward Protocol, AmaZix, and Cryptosomniac. Randa holds a degree in Electrical and Electronics Engineering from the University of Bradford.
