Scale AI to Create LLM Test, Evaluation Use Cases for CDAO

Scale AI has announced that it will work with the Department of Defense Chief Digital and Artificial Intelligence Office to create a test and evaluation framework for large language models.

Under the partnership, Scale AI will develop tests designed for DOD use cases and integrate them into its T&E platform. The effort will measure model performance, create specialized public sector evaluation sets for AI model testing, identify generative AI models for military applications and assess quantitative data through benchmarking and qualitative feedback assessment.

The result will give CDAO a framework for safe AI deployment, allow the department to mature its T&E policies for generative AI, improve the agency’s AI system resilience in classified environments and enable the DOD to deploy LLMs in secure environments, Scale AI said Tuesday.

The partnership comes after CDAO launched a bounty program aimed at identifying biases in AI systems. The AI Bias Bounty, led by the CDAO Responsible AI division, will focus on identifying unknown risks in LLMs.

Meanwhile, the Defense Advanced Research Projects Agency launched “Building an Adaptive & Competitive Workforce,” a project seeking AI- and LLM-driven tools for adult education and workforce development.