The USAF and DoD are investing heavily in AI technologies to help warfighters leverage the vast amounts of critical data available during modern warfare. However, most AI tech is developed using commercial data that is not representative of real-world military data and operational environments. Enabled Intelligence, Inc. (EI) proposes to adapt its AI model testing software platform to give the Air Force automated and accurate testing and evaluation of AI technologies in real-world USAF environments and situations. The EI platform will help ensure USAF AI technologies work as needed and expected in the field, and help detect and address potential adversarial spoofing of USAF AI tools. With the EI tool, the USAF could learn that a seemingly high-performing aircraft detection AI doesn't work on new Russian MiGs, in snowy terrain, or when there is camouflage. EI testing would also detect if a model stops working due to adversarial changes. Through EI's platform, the USAF could realize the benefits enjoyed by EI's commercial clients, including: 1) Ultra-fast AI model testing; 2) Highly detailed evaluations showing very specific errors and capabilities of each AI model by data type, object, environment, and other factors; 3) Faster turnaround on AI model improvement to allow for faster AI deployment; and 4) Detection of changes / spoofing in AI models as they change over time in the field. EI's AI testing platform can be adapted to test AI models using classified and other truly representative AF mission data to provide detailed and accurate testing of real AI model performance. The platform can run in the cloud or on-premises (for more secure installations). Using the technology, EI has tested a wide variety of AI model types and use cases across a variety of data formats. The platform comprehensively and automatically tests AI tech and goes beyond typical commercial AI tests.
The EI platform produces detailed direction on which types of data and/or instances show reduced performance and what kinds of additional training data would improve AI model performance. For example, a typical AI test may report a 96% detection rate of aircraft in satellite imagery, whereas EI testing breaks that result down by data type, object, environment, and other factors to show where the model falls short. The tests can also detect changes in AI performance over time due to adversarial spoofing.
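
To illustrate why a single headline figure such as a 96% detection rate can hide a weak spot, the sketch below computes a detection rate overall and per environment. It is only a minimal illustration and not EI's platform or method: the function name `detection_rates`, the slice labels, and all of the records are hypothetical and made up for this example.

```python
from collections import defaultdict


def detection_rates(records):
    """Return (overall_rate, {slice_name: rate}) from (slice_name, detected) pairs.

    Hypothetical helper for illustration only; not part of any EI product.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for slice_name, detected in records:
        totals[slice_name] += 1
        hits[slice_name] += int(detected)
    if not totals:
        raise ValueError("records must contain at least one entry")
    overall = sum(hits.values()) / sum(totals.values())
    per_slice = {name: hits[name] / totals[name] for name in totals}
    return overall, per_slice


# Made-up records: 90 aircraft in clear terrain (all detected) and
# 10 aircraft in snowy terrain (6 detected, 4 missed).
records = (
    [("clear", True)] * 90
    + [("snow", True)] * 6
    + [("snow", False)] * 4
)

overall, per_slice = detection_rates(records)
print(f"overall: {overall:.2f}")
for name, rate in sorted(per_slice.items()):
    print(f"{name}: {rate:.2f}")
```

With these invented numbers the overall rate is 0.96 while the snow slice is 0.60, which is the kind of gap an aggregate score alone would not reveal. Re-running the same breakdown on fresh data at intervals is one simple way a drop in a particular slice could be noticed over time.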