Curtis Anderson, the software architect at Panasas and co-chair of the MLCommons storage working group, recently shed light on data storage’s importance in artificial intelligence and machine learning (AI/ML). He emphasized that the right storage infrastructure is crucial for the effective functioning of AI/ML applications.
The rising influence of AI/ML
AI/ML, at its core, is about pattern recognition. It can revolutionize business processes, enterprise outcomes, and people’s lives. IDC predicts that the global AI market, encompassing software, hardware, and services, will hit $900 billion by 2026, with a CAGR of 18.6% from 2022 to 2026.
As AI/ML adoption increases, IT teams must focus on building and managing an infrastructure that can support these capabilities and scale for future growth. One often underestimated and misunderstood component in this process is the data storage infrastructure necessary for these emerging applications.
Busting four common AI/ML storage myths
Myth 1: AI/ML is solely about the GPU
While GPUs with high computational power have been instrumental in bringing AI/ML applications and neural networks to life, they are not the only critical component. Storage and networking, which ensure data availability for the accelerator, are equally important. The storage and networking infrastructure choice should be as carefully considered as the GPU.
Myth 2: AI/ML necessitates high-IOPS all-flash storage
Contrary to popular belief, AI/ML storage is not just about raw speed. High-IOPS all-flash systems are not always the best choice: the performance demands of accelerators and AI/ML applications vary, and in some cases a hybrid system can perform as well as an all-NVMe solution at a lower cost. Independent benchmarks such as MLPerf can help IT teams find the right balance between compute accelerators, AI/ML workloads, and storage options.
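To see why high IOPS is not automatically the deciding factor, it helps to estimate what the accelerators actually demand from storage. The sketch below is a back-of-envelope sizing exercise with illustrative, assumed numbers (GPU count, samples per second, sample size are hypothetical, not figures from the talk):

```python
# Back-of-envelope sizing: is a training workload IOPS-bound, or merely
# throughput-bound in a way a hybrid (flash + disk) system can satisfy?
# All figures below are illustrative assumptions, not measured values.

def required_read_bandwidth(samples_per_sec: float, sample_bytes: float) -> float:
    """Sustained read bandwidth (bytes/s) storage must deliver to keep GPUs busy."""
    return samples_per_sec * sample_bytes

# Hypothetical image-training job: 8 GPUs, each consuming 1,000 images/s,
# average compressed image ~150 KB.
gpus = 8
images_per_gpu_per_sec = 1_000
avg_image_bytes = 150 * 1024

bw = required_read_bandwidth(gpus * images_per_gpu_per_sec, avg_image_bytes)
print(f"Required sustained read bandwidth: {bw / 1e9:.2f} GB/s")
# Large, mostly sequential reads at a few GB/s stress streaming throughput,
# not small-block IOPS -- which is why a hybrid system can sometimes match
# an all-NVMe solution for this class of workload.
```

The point of the exercise is that the bottleneck depends on access pattern and sample size; measuring (or benchmarking via MLPerf) beats assuming all-flash is required.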
Myth 3: Storage tiering will cut AI/ML costs
While tiered storage is a common strategy for optimizing storage resources and minimizing costs, it rarely applies to AI/ML workloads. Tiering assumes that most data is cold and can be demoted to cheaper media, but in AI/ML training all of the training data is read in every training run, making all data “hot.” AI/ML storage solutions must therefore keep all data readily available. Moreover, because the accuracy of AI/ML workloads improves with the volume of training data, the storage infrastructure must scale linearly as data volumes grow.
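The all-data-is-hot access pattern can be made concrete with a minimal sketch of an epoch-based training loop. The dataset and access counting below are illustrative assumptions, standing in for a real data loader:

```python
# Minimal sketch of why tiering rarely helps AI/ML training: every epoch
# touches every sample, so no subset ever goes "cold."
import random
from collections import Counter

dataset = [f"sample_{i}" for i in range(10_000)]  # stand-in for training data
access_count = Counter()

EPOCHS = 3
for epoch in range(EPOCHS):
    order = random.sample(dataset, len(dataset))  # shuffled full pass
    for sample in order:
        access_count[sample] += 1  # a real loader would read the bytes here

# Every sample is read exactly once per epoch -- uniformly "hot" data.
assert all(n == EPOCHS for n in access_count.values())
print(f"All {len(dataset)} samples read {EPOCHS}x each; no cold-tier candidates.")
```

Because the shuffled full pass visits each sample each epoch, a tiering policy has no skewed access distribution to exploit, unlike typical enterprise file workloads.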
Myth 4: AI/ML can utilize a dedicated single-use storage system
AI/ML is most valuable when applied to an organization’s core data. For many businesses, AI/ML has moved beyond being an experimental side project and has become integral to their operations. Therefore, these applications must be consolidated into the organization’s core IT infrastructure and storage solution.
AI/ML innovations are expected to drive significant transformations across enterprises, impacting almost every aspect of an organization. According to the Gartner hype cycle, technologies like edge AI, decision intelligence, and deep learning are predicted to reach mainstream adoption in the next two to five years. As organizations embark on their AI/ML journey, the choice of underlying storage infrastructure will play a pivotal role in maximizing the potential of these applications.