As new infrastructure emerges to train artificial intelligence programs and offer new services, important questions arise about how to store the data these systems consume and produce.
With AI creating new data and making existing data more valuable, a cycle quickly emerges in which increased data generation leads to expanded storage needs.
This in turn fuels further data generation, forming a “virtuous AI data cycle.”
Understanding this AI data cycle is important for organizations looking to harness the power of AI and make full use of its capabilities.
The AI data cycle is a six-stage framework. The first stage covers raw data collection and storage. Here, data is gathered from various sources and stored, and analyzing the quality and diversity of the collected data is critical because it sets the foundation for the stages that follow.
For this stage of the cycle, high-capacity enterprise hard disk drives (eHDDs) are recommended, as they deliver the highest capacity per drive and the lowest cost per bit.
In the second stage, data is prepared for ingestion: the data gathered and assessed in the previous stage is processed, cleaned and transformed for training.
To accommodate this stage, data centers are deploying upgraded storage infrastructure, such as fast data lakes, to support data preparation and ingestion.
Here, high-capacity solid-state drives (SSDs) are needed to augment existing HDD storage or to build new all-flash storage systems.
The third stage is model training, in which AI models learn from training data to make accurate predictions. Training runs on high-performance supercomputers, which require specialized, high-performance storage to operate efficiently.
High-bandwidth flash storage and low-latency, optimized enterprise SSDs (eSSDs) are designed to meet the specific needs of this stage.
The fourth stage, inference and prompting, involves creating a user-friendly interface for AI models. This includes an application programming interface (API), dashboards and tools that combine context-specific data with end-user prompts.
At this stage, AI models are integrated into internet and client applications without replacing current systems, which means that running existing systems alongside new AI computing requires additional storage.
Here, larger and faster SSDs are required for AI upgrades in computers, and higher-capacity embedded flash devices are required for smartphones and Internet of Things systems.
The fifth stage is the AI inference engine, where trained models are deployed into production environments to analyze new data and generate new content or provide real-time predictions. The engine’s efficiency is critical to delivering fast, accurate AI responses.
Comprehensive data analysis at this stage demands significant storage performance. High-capacity SSDs can be used to stream or load model data into inference servers, depending on scale or response-time requirements, while high-performance SSDs can be used for caching.
In the sixth and final stage, AI models generate new content and insights, which are then stored. This stage feeds back into the data cycle, driving continuous improvement by increasing the value of data for training or for analysis by future models.
Generated content is stored on enterprise hard drives for data center archives, and on both high-capacity SSDs and embedded flash devices for AI edge devices.
By understanding these six stages of the AI data cycle and having the right tools in place, businesses can better support AI in their internal business functions and capitalize on the benefits it offers.
Today’s AI uses data to produce text, video, images and other content. This continuous loop of data consumption and generation accelerates the need for scalable, performance-driven storage technologies that can manage large AI datasets and efficiently refactor complex data, driving further innovation.
Demand for storage is increasing significantly as AI becomes more prevalent. Access to data, the efficiency and accuracy of AI models, and larger, higher-quality datasets will become increasingly important.
Additionally, as AI becomes embedded across nearly every industry, partners and customers can expect to see storage component providers tailor products to each stage of the AI data cycle.
• Peter Hayles is the product marketing manager for hard disk drives at the US computer manufacturer and data storage company Western Digital