In a significant stride towards bringing Generative AI to iPhones, Apple researchers have introduced a method to work around the RAM limitations of mobile devices. Large Language Models (LLMs), such as OpenAI’s GPT-4, are known for their immense computational and memory demands, typically requiring powerful servers to run. Google’s recent Gemini AI, designed to rival GPT-4, offers a ‘Nano’ variant tailored for smartphones: slimmed-down models with roughly 1.8 billion or 3.25 billion parameters, further quantized to low precision for on-device deployment. Currently, one of these Nano variants powers Google’s Pixel 8 Pro smartphones.
While Google’s efforts with Gemini Nano mark a significant development, Qualcomm claims that its new Snapdragon 8 Gen 3 SoC can run generative AI LLMs with up to 10 billion parameters. That surpasses Gemini Nano, yet it remains a tiny fraction of the roughly 1.7 trillion parameters GPT-4 is rumored to contain. Quantization makes on-device processing feasible by storing each weight at lower precision (for example 8-bit or 4-bit instead of 16-bit), but it inevitably sacrifices some accuracy and effectiveness. The ability to accommodate larger LLMs on mobile devices therefore becomes critical for improving their performance.
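To make that trade-off concrete, here is a minimal, generic sketch of post-training weight quantization in Python. It maps fp16 weights to int8 with a single per-tensor scale, halving memory at the cost of a small reconstruction error. The scheme and matrix sizes are illustrative assumptions; they are not the actual quantization pipeline used by Google or Qualcomm.

```python
# Minimal sketch of post-training weight quantization (generic illustration,
# not any vendor's actual scheme): map fp16 weights to int8 with one scale
# factor, then measure the memory saving and the error quantization introduces.
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: w ≈ scale * q."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w_fp16 = rng.normal(0, 0.02, size=(4096, 4096)).astype(np.float16)  # toy weight matrix

q, scale = quantize_int8(w_fp16)
w_restored = q.astype(np.float32) * scale

print(f"fp16 size: {w_fp16.nbytes / 1e6:.1f} MB")  # 2 bytes per weight
print(f"int8 size: {q.nbytes / 1e6:.1f} MB")       # 1 byte per weight (about 2x smaller)
print(f"mean abs error: {np.abs(w_fp16.astype(np.float32) - w_restored).mean():.6f}")
```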
Apple’s ingenious solution
One of the major hurdles to enabling generative AI on smartphones is the substantial RAM requirement. For instance, an LLM quantized to 8 bits per parameter with 7 billion parameters, like the Meta Llama 2 variant supported by the Snapdragon 8 Gen 3, needs roughly 7GB of memory for its weights alone. Apple’s iPhone 15 Pro series ships with 8GB of RAM, so an Apple-developed LLM of comparable size would approach the upper limits of current iPhone capabilities. To overcome this RAM limitation, Apple’s researchers have devised a novel approach.
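The arithmetic behind that 7GB figure is straightforward: weight memory is roughly the parameter count multiplied by the bytes stored per parameter. The short Python sketch below runs this estimate at a few common bit widths against an assumed 8GB RAM ceiling; it deliberately ignores activations, the KV cache, and the operating system’s own footprint, all of which tighten the squeeze further.

```python
# Back-of-the-envelope RAM check: weight memory ≈ parameter count * bytes per
# parameter. Overheads (activations, KV cache, OS/app memory) are ignored here.
def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

IPHONE_RAM_GB = 8  # iPhone 15 Pro series, per the paragraph above

for bits in (16, 8, 4):
    gb = weight_memory_gb(7, bits)
    verdict = "fits within" if gb < IPHONE_RAM_GB else "exceeds"
    print(f"7B model at {bits}-bit: ~{gb:.1f} GB of weights "
          f"({verdict} {IPHONE_RAM_GB} GB of RAM, before overheads)")
```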
Flash storage augmentation
In a research paper titled “LLM in a flash: Efficient Large Language Model Inference with Limited Memory,” Apple’s generative AI researchers introduce a method that uses an iPhone’s flash storage to supplement the device’s onboard system RAM. Flash storage bandwidth is far lower than that of LPDDR5/X mobile RAM, but the researchers work around this limitation by minimizing how much data must be transferred for each token. Their method combines “windowing,” in which the model keeps the parameters used for the most recent tokens in RAM and loads only the small amount of fresh data each new token requires, with “row-column bundling,” which stores related rows and columns of the LLM’s weight matrices together so they can be read from flash in larger, contiguous chunks.
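The paper’s measured speedups come from a carefully engineered inference stack; the sketch below is only a toy illustration of the two ideas, with invented sizes (`NEURONS`, `DIM`, `WINDOW`) and a plain Python dictionary standing in for DRAM. Recently used feed-forward neurons stay cached in RAM (windowing), and each neuron’s weights are stored as one contiguous record so a cache miss costs a single sequential flash read (row-column bundling).

```python
import numpy as np

NEURONS, DIM, WINDOW = 1024, 256, 3   # toy sizes; real FFN layers are far larger

# "Row-column bundling": keep each neuron's up-projection row and down-projection
# column in one contiguous record, so a single sequential read fetches both.
flash_bundles = np.random.randn(NEURONS, 2 * DIM).astype(np.float32)

ram_cache: dict[int, np.ndarray] = {}   # neuron id -> bundled weights held in RAM
recent_tokens: list[set[int]] = []      # active-neuron sets for the last WINDOW tokens

def load_for_token(active_neurons: set[int]) -> None:
    """Windowing: fetch only uncached neurons; evict those unused for WINDOW tokens."""
    missing = [n for n in active_neurons if n not in ram_cache]
    for n in missing:                    # each miss is one contiguous flash read
        ram_cache[n] = flash_bundles[n]
    recent_tokens.append(active_neurons)
    if len(recent_tokens) > WINDOW:
        recent_tokens.pop(0)
        still_needed = set().union(*recent_tokens)
        for n in list(ram_cache):
            if n not in still_needed:
                del ram_cache[n]
    print(f"loaded {len(missing)} of {len(active_neurons)} neurons from flash; "
          f"{len(ram_cache)} now cached in RAM")

# Consecutive tokens tend to reuse most of the same neurons; simulate that by
# swapping out only a few active neurons from one token to the next.
active = list(range(200))
for step in range(5):
    load_for_token(set(active))
    active = active[10:] + list(range(200 + 10 * step, 210 + 10 * step))
```

After the first token pays the full cost of loading its active neurons, each subsequent token in this toy run only fetches the handful of neurons it does not share with its predecessors, which is the behavior that lets flash supplement RAM without the bandwidth gap dominating inference time.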
Future prospects for generative AI on iPhones
While Apple has yet to unveil an LLM-based product, rumors suggest the imminent arrival of a smarter Siri based on an LLM, set to debut with iOS 18 and run on the next-generation iPhone 16 Pro models. When this materializes, Apple is well-positioned to employ its RAM extension method to deliver an LLM with as many parameters as on-device execution allows.
The generative AI landscape in 2024
As the tech industry continues its relentless pursuit of advancing generative AI capabilities, 2024 appears poised to be the year when generative AI becomes a commonplace feature on smartphones. Samsung, a formidable player in this arena, is gearing up to unveil its enhanced generative AI offerings with the launch of the Galaxy S24 series next month. With Apple’s innovative RAM augmentation method and Samsung’s forthcoming developments, consumers can anticipate a substantial transformation in the capabilities and performance of AI-driven features on their mobile devices.
Apple’s pioneering approach to overcome RAM limitations and facilitate on-device execution of Large Language Models marks a significant step towards making Generative AI a reality on iPhones. As the competitive landscape heats up, with Google’s Gemini Nano and Qualcomm’s Snapdragon 8 Gen 3 making strides, the year 2024 promises to be a turning point for the integration of generative AI into everyday smartphone experiences.