By Ben Dickson, VentureBeat

Apple was rarely mentioned in the highlights of the 2023 generative AI race. But it has been doing impressive work in the field, without the publicity other tech companies have courted. In the past few months alone, Apple researchers have released papers, models and programming libraries that could have important implications for on-device generative AI. A closer look at these releases hints at where Apple is headed and where it will fit in the growing market for generative AI.

Apple is not a hyperscaler and can't build a business model on selling access to large language models (LLMs) running in the cloud. But it has the strongest vertical integration in the tech industry, with full control over its entire stack, from the operating system to the development tools down to the processors in every Apple device. That puts Apple in a unique position to optimize generative models for on-device inference, and its recent research papers show it is making real progress on that front.
In January, Apple released a paper titled "LLM in a flash," which describes a technique for running LLMs on memory-constrained devices such as smartphones and laptops. The technique keeps the full model in flash memory and loads only part of it into DRAM, dynamically swapping model weights between the two in a way that considerably reduces memory consumption while minimizing inference latency, especially on Apple silicon.

Before "LLM in a flash," Apple had released other papers showing how the architecture of LLMs could be tweaked to reduce "inference computation up to three times… with minimal performance trade-offs."

On-device inference optimizations will become increasingly important as more developers explore building apps with small LLMs that can fit on consumer devices. Experiments show that latency differences of a few hundredths of a second can have a considerable effect on the user experience, and Apple is making sure its devices provide the best balance between speed and quality.
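To make the flash/DRAM swapping idea concrete, here is a minimal sketch of how weights can be paged in on demand. Apple's paper goes considerably further (it exploits activation sparsity and describes techniques such as windowing and row-column bundling), so this is an illustration of the general concept, not the paper's implementation: the `FlashWeightCache` class, the float16 layout, and the LRU eviction policy are all assumptions made for the example.

```python
# Hypothetical sketch: keep a bounded set of layer weights in DRAM while
# the full model stays in a memory-mapped file on flash storage.
from collections import OrderedDict

import numpy as np


class FlashWeightCache:
    """Pages layer weights from flash into DRAM on demand, evicting the
    least-recently-used layer when the DRAM budget is exhausted."""

    def __init__(self, weight_file: str, layer_shape, num_layers: int,
                 max_layers_in_dram: int):
        # np.memmap reads from flash lazily; no layer is resident in DRAM
        # until it is actually touched.
        self.flash = np.memmap(weight_file, dtype=np.float16, mode="r",
                               shape=(num_layers, *layer_shape))
        self.max_layers = max_layers_in_dram
        self.dram: OrderedDict[int, np.ndarray] = OrderedDict()  # LRU cache

    def get_layer(self, idx: int) -> np.ndarray:
        if idx in self.dram:                   # hit: weights already in DRAM
            self.dram.move_to_end(idx)
            return self.dram[idx]
        if len(self.dram) >= self.max_layers:  # evict least-recently-used layer
            self.dram.popitem(last=False)
        weights = np.array(self.flash[idx])    # copy flash -> DRAM
        self.dram[idx] = weights
        return weights


# Usage (illustrative): run 32 layers while only 8 fit in DRAM at once.
# cache = FlashWeightCache("weights.bin", layer_shape=(4096, 4096),
#                          num_layers=32, max_layers_in_dram=8)
# for i in range(32):
#     w = cache.get_layer(i)   # pages in from flash only on a cache miss
#     hidden = hidden @ w      # hypothetical matmul step
```

The LRU policy here is a stand-in: the paper's approach is more targeted, loading only the parameters it predicts the next tokens will actually need, so the slice of the model occupying DRAM at any moment stays small and relevant.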