By Ben Dickson, VentureBeat

Apple was not mentioned much in the highlights of the 2023 generative AI race. However, it has been doing impressive work in the field, without the publicity that other tech companies have courted. In the past few months alone, Apple researchers have released papers, models and programming libraries that could have important implications for on-device generative AI. A closer look at these releases hints at where Apple is headed and where it will fit in the growing market for generative AI.

Apple is not a hyperscaler and can't build a business model on selling access to large language models (LLMs) running in the cloud. But it has the strongest vertical integration in the tech industry, with full control over its entire stack, from the operating system to the development tools down to the processors running in every Apple device. That puts Apple in a unique position to optimize generative models for on-device inference, and the research papers it has released in recent months show it is making real progress in the field.

In January, Apple released a paper titled "LLM in a flash," which describes a technique for running LLMs on memory-constrained devices such as smartphones and laptops. The technique keeps part of the model in DRAM and the rest in flash storage, dynamically swapping model weights between the two in a way that considerably reduces memory consumption while minimizing inference latency, especially on Apple silicon. Before "LLM in a flash," Apple had released other papers showing how the architecture of LLMs could be tweaked to reduce "inference computation up to three times… with minimal performance trade-offs."

On-device inference optimization techniques can become increasingly important as more developers explore building apps with small LLMs that fit on consumer devices. Experiments show that a few hundredths of a second of latency can have a considerable effect on the user experience, and Apple is making sure that its devices provide the best balance between speed and quality.
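To make the general idea concrete, here is a deliberately simplified sketch of streaming weights between slow storage and DRAM. It is not Apple's implementation: the paper relies on further tricks such as exploiting activation sparsity, and the file layout, chunk size and cache policy below are illustrative assumptions.

```python
# Simplified illustration of flash-to-DRAM weight streaming (not Apple's code).
# Weights live in a memory-mapped file standing in for flash storage; only the
# most recently used chunks are kept in DRAM.
from collections import OrderedDict
import numpy as np

class FlashWeightCache:
    def __init__(self, path, num_chunks, chunk_shape, max_chunks_in_dram=4):
        # Memory-map the weight file so chunks are only read from disk on demand.
        self.store = np.memmap(path, dtype=np.float16, mode="r",
                               shape=(num_chunks, *chunk_shape))
        self.max_chunks = max_chunks_in_dram
        self.cache = OrderedDict()  # chunk index -> array resident in DRAM

    def get(self, idx):
        if idx in self.cache:
            self.cache.move_to_end(idx)      # mark chunk as recently used
            return self.cache[idx]
        chunk = np.array(self.store[idx])    # copy the chunk from "flash" into DRAM
        self.cache[idx] = chunk
        if len(self.cache) > self.max_chunks:
            self.cache.popitem(last=False)   # evict the least recently used chunk
        return chunk
```

During inference, each layer would call `get()` for the chunks it needs: chunks already resident in DRAM are reused, while cold ones are streamed in from storage and the least recently used are evicted.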

Open-source models

Apple has also released several open-source generative models in the past few months. Ferret, quietly released in October, is a multi-modal LLM that comes in two sizes: 7 billion and 13 billion parameters. The model is built on top of Vicuna, an open-source LLM, and LLaVA, a vision-language model (VLM). While multi-modal models usually analyze an input image in its entirety, Ferret has a special mechanism that lets it generate responses grounded in a specific region of the image, which makes it especially good at handling small objects and fine details. It could become the basis of a model that lets users interact with the objects they see through their iPhone camera or Vision Pro.
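As a purely illustrative sketch of what region-grounded prompting involves, the snippet below composes a question that refers to a bounding box instead of the whole image. The prompt format, region tag and model call are hypothetical, not Ferret's actual interface.

```python
# Hypothetical sketch of a region-grounded query; not Ferret's real API.
from dataclasses import dataclass

@dataclass
class Region:
    x0: int
    y0: int
    x1: int
    y1: int  # bounding box in pixel coordinates

def build_region_prompt(question: str, region: Region) -> str:
    # Embed the referenced box in the prompt so the model answers about
    # that area rather than the image as a whole.
    box = f"[{region.x0}, {region.y0}, {region.x1}, {region.y1}]"
    return f"{question} <region>{box}</region>"

prompt = build_region_prompt("What is the small object in this area?",
                             Region(40, 60, 120, 180))
# A Ferret-style model would take the image plus this prompt and return an
# answer grounded in the specified box.
```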

More recently, Apple released MLLM-Guided Image Editing (MGIE), a model that can modify images based on natural language commands. MGIE's capabilities range from image-wide modifications, such as changing brightness and contrast, to edits on specific regions, such as "make the sky more blue," to changes to specific objects in the image. These features could be a good addition to the next generation of iOS devices.

Apple is not known for embracing open-source culture; the license for Ferret, for example, limits it to research use. Even so, releasing these models can help build traction for Apple's future releases and prepare the developer community to create applications for Apple's products. Once a model is public, developers usually find ways to use it that its creators had not anticipated and provide important guidance on how to improve it or integrate it into existing products.

In December, Apple released MLX, a library for working with machine learning models. MLX mirrors the familiar interfaces of Python libraries such as NumPy and PyTorch, which makes it easy for machine learning developers to pick up, but it has been optimized for Apple silicon processors such as the M2 and M3. MLX is built around unified "shared memory": arrays live in memory that both the CPU and GPU can access, so operations can run on either device without copying data between separate memory pools. Along with other techniques for making models more memory efficient without incurring a speed penalty, this is in line with the research Apple has been doing to run large models on memory-constrained devices.
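A short example, assuming the current MLX Python API (the array sizes are arbitrary), shows what this looks like in practice: the same arrays can feed one operation dispatched to the GPU and another dispatched to the CPU, with no explicit data transfer.

```python
import mlx.core as mx

# Arrays are allocated in unified memory, visible to both the CPU and the GPU.
a = mx.random.normal((4096, 4096))
b = mx.random.normal((4096, 4096))

c = mx.matmul(a, b, stream=mx.gpu)  # run this operation on the GPU
d = mx.add(a, b, stream=mx.cpu)     # run this one on the CPU, same arrays

mx.eval(c, d)  # MLX is lazy; computation actually happens here
```

Because MLX evaluates lazily, nothing is computed until the results are needed, which gives the library room to schedule work across devices.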

Apple has designed the library so that developers can easily port code from other popular libraries and optimize it for Apple devices, as the short example below illustrates. MLX has also been released under the MIT license, which means it can be used for commercial purposes.

All signs indicate that Apple is laying the groundwork for a platform shift that will let it become a major player in on-device generative AI. Apple has strong research and engineering teams that can work together to optimize models for Apple's processors and to create the next generation of chips better suited to Apple's models and developer tools.
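As a small illustration of how directly NumPy-style code carries over (the softmax function here is just a stand-in), the two versions below differ only in which array module they call.

```python
import numpy as np
import mlx.core as mx

def softmax_np(x):
    # Standard NumPy implementation.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def softmax_mx(x):
    # Same code, ported to MLX by swapping the array module.
    e = mx.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

print(softmax_np(np.array([1.0, 2.0, 3.0])))
print(softmax_mx(mx.array([1.0, 2.0, 3.0])))
```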

So, while Apple might not have a direct competitor to GPT-4 or its successor, it has everything it needs to power the next LLM running on your phone or watch.