After releasing the small language model OpenELM in April, Apple has introduced two more compact models under the DCLM name, claiming performance that surpasses Mistral 7B and is on par with Google's Gemma, Meta's Llama 3 8B, and Microsoft's Phi-3, while being more resource-efficient.
According to reports from VentureBeat and AppleInsider, Apple's machine learning team released DataComp for Language Models (DCLM) in two parameter sizes, 1.4B and 7B, both of which are now available on Hugging Face.
DCLM 7B was trained on 2.5 trillion tokens and features a context length of 2048 tokens. Apple claims that on the Massive Multitask Language Understanding (MMLU) benchmark, DCLM 7B outperforms Mistral 7B and approaches the performance of models such as Llama 3 8B, Gemma, and Phi-3. DCLM 7B also surpassed MAP-Neo by 6.6% in benchmarks while requiring 40% less computing power.
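Because the weights are published on Hugging Face, the 7B model should in principle be loadable through the standard transformers interface. The sketch below is a minimal, unverified example: the repository ID apple/DCLM-7B and the plain AutoModel loading path are assumptions, so the model card should be checked for the exact ID and any additional dependencies.

```python
# Minimal sketch: loading DCLM 7B from Hugging Face with transformers.
# Assumptions: the repo ID "apple/DCLM-7B" and standard AutoModel loading;
# consult the model card for the authoritative instructions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "apple/DCLM-7B"  # assumed repository name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# DCLM 7B has a 2048-token context window, so keep prompts within that limit.
inputs = tokenizer("Machine learning is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```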
The DCLM 1.4B, trained on 2.6 trillion tokens, significantly outperformed similar models, such as Hugging Face's recently released SmolLM 1.7B, Alibaba's Qwen 2B, and Microsoft's Phi 1.5B, in the MMLU benchmark.
Vaishaal Shankar from Apple's machine learning team wrote on X that the DCLM models are the best-performing truly open-source models available today. Shankar defines truly open-source as models with open data, open weights, and open training code.
Apple has made significant strides in AI with the introduction of Apple Intelligence and Private Cloud Compute at WWDC, addressing previous criticisms of its AI capabilities. The company has also been actively publishing AI research, demonstrating its commitment to the field. While Apple's DCLM models are currently research projects focused on exploring data curation for language models, the company is also partnering with OpenAI to integrate AI features, such as GPT-4o mini, into future iOS updates.