Apple’s M1 processor is a world-class desktop and laptop processor—but when it comes to general-purpose end-user systems, there’s something even better than being fast. We’re referring, of course, to feeling fast—which has more to do with a system meeting user expectations predictably and reliably than it does with raw quickness.
Howard Oakley—author of several Mac-native utilities such as Cormorant, Spundle, and Stibium—did some digging to find out why his M1 Mac felt faster than Intel Macs did, and he concluded that the answer is QoS. If you’re not familiar with the term, it’s short for Quality of Service—and it’s all about task scheduling.
More throughput doesn’t always mean happier users
There’s a very common tendency to equate “performance” with throughput—roughly speaking, tasks accomplished per unit of time. Although throughput is typically the easiest metric to measure, it doesn’t correspond very well to human perception. What humans generally notice isn’t throughput, it’s latency—not the count of times a task can be accomplished, but the time it takes to complete an individual task.

Here at Ars, our own Wi-Fi testing metrics follow this concept—we measure the amount of time it takes to load an emulated webpage under reasonably normal network conditions rather than measuring the number of times a webpage (or anything else) can be loaded per second while running flat out.
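To make the distinction concrete, here’s a small Swift sketch—the task timings are invented purely for illustration. A scheduler tuned for throughput can finish a batch of tasks in less total time while still making the user wait longer for any individual result:

```swift
// Two hypothetical runs of four equal tasks (numbers are made up, for
// illustration only). Each array holds per-task completion times, in
// seconds, measured from the moment all four tasks were submitted.
let batchOptimized: [Double] = [3.9, 3.95, 4.0, 4.0] // reordered for max throughput
let latencyFair: [Double] = [1.2, 2.4, 3.6, 4.8]     // served one at a time

// Throughput: completed tasks divided by total wall-clock time.
func throughput(_ completions: [Double]) -> Double {
    Double(completions.count) / (completions.max() ?? 1)
}

// The latency a user actually notices: how soon the first result arrives.
func firstResult(_ completions: [Double]) -> Double {
    completions.min() ?? 0
}

print(throughput(batchOptimized), firstResult(batchOptimized)) // better throughput, slow first result
print(throughput(latencyFair), firstResult(latencyFair))       // worse throughput, fast first result
```

The batch-optimized run wins on tasks per second, but the latency-fair run hands the user their first result much sooner—and that is what gets noticed.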
We can also see a negative example—one in which the fastest throughput corresponded to distinctly unhappy users—with the circa-2006 introduction of the Completely Fair Queuing (cfq) I/O scheduler in the Linux kernel. cfq can be tuned extensively, but in its out-of-box configuration, it maximizes throughput by reordering disk reads and writes to minimize seeking, then offering round-robin service to all active processes.
Although cfq did in fact measurably improve maximum throughput, it did so at the expense of task latency—which meant that a moderately loaded system felt sluggish and unresponsive to its users, leading to a large groundswell of complaints.
Although cfq could be tuned for lower latency, most unhappy users simply replaced it outright with a competing scheduler like deadline instead—and despite the lower maximum throughput, the decreased individual latency made desktop/interactive users happier with how fast their tools felt.
After discovering how suboptimal maximized throughput at the expense of latency was, most Linux distributions moved away from cfq just as many of their users had. Red Hat ditched cfq in favor of deadline in 2013 with RHEL 7—and Ubuntu followed suit shortly thereafter in its 2014 Trusty Tahr (14.04) release. As of 2019, Ubuntu has deprecated cfq entirely.
QoS with Big Sur and the Apple M1
When Oakley noticed how frequently Mac users praised M1 Macs for feeling incredibly fast—despite performance measurements that don’t always back those feelings up—he took a closer look at macOS native task scheduling.
macOS offers four directly specified levels of task prioritization—from low to high, they are background, utility, userInitiated, and userInteractive. There’s also a fifth level (the default, when no QoS level is manually specified) which allows macOS to decide for itself how important a task is.
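These level names correspond to cases of DispatchQoS.QoSClass in Apple’s Dispatch framework. A minimal Swift sketch of submitting work at each level—note that nothing in the API itself pins work to particular cores; the core placement Oakley observed is entirely the scheduler’s doing:

```swift
import Dispatch

// The four explicitly specified QoS classes, lowest to highest priority,
// plus .default, which is effectively what unspecified work runs at.
let levels: [DispatchQoS.QoSClass] = [
    .background, .utility, .default, .userInitiated, .userInteractive
]

let group = DispatchGroup()
for level in levels {
    // Per Oakley's findings, .background closures land on an M1's
    // Icestorm efficiency cores even when the Firestorm cores are idle.
    DispatchQueue.global(qos: level).async(group: group) {
        // ...work appropriate to this priority level...
    }
}
group.wait() // block until all five closures have executed
```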
These five QoS levels are the same whether your Mac is Intel-powered or Apple Silicon-powered—but how the QoS is applied changes. On an eight-core Intel Xeon W CPU, if the system is idle, macOS will schedule any task across all eight cores, regardless of QoS settings. But on an M1, even if the system is entirely idle, background priority tasks run exclusively on the M1’s four efficiency/low-power Icestorm cores, leaving the four higher-performance Firestorm cores idle.
Although this made the lower-priority task Oakley tested the system with—compression of a 10GB test file—slower on the M1 Mac than the Intel Mac, the results were more consistent across the spectrum of “idle system” to “very busy system.”
Operations with higher QoS settings also performed more consistently on the M1 than the Intel Mac—macOS’s willingness to dump lower-priority tasks onto the Icestorm cores left the higher-performance Firestorm cores unloaded and ready to respond both rapidly and consistently when userInteractive tasks needed handling.
Apple’s QoS strategy for the M1 Mac is an excellent example of engineering for the actual pain point in a workload rather than chasing arbitrary metrics. Leaving the high-performance Firestorm cores idle when executing background tasks means that they can devote their full performance to the userInteractive tasks as they come in, avoiding the perception that the system is unresponsive or even “ignoring” the user.
It’s worth noting that Big Sur certainly could employ the same strategy with an eight-core Intel processor. Although there is no similar big/little split in core performance on x86, nothing is stopping an OS from arbitrarily declaring a certain number of cores to be background only. What makes the Apple M1 feel so fast isn’t the fact that four of its cores are slower than the others—it’s the operating system’s willingness to sacrifice maximum throughput in favor of lower task latency.
It’s also worth noting that the interactivity improvements M1 Mac users are seeing rely heavily on tasks being scheduled properly in the first place—if developers aren’t willing to use the low-priority background queue when appropriate because they don’t want their app to seem slow, everyone loses. Apple’s unusually vertical software stack likely helps significantly here, since Apple developers are more likely to prioritize overall system responsiveness even if it might potentially make their code “look bad” if very closely examined.
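On the developer side, opting in is essentially a one-liner: give your own dispatch queue an explicit QoS when you create it. A brief sketch, with a hypothetical queue label and workload:

```swift
import Dispatch

// A hypothetical app-local maintenance queue (the label and the work
// are illustrative). Tagging it .background tells the scheduler this
// work can wait and, on an M1 Mac, that it belongs on the Icestorm
// efficiency cores rather than the Firestorm performance cores.
let maintenance = DispatchQueue(label: "com.example.cleanup", qos: .background)

maintenance.async {
    // e.g. rebuild a search index, prune caches, compress old logs
}
maintenance.sync { } // wait here until the queued background work has drained
```

The temptation the article describes is exactly this choice: a developer who tags that queue userInteractive instead makes their own task finish sooner while degrading responsiveness for everything else on the system.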
If you’re interested in more of the gritty details of how QoS levels are applied on M1 and Intel Macs—and the impact they make—we strongly recommend checking out Oakley’s original work here and here, complete with CPU History screenshots from macOS’s Activity Monitor as Oakley runs tasks at various priorities on the two different architectures.