Intel Changes Mobile PC Architecture With Lunar Lake AI SoC
In just a week, we went from having only one option for AI PCs, or what Microsoft calls the Copilot+ PC, to three. At Computex, Intel announced the new Core Ultra system-on-chip (SoC) codenamed Lunar Lake for mobile PCs, which will exceed the minimum 40 NPU TOPS of performance outlined by Microsoft. While Intel's 48 NPU TOPS will be just shy of the NPU crown currently claimed by AMD, its 120 platform TOPS will give Lunar Lake the platform TOPS performance lead, which may be more important for complex AI applications. Please note that TOPS is not a good measure of performance, but it is the only metric we have to compare these products at this time.
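For context on the metric itself, TOPS is simply trillions of operations per second, usually derived from the number of multiply-accumulate (MAC) units and the clock frequency at a low precision such as INT8. The sketch below illustrates the arithmetic with hypothetical numbers (the MAC count and clock are assumptions, not Intel's actual NPU configuration) and shows why TOPS captures only peak throughput, not sustained real-world performance.

```python
# Back-of-the-envelope TOPS arithmetic. The MAC count and clock frequency
# below are hypothetical, chosen only to illustrate how a ~48 TOPS figure
# can be reached; they are not Intel's actual NPU configuration.
def peak_tops(mac_units: int, clock_ghz: float) -> float:
    """Peak TOPS: each MAC counts as two operations (multiply + add)."""
    return mac_units * 2 * clock_ghz * 1e9 / 1e12

print(f"{peak_tops(12_288, 1.95):.1f} TOPS")  # ~47.9 TOPS
```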
Lunar Lake brings massive improvements over its predecessor, Meteor Lake, which was introduced in late 2023. It features significant performance enhancements to the CPU cores, GPU, NPU, memory subsystem, and overall architecture while being 40% more efficient. According to Intel, Lunar Lake marks a shift in how its PC processors will be designed going forward.
Lunar Lake Overview
Lunar Lake looks completely different from Meteor Lake. It is a two-die/tile configuration, as opposed to four for Meteor Lake, featuring a compute tile and an I/O tile. Both tiles are manufactured by TSMC (the N3 process node for the compute tile and the N6 process for the I/O tile). Intel plans to return to its own manufacturing process when the company's products move to Intel's 18A process as early as 2025, which aligns with the Panther Lake product generation. Using TSMC's N3 node for the Lunar Lake compute tile puts Intel on par with AMD.
The new Lunar Lake SoCs will feature eight CPU cores, four performance (P-core) CPU cores plus four efficiency (E-core) CPU cores, considerably fewer than the 12 cores used for Meteor Lake. This is not a problem because the new cores are drastically different. The design features an enhanced GPU and NPU, as well as other media support functions. It also features up to 32GB of on-package DRAM, a first for Intel PC processors.
As noted previously, Intel shifted from four tiles (low-power, compute, GPU, and I/O) to a two-tile configuration (compute and I/O). This change moves all the processing functions, including the low-power island for the E-cores and NPU, onto the compute tile. Combined with the on-package memory and the need for more power rails for finer-grained power management, the Lunar Lake SoC is paired with four PMICs (power management ICs). According to Intel, the PMICs enable a smaller, simpler system footprint around the SoC, which is not necessarily intuitive.
A Little More Detail
Lunar Lake marks the second generation of Intel mobile PC processors to feature both performance and efficiency CPU cores, referred to as P-cores and E-cores respectively. However, the similarity ends there, as Intel made significant improvements to both CPU core types. For the first time in many generations, the new P-cores will NOT have hyperthreading. The reason is that hyperthreading limits single-threaded performance and adds die area, security, and management overhead for the middle of the performance range. High-performance single-threaded workloads run better on optimized performance cores without hyperthreading, and for low-performance applications, Intel now has the E-cores. So, think of it as achieving a similar effect with optimized physical cores rather than the virtual cores used in hyperthreading. According to Intel, the result is a step function in performance and efficiency. The new smaller, more optimized, and higher-frequency P-cores provide 15% more performance/watt, 10% more performance/die area, and 30% more performance/power/area.
In addition, Intel increased the amount of cache memory and moved to a three-level cache hierarchy. Other enhancements include improvements to the branch prediction block and the fetch and decode capabilities, the way in which cells are organized, finer-grained power control, support for post-quantum cryptography, and enhanced AI instructions. The P-core was rearchitected from the ground up to be more efficient as well as more performant. According to Intel, the result is a 14% improvement in IPC (instructions per cycle) over the previous-generation P-core used in Meteor Lake.
While the enhancements to the P-core architecture are impressive, the real star of Lunar Lake is the new E-core. Intel increased everything on the E-core to make it perform better while keeping it small and efficient; listing all the enhancements would take a page of details and much longer to explain. Intel scaled the E-core up from a limited-function CPU core to a full CPU core with a broader frequency range and paired it with an L2 cache that doubled in bandwidth and size to 4MB. The result is a 38% improvement in integer processing and a 68% improvement in floating-point processing over the Meteor Lake E-core. The performance/power curves compared to the previous-generation E-cores are very impressive, yielding a 70% improvement in single-threaded performance at a third of the power and a 100% improvement in single-threaded peak performance. It also yields a 190% improvement in multithreaded performance at a third of the power and a 300% increase in peak multithreaded performance. Even if you doubled the number of E-cores on Meteor Lake, they would not equal this performance. And the higher E-core performance reduces the need to use the P-cores for many workloads. Intel also indicated that, in conjunction with Microsoft, it enhanced Windows' Thread Director to improve the management of workloads across the P-cores and E-cores.
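To put those curve points in perspective, here is a quick arithmetic sketch of the performance-per-watt gains implied by Intel's quoted E-core figures, assuming the Meteor Lake E-core is normalized to 1.0 for both performance and power.

```python
# Implied E-core perf/watt gains from Intel's quoted figures,
# with the Meteor Lake E-core normalized to 1.0 performance at 1.0 power.

# Single-threaded: 70% more performance at one third of the power.
st_gain = 1.70 / (1.0 / 3)
print(f"Single-threaded perf/watt gain: ~{st_gain:.1f}x")  # ~5.1x

# Multithreaded: 190% improvement (2.9x) at one third of the power.
mt_gain = 2.90 / (1.0 / 3)
print(f"Multithreaded perf/watt gain: ~{mt_gain:.1f}x")    # ~8.7x
```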
Intel also focused on improving graphics and media performance, which involves the GPU, a media processing core, and a display processing core. The new Lunar Lake GPU is based on the Xe2 graphics architecture. The nice thing about the second generation of a new graphics architecture is that it provides the opportunity not only to improve performance but also to work through the hardware and software bottlenecks that were not apparent when the architecture was first introduced. The same holds true for Xe2. The new Lunar Lake version of the GPU has eight Xe2 cores with a total of 64 vector engines. It also supports Intel's SIMD16 instructions, XMX matrix instructions, and data types ranging from INT2 to FP32. The result is a 50% increase in overall GPU performance and 67 TOPS of AI performance.
The NPU also received a major overhaul. Intel scaled up the number of neural engines from two to six and increased the performance and efficiency of each engine. The result is a peak performance of 48 TOPS with twice the efficiency of the previous generation.
All For AI
A key theme for the Lunar Lake design was AI performance and efficiency. Intel highlighted the fact that the CPU, GPU, and NPU cores will all likely be used for AI processing because there are different types of AI workloads. The low-power NPU is best suited to always-on AI tasks, the CPUs are well suited to low-latency, bursty AI tasks, and the GPU is best suited to heavy-lifting tasks like running generative AI large language models (LLMs). In the short term, the GPU is the engine software developers are most likely to target when designing generative applications with on-device AI because they are familiar with programming GPUs. So, it seems logical to have a high-performance GPU.
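As a minimal sketch of what that targeting looks like in practice, the snippet below routes a model to the NPU or to the GPU (with a CPU fallback) using Intel's OpenVINO toolkit. The model path and the device priority order are illustrative assumptions, and exact API details can vary by OpenVINO release.

```python
# Minimal sketch: routing an AI workload to different Lunar Lake engines
# with OpenVINO. "model.xml" and the device priorities are hypothetical.
import openvino as ov

core = ov.Core()
model = core.read_model("model.xml")  # an OpenVINO IR model (assumed path)

# Pin a light, always-on workload to the low-power NPU ...
npu_compiled = core.compile_model(model, "NPU")

# ... or let the runtime choose, preferring the GPU for heavy generative
# work and falling back to the CPU if necessary.
auto_compiled = core.compile_model(model, "AUTO:GPU,CPU")

print("Available devices:", core.available_devices)
```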
Final Thoughts
According to Intel, PCs with Lunar Lake SoCs will be available in the coming months from all the major OEMs in more than 80 new thin-and-light mobile PC designs. These mobile PCs will also feature the latest connectivity options, such as Wi-Fi 7. We will have to wait and see how many support the Microsoft Copilot key. Note that Tirias Research considers all of these platforms AI PCs, not just Copilot+ PCs.
As we have seen with the other PC SoC announcements, AMD, Intel, Qualcomm, and even Apple are shifting SoC design to focus not just on AI performance, but on AI efficiency. While it is still too early for applications to leverage this performance, it lays the groundwork for new applications like Microsoft's Copilot agent/assistant and others to take advantage of this newfound performance. If history is any indication, software developers will use up this newfound performance and ask for more before the next generation of PC SoCs arrives next year, and we will see a bit of leapfrogging in the SoC specifications. However, the true test will come when we have the opportunity to evaluate these new AI PCs.
Just as it has in the data center, AI is changing the architecture of PCs and other personal electronic devices to provide a fast, secure, and personalized AI experience. Tirias Research believes this will be a market inflection point that will determine the fate of many SoC, software, and systems companies over the next few years.