ARM has officially announced its next-generation CPU microarchitecture, the Cortex-A77, which we will almost certainly see in the next wave of flagship smartphone chipsets from Qualcomm (likely the Snapdragon 865) and Samsung. The Cortex-A77 isn’t as big a change over the Cortex-A76 as last year’s Cortex-A76 was over the Cortex-A75, but ARM’s numbers project a solid 20 to 35 percent performance improvement.
What’s interesting is that the process node (7nm) and peak clock frequency (3GHz) remain the same as on the Cortex-A76, so all of these gains come from clever microarchitectural changes. In other words, the Cortex-A77 lays solid groundwork for when ARM moves to a more power-efficient 5nm process with its “Hercules” design in 2020.
This year, ARM has focused on delivering the best PPA (Power, Performance, and Area) and higher instructions per clock (IPC) than the Cortex-A76. A number of changes were made to reach these goals. Let’s see how much the Cortex-A77 has improved over last year’s Cortex-A76.
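The reasoning above rests on the first-order relation performance ≈ IPC × clock frequency. With the peak clock fixed at 3GHz on both cores, any speedup has to come from IPC. A minimal sketch of that arithmetic, assuming this simple model (it ignores memory stalls and thermal limits); the function names are hypothetical:

```python
def performance(ipc: float, freq_ghz: float) -> float:
    """First-order model: billions of instructions retired per second."""
    return ipc * freq_ghz


def required_ipc_gain(perf_gain: float, old_freq_ghz: float, new_freq_ghz: float) -> float:
    """IPC ratio (new/old) needed to reach a performance ratio at the new clock."""
    return perf_gain * old_freq_ghz / new_freq_ghz


# Cortex-A76 and Cortex-A77 both peak at 3GHz, so the whole projected
# 20-35% gain has to come from higher IPC.
for gain in (1.20, 1.35):
    ipc_gain = required_ipc_gain(gain, old_freq_ghz=3.0, new_freq_ghz=3.0)
    print(f"{gain:.2f}x performance needs {ipc_gain:.2f}x IPC")
```

Had the clock also risen, the same function would show a correspondingly smaller IPC requirement.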
Cortex-A77 vs Cortex-A76: Front End Changes
- Branch predictor bandwidth has been doubled from 32 bytes/cycle to 64 bytes/cycle, resulting in higher fetch bandwidth.
- The branch predictor design has been changed for more accurate prediction and higher efficiency.
- The branch target buffer has grown from 6K entries (already large) to 8K entries.
- A new 1.5K-entry macro-op (MOP) L0 cache for decoded instructions has been added at the front of the pipeline. This should help lower the branch mispredict penalty and speed things up. ARM claims an 85%+ hit rate across diverse workloads.
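To put the branch predictor bandwidth figures in instruction terms: AArch64 (A64) instructions are a fixed 4 bytes wide, so a bytes-per-cycle budget converts directly into instructions per cycle. A minimal sketch, assuming pure A64 code (mixed 16/32-bit Thumb code in AArch32 mode would not divide this evenly); the helper name is hypothetical:

```python
A64_INSTRUCTION_BYTES = 4  # every AArch64 (A64) instruction is 32 bits wide


def instructions_per_cycle(bytes_per_cycle: int, instr_bytes: int = A64_INSTRUCTION_BYTES) -> int:
    """How many fixed-width instructions a given bytes-per-cycle budget covers."""
    return bytes_per_cycle // instr_bytes


a76 = instructions_per_cycle(32)  # 8 instructions per cycle
a77 = instructions_per_cycle(64)  # 16 instructions per cycle
print(f"Cortex-A76: {a76} instr/cycle, Cortex-A77: {a77} instr/cycle")
```

In both cases the predictor covers more instructions per cycle than the core decodes, so the extra bandwidth mainly helps it run further ahead of instruction fetch.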
Cortex-A77 vs Cortex-A76: Middle core
- In the middle core, ARM has increased the decoder width from 4-wide to 6-wide.
- To keep up with the wider decode, the reorder buffer now holds 160 entries (up from 128), so more instructions can be in flight and reordered for better efficiency.
- These middle-core changes are similar to the ones Qualcomm made to the Cortex-A76 for its customized Kryo 485 cores in the Snapdragon 855.
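The reorder buffer’s role can be illustrated with a toy model. This is a minimal sketch under simplifying assumptions, not ARM’s design: the class and method names are hypothetical, each instruction takes one entry, and there is no renaming, flushing or per-cycle width limit. It shows why a deeper buffer helps: while an old instruction (for example, a cache miss) blocks retirement, dispatch can only continue until the buffer fills.

```python
from collections import deque


class ReorderBuffer:
    """Toy reorder buffer (ROB): instructions enter in program order, may finish
    executing out of order, but retire strictly in program order from the head."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: deque[int] = deque()  # in-flight instruction ids, oldest first
        self.done: set[int] = set()  # ids that have finished executing

    def dispatch(self, instr_id: int) -> bool:
        """Allocate an entry at the tail; False means the front end must stall."""
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append(instr_id)
        return True

    def complete(self, instr_id: int) -> None:
        """Mark an in-flight instruction as executed, in any order."""
        self.done.add(instr_id)

    def retire(self) -> list[int]:
        """Retire finished instructions from the head, stopping at the first unfinished one."""
        retired = []
        while self.entries and self.entries[0] in self.done:
            instr_id = self.entries.popleft()
            self.done.discard(instr_id)
            retired.append(instr_id)
        return retired


# A 4-entry buffer keeps the demo short; the Cortex-A77's ROB has 160 entries.
rob = ReorderBuffer(capacity=4)
print([rob.dispatch(i) for i in range(5)])  # [True, True, True, True, False]
rob.complete(2)
rob.complete(1)       # younger instructions finish first...
print(rob.retire())   # [] ...but instruction 0 still blocks retirement
rob.complete(0)
print(rob.retire())   # [0, 1, 2]
```

Going from 128 to 160 entries lets the core keep 32 more younger instructions in flight behind a stalled one before dispatch has to stop.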
Cortex-A77 vs Cortex-A76: Back-end execution
- With a wider front end and middle core, the back end also had to grow to handle the extra throughput.
- Integer execution throughput increases by 50%: ARM has added a fourth integer ALU that supports both single-cycle and complex 2-cycle operations.
- A second AES encryption pipe has been added as well.
- The remaining execution pipelines, including the floating-point pipelines, see no major changes.
- There are still two load/store units, but ARM has added two dedicated store ports alongside them, doubling issue bandwidth.
- ARM has also improved the management of the cache hierarchy and of prefetched lines to boost system performance.
Cortex-A77 vs Cortex-A76: What to expect from next-generation flagship chipsets?
In benchmarks, ARM’s figures show around a 20% increase in performance and a significant improvement in floating-point performance, which should help make web browsing smoother.
The Cortex-A76 and Cortex-A77 consume the same energy to execute the same workload. However, because the Cortex-A77 finishes that work faster, it draws more power while running, which is why we expect chipset makers to limit it to two cores in the high-performance cluster of upcoming chipsets.
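The power argument follows from energy = average power × time. If both cores spend the same energy on a task but the Cortex-A77 finishes it faster, its average power while running must rise by the same factor as its speedup. A minimal sketch, assuming the roughly 20% speedup as an illustrative input and ignoring idle power, leakage and thermal throttling; the function name is hypothetical:

```python
def power_ratio(energy_ratio: float, speedup: float) -> float:
    """Average-power ratio (new/old) from E = P * t, with t_new = t_old / speedup."""
    # P = E / t, so P_new / P_old = (E_new / E_old) * (t_old / t_new)
    #                             = energy_ratio * speedup
    return energy_ratio * speedup


# Same energy per task (ratio 1.0) but about 20% faster:
print(f"{power_ratio(1.0, 1.20):.2f}x average power while running")  # prints 1.20x average power while running
```

That higher power draw inside a phone’s limited thermal budget is one reason to expect only a couple of Cortex-A77 cores per chipset.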
Since the Cortex-A77 supports DynamIQ, chipset makers will have more flexibility in choosing core configurations, and a tri-cluster configuration (similar to that of the Snapdragon 855) seems well-suited.
The first chipsets using ARM Cortex-A77 cores should arrive by the end of this year. With Huawei’s HiSilicon under scrutiny, the first SoC is expected to come from Qualcomm.