Data analytics may be the next HPC for IBM Power

In the next few months, Big Blue will launch its entry-level and mid-range Power10 servers, and to be frank, we don’t know what the HPC and AI angle will be for these systems. This is peculiar and inconsistent with the previous two decades of IBM’s Power Platform history.

Under normal circumstances, we would already be having conversations with IBM techs and top brass about how Power10 can transform HPC and AI, but none of that has happened. Thus, the launch of these clusterable Power10 machines – unlike the “Denali” Power E1080 big iron, shared memory NUMA server announced last September – is not like the launch of Power9, which was very focused on HPC and AI workloads, even though Power9 revenues were still dominated by machines dedicated to transaction processing for back-office applications running on relational databases.

The Power9 processor was first deployed, in December 2017, in Oak Ridge National Laboratory’s “Summit” hybrid CPU-GPU supercomputer, inside the Power AC922 system, which packed two Power9 processors and four or six Nvidia “Volta” V100 GPUs into a 2U form factor. The Power9 chip was then rolled out to entry-level and mid-range systems in the spring of 2018, and then integrated into the Power E980 NUMA machine in the fall of the same year. The Power10 processor is launching in the reverse order, and with barely a mention of HPC or AI training, although there is some talk about the importance of native AI inference for customers considering Power10 platforms.

Last fall, the word on the street, based on statements to key dealers, partners, and customers, was that entry machines with one or two Power10 sockets and the only midrange machine, with four sockets, would come out in the first half of 2022. The machines were expected to launch at the PowerUP 2022 conference that the Power Systems User Group and IBM are hosting in a few weeks in New Orleans. That won’t happen; instead, IBM made related software announcements this week for the Power Systems platform, primarily for the proprietary IBM i operating system and the embedded database that underpins the ERP applications at some 120,000 unique customers worldwide.

According to the latest rumors, IBM was going to launch the entry-level and mid-range Power10 machines in June and ship them at the end of July – we heard July 27 as the specific general availability date. We caught up with Steve Sibley, vice president and head of global offerings for Power Systems, and he cautioned us that IBM has very rarely in its history announced products in one quarter and then shipped them in another. The hint we take from this is that these Power10 machines will be launched in early July and ship in late July. July 11 seems likely, since the previous week is a holiday week for many in the United States.

What seems clear to us is that IBM is focused on Power10 uses for its AIX and IBM i customers and has yet to demonstrate that the Red Hat stack performs best and cheapest on the Power platform. We think that will eventually come, but that’s not the strategy yet. For now, IBM seems content with trying to push the IBM i shops forward, push the AIX shops forward, and get all the Linux sales it can on Power and make about $3.5 billion a year.

Here’s our model of how the Power Systems platform fits into IBM’s financials, which we can only reckon on an annual basis given what Big Blue has told Wall Street and the modeling we did before the company changed its financial reporting categories in January of this year:

As usual, the data in bold red italics is estimated. Some estimates have multiple paths that arrive at the same number, and we obviously place more confidence in those.

On the HPC front, the most important factor in Power Systems’ revenue stream and profitability is that the IBM-Nvidia tag team, which would have pitched Power10 “Cirrus” chips plus “Hopper” GPU accelerators, very probably with clusters tied together using 400 Gb/sec InfiniBand but possibly only 200 Gb/sec InfiniBand, did not win the contracts for the “Frontier” supercomputer at Oak Ridge or the “El Capitan” supercomputer at Lawrence Livermore National Laboratory, for many reasons that we have discussed many times in the past.

From what we can tell, neither IBM nor Nvidia has much regret here because, as we’ve argued many times in the past, these capacity-class supercomputer deals are resource-intensive and we don’t think they generate any profits. Such deals generate a lot of noise and are great for public relations and political support, and are best viewed as sunk research and development costs that can lead to commercialized systems later. This didn’t really happen with the supercomputers built by Fujitsu in Japan and IBM in the United States. There have been some sales of Power7, BlueGene/Q, and Power AC922 clusters to industry and government outside the main HPC centers, and similarly Fujitsu has sold smaller versions of the “K” and “Fugaku” systems that it built for RIKEN lab in Japan to various university and government labs. But it’s not a big business either way. The scale of technology trickle-down from on high that we absolutely expected when we founded The Next Platform over seven years ago did not happen. It’s really a trickle, not a steady stream, and definitely not a torrent.

None of this means that HPC, especially when coupled with AI and data analytics or at least sharing the same architecture, cannot be a good, profitable business in the enterprise. Along with supporting relational databases and systems of record applications, this is the primary systems business that Big Blue handles today.

IBM isn’t playing Core Wars against AMD, Ampere Computing, and Amazon Web Services, which have CPUs with lots of cores, but it’s definitely playing Thread Wars. For thread-friendly workloads – Java application servers, databases and data stores, and skinny containers or virtual machines immediately come to mind – IBM can put a box in the field that can hold its own in terms of throughput, memory bandwidth, and I/O bandwidth compared to anything anyone else is selling.

It will be interesting to see, indeed, what IBM does with what we call the Power E1050 – we don’t yet know its codename. This is the four-socket machine with an optional dual-chip Power10 module that will pack 30 cores and 240 threads per socket and scale to four tightly coupled sockets that have 120 cores and 960 threads.

In the table above, the Power E1080 layout is at the top and the Power E1050 layout is at the bottom. In the Power E1080 topology, IBM has a pair of high-speed links that couple the chips inside the DCM very tightly, and then a single link that connects each Power10 chip to the other seven Power chips in the compute complex in an all-to-all topology. The processors should run at around 3.5 GHz, which is a 12.5% reduction in clock speed to reduce thermals. Each Power10 chip has sixteen x8 Open Memory Interface (OMI) ports that operate at 32 GT/sec and provide a theoretical peak of 1 TB/sec of memory bandwidth. With two DIMMs per OMI port and the reasonable 128 GB buffered DDR4 DIMMs that will initially be sold with Power10 machines, we are talking about 4 TB per Power10 chip and 410 GB/sec of bandwidth. The full Power E1050 machine therefore has a theoretical peak of 8 TB/sec of memory bandwidth across its 128 OMI memory ports, but will deliver 3.2 TB/sec in the initial configuration, which will top out at 32 TB of capacity against that bandwidth. Balancing this memory bandwidth is 32 lanes of PCI-Express 5.0 I/O capacity per Power10 chip, resulting in 256 lanes of PCI-Express 5.0 I/O per Power E1050 system.
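The system-level figures quoted above follow from the per-chip numbers by simple multiplication. Here is a minimal Python sketch of that arithmetic, assuming eight Power10 chips in the machine (four sockets, each with a dual-chip module) and eight threads per core (240 threads divided by 30 cores). The constant and function names are ours, invented for illustration; the values are the ones given in the text.

```python
# Per-chip and per-socket figures as quoted in the article.
SOCKETS = 4
CHIPS_PER_SOCKET = 2           # dual-chip module (DCM)
CORES_PER_SOCKET = 30
THREADS_PER_CORE = 8           # 240 threads / 30 cores per socket
MEM_PORTS_PER_CHIP = 16        # x8 memory interface ports
DIMMS_PER_PORT = 2
DIMM_GB = 128                  # initial buffered DDR4 DIMM size
PEAK_TB_PER_SEC_PER_CHIP = 1.0
INITIAL_GB_PER_SEC_PER_CHIP = 410
PCIE_LANES_PER_CHIP = 32


def e1050_totals():
    """Roll the per-chip figures up to a full four-socket system."""
    chips = SOCKETS * CHIPS_PER_SOCKET
    cores = SOCKETS * CORES_PER_SOCKET
    return {
        "chips": chips,
        "cores": cores,
        "threads": cores * THREADS_PER_CORE,
        "memory_ports": chips * MEM_PORTS_PER_CHIP,
        # Binary units: 1 TB = 1,024 GB, which is how 16 x 2 x 128 GB becomes 4 TB.
        "capacity_tb": chips * MEM_PORTS_PER_CHIP * DIMMS_PER_PORT * DIMM_GB / 1024,
        "peak_bw_tb_per_sec": chips * PEAK_TB_PER_SEC_PER_CHIP,
        # Decimal units for bandwidth: 1 TB/sec = 1,000 GB/sec.
        "initial_bw_tb_per_sec": chips * INITIAL_GB_PER_SEC_PER_CHIP / 1000,
        "pcie_lanes": chips * PCIE_LANES_PER_CHIP,
    }


if __name__ == "__main__":
    for name, value in e1050_totals().items():
        print(f"{name}: {value}")
```

Run as written, the initial bandwidth works out to 3.28 TB/sec, which the text above rounds down to 3.2 TB/sec.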

There are all sorts of possibilities for this machine, given its compute, memory, and I/O capability. But it becomes even more interesting when you consider the possibility of using a Power E1050 as a shared memory server for a cluster of Power S1022 entry compute nodes that have no memory of their own. Imagine the architecture of a system using the “memory inception” memory clustering technology integrated into the Power10 chips, which we called a memory area network.

Instead of moving data between nodes in a cluster over InfiniBand or Ethernet with RoCE, you can have them share data on a central memory server and perform collective operations on that memory server as needed, farming out calculations to less expensive nodes where necessary. The memory area network replaces the clustering network, and data sharing should, if the programming model is right, be easier because you only move data if you really need to.
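To make the idea concrete, here is a toy, single-host analogy in Python. It is not IBM's memory inception interface, which the article does not describe at the API level. It only illustrates the pattern of one party owning a block of memory and another attaching to it by name and reducing over the data in place, rather than receiving a copy over a network. It uses the standard library's `multiprocessing.shared_memory` module; the function name `sum_via_shared_block` is hypothetical.

```python
import struct
from multiprocessing import shared_memory


def sum_via_shared_block(values):
    """Place integers in a shared block, then reduce them through a second handle.

    The creating handle stands in for the memory server and the attaching
    handle stands in for a compute node. This is an analogy on one host,
    not a model of the Power10 hardware. `values` must be non-empty.
    """
    fmt = f"{len(values)}q"  # N signed 64-bit integers
    server = shared_memory.SharedMemory(create=True, size=struct.calcsize(fmt))
    try:
        # The "memory server" holds the only copy of the data.
        struct.pack_into(fmt, server.buf, 0, *values)

        # A "compute node" attaches to the same block by name; nothing is sent.
        node = shared_memory.SharedMemory(name=server.name)
        try:
            return sum(struct.unpack_from(fmt, node.buf, 0))
        finally:
            node.close()
    finally:
        server.close()
        server.unlink()  # release the block once everyone is done


if __name__ == "__main__":
    print(sum_via_shared_block([1, 2, 3, 4]))
```

In a real cluster the attaching side would be a separate machine and the programming model would have to handle coherence and synchronization, which this sketch ignores entirely.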

IBM can even take memory area networking to the extreme and link thousands of Power10 entry server nodes, addressing up to 2 PB of total capacity across the cluster. Like this:

IBM hasn’t committed to releasing all the features inherent in its memory area network for Power10, but as we said nearly two years ago when it was disclosed, it could be the basis for a new type of HPC system, one that would be good at data analytics as well as traditional HPC and AI training and inference work (if some of the systems in a memory area network were equipped with GPU accelerators).

Hopefully IBM finds smart ways to use this technology in its entry-level and mid-range Power10 machines to build its Power Systems business and expand it from there, and not just sell database engines to IBM i and AIX shops.
