MIT researchers developed Attention Matching, a KV cache compaction technique that compresses LLM memory by 50x in seconds — ...
Penguin Solutions today announced MemoryAI, the industry's first production-ready KV cache server built on CXL memory technology.
This approach can be viewed as a memory plug-in for large models, providing a fresh perspective and direction for solving the ...
Nvidia's KV Cache Transform Coding (KVTC) compresses LLM key-value cache by 20x without model changes, cutting GPU memory costs and time-to-first-token by up to 8x for multi-turn AI applications.
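None of these announcements include implementation details, but the basic idea behind KV cache compression can be illustrated with simple per-channel int8 quantization. This is a generic sketch, not Nvidia's KVTC transform-coding pipeline or any vendor's actual method; all shapes and names are illustrative.

```python
import numpy as np

def quantize_kv(kv: np.ndarray):
    """Symmetric int8 quantization of a KV tensor, one scale per (head, channel).

    kv shape: (num_tokens, num_heads, head_dim), float32.
    Returns int8 codes plus the float32 scales needed to reconstruct.
    """
    scale = np.abs(kv).max(axis=0, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)  # avoid divide-by-zero on dead channels
    q = np.clip(np.round(kv / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_kv(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
kv = rng.standard_normal((16, 4, 8)).astype(np.float32)  # toy cache: 16 tokens
q, scale = quantize_kv(kv)
recon = dequantize_kv(q, scale)

print(kv.nbytes / q.nbytes)                    # int8 codes are 4x smaller than float32
print(float(np.max(np.abs(kv - recon))))       # per-element error bounded by scale/2
```

Real systems layer far more on top of this (transform coding, low-rank projection, eviction policies) to reach the 20-50x ratios quoted above, but the storage-vs-fidelity trade-off is the same.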
Marvell Technology, Inc. (NASDAQ: MRVL), a leader in data infrastructure semiconductor solutions, today announced Marvell® ...
As AI workloads extend across nearly every technology sector, systems must move more data, use memory more efficiently, and respond more predictably than traditional design methodologies allow. These ...
Lightbits Labs Ltd. today is introducing a new architecture aimed at addressing one of the most stubborn bottlenecks in large-scale artificial intelligence inference: the growing mismatch between the ...
OriginAI inference solutions are designed leveraging Penguin Solutions' 3.3+ billion hours of GPU runtime experience and more ...
AMD's Threadripper Pro 5000 series hit the DIY market last year, and today Luke takes a look at the 5995WX, 5975WX, and ...
The latest Area-51 desktop from Alienware is built around AMD’s Ryzen 7 9800X3D, an 8-core processor with 104MB of total cache designed for gaming workloads. Paired with an RTX 5080 graphics card, 64GB ...
M5 Pro and M5 Max both use the same 18-core CPU die, but the Pro uses a 20-core GPU die while the Max gets a 40-core GPU die. (Because the memory controller is also part of the GPU die, the Max chip still ...
Marvell introduces the Structera S 30260, a next‑generation CXL switch designed to enable rack‑level memory pooling.