NVIDIA released the A100 Tensor Core GPU
TPUv3 be like :)

Some key features
- Sparsity-optimized third-generation Tensor Cores (see the 2:4 pruning sketch after this list)
- 40 GB HBM2 and 40 MB L2 cache
- Multi-Instance GPU (MIG)
- Third-generation NVLink with a data rate of 50 Gbit/s per signal pair, nearly double the 25.78 Gbit/s rate in V100
- Support for NVIDIA Magnum IO and Mellanox interconnect solutions
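
As a concrete illustration of the sparsity feature: A100's sparse Tensor Cores exploit 2:4 fine-grained structured sparsity, where in every group of four consecutive weights the two smallest-magnitude values are pruned to zero, letting the hardware skip half the multiplies. A minimal NumPy sketch of the pattern; the `prune_2_4` helper is hypothetical and for illustration only, not NVIDIA's pruning tooling:

```python
import numpy as np

def prune_2_4(weights: np.ndarray) -> np.ndarray:
    """Illustrative 2:4 structured pruning: in each group of 4
    consecutive weights, zero out the 2 smallest-magnitude values."""
    w = weights.reshape(-1, 4).copy()
    # Indices of the 2 smallest-magnitude entries in each group of 4.
    idx = np.argsort(np.abs(w), axis=1)[:, :2]
    np.put_along_axis(w, idx, 0.0, axis=1)
    return w.reshape(weights.shape)

w = np.random.randn(2, 8).astype(np.float32)
sparse_w = prune_2_4(w)
# Every group of 4 now holds at least 2 zeros, matching the 2:4 pattern.
assert (sparse_w.reshape(-1, 4) == 0).sum(axis=1).min() >= 2
```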

YouTube presentation by CEO Jensen Huang

Technical Details
NVIDIA Ampere Architecture In-Depth
#notml
A bit of space news, but not about Crew Dragon as you may expect :)
It was 14 years ago that Xilinx released its previous radiation-tolerant (RT) FPGAs suitable for space applications, the Virtex-5 series. And finally a new successor is coming: the radiation-tolerant Kintex UltraScale XQRKU060.
https://www.xilinx.com/support/documentation/white_papers/wp523-xqrku060.pdf
Stratix-10-NX-Tehnology-Brief.pdf (763.2 KB)
AI-Optimized FPGA for High-Bandwidth, Low-Latency AI Acceleration
The Intel® Stratix® 10 NX FPGA delivers a unique combination of capabilities needed to implement customized hardware with integrated high-performance artificial intelligence (AI). These capabilities include:

High-Performance AI Tensor Blocks
- Up to 15X more INT8 throughput than the standard Intel Stratix 10 FPGA digital signal processing (DSP) block for AI workloads (see the INT8 quantization sketch after this list)
- Hardware programmable for AI with customized workloads

Abundant Near-Compute Memory
- Embedded memory hierarchy for model persistence
- Integrated high-bandwidth memory (HBM)

High-Bandwidth Networking
- Up to 57.8 Gbps PAM4 transceivers and hard Ethernet blocks for high efficiency
- Flexible and customizable interconnect to scale across multiple nodes
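
To make the INT8 throughput point concrete, here is a hedged sketch of symmetric INT8 quantization in NumPy: float vectors are scaled into 8-bit integers, the dot product runs in cheap integer arithmetic (the kind of MAC work the AI Tensor Blocks accelerate), and the result is rescaled once at the end. The per-tensor scaling scheme is a common textbook choice, not Intel's specific numerics:

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor quantization to int8; returns values and scale."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

a = np.random.randn(1024).astype(np.float32)
b = np.random.randn(1024).astype(np.float32)

qa, sa = quantize_int8(a)
qb, sb = quantize_int8(b)

# Accumulate in int32, as int8 MAC hardware does, then rescale once.
approx = int(qa.astype(np.int32) @ qb.astype(np.int32)) * sa * sb
exact = float(a @ b)
print(f"exact={exact:.3f}  int8 approx={approx:.3f}")
```

The int8 result tracks the float32 one closely for well-scaled inputs, which is why hardware vendors can trade precision for throughput.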
https://www.economist.com/technology-quarterly/2020/06/11/the-cost-of-training-machines-is-becoming-a-problem

The growing demand for computing power has fuelled a boom in chip design and specialised devices that can perform the calculations used in AI efficiently. The first wave of specialist chips were graphics processing units (GPUs), designed in the 1990s to boost video-game graphics. As luck would have it, GPUs are also fairly well-suited to the sort of mathematics found in AI.

Further specialisation is possible, and companies are piling in to provide it. In December, Intel, a giant chipmaker, bought Habana Labs, an Israeli firm, for $2bn. Graphcore, a British firm founded in 2016, was valued at $2bn in 2019. Incumbents such as Nvidia, the biggest GPU-maker, have reworked their designs to accommodate AI. Google has designed its own “tensor-processing unit” (TPU) chips in-house. Baidu, a Chinese tech giant, has done the same with its own “Kunlun” chips. Alfonso Marone at KPMG reckons the market for specialised AI chips is already worth around $10bn, and could reach $80bn by 2025.

“Computer architectures need to follow the structure of the data they’re processing,” says Nigel Toon, one of Graphcore’s co-founders. The most basic feature of AI workloads is that they are “embarrassingly parallel”, which means they can be cut into thousands of chunks which can all be worked on at the same time. Graphcore’s chips, for instance, have more than 1,200 individual number-crunching “cores”, and can be linked together to provide still more power. Cerebras, a Californian startup, has taken an extreme approach. Chips are usually made in batches, with dozens or hundreds etched onto standard silicon wafers 300mm in diameter. Each of Cerebras’s chips takes up an entire wafer by itself. That lets the firm cram 400,000 cores onto each.
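
A minimal sketch of what “embarrassingly parallel” means in practice, assuming nothing beyond the Python standard library: the work splits into chunks that need no communication with one another, so each chunk can run on its own core at the same time. The squaring workload is a stand-in, not Graphcore's actual kernel:

```python
from multiprocessing import Pool

def process_chunk(chunk):
    """Stand-in per-chunk work: each chunk needs no data from any other."""
    return [x * x for x in chunk]

if __name__ == "__main__":
    data = list(range(1_000_000))
    n_chunks = 8
    # Independent slices: no chunk depends on any other chunk's result.
    chunks = [data[i::n_chunks] for i in range(n_chunks)]
    with Pool(processes=n_chunks) as pool:
        results = pool.map(process_chunk, chunks)  # chunks run concurrently
    print(sum(sum(r) for r in results))
```

With eight worker processes the chunks finish roughly eight times faster than a serial loop, minus process start-up overhead; the same independence is what lets 1,200 cores, or 400,000, all stay busy.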

Other optimisations are important, too. Andrew Feldman, one of Cerebras’s founders, points out that AI models spend a lot of their time multiplying numbers by zero. Since those calculations always yield zero, each one is unnecessary, and Cerebras’s chips are designed to avoid performing them. Unlike many tasks, says Mr Toon at Graphcore, ultra-precise calculations are not needed in AI. That means chip designers can save energy by reducing the fidelity of the numbers their creations are juggling. (Exactly how fuzzy the calculations can get remains an open question.)
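
A minimal NumPy sketch of both tricks in this paragraph: skipping multiplications whose weight is already zero (the kind of work Cerebras's hardware avoids) and redoing the same computation in a narrower float16 format. This illustrates the two ideas, not either vendor's actual design:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)
w[rng.random(w.shape) < 0.9] = 0.0          # 90% of weights are zero
x = rng.standard_normal(256).astype(np.float32)

# Dense matvec multiplies every weight, including all the zeros.
dense = w @ x

# Zero-skipping: touch only the nonzero weights (CSR-style traversal).
rows, cols = np.nonzero(w)
sparse = np.zeros(256, dtype=np.float32)
np.add.at(sparse, rows, w[rows, cols] * x[cols])

# Reduced precision: the same matvec in float16 trades accuracy for energy.
half = (w.astype(np.float16) @ x.astype(np.float16)).astype(np.float32)

print(np.allclose(dense, sparse, atol=1e-4))   # zero-skipping is exact
print(np.abs(dense - half).max())              # float16 error stays small
```

The zero-skipping path does a tenth of the multiplies yet returns the same answer; the float16 path does all of them, but each one costs less. How small the numbers can get before accuracy suffers is, as the paragraph notes, still an open question.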

All that can add up to big gains. Mr Toon reckons that Graphcore’s current chips are anywhere between ten and 50 times more efficient than GPUs. They have already found their way into specialised computers sold by Dell, as well as into Azure, Microsoft’s cloud-computing service. Cerebras has delivered equipment to two big American government laboratories.