InJuly7/High-performance-operators

Hardware overview

| GPU model | Architecture / compute capability | FP32 throughput | FP16 throughput | INT8 throughput | FP64 throughput | Memory bandwidth | Memory capacity |
|---|---|---|---|---|---|---|---|
| GTX 1650 | Turing / 7.5 | 2.984 TFLOPS | 5.967 TFLOPS (2:1) | - | 93.24 GFLOPS (1:32) | 128.1 GB/s | 4 GB |
| Orin NX 16G | Ampere / 8.7 | 1.880 TFLOPS | 3.760 TFLOPS (2:1) | - | 940.0 GFLOPS (1:2) | 102.4 GB/s | 16 GB |
| Jetson TX2 | Pascal / 6.2 | 665.6 GFLOPS | 1,331.2 GFLOPS (2:1) | - | 20.80 GFLOPS (1:32) | 59.71 GB/s | 8 GB |
| GTX 1050 Ti | Pascal / 6.1 | 2.488 TFLOPS | 38.88 GFLOPS (1:64) | - | 77.76 GFLOPS (1:32) | 112.1 GB/s | 4 GB |
| Goldwasser-UL | ~ / 6.1 | - | 8 TFLOPS | 32 TOPS | - | 38.4 GB/s | 8 GB |
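
The peak throughput and bandwidth figures above can be combined into a roofline-style estimate of whether a kernel is compute-bound or bandwidth-bound on a given device. The sketch below is illustrative only: the `DEVICES` dictionary and the helper names `ridge_point` and `attainable_gflops` are hypothetical and not part of this repository, and the numbers are copied from two rows of the table.

```python
# Peak FP32 throughput (GFLOPS) and memory bandwidth (GB/s), copied from the table.
# The dictionary and helper names are illustrative assumptions, not repository code.
DEVICES = {
    "GTX 1650": (2984.0, 128.1),
    "Orin NX 16G": (1880.0, 102.4),
}


def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity (FLOP/byte) above which a kernel is compute-bound."""
    return peak_gflops / bandwidth_gbs


def attainable_gflops(intensity, peak_gflops, bandwidth_gbs):
    """Roofline bound: the lower of the compute roof and the bandwidth roof."""
    return min(peak_gflops, intensity * bandwidth_gbs)


if __name__ == "__main__":
    # An FP32 sum reduction does roughly one add per 4-byte element read.
    reduce_intensity = 1.0 / 4.0
    for name, (peak, bw) in DEVICES.items():
        print(
            f"{name}: ridge point {ridge_point(peak, bw):.1f} FLOP/byte, "
            f"reduce bound {attainable_gflops(reduce_intensity, peak, bw):.1f} GFLOPS"
        )
```

Under this model an FP32 reduction sits far below the ridge point on both devices, so memory bandwidth rather than peak FLOPS is the figure to compare measured results against.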

Reduce

Sgemm

Transpose

About

Performance tuning of some common operators
