Demystifying ARM SME to Optimize General Matrix Multiplications (arxiv.org)
bee_rider 49 minutes ago [-]
I don’t get why they didn’t compare against BLIS. I know you can only do so many benchmarks, and people will often complain no matter what, but BLIS is the obvious comparison. Maybe BLIS doesn’t have kernels for their platform, but they’d be well served by just mentioning that fact to get that question out of the reader’s head.

BLIS even has mixed-precision interfaces, though it might not cover more exotic stuff like low-precision ints. So this paper could have had a chance to “put some points on the board” against a real top-tier competitor.

my123 41 minutes ago [-]
Section VII.3 has:

> Libraries such as BLIS [19] lack SME support and are therefore excluded from comparison.

bee_rider 31 minutes ago [-]
Ah, reading comprehension failure on my part.

dsharlet 38 minutes ago [-]
BLIS doesn't appear to support SME: https://github.com/search?q=repo%3Aflame%2Fblis+mopa&type=co...

Maybe you want a comparison anyway, but BLIS won't be competitive. On Apple CPUs, SME is ~8x faster than a single regular CPU core with a good BLAS library.
