Programming Parallel Computers

Open 2025

I8MM9b: SIMD GPU ★★


What you will need to do in this task

Please read the general instructions for this exercise first. Here are the additional instructions specific to this task:

Implement a fast GPU solution using the specialized SIMD instructions available on the NVIDIA Turing architecture. This is a technique exercise: a valid solution must use the __dp4a intrinsic.
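As a rough illustration of what the intrinsic computes, the following host-side C++ sketch mirrors the signed form of __dp4a(a, b, c): each 32-bit operand is treated as four packed signed 8-bit values, the four pairwise products are summed, and the 32-bit accumulator c is added. The names byte_at, pack4, dp4a_ref, and dp4a_ref_self_check are hypothetical helpers made up for this sketch; they are not part of CUDA or of the course template. It also assumes the usual two's-complement narrowing conversions.

```cpp
#include <cassert>
#include <cstdint>

// Extract byte i (0 = lowest 8 bits) of a 32-bit word as a signed 8-bit value.
inline int8_t byte_at(int word, int i) {
    return static_cast<int8_t>(
        static_cast<uint8_t>(static_cast<uint32_t>(word) >> (8 * i)));
}

// Pack four signed bytes into one 32-bit word, x0 in the lowest 8 bits.
inline int pack4(int8_t x0, int8_t x1, int8_t x2, int8_t x3) {
    uint32_t u = static_cast<uint32_t>(static_cast<uint8_t>(x0))
               | static_cast<uint32_t>(static_cast<uint8_t>(x1)) << 8
               | static_cast<uint32_t>(static_cast<uint8_t>(x2)) << 16
               | static_cast<uint32_t>(static_cast<uint8_t>(x3)) << 24;
    return static_cast<int>(u);
}

// Host-side reference for the signed variant of __dp4a(a, b, c):
// four-way byte dot product of a and b, accumulated into c.
inline int dp4a_ref(int a, int b, int c) {
    for (int i = 0; i < 4; ++i) {
        c += static_cast<int>(byte_at(a, i)) * static_cast<int>(byte_at(b, i));
    }
    return c;
}

// Small self-check (hypothetical); call it from main() or a unit test.
inline void dp4a_ref_self_check() {
    // 1*5 + 2*6 + 3*7 + 4*8 + 10 = 80
    assert(dp4a_ref(pack4(1, 2, 3, 4), pack4(5, 6, 7, 8), 10) == 80);
}
```

A reference like this can be handy for checking a kernel's results on small inputs; in device code the loop would be replaced by a single call to __dp4a.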

What I will try to do with your code

I will first run all kinds of tests to see that your code works correctly. You can try it out locally by running ./grading test, but please note that your code has to compile and work correctly not only on your own computer but also on our machines.

If all is fine, I will run the benchmarks. You can try it out on your own computer by running ./grading benchmark, but of course the precise running time on your own computer might be different from the performance on our grading hardware.

Benchmarks

Name Parameters
benchmarks/1a k = 1000, m = 1000, n = 1000
Multiply a 1000 × 1000 matrix with a 1000 × 1000 matrix for a 1000 × 1000 result.
benchmarks/1b k = 1003, m = 999, n = 1001
Multiply a 999 × 1003 matrix with a 1003 × 1001 matrix for a 999 × 1001 result.
benchmarks/2a k = 6000, m = 6000, n = 6000
Multiply a 6000 × 6000 matrix with a 6000 × 6000 matrix for a 6000 × 6000 result.
benchmarks/2b k = 15040, m = 15040, n = 15040
Multiply a 15040 × 15040 matrix with a 15040 × 15040 matrix for a 15040 × 15040 result.
benchmarks/3 k = 24000, m = 24000, n = 24000
Multiply a 24000 × 24000 matrix with a 24000 × 24000 matrix for a 24000 × 24000 result.

Grading

In this task, your submission will be graded using benchmarks/3: multiply a 24000 × 24000 matrix with a 24000 × 24000 matrix for a 24000 × 24000 result.

The point thresholds are as follows. If you submit your solution no later than Wednesday, 03 December 2025, at 23:59:59 (Helsinki), your score will be:

Running time Points
≤ 4.000 sec 1
≤ 3.000 sec 2
≤ 2.000 sec 3
≤ 1.500 sec 4

If you submit your solution after the deadline, but before the course ends on Wednesday, 31 December 2025, at 23:59:59 (Helsinki), your score will be:

Running time Points
≤ 4.000 sec 1
≤ 3.000 sec 2
≤ 2.000 sec 3
≤ 1.500 sec 4

Contest

Your submissions to this task will also automatically take part in the contest, and you can receive up to 2 additional points if your code is among the fastest solutions this year!

Running time Extra points
≤ 1.20 × fastest 1
≤ 1.05 × fastest 2