
This repo contains several implementations of the sgemv() and dtpmv() functions and a comparison of their execution times.


i-am-g2/cuda_sgemv_dtpmv


TestCases

To generate test cases, run Test_Generator.py -N, where N is the dimension of the matrix.

How to Run

Run the run.sh script in each directory to run the experiments:

./run.sh

Individual Run

To compile a program individually, see the commands in run.sh. The preferred method is to comment out the unneeded parts of the run.sh files.

Plot

Plots can be generated with Gen_Plots.py; the execution times have to be filled in manually in the script.

Observation

sgemv

Note: the Y axis shows execution time in ms, and the X axis shows the matrix dimension [2^i x 2^i].


  • sgemv : the CPU implementation with a pool of 8 threads beat the CUDA experiments.
  • CPU_GO_Single : single-threaded CPU implementation in Go
  • CPU_N_THREAD : multiple threads spawned simultaneously on the CPU
  • GPU : GPU implementation
  • CPU_8_Pooled_THREAD : a pool of 8 worker threads
  • CPU_O3 : C code compiled with gcc with -O3 optimisation enabled
  • CPU : C code compiled with no optimisation
  • GPU_CUBLAS : cuBLAS library
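To make the CPU variants above concrete, here is a minimal C sketch of sgemv (y = alpha*A*x + beta*y) in a single-threaded form and in a form that splits the rows across 8 POSIX threads. It assumes a square n x n matrix stored row-major with no transpose; the names sgemv_ref, sgemv_threaded and sgemv_task are hypothetical and not taken from the repo. The threaded version creates and joins its threads on every call, so it is closer to CPU_N_THREAD than to a persistent pool like CPU_8_Pooled_THREAD.

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 8  /* assumption: mirrors the 8-thread CPU variant */

/* Single-threaded reference: y = alpha * A * x + beta * y.
 * A is an n x n matrix stored row-major (assumed layout). */
void sgemv_ref(int n, float alpha, const float *A, const float *x,
               float beta, float *y)
{
    for (int i = 0; i < n; i++) {
        float acc = 0.0f;
        for (int j = 0; j < n; j++)
            acc += A[(size_t)i * n + j] * x[j];
        y[i] = alpha * acc + beta * y[i];
    }
}

/* Work item for one thread: the contiguous rows [row_begin, row_end). */
typedef struct {
    int n, row_begin, row_end;
    float alpha, beta;
    const float *A, *x;
    float *y;
} sgemv_task;

static void *sgemv_worker(void *arg)
{
    sgemv_task *t = (sgemv_task *)arg;
    for (int i = t->row_begin; i < t->row_end; i++) {
        float acc = 0.0f;
        for (int j = 0; j < t->n; j++)
            acc += t->A[(size_t)i * t->n + j] * t->x[j];
        t->y[i] = t->alpha * acc + t->beta * t->y[i];
    }
    return NULL;
}

/* Splits the rows into NUM_THREADS blocks, one per thread.
 * Each thread writes a disjoint slice of y, so no locking is needed.
 * Returns 0 on success, -1 if a thread could not be created. */
int sgemv_threaded(int n, float alpha, const float *A, const float *x,
                   float beta, float *y)
{
    pthread_t threads[NUM_THREADS];
    sgemv_task tasks[NUM_THREADS];
    int chunk = (n + NUM_THREADS - 1) / NUM_THREADS;
    int started = 0, rc = 0;

    for (int t = 0; t < NUM_THREADS; t++) {
        int begin = t * chunk;
        int end = (begin + chunk < n) ? begin + chunk : n;
        if (begin >= end)
            break;  /* fewer rows than threads: nothing left to assign */
        tasks[t] = (sgemv_task){ .n = n, .row_begin = begin, .row_end = end,
                                 .alpha = alpha, .beta = beta,
                                 .A = A, .x = x, .y = y };
        if (pthread_create(&threads[t], NULL, sgemv_worker, &tasks[t]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }
    for (int t = 0; t < started; t++)
        pthread_join(threads[t], NULL);
    return rc;
}
```

Both functions sum each row in the same order, so for the same inputs they produce identical results. Compile with gcc -pthread.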

dtpmv


  • dtpmv : the cuBLAS implementation was the slowest.
  • GPU : GPU implementation
  • GPU_CUBLAS : cuBLAS library
  • CPU : C code compiled with no optimisation
  • CPU_O3 : C code compiled with gcc with -O3 optimisation enabled
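For reference, dtpmv computes x := A*x in place, where A is an n x n triangular matrix stored in packed form. The C sketch below assumes one particular case (upper triangular, no transpose, non-unit diagonal) and the reference BLAS column-major packing, where A(i, j) for i <= j sits at ap[i + j*(j+1)/2] with 0-based indices. The function name dtpmv_upper_notrans is hypothetical; the repo's implementations may handle other cases or layouts.

```c
#include <stddef.h>

/* x := A * x for an upper triangular, non-unit-diagonal matrix A
 * in packed column-major storage (assumed, as in reference BLAS):
 * A(i, j), i <= j, is stored at ap[i + j*(j+1)/2] (0-based). */
void dtpmv_upper_notrans(int n, const double *ap, double *x)
{
    size_t kk = 0;  /* offset of column j inside ap */
    for (int j = 0; j < n; j++) {
        /* Earlier columns only update x[0..j-1], so x[j] still holds
         * its input value at this point. */
        double temp = x[j];
        if (temp != 0.0) {
            for (int i = 0; i < j; i++)
                x[i] += temp * ap[kk + i];  /* entries above the diagonal */
        }
        x[j] = temp * ap[kk + j];           /* diagonal entry A(j, j) */
        kk += (size_t)j + 1;                /* column j holds j + 1 entries */
    }
}
```

Walking the columns left to right is what makes the in-place update safe: each x[j] is read before it is overwritten, and later columns only add to entries above their diagonal.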

Limitation

Experiments with matrices larger than 1024 x 1024 could not be performed on my PC due to memory limitations.
