The TODO list for this project is broken into coarse- and fine-grained tasks.
- Memory manager for handling host, device, and managed memory
- Tests for memory manager
- Tensor wrapper around memory manager for multi-axis data storage
- Tests for tensor
- Create Makefile and build/install system
- Create docs and Doxygen config
- Compute graph and basic operations for tensors
- Tests for compute graph and tensor operations
- Link with BLAS/LAPACK (OpenBLAS?) and MAGMA
- Better MKL support
- Basic Layer classes (Dense, Activation, Flatten, CNN)
- Model with forward/backward propagation
- Tests for Model/Layer training
- Optimizers
- Tests for Optimizers
- Parallel training (Multi-GPU)
- Tests for parallel training
- Examples in Examples/ folder
- Tutorial / Presentation Slides
- Automatic or numerical gradient computations
- Test gradient computations
- I/O methods for Tensors
- Tests for tensor I/O
- Batch Loaders
- 100% Documentation
- Establish/connect with a build pipeline
- Preprocessing methods (PCA, LDA, encoding)
- Tests for preprocessing methods
- Implement RNN
- Tests for RNN
- Compute graph optimizers/minimizers
- Hyperparameter Optimization tools
- Tests for hyperparameter optimization tools
- Package/install configuration (Debian packages, etc.)
- Tune compilation and runtime parameters to hardware
- Test on different hardware (Intel, AMD, NVIDIA)
- OpenCL support (possibly, perhaps with a different BLAS)
- AMD support (work with Frontier)
- Ensure CPU and GPU training results are the same.
- Revise the memory system used by the compute graph and tensors; check with gdb; possibly replace raw MemoryManager pointers with reference-counted smart pointers
- Remove unused operation _internal files
- Check and fix speed of get/set with vector access
- Tensor axis iterators
- CPU only convolution
- Fast ReduceSum
- Scalar network output bug (cuDNN reduce-sum issue)
- Adam
- HDF5 and/or ONNX model load/save
- Add a NeuralNetwork constructor that accepts a custom loss function