# Minimal Viable Deep Learning Infrastructure
Deep learning pipelines are hard to reason about and difficult to code consistently.
Instead of remembering where to put everything and making a different choice for each project, this repository is an attempt to standardize on good defaults.
Think of it like a mini PyTorch Lightning, with all the gory internals exposed for extension and modification.
## Usage

### Install

Install the conda requirements:

```bash
make install
```

which is a proxy for calling:

```bash
conda env update -n ml_pipeline --file environment.yml
```

### Run

Run the code on MNIST with the following command:

```bash
make run
```
## Tutorial

The motivation for building a template for deep learning pipelines is this: deep learning is hard enough without every codebase being a little different.
Especially in a research lab, standardizing on a few components makes switching between projects easier.
In this template, you'll see the following:
### directory structure

- `src/model/` - model definitions.
- `src/config/` - Hydra configuration files.
- `data/` - datasets (gitignored).
- `test/` - pytest: unit testing.
  - good for checking data shapes.
  - TODO
- `docs/` - switching projects is easier with these in place.
  - organize them.
- `**/__init__.py` - creates modules out of directories, so `import module` works (see the sketch after this list).
- `README.md` - required at the root level; can exist inside any dir.
- `environment.yml` - the conda environment specification.
- `Makefile` - to install and run stuff; houses common operations and scripts.
- `launch.sh` - script to dispatch training.
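As a quick illustration of the `__init__.py` point, a sketch (the `src.model` / `Net` names here are hypothetical, not fixed by this template):

```python
# Because src/ and src/model/ each contain an __init__.py, they behave as
# packages, so pipeline code can be imported like any installed library:
from src.model import Net  # hypothetical class exposed in src/model/__init__.py

model = Net()
```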
### testing

- `if __name__ == "__main__"` blocks are a good way to test things.
  - they make it easy to drop in breakpoints, as in the sketch below.
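For example, a minimal sketch (the `Net` module is hypothetical):

```python
import torch

class Net(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.fc = torch.nn.Linear(28 * 28, 10)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(x.flatten(1))

if __name__ == "__main__":
    # quick shape check; drop a breakpoint() anywhere in here while debugging
    net = Net()
    out = net(torch.randn(2, 1, 28, 28))
    assert out.shape == (2, 10)
```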
### config

- Hydra config (a minimal entry point is sketched below).
  - quickly experiment with hyperparameters.
  - a good way to define environment variables.
  - e.g. `lr`, `workers`, `batch_size`, `debug`.
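A sketch of a Hydra entry point; the file layout and config values are assumptions, and `version_base` assumes Hydra ≥ 1.2:

```python
# train.py, paired with a file like src/config/config.yaml containing:
#   lr: 3e-4
#   workers: 4
#   batch_size: 64
#   debug: false
import hydra
from omegaconf import DictConfig

@hydra.main(config_path="src/config", config_name="config", version_base=None)
def main(cfg: DictConfig) -> None:
    print(cfg.lr, cfg.workers, cfg.batch_size, cfg.debug)

if __name__ == "__main__":
    main()
```

Overrides then come free on the command line, e.g. `python train.py lr=1e-3 debug=true`.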
### data

- collate functions! (a sketch below)
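A sketch of a custom `collate_fn`, assuming variable-length samples (shapes are illustrative):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

def collate(batch):
    # batch is a list of (sequence, label) pairs produced by the Dataset
    seqs, labels = zip(*batch)
    # pad to the longest sequence in the batch; stack labels into one tensor
    return pad_sequence(list(seqs), batch_first=True), torch.tensor(labels)

# passed to the loader as DataLoader(dataset, batch_size=32, collate_fn=collate)
padded, labels = collate([(torch.randn(5, 8), 0), (torch.randn(3, 8), 1)])
assert padded.shape == (2, 5, 8) and labels.shape == (2,)
```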
### formatting python

- Python type hints.
- automatic formatting with the `black` package.
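Type hints keep interfaces self-documenting; a small illustrative example (the `accuracy` helper is hypothetical):

```python
import torch
from torch import Tensor

def accuracy(logits: Tensor, targets: Tensor) -> float:
    """Fraction of correct predictions in a batch."""
    return (logits.argmax(dim=1) == targets).float().mean().item()

print(accuracy(torch.eye(4), torch.arange(4)))  # 1.0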
### running

- tqdm to track progress.
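For instance (wrapping a stand-in iterable; in the pipeline it would wrap the dataloader):

```python
from tqdm import tqdm

# wrap any iterable to get a live progress bar with rate and ETA
for step in tqdm(range(1000), desc="train"):
    pass  # one training step per iteration
```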
### architecture

- the dataloader, optimizer, criterion, device, and state are constructed in `main`, but passed to an object that runs batches (sketched below).
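A minimal sketch of that pattern; the `Runner` name and the toy dataset are assumptions, not part of this template:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

class Runner:
    """Owns the batch loop; everything it needs is injected from main()."""

    def __init__(self, loader, model, optimizer, criterion, device):
        self.loader, self.model = loader, model.to(device)
        self.optimizer, self.criterion, self.device = optimizer, criterion, device

    def run_epoch(self) -> float:
        total = 0.0
        for x, y in self.loader:
            x, y = x.to(self.device), y.to(self.device)
            loss = self.criterion(self.model(x), y)
            self.optimizer.zero_grad()
            loss.backward()
            self.optimizer.step()
            total += loss.item()
        return total / len(self.loader)

def main():
    device = "cuda" if torch.cuda.is_available() else "cpu"
    loader = DataLoader(
        TensorDataset(torch.randn(256, 10), torch.randint(0, 2, (256,))),
        batch_size=32,
    )
    model = torch.nn.Linear(10, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
    criterion = torch.nn.CrossEntropyLoss()
    print(Runner(loader, model, optimizer, criterion, device).run_epoch())

if __name__ == "__main__":
    main()
```

Keeping construction in `main` leaves the runner agnostic to the dataset, model, and optimizer, which is what makes swapping components between projects cheap.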