Minimal Viable Deep Learning Infrastructure
Deep learning pipelines are hard to reason about and difficult to code consistently.
Instead of remembering where to put everything and making a different choice for each project, this repository is an attempt to standardize on good defaults.
Think of it as a mini PyTorch Lightning, with all the gory internals exposed for extension and modification.
Usage
Install:
Install the conda requirements:
make install
This is a shortcut for calling:
conda env update -n ml_pipeline --file environment.yml
Run:
Run the code on MNIST with the following command:
make run
Tutorial
The motivation for building a template for deep learning pipelines is this: deep learning is hard enough without every codebase being a little different.
Especially in a research lab, standardizing on a few components makes switching between projects easier.
In this template, you'll see the following:
directory structure
- src/model
- src/config
- data/
- test/
  - pytest: unit testing.
    - good for checking data shapes
    - TODO:
- docs/
  - switching projects is easier when docs are in place
  - keep them organized
- **/__init__.py
  - creates packages (importable modules) out of directories; import module works with these in place.
- README.md
  - required at the root level.
  - can also exist inside any directory.
- environment.yml
- Makefile
  - used to install and run things.
  - houses common operations and scripts.
- launch.sh
  - script to dispatch training.
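The test/ entry above notes that unit tests are good for checking data shapes. A minimal sketch of such a test in pytest's plain-function style follows. The file name and the helpers make_batch and shape_of are hypothetical stand-ins for the project's real data code, and the 28x28 image size is an assumption based on the MNIST run mentioned earlier.

```python
# test/test_shapes.py (hypothetical file name)
from typing import List, Tuple

Image = List[List[float]]  # one grayscale image as nested lists


def make_batch(batch_size: int, height: int = 28, width: int = 28) -> List[Image]:
    """Hypothetical stand-in for real data loading: returns blank images."""
    return [[[0.0] * width for _ in range(height)] for _ in range(batch_size)]


def shape_of(batch: List[Image]) -> Tuple[int, int, int]:
    """Return (batch, height, width) for a rectangular nested-list batch."""
    return (len(batch), len(batch[0]), len(batch[0][0]))


def test_batch_shape() -> None:
    # pytest collects functions named test_* and reports any failed assert.
    batch = make_batch(batch_size=4)
    assert shape_of(batch) == (4, 28, 28)
```

Running pytest from the repository root would pick up a file like this, provided it lives under test/ and its name starts with test_.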
testing
- if __name__ == "__main__".
  - good way to test things
  - enables lots of breakpoints.
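As a rough illustration of the pattern above, the sketch below puts a quick manual check behind the main guard. The functions normalize and smoke_test are hypothetical examples, not code from this repository.

```python
from typing import List


def normalize(pixels: List[float]) -> List[float]:
    """Scale 0-255 pixel values into the range [0, 1]."""
    return [p / 255.0 for p in pixels]


def smoke_test() -> List[float]:
    """Quick manual check; a convenient place to set a breakpoint."""
    out = normalize([0.0, 127.5, 255.0])
    assert out == [0.0, 0.5, 1.0]
    return out


if __name__ == "__main__":
    # Runs only when the file is executed directly,
    # not when it is imported by another module.
    print(smoke_test())
```

Because the guard keeps the check out of the import path, each module can carry its own small entry point for debugging without affecting the training run.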
config
- Hydra config.
- quickly experiment with hyperparameters
- good way to define environment variables
- lr, workers, batch_size
- debug
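To make the config fields above concrete, here is a minimal sketch that deliberately does not import Hydra. It uses a plain dataclass, which is also the form Hydra's structured configs take. The field names come from the list above; the class name TrainConfig, the helper for_debugging, and all default values are assumptions chosen for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TrainConfig:
    lr: float = 1e-3       # learning rate (default is an assumption)
    workers: int = 4       # data-loading worker processes (assumption)
    batch_size: int = 64   # samples per batch (assumption)
    debug: bool = False    # switch for a fast, small debugging run


def for_debugging(cfg: TrainConfig) -> TrainConfig:
    """Return a copy tuned for debugging: no worker processes, tiny batches."""
    return replace(cfg, workers=0, batch_size=2, debug=True)


cfg = TrainConfig()
debug_cfg = for_debugging(cfg)
print(cfg)
print(debug_cfg)
```

With Hydra in place, values like these can be overridden from the command line as key=value pairs, which is what makes quick hyperparameter experiments cheap.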
data
- collate functions!
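A collate function takes a list of individual samples and merges them into one batch. The sketch below pads variable-length sequences to a common length. It is pure Python so that it runs anywhere; the name pad_collate and the (sequence, label) sample layout are assumptions, and a real version for PyTorch would return tensors and be passed to DataLoader through its collate_fn argument.

```python
from typing import List, Tuple

Sample = Tuple[List[int], int]  # (variable-length token ids, label)
Batch = Tuple[List[List[int]], List[int], List[int]]


def pad_collate(samples: List[Sample], pad_value: int = 0) -> Batch:
    """Merge samples into one batch, padding sequences to the longest one."""
    lengths = [len(seq) for seq, _ in samples]
    max_len = max(lengths)
    padded = [seq + [pad_value] * (max_len - len(seq)) for seq, _ in samples]
    labels = [label for _, label in samples]
    # Original lengths are returned so padding can be masked out later.
    return padded, labels, lengths


batch = pad_collate([([1, 2, 3], 0), ([4], 1)])
print(batch)  # ([[1, 2, 3], [4, 0, 0]], [0, 1], [3, 1])
```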
formatting python
- python type hints.
- automatic formatting with the black package.
running
- tqdm to track progress.
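A minimal sketch of wrapping a batch loop in tqdm is shown below. The function run_epoch and the sleep call are placeholders for the real training step, and the ImportError fallback exists only so the sketch still runs where tqdm is not installed.

```python
import time

try:
    from tqdm import tqdm
except ImportError:  # fallback so the sketch runs without tqdm installed
    def tqdm(iterable, **kwargs):
        return iterable


def run_epoch(num_batches: int) -> int:
    """Pretend to process batches while tqdm draws a progress bar."""
    processed = 0
    for _ in tqdm(range(num_batches), desc="train"):
        time.sleep(0.001)  # placeholder for the real forward/backward pass
        processed += 1
    return processed


print(run_epoch(20))
```

Wrapping the iterable is the only change needed; the loop body stays the same.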
architecture
- dataloader, optimizer, criterion, device, state are constructed in main, but passed to an object that runs batches.
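The split described above can be sketched as follows: main builds every component, and a separate object only runs batches. This is a pure-Python toy under stated assumptions, not the repository's actual code. The names Runner, SGD, and mse are hypothetical, the model is a single scalar weight fitting y = 2x, the gradient is derived by hand because the toy has no autograd, and device is carried along but unused.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Tuple

Batch = Tuple[List[float], List[float]]  # (inputs, targets)


class SGD:
    """Toy optimizer for a single scalar weight."""

    def __init__(self, lr: float) -> None:
        self.lr = lr

    def step(self, weight: float, grad: float) -> float:
        return weight - self.lr * grad


def mse(preds: List[float], targets: List[float]) -> float:
    """Mean squared error criterion."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


@dataclass
class Runner:
    """Receives ready-made components and only knows how to run batches."""
    dataloader: Iterable[Batch]
    optimizer: SGD
    criterion: Callable[[List[float], List[float]], float]
    device: str  # unused in this toy; kept to mirror the described layout
    state: Dict[str, float] = field(
        default_factory=lambda: {"weight": 0.0, "loss": 0.0}
    )

    def run_epoch(self) -> float:
        for xs, ys in self.dataloader:
            w = self.state["weight"]
            preds = [w * x for x in xs]
            self.state["loss"] = self.criterion(preds, ys)
            # Hand-derived gradient of the MSE w.r.t. w for the model y = w * x.
            grad = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
            self.state["weight"] = self.optimizer.step(w, grad)
        return self.state["loss"]


def main() -> Runner:
    # Everything is constructed here, then handed to the Runner.
    dataloader = [([1.0, 2.0], [2.0, 4.0]), ([3.0, 4.0], [6.0, 8.0])]  # y = 2x
    runner = Runner(dataloader, SGD(lr=0.05), mse, device="cpu")
    for _ in range(50):
        runner.run_epoch()
    return runner


if __name__ == "__main__":
    print(main().state)
```

Keeping construction in main means a component can be swapped (a different optimizer, a different criterion) without touching the batch loop.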