# Minimal Viable Deep Learning Infrastructure
Deep learning pipelines are hard to reason about and difficult to code consistently.
Instead of remembering where to put everything and making a different choice for each project, this repository is an attempt to standardize on good defaults.
Think of it like a mini PyTorch Lightning, with all the gory internals exposed for extension and modification.
## Usage

### Install

Install the conda requirements:

```bash
make install
```

which is a proxy for calling:

```bash
conda env update -n ml_pipeline --file environment.yml
```

### Run

Run the code on MNIST with the following command:

```bash
make run
```
## Tutorial

The motivation for building a template for deep learning pipelines is this: deep learning is hard enough without every codebase being a little different.
Especially in a research lab, standardizing on a few components makes switching between projects easier.
In this template, you'll see the following:
### directory structure

- `src/model/` - model definitions.
- `src/config/` - Hydra configuration files.
- `data/` - datasets (gitignored).
- `test/` - pytest: unit testing.
  - good for checking data shapes.
  - TODO
- `docs/` - switching projects is easier with these in place.
  - organize them.
- `**/__init__.py` - creates modules out of directories, so `import module` works (see the sketch after this list).
- `README.md` - required at the root level; can exist inside any dir.
- `environment.yml` - the conda environment specification.
- `Makefile` - to install and run stuff; houses common operations and scripts.
- `launch.sh` - script to dispatch training.
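As a quick illustration of the `__init__.py` point, a sketch (the `src.model` / `Net` names here are hypothetical, not fixed by this template):

```python
# Because src/ and src/model/ each contain an __init__.py, they behave as
# packages, so pipeline code can be imported like any installed library:
from src.model import Net  # hypothetical class exposed in src/model/__init__.py

model = Net()
```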
### testing

- `if __name__ == "__main__"` blocks are a good way to test things.
  - they make it easy to drop in breakpoints, as in the sketch below.
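For example, a minimal sketch (the `Net` module is hypothetical):

```python
import torch

class Net(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.fc = torch.nn.Linear(28 * 28, 10)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(x.flatten(1))

if __name__ == "__main__":
    # quick shape check; drop a breakpoint() anywhere in here while debugging
    net = Net()
    out = net(torch.randn(2, 1, 28, 28))
    assert out.shape == (2, 10)
```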
### config

- Hydra config (a minimal entry point is sketched below).
  - quickly experiment with hyperparameters.
  - a good way to define environment variables.
  - e.g. `lr`, `workers`, `batch_size`, `debug`.
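A sketch of a Hydra entry point; the file layout and config values are assumptions, and `version_base` assumes Hydra ≥ 1.2:

```python
# train.py, paired with a file like src/config/config.yaml containing:
#   lr: 3e-4
#   workers: 4
#   batch_size: 64
#   debug: false
import hydra
from omegaconf import DictConfig

@hydra.main(config_path="src/config", config_name="config", version_base=None)
def main(cfg: DictConfig) -> None:
    print(cfg.lr, cfg.workers, cfg.batch_size, cfg.debug)

if __name__ == "__main__":
    main()
```

Overrides then come free on the command line, e.g. `python train.py lr=1e-3 debug=true`.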
### data

- collate functions! (a sketch below)
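A sketch of a custom `collate_fn`, assuming variable-length samples (shapes are illustrative):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

def collate(batch):
    # batch is a list of (sequence, label) pairs produced by the Dataset
    seqs, labels = zip(*batch)
    # pad to the longest sequence in the batch; stack labels into one tensor
    return pad_sequence(list(seqs), batch_first=True), torch.tensor(labels)

# passed to the loader as DataLoader(dataset, batch_size=32, collate_fn=collate)
padded, labels = collate([(torch.randn(5, 8), 0), (torch.randn(3, 8), 1)])
assert padded.shape == (2, 5, 8) and labels.shape == (2,)
```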
### formatting python

- Python type hints.
- automatic formatting with the `black` package.
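Type hints keep interfaces self-documenting; a small illustrative example (the `accuracy` helper is hypothetical):

```python
import torch
from torch import Tensor

def accuracy(logits: Tensor, targets: Tensor) -> float:
    """Fraction of correct predictions in a batch."""
    return (logits.argmax(dim=1) == targets).float().mean().item()

print(accuracy(torch.eye(4), torch.arange(4)))  # 1.0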
### running

- tqdm to track progress.
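For instance (wrapping a stand-in iterable; in the pipeline it would wrap the dataloader):

```python
from tqdm import tqdm

# wrap any iterable to get a live progress bar with rate and ETA
for step in tqdm(range(1000), desc="train"):
    pass  # one training step per iteration
```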
### architecture

- the dataloader, optimizer, criterion, device, and state are constructed in `main`, but passed to an object that runs batches (sketched below).
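A minimal sketch of that pattern; the `Runner` name and the toy dataset are assumptions, not part of this template:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

class Runner:
    """Owns the batch loop; everything it needs is injected from main()."""

    def __init__(self, loader, model, optimizer, criterion, device):
        self.loader, self.model = loader, model.to(device)
        self.optimizer, self.criterion, self.device = optimizer, criterion, device

    def run_epoch(self) -> float:
        total = 0.0
        for x, y in self.loader:
            x, y = x.to(self.device), y.to(self.device)
            loss = self.criterion(self.model(x), y)
            self.optimizer.zero_grad()
            loss.backward()
            self.optimizer.step()
            total += loss.item()
        return total / len(self.loader)

def main():
    device = "cuda" if torch.cuda.is_available() else "cpu"
    loader = DataLoader(
        TensorDataset(torch.randn(256, 10), torch.randint(0, 2, (256,))),
        batch_size=32,
    )
    model = torch.nn.Linear(10, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
    criterion = torch.nn.CrossEntropyLoss()
    print(Runner(loader, model, optimizer, criterion, device).run_epoch())

if __name__ == "__main__":
    main()
```

Keeping construction in `main` leaves the runner agnostic to the dataset, model, and optimizer, which is what makes swapping components between projects cheap.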