# Minimal Viable Deep Learning Infrastructure
Deep learning pipelines are hard to reason about and difficult to code consistently.
Instead of remembering where to put everything and making a different choice for each project, this repository is an attempt to standardize on good defaults.
Think of it as a mini PyTorch Lightning, with all the gory internals exposed for extension and modification.
# Usage
## Install:
Install the conda requirements:
```bash
make install
```
Which is a proxy for calling:
```bash
conda env update -n ml_pipeline --file environment.yml
```
## Run:
Run the code on MNIST with the following command:
```bash
make run
```
# Tutorial
The motivation for building a template for deep learning pipelines is simple: deep learning is hard enough without every code base being a little different.
Especially in a research lab, standardizing on a few components makes switching between projects easier.
In this template, you'll see the following:
## directory structure
- `src/model`
- `src/config`
- `data/`
- `test/`
  - pytest: unit testing.
  - good for checking data shapes.
  - TODO:
- `docs/`
  - switching projects is easier with these in place.
  - organize them here.
- `**/__init__.py`
  - turns a directory into a Python package.
  - `import module` works because of these (see the test sketch after this list).
- `README.md`
  - required at the root level.
  - can exist inside any dir.
- `environment.yml`
- `Makefile`
  - installs and runs things.
  - houses common operations and scripts.
- `launch.sh`
  - script to dispatch training.
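
As a rough illustration of how `test/`, pytest, and the `__init__.py` files fit together, here is a minimal shape test; the module path `src.model.net` and the `Net` class are hypothetical stand-ins for whatever the repository actually defines:

```python
# test/test_shapes.py -- hypothetical path and names
import torch

from src.model.net import Net  # importable because src/ and src/model/ contain __init__.py


def test_output_shape() -> None:
    # A forward pass on a fake MNIST batch should yield one row of logits per sample.
    model = Net()
    out = model(torch.randn(4, 1, 28, 28))
    assert out.shape == (4, 10)
```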
## testing
- `if __name__ == "__main__"`.
  - a good way to smoke-test a module in isolation.
  - makes it easy to drop in breakpoints.
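
A minimal sketch of the pattern; the `Net` module below is a stand-in, not the model this repository ships:

```python
import torch
from torch import nn


class Net(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.fc = nn.Linear(28 * 28, 10)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(x.flatten(1))


if __name__ == "__main__":
    # Smoke test: push a fake batch through and check the output shape.
    # Drop a breakpoint() anywhere above to inspect intermediate tensors.
    net = Net()
    out = net(torch.randn(4, 1, 28, 28))
    assert out.shape == (4, 10)
    print("ok", out.shape)
```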
## config
- Hydra config.
  - quickly experiment with hyperparameters.
  - a good way to define environment variables and settings:
    - `lr`, `workers`, `batch_size`
    - `debug`
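
A minimal sketch of the Hydra pattern, assuming the entry point sits next to a `config/` directory containing a `config.yaml` with keys like `lr`, `workers`, `batch_size`, and `debug` (the exact paths and file names in this repository may differ):

```python
import hydra
from omegaconf import DictConfig


@hydra.main(config_path="config", config_name="config")  # assumed layout
def main(cfg: DictConfig) -> None:
    # Values come from config/config.yaml and can be overridden on the command
    # line, e.g. `python main.py lr=3e-4 batch_size=128 debug=true`.
    print(cfg.lr, cfg.workers, cfg.batch_size, cfg.debug)


if __name__ == "__main__":
    main()
```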
## data
- collate functions!
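
For example, a simple collate function for image/label pairs might look like this (a sketch, not necessarily the one used here):

```python
from typing import List, Tuple

import torch


def collate_fn(batch: List[Tuple[torch.Tensor, int]]) -> Tuple[torch.Tensor, torch.Tensor]:
    # Stack per-sample images into a (B, C, H, W) tensor and labels into a (B,) tensor.
    images, labels = zip(*batch)
    return torch.stack(images), torch.tensor(labels)


# Pass it to the dataloader: DataLoader(dataset, batch_size=32, collate_fn=collate_fn)
```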
## formatting python
- python type hints.
- automatic formatting with the `black` package.
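
For instance, a small type-hinted helper; `black` then keeps the layout consistent without manual effort:

```python
import torch


def accuracy(logits: torch.Tensor, targets: torch.Tensor) -> float:
    # Hints make the tensor-in / float-out contract explicit to readers and tools.
    return (logits.argmax(dim=1) == targets).float().mean().item()
```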
## running
- tqdm to track progress.
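
For example, wrapping the dataloader in `tqdm` gives a live progress bar per epoch (a sketch; `step_fn` is a placeholder for whatever processes a batch):

```python
from tqdm import tqdm


def train_one_epoch(loader, step_fn) -> None:
    # tqdm shows progress, iteration rate, and ETA while looping over batches.
    for batch in tqdm(loader, desc="train", leave=False):
        step_fn(batch)
```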
## architecture
- the dataloader, optimizer, criterion, device, and state are constructed in `main`, then passed to an object that runs batches.
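
A minimal sketch of that wiring; the `BatchRunner` name and its interface are illustrative, not the exact classes in this repository:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


class BatchRunner:
    """Owns the per-batch loop; everything it needs is injected from main()."""

    def __init__(self, model: nn.Module, optimizer, criterion, device: torch.device) -> None:
        self.model = model.to(device)
        self.optimizer = optimizer
        self.criterion = criterion
        self.device = device

    def run_batch(self, batch) -> float:
        x, y = (t.to(self.device) for t in batch)
        self.optimizer.zero_grad()
        loss = self.criterion(self.model(x), y)
        loss.backward()
        self.optimizer.step()
        return loss.item()


def main() -> None:
    # Construct the pieces here, then hand them to the runner.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    dataset = TensorDataset(torch.randn(64, 1, 28, 28), torch.randint(0, 10, (64,)))
    loader = DataLoader(dataset, batch_size=16)
    model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
    runner = BatchRunner(model, optimizer, nn.CrossEntropyLoss(), device)
    for batch in loader:
        runner.run_batch(batch)


if __name__ == "__main__":
    main()
```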