From 72b3e66989deb8f06a90b19ced7d64e0056beebe Mon Sep 17 00:00:00 2001
From: Peter Bull
Date: Fri, 13 Apr 2018 16:24:03 -0700
Subject: [PATCH] Add explanation of pip package for src

---
 docs/docs/index.md                     | 22 ++++++++--------------
 {{ cookiecutter.repo_name }}/README.md |  1 +
 2 files changed, 9 insertions(+), 14 deletions(-)

diff --git a/docs/docs/index.md b/docs/docs/index.md
index 98fe5c2..2144535 100644
--- a/docs/docs/index.md
+++ b/docs/docs/index.md
@@ -102,6 +102,7 @@ cookiecutter https://github.com/drivendata/cookiecutter-data-science
 ├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
 │                         generated with `pip freeze > requirements.txt`
 │
+├── setup.py           <- Make this project pip installable with `pip install -e`
 ├── src                <- Source code for use in this project.
 │   ├── __init__.py    <- Makes src a Python module
 │   │
@@ -140,25 +141,18 @@ Since notebooks are challenging objects for source control (e.g., diffs of the `
 
  1. Follow a naming convention that shows the owner and the order the analysis was done in. We use the format `<step>-<ghuser>-<description>.ipynb` (e.g., `0.3-bull-visualize-distributions.ipynb`).
 
- 2. Refactor the good parts. Don't write code to do the same task in multiple notebooks. If it's a data preprocessing task, put it in the pipeline at `src/data/make_dataset.py` and load data from `data/interim`. If it's useful utility code, refactor it to `src` and import it into notebooks with a cell like the following. If updating the system path is icky to you, we'd recommend making a Python package (there is a [cookiecutter for that](https://github.com/audreyr/cookiecutter-pypackage) as well) and installing that as an editable package with `pip install -e`.
+ 2. Refactor the good parts. Don't write code to do the same task in multiple notebooks. If it's a data preprocessing task, put it in the pipeline at `src/data/make_dataset.py` and load data from `data/interim`. If it's useful utility code, refactor it to `src`.
+
+ Now by default we turn the project into a Python package (see the `setup.py` file). You can import your code and use it in notebooks with a cell like the following:
 
 ```
-# Load the "autoreload" extension
+# OPTIONAL: Load the "autoreload" extension so that code can change
 %load_ext autoreload
 
-# always reload modules marked with "%aimport"
-%autoreload 1
+# OPTIONAL: always reload modules so that as you change code in src, it gets loaded
+%autoreload 2
 
-import os
-import sys
-
-# add the 'src' directory as one where we can import modules
-src_dir = os.path.join(os.getcwd(), os.pardir, 'src')
-sys.path.append(src_dir)
-
-# import my method from the source code
-%aimport preprocess.build_features
-from preprocess.build_features import remove_invalid_data
+from src.data import make_dataset
 ```
 
 ### Analysis is a DAG
diff --git a/{{ cookiecutter.repo_name }}/README.md b/{{ cookiecutter.repo_name }}/README.md
index bf024ec..010dd62 100644
--- a/{{ cookiecutter.repo_name }}/README.md
+++ b/{{ cookiecutter.repo_name }}/README.md
@@ -31,6 +31,7 @@ Project Organization
     ├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
     │                         generated with `pip freeze > requirements.txt`
     │
+    ├── setup.py           <- makes project pip installable (pip install -e .) so src can be imported
     ├── src                <- Source code for use in this project.
     │   ├── __init__.py    <- Makes src a Python module
     │   │
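A note on the mechanism behind the new `setup.py`, offered as a sketch rather than a description of pip's internals. An editable install (`pip install -e .`) leaves the code where it is and makes the packages in the project root importable, so `src` resolves as a top-level package no matter which directory a notebook runs from. The stdlib-only Python below simulates that effect by putting a throwaway project root on `sys.path` by hand. The toy layout and the `load()` function are hypothetical stand-ins for `src/data/make_dataset.py`, not its real contents.

```python
import importlib
import pathlib
import sys
import tempfile


def make_toy_project(root: pathlib.Path) -> None:
    """Lay out a hypothetical project where src/ is a package with a data subpackage."""
    data_dir = root / "src" / "data"
    data_dir.mkdir(parents=True)
    # The __init__.py files are what make `src` and `src.data` importable as packages.
    (root / "src" / "__init__.py").write_text("")
    (data_dir / "__init__.py").write_text("")
    # Hypothetical module body; not the real make_dataset.py.
    (data_dir / "make_dataset.py").write_text("def load():\n    return [1, 2, 3]\n")


def import_from_project(root: pathlib.Path):
    """Import src.data.make_dataset with `root` temporarily on sys.path.

    This stands in for the effect of an editable install; it is not how pip works internally.
    """
    # Forget any previously imported `src` so the toy project is the one that gets found.
    for name in ("src.data.make_dataset", "src.data", "src"):
        sys.modules.pop(name, None)
    root_str = str(root)
    sys.path.insert(0, root_str)
    importlib.invalidate_caches()  # the files were created after interpreter start-up
    try:
        return importlib.import_module("src.data.make_dataset")
    finally:
        sys.path.remove(root_str)


with tempfile.TemporaryDirectory() as tmp:
    project_root = pathlib.Path(tmp)
    make_toy_project(project_root)
    make_dataset = import_from_project(project_root)
    rows = make_dataset.load()

print(rows)  # [1, 2, 3]
```

The practical difference from the deleted notebook cell is that the install arranges this once per environment, instead of every notebook computing a relative path to `src` from its own working directory.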