Add explanation of pip package for src
parent b3e9dfa3f9
commit 72b3e66989
|
@@ -102,6 +102,7 @@ cookiecutter https://github.com/drivendata/cookiecutter-data-science
|
||||||
├── requirements.txt <- The requirements file for reproducing the analysis environment, e.g.
|
├── requirements.txt <- The requirements file for reproducing the analysis environment, e.g.
|
||||||
│ generated with `pip freeze > requirements.txt`
|
│ generated with `pip freeze > requirements.txt`
|
||||||
│
|
│
|
||||||
|
├── setup.py <- Make this project pip installable with `pip install -e .`
|
||||||
├── src <- Source code for use in this project.
|
├── src <- Source code for use in this project.
|
||||||
│ ├── __init__.py <- Makes src a Python module
|
│ ├── __init__.py <- Makes src a Python module
|
||||||
│ │
|
│ │
|
||||||
|
@@ -140,25 +141,18 @@ Since notebooks are challenging objects for source control (e.g., diffs of the `
|
||||||
|
|
||||||
1. Follow a naming convention that shows the owner and the order the analysis was done in. We use the format `<step>-<ghuser>-<description>.ipynb` (e.g., `0.3-bull-visualize-distributions.ipynb`).
|
1. Follow a naming convention that shows the owner and the order the analysis was done in. We use the format `<step>-<ghuser>-<description>.ipynb` (e.g., `0.3-bull-visualize-distributions.ipynb`).
|
||||||
|
|
||||||
2. Refactor the good parts. Don't write code to do the same task in multiple notebooks. If it's a data preprocessing task, put it in the pipeline at `src/data/make_dataset.py` and load data from `data/interim`. If it's useful utility code, refactor it to `src` and import it into notebooks with a cell like the following. If updating the system path is icky to you, we'd recommend making a Python package (there is a [cookiecutter for that](https://github.com/audreyr/cookiecutter-pypackage) as well) and installing that as an editable package with `pip install -e`.
|
2. Refactor the good parts. Don't write code to do the same task in multiple notebooks. If it's a data preprocessing task, put it in the pipeline at `src/data/make_dataset.py` and load data from `data/interim`. If it's useful utility code, refactor it to `src`.
|
||||||
|
|
||||||
|
Now by default we turn the project into a Python package (see the `setup.py` file). You can import your code and use it in notebooks with a cell like the following:
|
||||||
|
|
||||||
```
|
```
|
||||||
# Load the "autoreload" extension
|
# OPTIONAL: Load the "autoreload" extension so that code can change
|
||||||
%load_ext autoreload
|
%load_ext autoreload
|
||||||
|
|
||||||
# always reload modules marked with "%aimport"
|
# OPTIONAL: always reload modules so that as you change code in src, it gets loaded
|
||||||
%autoreload 1
|
%autoreload 2
|
||||||
|
|
||||||
import os
|
from src.data import make_dataset
|
||||||
import sys
|
|
||||||
|
|
||||||
# add the 'src' directory as one where we can import modules
|
|
||||||
src_dir = os.path.join(os.getcwd(), os.pardir, 'src')
|
|
||||||
sys.path.append(src_dir)
|
|
||||||
|
|
||||||
# import my method from the source code
|
|
||||||
%aimport preprocess.build_features
|
|
||||||
from preprocess.build_features import remove_invalid_data
|
|
||||||
```
|
```
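The import in the cell above only works once the project has been installed into the active environment (for example with `pip install -e .` from the project root). As a minimal sketch, a notebook could check this up front; the helper name `is_importable` is hypothetical and not part of the template:

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if `module_name` can be found on the current import path."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or a module has no spec.
        return False


# Assumption: the project package is named `src`, as in the directory layout.
if not is_importable("src"):
    print("`src` is not importable; run `pip install -e .` from the project root.")
```

If the check fails, installing the project in editable mode and restarting the kernel should normally make `from src.data import make_dataset` resolve.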
|
||||||
|
|
||||||
### Analysis is a DAG
|
### Analysis is a DAG
|
||||||
|
|
|
@@ -31,6 +31,7 @@ Project Organization
|
||||||
├── requirements.txt <- The requirements file for reproducing the analysis environment, e.g.
|
├── requirements.txt <- The requirements file for reproducing the analysis environment, e.g.
|
||||||
│ generated with `pip freeze > requirements.txt`
|
│ generated with `pip freeze > requirements.txt`
|
||||||
│
|
│
|
||||||
|
├── setup.py <- makes project pip installable (pip install -e .) so src can be imported
|
||||||
├── src <- Source code for use in this project.
|
├── src <- Source code for use in this project.
|
||||||
│ ├── __init__.py <- Makes src a Python module
|
│ ├── __init__.py <- Makes src a Python module
|
||||||
│ │
|
│ │
|
||||||
|
|