# Cookiecutter Data Science
_A logical, reasonably standardized, but flexible project structure for doing and sharing data science work._
#### [Project homepage](http://drivendata.github.io/cookiecutter-data-science/)
### Requirements to use the cookiecutter template:
-----------
- Python 2.7 or 3.5
- [Cookiecutter Python package](http://cookiecutter.readthedocs.org/en/latest/installation.html) >= 1.4.0: This can be installed with pip or conda, depending on how you manage your Python packages:
``` bash
$ pip install cookiecutter
```
or
``` bash
$ conda config --add channels conda-forge
$ conda install cookiecutter
```
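
Either route installs the `cookiecutter` command-line tool. To confirm that the installed version satisfies the `>= 1.4.0` requirement, you can run:

``` bash
$ cookiecutter --version
```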
### To start a new project, run:
------------
cookiecutter https://github.com/drivendata/cookiecutter-data-science

[![asciicast](https://asciinema.org/a/9bgl5qh17wlop4xyxu9n9wr02.png)](https://asciinema.org/a/9bgl5qh17wlop4xyxu9n9wr02)
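
Running the command interactively walks you through a short series of prompts (such as the project name and author). If you'd rather accept the template defaults and fill in details later, cookiecutter's `--no-input` flag skips the prompts. A minimal sketch:

``` bash
# Generate a project using the template's default answers (edit the generated
# files afterwards to set the project name, author, license, etc.).
$ cookiecutter --no-input https://github.com/drivendata/cookiecutter-data-science
```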
### The resulting directory structure
------------
The directory structure of your new project looks like this:
```
├── LICENSE
├── Makefile           <- Makefile with commands like `make data` or `make train`
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── docs               <- A default Sphinx project; see sphinx-doc.org for details
│
├── models             <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
│
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`
│
├── src                <- Source code for use in this project.
│   ├── __init__.py    <- Makes src a Python module
│   │
│   ├── data           <- Scripts to download or generate data
│   │   └── make_dataset.py
│   │
│   ├── features       <- Scripts to turn raw data into features for modeling
│   │   └── build_features.py
│   │
│   ├── models         <- Scripts to train models and then use trained models to make
│   │   │                 predictions
│   │   ├── predict_model.py
│   │   └── train_model.py
│   │
│   └── visualization  <- Scripts to create exploratory and results oriented visualizations
│       └── visualize.py
│
└── tox.ini            <- tox file with settings for running tox; see tox.testrun.org
```
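
Once the project is generated, routine tasks typically run through the included Makefile. A rough sketch of a first session, assuming the generated Makefile keeps its default targets (e.g. `requirements` and `data`):

``` bash
# From the root of the newly created project:
$ make requirements   # install the Python dependencies from requirements.txt
$ make data           # run the data pipeline (by default, src/data/make_dataset.py)
```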
## Contributing
We welcome contributions! [See the docs for guidelines](https://drivendata.github.io/cookiecutter-data-science/#contributing).
### Installing development requirements
------------
pip install -r requirements.txt
### Running the tests
------------
py.test tests