cookiecutter-data-science

Peter Bull f24a8f77c7 Initial docs commit		2016-04-23 12:19:28 -04:00

An opinionated, but not-afraid-to-be-wrong project template for data science projects. Pull requests welcome. Debate encouraged.

Requirements to create project:

- Python 2.7 or 3.5
- cookiecutter Python package >= 1.4.0: pip install cookiecutter
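
As a rough pre-flight check of the requirements above, the following sketch compares version numbers. The helper names `parse_version`, `meets_minimum`, and `python_is_supported` are hypothetical, the parser only handles plain dotted numeric versions such as 1.4.0, and treating Python releases later than 3.5 as supported is an assumption.

```python
import sys

MIN_COOKIECUTTER = "1.4.0"  # minimum version stated in the requirements above


def parse_version(text):
    """Turn a plain dotted version such as '1.4.0' into a 3-part tuple of ints."""
    parts = [int(part) for part in text.strip().split(".")]
    while len(parts) < 3:  # pad '1.4' to (1, 4, 0) so comparisons behave
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, minimum=MIN_COOKIECUTTER):
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def python_is_supported(version_info=sys.version_info):
    """Return True for Python 2.7, or for 3.5 and later (the latter is assumed)."""
    major, minor = version_info[0], version_info[1]
    return (major, minor) == (2, 7) or (major, minor) >= (3, 5)
```

For example, `meets_minimum("1.3.9")` is false, while `meets_minimum("1.4.0")` is true.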

To start a new project:

cookiecutter https://github.com/drivendata/cookiecutter-data-science

Data

**By default, the data folder is included in the .gitignore file.** If you have a small amount of data that rarely changes, you may want to include the data in the repository. GitHub currently warns if files are over 50MB and rejects files over 100MB. Other options for storing large data include AWS S3 with a syncing tool (e.g., s3cmd), Git Large File Storage, git-annex, and dat.
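
Before deciding to commit data, it can help to see which files would cross the size limits mentioned above. The sketch below is one way to do that, not part of the template: the function names are hypothetical, the default folder name `data` follows the text, and reading "50MB" and "100MB" as multiples of 1024 * 1024 bytes is an assumption.

```python
import os

WARN_BYTES = 50 * 1024 * 1024     # assumed reading of the 50MB warning limit
REJECT_BYTES = 100 * 1024 * 1024  # assumed reading of the 100MB rejection limit


def classify_size(size_bytes):
    """Label a file size as 'ok', 'warning', or 'rejected'."""
    if size_bytes > REJECT_BYTES:
        return "rejected"
    if size_bytes > WARN_BYTES:
        return "warning"
    return "ok"


def find_large_files(root="data"):
    """Yield (path, size, status) for every file under root above the warning limit."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            status = classify_size(size)
            if status != "ok":
                yield path, size, status


if __name__ == "__main__":
    for path, size, status in find_large_files():
        print("{}: {:.1f} MB ({})".format(path, size / (1024.0 * 1024.0), status))
```

Files reported here are candidates for one of the external storage options rather than the repository itself.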

The preferred workflow if data is not in the repository is to have a make command, make data, that downloads or creates the relevant datasets.
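
A minimal sketch of the kind of script a make data command might run is shown below. It is an illustration under stated assumptions rather than the template's own implementation: `fetch_dataset` is a hypothetical helper, the URL and destination path are supplied by the caller, and the code targets Python 3 only because it relies on `urllib.request`.

```python
import os
import shutil
from urllib.request import urlopen  # Python 3 only


def fetch_dataset(url, dest_path):
    """Download url to dest_path unless it already exists.

    Returns True if a download happened, False if the file was already there.
    """
    if os.path.exists(dest_path):
        return False

    dest_dir = os.path.dirname(dest_path)
    if dest_dir and not os.path.isdir(dest_dir):
        os.makedirs(dest_dir)

    # Write to a temporary name first so an interrupted download
    # is never mistaken for a complete dataset on the next run.
    tmp_path = dest_path + ".part"
    with urlopen(url) as response, open(tmp_path, "wb") as out:
        shutil.copyfileobj(response, out)
    os.replace(tmp_path, dest_path)
    return True
```

Skipping files that already exist keeps repeated runs cheap, so the command can be run again safely after a fresh clone or a partial download.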