From c6b256aa854358327565e93e13e1f7d79407a15d Mon Sep 17 00:00:00 2001
From: Peter Bull
Date: Mon, 25 Apr 2016 17:29:25 -0400
Subject: [PATCH] Typos

---
 docs/docs/index.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/docs/index.md b/docs/docs/index.md
index 888d627..198182e 100644
--- a/docs/docs/index.md
+++ b/docs/docs/index.md
@@ -142,7 +142,7 @@ from preprocess.build_features import remove_invalid_data
 
 ### Analysis is a DAG
 
-Often in an analysis you have long-running steps that preprocesses data or trains models. If these steps have been run already (and you have stored the output somewhere like the `data/interim` directory), you don't want to wait to rerun them every time. We prefer [`make`](https://www.gnu.org/software/make/) for managing steps that depend on each other, especially the long-running ones. Make is a common tool on unix platforms (and [is available for Windows]()). Following the [`make` documentation](https://www.gnu.org/software/make/), [Makefile conventions](https://www.gnu.org/prep/standards/html_node/Makefile-Conventions.html#Makefile-Conventions), and [portability guide](http://www.gnu.org/savannah-checkouts/gnu/autoconf/manual/autoconf-2.69/html_node/Portable-Make.html#Portable-Make) will help ensure your Makefiles work effectively across systems. Here are [some](http://zmjones.com/make/) [examples](http://blog.kaggle.com/2012/10/15/make-for-data-scientists/) to [get started](https://web.archive.org/web/20150206054212/http://www.bioinformaticszen.com/post/decomplected-workflows-makefiles/).
+Often in an analysis you have long-running steps that preprocesses data or trains models. If these steps have been run already (and you have stored the output somewhere like the `data/interim` directory), you don't want to wait to rerun them every time. We prefer [`make`](https://www.gnu.org/software/make/) for managing steps that depend on each other, especially the long-running ones. Make is a common tool on unix platforms (and [is available for Windows]()). Following the [`make` documentation](https://www.gnu.org/software/make/), [Makefile conventions](https://www.gnu.org/prep/standards/html_node/Makefile-Conventions.html#Makefile-Conventions), and [portability guide](http://www.gnu.org/savannah-checkouts/gnu/autoconf/manual/autoconf-2.69/html_node/Portable-Make.html#Portable-Make) will help ensure your Makefiles work effectively across systems. Here are [some](http://zmjones.com/make/) [examples](http://blog.kaggle.com/2012/10/15/make-for-data-scientists/) to [get started](https://web.archive.org/web/20150206054212/http://www.bioinformaticszen.com/post/decomplected-workflows-makefiles/).
 A number of data folks use `make` as their tool of choice, including [Mike Bostock](https://bost.ocks.org/mike/make/).
 
 There are other tools for managing DAGs that are written in Python instead of a DSL (e.g., [Paver](http://paver.github.io/paver/#), [Luigi](http://luigi.readthedocs.org/en/stable/index.html), [Airflow](http://pythonhosted.org/airflow/cli.html), [Snakemake](https://bitbucket.org/snakemake/snakemake/wiki/Home), [Ruffus](http://www.ruffus.org.uk/), or [Joblib](https://pythonhosted.org/joblib/memory.html)). Feel free to use these if they are more appropriate for your analysis.
@@ -154,8 +154,8 @@ One effective approach to this is use [virtualenv](https://virtualenv.pypa.io/en
 
  1. Run `mkvirtualenv` when creating a new project
  2. `pip install` the packages that your analysis needs
- 3. Run `pip freeze >> requirements.txt` to pin the exact package versions used to recreate the analysis
- 4. If you find you need to install another package, run `pip freeze >> requirements.txt` again and commit the changes to version control.
+ 3. Run `pip freeze > requirements.txt` to pin the exact package versions used to recreate the analysis
+ 4. If you find you need to install another package, run `pip freeze > requirements.txt` again and commit the changes to version control.
 
 If you have more complex requirements for recreating your environment, consider a virtual machine based approach such as [Docker](https://www.docker.com/) or [Vagrant](https://www.vagrantup.com/). Both of these tools use text-based formats (Dockerfile and Vagrantfile respectively) you can easily add to source control to describe how to create a virtual machine with the requirements you need.
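
Note on the `>>` to `>` change in the workflow steps: in a shell, `>` truncates the target file before writing, while `>>` appends to it. Rerunning `pip freeze >> requirements.txt` after installing another package therefore stacks a second full listing under the first, leaving duplicate pins. The sketch below imitates the two redirections with Python file modes. It is only an illustration: the package pins are made up, and `write_pins` and `simulate` are hypothetical helpers, not part of pip.

```python
import tempfile
from pathlib import Path
from typing import List


def write_pins(path: Path, pins: List[str], append: bool) -> None:
    """Write one pin per line; append=True mimics `>>`, append=False mimics `>`."""
    mode = "a" if append else "w"
    with path.open(mode) as fh:
        for pin in pins:
            fh.write(pin + "\n")


def simulate(append: bool) -> List[str]:
    """Imitate two `pip freeze` runs, the second after installing one more package."""
    # Hypothetical pins, for illustration only.
    first_freeze = ["numpy==1.11.0", "pandas==0.18.0"]
    second_freeze = first_freeze + ["scipy==0.17.0"]
    with tempfile.TemporaryDirectory() as tmp:
        req = Path(tmp) / "requirements.txt"
        write_pins(req, first_freeze, append)
        write_pins(req, second_freeze, append)
        return req.read_text().splitlines()


appended = simulate(append=True)      # like `pip freeze >> requirements.txt`
overwritten = simulate(append=False)  # like `pip freeze > requirements.txt`

print(len(appended), len(overwritten))  # prints "5 3"
```

With appending, the file ends up with five lines and each of the first two pins appears twice. With overwriting, it holds exactly the three pins from the latest freeze, which is the behavior steps 3 and 4 rely on.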