From 3978e7d31a4c740dbbe23d7e4f97d50627cf80b6 Mon Sep 17 00:00:00 2001
From: Arturo Moncada-Torres <37126116+arturomoncadatorres@users.noreply.github.com>
Date: Sat, 6 Mar 2021 19:11:46 +0100
Subject: [PATCH] Update documentation (#243)

Add explicit definition of DAG (as requested in issue #69)
---
 docs/docs/index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/docs/index.md b/docs/docs/index.md
index 2c0110f..1733e83 100644
--- a/docs/docs/index.md
+++ b/docs/docs/index.md
@@ -155,7 +155,7 @@ Since notebooks are challenging objects for source control (e.g., diffs of the `
 from src.data import make_dataset
 ```
 
-### Analysis is a DAG
+### Analysis is a directed acyclic graph ([DAG](https://en.wikipedia.org/wiki/Directed_acyclic_graph))
 
 Often in an analysis you have long-running steps that preprocess data or train models. If these steps have been run already (and you have stored the output somewhere like the `data/interim` directory), you don't want to wait to rerun them every time. We prefer [`make`](https://www.gnu.org/software/make/) for managing steps that depend on each other, especially the long-running ones. Make is a common tool on Unix-based platforms (and [is available for Windows]()). Following the [`make` documentation](https://www.gnu.org/software/make/), [Makefile conventions](https://www.gnu.org/prep/standards/html_node/Makefile-Conventions.html#Makefile-Conventions), and [portability guide](http://www.gnu.org/savannah-checkouts/gnu/autoconf/manual/autoconf-2.69/html_node/Portable-Make.html#Portable-Make) will help ensure your Makefiles work effectively across systems. Here are [some](http://zmjones.com/make/) [examples](http://blog.kaggle.com/2012/10/15/make-for-data-scientists/) to [get started](https://web.archive.org/web/20150206054212/http://www.bioinformaticszen.com/post/decomplected-workflows-makefiles/).
 A number of data folks use `make` as their tool of choice, including [Mike Bostock](https://bost.ocks.org/mike/make/).
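The section describes the rule that `make` applies to a pipeline of dependent steps. Here is a minimal Python sketch of that rule, assuming every step is file-based. Steps run in topological order. A step reruns only when one of its outputs is missing or is older than one of its inputs, which is make's timestamp rule. `Step`, `is_stale`, and `run_pipeline` are hypothetical names invented for this illustration. They are not project code and do not replace a Makefile. `graphlib` requires Python 3.9+.

```python
"""Make-like runner: steps form a DAG and a step reruns only when stale.

Hypothetical sketch; Step, is_stale and run_pipeline are illustrative names.
"""
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from pathlib import Path
from typing import Callable


@dataclass
class Step:
    name: str
    inputs: list[Path]
    outputs: list[Path]
    action: Callable[[], None]


def is_stale(step: Step) -> bool:
    """Rebuild if any output is missing or older than the newest input."""
    if not step.outputs:
        return True  # no declared outputs: always run (like a phony target)
    if not all(p.exists() for p in step.outputs):
        return True
    newest_input = max((p.stat().st_mtime for p in step.inputs), default=0.0)
    oldest_output = min(p.stat().st_mtime for p in step.outputs)
    return newest_input > oldest_output


def run_pipeline(steps: list[Step]) -> list[str]:
    """Run steps in dependency order; return the names of steps that ran."""
    producer = {out: s.name for s in steps for out in s.outputs}
    by_name = {s.name: s for s in steps}
    # A step depends on whichever step produces one of its inputs.
    graph = {
        s.name: {producer[i] for i in s.inputs if i in producer}
        for s in steps
    }
    ran = []
    # static_order() raises graphlib.CycleError if the graph has a cycle.
    for name in TopologicalSorter(graph).static_order():
        step = by_name[name]
        if is_stale(step):
            step.action()
            ran.append(name)
    return ran
```

Suppose a "clean" step writes an interim file and a "train" step reads it. On the first call both steps run. A second call runs nothing. Touching the raw input makes both steps rerun. If two steps each consume the other's output, `TopologicalSorter` raises `CycleError` before any step runs. That is why the dependency graph has to be acyclic.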