Development
===========

Writing ETL tasks is repetitive.  The :py:mod:`tasks.util` module contains a
number of functions and classes that make life easier through reuse.

.. contents::
   :local:
   :depth: 2

Utility Functions
-----------------

These functions are very frequently used within the methods of a new ETL task.

.. autofunction:: tasks.meta.current_session

.. autofunction:: tasks.util.shell

.. autofunction:: tasks.util.underscore_slugify

.. autofunction:: tasks.util.classpath

.. _abstract-classes:

Abstract classes
----------------

These are the building blocks of the ETL.  A new process should almost always
subclass one of them.

.. autoclass:: tasks.base_tasks.TableTask
   :members:
   :show-inheritance:

.. autoclass:: tasks.base_tasks.ColumnsTask
   :members:
   :show-inheritance:

.. autoclass:: tasks.base_tasks.TagsTask
   :members:
   :show-inheritance:

Batteries included
------------------

Data comes in many flavors, but sometimes it comes in the same flavor over and
over again.  These tasks are meant to take care of the most repetitive aspects.

.. autoclass:: tasks.base_tasks.TempTableTask
   :members:
   :show-inheritance:

.. autoclass:: tasks.base_tasks.DownloadUnzipTask
   :members:
   :show-inheritance:

.. autoclass:: tasks.base_tasks.Shp2TempTableTask
   :members:
   :show-inheritance:

.. autoclass:: tasks.base_tasks.CSV2TempTableTask
   :members:
   :show-inheritance:

.. autoclass:: tasks.base_tasks.Carto2TempTableTask
   :members:
   :show-inheritance:

Running and Re-Running Pieces of the ETL
----------------------------------------

When doing local development, it's advisable to run small pieces of the ETL
locally to make sure everything works correctly.  You can use the
``make -- run`` helper, documented in :ref:`run-any-task`.  There are several
ways to re-run pieces of the ETL, depending on the task.  They are described
below.

Using ``--force`` during development
************************************

When developing with :ref:`abstract-classes` that offer a ``force`` parameter,
you can use it to re-run a task that has already been run, ignoring and
overwriting all output it has already created.  For example, suppose you have
a :py:class:`tasks.base_tasks.TempTableTask` that you've modified in the
course of development and need to re-run:

.. code:: python

    from tasks.base_tasks import TempTableTask
    from tasks.meta import current_session

    class MyTempTable(TempTableTask):

        def run(self):
            session = current_session()
            # The output target supplies the name of the table to create.
            session.execute('''
               CREATE TABLE {} AS SELECT 'foo' AS mycol;
            '''.format(self.output().table))

Running ``make -- run path.to.module MyTempTable`` will only execute the task
once.  Later invocations find the existing output and skip the task, even
after you change the ``run`` method.

However, running ``make -- run path.to.module MyTempTable --force`` will force
the task to be run again, dropping and re-creating the output table.

Deleting byproducts to force a re-run of parts of ETL
*****************************************************

In some cases, you may want to re-run a :py:class:`luigi.Task` that does not
have a ``force`` parameter.  In such cases, you should look at its ``output``
method and delete whatever files or database tables it created.

Utility classes will put their file byproducts in the ``tmp`` folder, inside
a folder named after the module name.  They will put database byproducts into
a schema that is named after the module name, too.
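
As a rough sketch of the file cleanup, the helper below removes a module's
folder under ``tmp``.  The function name ``remove_file_byproducts`` and the
exact folder layout are assumptions made for illustration, so check the task's
``output`` method for the real paths before deleting anything.

.. code:: python

    import os
    import shutil


    def remove_file_byproducts(module_name, tmp_root='tmp'):
        '''
        Delete the folder of file byproducts for ``module_name``.

        Hypothetical helper: assumes byproducts live in
        ``<tmp_root>/<module_name>``.  Returns True if a folder was removed.
        '''
        path = os.path.join(tmp_root, module_name)
        if not os.path.isdir(path):
            # Nothing to delete, so the task has no file output to clear.
            return False
        shutil.rmtree(path)
        return True

Database byproducts can be cleared in the same spirit by dropping the
relevant table from the schema named after the module.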

Update the ETL & metadata through ``version``
*********************************************

When you make changes and improvements, you can increment the value returned
by the ``version`` method of :py:class:`tasks.base_tasks.TableTask`,
:py:class:`tasks.base_tasks.ColumnsTask` and
:py:class:`tasks.base_tasks.TagsTask` to force the task to run again.
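
The idea can be illustrated with a minimal, self-contained sketch.  The
``VersionedTask`` class is a hypothetical stand-in, not one of the real
:py:mod:`tasks.base_tasks` classes, and its ``complete`` check only
approximates how a version bump marks a task as needing another run.

.. code:: python

    class VersionedTask(object):
        '''Hypothetical stand-in for a task with a ``version`` method.'''

        def version(self):
            return 1

        def complete(self, stored_version):
            # The task counts as done only if the version recorded by the
            # last run is at least the version the code currently declares.
            return (stored_version is not None
                    and stored_version >= self.version())


    class MyTable(VersionedTask):

        def version(self):
            # Bumped from 1 to 2 after changing the task's logic.
            return 2

With a stored version of ``1``, ``MyTable().complete(1)`` is false in this
sketch, so the task would be run again.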