Bigmetadata ETL Docs

All data for CARTO’s Data Observatory is obtained through tasks built by subclassing Bigmetadata ETL classes.

The classes themselves are derived from Luigi tasks.

By performing the ETL using these classes, we gain a few guarantees:

  • Reproducibility and avoidance of duplicate work
  • Generation of high-quality metadata consumable by the Observatory API
  • Scalability across multiple processes
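
The first guarantee follows from the Luigi task model: a task declares its dependencies and its output, and is treated as complete once that output exists, so it is not run again. The snippet below is a minimal plain-Python sketch of that idea only. It does not use Luigi or the Bigmetadata classes, and `Task`, `build`, `Download`, and `Summarize` are hypothetical names chosen for illustration.

```python
import os
import tempfile


class Task:
    """Toy stand-in for a Luigi-style task (hypothetical, not the Bigmetadata API)."""

    def __init__(self, workdir):
        self.workdir = workdir

    def requires(self):
        # Tasks that must be complete before this one runs.
        return []

    def output(self):
        # Path of the file this task produces.
        raise NotImplementedError

    def complete(self):
        # A task counts as done once its output exists.
        return os.path.exists(self.output())

    def run(self):
        raise NotImplementedError


def build(task, log):
    """Run dependencies first, then the task, skipping anything already complete."""
    if task.complete():
        return
    for dep in task.requires():
        build(dep, log)
    task.run()
    log.append(type(task).__name__)


class Download(Task):
    def output(self):
        return os.path.join(self.workdir, "raw.csv")

    def run(self):
        # Stands in for fetching a source file; the rows are made-up sample data.
        with open(self.output(), "w") as f:
            f.write("geoid,population\n01,100\n02,250\n")


class Summarize(Task):
    def requires(self):
        return [Download(self.workdir)]

    def output(self):
        return os.path.join(self.workdir, "total.txt")

    def run(self):
        with open(self.requires()[0].output()) as f:
            rows = f.read().splitlines()[1:]  # skip the header row
        total = sum(int(r.split(",")[1]) for r in rows)
        with open(self.output(), "w") as f:
            f.write(str(total))


workdir = tempfile.mkdtemp()
log = []
build(Summarize(workdir), log)  # runs Download, then Summarize
build(Summarize(workdir), log)  # does nothing: the output already exists
```

After both calls, `log` records each task exactly once. In this sketch, deleting a task's output file is what makes it eligible to run again.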

Contents:

  • Quickstart
    • Requirements
    • Clone & configure
    • Start
    • Run
  • Example ETL/metadata pipeline
    • 1. Import libraries
    • 2. Download the data
    • 3. Import data into PostgreSQL
    • 4. Preprocess data in PostgreSQL
    • 5. Write metadata
    • 6. Populate output table
  • Development
    • Utility Functions
    • Abstract classes
    • Batteries included
    • Running and Re-Running Pieces of the ETL
  • Convenience tasks
    • Makefile
    • Tasks
    • Functions
  • Metadata model
    • Relational Diagram
    • Manually generated entities
    • Autogenerated entities
  • Validating your code
    • Best practices
    • Making sure ETL code works right
    • Making sure metadata works right
    • Regenerate and look at the Catalog
    • Upload to a test CARTO server
  • Testing your data
    • ETL unit tests
    • Metadata integration tests
    • API unit tests
    • Integration tests
    • Diagnosing common issues in integration tests
  • Deploying the Observatory

©2017, CARTO.