Technical Documentation
This section provides an in-depth look at the architecture, configuration, and core concepts behind Kedro-Dagster. Here you'll find details on how Kedro projects are mapped to Dagster constructs, how to configure orchestration, and how to customize the integration for advanced use cases.
Danger
This documentation section is a work in progress. The translation configuration and logic are not yet fully documented here. Please check back later for a more complete guide!
Project Configuration
Kedro-Dagster expects a standard Kedro project structure. The main configuration file for the Dagster integration is `dagster.yml`, located in your Kedro project's `conf/<ENV_NAME>/` directory.
dagster.yml
This YAML file defines jobs, executors, and schedules for your project.
Example
```yaml
schedules:
  my_job_schedule: # Name of the schedule
    cron_schedule: "0 0 * * *" # Parameters of the schedule

executors:
  my_executor: # Name of the executor
    multiprocess: # Parameters of the executor
      max_concurrent: 2

jobs:
  my_job: # Name of the job
    pipeline: # Parameters of its corresponding pipeline
      pipeline_name: __default__
      node_namespace: my_namespace
    executor: my_executor
    schedule: my_job_schedule
```
- `jobs`: Map Kedro pipelines to Dagster jobs, with optional filtering.
- `executors`: Define how jobs are executed (in-process, multiprocess, Kubernetes, etc.) by choosing from the executors implemented in Dagster.
- `schedules`: Set up cron-based or custom schedules for jobs.
Customizing Schedules
You can define multiple schedules for your jobs using cron syntax. See the Dagster scheduling documentation and the API Reference for more details.
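For instance, here is a minimal sketch defining two cron-based schedules, following the schema shown above (the schedule names are illustrative):

```yaml
schedules:
  my_daily_schedule:   # runs every day at midnight
    cron_schedule: "0 0 * * *"
  my_hourly_schedule:  # runs at the start of every hour
    cron_schedule: "0 * * * *"
```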
Customizing Executors
Kedro-Dagster supports several executor types for running your jobs, such as in-process, multiprocess, Dask, Docker, Celery, and Kubernetes. You can customize executor options in your `dagster.yml` file under the `executors` section.
For each available Dagster executor, there is a corresponding Pydantic configuration model documented in the API Reference.
Example: Custom Multiprocess Executor
We can select `multiprocess` as the executor type, corresponding to the multiprocess Dagster executor, and configure it according to `MultiprocessExecutorOptions`.
```yaml
executors:
  my_multiprocess_executor:
    multiprocess:
      max_concurrent: 4
```
Example: Custom Docker Executor
Similarly, we can configure a Dagster Docker executor with the available parameters defined in `DockerExecutorOptions`.
```yaml
executors:
  my_docker_executor:
    docker_executor:
      image: my-custom-image:latest
      registry: "my_registry.com"
      network: "my_network"
      networks: ["my_network_1", "my_network_2"]
      container_kwargs:
        volumes:
          - "/host/path:/container/path"
        environment:
          - "ENV_VAR=value"
```
Note
The `docker_executor` requires the `dagster-docker` package.
Customizing Jobs
You can filter which nodes, tags, or inputs/outputs are included in each job. See the Kedro pipeline documentation for more on pipelines and filtering. The accepted pipeline parameters are documented in the associated Pydantic model, `PipelineOptions`.
Each job can also be assigned a schedule and/or an executor by name, provided they were previously defined in the configuration file.
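As a sketch, assuming the pipeline filter fields mirror Kedro's standard pipeline filtering arguments (the pipeline name, `tags` value, and the executor and schedule names below are illustrative):

```yaml
jobs:
  my_filtered_job:
    pipeline:
      pipeline_name: data_processing  # hypothetical registered pipeline
      tags: ["daily"]                 # assumed filter: keep only nodes tagged "daily"
    executor: my_executor             # executor defined under `executors`
    schedule: my_job_schedule         # schedule defined under `schedules`
```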
definitions.py
The `definitions.py` file is auto-generated by the plugin and serves as the main entry point for Dagster to discover all translated Kedro objects. It contains the Dagster `Definitions` object, which registers all jobs, assets, resources, schedules, and sensors derived from your Kedro project.
In most cases, you should not manually edit `definitions.py`; instead, update your Kedro project or `dagster.yml` configuration.
Kedro-Dagster Concept Mapping
Kedro-Dagster translates core Kedro concepts into their Dagster equivalents. Understanding this mapping helps you reason about how your Kedro project appears and behaves in Dagster.
| Kedro Concept | Dagster Concept | Description |
|---|---|---|
| Node | Op, Asset | Each Kedro node becomes a Dagster op; nodes that return outputs are also mapped to assets. Node parameters are passed as config. |
| Pipeline | Job | Each Kedro pipeline is translated into a Dagster job. Jobs can be filtered, scheduled, and assigned an executor. |
| Dataset | Asset, IO Manager | Each dataset in the Kedro Data Catalog becomes a Dagster asset managed by a dedicated IO manager. |
| Hooks | Hooks, Sensors | Kedro hooks are executed at the appropriate points in the Dagster job lifecycle. |
| Parameters | Config, Resources | Kedro parameters are passed as Dagster config. |
| Logging | Logger | Kedro logging is integrated with Dagster's logging system. |
Catalog
Kedro-Dagster translates Kedro datasets into Dagster assets and IO managers. This allows you to use Kedro's Data Catalog with Dagster's asset materialization and IO management features.
For the Kedro pipelines specified in `dagster.yml`, the following Dagster objects are defined:
- External assets: Input datasets of the pipelines are registered as Dagster external assets.
- Assets: Output datasets of the pipelines are defined as Dagster assets.
- IO Managers: A dedicated Dagster IO manager is created for each dataset involved in the deployed pipelines, wrapping the dataset's save and load functions.
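For illustration, a standard Kedro `catalog.yml` entry such as the following (the dataset name and filepath are hypothetical) would be surfaced in Dagster as an asset backed by its own IO manager:

```yaml
companies:
  type: pandas.CSVDataset
  filepath: data/01_raw/companies.csv
```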
See the API reference for `CatalogTranslator` for more details.
Node
Kedro nodes are translated into Dagster ops and assets. Each node becomes a Dagster op, and nodes that return outputs are additionally mapped to Dagster multi-assets.
For the Kedro pipelines specified in `dagster.yml`, the following Dagster objects are defined:
- Ops: Each Kedro node within the pipelines is mapped to a Dagster op.
- Assets: Kedro nodes that return output datasets are registered as Dagster multi-assets.
- Parameters: Node parameters are passed as Dagster config so that they can be modified from the Dagster run launchpad.
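As an illustration, a Kedro `parameters.yml` entry like the following (names and values are hypothetical) would appear as editable run config for the ops that consume it:

```yaml
model_options:
  test_size: 0.2
  random_state: 3
```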
See the API reference for `NodeTranslator` for more details.
Pipeline
Kedro pipelines are translated into Dagster jobs. Each job can be filtered, scheduled, and assigned an executor via configuration.
- Jobs: Each pipeline is mapped to a Dagster job.
- Filtering: Jobs can be defined granularly from Kedro pipelines by filtering their nodes, namespaces, tags, and inputs/outputs.
See the API reference for `PipelineTranslator` for more details.
Hook
Kedro-Dagster preserves all Kedro hooks in the Dagster context. Hooks are executed at the appropriate points in the Dagster job lifecycle. Catalog hooks are called in the `handle_output` and `load_input` methods of each Dagster IO manager. Node hooks are plugged into the appropriate Dagster ops. Context hooks are called, along with the `before_pipeline_run` pipeline hook, within a Dagster op that runs at the beginning of each job, and the `after_pipeline_run` hook is called in a Dagster op that runs at the end of each job. Finally, the `on_pipeline_error` pipeline hook is embedded in a dedicated Dagster sensor triggered by a run failure.
Next Steps
- Getting Started: Follow the step-by-step tutorial to set up Kedro-Dagster in your project.
- Example: See the Example Documentation for a real-world use case.
- API Reference: Explore the API Reference for details on available classes, functions, and configuration options.
- External Documentation: For more on Kedro concepts, see the Kedro documentation. For Dagster concepts, see the Dagster documentation.