Getting Started

This guide walks you through setting up and deploying a Kedro project with Dagster using the Kedro‑Dagster plugin. The examples below use the Kedro spaceflights-pandas starter project, but you can use your own Kedro project. If you do, skip step 1.

1. Create a Kedro Project (Optional)

Skip this step if you already have a Kedro project you want to deploy with Dagster.

If you don't already have a Kedro project, you can create one using a starter template:

kedro new --starter=spaceflights-pandas

Follow the prompts to set up your project. Once the project is created, install its dependencies:

cd spaceflights-pandas
pip install -r requirements.txt

2. Installation

Install the plugin with pip:

pip install kedro-dagster

3. Initialize Dagster Integration

Use kedro dagster init to initialize Kedro‑Dagster:

kedro dagster init --env local

This creates:

  • src/definitions.py: Dagster entrypoint file that exposes all translated Kedro objects as Dagster objects.
definitions.py
"""Dagster definitions."""

import dagster as dg

from kedro_dagster import KedroProjectTranslator

translator = KedroProjectTranslator(env="local")
dagster_code_location = translator.to_dagster()

resources = dagster_code_location.named_resources
# The "io_manager" key handles how Kedro MemoryDatasets are handled by Dagster
resources |= {
    "io_manager": dg.fs_io_manager,
}

# Define the default executor for Dagster jobs
default_executor = dg.multiprocess_executor.configured(dict(max_concurrent=2))

defs = dg.Definitions(
    assets=list(dagster_code_location.named_assets.values()),
    resources=resources,
    jobs=list(dagster_code_location.named_jobs.values()),
    schedules=list(dagster_code_location.named_schedules.values()),
    sensors=list(dagster_code_location.named_sensors.values()),
    loggers=dagster_code_location.named_loggers,
    executor=default_executor,
)
  • conf/local/dagster.yml: Dagster configuration file for the local Kedro environment.
dagster.yml
# `dagster dev` configuration
dev:
  log_level: "info"
  log_format: "colored"
  port: "3000"
  host: "127.0.0.1"
  live_data_poll_rate: "2000"

# Dagster schedules configuration
schedules:
  daily: # Schedule name
    cron_schedule: "0 0 * * *" # Schedule parameters

# Dagster executors configuration
executors:
  sequential: # Executor name
    in_process: # Executor parameters

# Dagster jobs configuration
jobs:
  # You may filter pipelines using e.g. `node_names` to define a job
  # data_processing: # Job name
  #   pipeline: # Pipeline filter parameters
  #     pipeline_name: data_processing
  #     node_names:
  #     - preprocess_companies_node
  #     - preprocess_shuttles_node

  default:
    pipeline:
      pipeline_name: __default__
    schedule: daily
    executor: sequential

There's no need to modify the generated definitions.py file to get started, though it can be extended with your own Dagster objects, as sketched below. The file that deserves a closer look is dagster.yml.
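A minimal sketch of such an extension, combining the translated Kedro objects with a hand-written Dagster asset (the weather_snapshot asset is purely illustrative and not part of the plugin):

import dagster as dg

from kedro_dagster import KedroProjectTranslator

translator = KedroProjectTranslator(env="local")
dagster_code_location = translator.to_dagster()


# A hand-written Dagster asset (name and content are purely illustrative)
@dg.asset
def weather_snapshot() -> dict:
    return {"temperature_c": 21.0}


defs = dg.Definitions(
    # Translated Kedro assets plus the hand-written one
    assets=list(dagster_code_location.named_assets.values()) + [weather_snapshot],
    resources=dagster_code_location.named_resources | {"io_manager": dg.fs_io_manager},
    jobs=list(dagster_code_location.named_jobs.values()),
)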

4. Configure Jobs, Executors, and Schedules

The Kedro‑Dagster configuration file dagster.yml includes the following sections:

  • schedules: Used to set up cron schedules for jobs.
  • executors: Used to specify the compute targets for jobs (in-process, multiprocess, Kubernetes, etc.).
  • jobs: Used to describe jobs through the filtering of Kedro pipelines.

You can edit the automatically generated conf/local/dagster.yml to customize jobs, executors, and schedules:

schedules:
  daily: # Schedule name
    cron_schedule: "0 0 * * *" # Schedule parameters

executors:
  sequential: # Executor name
    in_process: # Executor parameters

  multiprocess:
    multiprocess:
      max_concurrent: 2

jobs:
  default: # Job name
    pipeline: # Pipeline filter parameters
      pipeline_name: __default__
    executor: sequential

  parallel_data_processing:
    pipeline:
      pipeline_name: data_processing
      node_names:
      - preprocess_companies_node
      - preprocess_shuttles_node
    schedule: daily
    executor: multiprocess

  data_science:
    pipeline:
      pipeline_name: data_science
    schedule: daily
    executor: sequential

Here, we have added a "parallel_data_processing" job and a "data_science" job to the jobs configuration. The first uses the node_names Kedro pipeline filter argument to build a sub-pipeline of the "data_processing" pipeline from two nodes: "preprocess_companies_node" and "preprocess_shuttles_node". Both jobs run daily via the "daily" schedule, which is based on the cron expression "0 0 * * *". "parallel_data_processing" uses the "multiprocess" executor with max_concurrent set to 2, while "data_science" runs sequentially.
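For intuition, the pipeline block of a job maps onto Kedro's own pipeline filtering. A rough Python equivalent of the "parallel_data_processing" filter, using standard Kedro APIs rather than plugin code (assumes it is run from the project root):

from pathlib import Path

from kedro.framework.project import pipelines
from kedro.framework.startup import bootstrap_project

# Make the project's pipeline registry importable (run from the project root)
bootstrap_project(Path.cwd())

# Equivalent of the node_names filter used by the "parallel_data_processing" job
sub_pipeline = pipelines["data_processing"].filter(
    node_names=["preprocess_companies_node", "preprocess_shuttles_node"],
)
print([node.name for node in sub_pipeline.nodes])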

See the Technical Documentation for more on customizing the Dagster configuration file.

5. Browse the Dagster UI

Use kedro dagster dev to start the Dagster development server:

kedro dagster dev --env local

Note

The dagster.yml file also includes a dev section, which contains the default parameters of the kedro dagster dev command. Check out the API Reference for more info.

The Dagster UI will be available at http://127.0.0.1:3000 by default.

You can inspect assets, jobs, and resources, trigger or automate jobs, and monitor runs from the UI.

Assets

Moving to the "Assets" tab leads to the list of assets generated from the Kedro datasets involved in the filtered pipelines specified in dagster.yml.

List of assets involved in the specified jobs

Asset List.

Each asset is prefixed by the Kedro environment that was passed to the KedroProjectTranslator in definitions.py. If the Kedro dataset was generated from a dataset factory, the namespace that prefixed its name will also appear as a prefix, allowing easy browsing of assets per environment and per namespace.
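
To check these prefixed names without opening the UI, you can reuse the translator API from definitions.py and print the asset keys; a quick sketch:

from kedro_dagster import KedroProjectTranslator

# Re-run the translation done in definitions.py and list the generated asset names
translator = KedroProjectTranslator(env="local")
dagster_code_location = translator.to_dagster()

for asset_name in dagster_code_location.named_assets:
    print(asset_name)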

Clicking on the "Asset lineage" link at the top right of the window leads to the Dagster asset lineage graph, where you can observe the dependencies between assets and check their status and description.

Lineage graph of assets involved in the specified jobs

Asset Lineage Graph.

Resources

Kedro‑Dagster defines one Dagster IO manager per Kedro dataset to handle its saving and loading. As with assets, IO managers are defined per Kedro environment and their names are prefixed accordingly.

List of the resources involved in the specified jobs

Resource list.
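
These per-dataset IO managers are generated for you; the one you are most likely to configure yourself is the default "io_manager" set in definitions.py, which backs Kedro MemoryDatasets. A minimal sketch of swapping it for Dagster's FilesystemIOManager with a custom base directory (the data/dagster path is illustrative):

import dagster as dg

from kedro_dagster import KedroProjectTranslator

translator = KedroProjectTranslator(env="local")
dagster_code_location = translator.to_dagster()

resources = dagster_code_location.named_resources
# Store outputs of Kedro MemoryDatasets under an illustrative custom directory
resources |= {"io_manager": dg.FilesystemIOManager(base_dir="data/dagster")}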

Automation

Moving to the "Automation" tab, you can see a list of the defined schedules and sensors.

List of the schedules and sensors involved in the specified jobs

Schedule and Sensor List.

Jobs

To see the different jobs defined in dagster.yml, click on the "Jobs" tab.

List of the specified jobs

Job List.

Clicking on the "parallel_data_processing" job brings you to a graph representation of the corresponding Dagster-translated Kedro pipeline. before_pipeline_run and after_pipeline_run are included as the first and final nodes of the job graph.

Graph describing the "parallel_data_processing" job

Job Graph.

The job can be run by clicking on the "Launchpad" sub-tab. The Kedro pipeline, its parameters (mapped to Dagster Config), and the Kedro datasets (mapped to IO managers) can be modified before launching a run.

Launchpad for the "parallel_data_processing" job

Job Launchpad.

Running the "parallel_data_processing" job Running the "parallel_data_processing" job

Job Run Timeline.
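
Runs do not have to be launched from the UI. A minimal sketch of triggering the same job from Python, assuming the defs object from definitions.py is importable (e.g. run from src/) and that the job needs no extra run configuration; note that execute_in_process ignores the configured executor and runs everything in the current process:

from definitions import defs  # the Definitions object created by `kedro dagster init`

# Look up the job declared in dagster.yml and execute it in the current process
job = defs.get_job_def("parallel_data_processing")
result = job.execute_in_process()
print(result.success)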

Next Steps