Argo Workflows Reference

About Argo Workflows Reference

The Argo Workflows Reference is a searchable cheat sheet covering the complete syntax of Argo Workflows, the Kubernetes-native workflow engine, across five categories:

  • Workflow: the Workflow CRD with entrypoint and template structure
  • Templates: Container, Script with inline Python/shell source, DAG for dependency-based task graphs, Steps for sequential/parallel stages, and Resource for creating Kubernetes objects
  • Artifacts: S3 artifact outputs, parameter passing between tasks with {{inputs.parameters.name}} syntax, and volume mounts with PVCs
  • Execution control: retry strategy with backoff, task timeout, exit handler via onExit, and suspend/resume for human-in-the-loop approvals
  • Schedule: CronWorkflow with cron schedule strings, and WorkflowTemplate for reusable template libraries

This reference is designed for MLOps engineers, data engineers, platform teams, and CI/CD pipeline developers who orchestrate complex multi-step workflows on Kubernetes using Argo Workflows. Whether you are building a DAG-based ML training pipeline where a preprocessing step must complete before parallel training runs, or setting up a CronWorkflow for nightly batch jobs, this cheat sheet provides the exact YAML spec you need without having to consult the full documentation.

Argo Workflows' template-based model requires you to choose the right template type for each task: Container for running Docker images, Script for inline code, DAG for complex dependency graphs, Steps for ordered phases, and Resource for managing Kubernetes objects like ConfigMaps and Jobs. The five categories mirror these decisions. All entries include copy-ready YAML examples. The reference runs entirely in your browser with no sign-up required and full dark mode support.
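To ground the terminology used throughout, here is a minimal sketch of the Workflow CRD with a single container template as its entrypoint. The names (`hello-`, `hello`) and the `alpine` image are illustrative choices, not taken from the reference itself:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-       # Argo appends a random suffix per run
spec:
  entrypoint: hello          # which template to execute first
  templates:
    - name: hello
      container:
        image: alpine:3.19
        command: [echo]
        args: ["hello from Argo"]
```

Every other construct in the reference (DAG, Steps, retries, artifacts) slots into this same `spec.templates` list.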

Key Features

  • Workflow CRD: entrypoint, generateName, and the top-level template structure for defining complete pipelines
  • Container Template: run arbitrary Docker images with custom commands for build, test, and deploy steps
  • Script Template: inline Python or shell scripts with image selection for lightweight data processing tasks
  • DAG Template: define task dependencies with "dependencies" arrays for complex parallel/sequential pipelines
  • Steps Template: sequential phases with parallel tasks within each phase using nested-list (double-dash) YAML syntax
  • Artifact passing: S3/GCS artifact outputs with {{workflow.name}} key templates and artifact input references
  • Parameter passing: task arguments with {{inputs.parameters.name}} references for dynamic workflow configuration
  • Execution control: retry with limit/backoff, timeout/activeDeadlineSeconds, exit handler with onExit, and suspend for approvals

Frequently Asked Questions

What is the difference between a DAG template and a Steps template in Argo Workflows?

A DAG template defines tasks with explicit dependency arrays ("dependencies: [A, B]"), creating a directed acyclic graph. Argo automatically parallelizes tasks whose dependencies are satisfied. A Steps template defines sequential phases using a list of lists — tasks within each inner list run in parallel, and the inner lists themselves run one after another. Use DAG for complex dependency graphs and Steps for phased pipeline stages.
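The contrast can be sketched side by side. Both templates below reuse a hypothetical `work` template (not defined here) as the task body:

```yaml
templates:
  # DAG: B and C run in parallel once A finishes; D waits for both.
  - name: dag-example
    dag:
      tasks:
        - name: A
          template: work
        - name: B
          dependencies: [A]
          template: work
        - name: C
          dependencies: [A]
          template: work
        - name: D
          dependencies: [B, C]
          template: work

  # Steps: each "- -" starts a phase; items in a phase run in parallel,
  # phases run sequentially.
  - name: steps-example
    steps:
      - - name: phase1-only
          template: work
      - - name: phase2-a     # phase2-a and phase2-b run in parallel
          template: work
        - name: phase2-b
          template: work
```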

How do I pass data between tasks in Argo Workflows?

There are two mechanisms. Parameters (for small values like strings and numbers): the producing task declares an output parameter, typically read from a file via valueFrom.path, and a downstream task references it with {{tasks.TASK.outputs.parameters.name}} (or {{steps.STEP.outputs.parameters.name}} in a Steps template), then consumes it as {{inputs.parameters.name}}. Artifacts (for files and large data) are defined in the outputs section with a path on the container filesystem and a storage backend like S3; the next task references them in its inputs.artifacts section.
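A sketch of both mechanisms in one DAG, with illustrative template names (`produce`, `consume`) and file paths:

```yaml
templates:
  - name: produce
    container:
      image: alpine:3.19
      command: [sh, -c]
      args: ["echo 42 > /tmp/count.txt; echo data > /tmp/out.bin"]
    outputs:
      parameters:
        - name: count
          valueFrom:
            path: /tmp/count.txt   # small value read from a file
      artifacts:
        - name: result
          path: /tmp/out.bin       # file uploaded to the artifact store
  - name: consume
    inputs:
      parameters:
        - name: count
      artifacts:
        - name: result
          path: /tmp/in.bin        # artifact downloaded here
    container:
      image: alpine:3.19
      command: [sh, -c]
      args: ["echo got {{inputs.parameters.count}}; cat /tmp/in.bin"]
  - name: pipeline
    dag:
      tasks:
        - name: make
          template: produce
        - name: use
          dependencies: [make]
          template: consume
          arguments:
            parameters:
              - name: count
                value: "{{tasks.make.outputs.parameters.count}}"
            artifacts:
              - name: result
                from: "{{tasks.make.outputs.artifacts.result}}"
```

Note the asymmetry: parameters are wired with `value:` and artifacts with `from:`.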

What is a WorkflowTemplate and how is it different from a Workflow?

A WorkflowTemplate is a reusable library of named templates stored in the cluster as a Kubernetes CRD. A Workflow is an actual execution instance. You reference a WorkflowTemplate's templates from other Workflows or WorkflowTemplates using "templateRef: {name: my-tmpl, template: common-build}", enabling DRY (Don't Repeat Yourself) workflow design.
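A minimal sketch of the pattern, reusing the `my-tmpl` / `common-build` names from the answer above (the image and command are illustrative):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: my-tmpl              # stored in the cluster, reusable by many Workflows
spec:
  templates:
    - name: common-build
      container:
        image: alpine:3.19
        command: [echo, "building"]
---
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: uses-library-
spec:
  entrypoint: main
  templates:
    - name: main
      steps:
        - - name: build
            templateRef:      # points at the library template above
              name: my-tmpl
              template: common-build
```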

How does the retry strategy work in Argo Workflows?

Add a "retryStrategy" block to any template with "limit" (max retries), "retryPolicy" (OnFailure, Always, OnError), and optional "backoff" settings (duration, factor, maxDuration for exponential backoff). When a task fails and retries remain, Argo creates a new pod for the retry attempt. This is useful for flaky tasks like external API calls or network-dependent jobs.

What is a CronWorkflow and when should I use it?

A CronWorkflow is a Kubernetes CRD that schedules Argo Workflows using a standard cron expression (e.g., "0 */6 * * *" for every 6 hours). It is ideal for batch processing, data pipelines, nightly report generation, and any periodic automation task. The workflowSpec inside is identical to a regular Workflow spec.
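A sketch of a nightly CronWorkflow; the name, schedule, and container body are illustrative:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: CronWorkflow
metadata:
  name: nightly-report
spec:
  schedule: "0 2 * * *"      # 02:00 every day
  concurrencyPolicy: Replace # replace a still-running previous run
  workflowSpec:              # identical to a regular Workflow spec
    entrypoint: report
    templates:
      - name: report
        container:
          image: alpine:3.19
          command: [echo, "generating report"]
```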

How do I pause a running workflow for manual approval?

Use the Suspend template type: include a template with "suspend: {}" and add it as a step in your workflow. When execution reaches this step, the workflow pauses. A human can then review the state and resume with "argo resume WORKFLOW_NAME" to continue execution. This enables human-in-the-loop approval gates.
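A sketch of an approval gate between two deploy phases; the `deploy` template and step names are illustrative:

```yaml
templates:
  - name: main
    steps:
      - - name: deploy-staging
          template: deploy
      - - name: wait-for-approval
          template: approve     # workflow pauses here indefinitely
      - - name: deploy-prod
          template: deploy
  - name: approve
    suspend: {}                 # resume with: argo resume <workflow-name>
  - name: deploy
    container:
      image: alpine:3.19
      command: [echo, "deploying"]
```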

What is an exit handler in Argo Workflows?

An exit handler is a template specified in "spec.onExit" that runs after the workflow completes, regardless of whether it succeeded or failed. It is used for cleanup tasks (deleting temporary resources), notifications (sending Slack alerts), or post-processing (uploading final results). The exit handler receives the workflow status and can branch on success or failure.
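A sketch wiring an exit handler that reports the final status; the template names and echo commands are illustrative:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: with-cleanup-
spec:
  entrypoint: main
  onExit: cleanup             # invoked on success AND failure
  templates:
    - name: main
      container:
        image: alpine:3.19
        command: [sh, -c]
        args: ["echo doing work"]
    - name: cleanup
      container:
        image: alpine:3.19
        command: [sh, -c]
        args: ["echo finished with status {{workflow.status}}"]
```

Branching on success vs. failure can be done inside `cleanup` by inspecting `{{workflow.status}}` (e.g. `Succeeded` or `Failed`).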

How do I mount a PersistentVolumeClaim in an Argo Workflow?

Define the PVC in "spec.volumes" with a volume name and claimName, then reference it in the template container using "volumeMounts" with the volume name and mountPath. This allows tasks to share a filesystem across steps in the same workflow, which is useful for build artifacts, datasets, and checkpoints.
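A sketch of mounting a pre-existing claim; the claim name `my-existing-pvc` and mount path are assumptions:

```yaml
spec:
  entrypoint: main
  volumes:
    - name: workdir
      persistentVolumeClaim:
        claimName: my-existing-pvc   # PVC assumed to exist already
  templates:
    - name: main
      container:
        image: alpine:3.19
        command: [sh, -c]
        args: ["echo checkpoint > /mnt/work/state.txt"]
        volumeMounts:
          - name: workdir            # must match the volume name above
            mountPath: /mnt/work
```

For per-workflow storage, `spec.volumeClaimTemplates` can create and garbage-collect a PVC with the workflow instead of mounting an existing one.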