What I learned about WDL

What is WDL?

The Workflow Description Language (WDL) is an open-source language designed to simplify complex computational workflows, particularly in genomics and bioinformatics. Developed by the Broad Institute and maintained by the OpenWDL community, WDL allows scientists to define analysis pipelines in a human-readable format.

Why It Matters:

Standardizes workflow definitions across platforms
Enables reproducibility in scientific research
Simplifies scaling from laptops to cloud environments

WDL Syntax: A Step-by-Step Breakdown

Let’s dissect a simple WDL workflow to understand its structure.

Example Workflow

version 1.2  # Modern WDL version

task say_hello {
    input {
        String greeting
        String name
    }
    
    command <<<
        echo "~{greeting}, ~{name}!"
    >>>
    
    output {
        String message = read_string(stdout())
    }
    
    requirements {
        container: "ubuntu:latest"
    }
}

workflow main {
    input {
        String name
        Boolean is_pirate = false
    }
    
    Array[String] greetings = select_all([
        "Hello",
        "Hallo",
        "Hej",
        (
            if is_pirate
            then "Ahoy"
            else None
        ),
    ])
    
    scatter (greeting in greetings) {
        call say_hello {
            input:
                greeting = greeting,
                name = name
        }
    }
    
    output {
        Array[String] messages = say_hello.message
    }
}

Line-by-Line Breakdown

1. Version Declaration

version 1.2

New in 1.2: The requirements section used in this example replaces the older runtime section. select_all() is not new; it has been in the standard library since WDL 1.0.
Best Practice: Always declare versions explicitly for compatibility.

2. Task Definition

task say_hello {
  input {
      String greeting
      String name
  }

Input Parameters: Declares two required inputs for personalization.
Flexibility: Tasks can be reused across workflows with different inputs.
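
To make the reuse point concrete, here is a minimal sketch that calls the same task twice under different aliases. The workflow name greet_twice and the aliases hello_en and hello_sv are made up for illustration; the task is the one from the example above.

```wdl
version 1.2

task say_hello {
    input {
        String greeting
        String name
    }

    command <<<
        echo "~{greeting}, ~{name}!"
    >>>

    output {
        String message = read_string(stdout())
    }

    requirements {
        container: "ubuntu:latest"
    }
}

# Hypothetical workflow: one task definition, two calls with different inputs.
workflow greet_twice {
    input {
        String name
    }

    # "as" gives each call its own name so the outputs can be told apart.
    call say_hello as hello_en {
        input:
            greeting = "Hello",
            name = name
    }

    call say_hello as hello_sv {
        input:
            greeting = "Hej",
            name = name
    }

    output {
        String english = hello_en.message
        String swedish = hello_sv.message
    }
}
```

The aliases are needed because two calls to the same task in one scope would otherwise share a name.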

3. Command Section (Heredoc Syntax)

command <<<
    echo "~{greeting}, ~{name}!"
>>>

<<< >>> Syntax: Heredoc-style delimiters allow multi-line commands in which shell quotes, braces, and $ variables need no escaping.
Variable Substitution: ~{} injects WDL variables into shell commands.
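
The difference between WDL placeholders and shell variables is easiest to see side by side. The following is a small sketch; the task name mixed_variables and the variable shell_var are invented for this example.

```wdl
version 1.2

# Hypothetical task contrasting WDL placeholders with shell variables.
task mixed_variables {
    input {
        String name
    }

    command <<<
        # shell_var is an ordinary shell variable; WDL does not touch ${...}
        # inside a heredoc-style command section.
        shell_var="from the shell"
        # The ~{...} placeholder is filled in by WDL before the shell starts;
        # ${shell_var} is expanded by the shell when the script runs.
        echo "~{name} says hello ${shell_var}"
    >>>

    output {
        String message = read_string(stdout())
    }

    requirements {
        container: "ubuntu:latest"
    }
}
```

Because only ~{} is interpolated here, ordinary shell syntax can be pasted into the command section unchanged.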

4. Output Declaration

output {
    String message = read_string(stdout())
}

read_string(): Built-in function that reads the file returned by stdout() into a String, stripping the trailing newline.
Type Safety: Explicit String type ensures data consistency.
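
Outputs are not limited to a single String. As a rough sketch, the hypothetical task below (list_greetings and count.txt are names I made up) returns an array of lines, an integer parsed from a file, and the file itself.

```wdl
version 1.2

# Hypothetical task showing a few differently typed outputs.
task list_greetings {
    input {
        String name
    }

    command <<<
        echo "Hello, ~{name}!"
        echo "Hej, ~{name}!"
        echo 2 > count.txt
    >>>

    output {
        # One array element per line written to standard output.
        Array[String] lines = read_lines(stdout())
        # Parses the single integer stored in count.txt.
        Int count = read_int("count.txt")
        # A path given as a string is turned into a File output.
        File count_file = "count.txt"
    }

    requirements {
        container: "ubuntu:latest"
    }
}
```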

5. Runtime Requirements

requirements {
    container: "ubuntu:latest"
}

Reproducibility: Uses Docker containers for consistent environments.
Additional Settings: The same section can also declare CPU and memory requirements alongside the container.
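
Resource settings sit in the same requirements section as the container. Here is a minimal sketch; the task name say_hello_sized is hypothetical, and the cpu and memory values are arbitrary illustrations, not recommendations.

```wdl
version 1.2

# Hypothetical task; the cpu and memory values are illustrative only.
task say_hello_sized {
    input {
        String name
    }

    command <<<
        echo "Hello, ~{name}!"
    >>>

    output {
        String message = read_string(stdout())
    }

    requirements {
        container: "ubuntu:latest"
        cpu: 2
        memory: "4 GiB"
    }
}
```

How strictly these values are enforced depends on the execution engine and the backend it runs on.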

6. Workflow Inputs

input {
    String name
    Boolean is_pirate = false
}

Default Values: is_pirate is optional (defaults to false).
Runtime Flexibility: Users can override defaults when executing.
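
Besides required inputs and inputs with defaults, WDL also has optional types written with a trailing question mark. The sketch below puts the three side by side; the workflow name input_kinds and the title input are assumptions added for illustration.

```wdl
version 1.2

# Hypothetical workflow showing three kinds of input.
workflow input_kinds {
    input {
        String name                  # required: the caller must supply it
        String? title                # optional: None when omitted
        Boolean is_pirate = false    # optional: falls back to the default
    }

    # select_first returns the first defined value in the array.
    String shown_title = select_first([title, "friend"])
    String pirate_note = if is_pirate then " (pirate)" else ""

    output {
        String summary = "~{shown_title} ~{name}~{pirate_note}"
    }
}
```

With only "input_kinds.name" set in the input JSON, shown_title falls back to "friend" and pirate_note is empty.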

7. Array with Conditional Logic

Array[String] greetings = select_all([
    "Hello",
    "Hallo",
    "Hej",
    (if is_pirate then "Ahoy" else None),
])

select_all(): Filters out None values, creating a clean array.
Conditional Expression: Adds “Ahoy” only if is_pirate is true.
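
It can help to split the expression into two steps to see the types involved. This is a sketch with invented names (select_all_demo, maybe_greetings, n_greetings), not part of the original workflow.

```wdl
version 1.2

# Hypothetical workflow that separates building the array from filtering it.
workflow select_all_demo {
    input {
        Boolean is_pirate = false
    }

    # Any element may be None, so the element type is String? (optional).
    Array[String?] maybe_greetings = [
        "Hello",
        if is_pirate then "Ahoy" else None
    ]

    # select_all drops the None entries and returns a plain Array[String].
    Array[String] greetings = select_all(maybe_greetings)

    output {
        Array[String] result = greetings
        Int n_greetings = length(greetings)
    }
}
```

With the default is_pirate = false, only "Hello" survives the filter, so n_greetings is 1.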

8. Parallel Execution

scatter (greeting in greetings) {
    call say_hello { input: greeting, name }
}

Scatter/Gather: Runs say_hello in parallel for each greeting.
Cloud Optimization: Iterations are independent of each other, so an execution engine can distribute them across whatever compute backend it is configured for.
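
The gather half of scatter/gather is implicit: a value declared inside the scatter body becomes an array outside it. Here is a small sketch without any task calls; the workflow name scatter_demo and the default names are made up.

```wdl
version 1.2

# Hypothetical workflow: scatter over indices, gather the results.
workflow scatter_demo {
    input {
        Array[String] names = ["Ada", "Grace"]
    }

    # range(n) yields 0..n-1, so i indexes into names.
    scatter (i in range(length(names))) {
        # Inside the scatter body this is a single String per iteration.
        String label = "~{i}: ~{names[i]}"
    }

    output {
        # Outside the body the same name refers to the gathered Array[String].
        Array[String] labels = label
    }
}
```

The gathered array keeps the order of the input collection, regardless of which iteration finishes first.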

9. Workflow Output

output {
    Array[String] messages = say_hello.message
}

Aggregation: Collects outputs from all parallel tasks.
Downstream Use: These messages could feed into another workflow.
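
As a sketch of what downstream use could look like, the hypothetical task below takes an array of messages and counts them. The names count_messages and downstream_demo are invented; the messages input stands in for the gathered say_hello.message array.

```wdl
version 1.2

# Hypothetical downstream task: counts the messages it is given.
task count_messages {
    input {
        Array[String] messages
    }

    command <<<
        # write_lines writes one element per line to a temporary file
        # and returns the path to that file.
        wc -l < ~{write_lines(messages)}
    >>>

    output {
        Int count = read_int(stdout())
    }

    requirements {
        container: "ubuntu:latest"
    }
}

workflow downstream_demo {
    input {
        # Stands in for the gathered say_hello.message array.
        Array[String] messages
    }

    call count_messages {
        input:
            messages = messages
    }

    output {
        Int message_count = count_messages.count
    }
}
```

In a single workflow, the call would take messages = say_hello.message directly and the engine would wait for the scatter to finish before running it.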

Key WDL Features Demonstrated

Conditional Arrays:

(if is_pirate then "Ahoy" else None)

- Enables dynamic workflow configurations based on inputs.

Scatter Parallelization:

scatter (greeting in greetings) { ... }

- Simplifies parallel processing of large datasets.

Type-Safe Outputs:

Array[String] messages = say_hello.message

- Ensures data integrity between workflow steps.

Running the Workflow

Input JSON:

{
  "main.name": "Dave",
  "main.is_pirate": true
}

Expected Output:

{
  "main.messages": ["Hello, Dave!", "Hallo, Dave!", "Hej, Dave!", "Ahoy, Dave!"]
}

Getting Started with WDL

Install a WDL Runner:

# For MiniWDL (used in our project):  
pip install miniwdl

Write Your First Workflow:

version 1.0  
workflow hello_wdl {  
  call say_hello  
  output {  
    String message = "Workflow completed!"  
  }  
}  
task say_hello {  
  command { echo "Hello, WDL!" }  
}

Run It:

miniwdl run hello.wdl

Conclusion

Learning WDL gave me the foundation I need to build tooling around it, with the aim of making workflow development more intuitive.

Resources:

OpenWDL Documentation: a very good resource for understanding WDL.
WDL Quickstart Guide