
Understanding config.yaml file

Broad Institute of MIT and Harvard

To define an interface for your algorithm, it is essential to understand the structure and key components of the config.yaml file (aka spec file).

The config.yaml file specifies how the interface interacts with the algorithm and is located at: bilayers/src/algorithms/algorithm_name/config.yaml.
Here, algorithm_name should accurately represent the task, such as cellpose_inference or cellpose_training.

Each config.yaml contains key sections that define how an algorithm integrates with an interface. These sections include: citations, docker_image, algorithm_folder_name, exec_function, inputs, outputs, parameters, and display_only. The skeleton structure can be copied from an existing example.

Once defined, the config.yaml is processed to generate an executable CLI command, where user-specified inputs are mapped to command-line arguments, facilitating algorithm execution and output retrieval.

Key Sections of config.yaml

config.yaml
citations:
  algorithm:
    - name: ""
      doi: ""
      license: ""
      description: ""

docker-image:
  org:
  name:
  tag:
  platform:

algorithm_folder_name:

exec_function:
  name: "generate_cli_command"
  script:
  module:
  cli_command:
  hidden_args:
    # - cli_tag:
    #   value:
    #   cli_order:

inputs:

outputs:

parameters:

display_only:

Before you start working on the config.yaml file, we recommend reviewing the command-line usage of the specific algorithm you’re building an interface for. This will help you determine what should go in inputs, outputs, parameters, display_only, and exec_function’s hidden_args.

Understanding cli_command

The cli_command is the base of the command line, such as python -m cellpose; parameters and their arguments are then appended to it. We will cover cli_command in more detail in the context of exec_function, but for now it is enough to understand its role.

Defining citations

Citations are used to credit the relevant works associated with the algorithm. Include the correct name, doi, license and description of the algorithm. Guidelines on how to find citations can be found here. Note that interface citations are added dynamically, so you don’t need to include them manually.

config.yaml
citations:
  algorithm:
    - name: "cite-1"
      doi: ""
      license: ""
      description: ""
    - name: "cite-2"
      doi: ""
      license: ""
      description: ""

Example:

config.yaml
citations:
  algorithm:
    - name: "Cellpose"
      doi: "10.1038/s41592-020-01018-x"
      license: "BSD 3-Clause"
      description: "Deep Learning algorithm for cell segmentation in microscopy images"

Defining docker_image

Each interface’s Docker image is built on top of the base Docker image specific to the algorithm. Therefore, it’s highly recommended to choose an algorithm with a pre-built Docker image available on DockerHub.

For guidance on selecting a compatible base image, refer to Choosing the Right Base Docker Image.

To define the container image, select one from DockerHub and specify its full reference, broken down into separate fields. For instance, cellprofiler/runcellpose_no_pretrained:0.1 provides the org (cellprofiler), the name (runcellpose_no_pretrained), and the tag (0.1); the fourth component, platform, is not part of the reference string and is set separately.

To learn more about docker image naming, refer to What are Docker tags?

Here is the template to paste directly into your config.yaml file:

config.yaml
docker-image:
  org: 
  name: 
  tag: 
  platform:
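
As an illustration, the example reference cellprofiler/runcellpose_no_pretrained:0.1 could be entered into this template as sketched below. The org, name, and tag values come straight from that reference; the platform value is an assumption chosen for illustration, so substitute the platform the image is actually published for.

```yaml
docker-image:
  org: "cellprofiler"                  # part before the slash
  name: "runcellpose_no_pretrained"    # part between the slash and the colon
  tag: "0.1"                           # quoted so YAML keeps it a string, not the number 0.1
  platform: "linux/amd64"              # assumption: not encoded in the reference itself
```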

Defining algorithm_folder_name

This specifies the folder where the generated Gradio and Jupyter Notebook interface files will be stored. The folder name should follow the convention of the config.yaml’s parent folder, such as algorithm_inference or algorithm_training.
Example: algorithm_folder_name: "cellpose_inference"

Defining exec_function

exec_function is instrumental in converting the YAML file into the desired interface. It defines the specific function responsible for this conversion. The exec_function consists of the following components: name, script, module, cli_command, and hidden_args.

Below is the template to paste directly into your configuration file, followed by a breakdown:

config.yaml
exec_function:
  name: "generate_cli_command"
  script: ""
  module: "algorithms."
  cli_command: ""
  hidden_args:
    # dummy example
    # - cli_tag: "--save_png"
    #   value: "True"
    #   append_value: False
    #   cli_order: 3

Below is an example to follow, along with a breakdown of each component:

config.yaml
exec_function:
  name: "generate_cli_command"
  script: "cellpose_inference"
  module: "algorithms.cellpose_inference"
  cli_command: "python -m cellpose --verbose"
  hidden_args:
    # dummy example
    # - cli_tag: "--save_png"
    #   value: "True"
    #   append_value: False
    #   cli_order: 3

Command lines are commonly constructed following one of several patterns. We support the following widely used patterns (explore the full discussion here):

  1. someexecutable --unordered_flag_1 unordered_value_1 --unordered_flag_2 unordered_value_2
  2. someexecutable unordered_value_1 unordered_value_2
  3. someexecutable ordered_value_1 unordered_value_2
  4. someexecutable --ordered_flag_1 ordered_value_1 unordered_value_2
  5. someexecutable ordered_value_1 --unordered_flag_2 unordered_value_2
  6. someexecutable --unordered_flag_1=unordered_value_1 --unordered_flag_2 unordered_value_2
  7. someexecutable --ordered_flag_1=ordered_value_1 unordered_value_2

What are ordered_flag and unordered_flag?

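In some cases, the command line requires flags or values in fixed positions (e.g., always the first or last argument). These are the ordered flags; all others are unordered and can appear in any position. To pin an argument to a fixed position, set its cli_order field.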
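
A minimal sketch of the idea, using two hypothetical parameter entries (the names, tags, paths, and values are invented for illustration, and only the fields relevant here are shown). It assumes that a nonzero cli_order pins an argument to that position and that 0 leaves it unordered; the default of 0 in the parameters template suggests this reading, but check the spec before relying on it.

```yaml
parameters:
  # Hypothetical ordered argument: assumed to be placed first after cli_command
  - name: "input_path"
    cli_tag: "--input"
    cli_order: 1
    default: "/bilayers/input_images"

  # Hypothetical unordered argument: assumed to be appended in any position
  - name: "threshold"
    cli_tag: "--threshold"
    cli_order: 0
    default: 0.5
```

Under those assumptions, --input and its value would always directly follow the executable, while --threshold could land anywhere after it.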

How to specify --flag_1=value_1?

By default, a flag and its value are appended with a space between them. To use = between the flag and the value instead, add an = at the end of cli_tag. For example, cli_tag: "--savedir=" with default: "/bilayers/my_outputs" generates: someexecutable --savedir=/bilayers/my_outputs
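
The two forms side by side, as a sketch. The entry names are hypothetical and only the fields relevant here are shown; the cli_tag and default values are the ones from the example above.

```yaml
# Default form: flag and value are separated by a space
# -> someexecutable --savedir /bilayers/my_outputs
- name: "save_dir_space"      # hypothetical name
  cli_tag: "--savedir"
  default: "/bilayers/my_outputs"

# Trailing "=" on cli_tag: flag and value are joined with "="
# -> someexecutable --savedir=/bilayers/my_outputs
- name: "save_dir_equals"     # hypothetical name
  cli_tag: "--savedir="
  default: "/bilayers/my_outputs"
```

The two entries are alternatives shown for comparison; a real config would contain only one of them.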

hidden_args: Why are hidden_args needed?

Sometimes, certain cli_tag and argument values should always be included in the cli_command, but you don’t want to expose them in the user interface. In these cases, use hidden_args.

Where can it be used?

A potential use case for hidden_args is ensuring output files are saved to a specific folder without allowing the user to modify it. If the algorithm's command-line usage includes a specific cli_tag for this, you can define it as a hidden_arg. Use the following fields to configure hidden_args: cli_tag, value, append_value, and cli_order, as shown in the exec_function template above.
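
A sketch of that use case, written against the hidden_args fields shown in the exec_function template. The --savedir tag and the /bilayers/my_outputs path are borrowed from the = example earlier on this page; the reading of append_value (True meaning the value is appended after the tag) and of cli_order: 0 (unordered) are assumptions to verify against the spec.

```yaml
exec_function:
  name: "generate_cli_command"
  script: "cellpose_inference"
  module: "algorithms.cellpose_inference"
  cli_command: "python -m cellpose --verbose"
  hidden_args:
    # Always save outputs to a fixed folder; never shown in the interface
    - cli_tag: "--savedir"
      value: "/bilayers/my_outputs"
      append_value: True   # assumption: append the value after the tag
      cli_order: 0         # assumption: 0 leaves the argument unordered
```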

Organizing Parameters from the Algorithm’s Command-Line Usage

Defining inputs

config.yaml
name: 
type: 
label: ""
description: ""
cli_tag: 
cli_order:
default: 
optional: True
format:
folder_name:
file_count:
section_id: ""
mode: ""
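
For orientation, here is how the template might be filled in for an image folder passed to Cellpose. Every value is an illustrative assumption: the type, default, format, file_count, section_id, and mode values are guesses at plausible settings, not a list of what the spec accepts, and the folder path is hypothetical. The only externally grounded piece is --dir, the Cellpose command-line flag for the image directory.

```yaml
inputs:
  - name: "input_images"
    type: "image"                          # assumed type name
    label: "Input images"
    description: "Folder of images to segment"
    cli_tag: "--dir"
    cli_order: 0
    default: "directory"                   # assumed placeholder value
    optional: False
    format: ["tif", "tiff"]                # assumed format notation
    folder_name: "/bilayers/input_images"  # hypothetical path
    file_count: "multiple"                 # assumed value
    section_id: "inputs"                   # assumed section name
    mode: "beginner"                       # assumed mode name
```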

Defining outputs

config.yaml
name: 
type: 
label: ""
description: ""
cli_tag: 
cli_order:
default: 
optional: True
format:
folder_name:
file_count:
section_id: ""
mode: ""

outputs follow the same schema as inputs in the spec file.

Defining parameters

Each parameter object has mandatory tags, some of which depend on the parameter type. While the order of tags should generally be maintained, it’s okay if they are slightly rearranged.

config.yaml
name: 
type: 
label: ""
description: ""
default: 
cli_tag: ""
cli_order: 0
optional: True
section_id: ""
mode: ""
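
As a sketch, the template could be filled in for Cellpose's --diameter flag as follows. The flag is part of the Cellpose command line; the type, section_id, and mode values, and the assumption that entries are written as list items under parameters, are illustrative guesses rather than confirmed spec values.

```yaml
parameters:
  - name: "diameter"
    type: "float"                   # assumed type name
    label: "Cell diameter"
    description: "Expected cell diameter in pixels"
    default: 30.0
    cli_tag: "--diameter"
    cli_order: 0
    optional: True
    section_id: "inputs"            # assumed section name
    mode: "beginner"                # assumed mode name
```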

Defining display_only

display_only functions similarly to parameters, but the key difference is that these objects are only displayed in the user interface and are not appended to the cli_command. They are non-interactive, meaning users cannot modify the values, which will always reflect the default specified in the config.yaml file.

Since they are not part of the cli_command, you can omit cli_tag and cli_order. For the rest of the structure, you can reuse the template from parameters based on the object type.
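
A sketch of a display_only entry built from the parameters template with the two CLI fields left out. The entry itself is hypothetical, and the type, section_id, and mode values are assumptions made for illustration.

```yaml
display_only:
  - name: "model_in_use"            # hypothetical entry
    type: "textbox"                 # assumed type name
    label: "Model"
    description: "Model used by this interface (shown for information only)"
    default: "cyto"                 # displayed as-is; users cannot change it
    optional: True
    section_id: "inputs"            # assumed section name
    mode: "beginner"                # assumed mode name
    # cli_tag and cli_order are omitted: nothing here reaches cli_command
```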