Bulkers Skill Reference

A complete, task-oriented guide for operating bulkers end-to-end

1. Overview

Bulkers manages multi-container computing environments. You define a crate in a YAML manifest that lists Docker images and the commands each image provides. When you activate a crate, bulkers creates symlinks that dispatch each command to the right container at runtime, so the commands appear on your PATH as if they were installed natively. There is no separate install step: run bulkers activate and go.
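The dispatch idea can be sketched in plain POSIX shell. This is an illustration under stated assumptions, not how the bulkers binary is written:

- A hypothetical dispatcher script named shim is reached through symlinks called samtools and bedtools.
- It reads the name it was invoked under ($0) and looks it up in a hard-coded table that stands in for the manifest.
- It prints the docker command it would run instead of running it.

```sh
#!/bin/sh
# Hypothetical sketch of shimlink-style dispatch; not bulkers' real code.
# One script is reached through several symlinks, and the name it was
# invoked under ($0) selects the image. It prints the docker command
# instead of running it, so no container engine is needed.
set -eu

bindir=$(mktemp -d)

# The dispatcher: a stand-in for the bulkers binary
cat > "$bindir/shim" <<'EOF'
#!/bin/sh
cmd=$(basename "$0")   # which symlink was used to call us?
case "$cmd" in         # stand-in for the manifest's command -> image table
  samtools) image="quay.io/biocontainers/samtools:1.9--h91753b0_8" ;;
  bedtools) image="quay.io/biocontainers/bedtools:2.29.2--hc088bd4_0" ;;
  *) echo "no container for: $cmd" >&2; exit 127 ;;
esac
echo "docker run --rm $image $cmd $*"
EOF
chmod +x "$bindir/shim"

# One symlink per command, all pointing at the same dispatcher
for c in samtools bedtools; do
  ln -s "$bindir/shim" "$bindir/$c"
done

# With the links first on PATH, the commands look native
PATH="$bindir:$PATH"
samtools --version
bedtools --version
```

Keeping a single dispatcher behind many links means adding a command only costs one new symlink and one manifest entry.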

2. Install

# One-liner install (Linux/macOS)
curl -sL https://raw.githubusercontent.com/databio/bulkers/master/install.sh | bash

# This installs the binary to ~/.local/bin/bulkers
# and adds a shell function to ~/.bashrc or ~/.zshrc

Verify the installation:

bulkers --version

If not already initialized, create a config file:

# Auto-detects docker or singularity
bulkers config init

# Or specify config path and engine explicitly
bulkers config init -c ~/.config/bulker/bulker_config.yaml -e docker

3. Using an existing crate

Activate a crate (interactive shell):

bulkers activate databio/pepatac
# Auto-fetches from registry if not cached locally
# You're now in a shell where samtools, bowtie2, etc. are on PATH
samtools --version
bulkers deactivate

On first use, activate fetches the manifest from the registry. It then creates a symlink for each command and prepends the directory holding those symlinks to your PATH. Running deactivate restores the original PATH. No separate install step is needed.
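The sketch below shows the PATH bookkeeping described above. It assumes that activate only needs to prepend one directory and that deactivate only needs to restore the saved value. The functions sketch_activate and sketch_deactivate are hypothetical. The real commands also fetch the manifest and create the symlinks.

```sh
#!/bin/sh
# Hypothetical sketch of the PATH bookkeeping behind activate/deactivate.
# sketch_activate and sketch_deactivate are illustrative names only.

sketch_activate() {
  # $1: directory holding the crate's symlinks
  SKETCH_OLD_PATH=$PATH   # remember the original PATH
  PATH="$1:$PATH"         # crate commands now shadow host commands
}

sketch_deactivate() {
  PATH=$SKETCH_OLD_PATH   # put back exactly what was there before
  unset SKETCH_OLD_PATH
}

crate_bin=$(mktemp -d)
original=$PATH

sketch_activate "$crate_bin"
echo "first PATH entry while active: ${PATH%%:*}"
sketch_deactivate
[ "$PATH" = "$original" ] && echo "PATH restored"
```

Saving and restoring the whole PATH string, rather than removing one entry, guarantees that deactivate leaves no trace even if the crate directory appeared elsewhere in PATH.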

Pre-cache for offline use (optional):

bulkers crate install databio/pepatac:1.0.13      # cache manifest
bulkers crate install databio/pepatac:1.0.13 -b   # cache manifest + pull images

Run a single command (non-interactive, for scripts/AI):

bulkers exec databio/pepatac -- samtools view -h input.bam

exec is the preferred method for AI agents and scripts — it runs a single command in the crate's environment without modifying your current shell.

See what's cached:

bulkers crate list                           # list all cached crates
bulkers crate list --simple                  # space-separated, for scripting
bulkers crate inspect databio/pepatac        # show commands in this crate
bulkers crate clean databio/pepatac          # remove a cached crate
bulkers crate clean --all                    # clear entire cache

4. Creating a crate manifest

4a. Minimal manifest — one tool

manifest:
  name: my-tools
  commands:
  - command: samtools
    docker_image: quay.io/biocontainers/samtools:1.9--h91753b0_8

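Each field:

  - manifest: top-level key that wraps the crate definition
  - name: the crate's name
  - commands: list of commands the crate provides
  - command: executable name created on your PATH
  - docker_image: Docker image that runs the command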

4b. Multiple tools

manifest:
  name: biotools
  version: "1.0"
  commands:
  - command: samtools
    docker_image: quay.io/biocontainers/samtools:1.9--h91753b0_8
  - command: bedtools
    docker_image: quay.io/biocontainers/bedtools:2.29.2--hc088bd4_0
  - command: python
    docker_image: python:3.11
    docker_command: python
    docker_args: "-it"

4c. Volumes and mounts

manifest:
  name: data-tools
  commands:
  - command: mytool
    docker_image: myimage:latest
    volumes:
      - /data/shared
      - /scratch

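How mounting works: each container mounts the global volumes from your bulkers config (the configuration reference shows $HOME as an example), plus any paths listed under that command's volumes key. If workdir is not set, the working directory inside the container is your current directory ($(pwd)).

When you need extra volumes, you have two options:

  - List them under volumes for a single command, as in the manifest above.
  - Set the global volumes list so that every container mounts them:

bulkers config set volumes=$HOME,/data/shared,/scratch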
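One plausible way to turn these lists into container mounts is sketched below. It rests on two assumptions that the text above does not confirm:

- Each path is mounted at the same location inside the container.
- A path that appears in both the global and per-command lists is mounted only once.

volume_flags is a hypothetical helper.

```sh
#!/bin/sh
# Hypothetical sketch: merge global and per-command volumes into docker -v
# flags. It assumes each path is mounted at the same location inside the
# container and that duplicates are dropped; volume_flags is illustrative.
volume_flags() {
  flags=""
  seen=" "
  for v in "$@"; do
    case "$seen" in
      *" $v "*) continue ;;   # already mounted once, skip the duplicate
    esac
    seen="$seen$v "
    flags="$flags -v $v:$v"
  done
  echo "${flags# }"           # strip the leading space
}

# Global volumes from the config first, then the command's own volumes
volume_flags "$HOME" /data/shared /scratch
```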

4d. Environment variables

manifest:
  name: display-tools
  commands:
  - command: firefox
    docker_image: jess/firefox
    docker_args: "-it"
    envvars:
      - DISPLAY
      - XAUTHORITY
    no_network: false
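Because envvars lists names rather than values, a natural reading is that each name becomes a docker run -e NAME flag. When -e is given a name with no =value, Docker copies the variable's value from the calling environment. The sketch below assumes that mapping, and env_flags is a hypothetical helper.

```sh
#!/bin/sh
# Hypothetical sketch: turn an envvars list into docker -e flags.
# With docker run -e NAME (no "=value"), Docker copies NAME's value from the
# calling environment, so only names are needed. env_flags is illustrative.
env_flags() {
  flags=""
  for name in "$@"; do
    flags="$flags -e $name"
  done
  echo "${flags# }"
}

env_flags DISPLAY XAUTHORITY
```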

4e. Advanced options

manifest:
  name: advanced
  commands:
  - command: redis-server
    docker_image: redis:7
    docker_command: redis-server
    docker_args: "--name redis -p 6379:6379"
    workdir: /data
    no_user: true
    no_network: true    # keep bridge networking so -p 6379:6379 takes effect
  - command: jq
    docker_image: ghcr.io/jqlang/jq
    docker_args: "-i --entrypoint jq"
    docker_command: " "

Field reference:

Field           Type    Default           Description
--------------  ------  ----------------  ------------------------------------------------
command         string  required          Executable name created on PATH
docker_image    string  required          Docker image reference
docker_command  string  value of command  Command to run in the container
docker_args     string  none              Extra docker run flags
volumes         list    []                Additional paths to mount
envvars         list    []                Names of environment variables to pass through
no_user         bool    false             If true, run as root instead of the current user
no_network      bool    false             If true, don't add --network=host
workdir         string  $(pwd)            Working directory in the container
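The sketch below shows how these fields might fit together by assembling a single docker run line. The following are all assumptions for illustration:

- The helper name build_run.
- The use of --rm.
- The flag order.
- The handling of a blank docker_command, as in the jq example, where --entrypoint jq already names the program.

The user's own arguments are left off.

```sh
#!/bin/sh
# Hypothetical sketch of how the fields above could combine into one
# docker run line. build_run, the use of --rm, and the flag order are
# assumptions for illustration; the call's own arguments are omitted.
# Positional args: command image docker_command docker_args no_user no_network workdir
build_run() {
  cmd=$1 image=$2 dcmd=$3 dargs=$4 no_user=$5 no_network=$6 workdir=$7
  out="docker run --rm"
  # no_user=false: run as the calling user rather than the image default
  [ "$no_user" = true ] || out="$out --user=$(id -u):$(id -g)"
  # no_network=false: share the host network
  [ "$no_network" = true ] || out="$out --network=host"
  out="$out --workdir=${workdir:-$PWD}"   # workdir defaults to $(pwd)
  [ -z "$dargs" ] || out="$out $dargs"
  out="$out $image"
  case "$dcmd" in
    "") out="$out $cmd" ;;                # unset: default to command
    *[![:space:]]*) out="$out $dcmd" ;;   # explicit docker_command
    *) ;;                                 # blank (" "): let the entrypoint run
  esac
  echo "$out"
}

build_run redis-server redis:7 redis-server "--name redis" true true /data
build_run jq ghcr.io/jqlang/jq " " "-i --entrypoint jq" false false ""
```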

4f. Host commands and imports

manifest:
  name: pepatac
  version: "1.0.14"
  imports:
    - bulker/coreutils
    - databio/bedstuff
  host_commands:
    - python3
    - perl
    - git
  commands:
  - command: samtools
    docker_image: quay.io/biocontainers/samtools:1.9--h91753b0_8
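One plausible reading of host_commands is that bulkers links the host's own executables into the crate environment alongside the container shimlinks. This matters most in strict mode. The sketch assumes exactly that. link_host_commands is a hypothetical helper, and imports are not modeled.

```sh
#!/bin/sh
# Hypothetical sketch: make host_commands available by symlinking the host's
# own executables into the crate's bin directory, next to the container
# shimlinks. link_host_commands is an illustrative name, not a bulkers command.
link_host_commands() {
  bindir=$1
  shift
  for c in "$@"; do
    if target=$(command -v "$c"); then
      ln -sf "$target" "$bindir/$c"
    else
      echo "warning: host command not found: $c" >&2
    fi
  done
}

crate_bin=$(mktemp -d)
link_host_commands "$crate_bin" python3 perl git
ls -l "$crate_bin"
```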

5. Using a local manifest

# Activate directly from a local file
bulkers activate ./my-manifest.yaml

# Pre-cache a local manifest (optional)
bulkers crate install ./my-manifest.yaml

# Pre-cache and pull all Docker images
bulkers crate install ./my-manifest.yaml -b

# Exec with a local manifest
bulkers exec ./my-manifest.yaml -- samtools --version

When the argument starts with ./, /, or ends in .yaml/.yml, it is treated as a local manifest file. The manifest is cached automatically for shimlink dispatch at runtime. Otherwise the argument is treated as a registry shorthand (namespace/crate:tag).
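The classification rule above fits in a single case statement. classify_crate_arg is a hypothetical helper that mirrors the stated rule. It is not part of bulkers.

```sh
#!/bin/sh
# Sketch of the rule above: a ./ or / prefix, or a .yaml/.yml suffix, means a
# local file; anything else is a registry shorthand. classify_crate_arg is
# a hypothetical helper, not part of bulkers.
classify_crate_arg() {
  case "$1" in
    ./*|/*|*.yaml|*.yml) echo local ;;
    *) echo registry ;;
  esac
}

for arg in ./my-manifest.yaml /tmp/crate.yml manifests/x.yaml databio/pepatac:1.0.13; do
  printf '%-24s %s\n' "$arg" "$(classify_crate_arg "$arg")"
done
```

Under this rule, manifests/x.yaml counts as local because of its suffix, while a bare name such as pepatac is looked up in the registry.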

6. Publishing a crate to the registry

The registry is a static collection of YAML files served from GitHub via hub.bulker.io. Publishing means adding your manifest file to this repository.

# 1. Write your manifest
cat > my-crate.yaml << 'EOF'
manifest:
  name: my-crate
  version: "1.0"
  commands:
  - command: mytool
    docker_image: myorg/mytool:1.0
EOF

# 2. Fork databio/hub.bulker.io on GitHub, then clone your fork
git clone git@github.com:<your-github-user>/hub.bulker.io.git
cd hub.bulker.io

# 3. Add your manifest under your namespace
mkdir -p mynamespace
cp ../my-crate.yaml mynamespace/my-crate.yaml
# For versioned: mynamespace/my-crate_1.0.yaml

# 4. Commit and push
git add mynamespace/my-crate.yaml
git commit -m "Add mynamespace/my-crate"
git push origin main

# 5. Open a pull request to databio/hub.bulker.io

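File naming convention:

  - mynamespace/my-crate.yaml: the crate's default tag, activated as mynamespace/my-crate
  - mynamespace/my-crate_1.0.yaml: tag 1.0, activated as mynamespace/my-crate:1.0

After merge, activate the crate by its registry name. For the file committed above, that is bulkers activate mynamespace/my-crate. If you committed the versioned file instead, use bulkers activate mynamespace/my-crate:1.0.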
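The naming convention can be expressed as a small mapping from registry shorthand to repository file. This sketch makes two assumptions:

- The namespace falls back to bulker, the default_namespace setting in the configuration reference.
- The tag default maps to the unsuffixed file.

crate_file is a hypothetical helper, and it returns a repository path rather than a full registry URL.

```sh
#!/bin/sh
# Hypothetical sketch: map a registry shorthand to its file in the
# hub.bulker.io repository. Assumes namespace "bulker" and tag "default"
# when omitted; crate_file is an illustrative name.
crate_file() {
  ref=$1
  case "$ref" in
    */*) ns=${ref%%/*}; rest=${ref#*/} ;;
    *)   ns=bulker;     rest=$ref ;;
  esac
  case "$rest" in
    *:*) name=${rest%%:*}; tag=${rest#*:} ;;
    *)   name=$rest;       tag=default ;;
  esac
  if [ "$tag" = default ]; then
    echo "$ns/$name.yaml"
  else
    echo "$ns/${name}_$tag.yaml"
  fi
}

crate_file mynamespace/my-crate:1.0   # mynamespace/my-crate_1.0.yaml
crate_file mynamespace/my-crate       # mynamespace/my-crate.yaml
crate_file coreutils                  # bulker/coreutils.yaml
```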

7. Configuration reference

Full config file with annotations:

bulker:
  container_engine: docker          # "docker" or "apptainer"
  default_namespace: bulker         # used when namespace is omitted
  registry_url: http://hub.bulker.io/  # default manifest registry

  # Shell settings
  shell_path: ${SHELL}
  shell_rc: $HOME/.bashrc

  # Global defaults (applied to all containers)
  volumes:
    - $HOME
  envvars:
    - DISPLAY

  # Build template (for crate install --build)
  build_template: docker_build.tera

  # Apptainer-specific
  apptainer_image_folder: ~/.local/share/apptainer/images

Manifests are cached in ~/.config/bulker/manifests/, which activate and crate install manage automatically. The config file has no crates map: the filesystem cache is the source of truth.

Config file location lookup order:

  1. -c flag on any command
  2. $BULKERCFG environment variable
  3. ~/.config/bulker/bulker_config.yaml
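The lookup order reads naturally as an if/elif chain. In this sketch, the hypothetical helper resolve_config takes the -c value as its first argument (empty when the flag is absent) and returns the first candidate that is set. Whether bulkers also checks that the file exists at each step is not covered here.

```sh
#!/bin/sh
# Sketch of the config lookup order above. resolve_config is hypothetical;
# its first argument stands in for the value passed with -c (empty if none).
resolve_config() {
  if [ -n "${1:-}" ]; then
    echo "$1"                                      # 1. -c flag wins
  elif [ -n "${BULKERCFG:-}" ]; then
    echo "$BULKERCFG"                              # 2. environment variable
  else
    echo "$HOME/.config/bulker/bulker_config.yaml" # 3. default location
  fi
}

BULKERCFG=/etc/bulker/site.yaml
resolve_config ""               # /etc/bulker/site.yaml
resolve_config ./project.yaml   # ./project.yaml
unset BULKERCFG
resolve_config ""               # the default path under $HOME
```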

Setting config values from the CLI:

# Add global volumes that all containers will mount
bulkers config set volumes=$HOME,/data/shared,/scratch

# Add global environment variables
bulkers config set envvars=DISPLAY,LANG,MY_VAR

# Change container engine
bulkers config set container_engine=apptainer

# View current config
bulkers config show

# Get a specific value
bulkers config get container_engine

8. Apptainer/Singularity support

# Initialize with apptainer
bulkers config init -e apptainer

# Or change engine in existing config
bulkers config set container_engine=apptainer

Differences from Docker:

9. Common patterns for AI agents

Pattern: Set up a bioinformatics environment

# Just exec — auto-fetches manifest on first use
bulkers exec databio/pepatac -- samtools view -h input.bam > output.sam

Pattern: Create a custom crate for a project

cat > manifest.yaml << 'EOF'
manifest:
  name: my-analysis
  commands:
  - command: samtools
    docker_image: quay.io/biocontainers/samtools:1.17--hd87286a_2
    docker_args: "-i"
  - command: bedtools
    docker_image: quay.io/biocontainers/bedtools:2.31.0--hf5e1c6e_2
  - command: R
    docker_image: r-base:4.3
    docker_command: R
    docker_args: "-it"
  host_commands:
    - python3
    - git
EOF

# Activate directly from local file
bulkers activate ./manifest.yaml
samtools --version
bulkers deactivate

# Or exec without activating
bulkers exec ./manifest.yaml -- samtools --version

Pattern: Run a pipeline step

# Non-interactive, for scripts -- no shell function needed
bulkers exec databio/pepatac -- trim_galore --paired R1.fastq.gz R2.fastq.gz -o trimmed/
# trim_galore --paired writes trimmed/R1_val_1.fq.gz and trimmed/R2_val_2.fq.gz
bulkers exec databio/pepatac -- bowtie2 -x /data/genomes/hg38 -1 trimmed/R1_val_1.fq.gz -2 trimmed/R2_val_2.fq.gz -S aligned.sam

Pattern: Check what's available before running

bulkers crate list
bulkers crate inspect databio/pepatac
# Shows: samtools, bowtie2, trim_galore, bedtools, ...

Pattern: Multiple crates at once

# Activate multiple crates (commands from all are on PATH)
bulkers activate databio/pepatac,bulker/coreutils

# Exec with multiple crates
bulkers exec databio/pepatac,bulker/coreutils -- samtools --version

Pattern: Strict mode (only crate commands on PATH)

# No host commands leak into the environment
bulkers activate -s databio/pepatac
bulkers exec -s databio/pepatac -- samtools --version
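Strict mode can be pictured as a PATH that contains only the crate's bin directory. The sketch below assumes that model, with ls standing in for a command listed under host_commands. It runs inside a subshell so that the caller's PATH is unchanged.

```sh
#!/bin/sh
# Hypothetical sketch of strict mode: PATH contains only the crate's bin
# directory, so a host tool resolves only if the crate links it (for
# example through host_commands). A subshell keeps the caller's PATH intact.
crate_bin=$(mktemp -d)
ln -s "$(command -v ls)" "$crate_bin/ls"   # pretend ls is a listed host command

(
  PATH=$crate_bin                             # strict: nothing else on PATH
  if command -v ls >/dev/null; then echo "ls: linked into the crate"; fi
  if ! command -v cat >/dev/null; then echo "cat: not visible in strict mode"; fi
)
```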

10. Glossary

Term       Definition
---------  --------------------------------------------------------------
Crate      A loaded collection of containerized commands, identified by namespace/name:tag
Manifest   YAML file defining the commands in a crate, their Docker images, and configuration
Namespace  Organizational prefix (e.g., databio, bulker). Like a GitHub org.
Tag        Version identifier for a crate (e.g., 1.0.13, default)
Registry   HTTP endpoint serving manifest YAML files (default: hub.bulker.io)
Shimlink   A symlink to the bulkers binary. When invoked (e.g., as samtools), bulkers checks argv[0] and dispatches to the right container at runtime.
Activate   Auto-fetch a manifest, create shimlinks, and put them on your PATH. Works with registry crates and local files.
Exec       Run a single command in a crate's environment without modifying your shell