How To

Deploy a new instance of an existing Dataflow job

In the following situations you will need to deploy a new instance of an already running Dataflow job:

  • The Dataflow template that the job uses has been updated, and we want to migrate to the new version.

  • We are making incompatible changes to the job settings that require a new instance to be started, because the existing one cannot be updated in place.

  • The Dataflow job has failed or got stuck, and we want to start a new one.

The Dataflow jobs YAML file (gcp/resources/dataflow_jobs.common.yaml) allows us to deploy multiple instances of the same job that run in parallel. Normally we run only one instance of a job, but in the cases above we may need to start a new instance first and only then stop the old one, so that we don’t have any downtime in message processing.

In the YAML file, each Dataflow resource has the following field:

jobs:
  - version: 1
    templateVersion: '1.0.0'

To start a new instance of a job, add another entry with a new version number (and change the templateVersion field if the template has changed), then deploy:

jobs:
  - version: 1
    templateVersion: '1.0.0'
  - version: 2
    templateVersion: '1.0.1'
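
The same edit can be sketched in Python on an in-memory copy of the jobs list. This is only an illustration: the function name add_job_version is hypothetical, and it assumes version numbers simply increase by one, which the example above suggests but does not state.

```python
def add_job_version(jobs, template_version=None):
    """Return a new jobs list with one extra entry appended.

    The new entry gets the next version number after the highest existing
    one. If template_version is not given, the template of the latest
    existing entry is reused (an assumption made for this sketch).
    """
    if not jobs:
        raise ValueError("expected at least one existing job entry")
    latest = max(jobs, key=lambda job: job["version"])
    new_job = {
        "version": latest["version"] + 1,
        "templateVersion": template_version or latest["templateVersion"],
    }
    # Build a new list so the caller's original list is left unchanged.
    return jobs + [new_job]


jobs = [{"version": 1, "templateVersion": "1.0.0"}]
print(add_job_version(jobs, "1.0.1"))
```

Both entries stay in the list at this stage, so both instances run in parallel until the old one is removed.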

After the new instance has been successfully deployed, remove the old version and deploy again:

jobs:
  - version: 2
    templateVersion: '1.0.1'
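
To see why this takes two deployments, the sketch below compares the jobs list before and after each one. It assumes that a deployment starts an instance for every version added to the list and stops the instance for every version removed from it. The actual deployment tooling is not described here, and the helper diff_versions is hypothetical.

```python
def diff_versions(before, after):
    """Return (started, stopped) version numbers between two jobs lists."""
    old = {job["version"] for job in before}
    new = {job["version"] for job in after}
    return sorted(new - old), sorted(old - new)


# The three states of the jobs list shown above.
initial = [{"version": 1, "templateVersion": "1.0.0"}]
both = initial + [{"version": 2, "templateVersion": "1.0.1"}]
final = [{"version": 2, "templateVersion": "1.0.1"}]

for before, after in [(initial, both), (both, final)]:
    started, stopped = diff_versions(before, after)
    print(f"start {started}, stop {stopped}")
# prints:
# start [2], stop []
# start [], stop [1]
```

Under that assumption, no single deployment both starts and stops an instance, so at least one instance is running at every point of the rollover.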