How To
Deploy a new instance of an existing Dataflow job
In the following situations you will need to deploy a new instance of an already running Dataflow job:
- The Dataflow template that the job uses has been updated, and we want to migrate to the new version.
- We are making incompatible changes to the job settings that require a new instance to be started, because the existing one cannot be updated in place.
- The Dataflow job has failed or become stuck, and we want to start a new one.
The Dataflow jobs YAML file (gcp/resources/dataflow_jobs.common.yaml) lets us deploy multiple instances of the same job that run in parallel. Normally we run only one instance of a job, but in the cases above we may need to start a new instance first and then stop the old one, so that message processing has no downtime.
In the YAML file, each Dataflow resource has the following field:
jobs:
  - version: 1
    templateVersion: '1.0.0'
To start a new instance of a job, add another job version (optionally changing the templateVersion field if the template has changed) and deploy it:
jobs:
  - version: 1
    templateVersion: '1.0.0'
  - version: 2
    templateVersion: '1.0.1'
After the new version has been deployed successfully, remove the old version and deploy again:
jobs:
  - version: 2
    templateVersion: '1.0.1'
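
The two-step rollover above can also be sketched in Python, operating on the jobs list as plain data. This is only an illustration: the helper names add_job_version and remove_job_version are hypothetical and not part of any existing tooling, the sketch assumes that a new version number is the highest existing one plus one (as in the example above), and the check that refuses to remove the last instance is an added safeguard, not a documented rule. Loading and writing the YAML file and the deployment itself are left out.

```python
from copy import deepcopy


def add_job_version(jobs, template_version=None):
    """Return a new jobs list with one more instance appended.

    Hypothetical helper: the new entry gets the next version number
    (assumed to be highest existing version + 1). templateVersion is
    carried over from the newest entry unless a new one is given.
    """
    if not jobs:
        raise ValueError("expected at least one existing job version")
    latest = max(jobs, key=lambda job: job["version"])
    new_job = deepcopy(latest)
    new_job["version"] = latest["version"] + 1
    if template_version is not None:
        new_job["templateVersion"] = template_version
    return jobs + [new_job]


def remove_job_version(jobs, version):
    """Return a new jobs list without the given version (hypothetical helper)."""
    remaining = [job for job in jobs if job["version"] != version]
    if len(remaining) == len(jobs):
        raise ValueError(f"version {version} not found")
    if not remaining:
        # Safeguard added for this sketch: never leave the job with no instance.
        raise ValueError("refusing to remove the last running instance")
    return remaining


# Starting point: the single running instance.
jobs = [{"version": 1, "templateVersion": "1.0.0"}]

# Step 1: add version 2 next to version 1, then deploy.
step1 = add_job_version(jobs, template_version="1.0.1")

# Step 2: once version 2 is deployed, drop version 1 and deploy again.
step2 = remove_job_version(step1, version=1)

print(step1)
print(step2)
```

Keeping the two steps as separate functions mirrors the two separate deployments: the old instance is removed only after the new one is confirmed to be running.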