Learn by examples

All examples are available in the GitHub repository here. For the annotations used in the examples, see the annotations section.

Example 1: Basic Pipeline

File: example_01.yaml

This is a simple pipeline definition that uses the autoprovisioning mechanism for documentstores.

This definition directs the operator to autoprovision the documentstore.
It directs the operator to preserve the documentstore when the pipeline is deleted.
It defines a pipeline called query that runs queries against the documentstore.
It defines a pipeline called indexing that reads files and indexes them in the documentstore.

This pipeline is used in our quickstart guide.

Example 2: Haystack Version

File: example_02.yaml

This example shows how to define the version of Haystack to be used in the pipeline.

Set the version in the spec.version field. The operator downloads the specified version of Haystack and runs the pipeline with it.

We recommend using the latest stable version of Haystack.

Examples 3 & 4: Resource Requests

File: example_03.yaml
File: example_04.yaml

This example shows how to specify resource requests for the pipeline.

...
    'compute.pipelines.baler.gatecastle.com/request-cpu': '600m'
    'compute.pipelines.baler.gatecastle.com/request-memory': '2Gi'
    'compute.pipelines.baler.gatecastle.com/request-gpu': '0'
...

Only request a GPU if your cluster has GPU nodes available; otherwise the pipeline pods stay Pending because no node can satisfy the request.
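The request values use Kubernetes resource quantity notation: 600m is 0.6 of a CPU core and 2Gi is 2 * 1024^3 bytes. As a sketch for checking what a value means, the hypothetical helper parse_quantity below (not part of the operator) converts the common suffixes to exact numbers. It does not validate every rule of the Kubernetes quantity grammar.

```python
from fractions import Fraction

# Binary (power-of-two) suffixes, checked first so that "Mi" is not read as "M".
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60}
# Decimal (SI) suffixes; "m" (milli) is what CPU requests such as "600m" use.
DECIMAL_SUFFIXES = {"m": Fraction(1, 1000), "k": 10**3, "M": 10**6, "G": 10**9,
                    "T": 10**12, "P": 10**15, "E": 10**18}


def parse_quantity(text: str) -> Fraction:
    """Convert a Kubernetes quantity such as '600m' or '2Gi' to an exact number.

    Hypothetical helper for illustration only.
    """
    text = text.strip()
    for table in (BINARY_SUFFIXES, DECIMAL_SUFFIXES):
        for suffix, factor in table.items():
            if text.endswith(suffix):
                return Fraction(text[: -len(suffix)]) * factor
    return Fraction(text)  # no suffix: a plain number of cores or bytes


annotations = {
    "compute.pipelines.baler.gatecastle.com/request-cpu": "600m",
    "compute.pipelines.baler.gatecastle.com/request-memory": "2Gi",
}
cpu = parse_quantity(annotations["compute.pipelines.baler.gatecastle.com/request-cpu"])
memory = parse_quantity(annotations["compute.pipelines.baler.gatecastle.com/request-memory"])
print(float(cpu))   # 0.6 (cores)
print(int(memory))  # 2147483648 (bytes)
```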

Examples 5 & 6: Node Selectors

File: example_05.yaml
File: example_06.yaml

This example shows how to specify node selectors for the pipeline. The operator ensures that the pipeline pods are scheduled only on nodes whose labels match the node selector.

See more about node selectors here.

...
'compute.pipelines.baler.gatecastle.com/node-selector': 'gpu:true'
...

The above example assumes that you want to schedule the pipeline on a node that has the label gpu set to true.

You can specify multiple node selectors by separating them with a semicolon.

...
'compute.pipelines.baler.gatecastle.com/node-selector': 'gpu:true;otherlabel:true'
...
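To see how the annotation string corresponds to a pod's nodeSelector, here is a minimal sketch. parse_node_selector is hypothetical, not the operator's code, and it assumes exactly the format shown above: pairs separated by ";" and each pair written as label:value.

```python
from __future__ import annotations


def parse_node_selector(value: str) -> dict[str, str]:
    """Turn 'gpu:true;otherlabel:true' into a pod nodeSelector mapping.

    Hypothetical helper: assumes ';' between pairs and 'label:value' within a pair.
    """
    selector: dict[str, str] = {}
    for pair in value.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # ignore empty entries, e.g. from a trailing ';'
        label, sep, label_value = pair.partition(":")
        if not sep or not label:
            raise ValueError(f"expected 'label:value', got {pair!r}")
        selector[label] = label_value
    return selector


print(parse_node_selector("gpu:true;otherlabel:true"))
# {'gpu': 'true', 'otherlabel': 'true'}
```

A pod with this nodeSelector is scheduled only on nodes that carry every listed label. To label a node accordingly, you can run kubectl label nodes <node-name> gpu=true.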

Example 7: Tolerations

File: example_07.yaml

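This example shows how to specify tolerations for the pipeline. The operator adds the tolerations to the pipeline pods so they can be scheduled on nodes with matching taints. A toleration only allows scheduling on a tainted node; it does not force it, so combine it with a node selector if the pipeline must run on those nodes.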

See more about tolerations here.

...
'compute.pipelines.baler.gatecastle.com/tolerations': 'gpu:Equal:true:NoSchedule'
...

The above example assumes that the target node has a taint with key gpu, value true, and effect NoSchedule. The Equal operator means the taint value must match true exactly.

You can specify multiple tolerations by separating them with a semicolon.

...
'compute.pipelines.baler.gatecastle.com/tolerations': 'gpu:Equal:true:NoSchedule;otherlabel:Equal:true:NoSchedule'
...
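The toleration string appears to encode the four fields of a Kubernetes toleration in the order key:operator:value:effect. A minimal sketch of that mapping follows. parse_tolerations is hypothetical, the field order is inferred from the example above, and, for simplicity, it rejects an empty effect even though Kubernetes accepts one (meaning "match all effects").

```python
from __future__ import annotations

VALID_OPERATORS = {"Equal", "Exists"}
VALID_EFFECTS = {"NoSchedule", "PreferNoSchedule", "NoExecute"}


def parse_tolerations(value: str) -> list[dict[str, str]]:
    """Turn 'gpu:Equal:true:NoSchedule;...' into Kubernetes toleration objects.

    Hypothetical helper: assumes ';' between entries and key:operator:value:effect.
    """
    tolerations = []
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # ignore empty entries, e.g. from a trailing ';'
        parts = entry.split(":")
        if len(parts) != 4:
            raise ValueError(f"expected 'key:operator:value:effect', got {entry!r}")
        key, operator, tol_value, effect = parts
        if operator not in VALID_OPERATORS:
            raise ValueError(f"unknown operator {operator!r}")
        if effect not in VALID_EFFECTS:
            raise ValueError(f"unknown effect {effect!r}")
        if operator == "Exists" and tol_value:
            raise ValueError("operator 'Exists' must not have a value")
        tolerations.append({"key": key, "operator": operator, "value": tol_value, "effect": effect})
    return tolerations


print(parse_tolerations("gpu:Equal:true:NoSchedule"))
# [{'key': 'gpu', 'operator': 'Equal', 'value': 'true', 'effect': 'NoSchedule'}]
```

A node carrying the matching taint can be created with kubectl taint nodes <node-name> gpu=true:NoSchedule.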

Example 8: Passing Environment Variables

You can pass environment variables to the pipeline by using the env field in spec.

File: example_08.yaml

...
  env:
    - name: OPENAPI_KEY
      value: 'your_openai_key'
...

The operator passes these environment variables to your Haystack pods, so they are available to your pipeline at runtime.
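Inside the pod, these are ordinary process environment variables. A minimal sketch of reading one from custom pipeline code follows; require_env is a hypothetical helper, not part of Haystack or the operator.

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable passed in through spec.env, or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; check the env field in the pipeline spec")
    return value


# Inside your pipeline code, for the variable declared in example_08.yaml:
# api_key = require_env("OPENAPI_KEY")
```

Failing early with a named variable makes a missing spec.env entry easier to spot than an authentication error later on.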

Example 9: Service Accounts

File: example_09.yaml

You can set the service account name with an annotation on the pipeline. The operator makes sure that all pods started as part of the pipeline use the specified service account.

...
    'compute.pipelines.baler.gatecastle.com/service-account': 'example-09'
...

A minimal service account manifest could look like this:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: example-09

Example 10: Reference Secrets

File: example_10.yaml

You can reference Kubernetes secrets in the pipeline definition, which keeps the secret values themselves out of the definition.

Reference secrets the same way you would in a pod definition: in the spec.components list, under a component's params field, use the following syntax:

...
  components:
    - name: DocumentStore
      type: ElasticsearchDocumentStore
      params:
        host:
          valueFrom:
            secretKeyRef:
              name: elasticsearch
              key: host
        port: 9200
        embedding_dim: 384
...

The above statement references the secret elasticsearch and uses its key host as the value of the host parameter. The value is read by your pipeline at runtime, using the pipeline's service account, so you can apply least privilege.
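To illustrate the mechanism, here is a sketch of how a secretKeyRef inside params could be resolved. resolve_params is hypothetical, not the operator's actual code, and the secret contents are made up. It assumes secrets are supplied as the Secret's data field as returned by the Kubernetes API, where values are base64-encoded.

```python
from __future__ import annotations

import base64
from typing import Any


def resolve_params(params: dict[str, Any], secrets: dict[str, dict[str, str]]) -> dict[str, Any]:
    """Replace each valueFrom.secretKeyRef entry in params with the decoded secret value.

    `secrets` maps a Secret name to its `data` field (base64-encoded values).
    """
    resolved: dict[str, Any] = {}
    for name, value in params.items():
        ref = None
        if isinstance(value, dict):
            ref = value.get("valueFrom", {}).get("secretKeyRef")
        if ref is None:
            resolved[name] = value  # plain values such as port: 9200 pass through
            continue
        encoded = secrets[ref["name"]][ref["key"]]
        # Secret data is base64-encoded (not encrypted), so decoding yields the plain value.
        resolved[name] = base64.b64decode(encoded).decode("utf-8")
    return resolved


params = {
    "host": {"valueFrom": {"secretKeyRef": {"name": "elasticsearch", "key": "host"}}},
    "port": 9200,
    "embedding_dim": 384,
}
# Made-up Secret contents for illustration.
secrets = {"elasticsearch": {"host": base64.b64encode(b"elasticsearch.example.svc").decode()}}
print(resolve_params(params, secrets))
# {'host': 'elasticsearch.example.svc', 'port': 9200, 'embedding_dim': 384}
```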

Make sure that you use a service account on your pipeline that can read the required secrets!
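One way to grant that access is a namespaced Role and RoleBinding. The sketch below builds them with the standard Kubernetes RBAC API (rbac.authorization.k8s.io/v1) and prints a List that kubectl apply -f - accepts. The helper name, the default namespace, and the assumption that get on the named secrets is sufficient are all illustrative; add verbs only if your setup needs them.

```python
from __future__ import annotations

import json


def secret_reader_rbac(service_account: str, namespace: str, secret_names: list[str]) -> list[dict]:
    """Build a Role and RoleBinding that let one service account read specific secrets."""
    role_name = f"{service_account}-secret-reader"
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": role_name, "namespace": namespace},
        "rules": [{
            "apiGroups": [""],              # "" is the core API group, where Secrets live
            "resources": ["secrets"],
            "resourceNames": secret_names,  # least privilege: only these secrets
            "verbs": ["get"],
        }],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": role_name, "namespace": namespace},
        "subjects": [{"kind": "ServiceAccount", "name": service_account, "namespace": namespace}],
        "roleRef": {"apiGroup": "rbac.authorization.k8s.io", "kind": "Role", "name": role_name},
    }
    return [role, binding]


if __name__ == "__main__":
    # "default" is an assumed namespace; use the one your pipeline runs in.
    items = secret_reader_rbac("example-09", "default", ["elasticsearch"])
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

For example, python rbac.py | kubectl apply -f - would create both objects.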

Read more about Kubernetes secrets here.

This lets DevOps teams manage secrets securely while developers use them in their pipelines without ever knowing the secret values.

Example 11: Image Pull Secrets

File: example_11.yaml

If you pull images from a private registry, you can specify image pull secrets for the pipeline.

...
    'compute.pipelines.baler.gatecastle.com/image-pull-secrets': 'dockerhub'
...

The above annotation will make sure that the pipeline can pull images using the secret dockerhub.

Please note that this secret must be available in the namespace where the pipeline is running and your service account must have the required permissions to use the secret.
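The dockerhub secret is a regular Kubernetes Secret of type kubernetes.io/dockerconfigjson. As a sketch, the hypothetical helper below builds one; the namespace and credentials are placeholders. The same secret can also be created with kubectl create secret docker-registry dockerhub --docker-server=https://index.docker.io/v1/ --docker-username=<user> --docker-password=<token>.

```python
import base64
import json


def docker_registry_secret(name: str, namespace: str, server: str,
                           username: str, password: str) -> dict:
    """Build a kubernetes.io/dockerconfigjson Secret for pulling from a private registry."""
    # The "auth" field is base64("username:password"), as in Docker's config.json.
    auth = base64.b64encode(f"{username}:{password}".encode()).decode()
    docker_config = {"auths": {server: {"username": username, "password": password, "auth": auth}}}
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "kubernetes.io/dockerconfigjson",
        # Secret data values must themselves be base64-encoded.
        "data": {".dockerconfigjson": base64.b64encode(json.dumps(docker_config).encode()).decode()},
    }


if __name__ == "__main__":
    # Placeholder credentials and namespace: replace them, and never commit real ones.
    secret = docker_registry_secret("dockerhub", "default", "https://index.docker.io/v1/",
                                    "my-user", "my-access-token")
    print(json.dumps(secret, indent=2))
```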