Version: Release 23.1

Troubleshoot Install or Upgrade on Kubernetes

Problem: The product is deployed with a wrong property (wrong version, host, etc.)

Solution:

1. With xl kube install and generated yaml files

It is recommended to first clean the existing deployment:

xl kube clean

For more information, see xl kube clean Command Reference.

Fix the incorrect value in the generated yaml files (for example, the Deploy/Release image version in the CR yaml file) and reinstall with:

xl kube install --files <reference to the generated yaml files>

For more information, see xl kube install Command Reference.
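A minimal sketch of the full sequence might look like this (the timestamped folder name is illustrative; use the folder of generated files from your own installation):

# remove the existing deployment
xl kube clean

# after correcting the wrong value in the generated CR yaml (for example, the image version),
# reinstall from the folder with the generated files
xl kube install --files 20220824-153907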

2. With xl kube install and generated answer file

If the product is deployed with a wrong property because of a mistake made while answering the questions in the xl kube install wizard, the simplest fix may be to change that value directly in the answer file.

It is recommended to first clean the existing deployment:

xl kube clean

For more information, see xl kube clean Command Reference.

For example, if you entered a wrong keystore passphrase, you can update its value in the answer file (the key is KeystorePassphrase).

Install the product using the updated answer file; new yaml files will be generated from it:

xl kube install --answers <path to the generated answer files>

For more information, see xl kube install Command Reference.
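For example, if the keystore passphrase was mistyped, the corrected fragment of the answers file might look like this (the value is illustrative, and the exact layout of the generated answers file may differ):

# answers.yaml (fragment) -- KeystorePassphrase holds the corrected keystore passphrase
KeystorePassphrase: "correct-passphrase"

xl kube install --answers ./answers.yaml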

3. With kubectl

  • To edit Digital.ai Release's custom resource (CR), use:

    kubectl edit digitalaireleases.xlr.digital.ai dai-xlr -n digitalai

  • To edit Digital.ai Deploy's custom resource (CR), use:

    kubectl edit digitalaideploys.xld.digital.ai dai-xld -n digitalai

Save and close the editor to apply the updated custom resource.
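Before editing, you can also inspect the current custom resource to locate the wrong value, for example for Release (resource and names as in the commands above):

kubectl get digitalaireleases.xlr.digital.ai dai-xlr -n digitalai -o yaml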


Problem: A wrong answer was entered during install/upgrade with the --dry-run flag active

Solutions:

If the --dry-run flag is used with the xl kube install/upgrade command

With the --dry-run flag, nothing is deployed, but the yaml files and the answers file are generated and kept on the file system. Change the wrong values in the generated files as explained in the previous section, and reinstall the product with xl kube install --files <reference to the generated yaml files>.

Another option is to run xl kube install --answers <path to the answers file> with the generated and corrected answers file. This command generates new yaml files and installs the product based on the updated answers file.
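A minimal sketch of the second option, assuming the answers file generated during the dry run was corrected in place (the file path is illustrative):

# generate yaml files and an answers file without deploying anything
xl kube install --dry-run

# after correcting the answers file, run the real installation;
# new yaml files are generated from the updated answers
xl kube install --answers ./answers.yaml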


Problem: Pod(s) are not running

Solution: Use the xl kube check command to check everything related to the operator and to store the resources' describe details, yaml, and logs. Investigate the collected data, or use the --zip-files flag to zip all collected files and check results and send the archive to your support team.
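For example, to collect everything and produce an archive for your support team (both the command and the flag are described above):

xl kube check --zip-files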


Problem: I upgraded the product but need to fix wrong values for a few properties (wrong version, image tag, and so on)

Solution: You can fix your product (Deploy or Release) in one of two ways: using the generated answers file or using the generated yaml files.

  1. Using the generated answers file:

    The yaml files contain more parameters than the answers file and can therefore be harder to edit. The answers file is usually the better option if you only want to change what you answered during the previous upgrade.

    Once you edit the answers file, repeat the upgrade (xl kube upgrade) with the updated answers file.

    xl kube upgrade --answers ./answers.yaml

    This generates new yaml files from the xl-op-blueprints.

  2. Using the generated YAML files:

    If you would like to repeat the upgrade with already generated yaml files, which you may have updated to fix some upgrade issues, run the xl kube install --files command with a reference to the generated YAML files from the upgrade you want to repeat.

    xl kube install --files 20220824-153907

Digital.ai Deploy—PostgreSQL Pod is Not Starting

Problem

  • After installation on the cluster, PostgreSQL might fail to start successfully.

  • The status of the PostgreSQL pod is CrashLoopBackOff.

  • The container log looks like the following:

    postgresql 09:41:26.12
    postgresql 09:41:26.12 Welcome to the Bitnami postgresql container
    postgresql 09:41:26.13 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql
    postgresql 09:41:26.13 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-postgresql/issues
    postgresql 09:41:26.13
    postgresql 09:41:26.15 INFO ==> ** Starting PostgreSQL setup **
    postgresql 09:41:26.17 INFO ==> Validating settings in POSTGRESQL_* env vars..
    postgresql 09:41:26.18 INFO ==> Loading custom pre-init scripts...
    postgresql 09:41:26.18 INFO ==> Initializing PostgreSQL database...
    postgresql 09:41:26.21 INFO ==> pg_hba.conf file not detected. Generating it...
    postgresql 09:41:26.22 INFO ==> Generating local authentication configuration

This usually indicates a problem with volume permissions on the mounted file system for the PostgreSQL volume.
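You can retrieve this log from the pod directly, for example (the pod name follows the CR name, as in the commands below; replace $NAMESPACE with the namespace of your installation):

$ kubectl logs -n $NAMESPACE dai-xld-postgresql-0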

Solution

If you are on AWS EKS with EFS storage class, see the next troubleshooting section.

For other clusters, try enabling the following property in the CR:

spec.postgresql.volumePermissions.enabled: true

That option attempts to fix the volume permissions. To apply the CR change, use:

$ kubectl patch -n $NAMESPACE --type=merge digitalaideploys.xld.digital.ai $CR_NAME -p '{"spec":{"postgresql":{"volumePermissions":{"enabled":"true"}}}}'

Replace $NAMESPACE with the namespace where Deploy is installed, and $CR_NAME with the correct Deploy CR name. You can check the CR name with:

$ kubectl get -n $NAMESPACE digitalaideploys.xld.digital.ai
NAME      AGE
dai-xld   13h

Then delete the existing PVC and the PostgreSQL pod with:

$ kubectl delete -n $NAMESPACE --wait=false pvc data-dai-xld-postgresql-0
$ kubectl delete -n $NAMESPACE pod dai-xld-postgresql-0

The exact resource names depend on the CR name.
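After the delete, the PVC and the pod should be recreated automatically with the corrected permissions; you can follow the progress with, for example:

$ kubectl get pods -n $NAMESPACE -w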


Digital.ai Deploy—PostgreSQL Pod is Not Starting on AWS EKS with EFS Storage Class

Problem

  • After installation on an AWS EKS cluster, PostgreSQL might fail to start successfully.

  • The status of the pod is CrashLoopBackOff.

  • The container log looks like the following:

    postgresql 09:41:26.12
    postgresql 09:41:26.12 Welcome to the Bitnami postgresql container
    postgresql 09:41:26.13 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql
    postgresql 09:41:26.13 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-postgresql/issues
    postgresql 09:41:26.13
    postgresql 09:41:26.15 INFO ==> ** Starting PostgreSQL setup **
    postgresql 09:41:26.17 INFO ==> Validating settings in POSTGRESQL_* env vars..
    postgresql 09:41:26.18 INFO ==> Loading custom pre-init scripts...
    postgresql 09:41:26.18 INFO ==> Initializing PostgreSQL database...
    postgresql 09:41:26.21 INFO ==> pg_hba.conf file not detected. Generating it...
    postgresql 09:41:26.22 INFO ==> Generating local authentication configuration

This usually indicates a problem with volume permissions on the mounted file system for the PostgreSQL volume.

Solution

One way to solve the problem is to use a different storage class for PostgreSQL, for example gp2.

If you are required to use the AWS EFS-based storage class, you can do the following (the example is based on the efs.csi.aws.com provisioner):

  1. Create a new AWS EKS storage class with different permissions. Create the following file, efs-postgresql-sc.yaml:

    kind: StorageClass
    apiVersion: storage.k8s.io/v1
    metadata:
      name: efs-postgresql
    provisioner: efs.csi.aws.com
    parameters:
      basePath: /dynamic_provisioning
      provisioningMode: efs-ap
      fileSystemId: fs-XXXXXXXX
      directoryPerms: "700"
      gid: "1001"
      uid: "1001"

Replace fs-XXXXXXXX with the correct EFS file system ID.

  2. Apply the defined storage class on your cluster:

    kubectl apply -f efs-postgresql-sc.yaml
  3. Repeat the installation of Digital.ai Deploy, using the PostgreSQL storage class you just created: efs-postgresql.
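You can verify that the new storage class is available before repeating the installation:

kubectl get storageclass efs-postgresql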


Digital.ai Deploy—RabbitMQ Pod is Not Starting on AWS EKS with EFS Storage Class

Problem

  • After installation on an AWS EKS cluster, RabbitMQ might fail to start successfully.

  • The status of the pod changes periodically between Init:Error and Init:CrashLoopBackOff.

  • In the volume-permissions container log, there is a line like the following:

    volume-permissions chown: changing ownership of '/bitnami/rabbitmq/mnesia': Operation not permitted

This usually indicates a problem with volume permissions on the mounted file system for the RabbitMQ volume.
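You can check that container log directly, for example (the pod name below assumes the default CR name dai-xld and may differ in your installation; replace $NAMESPACE with the namespace of your installation):

kubectl logs -n $NAMESPACE dai-xld-rabbitmq-0 -c volume-permissions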

Solution

One way to solve the problem is to use a different storage class for RabbitMQ, for example gp2.

If you are required to use the AWS EFS-based storage class, you can do the following (the example is based on the efs.csi.aws.com provisioner):

  1. Create a new AWS EKS storage class with different permissions. Create the following file, efs-rabbitmq-sc.yaml:

    kind: StorageClass
    apiVersion: storage.k8s.io/v1
    metadata:
      name: efs-rabbitmq
    provisioner: efs.csi.aws.com
    parameters:
      basePath: /dynamic_provisioning
      provisioningMode: efs-ap
      fileSystemId: fs-XXXXXXXX
      directoryPerms: "700"
      gid: "1001"
      uid: "1001"

Replace fs-XXXXXXXX with the correct EFS file system ID.

  2. Apply the defined storage class on your cluster:

    kubectl apply -f efs-rabbitmq-sc.yaml
  3. Repeat the installation of Digital.ai Deploy, using the RabbitMQ storage class you just created: efs-rabbitmq.


Digital.ai Release—PostgreSQL Pod is Not Starting

Problem

  • After installation on the cluster, PostgreSQL might fail to start successfully.

  • The status of the PostgreSQL pod is CrashLoopBackOff.

  • The container log looks like the following:

    postgresql 09:41:26.12
    postgresql 09:41:26.12 Welcome to the Bitnami postgresql container
    postgresql 09:41:26.13 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql
    postgresql 09:41:26.13 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-postgresql/issues
    postgresql 09:41:26.13
    postgresql 09:41:26.15 INFO ==> ** Starting PostgreSQL setup **
    postgresql 09:41:26.17 INFO ==> Validating settings in POSTGRESQL_* env vars..
    postgresql 09:41:26.18 INFO ==> Loading custom pre-init scripts...
    postgresql 09:41:26.18 INFO ==> Initializing PostgreSQL database...
    postgresql 09:41:26.21 INFO ==> pg_hba.conf file not detected. Generating it...
    postgresql 09:41:26.22 INFO ==> Generating local authentication configuration

This usually indicates a problem with volume permissions on the mounted file system for the PostgreSQL volume.

Solution

If you are on AWS EKS with EFS storage class, see the next troubleshooting section.

For other clusters, try enabling the following property in the CR:

spec.postgresql.volumePermissions.enabled: true

That option attempts to fix the volume permissions. To apply the CR change, use:

$ kubectl patch -n $NAMESPACE --type=merge digitalaireleases.xlr.digital.ai $CR_NAME -p '{"spec":{"postgresql":{"volumePermissions":{"enabled":"true"}}}}'

Replace $NAMESPACE with the namespace where Release is installed, and $CR_NAME with the correct Release CR name. You can check the CR name with:

$ kubectl get -n $NAMESPACE digitalaireleases.xlr.digital.ai
NAME      AGE
dai-xlr   13h

Then delete the existing PVC and the PostgreSQL pod with:

$ kubectl delete -n $NAMESPACE --wait=false pvc data-dai-xlr-postgresql-0
$ kubectl delete -n $NAMESPACE pod dai-xlr-postgresql-0

The exact resource names depend on the CR name.


Digital.ai Release—PostgreSQL Pod is Not Starting on AWS EKS with EFS Storage Class

Problem

  • After installation on an AWS EKS cluster, PostgreSQL might fail to start successfully.

  • The status of the pod is CrashLoopBackOff.

  • The container log looks like the following:

    postgresql 09:41:26.12
    postgresql 09:41:26.12 Welcome to the Bitnami postgresql container
    postgresql 09:41:26.13 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql
    postgresql 09:41:26.13 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-postgresql/issues
    postgresql 09:41:26.13
    postgresql 09:41:26.15 INFO ==> ** Starting PostgreSQL setup **
    postgresql 09:41:26.17 INFO ==> Validating settings in POSTGRESQL_* env vars..
    postgresql 09:41:26.18 INFO ==> Loading custom pre-init scripts...
    postgresql 09:41:26.18 INFO ==> Initializing PostgreSQL database...
    postgresql 09:41:26.21 INFO ==> pg_hba.conf file not detected. Generating it...
    postgresql 09:41:26.22 INFO ==> Generating local authentication configuration

This usually indicates a problem with volume permissions on the mounted file system for the PostgreSQL volume.

Solution

One way to solve the problem is to use a different storage class for PostgreSQL, for example gp2.

If you are required to use the AWS EFS-based storage class, you can do the following (the example is based on the efs.csi.aws.com provisioner):

  1. Create a new AWS EKS storage class with different permissions. Create the following file, efs-postgresql-sc.yaml:

    kind: StorageClass
    apiVersion: storage.k8s.io/v1
    metadata:
      name: efs-postgresql
    provisioner: efs.csi.aws.com
    parameters:
      basePath: /dynamic_provisioning
      provisioningMode: efs-ap
      fileSystemId: fs-XXXXXXXX
      directoryPerms: "700"
      gid: "1001"
      uid: "1001"

Replace fs-XXXXXXXX with the correct EFS file system ID.

  2. Apply the defined storage class on your cluster:

    kubectl apply -f efs-postgresql-sc.yaml
  3. Repeat the installation of Digital.ai Release, using the PostgreSQL storage class you just created: efs-postgresql.


Digital.ai Release—RabbitMQ Pod is Not Starting on AWS EKS with EFS Storage Class

Problem

  • After installation on an AWS EKS cluster, RabbitMQ might fail to start successfully.

  • The status of the pod changes periodically between Init:Error and Init:CrashLoopBackOff.

  • In the volume-permissions container log, there is a line like the following:

    volume-permissions chown: changing ownership of '/bitnami/rabbitmq/mnesia': Operation not permitted

This usually indicates a problem with volume permissions on the mounted file system for the RabbitMQ volume.

Solution

One way to solve the problem is to use a different storage class for RabbitMQ, for example gp2.

If you are required to use the AWS EFS-based storage class, you can do the following (the example is based on the efs.csi.aws.com provisioner):

  1. Create a new AWS EKS storage class with different permissions. Create the following file, efs-rabbitmq-sc.yaml:

    kind: StorageClass
    apiVersion: storage.k8s.io/v1
    metadata:
      name: efs-rabbitmq
    provisioner: efs.csi.aws.com
    parameters:
      basePath: /dynamic_provisioning
      provisioningMode: efs-ap
      fileSystemId: fs-XXXXXXXX
      directoryPerms: "700"
      gid: "1001"
      uid: "1001"

Replace fs-XXXXXXXX with the correct EFS file system ID.

  2. Apply the defined storage class on your cluster:

    kubectl apply -f efs-rabbitmq-sc.yaml
  3. Repeat the installation of Digital.ai Release, using the RabbitMQ storage class you just created: efs-rabbitmq.