Common problems

This document describes how to identify and resolve common Capact problems that might occur.

Action

In this section, you can find common Action failures that might occur.

Action does not have status

Symptoms:

Debugging steps:
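
If you need a quick way to look at the raw status, the following sketch can help. The {ACTION_NAME} and {ACTION_NAMESPACE} placeholders are illustrative, and an empty output only suggests that the Engine has not reported a status yet:

# Print the status subresource of the Action; no output means no status has been set
kubectl get actions.core.capact.io {ACTION_NAME} -n {ACTION_NAMESPACE} -o jsonpath='{.status}'

# If the status is missing, make sure the Capact Engine Pod is up and running
kubectl get pods -n capact-system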

Action stuck in the BeingRendered phase

Rendering a more complex workflow may take a few minutes. An Action staying in the BeingRendered phase for more than 15 minutes may mean that it is stuck.

Symptoms:

  • An Action was created more than 15 minutes ago. To check the AGE column, run:

    kubectl get actions.core.capact.io {ACTION_NAME} -n {ACTION_NAMESPACE}

Debugging steps:
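
As a starting point, you can describe the Action to see the conditions reported by the Engine, and then check the Engine logs for rendering errors. This is a hedged sketch; the Engine Deployment name below is a placeholder, so look it up first:

# Inspect the Action conditions and the message reported by the Engine
kubectl describe actions.core.capact.io {ACTION_NAME} -n {ACTION_NAMESPACE}

# Find the Engine Deployment and follow its logs while rendering is in progress
kubectl get deployments -n capact-system
kubectl logs -n capact-system deployment/{ENGINE_DEPLOYMENT_NAME} --tail=200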

Action in the Failed phase

An Action may fail for a variety of reasons. The first thing you need to do is check the status message.

Debugging steps:

  • Check the Action status message. If the status message contains while fetching latest Interface revision string: cannot find the latest revision for Interface "cap.interfac.db.install" (giving up - exceeded 15 retries), it means that the Interface path referenced by the Action cannot be found in the Hub. Verify that the Interface path is spelled correctly and exists in the Public Hub.

  • Check the Engine logs. You can grep the logs using the Action name, which narrows down the number of log entries (see the example after this list). A common problem is that the Engine doesn't have proper permissions to schedule the Action execution, e.g. it cannot create a ServiceAccount, Secret, or Argo Workflow. Ensure that the k8s-engine-role ClusterRole in the capact-system Namespace has all the necessary permissions.

  • Check the Action execution.
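
The following sketch shows how these checks can be run from the command line. The status field layout may differ between Capact versions, and the Engine Deployment name is a placeholder to be filled in from kubectl get deployments -n capact-system:

# Read the status message reported on the failed Action
kubectl get actions.core.capact.io {ACTION_NAME} -n {ACTION_NAMESPACE} -o jsonpath='{.status.message}'

# Grep the Engine logs for entries related to this Action only
kubectl logs -n capact-system deployment/{ENGINE_DEPLOYMENT_NAME} | grep {ACTION_NAME}

# Check the Action execution, which runs as an Argo Workflow with the same name
kubectl get workflows.argoproj.io {ACTION_NAME} -n {ACTION_NAMESPACE}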

Clean up Action execution pods

After an Action execution, there are many Pods with the name pattern {ACTION_NAME}-{RANDOM_10_DIGITS} in the Completed state.

NAME                      READY   STATUS      RESTARTS   AGE
mattermost-1602179194     0/2     Completed   0          14d
mattermost-2270774275     0/2     Completed   0          14d
mattermost-823541112      0/2     Completed   0          14d
mattermost-470211537      0/2     Completed   0          14d
mattermost-1030672350     0/2     Completed   0          14d
mattermost-147207013      0/2     Completed   0          14d
mattermost-2768336525     0/2     Completed   0          14d
mattermost-3634435893     0/2     Completed   0          14d
mattermost-4236050029     0/2     Completed   0          14d
mattermost-2282111071     0/2     Completed   0          14d
mattermost-3762917690     0/2     Completed   0          14d
mattermost-4129897782     0/2     Completed   0          14d
mattermost-1307838837     0/2     Completed   0          14d
mattermost-2309417707     0/2     Completed   0          14d
mattermost-1619688498-1   1/1     Running     0          12d
mattermost-1619688498-0   1/1     Running     0          12d

Those Pods were created by the Argo Workflow, and each of them represents an executed Action step, e.g. create a database, create a user in the database, etc. For failed Actions, they are useful for debugging the root cause of an error. For a successfully executed Action, you can remove them. To remove only the Argo Workflow Pods, run:

kubectl delete workflows.argoproj.io {ACTION_NAME} -n {ACTION_NAMESPACE}

To remove Action and all resources associated with it (Argo Workflow Pods, ServiceAccount, user input data etc.), run:

capact action delete {ACTION_NAME} -n {ACTION_NAMESPACE}
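
To confirm that the execution Pods are gone after either command, you can list the Pods matching the Action name. This is only an illustrative check:

# No output means that all {ACTION_NAME}-* execution Pods have been removed
kubectl get pods -n {ACTION_NAMESPACE} | grep {ACTION_NAME}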

Wrong Implementation was selected

Actions may define their dependencies via Interfaces. Depending on the cluster Policy configuration, every time a user runs an Action, a different Implementation may be picked for a given Interface.

Symptoms:

  • The rendered Action workflow contains an Implementation which should not be used.

  • The executed Action created resources in an unexpected destination. For example, PostgreSQL was deployed on the cluster instead of provisioning an RDS instance on AWS.

Debugging steps:
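
One way to start is to inspect the rendered workflow stored in the Action status, where the selected Implementations are visible, and then review the Policy which influenced the selection. The exact status field layout may differ between Capact versions:

# Print the whole Action, including the rendered workflow in the status
kubectl get actions.core.capact.io {ACTION_NAME} -n {ACTION_NAMESPACE} -o yaml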

Namespace stuck in the Terminating state

If the Namespace has been marked for deletion and the Capact components were removed beforehand, the Namespace may become stuck in the Terminating state. This typically happens because the Capact Engine can no longer execute its clean-up logic and remove the finalizer from the Action resource. To resolve it, remove the finalizer from the Action manually:

kubectl patch actions.core.capact.io {ACTION_NAME} -n {ACTION_NAMESPACE} -p '{"metadata":{"finalizers":null}}' --type=merge
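
Before patching, you can list the Actions remaining in the stuck Namespace and confirm that they still carry finalizers. This is only an illustrative check:

# Print each remaining Action together with its finalizers
kubectl get actions.core.capact.io -n {ACTION_NAMESPACE} -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.finalizers}{"\n"}{end}'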

Unreachable Gateway

The Gateway aggregates GraphQL APIs from the Capact Engine, Public Hub, and Local Hub. If one of the aggregated components is not working properly, the Gateway does not work properly either.

Symptoms:

  • Gateway responds with the 502 status code.

  • Gateway logs contain a message similar to: while introspecting GraphQL schemas: while introspecting schemas with retry: while introspecting schemas: invalid character 'l' looking for beginning of value.

  • Gateway Pod is frequently restarting.

Debugging steps:
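
A reasonable starting point is to check the Gateway Pod itself and the health of the components it aggregates. This is a hedged sketch; the Pod name below is a placeholder, so look it up first:

# Find the Gateway Pod and check its restart count and recent events
kubectl get pods -n capact-system
kubectl describe pod {GATEWAY_POD_NAME} -n capact-system

# Inspect the Gateway logs; use --previous if the Pod keeps restarting
kubectl logs {GATEWAY_POD_NAME} -n capact-system --previous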