Error handling

Kopf tracks the status of the handlers (except for the low-level event handlers) catches the exceptions, and processes them from each of the handlers.

The last (or the final) exception is stored in the object’s status, and reported via the object’s events.

Note

Keep in mind, the Kubernetes events are often garbage-collected fast, e.g. less than 1 hour, so they are visible only soon after they are added. For persistence, the errors are also stored on the object’s status.

Temporary errors

If an exception raised inherits from kopf.TemporaryError, it will postpone the current handler for the next iteration, which can happen either immediately, or after some delay:

import kopf

@kopf.on.create('kopfexamples')
def create_fn(spec, **_):
    if not is_data_ready():
        raise kopf.TemporaryError("The data is not yet ready.", delay=60)

In that case, there is no need to sleep in the handler explicitly, thus blocking any other events, causes, and generally any other handlers on the same object from being handled (such as deletion or parallel handlers/sub-handlers).

Note

The multiple handlers and the sub-handlers are implemented via this kind of errors: if there are handlers left after the current cycle, a special retriable error is raised, which marks the current cycle as to be retried immediately, where it continues with the remaining handlers.

The only difference is that this special case produces fewer logs.

Permanent errors

If a raised exception inherits from kopf.PermanentError, the handler is considered as non-retriable and non-recoverable and completely failed.

Use this when the domain logic of the application means that there is no need to retry over time, as it will not become better:

import kopf

@kopf.on.create('kopfexamples')
def create_fn(spec, **_):
    valid_until = datetime.datetime.fromisoformat(spec['validUntil'])
    if valid_until <= datetime.datetime.now(datetime.timezone.utc):
        raise kopf.PermanentError("The object is not valid anymore.")

See also: Excluding handlers forever to prevent handlers from being invoked for the future change-sets even after the operator restarts.

Regular errors

Kopf assumes that any arbitrary errors (i.e. not kopf.TemporaryError and not kopf.PermanentError) are the environment’s issues and can self-resolve after some time.

As such, as default behaviour, Kopf retries the handlers with arbitrary errors infinitely until the handlers either succeed or fail permanently.

The reaction to the arbitrary errors can be configured:

import kopf

@kopf.on.create('kopfexamples', errors=kopf.ErrorsMode.PERMANENT)
def create_fn(spec, **_):
    raise Exception()

Possible values of errors are:

kopf.ErrorsMode.TEMPORARY (the default).
kopf.ErrorsMode.PERMANENT (prevent retries).
kopf.ErrorsMode.IGNORED (same as in the resource watching handlers).

Timeouts

The overall runtime of the handler can be limited:

import kopf

@kopf.on.create('kopfexamples', timeout=60*60)
def create_fn(spec, **_):
    raise kopf.TemporaryError(delay=60)

If the handler is not succeeded within this time, it is considered as fatally failed.

If the handler is an async coroutine and it is still running at the moment, an asyncio.TimeoutError is raised; there is no equivalent way of terminating the synchronous functions by force.

By default, there is no timeout, so the retries continue forever.

Retries

The number of retries can be limited too:

import kopf

@kopf.on.create('kopfexamples', retries=3)
def create_fn(spec, **_):
    raise Exception()

Once the number of retries is reached, the handler fails permanently.

By default, there is no limit, so the retries continue forever.

Backoff

The interval between retries on arbitrary errors, when an external environment is supposed to recover and be able to succeed the handler execution, can be configured:

import kopf

@kopf.on.create('kopfexamples', backoff=30)
def create_fn(spec, **_):
    raise Exception()

The default is 60 seconds.

Note

This only affects the arbitrary errors. When TemporaryError is explicitly used, the delay should be configured with delay=....