Troubleshooting
Here you can find solutions to the most common errors.
Follow the order
Order is very important. If you install OSC and insert an empty or incomplete INITDATA (for example, an empty Trustee URL), the CoCo pod will not start: it will try to connect to a non-existent service, get a 404, and therefore never be created.
In general, the order is:
- Install & Configure Trustee. This lets us know the Trustee route to provide to OSC, and also prepares the secrets needed by the CoCo pods (image signature, sealed secrets, and so on).
- Install & Configure OSC. Now that we know the Trustee route, we have all the pieces necessary to start a CoCo pod.
It is possible to follow alternative orders, but you then need to remember to go back to the various configs, update them, and restart the respective deployments.
CoCo pod doesn’t start
In this section, we cover the cases where the pod doesn’t start at all. No CVM is created in Azure, and no container is started.
Check the Pod Events
Most of the time, errors will be propagated to the pod events. Make sure to check those first.
Image pull fail
If the CVM doesn’t have enough disk space (set with ROOT_VOLUME_SIZE in peer-pods-cm), image pulling might fail: the CoCo components pull the container image inside the CVM, and without enough space they cannot unpack it.
Usually the error is visible in the pod events:
Error: CreateContainer failed: rpc status: Status { code: INTERNAL, message: "[CDH] [ERROR]: Image Pull error: Failed to pull image from all mirror/mapping locations or original location: image: , error: Errors happened when pulling image: Failed to decode layer data stream: Failed to unpack layer"
To solve this issue, increase ROOT_VOLUME_SIZE and restart the OSC deployment.
Wrong Azure instance size
Not all regions support all the available confidential instance sizes. For example, eastus only supports Standard_DC*as_v5, not Standard_DC*es_v5.
If you use the wrong instance size, you will see the following error in the pod events:
failed to create pod sandbox: rpc error: code = Unknown desc = CreateContainer failed:
remote hypervisor call failed: rpc error: code = Unknown desc = creating an instance :
Creating instance (): beginning VM creation or update:
PUT https://management.azure.com/subscriptions/.../
--------------------------------------------------------------------------------
RESPONSE 400: 400 Bad Request ERROR CODE: InvalidParameter
--------------------------------------------------------------------------------
{ "error": { "code": "InvalidParameter", "message": "The requested VM size Standard_DC4es_v5
is not available in the current region. The sizes available in the current region are: ...
The key message here is "The requested VM size Standard_DC4es_v5 is not available in the current region". Make sure you are using a VM size that is supported in your region.
If you update AZURE_INSTANCE_SIZE in peer-pods-cm, make sure you restart the OSC deployment.
CoCo pod stuck
In this section, we cover the cases where the pod or at least the VM has started, but the container fails to start.
Peer-pods gateway not working/set correctly
This usually happens because the CoCo internal components cannot connect back to the worker node.
Upon inspecting the OSC osc-caa-ds daemonset, you should see it is stuck printing the following logs:
oc logs -n openshift-sandboxed-containers-operator ds/osc-caa-ds
2025/12/05 16:47:39 [adaptor/proxy] Retrying failed agent proxy connection: dial tcp 10.0.128.4:15150: connect: connection refused
2025/12/05 16:47:39 [adaptor/proxy] Retrying failed agent proxy connection: dial tcp 10.0.128.4:15150: connect: connection refused
2025/12/05 16:47:39 [adaptor/proxy] Retrying failed agent proxy connection: dial tcp 10.0.128.4:15150: connect: connection refused
2025/12/05 16:47:39 [adaptor/proxy] Retrying failed agent proxy connection: dial tcp 10.0.128.4:15150: connect: connection refused
2025/12/05 16:47:39 [adaptor/proxy] Retrying failed agent proxy connection: dial tcp 10.0.128.4:15150: connect: connection refused
2025/12/05 16:47:39 [adaptor/proxy] Retrying failed agent proxy connection: dial tcp 10.0.128.4:15150: connect: connection refused
In order to fix this, make sure you have:
- Logged in with the right Azure SP:

AZ_CID=$(oc get secrets/azure-credentials -n kube-system -o json | jq -r .data.azure_client_id | base64 -d)
AZ_CS=$(oc get secrets/azure-credentials -n kube-system -o json | jq -r .data.azure_client_secret | base64 -d)
AZ_TID=$(oc get secrets/azure-credentials -n kube-system -o json | jq -r .data.azure_tenant_id | base64 -d)
echo azure_client_id $AZ_CID
echo azure_client_secret $AZ_CS
echo azure_tenant_id $AZ_TID
az login --service-principal -u $AZ_CID -p $AZ_CS --tenant $AZ_TID

- Created and configured the Azure public IP address and NAT Gateway.
INITDATA wrongly set
Inspect the INITDATA field in the peer-pods-cm configmap in the OSC namespace or, if you are providing it manually, in the pod annotation:
echo "H4sIAAAAAAAAA+1YW3PiuBJ+969wsQ/ZLYaLuTNV8yCMIYTYgG0gZCqVErZsZBvLSDLGnDr//cghk8qZ..." \
| base64 -d | gunzip
- Make sure all url= fields point to the right Trustee route.
- Make sure cert= and kbs_cert= are not empty if using HTTPS.
- Make sure image_security_policy_uri points at an existing policy, and the policy points to an existing secret.
- Make sure the formatting and syntax are correct. Refer to this initdata for an example with correct syntax.
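If you are unsure whether your blob is in the expected format (gzip-compressed, then base64-encoded), you can sanity-check the round trip locally. The initdata content below is a hypothetical minimal fragment; a real one is much larger.

```shell
# Hypothetical minimal initdata fragment, just to exercise the encoding.
printf 'algorithm = "sha384"\nversion = "0.1.0"\n' > /tmp/initdata.toml

# Encode the way the annotation/configmap expects it: gzip, then base64 without wrapping.
BLOB=$(gzip -c /tmp/initdata.toml | base64 -w0)

# Decoding must give back the original TOML; if gunzip errors out, the blob is malformed.
echo "$BLOB" | base64 -d | gunzip
```

If this round trip fails on your real blob, the problem is the encoding itself rather than the INITDATA content.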
If you update the INITDATA, remember to update the PCR8 in the reference values and restart the Trustee deployment! Also if INITDATA is added/updated in peer-pods-cm, remember to restart the OSC deployment!
Image signature policy too strict
If you set an image signature policy that is too strict and forget about it, the CVM will start but the pod will not run, because its image is not allowed.
For example, the policy used in this example is pretty permissive, as it has "default": [ { "type": "insecureAcceptAnything" } ]. However, if you change the type to reject, you drastically reduce the number of CoCo pod images you can run.
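As an illustration (the registry namespace and file path here are hypothetical), this is the difference between a permissive default and a reject rule in a containers policy file:

```shell
# Permissive default, but one hypothetical registry namespace is rejected outright.
cat > /tmp/policy.json <<'EOF'
{
  "default": [ { "type": "insecureAcceptAnything" } ],
  "transports": {
    "docker": {
      "quay.io/untrusted-org": [ { "type": "reject" } ]
    }
  }
}
EOF

# A "default": [ { "type": "reject" } ] instead would block every image
# that no transport-specific rule explicitly accepts.
jq -r '.default[0].type' /tmp/policy.json
```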
If you update the signature policy, remember to restart the Trustee deployment!
CoCo pod attestation failed
In this section, we cover the cases where the pod has fully started but cannot get any secret from the Trustee.
Do you see any requests coming into the trustee-deployment logs?
POD_NAME=$(oc get pods -n trustee-operator-system -l app=kbs -o jsonpath='{.items[0].metadata.name}')
echo ""
oc logs -n trustee-operator-system "$POD_NAME"
HTTPS certificates
If not, then it’s a routing issue. If you are using passthrough, are you using the right https certificates?
A common error is not pointing dnsNames to the Trustee route, or if creating the certs manually, not using subjectAltName=DNS:$ROUTE.
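For example, when creating the certs manually, a self-signed certificate for a hypothetical route host could be generated and verified like this (the route name is an assumption; substitute your actual Trustee route):

```shell
# Hypothetical Trustee route host; substitute the output of `oc get route`.
ROUTE=kbs-trustee-operator-system.apps.example.com

# Self-signed cert whose SAN matches the route (requires OpenSSL 1.1.1+ for -addext).
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -keyout /tmp/tls.key -out /tmp/tls.crt \
  -subj "/CN=${ROUTE}" \
  -addext "subjectAltName=DNS:${ROUTE}"

# Verify the SAN actually made it into the certificate.
openssl x509 -in /tmp/tls.crt -noout -ext subjectAltName
```

If the last command prints nothing or the wrong host, clients validating the route hostname will reject the connection.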
If you update the https certificates, remember to restart the Trustee deployment!
Connectivity between Azure and Trustee
If you are deploying Trustee in a separate cluster, make sure the connectivity between the Trustee and Azure OCP cluster is working.
Policy Deny
If you do see the request coming into the Trustee pod, check what it says. Usually it’s something like PolicyDeny.
If you see that error, make sure the following is correct:
- The attestation policy is formatted correctly. Wrong formatting or syntax errors sometimes prevent it from being loaded.
- The same applies to the reference values.
- If you changed the podvm image to be used, the reference values also change!
- If you changed the INITDATA, the reference values (PCR8) also change!
- Lastly, also check the resource access policy.
After ANY change to any config/secret of the Trustee operator, remember to restart the deployment!
Wrong reference values
In this workshop, the reference values are taken from the latest OSC available image, as we assume the operator is freshly installed. However, if OSC is not the latest version, or has not been updated, latest might not match the deployed OSC image.
If this happens, then the reference values will not match, as the images are different, and therefore some PCRs will differ.
The easiest way to be sure the image is correct is to use the tag matching the operator version. For example, if you installed OSC 1.11.1, look for the same tag in the osc-dm-verity-image container repo.
You can see all available tags with the following command:
podman search --list-tags registry.redhat.io/openshift-sandboxed-containers/osc-dm-verity-image --authfile ./cluster-pull-secret.json
Once you have found the right tag, replace OSC_VERSION with it and repeat the reference-values creation step.
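The matching itself is just an exact string comparison between the installed operator version and the tag list. The tag list below is hypothetical; in practice it comes from the podman search command:

```shell
# Hypothetical tag list, standing in for the `podman search --list-tags` output.
TAGS='1.10.2
1.11.0
1.11.1
latest'

OSC_VERSION=1.11.1

# Pick the tag that exactly matches the installed operator version.
echo "$TAGS" | grep -Fx "$OSC_VERSION"
```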
Warning: it could be tempting to get the right image by asking the OSC operator directly, as the image is stored in the CSV. However, this is highly unsafe, especially in production environments! The OSC operator runs in the untrusted cluster, therefore querying anything from it is against the CoCo threat model. The CSV could easily be compromised and point to a malicious image, leading to Trustee downloading and using reference values that successfully attest it, causing a compromised image to access Trustee secrets.
Sealed secret contains gibberish
If you mounted a sealed secret, but the application reads something like secret=sealed.fakejwsheader… it simply means that attestation failed. What you are reading is the original "pointer" value inside the secret; since attestation failed, it was never replaced by the actual secret content. Check the connectivity with Trustee and its logs.
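To see what such a pointer looks like, you can build one locally. All values here are hypothetical, assuming the CoCo sealed-secret JWS-like layout of dot-separated segments with a base64 payload:

```shell
# Hypothetical payload: the KBS resource URI the pointer refers to.
PAYLOAD=$(printf '{"version":"0.1.0","type":"vault","name":"kbs:///default/mysecret/key"}' | base64 -w0)
SEALED="sealed.fakejwsheader.${PAYLOAD}.fakesignature"

# If your application reads something shaped like $SEALED, attestation failed:
# the pointer was never replaced with the real secret content.
echo "$SEALED" | cut -d. -f3 | base64 -d
```

Decoding the payload segment at least tells you which KBS resource the pod was trying to unseal.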
CoCo pod inaccessible
Everything apparently works, but you cannot see any logs or access the pod.
INITDATA too strict
If your pod is running and the service/route are working, but you can neither view logs nor exec into the pod, remember that this is controlled by INITDATA. If no INITDATA is provided on the CoCo pod via annotation, the global default in peer-pods-cm is used. If no global default is set, permissive defaults apply (no image signature verification, no Trustee connection, exec and logs allowed).
Check the policy.rego section.
- If default ExecProcessRequest := false, exec will be forbidden. It is possible to allow only specific exec commands; refer to the initdata example.
- If default ReadStreamRequest := false, logs are not shown.
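A minimal agent policy fragment combining the two settings might look like this. This is only a sketch: it shows the two defaults named above, and any further rules you add depend on the agent policy version in use.

```shell
# Sketch of a kata agent policy: exec forbidden, logs allowed.
cat > /tmp/policy.rego <<'EOF'
package agent_policy

# Forbid exec into the pod...
default ExecProcessRequest := false

# ...but keep log streaming (oc logs) working.
default ReadStreamRequest := true
EOF

grep '^default' /tmp/policy.rego   # shows the two defaults
```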
If you update the INITDATA, remember to update the PCR8 in the reference values and restart the Trustee deployment! Also if INITDATA is added/updated in peer-pods-cm, remember to restart the OSC deployment!