How-To: Implementing InitContainers for Dependency Validation
Introduction
This guide provides step-by-step instructions for implementing initContainers in your Helm charts to validate service dependencies before your application starts. It assumes you have read the explanation documentation and understand the basic concepts of initContainers and their purpose.
Prerequisites
- Access to your service's Helm chart repository: Permissions to create branches and merge requests
- Basic familiarity: Understanding of YAML syntax, Helm templating, and Kubernetes pod lifecycle
- Explanation Document: Read "Understanding InitContainers for Dependency Validation" first
- kubectl access: Ability to view pods and logs in your target cluster (dev/staging/prod)
How to Add InitContainers to Your Service
This workflow adds initContainer support to your Helm chart's deployment template.
Step 1: Update deployment.yaml Template
Navigate to your service's <appname>-deploy repository and edit the chart/templates/deployment.yaml template to include an initContainers section.
File Location: ./chart/templates/deployment.yaml
Add the initContainers templating block in the pod spec, before the containers section:
# ... existing deployment.yaml template ...
spec:
  template:
    spec:
      ## INIT CONTAINERS SECTION ##
      {{- if .Values.initContainers }}
      initContainers:
        {{- toYaml .Values.initContainers | nindent 8 }}
      {{- end }}
      ## END INIT CONTAINERS SECTION ##
      containers:
        - name: {{ .Chart.Name }}
          image: "{{ .Values.app.image.name }}"
          imagePullPolicy: "{{ .Values.app.image.pullPolicy }}"
          volumeMounts:
            - name: env-creds
              mountPath: "/app/config/creds.{{ .Values.app.env }}.json"
              readOnly: true
              subPath: "creds.{{ .Values.app.env }}.json"
            - mountPath: /app/pems/
              name: env-creds
          ports:
            - name: http
              containerPort: {{ .Values.service.port }}
              protocol: TCP
          env:
            - name: DEPLOY_ENV
              value: "{{ .Values.app.env }}"
            - name: NODE_ENV
              value: "{{ .Values.app.env }}"
            - name: GRAYLOG_HOST
              value: logs.wwnorton.com
            - name: GRAYLOG_PORT
              value: '12201'
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
          readinessProbe:
            {{- toYaml .Values.readinessProbe | nindent 12 }}
# ... rest of deployment.yaml template ...
Key points:
- The {{- if .Values.initContainers }} conditional ensures initContainers are only added when defined in values.yaml
- The {{- toYaml .Values.initContainers | nindent 8 }} expression renders the complete initContainer array from values.yaml
- InitContainers are placed before the containers section in the pod spec
- The nindent 8 call ensures proper YAML indentation (8 spaces for pod spec level)
Step 2: Commit and Test the Template Change
Commit the deployment.yaml change to your feature branch:
git add chart/templates/deployment.yaml
git commit -m "feat: Add initContainers template support"
git push origin feature/add-init-containers
Verification: At this point, your service should deploy exactly as before, because no initContainers are defined yet. The new template only takes effect once the change is merged to the main branch.
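The template change can also be rendered locally before it is merged. The sketch below assumes Helm 3 is installed, that it is run from the repository root, and that the flux values file can be passed directly to the chart. The release name my-service and the function names render_deployment and has_init_containers are placeholders for illustration, not part of the chart.

```sh
# Render only the deployment template for a given values file.
# Usage: render_deployment <chart-dir> <values-file>
render_deployment() {
  helm template my-service "$1" -f "$2" --show-only templates/deployment.yaml
}

# Succeed only if the rendered deployment contains an initContainers key.
# Usage: has_init_containers <chart-dir> <values-file>
has_init_containers() {
  render_deployment "$1" "$2" | grep -q '^ *initContainers:'
}
```

For example, `has_init_containers ./chart ./flux/dev/values.yaml && echo "initContainers rendered"` should print nothing at this point and should print the message once initContainers are defined in the values file.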
Step 3: Define InitContainers in values.yaml
Now you can define the actual initContainer configurations in your flux/<environment>/values.yaml files. See the following sections for specific examples.
How to Add a Database Dependency Check
Use this pattern when your service requires a database (PostgreSQL, MySQL, etc.) to be available before starting.
Step 1: Choose Your Database Check Image
For PostgreSQL:
image: postgres:<version>
For MySQL:
image: mysql:<version>
For MongoDB:
image: mongo:<version>
Step 2: Add Database InitContainer to values.yaml
File Location: ./flux/dev/values.yaml (or your environment-specific values file)
# ... existing configuration ...
initContainers:
  - name: wait-for-postgres
    image: postgres:13
    env:
      - name: PGHOST
        value: "<AWS_RDS_HOST>"
      - name: PGPORT
        value: "5432"
      - name: PGUSER
        valueFrom:
          secretKeyRef:
            name: my-service-db-creds
            key: db_username
      - name: PGPASSWORD
        valueFrom:
          secretKeyRef:
            name: my-service-db-creds
            key: db_password
    command: ["sh", "-c"]
    args:
      - |
        echo "Waiting for PostgreSQL at $PGHOST:$PGPORT..."
        MAX_RETRIES=10
        RETRY=0
        WAIT=2
        until pg_isready -h "$PGHOST" -p "$PGPORT" -U "$PGUSER"; do
          if [ $RETRY -ge $MAX_RETRIES ]; then
            echo "ERROR: PostgreSQL not ready after $MAX_RETRIES attempts"
            exit 1
          fi
          echo "Attempt $((RETRY + 1))/$MAX_RETRIES failed, waiting ${WAIT}s..."
          sleep $WAIT
          RETRY=$((RETRY + 1))
          WAIT=$((WAIT * 2))
          # Cap exponential backoff at 5 minutes
          if [ $WAIT -gt 300 ]; then
            WAIT=300
          fi
        done
        echo "SUCCESS: PostgreSQL is ready!"
    resources:
      requests:
        cpu: 50m
        memory: 64Mi
      limits:
        cpu: 100m
        memory: 128Mi
# ... rest of configuration ...
Step 3: Update Secret References
Ensure your database credentials secret exists and contains the required keys. With our AWS Secrets Manager and CSI driver setup, this means adding the database credentials to chart/templates/secretproviderclass.yaml:
# In chart/templates/secretproviderclass.yaml
spec:
  parameters:
    objects: |
      - objectName: "{{ .Values.app.env }}/{{ .Values.app.namespace }}/json/my-service"
        objectType: "secretsmanager"
        objectAlias: "creds.{{ .Values.app.env }}.json"
      ## NEW DB CREDENTIALS SECTION ##
      - objectName: "{{ .Values.app.env }}/{{ .Values.app.namespace }}/env/db_username"
        objectType: "secretsmanager"
        objectAlias: "db_username"
      - objectName: "{{ .Values.app.env }}/{{ .Values.app.namespace }}/env/db_password"
        objectType: "secretsmanager"
        objectAlias: "db_password"
      ## END NEW DB CREDENTIALS SECTION ##
  secretObjects:
    - data:
        - key: creds.{{ .Values.app.env }}.json
          objectName: creds.{{ .Values.app.env }}.json
      secretName: my-service-creds
      type: Opaque
    ## NEW DB CREDENTIALS SECTION ##
    - data:
        - key: db_username
          objectName: db_username
        - key: db_password
          objectName: db_password
      secretName: my-service-db-creds
      type: Opaque
    ## END NEW DB CREDENTIALS SECTION ##
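Once the change is deployed, the synced Kubernetes Secret can be checked for the keys the initContainer references. This is a minimal sketch that assumes kubectl access to the namespace; secret_has_key is a hypothetical helper name, and the jsonpath lookup only works for keys without dots, such as db_username and db_password.

```sh
# Succeed only if the named Secret holds a non-empty value for the key.
# Usage: secret_has_key <secret-name> <namespace> <key>
secret_has_key() {
  # .data.<key> is the base64-encoded value; it is captured, never printed.
  value=$(kubectl get secret "$1" -n "$2" -o "jsonpath={.data.$3}") || return 1
  [ -n "$value" ]
}
```

A typical call would be `secret_has_key my-service-db-creds <app-namespace> db_username || echo "db_username missing"`. With the Secrets Store CSI driver, the synced Secret generally exists only once a pod that mounts the volume has started, so a missing Secret before the first rollout is not necessarily an error.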
Step 4: Deploy and Verify
Create a new MR for the changes and merge it to the main branch. This will trigger a deployment of the service to the cluster.
Verify initContainer is running:
# Watch pod startup
kubectl get pods -n <app-namespace> -w
# You should see something like:
# NAME             READY   STATUS            RESTARTS   AGE
# my-service-xxx   0/1     Init:0/1          0          5s
# my-service-xxx   0/1     Init:0/1          0          7s
# my-service-xxx   0/1     PodInitializing   0          12s
# my-service-xxx   1/1     Running           0          15s
Check initContainer logs:
kubectl logs my-service-xxx -n <app-namespace> -c wait-for-postgres
You should see output like:
Waiting for PostgreSQL at <AWS_RDS_HOST>:5432...
Attempt 1/10 failed, waiting 2s...
Attempt 2/10 failed, waiting 4s...
SUCCESS: PostgreSQL is ready!
How to Add an HTTP API Dependency Check
Use this pattern when your service depends on another microservice or HTTP API being available.
Step 1: Add HTTP API InitContainer to values.yaml
File Location: ./flux/dev/values.yaml
initContainers:
  - name: wait-for-auth-api
    image: curlimages/curl:latest
    command: ["sh", "-c"]
    args:
      - |
        echo "Waiting for Auth API at http://services.dev.wwnorton.net/nortonauth/api/v1/token/jwks..."
        MAX_RETRIES=8
        RETRY=0
        WAIT=2
        URL="http://services.dev.wwnorton.net/nortonauth/api/v1/token/jwks"
        until curl -f --connect-timeout 10 --max-time 30 "$URL"; do
          if [ $RETRY -ge $MAX_RETRIES ]; then
            echo "ERROR: Auth API not ready after $MAX_RETRIES attempts"
            exit 1
          fi
          echo "Attempt $((RETRY + 1))/$MAX_RETRIES failed, waiting ${WAIT}s..."
          sleep $WAIT
          RETRY=$((RETRY + 1))
          WAIT=$((WAIT * 2))
          if [ $WAIT -gt 300 ]; then
            WAIT=300
          fi
        done
        echo "SUCCESS: Auth API is ready!"
    resources:
      requests:
        cpu: 50m
        memory: 32Mi
      limits:
        cpu: 100m
        memory: 64Mi
Key parameters explained:
- curl -f: Fail on HTTP error responses (status 400 and above) instead of treating them as success
- --connect-timeout 10: Wait at most 10 seconds to establish the connection
- --max-time 30: Total timeout for the entire request (30 seconds)
- MAX_RETRIES=8: Retry up to 8 times after the first attempt before giving up
- Exponential backoff: 2s → 4s → 8s → 16s → 32s → 64s → 128s → 256s
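The backoff sequence can be reproduced with a few lines of shell arithmetic, which helps when choosing MAX_RETRIES and the cap. This sketch assumes the loop structure used in the script above, where the wait doubles after each failed attempt and is capped. The function names backoff_schedule and backoff_total are hypothetical helpers, not part of the chart.

```sh
# Print the sleep durations (in seconds) used between failed attempts.
# Usage: backoff_schedule <max_retries> <initial_wait> <cap>
backoff_schedule() (
  retries=$1
  delay=$2
  cap=$3
  schedule=""
  i=0
  while [ "$i" -lt "$retries" ]; do
    schedule="$schedule $delay"
    delay=$((delay * 2))
    if [ "$delay" -gt "$cap" ]; then
      delay=$cap
    fi
    i=$((i + 1))
  done
  # Unquoted on purpose so the leading space is dropped.
  echo $schedule
)

# Sum the schedule to get the total time spent sleeping.
# Usage: backoff_total <max_retries> <initial_wait> <cap>
backoff_total() (
  total=0
  for s in $(backoff_schedule "$@"); do
    total=$((total + s))
  done
  echo "$total"
)

backoff_schedule 8 2 300   # prints: 2 4 8 16 32 64 128 256
backoff_total 8 2 300      # prints: 510
```

With these settings the sleeps alone add up to 510 seconds (8.5 minutes), on top of up to 30 seconds per attempt from --max-time.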
Step 2: Deploy and Verify
Create a new MR for the changes and merge it to the main branch. This will trigger a deployment of the service to the cluster.
# Check pod startup
kubectl get pods -n <app-namespace> -w
# You should see something like:
# NAME             READY   STATUS            RESTARTS   AGE
# my-service-xxx   0/1     Init:0/1          0          5s
# my-service-xxx   0/1     Init:0/1          0          7s
# my-service-xxx   0/1     PodInitializing   0          12s
# my-service-xxx   1/1     Running           0          15s
Check initContainer logs:
kubectl logs my-service-xxx -n <app-namespace> -c wait-for-auth-api
You should see output like:
Waiting for Auth API at http://services.dev.wwnorton.net/nortonauth/api/v1/token/jwks...
Attempt 1/8 failed, waiting 2s...
Attempt 2/8 failed, waiting 4s...
SUCCESS: Auth API is ready!
How to Add Multiple Dependencies
Services often depend on multiple resources (database + multiple APIs). Define multiple initContainers to check each dependency sequentially.
Complete Example: Service with Database and Two APIs
File Location: ./flux/dev/values.yaml
initContainers:
  # 1. Check database first (foundational dependency)
  - name: wait-for-postgres
    image: postgres:13
    env:
      - name: PGHOST
        value: "<AWS_RDS_HOST>"
      - name: PGPORT
        value: "5432"
      - name: PGUSER
        valueFrom:
          secretKeyRef:
            name: my-service-db-creds
            key: db_username
      - name: PGPASSWORD
        valueFrom:
          secretKeyRef:
            name: my-service-db-creds
            key: db_password
    command: ["sh", "-c"]
    args:
      - |
        echo "Checking PostgreSQL..."
        MAX_RETRIES=10
        RETRY=0
        WAIT=2
        until pg_isready -h "$PGHOST" -p "$PGPORT" -U "$PGUSER"; do
          if [ $RETRY -ge $MAX_RETRIES ]; then
            echo "ERROR: PostgreSQL not ready"
            exit 1
          fi
          echo "Retry $((RETRY + 1))/$MAX_RETRIES..."
          sleep $WAIT
          RETRY=$((RETRY + 1))
          WAIT=$((WAIT * 2))
          [ $WAIT -gt 300 ] && WAIT=300
        done
        echo "PostgreSQL ready"
    resources:
      requests:
        cpu: 50m
        memory: 64Mi
      limits:
        cpu: 100m
        memory: 128Mi
  # 2. Check auth service (depends on database)
  - name: wait-for-auth-api
    image: curlimages/curl:latest
    command: ["sh", "-c"]
    args:
      - |
        echo "Checking Auth API..."
        MAX_RETRIES=8
        RETRY=0
        WAIT=2
        URL="http://services.dev.wwnorton.net/nortonauth/api/v1/token/jwks"
        until curl -f --connect-timeout 10 --max-time 30 "$URL"; do
          if [ $RETRY -ge $MAX_RETRIES ]; then
            echo "ERROR: Auth API not ready"
            exit 1
          fi
          echo "Retry $((RETRY + 1))/$MAX_RETRIES..."
          sleep $WAIT
          RETRY=$((RETRY + 1))
          WAIT=$((WAIT * 2))
          [ $WAIT -gt 300 ] && WAIT=300
        done
        echo "Auth API ready"
    resources:
      requests:
        cpu: 50m
        memory: 32Mi
      limits:
        cpu: 100m
        memory: 64Mi
  # 3. Check entitlement service (higher-level dependency)
  - name: wait-for-entitlement-api
    image: curlimages/curl:latest
    command: ["sh", "-c"]
    args:
      - |
        echo "Checking Entitlement API..."
        MAX_RETRIES=8
        RETRY=0
        WAIT=2
        URL="http://services.dev.wwnorton.net/entitlement/entitlement/status"
        until curl -f --connect-timeout 10 --max-time 30 "$URL"; do
          if [ $RETRY -ge $MAX_RETRIES ]; then
            echo "ERROR: Entitlement API not ready"
            exit 1
          fi
          echo "Retry $((RETRY + 1))/$MAX_RETRIES..."
          sleep $WAIT
          RETRY=$((RETRY + 1))
          WAIT=$((WAIT * 2))
          [ $WAIT -gt 300 ] && WAIT=300
        done
        echo "Entitlement API ready"
    resources:
      requests:
        cpu: 50m
        memory: 32Mi
      limits:
        cpu: 100m
        memory: 64Mi
Ordering rationale:
- Database first: Most fundamental dependency
- Auth API second: Depends on database, needed by higher-level services
- Entitlement API last: Depends on both database and auth
What happens during startup:
Pod Created → Init:0/3 (checking database)
↓ (database ready in 5s)
Init:1/3 (checking auth API)
↓ (auth ready in 3s)
Init:2/3 (checking entitlement API)
↓ (entitlement ready in 2s)
PodInitializing (application containers starting)
↓
Running (application containers running)
Total init time: ~10-15 seconds (sum of all checks if all are successful)
How to Configure Environment-Specific Timeouts
Different environments have different performance characteristics, so configure timeouts per environment. The table below summarizes the recommended settings; an example configuration for each environment follows.
| Setting | Development | Staging | Production |
|---|---|---|---|
| Strategy | Fail fast | Balanced | Conservative |
| Max Retries | 3 | 5 | 10 |
| Timeout per attempt | 10s | 30s | 60s |
| Backoff | Fixed 2s | Exponential | Exponential (capped 5min) |
| Max total wait | ~30s | ~3min | ~10min |
Development
File Location: ./flux/dev/values.yaml
Optimized for fast feedback — fails quickly so developers aren't waiting during iteration.
initContainers:
  - name: wait-for-postgres
    image: postgres:13
    env:
      - name: PGHOST
        value: <AWS_RDS_HOST>
      - name: PGPORT
        value: "5432"
      - name: MAX_RETRIES
        value: "3" # Fail fast in dev
      - name: TIMEOUT
        value: "10" # Short timeout
    command: ["sh", "-c"]
    args:
      - |
        echo "Development environment - using fast failure mode"
        RETRY=0
        until timeout $TIMEOUT pg_isready -h "$PGHOST" -p "$PGPORT"; do
          if [ $RETRY -ge $MAX_RETRIES ]; then
            echo "Failed quickly for fast dev feedback"
            exit 1
          fi
          echo "Retry $((RETRY + 1))/$MAX_RETRIES..."
          sleep 2
          RETRY=$((RETRY + 1))
        done
        echo "Database ready"
    resources:
      requests:
        cpu: 50m
        memory: 64Mi
      limits:
        cpu: 100m
        memory: 128Mi
Staging
File Location: ./flux/stg/values.yaml
Production-like behavior with moderate tolerance — balances reliability with reasonable wait times.
initContainers:
  - name: wait-for-postgres
    image: postgres:14
    env:
      - name: PGHOST
        value: <AWS_RDS_HOST>
      - name: PGPORT
        value: "5432"
      - name: MAX_RETRIES
        value: "5" # Moderate retries
      - name: TIMEOUT
        value: "30" # Moderate timeout
    command: ["sh", "-c"]
    args:
      - |
        echo "Staging environment - using balanced configuration"
        RETRY=0
        WAIT=2
        until timeout $TIMEOUT pg_isready -h "$PGHOST" -p "$PGPORT"; do
          if [ $RETRY -ge $MAX_RETRIES ]; then
            echo "Failed after $MAX_RETRIES attempts"
            exit 1
          fi
          echo "Retry $((RETRY + 1))/$MAX_RETRIES, waiting ${WAIT}s..."
          sleep $WAIT
          RETRY=$((RETRY + 1))
          WAIT=$((WAIT * 2))
        done
        echo "Database ready"
    resources:
      requests:
        cpu: 50m
        memory: 64Mi
      limits:
        cpu: 100m
        memory: 128Mi
Production
File Location: ./flux/production/values.yaml
Conservative configuration — maximizes reliability with generous timeouts and capped exponential backoff.
initContainers:
  - name: wait-for-postgres
    image: postgres:14
    env:
      - name: PGHOST
        value: <AWS_RDS_HOST>
      - name: PGPORT
        value: "5432"
      - name: MAX_RETRIES
        value: "10" # Many retries for reliability
      - name: TIMEOUT
        value: "60" # Generous timeout
    command: ["sh", "-c"]
    args:
      - |
        echo "Production environment - using conservative configuration"
        RETRY=0
        WAIT=2
        until timeout $TIMEOUT pg_isready -h "$PGHOST" -p "$PGPORT"; do
          if [ $RETRY -ge $MAX_RETRIES ]; then
            echo "CRITICAL: Database unavailable after $MAX_RETRIES attempts"
            exit 1
          fi
          echo "Attempt $((RETRY + 1))/$MAX_RETRIES, waiting ${WAIT}s..."
          sleep $WAIT
          RETRY=$((RETRY + 1))
          WAIT=$((WAIT * 2))
          [ $WAIT -gt 300 ] && WAIT=300 # Cap at 5 minutes
        done
        echo "Database ready"
    resources:
      requests:
        cpu: 50m
        memory: 64Mi
      limits:
        cpu: 100m
        memory: 128Mi
How to Test InitContainers Locally
Before deploying to a given environment, you can test initContainer scripts locally or in isolation.
Test your script logic using Docker:
# For PostgreSQL check
docker run --rm -it postgres:14 sh -c '
  export PGHOST=<AWS_RDS_HOST>
  export PGPORT=5432
  MAX_RETRIES=3
  RETRY=0
  until pg_isready -h "$PGHOST" -p "$PGPORT"; do
    if [ $RETRY -ge $MAX_RETRIES ]; then
      echo "Failed"
      exit 1
    fi
    echo "Retry $((RETRY + 1))..."
    sleep 2
    RETRY=$((RETRY + 1))
  done
  echo "Success"
'
# For HTTP API check
docker run --rm -it curlimages/curl sh -c '
  URL="http://<API_HOST>/health"
  MAX_RETRIES=3
  RETRY=0
  until curl -f "$URL"; do
    if [ $RETRY -ge $MAX_RETRIES ]; then
      echo "Failed"
      exit 1
    fi
    echo "Retry $((RETRY + 1))..."
    sleep 2
    RETRY=$((RETRY + 1))
  done
  echo "Success"
'
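The retry logic itself can also be exercised without Docker or a reachable dependency by swapping the real check for a stand-in command. In this sketch, retry_until and flaky_check are hypothetical helper names, and SLEEP_CMD is an assumed knob that lets a test skip the real delays. The loop mirrors the structure of the scripts above but is not a copy of any deployed script.

```sh
# Run a check command until it succeeds or the retries are exhausted.
# Usage: retry_until <max_retries> <command> [args...]
retry_until() (
  max_retries=$1
  shift
  retry=0
  until "$@"; do
    if [ "$retry" -ge "$max_retries" ]; then
      echo "Failed"
      exit 1   # exits the function's subshell, so the function returns 1
    fi
    echo "Retry $((retry + 1))..."
    ${SLEEP_CMD:-sleep} 2
    retry=$((retry + 1))
  done
  echo "Success"
)

# Stand-in for pg_isready or curl: fails until it has been called
# <threshold> times, tracking the call count in <counter-file>.
# Usage: flaky_check <counter-file> <threshold>
flaky_check() {
  count=$(cat "$1" 2>/dev/null)
  count=$(( ${count:-0} + 1 ))
  echo "$count" > "$1"
  [ "$count" -ge "$2" ]
}
```

For example, `state=$(mktemp); SLEEP_CMD=true retry_until 3 flaky_check "$state" 3` prints two retry lines followed by "Success", while `SLEEP_CMD=true retry_until 1 false` prints "Retry 1..." and "Failed" and returns a non-zero status.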
Troubleshooting Common Issues
Issue: Pod Stuck in Init:0/N Status
Symptoms:
- Pod shows Init:0/N for an extended period (>5 minutes)
- First initContainer not completing
- No progress in startup
Diagnosis:
# Check initContainer logs
kubectl logs <pod-name> -n <namespace> -c <init-container-name>
# Check pod events
kubectl describe pod <pod-name> -n <namespace>
# Look for:
# - Image pull errors
# - Connection timeouts
# - Script errors
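When a pod has several initContainers, a compact per-container summary can be quicker to read than the full describe output. This is a minimal sketch that assumes kubectl access; init_status is a hypothetical helper name, and the fields come from the pod's status.initContainerStatuses list.

```sh
# Print name, ready flag and restart count for each init container of a pod,
# one tab-separated line per container.
# Usage: init_status <pod-name> <namespace>
init_status() {
  kubectl get pod "$1" -n "$2" \
    -o jsonpath='{range .status.initContainerStatuses[*]}{.name}{"\t"}{.ready}{"\t"}{.restartCount}{"\n"}{end}'
}
```

The first container listed with ready set to false is usually the one to inspect with kubectl logs.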
Common causes:
- Dependency actually unavailable
  # Verify dependency service is running
  kubectl get pods -n <dependency-namespace>
  # Test connectivity from a debug pod
  kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- \
    curl http://dependency-service/health
- Credential issues (for database checks)
  # Verify the database credentials secret exists and has the correct keys
  kubectl get secret my-service-db-creds -n <namespace>
  kubectl describe secret my-service-db-creds -n <namespace>
Resolution:
- Fix the underlying dependency issue
- Check credentials in secrets
- Consider temporarily increasing MAX_RETRIES while debugging
Issue: InitContainer CrashLoopBackOff
Symptoms:
- Pod shows Init:Error or Init:CrashLoopBackOff
- RESTARTS count increasing
- Pod keeps restarting
Diagnosis:
# View logs from failed initContainer
kubectl logs <pod-name> -n <namespace> -c <init-container-name> --previous
# Check exit code
kubectl describe pod <pod-name> -n <namespace>
# Look for: Last State: Terminated, Exit Code: <number>
Common causes:
- Script syntax error
  - Look for shell script errors in logs
  - Test script locally with Docker
- Wrong image
  # Make sure you're using the correct image for the check type
  # PostgreSQL: postgres:14
  # MySQL: mysql:8
  # HTTP checks: curlimages/curl:latest
- Missing environment variables
  - Verify all required env vars are defined
  - Check for typos in variable names
Resolution:
- Fix script syntax errors
- Use correct container image
- Ensure all environment variables are properly defined
Issue: Multiple Pods Stuck Initializing
Symptoms:
- Many pods stuck in Init phase simultaneously
- Deployment rollout stuck
- Cluster-wide startup issues
Diagnosis:
# Check how many pods are stuck
kubectl get pods --all-namespaces | grep Init
# Check cluster resource availability
kubectl top nodes
kubectl describe nodes | grep -A 5 "Allocated resources"
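To see which namespaces are affected, the pod list can be grouped by namespace. The sketch below is one possible approach, assuming the default column layout of kubectl get pods --all-namespaces (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE); count_init_by_namespace is a hypothetical helper name. It counts only statuses that start with Init, such as Init:0/3 or Init:CrashLoopBackOff, and so leaves out PodInitializing.

```sh
# Count pods whose STATUS column starts with "Init", grouped by namespace.
# Reads `kubectl get pods --all-namespaces --no-headers` output on stdin.
count_init_by_namespace() {
  awk '$4 ~ /^Init/ { n[$1]++ } END { for (ns in n) print ns, n[ns] }'
}
```

A typical invocation would be `kubectl get pods --all-namespaces --no-headers | count_init_by_namespace`. A single namespace dominating the output points at one service, while many namespaces at once suggests a shared dependency or cluster capacity.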
Common causes:
- Cluster capacity issues
  - Too many pods initializing at once
  - Not enough CPU/memory for init containers
- Shared dependency overloaded
  - Database connection pool exhausted
  - API rate limits hit
Resolution:
- Stagger deployments across services
- Increase resource limits on shared dependencies
Monitoring
Once your initContainers are deployed, you should verify their behavior using the platform observability tools. We have configured specific dashboards and log queries to help you distinguish between a slow startup and a broken deployment.
Using the InitContainers Dashboard
Access the Kubernetes / Compute Resources / InitContainers dashboard in Grafana. This dashboard provides a real-time view of dependency validation across the cluster.
Key Panels to Watch:
| Panel Title | Description | Thresholds | Action Required |
|---|---|---|---|
| Pods Initializing | The volume of pods currently running init scripts. | > 10 turns Red | If high during a deployment, check if a shared dependency (like a DB) is overwhelmed. |
| Stuck Pods (>5min) | Pods that have been in the Init phase for over 300 seconds. | > 1 turns Red | Immediate investigation needed. The init script is likely hanging or hitting a timeout cap. |
| InitContainer Restart Counts | How many times the init container has failed and retried. | > 5 (Orange) and > 10 (Red) | Indicates the check is failing repeatedly. Check logs for connection errors. |
Querying Logs with Loki
If the dashboard shows high restart counts or stuck pods, you need to inspect the logs of the specific initContainer.
Note that standard log views often default to the main application container; you must explicitly select the initContainer name.
You can use the Explore view in Grafana with the Loki datasource. Use the following query structure to filter specifically for your validation script:
{k8s_namespace_name="<your-namespace>", k8s_deployment_name="<your-app>", k8s_container_name="<init-container-name>"}
Example for an Auth Service dependency check: To view the logs for the wait-for-auth-api container in the entitlement-service-app deployment:
{k8s_namespace_name="nas", k8s_deployment_name="entitlement-service-app", k8s_container_name="wait-for-auth-api"}
Example live query: Wait for Auth API Logs
What to look for in logs:
- Timestamp gaps: Specific lines appearing 2s, 4s, 8s apart confirm your backoff logic is working.
- Specific Error Messages: Connection refused implies the target is down; Could not resolve host implies a DNS or configuration error.
- "Success" message: Confirm that the script actually reached the "Success" echo statement before exiting.
Best Practices
Keep InitContainers Simple
Do:
- Use simple shell scripts with clear logic
- Focus on single responsibility (one check per initContainer)
- Use standard tools (pg_isready, curl, nc)
Don't:
- Put complex multi-step logic in a single initContainer
- Embed application business logic in init scripts
- Parse complex JSON responses
Set Appropriate Resource Limits
Always define resources:
resources:
  requests:
    cpu: 50m
    memory: 64Mi
  limits:
    cpu: 100m
    memory: 128Mi
Use Structured Logging
Good - Structured output:
echo "INFO: Starting database check at $(date -u +%Y-%m-%dT%H:%M:%SZ)"
echo "WARN: Attempt 3/10 failed, retrying in 4s"
echo "ERROR: Database unavailable after 10 attempts"
echo "SUCCESS: Database ready"
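A small helper keeps the level prefix consistent across every message in a script. This is a minimal sketch: log is a hypothetical function name, and the "LEVEL: message at timestamp" layout simply follows the first example line above.

```sh
# Emit "LEVEL: message at <UTC timestamp>" on stdout.
# Usage: log <LEVEL> <message...>
log() {
  level=$1
  shift
  printf '%s: %s at %s\n' "$level" "$*" "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
}
```

Inside an init script this would be called as `log INFO "Starting database check"` or `log ERROR "Database unavailable after 10 attempts"`, which makes it straightforward to filter on the level prefix in Loki.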
Document Your Dependencies
Add comments in values.yaml explaining dependencies:
initContainers:
  # Database: Required for all operations
  # Checks that PostgreSQL can accept connections
  # Timeout: 60s per attempt, 10 retries = ~10min max
  - name: wait-for-postgres
    # ...
  # Auth API: Required for user validation
  # Must start after database is available
  # Timeout: 30s per attempt, 8 retries = ~8min max
  - name: wait-for-auth-api
    # ...
Test Before Production
Deployment checklist:
- Test in lower environments first
- Verify initContainer logs are clear and helpful
- Measure startup time increase
- Test failure scenarios (stop dependency, observe behavior)
- Document expected startup time for monitoring
Additional Resources
- Kubernetes Documentation: Init Containers
- Helm Documentation: Values Files
- Internal Resources:
  - Explanation doc: Understanding InitContainers for Dependency Validation