Agent Skills for Claude Code | Kubernetes Specialist
| Field | Value |
|---|---|
| Domain | Infrastructure & Cloud |
| Role | specialist |
| Scope | infrastructure |
| Output | manifests |
Triggers: Kubernetes, K8s, kubectl, Helm, container orchestration, pod deployment, RBAC, NetworkPolicy, Ingress, StatefulSet, Operator, CRD, CustomResourceDefinition, ArgoCD, Flux, GitOps, Istio, Linkerd, service mesh, multi-cluster, cost optimization, VPA, spot instances
Related Skills: DevOps Engineer · Cloud Architect · SRE Engineer
When to Use This Skill
- Deploying workloads (Deployments, StatefulSets, DaemonSets, Jobs)
- Configuring networking (Services, Ingress, NetworkPolicies)
- Managing configuration (ConfigMaps, Secrets, environment variables)
- Setting up persistent storage (PV, PVC, StorageClasses)
- Creating Helm charts for application packaging
- Troubleshooting cluster and workload issues
- Implementing security best practices
Core Workflow
- Analyze requirements: understand workload characteristics, scaling needs, and security requirements
- Design architecture: choose workload types, networking patterns, and storage solutions
- Implement manifests: create declarative YAML with proper resource limits and health checks
- Secure: apply RBAC, NetworkPolicies, Pod Security Standards, and least privilege
- Validate: run `kubectl rollout status`, `kubectl get pods -w`, and `kubectl describe pod <name>` to confirm health; roll back with `kubectl rollout undo` if needed
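The Validate step works best when the Deployment itself defines what counts as a stalled rollout and keeps old revisions to roll back to. The fragment below is a minimal sketch of the relevant `spec` fields; all values are illustrative assumptions, not recommendations from this skill, and the fragment is meant to be merged into an existing Deployment rather than applied on its own.

```yaml
# Fragment of a Deployment spec (selector and template omitted).
# All numeric values are assumptions chosen for illustration.
spec:
  revisionHistoryLimit: 5        # old ReplicaSets retained so `kubectl rollout undo` has a target
  progressDeadlineSeconds: 300   # after this, the rollout is marked failed and `kubectl rollout status` exits non-zero
  minReadySeconds: 10            # a new pod must stay ready this long before it counts as available
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1                # at most one extra pod during the update
      maxUnavailable: 0          # never drop below the desired replica count
```

With `maxUnavailable: 0`, a new pod that never passes its readiness probe stalls the rollout instead of replacing healthy pods, which gives the rollback command something safe to return to.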
Reference Guide
Load detailed guidance based on context:
| Topic | Reference | Load When |
|---|---|---|
| Workloads | references/workloads.md | Deployments, StatefulSets, DaemonSets, Jobs, CronJobs |
| Networking | references/networking.md | Services, Ingress, NetworkPolicies, DNS |
| Configuration | references/configuration.md | ConfigMaps, Secrets, environment variables |
| Storage | references/storage.md | PV, PVC, StorageClasses, CSI drivers |
| Helm Charts | references/helm-charts.md | Chart structure, values, templates, hooks, testing, repositories |
| Troubleshooting | references/troubleshooting.md | kubectl debug, logs, events, common issues |
| Custom Operators | references/custom-operators.md | CRD, Operator SDK, controller-runtime, reconciliation |
| Service Mesh | references/service-mesh.md | Istio, Linkerd, traffic management, mTLS, canary |
| GitOps | references/gitops.md | ArgoCD, Flux, progressive delivery, sealed secrets |
| Cost Optimization | references/cost-optimization.md | VPA, HPA tuning, spot instances, quotas, right-sizing |
| Multi-Cluster | references/multi-cluster.md | Cluster API, federation, cross-cluster networking, DR |
Constraints
MUST DO
- Use declarative YAML manifests (avoid imperative kubectl commands)
- Set resource requests and limits on all containers
- Include liveness and readiness probes
- Use secrets for sensitive data (never hardcode credentials)
- Apply least privilege RBAC permissions
- Implement NetworkPolicies for network segmentation
- Use namespaces for logical isolation
- Label resources consistently for organization
- Document configuration decisions in annotations
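The last three items (namespaces, consistent labels, annotations) can be illustrated together on a single Namespace manifest. This is a sketch under stated assumptions: the `app.kubernetes.io/*` keys are the Kubernetes recommended labels, the `pod-security.kubernetes.io/*` keys are the built-in Pod Security admission labels, and the `example.com/*` annotation keys, the `my-platform` value, and the team name are hypothetical placeholders.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: my-namespace
  labels:
    # Recommended label keys; the values are hypothetical.
    app.kubernetes.io/part-of: my-platform
    app.kubernetes.io/managed-by: kubectl
    # Pod Security Standards: enforce "baseline", warn on anything short of "restricted".
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/warn: restricted
  annotations:
    # Hypothetical annotation keys used to record design decisions.
    example.com/owner: "platform-team"
    example.com/decision: "baseline enforced; move to restricted once all pods set a seccomp profile"
```

Starting with `warn: restricted` surfaces violations at apply time without rejecting pods, which may be a gentler path than enforcing the restricted profile immediately.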
MUST NOT DO
- Deploy to production without resource limits
- Store secrets in ConfigMaps or as plain environment variables
- Use default ServiceAccount for application pods
- Allow unrestricted network access (default allow-all)
- Run containers as root without justification
- Skip health checks (liveness/readiness probes)
- Use the `latest` tag for production images
- Expose unnecessary ports or services
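One possible guard against the first item is a namespace-level LimitRange, which applies default requests and limits to any container that omits them. The sketch below assumes the `my-namespace` namespace used elsewhere in this document; the object name and the numeric values are illustrative assumptions.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-container-limits   # hypothetical name
  namespace: my-namespace
spec:
  limits:
    - type: Container
      defaultRequest:              # applied when a container sets no requests
        cpu: "100m"
        memory: "128Mi"
      default:                     # applied when a container sets no limits
        cpu: "500m"
        memory: "512Mi"
```

A LimitRange is a safety net rather than a substitute for explicit values: defaults are injected at admission time, so workloads should still declare requests and limits sized for their own behavior.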
Common YAML Patterns
Deployment with resource limits, probes, and security context
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: my-namespace
  labels:
    app: my-app
    version: "1.2.3"
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
        version: "1.2.3"
    spec:
      serviceAccountName: my-app-sa          # never use default SA
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 2000
      containers:
        - name: my-app
          image: my-registry/my-app:1.2.3    # never use latest
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: "100m"
              memory: "128Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 20
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop: ["ALL"]
          envFrom:
            - secretRef:
                name: my-app-secret          # pull credentials from Secret, not ConfigMap
```

Minimal RBAC (least privilege)
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app-sa
  namespace: my-namespace
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: my-app-role
  namespace: my-namespace
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list"]   # grant only what is needed
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: my-app-rolebinding
  namespace: my-namespace
subjects:
  - kind: ServiceAccount
    name: my-app-sa
    namespace: my-namespace
roleRef:
  kind: Role
  name: my-app-role
  apiGroup: rbac.authorization.k8s.io
```

NetworkPolicy (default-deny + explicit allow)
```yaml
# Deny all ingress and egress by default
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: my-namespace
spec:
  podSelector: {}
  policyTypes: ["Ingress", "Egress"]
---
# Allow only specific traffic
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-my-app
  namespace: my-namespace
spec:
  podSelector:
    matchLabels:
      app: my-app
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
```

Validation Commands
After deploying, verify health and security posture:
```bash
# Wait for the rollout to complete
kubectl rollout status deployment/my-app -n my-namespace

# Watch pod status changes to catch crash loops or image pull errors
kubectl get pods -n my-namespace -w

# Inspect a specific pod for failures
kubectl describe pod <pod-name> -n my-namespace

# Check container logs
kubectl logs <pod-name> -n my-namespace --previous  # use --previous for crashed containers

# Verify resource usage vs. limits (requires metrics-server)
kubectl top pods -n my-namespace

# Audit RBAC permissions for a service account in its namespace
kubectl auth can-i --list --as=system:serviceaccount:my-namespace:my-app-sa -n my-namespace

# Roll back a failed deployment
kubectl rollout undo deployment/my-app -n my-namespace
```

Output Templates
When implementing Kubernetes resources, provide:
- Complete YAML manifests with proper structure
- RBAC configuration if needed (ServiceAccount, Role, RoleBinding)
- NetworkPolicy for network isolation
- Brief explanation of design decisions and security considerations
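
One caveat on the NetworkPolicy item: a default-deny policy that covers Egress also blocks DNS lookups, so pods usually need an explicit allow rule for the cluster DNS service. The manifest below is a sketch under assumptions that vary by cluster: it presumes DNS runs in `kube-system` with the `k8s-app: kube-dns` pod label (common for CoreDNS, but not guaranteed), and the policy name is hypothetical.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress           # hypothetical name
  namespace: my-namespace
spec:
  podSelector: {}                  # applies to every pod in the namespace
  policyTypes: ["Egress"]
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system   # label set automatically on namespaces
          podSelector:
            matchLabels:
              k8s-app: kube-dns    # assumption: verify the label your DNS pods actually carry
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```

Any other outbound traffic the application needs (databases, external APIs) still requires its own egress rule; this policy only restores name resolution.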