Contract Duration: 6 months
What You’ll Do:
Work across the full stack: from host memory and device behavior in Linux, to node agents and container runtime interactions, to Kubernetes resource orchestration, controllers, and cluster-level lifecycle management. This is a hands-on systems role for an engineer who is comfortable working across the boundaries that usually separate kernel and cloud platform teams.
• Architect and implement end-to-end systems software for advanced platform and memory technologies across Linux and Kubernetes environments.
• Design solutions that span:
- Linux kernel and host OS integration
- user-space system services and node agents
- container runtime / kubelet interaction points
- Kubernetes APIs, controllers, CRDs, and orchestration workflows
• Develop software for resource discovery, reservation, activation, resize, release, rollback, and reconciliation in clustered environments.
• Build and maintain Kubernetes operators, controllers, and automation services using Go, client-go, controller-runtime, and related frameworks.
• Collaborate with Linux, firmware, and hardware teams to translate low-level platform capabilities into safe, observable, Kubernetes-manageable workflows.
• Work on Linux-side integration for areas such as memory lifecycle, hotplug behavior, NUMA awareness, cgroups, device/resource management, and system telemetry.
• Define interfaces between host software and cluster control-plane components, including state models, failure handling, and recovery behavior.
• Develop node-local software and control-plane services that coordinate host state, platform services, and Kubernetes objects.
• Drive architecture for reconciliation, idempotency, concurrency control, and fault recovery across distributed components.
• Create observability for controller behavior, node readiness, resource lifecycle progress, and failure analysis.
• Partner with internal teams across kernel, systems architecture, firmware, validation, and platform software to turn proof-of-concept software into robust internal infrastructure.
• Support bring-up, debugging, validation, and performance tuning in lab and pre-production environments.