Platform Engineer
<div class="show-more-less-html__markup show-more-less-html__markup--clamp-after-5 relative overflow-hidden"> <p>Zalion is on a mission to eliminate repetitive procurement work through agentic AI. We’re building autonomous agents that operate deep within enterprise procurement — navigating messy data, legacy systems, and complex workflows to deliver real impact.</p><br/><p><strong>Join us early and help define how enterprise AI is done right.</strong></p><br/>Tasks<br/><p>You will:</p><br/><ul><br/><li><strong>Own our platform foundations end-to-end</strong> — from AWS architecture and IaC to CI/CD, observability, and incident readiness.</li><br/><li>Build and evolve <strong>secure, scalable AWS infrastructure</strong> (networking, compute, storage, IAM) optimized for reliability and cost.</li><br/><li>Design and maintain <strong>CI/CD pipelines on GitHub</strong> that are fast, repeatable, and developer-friendly (clear feedback loops, safe deploys, strong defaults).</li><br/><li>Define and operate infrastructure using <strong>Terraform</strong> — with clean modules, sensible standards, and automated validation.</li><br/><li>Improve <strong>developer experience</strong> through golden paths: templates, self-service environments, paved roads for deployments, and internal tooling that removes friction.</li><br/><li>Drive <strong>availability, scalability, and resilience</strong>: deployment strategies, rollbacks, capacity planning, disaster recovery (DR) planning, and performance tuning.</li><br/><li>Implement pragmatic <strong>security-by-default</strong>: least-privilege IAM, secrets management, a secure supply chain, and guardrails that enable speed without compromising safety.</li><br/><li>Establish and refine <strong>observability and reliability practices</strong> (SLOs/SLIs, monitoring, alerting, postmortems, runbooks) that scale with the team.</li><br/><li>Partner closely with product engineering to reduce operational load and keep delivery velocity high as Zalion 
grows.</li><br/></ul><br/>Requirements<br/><ul><br/><li>Strong experience as a <strong>Platform / DevOps / Site Reliability Engineer</strong> on product teams shipping to production.</li><br/><li>Deep practical knowledge of <strong>AWS</strong>: networking, IAM, security controls, and designing for failure.</li><br/><li>Hands-on expertise with <strong>Terraform</strong>: modules, state strategy, DRY patterns, environment separation, and automated reviews.</li><br/><li>Solid CI/CD engineering experience with <strong>GitHub</strong>: pipeline design, artifact versioning, deployment safety, and fast feedback loops.</li><br/><li>A strong mindset for <strong>reliability and operability</strong>: you think in failure modes, automation, and measurable outcomes (SLOs).</li><br/><li>Security awareness and discipline: you build <strong>guardrails</strong> that make the secure path the easy path.</li><br/><li>A <strong>builder mindset</strong>: you ship improvements, measure impact (lead time, deploy frequency, MTTR), and iterate.</li><br/><li>Comfort with <strong>ambiguity and ownership</strong>: you proactively identify platform bottlenecks and fix them without waiting for perfect specs.</li><br/><li><strong>4+ years</strong> of experience in relevant roles (startup/scale-up experience is a plus).</li><br/></ul><br/>Benefits<br/><ul><br/><li>Build the platform behind agentic AI systems that run in real enterprise environments</li><br/><li>Massive autonomy, zero bureaucracy</li><br/><li>Immediate impact — your work accelerates every engineer and every release</li><br/><li>Modern stack, no legacy constraints</li><br/><li>Competitive salary + meaningful equity</li><br/><li>High-end equipment</li><br/></ul><br/><p><strong>🛠️ Tech Stack You’ll Work With</strong></p><br/><ul><br/><li><strong>AWS</strong> (core services; compute, networking, IAM, logging/monitoring, managed data services)</li><br/><li><strong>Terraform</strong> (modules, workspaces, validation, state 
management)</li><br/><li><strong>GitHub</strong> (Actions, CI/CD workflows, checks, release automation)</li><br/><li>Container orchestration (e.g., <strong>ECS/Fargate</strong> and/or Kubernetes, depending on how the platform evolves)</li><br/><li>Observability tooling (metrics, logs, tracing; e.g., Grafana/Prometheus/OpenTelemetry and friends)</li><br/><li>Security tooling (SAST/DAST, dependency scanning, secrets scanning, policy as code)</li></ul> </div>