Job Description:
We are looking for a Senior DevOps Engineer to design, automate, and manage high-performance cloud infrastructure across multiple regions. You will own deployment pipelines, monitoring systems, security hardening, and reliability engineering for platforms handling real production workloads.
This role requires deep expertise in AWS, CI/CD, containerization, and modern observability stacks. If you thrive in fast-moving environments and build systems that scale cleanly, we want you.
Key Responsibilities:
Infrastructure & Architecture
- Architect and manage scalable, secure AWS environments (EC2, ECS/EKS, RDS, CloudFront, WAF, S3, IAM, Route53).
- Implement multi-region, high-availability designs with failover and disaster recovery strategies.
- Design and maintain networking components including VPCs, subnets, gateways, firewalls, and load balancers.
CI/CD & Automation
- Build, maintain, and optimize CI/CD pipelines (GitHub Actions, GitLab CI, or similar).
- Automate infrastructure provisioning using IaC tools (Terraform, CloudFormation, Pulumi).
- Implement automated testing, deployment workflows, and environment management.
Monitoring, Logging & Reliability
- Deploy and tune observability stacks (CloudWatch, Grafana, Prometheus, ELK, Datadog).
- Set up alerting, anomaly detection, and SLO/SLA metrics across all critical systems.
- Perform capacity planning, performance optimization, and reliability engineering.
Security & Compliance
- Enforce IAM best practices, network segmentation, secret management, and least privilege.
- Integrate security scanning into CI/CD (SAST, DAST, container scanning, IaC linting).
- Support NIST/SOC/ISO security initiatives (logging requirements, audit trails, hardening).
Containers & Orchestration
- Deploy, scale, and optimize containerized applications (Docker, ECS, EKS, Kubernetes).
- Maintain service mesh, ingress configurations, and container runtime security.
Collaboration & Leadership
- Work closely with engineering teams to ensure deployment-ready architectures.
- Mentor junior engineers and contribute to internal DevOps processes and playbooks.
- Lead incident response, root-cause analysis, and reliability improvements.
Qualifications:
Required Skills
- 5+ years of experience in DevOps, SRE, or Cloud Infrastructure Engineering.
- Strong AWS expertise: EC2, ECS/EKS, VPC, IAM, RDS, S3, CloudFront, WAF, Route53.
- Hands-on experience with CI/CD pipelines and automation tools.
- Strong Terraform or CloudFormation experience.
- Deep understanding of Linux systems, networking, and security fundamentals.
- Expertise in Docker and container orchestration (ECS/EKS/Kubernetes).
- Solid experience with monitoring tools (Grafana, Prometheus, Datadog, CloudWatch).
- Strong scripting abilities (Python, Bash, Go, or similar).
- Experience managing production systems at scale.
Preferred Skills
- Experience with ML model serving (Triton, SageMaker, custom inference servers).
- Experience with multi-region architectures.
- Background with SOC 2, NIST 800-53, or ISO 27001 implementations.
- Serverless experience (Lambda, Step Functions).
- Experience with performance benchmarking and load-testing tools.

