About Us:
At Apolo, we're committed to simplifying AI/ML operations for organizations. By addressing the deployment challenges of AI/ML in varied environments, we provide cost-effective and hassle-free solutions. Our managed services and comprehensive tools allow businesses to focus on their core objectives, ensuring seamless AI integration and innovation without the operational complexity.
The Role:
We are looking for an Infrastructure Engineer who will be crucial in managing our product infrastructure. This role requires technical expertise, leadership qualities, and a proactive mindset to ensure our systems are secure, efficient, and in line with our product goals. Ideal candidates are resourceful, excel at problem-solving, and can work autonomously with minimal supervision.
Requirements:
● Extensive knowledge and hands-on experience with Kubernetes, including overall cluster administration.
● Proficiency with cloud service providers (AWS, GCP, Azure).
● Experience in managing bare metal infrastructure.
● Proficiency in Terraform for infrastructure automation.
● Expertise in Helm for package management.
● Strong foundation in Linux system administration, with skills in performance tuning, troubleshooting, and understanding operating system internals.
● Solid networking knowledge, including TCP/IP, DNS, load balancing, and firewall configurations, to ensure secure and efficient network operations.
● Expertise in container engines such as containerd and Docker, with practical experience in configuring, managing, and optimizing containerized environments.
● Proficiency in CI/CD practices, particularly with GitHub Actions.
Responsibilities:
● Oversee infrastructure across cloud, on-premises, and bare metal environments.
● Manage resources in multiple cloud service providers (AWS, GCP, Azure).
● Enhance observability across all environments.
● Implement and integrate solutions that align with our product goals.
● Streamline provisioning pipelines, focusing on the automation of manual processes.
● Apply Infrastructure as Code (IaC) principles using tools like Terraform and Helm.
● Facilitate certification processes and maintain compliance with industry standards.
● Implement robust security hardening practices.
Desirable Skills:
● Experience with CNI, Ingress Controllers, Service Meshes, Gateways.
● Experience with CSI, NAS, NFS and other related storage technologies.
● Experience with Prometheus/Thanos, Grafana, and related tools.
● Proficiency in Python for scripting and automation.
Benefits
What We Offer:
● Work remotely, in a time zone that overlaps with the team's for effective collaboration.
● Shape the product's direction and success by taking ownership of essential components.
● Solve complex and innovative challenges.
● Join a supportive and dynamic team environment.
● Receive a competitive salary and benefits package.