Infrastructure / Automation / Middleware Engineer
Location: Remote (Worldwide)
Employment Type: Full-Time
Company: The Lustre Collective
About The Lustre Collective
The Lustre Collective is dedicated to advancing the Lustre file system, the leading open-source, parallel distributed file system designed for large-scale cluster computing. With deep expertise in scalable storage solutions, our team supports cutting-edge AI and HPC environments, on-premises and in the cloud. We are committed to vendor-neutral development, drawing inspiration from communities like OpenSFS and EOFS to drive innovation in Lustre technology.
As a tight-knit company with a fun, collaborative culture, we offer the unique opportunity to work alongside some of the smartest and most qualified file system and kernel engineers in the world. Our team thrives on innovation, open-source passion, and a supportive environment where ideas flow freely and work-life balance is prioritized. If you're excited about contributing to groundbreaking storage solutions in a dynamic, global team, we'd love to hear from you!
Job Overview
We are seeking a talented Infrastructure / Automation / Middleware Engineer to join our remote team. This role involves developing middleware such as API interfaces, monitoring systems, and high-availability solutions to support Lustre deployments in distributed, high-performance environments.
Key Responsibilities
- Design and implement API interfaces and integration middleware for Lustre with HPC and cloud systems.
- Develop monitoring and logging solutions to track performance and health of Lustre deployments.
- Ensure high availability through clustering, failover mechanisms, and load balancing.
- Troubleshoot middleware issues in large-scale clusters, focusing on reliability and scalability.
- Collaborate on vendor-neutral innovations and AI-driven optimizations.
- Contribute to open-source middleware repositories and documentation.
- Participate in code reviews and team knowledge-sharing sessions.
Required Qualifications and Skills
- Strong proficiency in C, Rust, and Python for middleware development.
- Experience with middleware technologies like message queues (e.g., Kafka, RabbitMQ) and API frameworks.
- Knowledge of high-availability practices, monitoring tools (e.g., Prometheus, ELK stack), and security best practices.
- Familiarity with distributed systems and storage solutions like Lustre.
- Proficiency in Linux/Unix and networking protocols.
- Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent experience).
- Excellent problem-solving and collaboration skills.
Preferred Qualifications
- Experience supporting storage deployments in HPC or AI contexts.
- Contributions to open-source middleware or integration projects.
- Knowledge of containerization and cloud integrations.
- Background in performance tuning and fault tolerance.
What We Offer
- A fully remote position with flexible hours to accommodate global time zones.
- A competitive salary and benefits package.
- The chance to work in a fun, tight-knit culture where creativity and humor are encouraged: think virtual team-building events, hackathons, and a supportive community of experts.
- Opportunities for growth, including conferences, training, and leadership in open-source initiatives.
- A commitment to diversity, equity, and inclusion, welcoming candidates from all backgrounds worldwide.
If you're a passionate engineer ready to advance the future of Lustre and collaborate with world-class talent, apply today!
Contact Us to Apply