Site Reliability Engineer

My client is India's largest omnichannel platform and a multi-platform tech company with expertise in retail tech and products in the AI, ML, big data ops, gaming, crypto, image editing and learning spaces.

Title: Site Reliability Engineer

Roles & Responsibility:

What will you do?

- Run the production environment by monitoring availability and taking a holistic view of system health.


- Improve the reliability, quality, and time-to-market of our suite of software solutions.

- Be the first person to report an incident.

- Debug production issues across services and levels of the stack.

- Envision the overall solution for defined functional and non-functional requirements, and define the technologies, patterns and frameworks to realise it.

- Build automated tools in Python / Java / Go / Ruby, etc.

- Help Platform and Engineering teams gain visibility into our infrastructure.

- Lead the design of software components and systems to ensure the availability, scalability, latency, and efficiency of our services.

- Participate actively in detecting, remediating and reporting on Production incidents, ensuring the SLAs are met and driving Problem Management for permanent remediation.
 

- Participate in on-call rotation to ensure coverage for planned/unplanned events.

- Perform other tasks such as load testing and generating system health reports.

- Periodically check the readiness of all dashboards.

- Engage with other Engineering organizations to implement processes, identify improvements, and drive consistent results.

- Work with your SRE and Engineering counterparts to drive game days, training and other response-readiness efforts.

- Participate in 24x7 support coverage as needed.

- Troubleshoot and solve complex issues with thorough root cause analysis on customer and SRE production environments.

- Collaborate with Service Engineering organizations to build and automate tooling, implement best practices for observing and managing services in production, and consistently achieve our market-leading SLA.

- Improve the scalability and reliability of our systems in production.

- Evaluate, design and implement new system architectures.

Some specific Requirements:

- B.E./B.Tech. in Engineering, Computer Science, technical degree, or equivalent work experience

- At least 3 years of experience managing production infrastructure. Experience leading or managing a team is a huge plus.

- Experience with cloud platforms such as AWS and GCP.

- Experience developing and operating large-scale distributed systems with Kubernetes, Docker and Serverless (Lambdas).

- Experience running real-time, low-latency, highly available applications (Kafka, gRPC, RTP).

- Comfortable with Python, Go, or any relevant programming language.

- Experience with monitoring and alerting using technologies like New Relic / Zabbix / Prometheus / Grafana / CloudWatch / Kafka / PagerDuty, etc.

- Experience with one or more orchestration and deployment tools, e.g. CloudFormation / Terraform / Ansible / Packer / Chef.

- Experience with configuration management systems such as Ansible / Chef / Puppet.

- Knowledge of load testing methodologies and tools such as Gatling and Apache JMeter.

- Ability to work your way around a Unix shell.

- Experience running hybrid clouds and on-prem infrastructures on Red Hat Enterprise Linux / CentOS

- A focus on delivering high-quality code through strong testing practices.

Place of work

Antal International
Mumbai
India

Employer profile

In 1993, a visionary in London set out to create a better way to connect talented individuals with job opportunities. Fast forward 30 years, and that vision has grown into a worldwide network of over 800 consultants spanning 32 countries. As one of the top recruitment companies, we specialize in IT, Accountancy, Sales and Marketing, Engineering, and more, offering game-changing recruitment consultancy and talent acquisition services to companies of all sizes. Join us on this journey of growth! With our personalized approach to the hiring process, we aim to make finding the right job a positive and stress-free experience for you as a candidate. We understand that job searching can be overwhelming, so we offer our expertise every step of the way to help you navigate the process with ease. Our goal is to empower you to achieve your career aspirations and land the perfect job! At our core, we believe that our success is directly tied to the success of the candidates we work with!

Local radius

  • Navi Mumbai
  • Thāne
  • Borivli
  • Airoli
  • Powai
  • Artist Village
  • Mumbai



Job ID: 8434996 / Ref: 91caae7f91c2cb286fadca3136db9a55


Antal International

Employees: 201-500
Industry: Other industries