Manpreet Singh

Platform Engineer
Tokyo

Summary

Platform engineering professional with a deep understanding of system architecture and cloud environments and a history of driving infrastructure improvements and automating processes to enhance operational efficiency. Collaborative team member who adapts to dynamic project requirements and delivers dependable, flexible performance. Skilled in infrastructure as code, containerization, and cloud services; valued for reliability and problem-solving abilities.

Overview

14 years of professional experience
7 Certifications

Work History

Platform Engineer

PayPay Card Corporation
Tokyo, Japan
02.2025 - Current
  • Engineered and deployed CI/CD pipelines using GitHub Actions Workflows to automate k6 performance and load testing on ECS services, proactively identifying bottlenecks and ensuring service scalability.
  • Established robust monitoring for cluster scaling by implementing Prometheus alerts specifically for EKS Karpenter, ensuring rapid response to resource provisioning issues.
  • Developed a one-click automation tool for project scaffolding and microservices deployment on EKS, significantly reducing setup time for development teams.
  • Designed and implemented the Knowledge Engineering Database (KEDB), improving issue resolution efficiency and self-service support.
  • Automated the knowledge engineering process by creating standardized GitHub Issue templates that automatically trigger KEDB updates or actions.
  • Managed and configured critical data services, including AWS KMS for encryption, AWS Secrets Manager, and multiple databases (DocumentDB, DynamoDB, Redis, MySQL).
  • Enhanced developer experience and productivity by creating a standardized Backstage template for internal tooling and service cataloging.
  • Fostered a culture of knowledge sharing by leading technical presentations and training sessions for cross-functional teams on infrastructure best practices.
  • Provided Tier 3 support and rapidly resolved complex infrastructure issues, minimizing system downtime and preserving high service reliability.
  • Drove service reliability improvement by defining key SLIs and establishing strict 99.99% SLOs for 5 critical customer-facing microservices, maintaining target availability.
  • Proactively managed risk by implementing a multi-service error budget system, resulting in a 40% reduction in high-severity (Sev-1/Sev-2) production incidents within the first quarter of deployment.
  • Implemented Prometheus/Grafana visualization for real-time SLI tracking against SLOs, which reduced the Mean Time to Detect (MTTD) for critical incidents by 25%.
  • Automated pipeline governance using the Error Budget metric, integrating the metric into CI/CD to automatically halt non-essential feature deployments when the budget dropped below 5%.

Senior Site Reliability Engineer

IBM India Software Labs
Pune, India
11.2023 - 02.2025
  • Built a Python-based AWS monthly cost report exporter and integrated it into a Jenkins job scheduled weekly; the job generates reports on dev and prod QRadar spending in AWS and uploads them to an S3 bucket.
  • Wrote a Python script that migrates a single tenant's event data from an old ClickHouse backup file to the new ClickHouse by downloading the tenant backup inventory and reloading events per day from each shard.
  • Created a Python script to check the git-release from vault for all clusters that are not flagged as decommissioned.
  • Wrote a script to monitor RabbitMQ major version upgrade status; run during an upgrade, it reports whether the RabbitMQ brokers have been upgraded and are running the expected version.
  • Provided technical leadership and mentorship to junior members of the Site Reliability Engineering team, fostering a culture of collaboration, innovation, and continuous learning.
  • Maintained comprehensive documentation of infrastructure configurations, deployment procedures, and incident response processes, facilitating knowledge sharing and onboarding of new team members.
  • Achieved a 22% reduction in non-production cloud spend by implementing scheduled decommissioning policies and leveraging EKS Karpenter to optimize cluster resource utilization during off-peak hours.
  • Shifted security left by implementing automated static analysis (SAST) and container vulnerability scanning into the CI/CD pipelines, blocking 15% of high-risk security flaws from reaching staging environments.
  • Standardized secret management and rotation across 40+ microservices using AWS Secrets Manager/KMS and Terraform, achieving 100% compliance with internal security audit requirements.

Cloud Support Engineer 2 - DevOps

Amazon Web Services
08.2021 - 10.2023
  • Worked primarily on AWS managed Kubernetes and Docker services (EKS and ECS), as well as other services such as Fargate, Cloud Map, X-Ray, and ECR.
  • Reduced downtime for clients by proactively monitoring and troubleshooting cloud-based issues.
  • Conducted training sessions for junior team members and new hires, fostering a culture of continuous learning and skills development.
  • Hands-on experience troubleshooting Kubernetes clusters and the AWS services integrated with them, such as AWS Load Balancers, EC2, S3, VPC, and IAM.
  • Created a Python script to back up 5 microservices that were managed via Custom Resource Definitions in OpenShift.
  • Created a Python script to run version upgrades of ROSA (OpenShift on AWS) clusters and check their status on completion.
  • Installed the AWS Load Balancer Controller on an AWS EKS cluster with Terraform.
  • Installed Cluster Autoscaler on an EKS cluster with Terraform.
  • Set up AWS EKS monitoring and logging using kubectl and Terraform.
  • Performed zero-downtime AWS EKS upgrades using Terraform.
  • Provisioning of multiple RDS instances using Terraform.
  • Setting up VPC peering connection between two VPCs using Terraform.
  • Created ECS clusters with Fargate tasks using Terraform.
  • Hands-on experience troubleshooting and configuring fully managed and hybrid Kubernetes environments such as EKS Anywhere and ECS Anywhere.
  • Worked on Kubernetes and its associated open-source projects, such as the AWS Load Balancer Controller, AWS EBS and EFS CSI volume drivers, Cluster Autoscaler, VPA, and HPA.
  • Troubleshot Kubernetes cluster issues involving worker nodes, IAM authorization, RBAC, and service accounts; implemented IRSA; optimized and reserved kubelet resources via user data.
  • Experience working with AWS services including IAM, VPC, EC2, ECS, EBS, EFS, RDS, S3, Lambda, ELB, Auto Scaling, Route 53, CloudFront, CloudWatch, CloudTrail, SQS, and SNS.
  • Good understanding of OSI Model, TCP/IP protocol suite (IP, ARP, TCP, UDP, SMTP, FTP, and TFTP).
  • Experience in all aspects of the software life cycle, including build, release, and deploy, with AWS tools such as CodeBuild and CodeDeploy and open-source tools such as Jenkins.
  • Assisting customers with configuring and managing Kubernetes cluster control plane and data plane and handling escalations.
  • Actively involved in improving documentation, writing Kumo Articles.
  • Raising service level issues to development teams and raising public facing GitHub issues.
  • Identified issues, analyzed information and provided solutions to problems.

Technology Analyst

Airbus India Pvt Ltd
12.2019 - 08.2021
  • Testing and cost optimization of Abaqus application on AWS and on-premise servers.
  • Reduced technical debt by refactoring legacy code and implementing modern development methodologies.
  • Resolved complex technical issues through rigorous troubleshooting and root-cause analysis, minimizing downtime and disruptions to business operations.
  • POC for NFS4 ACLs.
  • Created a sanity-check script using Ansible ad hoc commands.
  • Resolved an issue of VDIs being unable to join AD by creating an Ansible playbook.
  • Resolved black-screen issues in RHEL 7.7 VDIs.
  • Created a shell/Bash script for printing files from Linux servers.
  • Used Git to manage source code for applications.
  • Defining, building and automating CI/CD build pipeline using Jenkins.
  • Deploying applications to VMs and VDI nodes.
  • OpenSCAP (security tool) configuration.
  • SSSD Configuration and rollout to Devel, Val and Prod nodes.
  • Patching servers via Ansible Tower.
  • Implemented hpn-ssh for scientific computing environment.
  • Created a shell-script tool to disable the ibus daemon and speed up processing in the HyperWorks application.
  • Identified, troubleshot, and resolved problems with the Jenkins build process and ensured that each release was accepted by all parties.
  • Periodically monitored logs for optimal performance in Splunk.
  • Containerizing applications using Docker.
  • Configuring monitoring of servers in Splunk.
  • Worked on EC2, VPC, S3, IAM, Route53 services in AWS.
  • Worked closely with Developers, QA and project management for smooth scheduled releases.
  • Participated in application builds and deployments to Dev, QA, Preprod and Prod environments.
  • Involved in release process and deployed applications (WAR, EAR and JAR).
  • Troubleshot build issues and coordinated with the development team to resolve them.
  • Maintained a knowledge base to track known issues and their resolutions.
  • Updated release notes for every release.
  • Involved in documentation of all processes and procedures.
  • Joined bridge calls and provided necessary information to the teams involved.

Technical Services Engineer

Fujitsu Consulting India Pvt. Ltd.
01.2016 - 12.2019
  • Monitored automated build and continuous software integration process to drive build/release failure resolution.
  • Provided end-to-end support for Linux servers to multiple clients as part of shared support.
  • Fostered strong relationships with clients through excellent communication skills when addressing their technical inquiries or concerns.
  • Contributed to successful project completions by serving as reliable point of contact for technical expertise.
  • Researched and identified new technologies and tools helping to grow agile development environment.
  • Installing and configuring Docker and running containers.
  • Installing, configuring and maintaining on-premise Kubernetes cluster.
  • Maintained security and mitigated threats as new ones were identified.
  • Built multiple server systems and security hardening.
  • Installation, configuration and maintenance of Linux OS and Open-source applications.
  • Experience in building Production Servers and validation for new build releases.
  • Managed virtualisation environments with OVM, Hyper-V and VMware.
  • Performed deployments and patch updates on all Linux servers.
  • Performed OS patching, release updates and vulnerability fixes.
  • Performance analysis and troubleshooting.
  • Managed SAN (FC, iSCSI) and NAS storage volumes.
  • Used LVM extensively for creating LUNs, building volume groups, and creating and maintaining file systems.

Senior System Engineer

ATOS India Private Ltd
11.2014 - 01.2016
  • Used LVM extensively for creating LUNs, building volume groups, and creating and maintaining file systems.
  • Worked with stakeholders to determine implementation and integration of system-oriented projects.
  • Reduced downtime for critical systems by proactively identifying potential issues and conducting preventative maintenance.
  • Managing storage volume – LVM with Red Hat Multipath.
  • Administration of NAS file for Atos customized SAP and application environment.
  • New server onboarding and server decommissioning.
  • Configured and implemented an automation tool in the client environment to reduce manual, repetitive work.
  • Worked on high-priority incidents and incidents escalated from L2.
  • Performed troubleshooting for various problems, logging calls with vendors for hardware issues.
  • Knowledge in ITIL roles and responsibilities.

Unix Administrator

HP India Sales Pvt. Ltd (On Payroll of IT Source)
05.2012 - 11.2014
  • Streamlined workflow processes, automating repetitive tasks with custom shell scripts and tools.
  • Improved system performance by optimizing Unix server configurations and streamlining processes.
  • Administration of HP-UX 10i and 11i operating systems.
  • Maintenance of HP-UX (IA & PA-RISC) and Superdome (SD32) servers.
  • Troubleshooting O/S (UNIX) related problems.
  • Reconfiguration of kernel parameters.
  • Advanced User/Group Administration.
  • Served as an escalation point for complex technical issues related to Unix administration, providing expert guidance to resolve incidents quickly while minimizing impact on end users.
  • Reduced downtime by implementing effective backup strategies and disaster recovery plans for critical systems.

Education

B.E - Electronics & Communication

Institute of Information Technology & Management
India
07.2011

Skills

AWS Cloud

Red Hat OpenShift, ROSA

Ansible, Ansible Tower

Git, GitHub Actions, Jenkins

Prometheus, Grafana, AWS X-Ray

Docker, Kubernetes, EKS, ECS, ECR, Fargate

Bash Scripting, Python, Terraform

ITIL, Agile Methodology, Microservices Architecture

OpenAI (GPT-4 / GPT-4o / APIs), Anthropic Claude, GitHub Copilot, Dify (LLM application development platform)

Certification

AWS Certified Solutions Architect Associate

Awards

Collaboration Hero Award - Airbus, Da Vinci Award for most innovations - Airbus, Spot Award for driving innovations - Airbus, Accredited Champion of Service Excellence - Fujitsu

Timeline

Platform Engineer

PayPay Card Corporation
02.2025 - Current

Senior Site Reliability Engineer

IBM India Software Labs
11.2023 - 02.2025

Cloud Support Engineer 2 - DevOps

Amazon Web Services
08.2021 - 10.2023

Technology Analyst

Airbus India Pvt Ltd
12.2019 - 08.2021

Technical Services Engineer

Fujitsu Consulting India Pvt. Ltd.
01.2016 - 12.2019

Senior System Engineer

ATOS India Private Ltd
11.2014 - 01.2016

Unix Administrator

HP India Sales Pvt. Ltd (On Payroll of IT Source)
05.2012 - 11.2014

B.E - Electronics & Communication

Institute of Information Technology & Management