Call Me :
+1 510-838-0007

AVAILABLE FOR :

FULL-TIME ROLES ONLY

Name

  • Sukhdeep Singh Kohli

Sex

  • Male

Address

  • Concord, California, USA

Email Address

Work Authorization

  • H-1B (Green Card Status : I-140 Approved)
I'm a competent and experienced Site Reliability Engineer who has managed infrastructure and performance for high-traffic websites. I have added significant value to my employers by reducing downtime and increasing performance.

I have a detailed understanding of every area of Internet infrastructure and am always curious to learn new things. My areas of interest are cloud, automation and DevOps. Coming from an operations background, I am fascinated by the migration to the cloud. I have decided to focus on mastering AWS as my cloud platform, since it is rich and mature. I am especially interested in AWS migration strategies and in using the cloud to reduce costs and increase reliability.

Personally, I like to explore new places, hang out with friends, and go out to restaurants, bars, malls, clubs, museums and so on. I like to have a good time wherever I am, and I love to laugh and make others laugh. In my time alone I enjoy keeping up with the news and reading about the latest trends. Outside of work, my daily routine: Get up. Be awesome. Go back to bed.

My Skills

Kubernetes / EKS: 75%
Amazon AWS: 75%
Linux: 70%
Docker (Containers): 70%
DevOps: 65%
Python Scripting: 60%
Terraform: 70%
CI / CD: 50%
Monitoring: 60%
HP LoadRunner: 80%
Apache JMeter: 70%
JProfiler: 65%
Dynatrace: 60%
PHP: 70%
CakePHP (MVC): 60%
MySQL: 70%
HTML: 80%
CSS: 80%
JavaScript/jQuery: 75%
Photoshop: 65%

Education

Master of Computer Applications, Punjab Technical University, Jalandhar

July 2007 - June 2010

The three-year Master of Computer Applications included classes and examinations in Algorithms & Data Structures, Software Engineering, Computer Programming, Computer Architecture, Artificial Intelligence, Database Management Systems, Operating Systems and other related subjects.

Bachelor of Commerce, Punjab University, Chandigarh

July 2004 - March 2007

The three-year Bachelor of Commerce included classes and examinations in Entrepreneurship, Economics, Accounting, Company Law and Auditing, Labor Laws, Banking, Insurance, Direct and Indirect Taxes and other related subjects.

Professional Experience

Staff Site Reliability Engineer

June 2018 - Present

OKTA - TECHNICAL OPERATIONS (SAN FRANCISCO, CA, USA)

Designing, building, running and monitoring Okta’s production infrastructure. Responding to production incidents and determining how we can prevent them in the future. Triaging and troubleshooting complex production issues to ensure reliability and scalability. Identifying and automating manual processes. Developing and maintaining technical documentation, runbooks and procedures. Supporting a 24x7 online environment as part of an on-call rotation.

Responsibilities


  • Okta provides a critical cloud Identity Management service for which 24x7 availability and security are paramount
  • Provisioning and maintaining high availability of all services hosted on AWS, including EC2, ECS, VPC, S3 and Load Balancers
  • Writing and troubleshooting Chef cookbooks; Chef is used for configuration management and for keeping infrastructure consistent across all environments
  • Writing Ansible playbooks for deployments, which require complex choreography because the service SLA is 99.999% uptime
  • Containerizing applications with Docker and orchestrating them with AWS ECS (Elastic Container Service)
  • Capturing and provisioning new infrastructure resources in Terraform to provide a repeatable method for creating and updating environments
  • Supporting a 24x7 online cloud service as part of an on-call rotation; using best judgement to troubleshoot production incidents, and escalating when an outage or potential customer impact needs more eyes and hands
Tools: Chef, Ansible, Terraform, Docker, Elasticsearch, Zabbix, Redis, MySQL
Environment: Amazon Web Services (AWS) / Linux

Cloud DevOps Engineer / Performance Engineer

Mar 2016 - May 2018

VERIZON WIRELESS – MESSAGING TEAM (WALNUT CREEK, CA, USA)

Worked on the Message+ application, a messaging platform with over 30 million subscribers. The app provides a rich experience with over-the-top voice calls, video calls, multi-line, multi-device, group messaging, sending and receiving money, location, gift cards and many other features that are not possible with traditional SMS/MMS protocols. Think of it as iMessage outside the Apple ecosystem, with a rich experience on Android, iOS and the web. The work involved automating an AWS-based SaaS/cloud, distributed microservices architecture.

Responsibilities


  • Experienced in the strategy and practical implementation of AWS technologies including EC2, EBS, S3, VPC, ELB, ECS and CloudWatch
  • Experienced in using the configuration management tool Ansible to provision AWS resources
  • Migrating the existing legacy monolith to the cloud as microservices, making sure it aligns with the “12 factor” methodology and is geared for scalability
  • Writing Ansible playbooks to automate repetitive system administration tasks and the installation, upgrade and configuration of software packages
  • Containerizing applications with Docker and orchestrating them with AWS ECS (Elastic Container Service)
  • Familiar with the Continuous Integration process using Bamboo, Maven and Git
  • Writing Shell/Bash and Python scripts to automate health checks and other system tasks
  • Setting up monitoring tools such as Elasticsearch, Logstash, Kibana and Grafana
  • Extensive experience in performance tuning of Java applications using APM and profiling tools such as Dynatrace and JProfiler, including heap analysis, garbage collection overhead and thread dump analysis. Identifying bottlenecks and applying performance engineering best practices for end-to-end performance testing and analysis
Tools: Ansible, Docker, Bamboo, Apache Kafka, ELK (Elasticsearch, Logstash, Kibana), Grafana, MongoDB, HBase
Environment: Amazon Web Services (AWS) / Linux

Lead Performance Engineer

May 2015 - Feb 2016

GAP INC – NEXT GENERATION POS (SAN FRANCISCO, CA, USA)

Worked on a mobile checkout system that gave each store associate the ability to check customers out without making them wait in queues at checkout counters. This enhanced the customer experience and in turn increased revenue for the stores.

Responsibilities


  • As Performance Engineer, my job was to fine-tune the system and provide recommendations so that the systems stayed stable and healthy on crucial days such as Thanksgiving, Black Friday and Cyber Monday
  • Detecting performance degradation and issues related to CPU, memory, storage and network
  • Expert in simulating traffic and fine-tuning IBM Sterling Order Management System to handle capacity
  • Experience with HP LoadRunner components including VuGen, Controller, Load Generator, Agents, Analysis and Monitoring tools
  • Familiar with Electric Commander, Jenkins and Git
  • Using Chef for configuration management
Tools: OpenStack, Electric Commander/Jenkins, Git, HP LoadRunner, Apache JMeter, Graphite
Environment: OpenStack / Linux

Senior Performance / Capacity Engineer

Mar 2011 - May 2015

BESTBUY INC – DOTCOM, STORE, MOBILE (MINNEAPOLIS, MN, USA)

Worked in a central team responsible for all performance and infrastructure related services. My role was to check holiday readiness (Black Friday, Cyber Monday, Christmas, etc.) for system performance and scalability across all BestBuy verticals, such as BestBuy.com, BestBuy Retail Stores, BestBuy Mobile and GeekSquad.

Responsibilities


  • Senior Performance Engineer leading the onshore team to ensure system scalability and reliability at peak loads for the entire BestBuy backend
  • Helping the business understand and mitigate risk, and reporting on holiday readiness
  • Managed infrastructure capacity and administration of core UNIX/Linux systems
  • Developed many Bash and Python scripts to automate routine tasks and reporting
  • Wide range of performance testing experience for applications and customizations, including IBM Sterling Order Management System, Oracle EBS, SAP, Oracle ATG, Retail Management System, mobile applications and web services (SoapUI)
  • Provided key metrics such as TPS/RPS, throughput, 95th percentile and average response times, and resource utilization patterns of the different application layers
  • Worked with application designers and business analysts to gather NFRS (Non-Functional Requirements Specifications), create performance test plans, scripts and scenarios, and produce results reports
  • Extensive experience in performance tuning of Java applications using APM and profiling tools such as Dynatrace and JProfiler, including heap analysis, garbage collection overhead and thread dump analysis
Tools: Sterling Order Management System, HP SiteScope, Nagios, Dynatrace
Environment: Windows / Linux

Performance Test Engineer

July 2010 - May 2011

AT&T INC – B2B BUSINESS PREMIER


Responsibilities


  • Senior Performance Analyst for AT&T B2B Business Premier
  • Worked with application designers and business analysts to gather NFRS (Non-Functional Requirements Specifications), create performance scenarios and produce results reports
  • Ran performance benchmark, normal load, peak load, stress, scalability and endurance tests

DevOps & PHP Developer

Jan 2010 - July 2010

NET SOLUTIONS - MULTIPLE PROJECTS


Responsibilities


  • Developing websites using PHP MVC frameworks such as CakePHP
  • Microsoft Certified Solutions Developer (Programming in HTML5 with JavaScript and CSS3) with strong experience in developing websites in Core PHP using XHTML, CSS and JavaScript/jQuery
  • Creating shell scripts to automate tasks and perform basic monitoring

Awards and Achievements

• Awarded the "Accenture Celebrates Excellence Award" for three consecutive years, an award received by fewer than 3% of employees in the organization
• Responsible for leading the engineering team for a large US retail client, which improved performance, reduced downtime and saved the company approximately $500K
• Created various assets which helped to reduce monotonous work by 80%, freeing up more time
• Won various National/State Level Inter-College Tech Competitions including Microsoft Go Alive Challenge 2008 during college days

Certifications

AWS Certified Developer - Associate

AWS Certified Solutions Architect - Associate

AWS Certified SysOps Administrator - Associate

Red Hat Certified Specialist in Ansible Automation

Red Hat Certified Engineer

Red Hat Certified System Administrator

CKA: Certified Kubernetes Administrator

Hobbies / Interests

• Photography
• Travelling - Exploring new places
• Adventure Sports
• Reading - Technology Blogs/Forums, Latest Trends