Consulting Site Reliability Engineer
HCA, Hospital Corporation of America - Alexandria, TN

Job Description

_HCAHealthcare ITG_

Please click the link above to watch our Identity Video to get a closer look at who we are and what we do!

Do you want to be a part of a family and not just another employee? Are you looking for a work environment where diversity and inclusion thrive? Submit your application today and find out what it truly means to be a part of a team.

You contribute to our success. Every role has an impact on our patients’ lives and you have the opportunity to make a difference. We are looking for dedicated professionals like you to be a part of our team. Join us in our efforts to better our community!

At HCA Healthcare, you have options. You can choose from a variety of benefits to create a customizable plan. You have the ability to enroll in several medical coverage plans including vision and dental. You can even select additional à la carte benefits to meet all your needs. Enroll in our Employee Stock Purchase Plan (ESPP), 401(k), flex spending accounts for medical and childcare needs, and participate in our tuition reimbursement and student loan repayment programs.

JOB SUMMARY

The Consulting Site Reliability Engineer will function as a Site Reliability Engineer (SRE). They will be responsible for the availability, performance, monitoring, and incident response, among other things, of the platforms and services of the Clinical Applications across HCA enterprise systems, ensuring their stability and integrity.

The Site Reliability Engineer (SRE) assists in the planning, design, and development of systems within the Clinical Applications department. The SRE will also be involved with the installation, monitoring, maintenance, support, and optimization of all systems and applications management hardware, software, and communication links. These responsibilities also require strong subject matter knowledge across server, network, and monitoring application technologies, services, and components. The SRE must keep up to date on the latest monitoring technologies and services so that the enterprise systems environment is strategically enhanced to proactively meet HCA customer/business requirements and maintain service levels.

This individual will primarily analyze and resolve Systems Management hardware and software problems in a timely and accurate fashion, and provide end user training where required. Position will interact daily with the field, customer support, design and build, business partners, vendors, other IT&S staff/management and end users/customers on developing alerting solutions that meet HCA business requirements.

Responsibilities:

The Consulting Site Reliability Engineer must exhibit communication skills to maintain positive business relations with employees, customers, business partners, peers and other IT&S personnel. Some of those expectations are:

  • Accountable for the application ecosystem's availability, latency, performance, efficiency, change management, monitoring and incident response
  • Aid in application release review and implementation
  • Continually quantify and seek to improve failures and availability
  • Focus on reliability, resiliency, and maintainability of the application ecosystem
  • Participate and consult in Application Roadmap discussions to ensure High Availability (HA) and Automation are built in early on
  • Act as liaison between technology and product development / operations team(s); 50% Operations / 50% Development
  • Promote a cohesive culture and reduce organizational silos through shared ownership
  • Consult and advise on key areas such as alerting & monitoring, troubleshooting processes, strategic goal alignment, and roadmap
  • Event Management: responsible for defining KPIs and thresholds, and ensuring configuration of alerts
  • Responsible for trending incidents, events, problems, and changes to identify trends and implement solutions to address root cause
  • Lead problem and post-mortem efforts to address root cause for assigned technology
  • Define Application and Technology SLAs to create and implement proactive alerting strategies
  • Work with the High Availability (HA) team to conduct routine assessments, implement appropriate SLA tracking, and data-driven assessment models where possible
  • Collaborate with Service Management Organization (SMO) teams to complete necessary actions for support and automation for monitoring, performance and capacity planning, lifecycle & configuration management and disaster recovery
  • Assist in efforts to implement long-term break/fix solutions and other lessons-learned initiatives
  • Bring efficiencies to the application ecosystem through measures such as automated testing and deployments
  • Assist in the documentation of system architecture, key processes, and other key problem-solving techniques
  • Fully understand and maintain documentation of the system architecture and all dependencies

Ability to Manage Up with Peers and Partners: Must be able to project-manage needs for key project initiatives and support initiatives. The ability to develop and provide dates and a plan related to the system is important for this role. Field impacts and other integrations must be considered.

Qualifications:

  • Works with Project Engineering to ensure the reliability and maintainability of new and modified installations. The Reliability Engineer is responsible for adhering to the Life Cycle Asset Management (LCAM) process throughout the entire life cycle of new assets.
  • Participates in the development of design and installation specifications along with commissioning plans. Participates in the development of criteria for and evaluation of software/systems.
  • Participates in the final checkout of new installations. This includes factory and site acceptance testing that will assure adherence to functional specifications.
  • Guides efforts to ensure reliability and maintainability of equipment, processes, utilities, facilities, controls, and safety/security systems.
  • Professionally and systematically defines, designs, develops, monitors and refines an Asset Maintenance Plan that includes:
      o Value-added preventive maintenance tasks
      o Effective utilization of predictive and other non-destructive testing methodologies designed to identify and isolate inherent reliability problems
  • Provides input to a Risk Management Plan that will anticipate reliability-related and non-reliability-related risks that could adversely impact day-to-day Application operation.
  • Develops engineering solutions to repetitive failures and all other problems that adversely affect plant operations. These problems include capacity, quality, cost or regulatory compliance issues. To fulfill this responsibility the Reliability Engineer applies:
      o Data analysis techniques that can include:
          + Reliability modeling and prediction
          + Six Sigma (6σ) Methodology
      o Root-cause Analysis (RCA)
      o Failure Reporting, Analysis and Corrective Action System
  • Works with Production to perform analyses of assets including:
      o Asset Utilization
      o Overall Equipment Effectiveness
      o Remaining useful life
      o Other parameters that define operating condition, reliability and costs of assets
  • Provides technical support to production, maintenance management and technical personnel
  • Minimal travel required.
  • On-call work will be required.

EXPERIENCE

  • 7 years

EDUCATION

  • High School Graduate/Equivalent
  • College Graduate Preferred

OTHER/SPECIAL QUALIFICATIONS

  • Understand how to read network packet captures
  • Experience with Infrastructure-as-Code
  • Experience with algorithms, data structures, complexity analysis and software design
  • Experience supporting hybrid server environments (on premise, AWS, Azure, etc.)
  • Passion, positive attitude, engagement and desire to take over challenging assignments as part of a team to make things WORK
  • Microsoft SQL
  • Oracle
  • Web Server (Tomcat, Apache, IIS etc.)
  • Mirth
  • Virtualization (Hyper-V, VMware)
  • eJabber
  • F5 Load Balancing
  • DHCP, DNS, and LAN design, deployment, and troubleshooting

PHYSICAL DEMANDS/WORKING CONDITIONS

  • No major physical demands in this role; primarily a sedentary role

HCA Healthcare is a comprehensive healthcare network where 265,000 people across more than 1,800 care facilities are all committed to creating a positive impact every day. It’s an organization that exists to give people healthier tomorrows. Our scale enables caregivers to deliver great outcomes for patients and gives colleagues unparalleled opportunities to learn and grow. Most importantly, as a part of HCA Healthcare we’re connected to something bigger, which means more resources, more solutions and more possibilities for everyone who walks through our doors. What matters most to our diverse and talented colleagues is giving people the absolute best healthcare possible. Every day, we seek to raise the bar higher, not just for ourselves, but for healthcare everywhere.

Be a part of an organization that invests in you. We are actively reviewing applications. Highly qualified candidates will be promptly contacted by our hiring managers for interviews. Submit your application and help us raise the bar in patient care!

We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

Job: Information Technology

Title: Consulting Site Reliability Engineer

Location: Tennessee-Nashville-Corporate Main Campus

Requisition ID: 10207-30451

This job was posted on Sat Jan 11 2020 and expired on Tue Jan 28 2020.

Site Reliability Engineer Interview Questions & Answers

What is the role of a Site Reliability Engineer?

Answer

A Site Reliability Engineer is responsible for ensuring the reliability, availability, and performance of a company's systems and infrastructure.

About the Site Reliability Engineer role

In the role of Site Reliability Engineer, we are responsible for finding solutions to severe technical problems. We coordinate with others to overcome these problems quickly. We act as the company's software development experts, evolving the software system to improve its reliability and leveraging modern tools to do so. We oversee the design, coding, and testing of software. As Site Reliability Engineers, we aim to deliver error-free systems that work quickly and efficiently. We also train and manage the team working under us.

Core tasks:

  • monitoring the application performance
  • ensuring the overall safety of the software
  • troubleshooting issues that occur in the software
  • managing and supporting on-premise development environments
  • sharing knowledge on the best practices with our juniors