Senior Service Management Analyst
Drive the optimization of technology operations through system/service performance monitoring, data and analysis. Monitor operations to ensure established service level agreements/objectives (SLAs/OLAs) are met. Collect data, analyze and report on performance, usage and trends within the assigned portfolio. Synthesize findings to draw correlations between performance symptoms and existing or potential issues, and identify exceptional conditions and opportunities for improvement. Escalate and document issues and events in accordance with established procedures and processes.
Responsibilities
- Responsible for developing an Event Management process.
- Responsible for providing the ability to detect events, make sense of them and determine the appropriate control action.
- Associate one or more events with a single cause when performing root cause analysis.
- Use technology monitoring tools to monitor assigned environments and/or technical assets and identify/detect performance and behavior outside of established standards or SLA/SLOs.
- Evaluate the current and/or potential impact a deviation may cause to infrastructure, delivery or services.
- Analyze trends and patterns in event logs to identify where improvements to the infrastructure are needed.
- Identify top-talker CIs and collaborate with the respective stakeholders to mitigate the impact.
- Identify the actual correlation between high-severity incidents and the events/auto-gens that occurred against them.
- Alert the appropriate team (per incident and event management processes) to provide warning/notification that a threshold has been reached, something has changed, or a failure has occurred.
- Candidate should be able to review the different infrastructure components while working with the application team to suggest and automate their day-to-day tasks.
- Candidate should be able to review the daily event and incident queue to identify automation opportunities and work towards implementation with the SPOC from the application team.
- Candidate should work closely with the Problem Management team to identify repetitive RCAs that can be automated with the help of application teams.
- Collaborate with management and/or Process Owner to determine reporting to drive utilization, efficiency and effectiveness of technology systems and/or services.
- Create reporting templates and dashboards to consistently and succinctly convey trends, consumption, and performance to leadership, business partners and other technology teams (as applicable).
Alert & Document
- Document concerns and findings, collect all pertinent data (to include comparison of exception data and normal data) and ensure incident/event tracking tools are up-to-date (per established guidelines and procedures).
- Use experience, expertise and data analysis to collaborate with manager and team members in the identification of corrective action to increase efficiency, improve performance and meet or exceed consumption targets.
- Experience in handling infrastructure-level issues, with a deep understanding of network devices, storage, web tier architectures, databases and application support.
Required Qualifications
- Bachelor's degree in Computer Science, IT, MIS, Math or related field; or equivalent work experience.
- Demonstrated ability to clearly and persuasively communicate (verbal and written) ideas, issues and recommendations.
- Experience in working with clients/customers located in different geographies.
- Around 5 years of experience in an infrastructure technology operations organization. Should understand the server, storage, database and network domains.
- Around 4 years of proficiency in a scripting language, preferably PowerShell, Python or Ansible.
- Strong, proven ability to multi-task.
Preferred Qualifications
- ITIL Foundation certification.
- Strong analytical ability with proven proficiency in synthesizing data into meaningful and digestible data points and actions.
- 5-7 years of experience with ITSM tools such as ServiceNow and Tivoli-based tools.
- Strong attention to detail.
- Experience with event correlation and interpretation, using various monitoring tools including Dynatrace, SumoLogic, ScienceLogic, HP Business Service Management, HP SiteScope, PRTG, SevOne and SolarWinds.
- Previous experience in an incident management role.