Manager, Reliability Engineering

Texas Capital Bancshares Incorporated
in Richardson, TX, United States
Permanent, Full time

At Texas Capital Bank, we are driven by a single-minded and unwavering mission: to serve businesses and the individuals who run them. We use a consultative approach and innovative technologies to develop new ideas that give the bank and our clients a competitive advantage. We partner with our customers to push the boundaries of what’s possible, together.

Headquartered in Dallas, Texas Capital Bank has offices in Austin, Fort Worth, Houston, Richardson, Plano, and San Antonio, and we serve clients in a variety of industries from coast to coast.

We are on the Forbes Best Banks in America list, and were named a top place to work by The Dallas Morning News, Houston Chronicle and San Antonio Express-News.

Texas Capital Bank is seeking a Manager over our Operations Center and Site Reliability Engineering. You will lead the team responsible for all of Texas Capital Bank’s application, system, and cloud monitoring systems. Additionally, you will be responsible for the day-to-day operations of the IT Command Center. In this role you will be required to think creatively to develop a programmatic monitoring data collection and event correlation system. Successful candidates will need to be able to analyze complex problems and produce effective solutions that decrease the time to identify and resolve issues.

Responsibilities:
  • Lead and manage the day-to-day operations of the IT Command Center, which consists of a traditional NOC and the team responsible for enterprise monitoring
  • Manage and lead the effort to automate and transform manual processes
  • Build, develop, and deploy Ansible into the production environment for operations consumption
  • Create and maintain multiple Ansible modules leveraging Python
  • Provide development guidance and operational system support for all monitoring and event handling applications
  • Produce and maintain both team and system performance metrics and reports
  • Collaborate in the creation of standards, policies, and processes for all aspects of the monitoring systems
  • Analyze the effectiveness of current monitoring systems and create action plans to improve or expand visibility into system or application performance
  • Design, develop, and integrate monitoring and automation tools to keep pace with the growth of the bank and its IT footprint
  • Participate in new system planning and deployments to determine the best monitoring strategy
  • Collaborate with the line-of-business owners to understand application and workflow processes in order to apply automation to common manual tasks
  • Create and maintain service maps linking an application to the underlying infrastructure
  • Create and maintain team knowledge documentation
  • Work with various IT organizations to create custom dashboards, alerts, metrics, and reports
  • Review and analyze hard-to-monitor systems, applications, or business workflows and present creative solutions

Qualifications:
  • 2+ years of IT Operations management experience
  • 4+ years of experience administering various infrastructure and application monitoring tools
  • 4+ years of ServiceNow experience
  • 3+ years utilizing various scripting languages such as Python, JavaScript, C#, and Ruby
  • Understanding of programming languages and platforms such as .NET, C++, and Java
  • Must be familiar with AI Operations technologies
  • Must have the ability to integrate various applications
  • Experience leading small agile teams
  • Knowledge of configuration management tools such as Ansible, Chef, or Puppet
  • The ability to analyze new software and hardware systems and determine the most elegant solution
  • A strong understanding of various technologies such as servers, storage, networking, load balancing, DNS, DHCP, etc.
  • Working knowledge of applications and the various IT components such as database, web, Exchange, storage, IIS, etc.
  • Strong knowledge of the ServiceNow platform
  • Must demonstrate strong sense of urgency regarding solving end-user issues
  • Experience with cloud environments and a strong understanding of hybrid environments
  • Flexibility to work various shifts is required