This job has expired

Senior High Performance Computer Cluster System Engineer

Employer
UNITED ARAB EMIRATES UNIVERSITY
Location
Abu Dhabi, United Arab Emirates
Closing date
3 Dec 2020

Job Description

The HPC cluster systems engineer is responsible for managing and supporting all HPC and Grid systems for the University data center and distributed locations.

  • Solves HPC and Grid related problems on a daily basis.
  • In support of change management within the data center, provides the CSC with information about the HPC systems.
  • Verifies all HPC systems daily using the monitoring tools and proactively intervenes to solve problems.
  • Analyzes solution components, understands systems integration challenges and identifies technology gaps.
  • Resolves or proposes solutions to the above gaps to reach future performance targets and functionality requirements.
  • Prototypes features, performs integration checkout of various software components, and collaborates with component developers and solutions architects.
  • Develops and drives validation test content and evaluates system components.
  • Engages with industry partners as required to identify and investigate best-known methods used in the HPC community and applies those methods.
  • Collaborates with architects and developers to define architectural requirements for high-end HPC clusters.
  • Responsible for system integration and validation of UAEU HPC clusters.
  • Responsible for monitoring all HPC and Grid services.
  • Co-ordinates work with vendors for support.
  • Tests and deploys HPC systems.
  • Knowledge of IT Service Management frameworks.
  • Maintains accurate and comprehensive documentation and diagrams of the enterprise HPC system, backup infrastructure, communications flow, and routing.
  • Other duties as assigned.

Minimum Qualification

  • Bachelor's degree in Computer Engineering or Computer Science required
  • 3-6 years of experience
  • HPC Cluster Administration
  • Advanced Red Hat Linux Administration

Preferred Qualification

  • Knowledge of server hardware components, including diagnosing and replacing defective items.
  • Good communication and report-writing skills.
  • Must be able to work under pressure in a fast-paced work environment.
  • Must be able to work flexible hours, including evenings, weekends, holidays and overtime as required, and should be available on call 24/7 in case of a major service outage.
  • Strong problem solving, testing, and network troubleshooting skills
  • Cluster solutions integration and administration
  • Linux operating systems and OS components for HPC clusters
  • Cluster provisioning, systems management, resource management middleware
  • Cluster interconnect fabrics and software stack
  • HPC Cluster storage solutions
  • Parallel programming models for HPC clusters

Division: Information Technology Division-CIO

Department: Infrastructure & Core Techno. Section

Job Close Date: open until filled

Job Category: Staff

Salary: 11000 to 23000 AED
