Senior High Performance Computer Cluster System Engineer

Employer: UNITED ARAB EMIRATES UNIVERSITY
Location: Abu Dhabi, United Arab Emirates
Salary: 11000 to 23000 AED
Closing date: 20 Sep 2019

Academic Discipline: Computer Science, Engineering & Technology
Job Type: Professional Services, Other Professional Services
Contract Type: Permanent
Hours: Full Time

Job Description

The HPC cluster systems engineer is responsible for managing and supporting all HPC and Grid systems in the University data center and at distributed locations.

Solves HPC and Grid related problems on a daily basis.
In support of change management within the data center, provides the CSC with information about the HPC systems.
Verifies all HPC systems daily using the monitoring tools and proactively intervenes to solve problems.
Analyzes solution components, understands system integration challenges, and identifies technology gaps.
Resolves or proposes solutions to these gaps to meet future performance targets and functionality requirements.
Prototypes features, performs integration checkout of various software components, and collaborates with component developers and solutions architects.
Develops and drives validation test content and evaluates system components.
Engages with industry partners as required to identify and investigate best-known methods used in the HPC community, and applies those methods.
Collaborates with architects and developers to define architectural requirements for high-end HPC clusters.
Responsible for system integration and validation of UAEU HPC clusters.
Responsible for monitoring all HPC and Grid services.
Coordinates work with vendors for support.
Tests and deploys HPC systems.
Knowledge of IT Service Management frameworks.
Maintains accurate and comprehensive documentation and diagrams of the enterprise HPC system, backup infrastructure, communications flow, and routing.
Other duties as assigned.

Minimum Qualification

Bachelor's degree in Computer Engineering or Computer Science required
3-6 years of experience
HPC Cluster Administration
Advanced Red Hat Linux Administration

Preferred Qualification

Knowledge of server hardware components, including diagnosing faults and replacing defective items.
Good communication and report-writing skills.
Must be able to work under pressure in a fast-paced work environment.
Must be able to work flexible hours, including evenings, weekends, holidays, and overtime as required, and must be available 24/7 on call in case of a major service outage.
Strong problem solving, testing, and network troubleshooting skills
Cluster solutions integration and administration
Linux operating systems and OS components for HPC clusters
Cluster provisioning, systems management, resource management middleware
Cluster interconnect fabrics and software stack
HPC Cluster storage solutions
Parallel programming models for HPC clusters

Division: Information Technology Division-CIO

Department: Infrastructure & Core Technology Section

Job Close Date: Open until filled

Job Category: Staff
