High Performance Computing System Administrator

California, United States
10 Jun 2020
End of advertisement period
10 Aug 2020
Contract Type
Full Time

Position: HPC Systems Specialist

Job Site: 255 Panama Street, Polya Hall, Stanford, CA 94305


Design, conceptualize, develop, optimize, integrate and operate petascale, multi-user High Performance Computing (HPC) parallel file systems and storage. Troubleshoot highly complex problems whose analysis and resolution require extensive knowledge of many diverse system components (e.g., kernel-level knowledge of the Lustre file system, operating system internals, device drivers, physical hard disk and controller architecture, and InfiniBand network architecture, protocols and hardware). Administer, operate and manage multi-user, heterogeneous HPC cluster systems, including multiple generations of processors and graphics processing units (GPUs). Configure, manage and optimize job scheduling and resource management tools, and ensure compliance with Stanford's Minimum Security guidelines. Develop, maintain and extend open source software utilities to (a) automate administrative processes, policies and notifications for HPC compute clusters, file systems and storage; (b) analyze and monitor performance and utilization, and alert when thresholds are reached; (c) provide business intelligence to inform future investments and technical implementations; and (d) ensure that systems and services are accessible, performant and extensible.

Requirements: Bachelor’s degree in computer science, computer engineering, computational science or a related field and 10 years of experience. The employer will also accept a master’s degree in the same fields of study and 8 years of experience.

Special Requirements: 

  • Multi-petabyte-scale, multi-user Lustre file systems
  • Design of LAN and WAN Lustre networks (LNet) and administration of LNet routers
  • Design, deployment and administration of low-latency InfiniBand network fabrics
  • Linux KVM (Kernel-based Virtual Machine) and single root input/output virtualization (SR-IOV) expertise (design and administration of KVM/SR-IOV systems)
  • Message Passing Interface (MPI) development and operation
  • Slurm workload manager administration

Proof of authorization to work in U.S. is required if hired.

The job duties listed are typical examples of work performed by positions in this job classification and are not designed to contain or be interpreted as a comprehensive inventory of all duties, tasks, and responsibilities. Specific duties and responsibilities may vary depending on department or program needs without changing the general nature and scope of the job or level of responsibility. Employees may also perform other duties as assigned.

Consistent with its obligations under the law, the University will provide reasonable accommodation to any employee with a disability who requires accommodation to perform the essential functions of his or her job.

Stanford is an equal employment opportunity and affirmative action employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, protected veteran status, or any other characteristic protected by law.

Additional Information

  • Schedule: Full-time
  • Job Code: 4770
  • Employee Status: Regular
  • Grade: M
  • Requisition ID: 86731
