This job has expired

Research Fellow in Generative Audio AI

Surrey, United Kingdom
£36,024 to £38,205 per annum
Closing date
21 Apr 2024

Job Details

Vision, Speech & Signal Processing

Location:  Guildford
Salary:  £36,024 to £38,205 per annum
Post Type:  Full Time
Closing Date:  23.59 hours BST on Sunday 21 April 2024
Reference:  013724

The University of Surrey is a global community of ideas and people, dedicated to life-changing education and research.

We are ambitious and have a bold vision of what we want to achieve - shaping ourselves into one of the best universities in the world, which we are achieving through the talents and endeavour of every employee.  

Our culture empowers people to achieve this aim and to collectively, and individually, make a real difference.  

 The role

Applications are invited for a 12-month Research Fellow (RF) position within the Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey, UK, to work on generative AI for audio generation from text and video prompts.

The post is funded by a leading generative AI startup. The focus will be to develop generative machine learning models and signal processing algorithms for sound generation, given prompts from text and/or video. This work builds on CVSSP's recent contributions to generative AI models for audio generation, such as AudioLDM and Re-AudioLDM, with a focus on scaling up the models with additional datasets and extending them to further modalities such as video.

 The post-holder will be based in CVSSP, and work under the direction of the Principal Investigator Prof Wenwu Wang, with co-supervision by Prof Mark Plumbley, and in collaboration with the industrial partner.

 About you

The post-holder is expected to have a PhD degree (or equivalent) in machine learning, generative AI, acoustic signal processing, cross-modal processing among audio, text and video, or a related area in electronic engineering, applied mathematics, computer science, or statistics. The post-holder is expected to have strong analytical skills and programming skills in Python, Matlab or C/C++. Preference will be given to those who have experience in generative AI models, audio generation, or cross-modal translation (such as text-to-audio or video-to-audio), but candidates who have experience in machine learning and audio-visual processing are welcome to apply.

 How to apply

Please submit a CV and cover letter with your application via the University website. For informal inquiries, please contact Prof Wenwu Wang.

Please note that interviews are scheduled to take place in the week commencing 29th April.

CVSSP is an International Centre of Excellence for research in Audio-Visual Machine Perception and AI, with over 180 researchers. The Centre has state-of-the-art audio and video capture and analysis facilities supporting research in real-time video and audio processing and visualisation. CVSSP has a compute facility with 200 GPUs and >2PB of high-speed secure storage.

Further details:    Job Description    

Please note, it is University Policy to offer a starting salary equivalent to Level 3.6 (£34,980) to successful applicants who have been awarded, but are yet to receive, their PhD certificate.  Once the original PhD certificate has been submitted to the local HR Department, the salary will be increased to Level 4.1 (£36,024).

