Data Science Fellow
Faculty of Arts and Sciences
Institute for Quantitative Social Science (IQSS)
This posting is for a Data Science Fellow to participate in the design of the Automated History Archive. Many of the biggest challenges our society faces have their roots in the past, and history can provide fundamental insights into their causes and potential solutions. However, vast amounts of historical quantitative data that could shed light on important issues remain locked in hard copy because curating them is prohibitively expensive. In the original publications, historical data are often scattered irregularly amongst text. Commercial OCR software performs poorly on irregular tables, often requiring the user to mark table structures manually by drawing boxes. Off-the-shelf tools that assemble tables using clustering-based machine learning methods do not exist.
The Automated History Archive will automate, at scale, the conversion of images of historical quantitative data into classified, machine-readable datasets and deposit these datasets in a collaborative, open-source data platform. Building on our initial successes, the fellow will play a core role in developing algorithm prototypes that combine computer vision tools, which recognize data structures in the raw images, with machine learning techniques for classifying digitized table fragments.
The position provides a unique opportunity for a promising young scholar who plans to pursue a PhD in Engineering, Computer Science, or a quantitative Social Science to be immersed in a top-notch research environment. The initiative is housed in Harvard’s Institute for Quantitative Social Science (IQSS), which is dedicated to understanding and solving society’s greatest problems through bold and collaborative social and data science. The fellow will work closely with the PI, Professor Melissa Dell, will be an active participant in the Harvard research community, and will have opportunities to develop their own research agenda on issues related to the initiative.
There are two open positions, each with a one-year term and the possibility of extension (conditional on funding availability and performance). The start date is flexible.
Applicants should have experience working with machine learning methods for image data. Beyond this, applicants must be interested in advancing methodology for non-standard applications, particularly the automated extraction of structured data from large datasets of historical document scans. The Data Science fellowship requires developing new methods, not simply applying existing tools. The position requires a bachelor’s degree.
Please do NOT apply via ARIES. Only applicants who follow the application instructions will be considered.
Interested candidates should send a CV, transcript, and one letter of reference to email@example.com. The subject line should contain the phrase “IQSS fellow application.” Candidates who advance to the next round will be required to complete a programming exercise, submit a research statement, and participate in a Skype interview.
Equal Opportunity Employer
We are an equal opportunity employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability status, protected veteran status, pregnancy and pregnancy-related conditions or any other characteristic protected by law.