A System for Scalable and Reliable Technical-Skill Testing in Online Labor Markets

The emergence of online labor platforms, online crowdsourcing sites, and even Massive Open Online Courses (MOOCs) has created an increasing need to evaluate the skills of participating users (e.g., “does a candidate know Java?”) reliably and at scale. Many platforms already allow job candidates to take online tests to assess their competence in a variety of technical topics. However, existing approaches face several problems. First, cheating is very common in unsupervised online testing, as test questions often “leak” and become easily available online along with their answers. Second, technical skills such as programming require tests to be updated frequently to reflect the current state of the art. Third, there is very limited evaluation of the tests themselves, that is, of how effectively they measure the skill for which users are tested. In this article we present a platform that continuously generates test questions and evaluates their quality as predictors of user skill level. Our platform leverages content already available on question-answering sites such as Stack Overflow and repurposes these questions to generate tests. This approach has two major benefits: we continuously generate new questions, decreasing the impact of cheating, and we create questions that are closer to the real problems the skill holder is expected to solve in practice. Our platform uses Item Response Theory to evaluate the quality of the questions. We also use external signals about worker quality to examine the external validity of the generated test questions: questions that have external validity also have strong predictive ability for identifying, early on, the workers who have the potential to succeed in online job marketplaces.
Our experimental evaluation shows that our system generates questions of comparable or higher quality than existing tests, at a cost of approximately $3 to $5 per question. This is lower than the cost of licensing questions from existing test banks, and an order of magnitude lower than the cost of having experts produce such questions from scratch.
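To make the Item Response Theory idea above more concrete, here is a minimal sketch assuming the two-parameter logistic (2PL) model. The abstract does not say which IRT model the platform uses, so the choice of 2PL is an assumption. The item parameters below are hypothetical illustrations, not values fitted from real data. In 2PL, each question has a difficulty `b` and a discrimination `a`. A question with high discrimination separates users above and below its difficulty sharply, which is one common way of judging whether a question is a good predictor of skill level.

```python
import math


def p_correct(theta: float, a: float, b: float) -> float:
    """2PL item characteristic curve (assumed model): probability that a user
    with ability theta answers an item with discrimination a and difficulty b
    correctly."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P).
    Higher values mean the item measures ability more precisely near theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


# Hypothetical questions with the same difficulty (b = 0.0):
# one strongly discriminating (a = 2.0) and one weakly discriminating (a = 0.3).
STRONG = (2.0, 0.0)
WEAK = (0.3, 0.0)

if __name__ == "__main__":
    for theta in (-2.0, 0.0, 2.0):
        print(
            f"theta={theta:+.1f}  "
            f"P(strong)={p_correct(theta, *STRONG):.3f}  "
            f"P(weak)={p_correct(theta, *WEAK):.3f}  "
            f"I(strong)={item_information(theta, *STRONG):.3f}  "
            f"I(weak)={item_information(theta, *WEAK):.3f}"
        )
```

Under this sketch, both questions give a 50% chance of success to a user whose ability equals the difficulty. However, the strongly discriminating question carries far more information at that point (a²/4 = 1.0 versus 0.0225). A platform that ranks generated questions by discrimination or information would keep the first kind and discard the second.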