title.  In MTurk We Trust: Issues with Crowdsourcing in Research

date. 2.20.2018

author. Rachelle Prince (rachelle.prince@wayne.edu)

Amazon is widely known as an efficient online retailer where you can buy anything your heart desires, from A to Z. For many researchers, however, the dominant online retailer has another reason to be celebrated. In 2005, developers launched the online service known as Amazon's Mechanical Turk (MTurk) in hopes of creating an on-demand, scalable workforce consisting of Workers (registered users who complete Human Intelligence Tasks, or HITs, for payment) and Requesters (registered users who provide HITs to be completed in exchange for payment). Today, the MTurk website boasts that it gives Requesters access to more than 500,000 Workers from 190 countries who hold a variety of skill sets and capabilities. Likely the service's most attractive feature is its promise of quick turnaround and accurate results at a lower cost than traditional sampling pools (including university student and panel samples). The service is used by businesses, market researchers, and scientists alike.

As a result of MTurk's recent appeal to researchers, the behavioral research landscape is changing. Traditional sampling methods that rely on university student samples or research panels have been shown to be comparatively inattentive, low in quality, and expensive. A more reliable, cost-friendly participant recruitment tool was needed, and Amazon's MTurk answered the call. Although the crowdsourcing service has its own shortcomings, most researchers agree that it is equal (if not superior) to traditional samples insofar as its Workers are more attentive and eager to please -- if the price is right, of course. As an active social science research team, we have used MTurk for smaller tasks like pretesting and scale-building, and our experience with the service has been great overall. But we have encountered some interesting issues along the way...

On the topic of compensation, it's important to note that although Amazon sets no benchmark for acceptable payment per HIT, most scholars affiliated with a reputable institution would be wise to follow the payment guidelines graciously outlined by fellow researchers in an online forum hosted by the Society for Personality and Social Psychology: the maximum payment mentioned in the forum was $0.25 per minute and the minimum was $0.05 per minute. Another frequent suggestion was that when all else fails (or if your team, department, or Institutional Review Board (IRB) doesn't already have established guidelines for MTurk payments), you can use the average minimum wage as your benchmark: for reference, in 2017 that worked out to $0.1379 per minute. Notably, compensation may vary depending on the required effort, risks, benefits, and tasks you're asking people to do.
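As a rough sketch of these benchmarks, a small helper can turn an estimated completion time into a per-HIT payment and sanity-check the rate against the forum's suggested range. The rate names and bounds below are just the figures quoted above, not anything Amazon enforces:

```python
# Sketch: convert an estimated completion time into a per-HIT payment,
# using the per-minute figures quoted above. These bounds come from a
# forum discussion, not from Amazon -- adjust to your IRB's guidelines.

FORUM_MIN_RATE = 0.05    # dollars per minute (forum minimum)
FORUM_MAX_RATE = 0.25    # dollars per minute (forum maximum)
MIN_WAGE_RATE = 0.1379   # 2017 average minimum wage, per minute

def hit_payment(est_minutes: float, rate_per_minute: float) -> float:
    """Total payment for one HIT, rounded to the nearest cent."""
    if not (FORUM_MIN_RATE <= rate_per_minute <= FORUM_MAX_RATE):
        raise ValueError("rate falls outside the forum's suggested range")
    return round(est_minutes * rate_per_minute, 2)

# A 10-minute survey paid at the minimum-wage benchmark:
print(hit_payment(10, MIN_WAGE_RATE))  # 1.38
```

Remember that this is only a floor-setting exercise; effort, risk, and task type should push the rate up from there.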

There are many important discussions about what is ethical to pay online workers (see articles from Wired and The Atlantic). From the investigator's point of view, it's tough to afford a big sample if you pay each individual participant a lot of money. Then again, we need to fairly compensate participants for their time and effort. There are some minor things you can do to help Workers out: (1) Be honest in your time estimates--don't say a survey will only take 5 minutes when you know it can take up to 15; (2) Tell Workers what to expect in the actual task and be specific with your instructions (question types, etc.). Believe it or not, Workers often use the information you give them to gauge whether they can really provide quality work.

Once you've collected your responses, as the Requester, you must choose to either accept (pay) or reject (not pay) each Worker's submission. But note: if you've committed to paying Workers and outlined this in your consent documents, it would likely be considered unethical to reject work, even if it is sub-par. According to MTurk's policies on the matter, Requesters can and should withhold payment if "a worker submits a wrong or unacceptable answer" (e.g., if a Worker did not follow clear instructions). Although Amazon and IRBs may not see eye to eye on participant compensation policies, Workers do care about their approval scores and payment, so if you choose to reject work, be sure to provide helpful feedback so they can become better Workers. If the same Worker continually submits unacceptable answers to your HITs, or is not responding to feedback, training, or other communication, block them instead: rejecting work affects both your Requester score and the Worker's score. Blocking a Worker on MTurk prevents them from completing any of your HITs in the future.
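The accept/reject/block decision boils down to simple bookkeeping, which the sketch below illustrates. This is our own toy model, not the MTurk API: the rejection threshold, worker IDs, and feedback strings are all hypothetical, and in practice each action maps to a click in the Requester interface (or an API call):

```python
# Illustrative bookkeeping for the accept/reject/block workflow described
# above. The threshold and feedback text are hypothetical; real actions
# would go through the MTurk Requester site or API.

BLOCK_AFTER = 3  # hypothetical: block after this many rejections

class RequesterLog:
    def __init__(self):
        self.rejections = {}   # worker_id -> count of rejected HITs
        self.blocked = set()

    def review(self, worker_id: str, acceptable: bool, feedback: str = "") -> str:
        """Record the outcome for one submitted HIT and return the action taken."""
        if worker_id in self.blocked:
            return "blocked"             # MTurk would prevent this submission
        if acceptable:
            return "approved"            # Worker gets paid
        # Rejecting: always pair the rejection with helpful feedback.
        self.rejections[worker_id] = self.rejections.get(worker_id, 0) + 1
        if self.rejections[worker_id] >= BLOCK_AFTER:
            self.blocked.add(worker_id)  # stop repeat offenders via a block
            return "rejected-and-blocked"
        return "rejected: " + feedback
```

Switching from repeated rejections to a block mirrors the advice above: rejections ding both scores, so a block is the cleaner way to part ways with a persistently unacceptable Worker.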

Another option is to qualify your Workers. The "Qualify Workers" feature has been particularly celebrated among our research team. Once a subset of the Worker population completes your task, as the Requester, you can assign those Workers a qualification type, created and managed by you. For instance, if you pay every Worker who completed your survey pretest and now want to separate eligible candidates from ineligible ones, you can split that Worker sample into groups by qualifying certain paid Workers as "Eligible" and others as "Ineligible". Then for future surveys, longitudinal studies in particular, you can stipulate that only Workers who have the "Eligible" qualification may participate. Or, conversely, you can stipulate that only Workers who have NOT been granted the "Ineligible" qualification may participate -- this option makes your study available to Workers assigned the "Eligible" qualification along with new or other Workers in the MTurk population. There are many ways to approach Qualifications in MTurk, so for more details on this process please visit this link.
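For script-driven workflows, these two gating strategies show up as QualificationRequirement entries attached to a HIT. The snippet below only builds those request dictionaries in the shape the MTurk API expects (the `Exists` / `DoesNotExist` comparators); the qualification-type IDs are placeholders, and actually creating types and HITs would go through something like boto3's MTurk client, which we don't call here:

```python
# Build QualificationRequirement entries in the shape the MTurk API
# expects. The IDs below are placeholders -- a real workflow would get
# them back from CreateQualificationType (e.g. via boto3's MTurk client).

ELIGIBLE_QUAL_ID = "QUAL_ID_ELIGIBLE_PLACEHOLDER"      # hypothetical ID
INELIGIBLE_QUAL_ID = "QUAL_ID_INELIGIBLE_PLACEHOLDER"  # hypothetical ID

def require_qualification(qual_id: str, must_have: bool) -> dict:
    """One requirement: Workers must have (or must lack) a qualification."""
    return {
        "QualificationTypeId": qual_id,
        "Comparator": "Exists" if must_have else "DoesNotExist",
        # Hide the HIT entirely from Workers who don't meet the requirement.
        "ActionsGuarded": "DiscoverPreviewAndAccept",
    }

# Option 1: only Workers tagged "Eligible" may participate.
only_eligible = [require_qualification(ELIGIBLE_QUAL_ID, must_have=True)]

# Option 2: anyone EXCEPT Workers tagged "Ineligible" may participate.
not_ineligible = [require_qualification(INELIGIBLE_QUAL_ID, must_have=False)]
```

The second option is the one that lets fresh Workers discover your study while still screening out the group you've ruled out.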

One final tip: build a network and communicate with your participants. Workers often have a lot to say--everything from helpful tips that might make your survey run smoother to typos you missed in your questions. As Dr. Elizabeth Stoycheff discovered in her 2016 study about payment and retention in longitudinal studies, cash isn't always king in this online marketplace for crowdsourced work. According to Dr. Stoycheff, building interpersonal trust mattered more to Workers than the payment figure alone.

In MTurk We Trust. 

© 2020 SMART Labs @ Wayne. Photos by Jake Mulka & Ryan Tong