For a long time I did a lot of the work at one of the most prolific users of Mechanical Turk, and our results were a very mixed bag.
The biggest challenge in using Mechanical Turk is deciding whether a particular worker’s answer is correct. If a computer program could judge the answer, you wouldn’t need a human in the first place. If a person has to review each response, it would probably be easier for that person to just do the work themselves.
Amazon ran into this problem first. Its first big set of tasks was to choose the photograph that best showed the storefront of a business. Many people tried to make a quick buck by either picking a photo at random or writing a script to submit random answers.
We eventually came up with a scoring system that rated workers on their agreement with other workers. A computer program compared each new answer with previous results to decide whether we should trust it. If the program couldn’t decide, it referred the matter to a human (me) for authoritative judging. It took a while before it could judge all the answers without human intervention, but it eventually got there.
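To make the idea concrete, here is a minimal sketch of agreement-based scoring in Python. It is not the system described above, just one plausible way to do it: the function names, the 0.7 acceptance threshold, the two-vote minimum, and the neutral 0.5 weight for workers with no history are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def worker_agreement_scores(responses):
    """Score each worker by how often they match the other workers.

    responses: iterable of (worker_id, task_id, answer) tuples.
    Returns {worker_id: fraction of tasks where the worker matched the
    most common answer given by the *other* workers on that task}.
    """
    by_task = defaultdict(list)
    for worker, task, answer in responses:
        by_task[task].append((worker, answer))

    agreed = Counter()
    total = Counter()
    for answers in by_task.values():
        for worker, answer in answers:
            # Exclude the worker's own answer so they cannot agree with themselves.
            others = [a for w, a in answers if w != worker]
            if not others:
                continue  # nobody to compare against on this task
            total[worker] += 1
            top_answer, _ = Counter(others).most_common(1)[0]
            if answer == top_answer:
                agreed[worker] += 1
    return {w: agreed[w] / total[w] for w in total}


def judge(task_answers, scores, threshold=0.7, min_votes=2):
    """Decide one task from a list of (worker_id, answer) pairs.

    Returns the accepted answer, or None to refer the task to a human.
    """
    if len(task_answers) < min_votes:
        return None
    weight = defaultdict(float)
    for worker, answer in task_answers:
        # Hypothetical choice: workers with no history get a neutral 0.5.
        weight[answer] += scores.get(worker, 0.5)
    total = sum(weight.values())
    if total == 0:
        return None
    best, best_weight = max(weight.items(), key=lambda kv: kv[1])
    return best if best_weight / total >= threshold else None
```

Each answer is weighted by its worker's track record, and anything short of a clear weighted majority comes back as None, which stands in for the "refer it to a human" step. As trusted history accumulates, fewer tasks should fall below the threshold.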
MT is a failure only in the sense that it hasn’t revolutionized the relationship between computers and humans, which is a pretty tall order. There just aren’t that many tasks that are easier for a human than for a computer AND which can be farmed out to potentially unreliable workers.
It’s a success in that MT works great for the problems where both of those qualities are present. Reading numbers off of documents, figuring out the name of an album from a photograph of the cover, etc. are all the sorts of tasks that MT does well at.
MT has also been used for things that it’s not so good at, such as naming your “top 3” of something or other. (This was one of Amazon’s seed tasks.)
We thought that most of the MT workload would come from overseas, namely China, Korea, and Indonesia, where paying 1 cent for a few seconds of work might be a good deal for both parties. However, we found that most of our workers were in the US, and as a group, they really wanted to get paid a lot for doing very little. There were some exceptions, of course, but most of my correspondents were somewhat indignant that they were not able to make a living off of determining not-so-subtle characteristics in data.
Give it a few years, then look again.