A Comparative Study on Annotation Quality of Crowdsourcing and LLM via Label Aggregation

arXiv link: http://arxiv.org/abs/2401.09760v1 - 2024-01-18 07:23:51

Abstract

Whether Large Language Models (LLMs) can outperform crowdsourcing on data annotation tasks has recently attracted interest. Some studies have examined this question by collecting new datasets and comparing the average performance of individual crowd workers and LLM workers on specific NLP tasks. However, on the one hand, existing datasets built for studying annotation quality in crowdsourcing have not yet been used in such evaluations, although they could provide reliable evaluations from a different viewpoint. On the other hand, the quality of aggregated labels is crucial because, in crowdsourcing, the labels that are eventually collected are the estimated labels aggregated from multiple crowd labels for the same instance. Therefore, in this paper, we first investigate which existing crowdsourcing datasets can be used for a comparative study and create a benchmark. We then compare the quality of individual crowd labels and LLM labels, and evaluate the aggregated labels. In addition, we propose a Crowd-LLM hybrid label aggregation method and verify its performance. We find that adding labels from good LLMs to existing crowdsourcing datasets can enhance the quality of the aggregated labels, which is also higher than the quality of the LLM labels themselves.
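
To make the idea of label aggregation concrete, here is a minimal sketch in Python. It assumes plain majority voting and treats each LLM as one extra "worker" whose label is pooled with the crowd labels for the same instance. The abstract does not specify the paper's actual Crowd-LLM hybrid method, so this is only an illustration of the general idea; the function names (`majority_vote`, `aggregate`, `accuracy`) and the toy data are hypothetical and are not results from the paper.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first."""
    counts = Counter(labels)
    top = max(counts.values())
    for label in labels:
        if counts[label] == top:
            return label


def aggregate(crowd_labels, llm_labels=None):
    """Aggregate labels per instance by majority vote.

    crowd_labels: dict mapping instance id -> list of crowd labels.
    llm_labels:   optional dict mapping instance id -> list of LLM labels,
                  pooled with the crowd labels (an assumption of this sketch).
    """
    llm_labels = llm_labels or {}
    aggregated = {}
    for instance, labels in crowd_labels.items():
        votes = list(labels) + list(llm_labels.get(instance, []))
        aggregated[instance] = majority_vote(votes)
    return aggregated


def accuracy(predicted, truth):
    """Fraction of instances whose aggregated label matches the ground truth."""
    correct = sum(predicted[i] == truth[i] for i in truth)
    return correct / len(truth)


# Made-up toy data, for illustration only.
crowd = {"q1": ["pos", "neg", "pos"], "q2": ["neg", "pos"], "q3": ["pos", "neg"]}
llm = {"q1": ["pos"], "q2": ["neg"], "q3": ["neg"]}
truth = {"q1": "pos", "q2": "neg", "q3": "neg"}

crowd_only = aggregate(crowd)
hybrid = aggregate(crowd, llm)
print("crowd only:", accuracy(crowd_only, truth))
print("crowd + LLM:", accuracy(hybrid, truth))
```

In this toy case the extra LLM vote breaks the ties on `q2` and `q3`. Weighted schemes that estimate worker reliability are a common alternative to plain majority voting, and whether added LLM labels help in practice depends on the quality of the LLM, as the abstract notes.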

Socials

X

🚀 Exciting developments in the world of AI and NLP! Can Large Language Models (LLMs) outperform crowdsourcing on data annotation tasks? 🤔📊

Recent studies have delved into this question by comparing the performance of individual crowd workers and LLM workers on specific NLP tasks. However, there's a new perspective to explore! 🌟

Check out this research paper that investigates utilizing existing crowdsourcing datasets for evaluating annotation quality from a different angle. 📝 The findings reveal that incorporating LLM labels can enhance the quality of aggregated labels, surpassing the quality of LLM labels alone. 📈

Dive deeper into the study and explore the proposed Crowd-LLM hybrid label aggregation method for optimizing data annotation tasks. 🧠

Read more about this intriguing research here: http://arxiv.org/abs/2401.09760v1

#AI #NLP #LLMs #DataAnnotation #Crowdsourcing #Research #TechInnovation

🚀 Exciting findings in the world of AI and NLP! Can Large Language Models outperform crowdsourcing for data annotation tasks? This study delves into the comparison between individual crowd workers and LLM workers, along with proposing a Crowd-LLM hybrid label aggregation method. Discover the results here: http://arxiv.org/abs/2401.09760v1 #AI #NLP #LLMs #Crowdsourcing #TechResearch 🤖📊
