Gamification of Crowdsourced Corpus Cleaning



Creator/Artist: Vishnu Priyan

Category: Interaction Design

Document: P2 Project

Batch: 2016-2020

Source: India, IDC IIT Bombay

Period: 2019-onwards

Medium: Report PDF

Supervisor: Prof. Anirudha Joshi


Detailed Description

Swarachakra Tamil is a touchscreen keyboard for the Android smartphone operating system. Swarachakra maintains a list of all the words its users type with the keyboard. This steadily growing database of conversational, everyday words could be used for research and for applications such as chatbot design, predictive text, and autocorrect. At the moment, however, it is raw data with no metadata, and it contains errors. These include, but are not limited to, spelling mistakes, non-existent words, and words from other languages. The errors need to be identified, tagged, and rectified (corrected or removed), a process referred to here as cleaning. The database must be cleaned before it can be used for any other purpose.
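
To make the idea of tagging concrete, here is a minimal sketch of one automatic pre-tagging pass that could run before human contributors see the data. It is not the project's method. The `WordEntry` record, the tag labels, and the `pre_tag` function are hypothetical names introduced only for illustration. The sketch assumes that a word containing any character outside the Unicode Tamil block (U+0B80 to U+0BFF) can be pre-tagged as coming from another language. It cannot detect foreign words written in Tamil script, spelling mistakes, or non-existent words, so those remain untagged for people to judge.

```python
from dataclasses import dataclass
from typing import List, Optional

# Unicode Tamil block; the only fact this sketch relies on.
TAMIL_START, TAMIL_END = 0x0B80, 0x0BFF


@dataclass
class WordEntry:
    """Hypothetical record for one word in the typed-word database."""
    text: str
    tag: str = "untagged"             # assumed labels: "untagged", "other_language"
    correction: Optional[str] = None  # to be filled in later by a human, if needed


def is_tamil_script(word: str) -> bool:
    """Return True if every character of a non-empty word is in the Tamil block."""
    return bool(word) and all(TAMIL_START <= ord(ch) <= TAMIL_END for ch in word)


def pre_tag(words: List[str]) -> List[WordEntry]:
    """Pre-tag words typed in a non-Tamil script; leave the rest for human review."""
    entries = []
    for word in words:
        if not word:
            continue  # skip empty strings rather than guessing a tag
        tag = "untagged" if is_tamil_script(word) else "other_language"
        entries.append(WordEntry(text=word, tag=tag))
    return entries


if __name__ == "__main__":
    for entry in pre_tag(["தமிழ்", "hello"]):
        print(entry.text, entry.tag)
```

A pass like this only narrows the workload. The remaining untagged words are the ones that would go to crowd contributors for identification and correction.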