Unveiling Sockpuppets: A Systematic Review on Feature Selection for Automated Sockpuppet Detection in Wikipedia


  • Chioh Lee Langley High School, McLean, VA
  • Akhil Rachakonda Chantilly High School, Chantilly, VA
  • Mihai Boicu Department of Information Sciences and Technology, George Mason University, Fairfax, VA




While social media platforms provide an open space for sharing opinions, their credibility is challenged by users who create multiple deceptive identities, or sockpuppet accounts, and use them in discussions to manipulate public opinion by creating a false impression of majority support. With virtually no defensive measures in place, these platforms have been exploited to manipulate public sentiment, jeopardizing their credibility. Accordingly, this review analyzes eight recent Wikipedia-focused sockpuppet detection studies to compare inter-cluster and intra-cluster feature selection in automated sockpuppet detection methods. Early studies focused on stylistic verbal features, and later studies introduced non-verbal features. Recent studies tend to fuse non-verbal features with contextual verbal features, which capture patterns within the text. The analysis reveals that these shifts in feature selection were accompanied by changes in experimental methodology. These changes include a notable shift from Support Vector Machines (SVM) to Random Forest (RF) as the detection model and the adoption of k-fold cross-validation to obtain more consistent performance estimates. Across the reviewed studies, the most effective solution was the combination of RF with non-verbal features. These findings highlight the importance of feature selection, and of how the features used to identify sockpuppets have changed over time, for detecting sockpuppet accounts effectively.





College of Engineering and Computing: Department of Information Sciences and Technology