Catching Sockpuppetry in Wikipedia - A Systematic Literature Review of Detection Methods

Authors

  • PRIYANSHU GHOSH Mission San Jose High School, Fremont, CA
  • ANANYA ARVIND Newbury Park High School, Newbury Park, CA
  • Mihai Boicu

DOI:

https://doi.org/10.13021/jssr2023.3856

Abstract

Malicious users often resort to sockpuppet accounts to vandalize, defraud, or spread misinformation on online platforms, bypassing security measures and violating those platforms' terms of service. Wikipedia, a crowdsourced encyclopedia, frequently faces sockpuppetry, with consequences such as misinformation, vote stacking, false majority opinions, and undisclosed paid editing. Wikipedia lacks an internal detection system for these sockpuppet accounts and relies on inefficient manual detection and banning processes, which lets malicious users evade suspensions through unbanned accounts. Our research compares the two leading proposed sockpuppet detection methods, Support Vector Machine (SVM) and Random Forest (RF) algorithms, and finds that the RF model displays superior performance at a general level. Considering feature selection, scale of detection, and performance metrics, we found unique strengths for both. Regarding scale of detection, for example, the RF model thrives when operating with the cluster approach, whereas the SVM model performs best with the combined model approach. Looking exclusively at metrics, the highest accuracy reported in the papers analyzed was 92.6% for SVM and 99.8% for RF. All factors considered, the RF model yields the best results overall for sockpuppet detection on Wikipedia when trained with an emphasis on nonverbal features.

Published

2023-10-27

Issue

Section

College of Engineering and Computing: Department of Information Sciences and Technology

Categories