Unmasking Search Engine Spam: Machine Learning Techniques for Detecting Algorithmically Generated Content
Keywords:
Search Engine Spam, Spam Detection, Machine Learning, Text Generation, Markov Chains, Spam Filtering Algorithms, Automated Text Generation, Natural Language Processing, Language Patterns, Duplicate Content Detection, Non-Dictionary Features
Abstract
We propose a new method for detecting search engine spam that is generated automatically by computer programs. The method measures how often language patterns associated with particular writing styles and genres occur in a text, and uses machine learning to classify spam automatically on the basis of these frequencies. Experiments show that the method successfully detects spam produced by a common type of text generator, the Markov chain.