What Makes A Good Content Generator?
This is the foundation of Black Hat. Years ago, Black Hat SEO consisted of throwing up pages with a keyword or phrase repeated hundreds of times. As search engines became more advanced, so did their spam detection. We evolved to more advanced techniques, which included throwing random sentences together with the main keyword sprinkled around. At that point the search engines had a far more difficult time determining whether a page was spam or not. In recent years, however, increased computing power has given search engines a far better understanding of the relationships between words and phrases. The result is an evolution in content generation: content generators must now be able to identify and group together related words and phrases in such a way that the output blends in with natural speech.
One of the more commonly used text spinners is known as Markov. Markov wasn't actually intended for content generation. It is built on something called a Markov chain, a mathematical model developed by the mathematician Andrey Markov. The algorithm records which words follow each word (or short run of words) in a body of content, then builds new text by repeatedly picking one of those recorded followers at random. This produces largely unique text, but it's also typically VERY unreadable. The quality of the output really depends on the quality of the input. The other issue with Markov is that its output will likely never pass a human review for readability. If the Markov chains don't shuffle the source enough, you also run into duplicate content issues, because long runs of the original text survive intact and get caught by shingling, as discussed earlier. Some people may be able to get around this by replacing words in the content with synonyms. I personally stopped using Markov back in 2006 or 2007 after developing my own proprietary content engine. Popular software packages that use Markov chains include RSSGM and YAGC, both of which are pretty old and outdated at this point. They are worth taking a look at just to understand the fundamentals, but there are FAR better packages out there.
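To make the Markov idea concrete, here is a minimal sketch of a word-level Markov chain in Python. It is not how RSSGM or YAGC are implemented (their internals aren't covered here); the function names `build_chain` and `generate`, the `order` parameter, and the sample sentence are all assumptions made up for illustration.

```python
import random
from collections import defaultdict


def build_chain(text, order=1):
    """Map each run of `order` words to the list of words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return dict(chain)


def generate(chain, length=20, seed=None):
    """Start from a random state, then keep sampling one of its recorded followers."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))  # sorted so a fixed seed gives a repeatable start
    output = list(state)
    while len(output) < length:
        followers = chain.get(state)
        if not followers:
            # Dead end: this state only appeared at the very end of the source.
            break
        output.append(rng.choice(followers))
        state = tuple(output[-len(state):])
    return " ".join(output)


# Hypothetical sample input, just to show the mechanics.
sample = "the cat sat on the mat and the dog sat on the rug"
chain = build_chain(sample)
print(generate(chain, length=12, seed=1))
```

With `order=1` every pair of adjacent words in the output also appears somewhere in the input, which is why the text looks locally plausible but falls apart as a whole. Raising `order` makes the output more readable, but it also copies longer runs of the source verbatim, which is exactly the duplicate content trade-off described above.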
So, we've talked about the old methods of doing things, but this isn't 1999. You can't fool the search engines by simply repeating a keyword over and over in the body of your pages (I wish it were still that easy). So what works today? LSI is becoming more and more important, and that will continue in the future. LSI stands for Latent Semantic Indexing. It sounds complicated, but it really isn't. LSI is basically just a process by which a search engine can infer the meaning of a page from the words on that page, using what it has learned about which words tend to appear together across many other pages. For example, let's say the search engine indexes a page and finds terms like atomic bomb, Manhattan Project, Germany, and Theory of Relativity. The idea is that the search engine can process those terms, find relational data, and determine that the page is about Albert Einstein. So, ranking for a keyword phrase is no longer as simple as having content that repeats the target keyword phrase over and over like in the good old days. Now we need to make sure the page also contains other key phrases that the search engine thinks are related to the main key phrase.
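As a rough illustration of the "related key phrases" idea, the sketch below ranks terms by how often they share a document with a target term, then measures how many of those related terms a page contains. This is a deliberate simplification: real LSI applies singular value decomposition to a term-document matrix, and nobody outside the search engines knows exactly what they run. Plain co-occurrence counts are swapped in here to keep the example short. The function names, the tiny corpus, and the sample page are all hypothetical.

```python
from collections import Counter


def related_terms(corpus, target, top_n=5):
    """Rank terms by the number of documents they share with `target`."""
    counts = Counter()
    for doc in corpus:
        terms = set(doc)
        if target in terms:
            counts.update(terms - {target})
    # Sort by count (descending), then alphabetically so ties are stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_n]]


def coverage(page_terms, related):
    """Fraction of the related terms that actually appear on the page."""
    if not related:
        return 0.0
    page = set(page_terms)
    return sum(term in page for term in related) / len(related)


# Hypothetical mini-corpus: each document is already reduced to its key terms.
corpus = [
    ["einstein", "relativity", "physics", "germany"],
    ["einstein", "manhattan project", "atomic bomb"],
    ["einstein", "relativity", "atomic bomb", "nobel prize"],
    ["germany", "beer", "football"],
]

related = related_terms(corpus, "einstein", top_n=3)
page = ["einstein", "relativity", "germany", "patent office"]
print(related, coverage(page, related))
```

A page that scores low on a measure like this is repeating the main phrase without the supporting vocabulary that normally surrounds it, which is the gap the paragraph above is describing.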
This brings up the subject of duplicate content. We know what goes into a good content generator, but we have the problem of creating readable yet unique content. Let's take a look at duplicate content detection.