Life After Google Penguin
Looking back at my recent posts here, it seems, though not by design, that a theme was emerging. Have a look...
· Google Penalty or Algorithm Change: Dealing With Lost Traffic – Explains the difference between penalties, filters and dampeners (as well as things you can do).
· SEO & Google: The Ugly Truth – Discusses spammy and quality SEO.
· Negative SEO: Looking for Answers from Google – Deals with the apparent potential for problems created by competitors.
And that was all pre-Penguin no less.
Seems my Spidey-sense was tingling. The world of search engine optimization
just keeps getting more convoluted. Now more than ever, very little is clear.
To date I have not
touched upon the Penguin update because, well, we
just didn't know. There wasn't enough data to say much. Of course that really
hasn't changed, but there are a few things we can certainly look at to help
better understand the situation at hand.
But let's give it a go anyway, shall we?
A Name is Just a Name
The first thing we
need to consider is that there are numerous Google algorithm updates, some of
which aren't named. In the weeks before the infamous Penguin rolled out,
there was a Panda hit and another link update. With all three falling within a five-week period, a lot of the analysis becomes problematic.
And that's the point worth
mentioning. Don't try too hard to look for dates and names. Look more to the
effects.
We're here to watch the evolution of
the algos and adapt accordingly. Named or not, it doesn't matter. Sure, a name can be great for diagnosing a hit, but beyond that, it means little.
Regardless of the
myriad of posts on the various named updates, none of us really know what is
going on. That's where the instinct part of the job
comes in. Again, knowing the evolution of search goes a long way.
What is Web Spam?
To understand how web spam is
defined, you need to look at how search engineers view SEO. While there are many definitions, I like this one:
“any deliberate human action that is meant to trigger an unjustifiably
favorable relevance or importance for some web page, considering the page's
true value.” (from Web Spam Taxonomy, Stanford)
And:
“Most SEOs claim that spamming is only increasing relevance for queries
not related to the topic(s) of the page. At the same time, many SEOs endorse
and practice techniques that have an impact on importance scores to achieve
what they call "ethical" web page positioning or optimization. Please
note that according to our definition, all types of actions intended to boost ranking, without improving the true value of a page, are considered
spamming.” (emphasis mine)
Well la-dee-da huh?
We can infer that Google has eased that stance by trying to define white hat and black hat, but at the end of the day any and all manipulation is seen in a less than favorable light.
The next part of
your journey is to establish in your mind what types of activities are commonly
seen as web spam. Here are a few:
· Link manipulation: Paid links, hidden links, excessive reciprocal links, shady links, etc.
· Cloaking: Serving different content to users and to Google.
· Malware: Serving nastiness from your site.
· Content: Spam/keyword stuffing, hidden text, duplication/scraping.
· Sneaky JavaScript redirects.
· Bad neighborhoods: Links, server, TLD.
· Doorway pages.
· Automated queries to Google: Tools on your site; probably a bad idea.
That covers the core offenders. To date with the Penguin update, people have mostly been talking about links. Imagine that... SEOs obsessed with links!
However, we should go a bit deeper and also consider the other on-site aspects. If not on your site, then on the sites your links are coming from.
On-site Web Spam
Hopefully most people reading this, those with experience in web development and SEO (or running websites), don't use borderline tactics on their sites. We do know there are certainly on-site elements to both the Penguin and Panda updates... so it's worth looking at.
Here are some common areas search
engines look at for on-site web spam:
· Domain: Some testing has shown that .info and .biz domains are far more spam-laden than more traditional TLDs.
· Words per page: Interestingly, it seems spam pages have more text than non-spam pages (although over 1,500 words, the curve recedes). Studies have shown the spam sweet spot to be in the 750-1,500 word region.
· Keywords in title: This was mentioned in more than a few papers and should be high on the audit list. Avoid stuffing; be concise.
· Anchor text ratio: In other studies, engineers looked at the ratio of overall text to anchor text on a page.
· Percentage of visible text: This involves hidden text and nasty ALT text. What percentage of the text is actually rendered on the page?
· Compressibility: As a mechanism to fight keyword stuffing, search engines can also look at compression ratios; more specifically, repetitious text and content spinning (see the sketch after this list).
· Globally popular words: Another good way to find keyword stuffing is to compare the words on the page to existing query data and known documents. Essentially, if someone is keyword stuffing around given terms, their usage will be more unnatural than user queries and known good pages.
· Query spam: By looking at the pattern of queries, in combination with other signals, behavioral data manipulation becomes statistically apparent.
· Phrase-based: Looking for textual anomalies in the form of related phrases. This is like keyword stuffing on steroids; looking for statistical anomalies can often highlight spammy documents.
(some snippets
taken from my post "Web Spam; the Definitive Guide")
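A couple of these signals are easy to make concrete. Below is a minimal Python sketch, purely for illustration, that computes a compression ratio (stuffed, repetitious text compresses unusually well) and an anchor-text-to-visible-text ratio for a page. The crude HTML parsing and the interpretation of the numbers are my own assumptions; no engine has published its actual formulas.

```python
# Illustrative only: two crude on-site signals, compressibility and the
# share of visible text that sits inside anchors. Thresholds and parsing
# are assumptions for demonstration, not anything Google has published.
import zlib
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and anchor text separately."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.anchor_parts = []
        self._in_anchor = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_anchor = False

    def handle_data(self, data):
        self.text_parts.append(data)
        if self._in_anchor:
            self.anchor_parts.append(data)


def compression_ratio(text: str) -> float:
    """Uncompressed size / compressed size; repetitious pages score higher."""
    raw = text.encode("utf-8")
    if not raw:
        return 0.0
    return len(raw) / len(zlib.compress(raw))


def anchor_text_ratio(html: str) -> float:
    """Fraction of the page's words that appear inside <a> tags."""
    parser = TextExtractor()
    parser.feed(html)
    total = len(" ".join(parser.text_parts).split())
    anchored = len(" ".join(parser.anchor_parts).split())
    return anchored / total if total else 0.0


if __name__ == "__main__":
    page = ("<html><body>" + "<p>buy cheap widgets cheap widgets</p>" * 50 +
            '<a href="/x">cheap widgets</a></body></html>')
    print("compression ratio:", round(compression_ratio(page), 2))
    print("anchor/text ratio:", round(anchor_text_ratio(page), 3))
```

Run against a page stuffed with repeated phrases, the compression ratio climbs well above what ordinary prose produces, which is exactly the kind of statistical outlier the papers describe.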
And yes, there's actually more. The main thing to take from this is that search engines look at on-site spam in many ways, not just the obvious ones. Once more, this is about your site and the sites linking to you.
A lot of the on-site web spam that poses a true risk will come from hacking. Sure, your CMS might be spitting out some craziness, or your WordPress plug-in created a zillion internal links, but those are the exceptions. If you're using on-site spam tactics, I'm sure you know it. Few people actually use on-site crap post-Panda; many times it's the site being hacked that causes issues. So be vigilant.
Link Spam
Is the Penguin update all about links? I'd go against the grain and say no. Not only do we have to consider some of the above elements, but there also seems to be an element of 'trust' and authority at play here. If anything, we may be seeing a shift away from the traditional PageRank model of scoring, which of course many may perceive as a link-related penalty.
But what is link spam? That answer
has been a bit of a moving target over the years, but here are some common
elements:
· Link stuffing: Creating a ton of low-value pages and pointing all the links (even on-site) to the target page. Spam sites tend to have a higher ratio of these unnatural patterns.
· Nepotistic links: Everything from paid links to traded (reciprocal) and three-way links.
· Topological spamming (link farms): Search engines will look at the percentage of links in the graph compared to known "good" sites. Typically, those looking to manipulate the engines will have a higher percentage of links from these locales.
· Temporal anomalies: Another area where spam sites generally stand out from other pages in the corpus is the historical data. There will be an average rate of link acquisition and decay for "normal" sites in the index. Temporal data can be used to help detect spammy sites participating in unnatural link building habits.
· TrustRank: This method has more than a few names, TrustRank being the Yahoo flavor. The concept revolves around having "good neighbors". Research shows that good sites link to good ones and vice versa (a simplified sketch follows this list).
(some snippets
taken from my post "Web Spam; the Definitive Guide")
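To make the TrustRank idea a bit more concrete, here's a simplified sketch of seed-based trust propagation: a biased PageRank in which trust starts at a hand-picked seed set and flows along outlinks. The toy graph, seed set, decay factor, and iteration count are all illustrative assumptions, loosely modeled on the published Yahoo/Stanford work rather than anything Google has confirmed.

```python
# Simplified trust propagation in the spirit of TrustRank: trust is seeded on
# known-good pages and diffused through outlinks with a decay factor. All
# values and the toy graph are illustrative assumptions.

def trustrank(graph, seeds, decay=0.85, iterations=20):
    """graph: {page: [pages it links to]}; seeds: iterable of trusted pages."""
    pages = set(graph) | {p for targets in graph.values() for p in targets}
    # All initial trust sits on the seed set, split evenly.
    seed_score = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    trust = dict(seed_score)
    for _ in range(iterations):
        incoming = {p: 0.0 for p in pages}
        for page, outlinks in graph.items():
            if not outlinks:
                continue
            share = trust[page] / len(outlinks)
            for target in outlinks:
                incoming[target] += share
        # Keep some trust anchored to the seeds; pass the rest along links.
        trust = {p: (1 - decay) * seed_score[p] + decay * incoming[p]
                 for p in pages}
    return trust


if __name__ == "__main__":
    web = {
        "trusted-hub.example": ["good-blog.example", "spam-farm.example"],
        "good-blog.example": ["trusted-hub.example"],
        "spam-farm.example": ["spam-farm-2.example"],
        "spam-farm-2.example": ["spam-farm.example"],
    }
    scores = trustrank(web, seeds={"trusted-hub.example"})
    for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{page:25s} {score:.3f}")
```

Even in this tiny graph, pages sitting several hops from the seed, linked mostly by each other, end up with far less trust, which is the intuition behind "good neighbors".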
I could spend hours on each of these, but you get the idea. With many people theorizing about networks, anchor text, etc., the larger picture often evades us. There are so many ways that Google might be dealing with 'over optimization' that we're not talking about.
Over the last 18 months or so we have seen a lot of changes, including the spate of unnatural-link messages that went out. Again, Penguin or not doesn't matter. What matters is that Google is certainly looking harder at link spam, so you should be too.
It wouldn't hurt to
keep a tinfoil hat handy as well… Look no further than this Microsoft patent
that talks about spying on SEO forums. Between that and
the fact that SEOs write about their tactics far and wide, it's not exactly
hard for search engineers to see what we're up to.
How Are We Adapting in a Post-Penguin World?
What's it all mean? Well, I haven't a bloody clue. Anyone who says they've got it sorted likely needs to take their head out of a certain orifice.
What you should do is become more knowledgeable about how search engines work and the history of Google. Operate from intelligence, not ignorance.
Have you considered the elements outlined in this post when analyzing data and trying to figure out what's going on? I know I hadn't. It was researching this post that reminded me of the myriad of spam signals Google might look at.
Here's some of my thinking so far:
· It really is a non-optimized world: Don't try too hard for that perfect title. Avoid obsessing over on-page ratios. You don't need that exact-match anchor all the time; in fact, you don't even need a link (think named entities). In many ways, less-is-more is the call of the day.
· Keep a history: Be sure to always track everything. And when doing link profile or other types of forensic audits, compare fresh and historic data (such as in Majestic).
· Watch on-site links: From internal link ratios to anchors and outbound links, they all matter. From spam signals to trust scoring, they can potentially affect your site.
· Faddish: Another interesting thought, though how much it plays into things we don't know, is that Google may take issue with the tactic du jour.
· Watch your profile: In the new age of SEO it likely pays to be tracking your link profiles. If something malicious pops up, deal with it and make notes of dates and contact attempts.
· On-site: Hammer it and make it squeaky clean. The harder links get, the more one needs to watch the on-site factors. Schedule audits more frequently to watch for issues.
· Topical relevance: When looking at links, think about topical relevance. Are the links coming from sites/pages that are overly diverse (and have weak authority)?
· Link ratios: Watch for a low spread in anchor text as well as total links vs. referring domains (the lower the better; it generally means fewer site-wide links). See the sketch after this list.
· Cleaning up: When possible, look at link profiles and clean up suspect links. And I wouldn't wait until you get an unnatural-link message or tanked rankings.
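As a companion to the "Link ratios" point above, here's a small sketch of the kind of back-of-the-envelope profile math I mean: anchor-text spread and total links vs. referring domains. The tuple input format and any cutoffs you might apply are assumptions; in practice the data would come from a backlink export (Majestic or similar).

```python
# Illustrative link-profile ratios: links per referring domain (high suggests
# site-wide links) and the share held by the most common anchor text (high
# suggests a narrow, unnatural anchor spread). Inputs and cutoffs are assumed.
from collections import Counter
from urllib.parse import urlparse


def profile_ratios(backlinks):
    """backlinks: list of (source_url, anchor_text) tuples."""
    domains = Counter(urlparse(url).netloc for url, _ in backlinks)
    anchors = Counter(anchor.strip().lower() for _, anchor in backlinks)
    total = len(backlinks)
    links_per_domain = total / len(domains)
    top_anchor, top_count = anchors.most_common(1)[0]
    return {
        "total_links": total,
        "referring_domains": len(domains),
        "links_per_domain": round(links_per_domain, 1),
        "top_anchor": top_anchor,
        "top_anchor_share": round(top_count / total, 2),
    }


if __name__ == "__main__":
    sample = [
        ("http://blog-a.example/post-1", "cheap widgets"),
        ("http://blog-a.example/post-2", "cheap widgets"),
        ("http://blog-a.example/post-3", "cheap widgets"),
        ("http://news-b.example/story", "Acme Widgets"),
        ("http://forum-c.example/thread", "this widget guide"),
    ]
    print(profile_ratios(sample))
```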
We've seen a ton of data (this one is interesting) since this all went down, and while there are common elements, nothing is conclusive (again, there has been a spate of updates). What is more important is to understand what Google wants and where they're headed. It's just another step in the long road of search evolution; don't get caught up in the names.
Taking the easy way out rarely works
for success in life. SEO is no different.
Understand how a threshold might be
used. This thing of ours is like the old story of the two of us in the
woods when a hungry bear appears. I don't have to outrun the bear; just you.
Ensure your strategy is within a safe threshold and it should work out just
fine.
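Purely as a thought experiment on how a threshold might work, here's a sketch that folds several normalized spam signals into a weighted composite score and compares it to a cutoff. The signal names, weights, and threshold are invented for illustration; nobody outside Google knows the real ones.

```python
# Hypothetical composite spam score against a threshold. Signal names,
# weights, and the cutoff are invented for illustration only.

WEIGHTS = {
    "anchor_text_ratio": 0.3,
    "links_per_domain": 0.2,
    "compression_ratio": 0.2,
    "bad_neighborhood_links": 0.3,
}


def spam_score(signals, weights=WEIGHTS):
    """Weighted sum of normalized signals (each expected in the 0-1 range)."""
    return sum(weights[name] * signals.get(name, 0.0) for name in weights)


def over_threshold(signals, threshold=0.5):
    return spam_score(signals) > threshold


if __name__ == "__main__":
    cautious_site = {"anchor_text_ratio": 0.2, "links_per_domain": 0.1,
                     "compression_ratio": 0.3, "bad_neighborhood_links": 0.0}
    aggressive_site = {"anchor_text_ratio": 0.9, "links_per_domain": 0.7,
                       "compression_ratio": 0.6, "bad_neighborhood_links": 0.5}
    for label, site in [("cautious", cautious_site),
                        ("aggressive", aggressive_site)]:
        flag = "flagged" if over_threshold(site) else "ok"
        print(label, round(spam_score(site), 2), flag)
```

The point of the bear story in code form: you don't need a zero score, just one that stays comfortably under the cutoff while your competitors' doesn't.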
It's About Time
To close out, there is one part of this that keeps nagging: history. If you've been squashed by the recent updates (including Penguin), it may not entirely be about recent activities. There is a sense that Google is indeed keeping a history and that this may be playing into the larger scheme of things.
Some of the most interesting Google
patents were the series on historical elements. Be sure to go back and read
some of these older posts:
· Spam detection using historical factors
· Link builders guide to historical ranking factors
· Understanding historical ranking factors for content creation/management plans
· Do link spammers leave footprints?
Sure, they're 3-4 years old, but they are probably some of the more telling pieces for the mindset change many in the world of SEO need.
Source Article - SearchEngineWatch.com