Untrusted content, nofollow, etc.
Technology, Weblogs | August 26th, 2004
Phil Ringnalda pointed to an idea that Ian Hickson just tossed out while brainstorming ways to battle the ever-increasing issue of comment spam:
I’m thinking that HTML should have an element that basically says “content within this section may contain links from external sources; just because they are here does not mean we are endorsing them” which Google could then use to block Google rank whoring. I know a bunch of people being affected by Web log spam would jump at that chance to use this element if it was put into a spec.
Personally, I’d love to be able to wrap the comments section of my individual entry pages in something like this — and actually, it reminds me a lot of a technique I used to use when I had my website running on my own webserver. At the time, I had a good number of pages that weren’t part of the weblog, so rather than using MovableType’s built-in search engine, I used the Fluid Dynamics Search Engine.
FDSE is a very solid system, and one of the things I liked was an FDSE-specific tag that lets an author designate sections of a page for the search engine to ignore when it scans the page. In addition to respecting the standard robots meta tag values of index, noindex, follow and nofollow for a full page, FDSE also allows you to use those values within HTML comments to section off areas of a page that should be treated differently from the page as a whole.
For instance, on my individual entry archive pages, the only really important content as far as a search engine is concerned is the entry itself. As the sidebar in my design is repeated on every page on the site, there’s no good reason for a search engine to include that text in its database for every page, so I would wrap the entire sidebar inside a noindex, nofollow declaration.
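To make the idea concrete, here is a rough sketch in Python of how an indexer might honor that kind of comment-based declaration. The marker syntax (`<!-- robots noindex -->` and `<!-- /robots -->`), the `indexable_text` function, and the sample page are all made up for illustration. FDSE's actual comment format may well differ, so treat this as a sketch of the general technique rather than FDSE's implementation.

```python
import re

# Hypothetical marker syntax, for illustration only; FDSE's real comment
# format may differ.
NOINDEX_BLOCK = re.compile(
    r"<!--\s*robots\s+noindex\s*-->.*?<!--\s*/robots\s*-->",
    re.IGNORECASE | re.DOTALL,
)
TAG = re.compile(r"<[^>]+>")


def indexable_text(html: str) -> str:
    """Return the text an indexer would keep after dropping marked sections."""
    visible = NOINDEX_BLOCK.sub(" ", html)  # drop everything between the markers
    text = TAG.sub(" ", visible)            # crude tag stripping, fine for a sketch
    return " ".join(text.split())           # collapse runs of whitespace


# Hypothetical entry page: the entry is indexed, the repeated sidebar is not.
page = """
<div id="entry"><p>Thoughts on comment spam.</p></div>
<!-- robots noindex -->
<div id="sidebar"><p>Archives, blogroll, TrackBack</p></div>
<!-- /robots -->
"""

print(indexable_text(page))
```

With the sidebar wrapped this way, a search for "TrackBack" would only match pages whose entry text actually mentions it. A real indexer would use a proper HTML parser instead of regular expressions, and would handle the nofollow half by skipping link extraction inside the marked section as well.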
I’d also do the same for things like the TrackBack section headers that appear on every page. As they are repeated on every single archive page, trying to search for an actual discussion on TrackBack is nearly impossible. When I was using FDSE and hid that section header from the search engine, though, it was very easy for me to find discussions about TrackBack, as FDSE was only indexing the actual content of each page, rather than every little bit of text that the page contained.
I’ve wished for a long time that Google either supported a way to do the same thing, or just adopted FDSE’s method. According to FDSE’s author, he submitted his technique to Google as a suggestion quite a few years ago, but nothing more was ever heard about that.
Maybe Ian’s suggestion will get something moving in this direction again. Here’s hoping, at least.
iTunes: “Never Say Never (Hot Tracks)” by Romeo Void from the album Edge, The Level 1 (1995, 5:47).

August 26th, 2004 at 12:36 pm
Google hinting
Phil Ringnalda and Ian Hickson are thinking about extending HTML with search engine hinting tags. I totally agree with this, as I mentioned before. The questions are whether the attribute should be on a block or on individual links, whether we should h…
August 29th, 2004 at 3:43 pm
Funny. I’d swear I followed a link off of your site in the last month, and came upon someone’s blog that had a PHP solution for exactly this issue. So, naturally, I can’t find it now. Essentially, he was implementing a ‘user-agent’ block within his .css tags, as ‘searchable’ and ‘non-searchable’.
Naturally, now I can’t find it, so here’s some Googled links:
January 17th, 2005 at 4:28 pm
It’s a great idea. Put it in blogsnow just now: http://www.blogsnow.com/nofollow.html
January 18th, 2005 at 10:37 pm
rel=“nofollow” : Massive weblog anti-spam initiative
Six Apart has announced in co-operation with Google, Yahoo, MSN Search and other blog vendors a massive joint anti-spam initiative based on the HTML link type rel=’nofollow’.