Thursday, January 18, 2007

how to verify ownership to Yahoo on blog server

The mischief caused by the 404.php template turned out not to be a problem for authentication when I stumbled into Yahoo!'s Site Explorer. To claim your site, you are instructed to place a special file with special content under / of the site. This way, a single GET will do, and customized 404 pages cause no trouble. Such pages are pretty common on sites managed by a CMS (Content Management System) or on blogging servers. I wonder why the smart engineers at Google decided to do two GETs instead...
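To make the single-GET point concrete, here is a minimal sketch in Python of how such a check could work. It is not Yahoo's actual implementation; the function name, the `fetch` callable, and the sample file name and key are all made up for illustration. The idea is only that comparing the body against the expected content makes the check immune to a custom 404 page that answers 200 for everything.

```python
from typing import Callable, Tuple

# Hypothetical fetcher type: takes a path, returns (HTTP status, body text).
Fetch = Callable[[str], Tuple[int, str]]


def verify_single_get(fetch: Fetch, key_path: str, expected_body: str) -> bool:
    """Return True if one GET of key_path yields the expected content.

    Because the body is compared, a customized 404 page that returns
    status 200 for unknown paths cannot pass by accident.
    """
    status, body = fetch(key_path)
    return status == 200 and body.strip() == expected_body


# A stand-in for a blog server whose custom 404 template answers 200.
def soft_404_fetch(path: str) -> Tuple[int, str]:
    pages = {"/y_key_example.html": (200, "abc123\n")}  # made-up file and key
    return pages.get(path, (200, "<html>Sorry, page not found</html>"))


print(verify_single_get(soft_404_fetch, "/y_key_example.html", "abc123"))
print(verify_single_get(soft_404_fetch, "/y_key_missing.html", "abc123"))
```

In a real check the `fetch` callable would wrap an HTTP client; it is passed in here so the logic can be tried without touching the network.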

Once I added the required file under my blog server's / and clicked to continue, the next page asked me to keep the file there for 24 hours, until Yahoo's bots take their sweet time to crawl, literally! In sharp contrast, Google's webmaster tools authenticate site ownership in real time, and suck in sitemaps in real time too!

Placing a special file under / is the only way to authenticate your ownership on Yahoo! Site Explorer. For the millions of hosted sites (blogs or otherwise) where content owners don't have access to /, the owners are out of luck. For now, at least. Hopefully, when Yahoo! Site Explorer comes out of beta, they will come up with a way to authenticate sites whose content owners have content-level access (META tokens, maybe?) instead of file-level access.
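For what it's worth, a META-token check would not be hard to build. The sketch below is purely my guess at what it might look like, using Python's standard html.parser; the meta name `site-verify-token` and the token value are invented, not anything Yahoo! has announced.

```python
from html.parser import HTMLParser


class MetaTokenFinder(HTMLParser):
    """Collect the content of every <meta name=...> tag with a given name."""

    def __init__(self, meta_name: str) -> None:
        super().__init__()
        self.meta_name = meta_name.lower()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == self.meta_name:
            content = attr.get("content")
            if content is not None:
                self.tokens.append(content)


def page_has_token(html: str, meta_name: str, token: str) -> bool:
    """Return True if the page carries the expected verification token."""
    finder = MetaTokenFinder(meta_name)
    finder.feed(html)
    return token in finder.tokens


# Hypothetical home page of a hosted blog; name and token are made up.
page = '<html><head><meta name="site-verify-token" content="abc123"></head></html>'
print(page_has_token(page, "site-verify-token", "abc123"))
```

A content owner on a hosted blog can usually edit the page template, so a check like this would need only content-level access.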

In comparison with Google's webmaster tools, Yahoo's Site Explorer is quite spartan right now. Its own blog hasn't been updated for a few months. I guess it is a real beta then.


jackOfAllTrades said...

WOW, 48 hours later, there is still no sign of the Yahoo bot fetching the little magic file for site ownership authentication.

On the Yahoo! Site Explorer comments line, requests for speedy verification are considered resolved, since "an employee has reviewed the request".
Is this the developers' way of saying "what do stupid users know!"?

jackOfAllTrades said...

Two and a half days later, instead of instantly or within the promised 24 hours, Yahoo! Site Explorer finally came and grabbed the special file for site ownership authentication:
-rw-r--r-- 1 root root 33 Jan 18 15:38 y_key_30638354d55a24.html
 - - [21/Jan/2007:06:18:08 -0500] "GET /y_key_30638354d55a24.html HTTP/1.1" 200 1166 124 258 "-" "Yahoo! Slurp/Site Explorer"
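If you would rather not eyeball the access log while waiting, a small script can look for the fetch. This is a rough sketch in Python that assumes log lines shaped like the one above (timestamp in brackets, quoted request, status code, user agent as the last quoted field); the function name and the regular expression are my own and may need adjusting for other log formats.

```python
import re
from datetime import datetime

# Matches the bracketed timestamp, the quoted request line, the status
# code, and the last quoted field (assumed to be the user agent).
LOG_RE = re.compile(
    r'\[(?P<ts>[^\]]+)\]\s+"(?P<method>\S+)\s+(?P<path>\S+)\s+[^"]*"\s+'
    r'(?P<status>\d{3}).*"(?P<agent>[^"]*)"\s*$'
)


def find_key_fetches(lines, key_path, agent_substring="Slurp"):
    """Return the timestamps of successful GETs of key_path by the bot."""
    hits = []
    for line in lines:
        m = LOG_RE.search(line)
        if not m:
            continue
        if (m["method"] == "GET" and m["path"] == key_path
                and m["status"] == "200" and agent_substring in m["agent"]):
            hits.append(datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z"))
    return hits


sample = [
    ' - - [21/Jan/2007:06:18:08 -0500] "GET /y_key_30638354d55a24.html '
    'HTTP/1.1" 200 1166 124 258 "-" "Yahoo! Slurp/Site Explorer"',
]
for when in find_key_fetches(sample, "/y_key_30638354d55a24.html"):
    print(when.isoformat())
```

In practice you would pass an open log file instead of the `sample` list, since the function only needs something it can iterate over line by line.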

The funny thing is that the same bot has come to my site more than once a day since the first day the site was opened to the Internet. So it is not really about saving bandwidth or a lack of computing power. This reminded me of one of my former colleagues. The then manager of a data warehouse team eloquently stated that his team didn't have enough (brain) cycles when asked to switch to gzip in a handful of shell scripts in which the compression mechanism was already parameterized.