
Digital Marketing: What I read & pay attention to

(Last updated: 11-May-2011)

I recently completed a job application that wanted to know what websites I read & pay attention to. Here are some of the websites & web pages I provided:

John Battelle’s Searchblog – The man who wrote the book on Search. His blog on the online industry is consistently sharp & thought-provoking, and Battelle writes really well. I don’t always agree with what he says, but that is a healthy thing.

Wired – Do you ever wish there was a daily newspaper just about tech? This is probably the closest thing to it.

Search Engine Land – The name is kind of unfortunate, but they have a number of good search marketing writers & Danny Sullivan knows search better than anyone I know of.

Google blogs – There are a lot of these so I’m just going to list them:

Conversion Rate Experts – The leaders in conversion rate optimisation. Particularly worth paying attention to for their case studies, which give useful insight into their conversion optimisation process.

SEOmoz Daily SEO Blog – Much of what SEOmoz says publicly is couched in caveats, e.g. “this may mean”, “this might suggest”. That is partly driven by their highly visible position in the SEO industry & the trouble with stating absolutes. Their blog is essential reading for SEO news & tactics.

NYTimes – US-centric news but it is better than, say, Fox. Their Magazine section occasionally does great long-form pieces. The Critics Best Of videos are good too.

Articles

Video

UPDATED:
Occam’s Razor by Avinash Kaushik – Avinash is a Google Analytics Evangelist but he also seems to consult, speak & write books on web analytics. He is a guru on web analytics & his blog posts over the years have been critical to educating me about Google Analytics: which metrics to ignore & which metrics to pay attention to.

An interesting answer to an interesting question

Q: “It used to be that I could limit what strangers saw about me to almost nothing. I could not show my profile picture, not allow them to “poke” or message me, certainly not allow them to view my profile page. Now, even my interests have to be public information. Why can’t I control my own information anymore?”

Answer from Elliot Schrage, vice president for public policy at Facebook: “Joining Facebook is a conscious choice by vast numbers of people who have stepped forward deliberately and intentionally to connect and share. We study user activity. We’ve found that a few fields of information need to be shared to facilitate the kind of experience people come to Facebook to have. That’s why we require the following fields to be public: name, profile photo (if people choose to have one), gender, connections (again, if people choose to make them), and user ID number. Facebook provides a less satisfying experience for people who choose not to post a photo or make connections with friends or interests. But, other than name and gender, nothing requires them to complete these fields or share information they do not want to share. If you’re not comfortable sharing, don’t.”

Link: Facebook Executive Answers Reader Questions [nytimes.com]

The anatomy of Google

Only nine years late, via Speaking Freely, I am reading the paper ‘The Anatomy of a Large-Scale Hypertextual Web Search Engine’ (a.k.a. Google) by Sergey Brin and Larry Page.

I liked this bit about the Google crawler interrupting an online game:

It turns out that running a crawler which connects to more than half a million servers, and generates tens of millions of log entries generates a fair amount of email and phone calls. Because of the vast number of people coming on line, there are always those who do not know what a crawler is, because this is the first one they have seen. Almost daily, we receive an email something like, “Wow, you looked at a lot of pages from my web site. How did you like it?” There are also some people who do not know about the robots exclusion protocol, and think their page should be protected from indexing by a statement like, “This page is copyrighted and should not be indexed”, which needless to say is difficult for web crawlers to understand. Also, because of the huge amount of data involved, unexpected things will happen. For example, our system tried to crawl an online game. This resulted in lots of garbage messages in the middle of their game! It turns out this was an easy problem to fix. But this problem had not come up until we had downloaded tens of millions of pages. Because of the immense variation in web pages and servers, it is virtually impossible to test a crawler without running it on large part of the Internet. Invariably, there are hundreds of obscure problems which may only occur on one page out of the whole web and cause the crawler to crash, or worse, cause unpredictable or incorrect behavior. Systems which access large parts of the Internet need to be designed to be very robust and carefully tested. Since large complex systems such as crawlers will invariably cause problems, there needs to be significant resources devoted to reading the email and solving these problems as they come up.

Source: ‘The Anatomy of a Large-Scale Hypertextual Web Search Engine’ Brin/Page, p. 10
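
As an aside, here is a minimal sketch of how a crawler can honour the robots exclusion protocol mentioned in that quote, using Python's standard urllib.robotparser module. The robots.txt content, the "ExampleBot" user agent and the example.com URLs are made up for illustration; this is not how Google's crawler was implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download this
# from the site's /robots.txt before fetching any pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a parser we can query per URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" is an assumed user-agent name, not a real crawler.
allowed = parser.can_fetch("ExampleBot", "https://example.com/index.html")
blocked = parser.can_fetch("ExampleBot", "https://example.com/private/notes.html")

print(allowed, blocked)
```

The point of the quote stands: a sentence such as “This page is copyrighted and should not be indexed” in the page body means nothing to a parser like this. Only the rules in robots.txt are machine-readable.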

It is also interesting to note the beginnings of Google Book Search in the acknowledgements:

The research described here was conducted as part of the Stanford Integrated Digital Library Project, supported by the National Science Foundation under Cooperative Agreement IRI-9411306. Funding for this cooperative agreement is also provided by DARPA and NASA, and by Interval Research, and the industrial partners of the Stanford Digital Libraries Project.

Source: ‘The Anatomy of a Large-Scale Hypertextual Web Search Engine’ Brin/Page, p. 16

Note also their thoughts on the relationship of search engines and advertising:

Currently, the predominant business model for commercial search engines is advertising. The goals of the advertising business model do not always correspond to providing quality search to users. For example, in our prototype search engine one of the top results for cellular phone is “The Effect of Cellular Phone Use Upon Driver Attention”, a study which explains in great detail the distractions and risk associated with conversing on a cell phone while driving. This search result came up first because of its high importance as judged by the PageRank algorithm, an approximation of citation importance on the web [Page, 98]. It is clear that a search engine which was taking money for showing cellular phone ads would have difficulty justifying the page that our system returned to its paying advertisers. For this type of reason and historical experience with other media [Bagdikian 83], we expect that advertising funded search engines will be inherently biased towards the advertisers and away from the needs of the consumers.

Source: ‘The Anatomy of a Large-Scale Hypertextual Web Search Engine’ Brin/Page, p. 18
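
For anyone curious what “citation importance” looks like in practice, here is a rough sketch of a PageRank-style calculation. It is the commonly taught normalised power-iteration variant, not necessarily the exact formula or implementation from the paper. The three-page link graph, the page names and the handling of pages with no outgoing links are my own assumptions; the 0.85 damping factor is the value the paper says is usually used.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages in a link graph by repeated redistribution of score.

    `links` maps each page to the list of pages it links to. Every
    linked-to page must also appear as a key (an assumption of this sketch).
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share on each round.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            # A page with no outgoing links spreads its score over all
            # pages (an assumed convention, so no score is lost).
            targets = outlinks if outlinks else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank

    return rank


# Hypothetical graph: the "study" page is linked to by both other pages.
graph = {
    "study": [],
    "blog": ["study"],
    "shop": ["study", "blog"],
}

ranks = pagerank(graph)
top_page = max(ranks, key=ranks.get)
print(top_page, ranks)
```

In this toy graph the most-cited page comes out on top regardless of who might pay for placement, which is the tension with advertising that Brin and Page describe.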