Search Engine Watch speculates that Google might start using WHOIS information in its ranking of web pages. A recent patent application filed by Google, "Information retrieval based on historical data," hints at the possible use of domain information in the ranking of results:
- Domain registration could be used as a way to determine the “document inception date,” or an age associated with a page.
- The expiration date of a domain could indicate the “legitimacy” of a document, with short term registrations indicating more questionable pages.
- Changes, and the frequency of changes, in registration information, including contact information, hosting companies, and more, could also raise warning flags.
- Information about name servers, and about other sites on those name servers, could also play a role in a ranking score:
A “good” name server may have a mix of different domains from different registrars and have a history of hosting those domains, while a “bad” name server might host mainly pornography or doorway domains, domains with commercial words (a common indicator of spam), or primarily bulk domains from a single registrar, or might be brand new.
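The signals listed above can be sketched as a handful of simple rules. What follows is only an illustration, not Google's method: the patent application gives no formulas, so the `DomainRecord` fields, the function names, and every threshold are hypothetical and were picked by me purely for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DomainRecord:
    """Hypothetical summary of WHOIS and name server data for one domain."""
    registered_on: date
    expires_on: date
    registration_changes_last_year: int
    # Distinct registrars among the domains sharing this domain's name server.
    nameserver_registrar_count: int
    # Share of co-hosted domains already flagged as spam or doorway pages.
    nameserver_flagged_fraction: float


# Invented thresholds (assumptions): someone has to choose these numbers.
SHORT_TERM_DAYS = 365
MAX_CHANGES_PER_YEAR = 2
MIN_REGISTRAR_MIX = 3
MAX_FLAGGED_FRACTION = 0.5


def document_age_days(record: DomainRecord, today: date) -> int:
    """Use the registration date as a stand-in for the document inception date."""
    return (today - record.registered_on).days


def warning_flags(record: DomainRecord, today: date) -> list[str]:
    """Return the names of the warning flags this record would raise."""
    flags = []
    if (record.expires_on - today).days <= SHORT_TERM_DAYS:
        flags.append("short_term_registration")
    if record.registration_changes_last_year > MAX_CHANGES_PER_YEAR:
        flags.append("frequent_registration_changes")
    if record.nameserver_registrar_count < MIN_REGISTRAR_MIX:
        flags.append("single_registrar_name_server")
    if record.nameserver_flagged_fraction > MAX_FLAGGED_FRACTION:
        flags.append("name_server_hosts_flagged_domains")
    return flags
```

Calling `warning_flags(record, date.today())` on a record returns the list of rules it trips. Note that a perfectly legitimate site on a one-year registration trips the first rule simply because of the value chosen for `SHORT_TERM_DAYS`.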
There is some debate as to whether this is a proper use of WHOIS domain information (and general privacy concerns about the availability of WHOIS data persist), but to me this discussion points to the broader realization that Google’s algorithms are not some kind of magical and neutral nugget of code free of bias.
Relying on domain information as described above reveals how biases are embedded in search engine algorithms. Tying a page’s legitimacy to its domain’s expiration date, or labelling a name server as “bad” because it mainly hosts pornography, might make logical sense, but these remain subjective decisions: biases. These biases are then programmed into the algorithm. I can’t repeat it enough: search engine algorithms are not neutral. Indeed, all computer systems contain biases [PDF], with particular political and ethical consequences.