Transparent Search
last modified May 19 by tomlowenhaupt
A trustable search engine is an essential component of an effective civic life. All sectors of society benefit when an Internet search tool accurately presents available resources. This page focuses on developing such search tools for the .nyc TLD.
----------------------------------------------------------------------------------------------------------------------------------------------------------
Finding Search
(Photo courtesy Library of Congress.)
Designing Transparency
On Search Complexity
A real-world web search engine, such as Google's or Microsoft's, has literally thousands or tens of thousands of ranking signals, updated or introduced multiple times during a single day. Additionally, the permutations are near infinite, as the major search engines are constantly running concurrent experiments in an effort to dynamically tune the system with real user queries and user happiness. Moreover, modern search engines such as Google go so far as to customize and personalize each result on the fly for each individual user, meaning that there is no canonical ranking to begin with. And all of this is predicated on top of a very unpredictable and continuously changing corpus of crawled data, with more and more of it arriving in near real-time.
Given all of that as context, I wouldn't even know where to begin to try and make the ranking process visible to the user. There is no one algorithm, and no one corpus, no one frozen point in time, no way to even explain the ranking process to lay-people to begin with. A worthy challenge, but I'm not sure how practically it could be done.
...a search expert
Given all that complexity, creating a transparent search engine will not be easy. But given our limited goal, serving the civic needs of New York City, perhaps a more focused engine is practicable, leaving Google and its competitors to take care of the rest.
-----------------
On Custom Search
What effect will custom search engines (CSEs) have on the popularity and visit volumes of TLD domains? You are probably familiar with google.com/cse. I 'built' several search engines myself, some with Rollyo, and my colleagues and I are very happy with them: they produce better results than 'Google general' because I make a subselection of, e.g., 50 really relevant sites.
NL, my country, is small, so it is difficult to judge whether and how soon CSEs will become commonplace. But for Germany I just found http://www.suchmaschinen-datenbank.de/, a database of search engines for Germany organized by theme and by whether you want words or pictures. (I find far more, and more relevant, pictures on Bing than on Google.)
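The "subselection of relevant sites" idea in the comment above can be illustrated with a minimal sketch. It assumes the search results are already available as a list of URLs and simply keeps those hosted on a curated set of sites. The site list and the function names (`host_of`, `in_curated_set`, `custom_search`) are hypothetical, chosen for illustration, and are not how google.com/cse or Rollyo work internally.

```python
from urllib.parse import urlparse

# Hypothetical curated list; a real custom engine might hold around 50 sites.
CURATED_SITES = {"nyc.gov", "nypl.org", "mta.info"}

def host_of(url):
    """Return the lowercase host of a URL, without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host

def in_curated_set(url, sites=CURATED_SITES):
    """True if the URL's host is a curated site or one of its subdomains."""
    host = host_of(url)
    return any(host == site or host.endswith("." + site) for site in sites)

def custom_search(results, sites=CURATED_SITES):
    """Keep only results from curated sites, preserving the original order."""
    return [url for url in results if in_curated_set(url, sites)]
```

Because the filter only removes results and never reorders them, anyone can audit such an engine by reading its site list.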
-----------------
In "Search, but You May Not Find," his December 2009 New York Times Op-Ed, Adam Raff makes a case for the growing importance of neutral search engines.
We differ with Adam in two ways. First, he uses the term Search Neutrality to describe what we see as the need for "transparent" search. The difference is probably mostly semantic, but from a development perspective we suspect that transparency will prove a far easier standard to measure than Mr. Raff’s search neutrality. Second, where Adam focuses on how Google's purported corporate shenanigans affect the level playing field for business, we focus on the impact that a lack of transparency in Google's (and other) search engines might have on civic affairs.
Examples of the potential impact of an opaque search engine on the city's governance realm include:
- Imagine for a moment what will happen when Google “winner$” begin running for public office. How are we to trust its opaque search algorithm during the rough and tumble of an election campaign? Then the relationship between link and ballot voting would become apparent. Poisoned elections might result.
- And on a more mundane level, in the coming years we’re likely to see Google seeking a variance from city zoning regulations to build inspirational office space for its expanding enterprises. How would Google rank the activities of organizations leading the opposition? Would individual opponents of such a variance be able to locate the organizing opposition? Or would opposition organizations be custom coded to land on page 13? (See sidebar for the difficulty of assessing responsibility in this area.)
Several solutions are possible:
- The existing engines might find it difficult to operate under a darkening shadow of suspicion about the fairness of their secret algorithms, forcing them to move toward transparency.
- With the growing awareness of the importance of transparency, particularly in the civic realm, search transparency evaluation and rating firms might arise - perhaps like a Moody's or S&P.
- Or a transparent search engine might be developed, possibly from the smoldering code of Wikia Search. What might a transparent "search.nyc" look like on the inside and out?
Components of a Transparent search.nyc Engine
The main components of the Wikia Search effort (from Wikipedia) were: a web search engine, a wiki to host so-called mini-articles, and a social network service. The following begins a review of those and other components of an effective civic search engine.
Mini-Articles
A prominent feature of the Wikia search engine was to be human-written mini-articles. These were to be short articles about the topics given by their titles. They were hosted on a Wikia wiki.
Whenever a search query was issued with the mini prefix, e.g., mini sports, the results page presented the wiki mini-article whose name matched the search query. If no matching article existed, the searcher was given the opportunity to write a new one.
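The prefix behavior described above can be sketched as a small query router. This is an illustration under stated assumptions, not the Wikia Search implementation: the function name `handle_query`, the returned action labels, and the use of a plain dictionary keyed by lowercase title are all hypothetical.

```python
MINI_PREFIX = "mini "

def handle_query(query, mini_articles):
    """Route a query: mini-article lookup if prefixed, otherwise web search.

    mini_articles is assumed to map lowercase titles to article text.
    """
    q = query.strip()
    if not q.lower().startswith(MINI_PREFIX):
        return {"action": "web_search", "query": q}
    title = q[len(MINI_PREFIX):].strip().lower()
    if title in mini_articles:
        return {"action": "show_article", "title": title,
                "text": mini_articles[title]}
    # No matching article: invite the searcher to write one.
    return {"action": "offer_to_create", "title": title}
```

For example, with `{"sports": "..."}` as the article store, `mini sports` shows the article, `mini zoning` offers to create one, and `zoning` falls through to ordinary web search.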
Social Networking Features
The user interface was tied in with a social network application called foowi. Users could create an account for the application and fill in a profile. The system linked the wiki login to the social network login. Profile functions included Status, Basic profile, White board, Albums, Friends, Personal and Work.
Algorithm Matrix
Another component might be an algorithm matrix where the various rules that facilitate the search and ranking service, and their relationship, are presented for review and improvement.
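One way to picture such a matrix is a published table of named rules, each with a plain-language description and a weight, plus a function that reports how much every rule contributed to a page's score. The sketch below is only an assumption about what this could look like. The two rules, their weights, and all names (`Rule`, `RULES`, `explain`, `rank`) are hypothetical and far simpler than any production ranking system.

```python
from dataclasses import dataclass
from typing import Callable
from urllib.parse import urlparse

@dataclass(frozen=True)
class Rule:
    name: str
    description: str                      # plain-language text shown to the public
    weight: float
    score: Callable[[str, dict], float]   # (query, page) -> value in [0, 1]

def title_match(query, page):
    return 1.0 if query.lower() in page["title"].lower() else 0.0

def nyc_domain(query, page):
    host = urlparse(page["url"]).hostname or ""
    return 1.0 if host.endswith(".nyc") else 0.0

# The "matrix": every rule, its meaning, and its weight are open to review.
RULES = [
    Rule("title_match", "Query text appears in the page title", 2.0, title_match),
    Rule("nyc_domain", "Page is hosted under the .nyc TLD", 1.0, nyc_domain),
]

def explain(query, page, rules=RULES):
    """Return (total, breakdown); breakdown maps rule name to its contribution."""
    breakdown = {r.name: r.weight * r.score(query, page) for r in rules}
    return sum(breakdown.values()), breakdown

def rank(query, pages, rules=RULES):
    """Return (url, total, breakdown) rows sorted by total, highest first."""
    rows = [(page["url"], *explain(query, page, rules)) for page in pages]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

Because each result carries its own breakdown, a reader can see why one page outranks another and propose a change to a specific rule or weight.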
Education
Educating the public about the quality and role of search.
Civic Hoarding
Ed Teal, a candidate for Marshall County Sheriff in Guntersville, Alabama, has filed a lawsuit against Doug Gibbs, the chief deputy of his opponent, incumbent Scott Wall. In his lawsuit in U.S. Federal Court, Teal claims that in January he went to register a website for his campaign and discovered that almost all of the web addresses he could use, for example www.edtealforsheriff.com, had already been registered. Read the article.
Transparent Search Resources
- HITS Algorithm
- Google Custom
- Opensearch.org
- Clay Shirky on Algorithmic Authority
- New Course: Search Engine Society