A trustable search engine is an essential component of a competitive business sector, an effective civil society, and an informed public. Everyone benefits when an Internet search accurately presents all available resources. This page explores developing city-friendly search features.
(Photo courtesy Library of Congress.)
On Search Complexity
"A real-world web search engine, such as Google's or Microsoft's, has literally thousands or tens of thousands of ranking signals, updated or introduced multiple times during a single day.
Additionally, the permutations are near infinite, as the major search engines are constantly running concurrent experiments in an effort to dynamically tune the system with real user queries and user happiness. Moreover, modern search engines such as Google go so far as to customize and personalize each result on the fly for each individual user, meaning that there is no canonical ranking to begin with. And all of this is predicated on top of a very unpredictable and continuously changing corpus of crawled data, with more and more of it arriving in near real-time.
Given all of that as context, I wouldn't even know where to begin to try and make the ranking process visible to the user. There is no one algorithm, and no one corpus, no one frozen point in time, no way to even explain the ranking process to lay-people to begin with. A worthy challenge, but I'm not sure how practically it could be done." ... a search expert (who wishes to remain anonymous)
Given all that complexity and the success and wealth advancing the extant search model, it's not going to be easy to create a transparent search engine. But perhaps a limited goal - to serve the civic needs of New York City - with a more focused civic-engine is practicable. And let the Googles compete for a broader market.
On Custom Search
"What will the effect be of cse's (custom search engines) on popularity and visit volumes of TLD domains? You are probably familiar with google.com/cse, I 'built' several search engines myself, also with rollyo, and I, as well as colleagues, are very happy with the new search engines - producing better results than 'google general' because I make subselections of e.g. 50 really relevant sites.
NL, my country, is only small, things are difficult to judge, whether and how soon cse's will be common good. But for Germany I just found: http://www.suchmaschinen-datenbank.de/ - a databank with search engine for Germany depending on theme etc. (or whether you want words or pictures - I find far more and more relevant pictures on bing than google)."
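The "subselection" idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how google.com/cse or rollyo work internally: the site list, the function names, and the sample URLs are all hypothetical, and a real custom engine would apply the restriction inside the search service rather than after the fact.

```python
from urllib.parse import urlparse

# Hypothetical curated list; a real custom engine might hold about 50 such sites.
ALLOWED_SITES = {"nyc.gov", "nypl.org"}

def host_of(url):
    """Return the lowercase hostname of a URL, without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host

def is_allowed(url, allowed=ALLOWED_SITES):
    """True if the URL's host is a curated site or a subdomain of one."""
    host = host_of(url)
    return any(host == site or host.endswith("." + site) for site in allowed)

def restrict(urls, allowed=ALLOWED_SITES):
    """Keep only results from curated sites, preserving the original order."""
    return [u for u in urls if is_allowed(u, allowed)]
```

Passing a general result list through `restrict` leaves only the pages from the hand-picked sites, which is the effect the writer describes as producing better results than a general search.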
Directories Vs Search Engines
Directories
- Categorized by webmasters or directory staff. Sometimes staff are subject experts; sometimes not.
- Best for browsing purposes but can be searched.
- Require significant human effort to develop and maintain.
- Examples of popular directories include Open Directory Project and Librarians Index.
- Examples of subject-specific directories include Hardin MD: Medical Information.
- The value of directories: 1. limited peer review and 2. serendipitous discovery while browsing through categories.
Search Engines
- Automated programs, called robots, spiders, worms, etc. search and index web sites.
- Some index words in the title, URL, introductory paragraphs, or full-text of all documents on a web site. Some use a combination of these words and phrases, all of which are entered into the search engine's database.
- The "spiders" search in different ways - and different parts of the Internet. The same search with different engines will yield different results.
- Examples of popular search engines include Google, Ask, and AllTheWeb.
- The value of search engines: 1. BIG! and 2. full-text searching.
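The indexing step described in the list above, where words from a page's title and text are entered into the engine's database, is commonly implemented as an inverted index. The following is a minimal sketch, assuming pages have already been fetched and are supplied as a small dictionary; the field names `title` and `body` and the function names are illustrative, and no production engine is this simple.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the set of URLs whose title or body contains it.

    `pages` is assumed to be a dict: url -> {"title": str, "body": str}.
    """
    index = defaultdict(set)
    for url, page in pages.items():
        for field in ("title", "body"):
            for word in tokenize(page.get(field, "")):
                index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results
```

Because each engine chooses which fields to index and which pages its spiders reach, two engines built this way over different crawls would return different results for the same query.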
Who decides what you see when you use Google or another search engine?
Search and Civic Life
In a December 2009 New York Times Op-Ed, "Search, But You May Not Find," Adam Raff criticized Google for unfairly ranking his firm in its search listing and called for neutral search engines. He views a neutral engine as one providing an equal opportunity for all those offering a product or service to be listed at the top of a search results page.
Raff's gripe is with purported Google shenanigans that put Google's own product at the top of a search results page and his further down. Being further down on a search results page means fewer customers for his firm.
While we agree with Mr. Raff's goal of "search neutrality," we view the question from a regulatory perspective and believe "transparent search" would prove far easier to enforce. While a level playing field is essential, our concern extends from business to the impact that the opacity of Google and other search engines might have on civic affairs.
Here are some examples of situations where an opaque search engine might impact the city's governance realm:
- Listing Bias 1 - Imagine the operator of a dominant search engine confronting a city's zoning regulations when seeking to build an "inspirational" office space for its new campus. Might opposing views to the proposed development be coded (or be suspected of being coded) to appear low in a results listing, where the public would be unlikely to locate them?
- The On Search Complexity sidebar presents the difficulty of assessing responsibility in a situation like this.
- See this paper by Epstein and Robertson on influencing elections via search manipulation.
- Listing Bias 2 - A November 2012 truth-out.org article about a Google search for "Robert Howarth" raises a significant issue about the standards of what might be called Listing Bias.
As the intermediary between individual Net users and the information they desire, search performs a classic role of journalism. The truth-out.org article describes an instance where discussion about an important public policy issue, the safety of hydraulic fracturing, or fracking, is influenced by the design and presentation of an internet search.
In this instance, ANGA, a trade organization supportive of fracking, purchased a link on the Google search results page (see below) that offered a contrary view to that of the search target, Robert Howarth, an expert in the field. In its presentation of the search results, Google placed the trade organization's purchased link at the top of the results listing. That purchased link is also highlighted by a grey background color. And above the highlighted "Experts Speak on Howarth" link, small print states "Ad related to Robert Howarth."
The above Google search for "Robert Howarth" was made November 23, 2012.
Such placement design needs to be reviewed under fairness rules that advance journalism standards into search. Here in the U.S. the First Amendment - "Congress shall make no law ... abridging the freedom of speech, or of the press..." - hinders regulation, and industry standards seem the best medicine.
But it might be argued that technology changes how the First Amendment should be interpreted, much as it has for the Second Amendment. Americans are all too familiar with the decades-long controversy about that Amendment's guarantee of a citizen's right to bear arms: Did the Founding Fathers intend that citizens be allowed to own and use powerful automatic weapons? The corresponding First Amendment question might be: Were big data, big money, and search imagined by the Founding Fathers?
So for the immediate future, regulation of search journalism is unlikely. But with the creation of a transparent "search.nyc" vital to our city's effective operation, a trusted entity must be identified to oversee its development.
- Mass Algorithmic Filtering - Another limitation society faces from opaque search engines arises from algorithmic filtering, or AF. In simplest terms, AF has firms setting up filters that feed you information based on the code in a computer program or algorithm. Often these filters sort out vital information because it's thought it might offend the recipient. But because the code running these filters is not available to end users, they get a skewed or barren world view.
- Trust - Imagine situations where candidates for public office have links to search engine firms. Will voters trust an opaque search algorithm during the rough and tumble of an election campaign? In this instance the relationship between link and ballot voting would become apparent.
Approaches to Transparent Search
Is it possible that New Yorkers will assume responsibility for curating their city's Net presence? After all, the city's 8,200,000 residents possess detailed and current knowledge of their city's people, places, and things. And the technology and governance tools to create a crowdsourced resource are available - witness the astounding collaboration that is Wikipedia. But tuning the tools, and engaging and orchestrating the citizenry to collaborate in such an endeavor, would require sparking civic awareness of the value our cumulative knowledge holds.
Another possibility is a city-friendly, hybrid directory-search engine. A starting point for this might be the smoldering code of Wikia Search - a short-lived open source search engine launched in 2008. Wikia Search had three main components: a web search engine, a wiki to host so-called mini-articles, and a social network service.
- Mini-Articles - A prominent feature of the Wikia search engine was to be human-written mini-articles. These were to be short articles about the topics identified by their title and hosted by a Wikia wiki. Whenever a search query was issued with the mini prefix, e.g., mini sports, the results page presented the wiki mini-article with a name that matched the search query. If no matching article existed, the searcher was given the opportunity to write a new one. Imagine 8,000,000 editors of a wiki-search engine.
- Social Networking Features - The user interface in Wikia was tied in with a social network application called Foowi. Users could create an account for the application and fill in a profile. The system linked the wiki login to the social network login. Profile functions included Status, Basic profile, White board, Albums, Friends, Personal and Work.
- Algorithm Matrix - Another component might be an algorithm matrix where the various rules that facilitate the search and ranking service, and their relationship, are presented for review and improvement.
- Education - Educating the public about the role search plays in accessing our city's resources and how that affects our economic well being and the quality of life.
- Existing search engines (Google, Bing...) might find it difficult to operate under a shadow of suspicion about the fairness of their secret algorithms, forcing them to move toward transparency.
- Existing search firms might create a 'civic' search engine to address the civic concerns of the city.
- Or perhaps, with the growing awareness of the importance of transparency, particularly in the civic realm, search transparency rating firms might arise - perhaps like a Moody's or Standard & Poor's - pushing search providers in the direction of transparency and accountability.
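One way to picture the algorithm matrix mentioned in the list above is a ranking function whose rules and weights are published and whose output shows how much each rule contributed to a page's score. The sketch below is purely illustrative: the signal names, the weights, and the function names are hypothetical, and it makes no claim about how Wikia Search or any existing engine ranks results.

```python
# Hypothetical signals and weights, published so residents can review them.
WEIGHTS = {"title_match": 3.0, "body_match": 1.0, "local_source": 2.0}

def explain_score(signals, weights=WEIGHTS):
    """Return (total, breakdown); breakdown maps each rule to its contribution."""
    breakdown = {name: weights[name] * signals.get(name, 0.0) for name in weights}
    return sum(breakdown.values()), breakdown

def rank(candidates, weights=WEIGHTS):
    """Rank pages by score, keeping the per-rule breakdown next to each result.

    `candidates` is assumed to be a dict: url -> {signal name: value}.
    Ties are broken alphabetically by URL so the ordering is reproducible.
    """
    scored = []
    for url, signals in candidates.items():
        total, breakdown = explain_score(signals, weights)
        scored.append((url, total, breakdown))
    scored.sort(key=lambda row: (-row[1], row[0]))
    return scored
```

Because the breakdown travels with every result, a reviewer can see why one page outranks another and propose a change to a specific weight, which is the kind of review and improvement the matrix is meant to invite.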
Beyond the transparency benefits that homegrown search offers, an economic benefit arises as search-based advertising revenue stays local.
While search engines will be needed for the foreseeable future, we expect a paradigm shift with the arrival of multiple new Top Level Domains beginning in 2014. The shift will arise from a more intuitive domain name system and its impact on our ability to locate an Internet resource.
We see intuitive names as a key advantage that will evolve with the activation of the .nyc TLD. The potential impact of intuition becomes clear when one looks at the dotNeighborhood names. But before going into that let's take in a short history of names.
Family names, or surnames, are used by most of our planet's residents. The diversity of names varies by nation, culture, and ethnic group. For example, in South Korea more than 1/2 the population have the Kim, Lee, and Park surnames, with only about 250 names used by the nation's 50 million residents (see Wikipedia). This differs markedly from Germany, where there are as many as 1,000,000 surnames for its 80 million residents.
Internet surnames, or TLDs, were first decided upon in the mid 1980s. Less than a dozen were deemed satisfactory for what was thought of as an experimental system. Later in the 1980s more than 200 additional names were added, one per country. In 2013 ICANN began issuing new TLDs that should quadruple the pool of available TLDs (the current list is here). And in 2018 a new TLD application process will open again, with tens of thousands of TLDs likely to be sought. One might say we're moving from a Korean to a German name structure.
Getting back to the dotNeighborhood names...
With some exceptions (.org and .net being the biggest), the .com TLD is where most U.S. domain names can be found. And with so many having the same last name - .com - a dependence on search engines arose to help people locate a desired website or computer. The .com world might be thought of as like a Kim in Korea, but far more severe. The only reasonable route to find a site with a .com name requires using a search engine. The flood of "meta" sites about sites has caused difficulty in locating even simple things by name. Intuition has been destroyed by the .com ocean.
Transparent Search Resources
- Wikia Search - A history lesson in transparent search.
- The Structured Search - This 80-minute Google video touches on Freebase (20 million canonical terms) and its compound value types.
- YaCy - Imagine if, rather than relying on the proprietary software of a large professional search engine operator, your search engine was run by many private computers which aren't under the control of any one company or individual. That's what YaCy does. It is a free search engine that anyone can use to build a search portal for their intranet or to help search the public Internet. "When contributing to the world-wide peer network, the scale of YaCy is limited only by the number of users in the world and can index billions of web pages. It is fully decentralized, all users of the search engine network are equal, the network does not store user search requests and it is not possible for anyone to censor the content of the shared index."
- Directories vs Search - For the pros and cons of search vs directories, see these articles from Dartmouth and DirectoryOne. See Directories vs Search in sidebar for more on this.
- Google Antitrust Case - Proposed Remedies for Search Bias..., May 2012, Journal of Internet Law.
- Lead Generators foil Search - A New York Times article on Google's difficulty creating reliable search.
- Clay Shirky on Algorithmic Authority - On trusting community consensus.
- New Course: Search Engine Society
- Hits Algorithm
- Google Custom
- Datacollaboratives - On ways to share data while considering privacy needs (2015)