semantic web

Google the Gate Keeper

A reminder that Google doesn’t really search “the web,” just a relatively narrow slice of it. From Threat Level:

The homepage of Pirate Bay disappeared from Google’s search results Friday, after Google allegedly received a DMCA takedown notice targeting the site.

The move is unexpected because, while the Pirate Bay is rife with pirated material, the site’s spare landing page contains no content to speak of — just links, a logo and a search box. By law, DMCA notices are targeted to specific infringing content.

I increasingly hear the students I work with (and a good many of the faculty) use Google as a synonym for the web, much as Kleenex has become another word for tissue. The same goes for Googling and surfing (e.g. one might say “I was Googling David Bowie last night” when they were actually surfing Bowie fansites with little or no use of Google). Of course, no such equivalence exists: Google is a gated community. There is a boundary drawn between the regions of the web that Google (and other major search engines) will index, and the regions they won’t. What they don’t index, we likely don’t see.

That proprietary decision-making determines what information is (and is not) indexed, and that we as a society are increasingly losing our ability to even recognize this indexing, is a cause for great concern. Expecting Google to make their gatekeeping an open and transparent process is ludicrous. Google is for-profit, and dreaming up a contorted “free-market” rationale for how it could be in Google’s best business interest to be transparent is a dead end. Google makes billions by controlling access to information, and they aren’t going to give that up. Why should they?

But what if there were non-profit, or even for-profit, search engines that focused on identifying and indexing all the information Google (et al.) isn’t? At a minimum, having such options might make people conscious of the fact that the web is bigger than Google suggests.

Berners-Lee on the “insidious” quality of vertical integration

Tim Berners-Lee, inventor of the World Wide Web, on the “insidious” quality of vertical integration:

The Web’s infrastructure can be thought of as composed of four horizontal layers; from bottom to top, they are the transmission medium, the computer hardware, the software, and the content. … I am more concerned about companies trying to take a vertical slice through the layers than creating a monopoly in any one layer. A monopoly is more straight forward; people can see it and feel it, and consumers and regulators can “just say no.” But vertical integration — for example, between the medium and content — affects the quality of information and can be more insidious.

— Weaving the Web, p130

outtake: governing the semantic web

Another outtake from the article Cindi Katz and I have been writing on the relationship between U.S. children and young people and their technological environments in the post-9/11 security state:

In their pursuit of both national and homeland security as well as the creation of new markets, the state and corporations are engaging the free-flowing horizontal communication which takes place in cyberspace, with the aim of reworking its architecture into a Semantic Web. The Semantic Web has been primarily conceptualized and developed by Tim Berners-Lee, the computer scientist who invented the World Wide Web. The Semantic Web can be understood as a sustained indexing of cyberspace, whereby information is semantically coded in order to be processed and interpreted, across various platforms and programs, through “automated” analysis. To semantically code and then circulate this data, Web ontologies are developed and adopted which rationalize and categorically conform information in order to establish relationships. Most prominent of these ontologies is the Web Ontology Language (OWL). As cyberspace is semantically codified, both the state and corporations have moved to develop methodologies to utilize the Semantic Web for more efficient surveillance – often framed as “data mining” or “market research.” Particularly notable has been the Department of Homeland Security’s “Analysis, Dissemination, Visualization, Insight, and Semantic Enhancement” (ADVISE) program, defined as, “a data mining tool under development intended to help the Department of Homeland Security analyze large amounts of information. It is designed to allow an analyst to search for patterns in data—such as relationships among people, organizations, and events—and to produce visual representations of these patterns” (United States Government Accountability Office 2007). In reformatting cyberspace, the Semantic Web makes information more locative, circulatory and integrable. In doing so, this reformatting enhances cyberspatial navigation but also erodes the qualities of cyberspace that have functioned to protect the privacy and anonymity of cyber-surfers.

NOTE: This “outtake” and its relation to the larger paper, from which it was eventually cut, were inspired by two earlier posts: “what they want is an automatic feed” and “(young) person of interest.”

(young) person of interest

What would it look like if we were to situate young people in the growing semantic web? A 2007 report from the U.S. Government Accountability Office (GAO) took a look at some of the data mining programs currently underway at the Department of Homeland Security. In its report, the GAO offers a “Typical Semantic Graph” which represents the “data relationships and linkages” of a particular “person of interest” which can now be generated through a process called “semantic graphing.” GAO’s report defines semantic graphing as “a data modeling technique that uses a combination of ‘nodes,’ representing specific entities, and connecting lines, representing the relationships among them.”
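To make that definition concrete, here is a minimal sketch in Python of nodes joined by labeled connecting lines. It is only an illustration of the general technique, not GAO’s or DHS’s implementation: the SemanticGraph class, its method names, and every entity and relationship in the example (person_a, school_x, concert_y) are hypothetical placeholders, and the connecting lines are assumed to be undirected.

```python
from collections import defaultdict


class SemanticGraph:
    """Nodes are entities; each connecting line carries a relationship label."""

    def __init__(self):
        self.nodes = {}                 # node id -> entity type
        self.edges = defaultdict(list)  # node id -> [(relationship, other node id)]

    def add_node(self, node_id, entity_type):
        self.nodes[node_id] = entity_type

    def relate(self, source, relationship, target):
        # Assumption: lines are undirected, so the link is stored on both ends.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges[source].append((relationship, target))
        self.edges[target].append((relationship, source))

    def linkages(self, node_id):
        # Every relationship that touches the given node, in sorted order.
        return sorted(self.edges.get(node_id, []))


# Hypothetical data, invented purely for illustration.
graph = SemanticGraph()
graph.add_node("person_a", "person")
graph.add_node("school_x", "organization")
graph.add_node("concert_y", "event")
graph.relate("person_a", "attends", "school_x")
graph.relate("person_a", "bought ticket for", "concert_y")

for relationship, other in graph.linkages("person_a"):
    print(f"person_a --{relationship}--> {other} ({graph.nodes[other]})")
```

Even in this toy form, the point of the technique is visible: once entities and relationships are coded this way, everything connected to one “person of interest” can be pulled out with a single lookup.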

So what might a “Typical Semantic Graph” for a young person of interest look like? Part work, part play – here is GAO’s “typical semantic graph for a person of interest” compared to my “typical semantic graph for a young person of interest”:

(young) Person of Interest :: GAO & GTD
