
Tuesday, December 31, 2013

The End of Stupidity

That the links between web pages, rather than the words on the pages, are a good guide to quality (roughly, the more links a page has, the better it is) was, as we've discussed, the key insight that propelled Google into the limelight on the Web.  But this makes the Trivial Content objections all the more puzzling:  certainly, there are lots of quality pages on the Web, and the success of Google seems to lie in the confirmation that, mostly, it finds them for us.  And it does, much of the time.  But the Achilles' heel of Google's ranking system--ordering the results of a search with the best first--lies in the very insight that made it so popular.

Popular.  That's the Achilles' heel.  Simply put, the results on the first page of a Google search are the ones everyone else on the Web thinks are "good."  But we don't know anything about all those other Web users, except that they liked the content (by linking to it) that we're now getting.  To make the point, consider the academic journal situation (which inspired PageRank, remember), where we know a lot about the authors of the articles.  We know, for instance, that if Person A references Article Z, written by Person B, then both A and B are published authors in peer-reviewed journals--they're experts.  Hence if we collect all the references to Article Z by simply counting up the experts, we've got a good idea of the value of Z to the community of scholars who care about whatever Z's about (Z's topic).  Since we're dealing with expert authors, counting them all up (recursively, but this is a detail) makes a ton of sense.

Skip to the Web, now, and the first thing that goes is "expert."  Who's to say why someone likes Web page Z?  Who's to say, if Person A likes Z, and Person B likes Z, and so on, that anyone is an expert about "Z" at all?  The Web case is different from the academic article case, then, because the users have no intrinsic connection to the content--they're not credentialed in any measurable way as experts or authorities on whatever Web page Z's about.  Lots of anonymous folks like Z; that's all we know.

This feature of Web ranking has a number of consequences.  One is that large, commercial sites tend to end up on the first page of Google results.  If I query "hiking boots", I'm likely to see lots of Web sites for big stores trying to sell me hiking boots, like REI, or Timberland, or what have you.  Of course, many Web users simply want big commercial web sites (and not, say, a blog about hiking boots, or an article about the history of hiking boots).  Most people using the Web want what most people linking things on the Web want--this is just to say that what's popular is by and large what most people want (a truism).  This is why Google works, and for the very same reason it's why it doesn't (when in fact it doesn't).

The next consequence is really a corollary of the Big Business consequence just noted.  We can call this the "Dusty Books" objection, because it's about content that is exactly what you want, but isn't exactly the most popular content.  This'll happen whenever you're looking for something that not a lot of people think about, or care about, or for whatever reason isn't popular enough to get a high ranking.  It's a dusty book, in other words, like the book you find hidden away on a shelf of the library, last checked out three years ago, say, with dust on its cover from disuse.  Only, that's what you're looking for, it turns out.  You'll never see the dusty books in Google searches.  This is the point; if you think about how Google works for a second, it's an obvious point too.  Dusty books, by definition, aren't popular.  They're the Web pages that you want, but never find, and there are lots of them.  Think for another second about Google and you'll see the deeper problem, too:  that it works so well most of the time for popular content means that some of the time it doesn't work at all.  All that popular, unwanted content is guaranteed to keep your dusty book hidden forever, back on the tenth or hundredth page of search results (and who looks at those?).  Google, in other words, gives us what we want whenever it's what everyone else wants too; if it's just what you want, all those other people on the Web are now your enemies.  They're hiding your dusty book from you.

But what could we want, that's not popular?  Oh, lots of things.  If I'm thinking of driving Highway 101 from Washington to California, say, I may want a big travel planner site telling me where the hotels are, or the camping grounds, or I may want a personal blog from someone who can write, who's actually driven the route, and can tell me all sorts of "expert" things that commercial Web sites don't bother with.  This fellow's blog may or may not be popular, or linked to a big travel site, so it's a crap shoot if I find it with Google (even if it's popular as a homegrown blog, it isn't popular compared to Trip Advisor).

Faced with this scenario, many people take to a blog search engine like Google Blog Search, or Technorati, or Ice Rocket (Google Blog Search is probably the best).  Only, the popularity-as-quality approach screws this up too, if you're looking for the expert opinion from the experienced traveler of 101 who writes a personal and informative blog.  Why?  Because the most linked-to stories about "Highway 101" are a litany of traffic accidents in local newspaper articles (somehow considered "blogs" by Google Blog Search).  For instance, the second result for the query "driving Highway 101" on Google Blog Search is: "Woman killed on Highway 101 near Shelton."  And lest we think this is a fluke, the third result is "Can toll lanes on Highway 101 help pay for Caltrain?", and the fourth is the helpful "Man Who Had Heart Attack in Highway 101 Crash Dies in Hospital."  Clearly, what's popular to Google Blog Search has little to do with what our user interested in driving 101 has in mind.  (Incidentally, the first result is three paragraphs from northwestopinions.com about the Christmas light show on 101 every year.  At least "northwestopinions.com" might be a find.)

What's going on here?  Well, you're getting what everyone links to, that's what.  The more interesting question is how we've all managed to be in the dark about the limitations of the approach that we use day in and day out.  Even more interesting:  exactly how do you find good blogs about driving Highway 101 (or hiking boots, or lamp shades, or whatever)?  Well, most people "Google around" still, and when they happen upon (in the search biz: "discover") an interesting site, or a portal site like Fodors or Trip Advisor, they save the URL or remember how to find it again.  Mostly, they just miss dusty books, though.

To continue with the Dusty Books metaphor, and to see the problem in a different way, imagine the public library organized according to popularity, rather than expertise on the topic, or authority (books that are published are ipso facto books with authority).  Someone wrote the definitive history of 101, or the definitive guide to driving 101, but it's so detailed that most people don't bother to read it.  They get the lighter version, with the glossy cover.  Ergo, the definitive guide just disappeared from the library shelf.  It's not even a dusty, seldom-read book; it's simply not there anymore (this is akin to being on page 1323, say, of a Google search).  This is swell for all those 101 posers and dilettantes, but for you, you're really looking for the full, 570-page exposition on 101.  This is a ridiculous library, of course, because (we're all tempted to say, in chorus) what else is a library for, but to give you all the expertise and authoritative books on a topic?  Who cares what's darned popular?  Indeed.  Returning then to the Web world, it's easy enough to see the limits of the content we're getting (and why, most of the time, we're all happy with it).  Put another way, the Web is skewed toward Trivial Content--every time what's popular trumps what's substantive, you get the popular.  (To be sure, when what's popular is also substantive--say, because "popular" expositions of Quantum Mechanics are those written by Scientific American writers, or MIT professors--there's no problem.)

But is this why Google is making us stupid?  Well, sort of, yes.  It's easier to see with something like "politics" or "economics", say.  If Web 2.0 liberated millions of people to write about politics, and Google simply delivers the most popular pages on this topic for us, then generally speaking all the "hard" discussions are going to fall off of the first page of a Google search.  "Popular politics" on the Web isn't William Jennings Bryan; it's usually a lot of surface buzz and griping and polarization.  Good versus evil.  Good guys, bad guys.  Doomsday predictions and everything else that crowds seize upon.  True, large media sites like the New York Times will pop up on the first page of a query about "health care crisis."  This is a consequence of popularity too (same reason that Trip Advisor shows up with hotel prices on your Highway 101 search).  But if you're looking for interesting, informed opinions out there in public (say, from good bloggers or writers), you don't care about the NYT anyway.  Since Google doesn't care about the quality of an article, whatever has shock value is likely to be what you get for all the rest.  We might say here that, even if Google isn't actively making us stupid for Trivial Content reasons alone, if we're already uninformed (or "stupid"), it's not helping us get out of this situation by directing us to the most thoughtful, quality discussions.  It's up to us to keep looking around for it, full of hope, as it were.  (And, if we don't know what to look for, we're likely to think the Google results are the thoughtful ones, which explains why half my friends in the programming world are now conspiracy theorists, too.  Four years of learning to program a computer in "real" college, and their politics on the Web, and that's what you get.  Alas.)

To sum this up, then, the full answer to the question we began with ("is Google making us stupid?") is something like, yes.  While we didn't address all the reasons, we can blanket this with:  it's a Crappy Medium with Lots of Distractions that tends to encourage reading Trivial Content.  Mostly, then, it's not helping us become classically trained scholars, or better and more educated in the contemplative and thoughtful sense.  I've chosen to focus mostly on Trivial Content in this piece because, of the three, if you're staying on the Web (and most of us will, me included), improving the quality of search results seems the most amenable to change.  It takes only another revolution in search.  While it's outside the scope of this article to get into details (and as Popper once said, you can't predict innovation, because if you could, you'd already have innovated), a few remarks on the broad direction of this revolution are in order, by way of closing.

Search.next()

Google's insight, remember, was that the links between Web pages, and not only the words on the pages, were good guides to quality.  It's interesting to note here that both the method Google replaced (the old Alta Vista search approaches that looked at correlations between words on a page and your query words) and its PageRank method rely on majority-rules calculations.  In the old-style approach--what's called a "term frequency-inverse document frequency," or tf-idf, calculation--the more frequently your query terms occur in a document, the higher the rank it receives.  Hence, "majority rules" equals word frequency.  In the Google approach, as we've seen, "majority rules" equals link-to frequency.  In either case, the exceptions or minorities are always ignored.  This is why Google (or Alta Vista) has a tough time with low-frequency situations like sarcasm:  if I write that "the weather here is great, as usual" and it's Seattle in December, most human readers recognize this as sarcasm.  But sarcasm isn't the norm, so mostly your query about great weather places in December will take you to Key West, or the Bahamas.  More to the point, if I'm looking for blogs about how the weather sucks in Seattle in December, the really good, insightful blog with the sarcasm may not show up.
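To make the "majority rules" idea concrete, here's a minimal sketch (in Python) of the old tf-idf style of ranking:  score each document by how often the query words occur in it, discounting words that occur in every document.  The toy "documents" and query below are made up for illustration; this is just the counting idea, not Alta Vista's actual system.

```python
# A minimal tf-idf ranking sketch: "majority rules" equals word frequency.
# The toy corpus and query are invented for illustration only.
import math
from collections import Counter

docs = {
    "key_west": "great weather sunshine and beaches great weather in december",
    "seattle":  "the weather here is great as usual rain rain rain",
    "bahamas":  "december weather great for sailing and beaches",
}

def tf_idf_score(query, doc_text, all_docs):
    words = doc_text.split()
    counts = Counter(words)
    n_docs = len(all_docs)
    score = 0.0
    for term in query.split():
        tf = counts[term] / len(words)                    # term frequency in this document
        df = sum(1 for d in all_docs.values() if term in d.split())
        idf = math.log((n_docs + 1) / (df + 1)) + 1       # rarer terms count for more
        score += tf * idf
    return score

query = "great december weather"
ranked = sorted(docs, key=lambda name: tf_idf_score(query, docs[name], docs), reverse=True)
print(ranked)  # the best word-frequency match comes first; sarcasm never registers
```

Swap the word counting for link counting and you have, in caricature, the move from Alta Vista to Google:  the majority still rules, only now the majority is the set of people linking rather than the set of words matching.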

So, interestingly, the Google revolution kept the same basic idea, which is roughly that converting human discourse or writing into computation involves looking for the most-of-the-time cases and putting them first.  Human language is trickier and more interesting and variegated than this approach, of course, which is the key to understanding what may be next in search.  Intrinsic quality is a property of the way a document is written.  Many computer scientists avoid this type of project, feeling it's too hard for computation, but in principle it's a syntactic property of language (and hence should be translatable into computer code).  Consider the following two passages about, say, "famous writers who visited or lived in Big Sur, California."

Exhibit A
"I heard lots of really good writers go to Big Sur.  This makes sense to me, because the ocean is so peaceful and the mountains would give them peace to write.  Plus the weather is warm.  I can imagine sitting on the beach with a notepad and writing the next great novel at Big Sur.  And later my girlfriend and I would eat S'mores and build a fire.  My girlfriend likes to camp, but she doesn't hike very much.  So when I write she'd be at the camp maybe I don't know.  Anyway I should look up all the writers who went there because there must be something to it."


What's wrong with Exhibit A?  Nothing, really.  It's just, well, trivial.  It's Trivial Content.  But why?  Well, the author doesn't really say that much, and what he does say is general and vague.  He doesn't seem to know much about Big Sur, except that it's located near the ocean and has mountains, and other common pieces of knowledge like that you can camp and hike there.  He also doesn't seem to know many details (if any) about the writers who've spent time in Big Sur, or why they did.  In short, it's a vague piece of writing that demonstrates no real knowledge of the topic.  Enough of Exhibit A, then.

Exhibit B 

"BIG SUR, Calif. — The road to Big Sur is a narrow, winding one, with the Pacific Ocean on one side, spread out like blue glass, and a mountainside of redwood trees on the other.
The area spans 90 miles of the Central Coast, along Highway 1. Los Angeles is 300 miles south. San Francisco is 150 miles north. There are no train stations or airports nearby. Cell phone reception is limited. Gas and lodging are pricey."
"Venerated in books by late authors Henry Miller and Jack Kerouac, it's no wonder then that Big Sur continues to be a haven for writers, artists and musicians such as Alanis Morissette and the Red Hot Chili Peppers, all inspired by a hybrid landscape of mountains, beaches, birds and sea, plus bohemian inns and ultra-private homes."
"In the 1920s, American poet Robinson Jeffers meditated about Big Sur's "wine-hearted solitude, our mother the wilderness" in poems like "Bixby's Landing," about a stretch of land that became part of Highway 1 and the towering Bixby Bridge 13 miles south of Carmel. (Part of the highway near that bridge collapsed due to heavy rains this past spring, followed by a landslide nearby; the roadway reopened recently.)"
"Among literary figures, Miller probably has the strongest association with the area. "Big Sur has a climate all its own and a character all its own," he wrote in his 1957 autobiographical book "Big Sur and the Oranges of Hieronymus Bosch." "It is a region where extremes meet, a region where one is always conscious of weather, of space, of grandeur, and of eloquent silence."
Miller, famed for his explicit novel "Tropic of Cancer," lived and worked in Big Sur between 1944 and 1962, drawn to the stretch of coast's idyllic setting and a revolving cadre of creative, kind, hard-working residents."

What's better about Exhibit B?  Well, it's specific.  Qualitatively, the author (Solvej Schou, from the AP; the full story appears in the Huffington Post) has specific facts about Big Sur and about the writers who've spent time there.  The paragraphs are full of details and discussion that would, presumably, be appreciated by anyone who queried about writers at Big Sur.  But quantitatively, or we should say here syntactically, the paragraphs are different from Exhibit A too.  Exhibit A is full of common nouns ("camp", "hike", "ocean", "writers") and it's relatively devoid of proper nouns that pick out specific places or people (or times, or dates).  Also, there are no links going out of Exhibit A--not links to Exhibit A, but links from Exhibit A--to other content, which would embed the writing in a broader context and serve as an external check on its content.  Syntactically, there's a "signature," in other words, that serves as a standard for judging Exhibit B superior to Exhibit A.  The key point here is "syntactic," because computers process syntax--the actual characters and words written--and so the differences between the two examples are not only semantic, meaningful only to human minds.  In other words, there's a perfectly programmable, syntactic "check" on page quality, it seems, which is intrinsic to the Web page.  (Even in the case of the links we mentioned, they're outbound links from the document, and hence intrinsic to the document as well.)
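Just to show that this "signature" really is the kind of thing a program can compute, here's a toy sketch in Python.  It counts capitalized words that don't start a sentence (a crude stand-in for proper nouns) and outbound links.  The heuristics, and the abbreviated snippets fed to them, are mine, invented purely for illustration; a real quality check would need to be far more sophisticated than this.

```python
# A toy, intrinsic "signature" of specificity: proper-noun density and outbound links.
# The heuristics here are crude stand-ins, not a real quality classifier.
import re

def syntactic_signature(text):
    sentences = re.split(r'[.!?]\s+', text)
    proper_nouns, total_words = 0, 0
    for sentence in sentences:
        words = sentence.split()
        total_words += len(words)
        for w in words[1:]:              # skip the sentence-initial word
            if w[:1].isupper():
                proper_nouns += 1
    outbound_links = len(re.findall(r'https?://\S+', text))
    return {
        "proper_noun_ratio": round(proper_nouns / max(total_words, 1), 3),
        "outbound_links": outbound_links,
    }

exhibit_a = ("I heard lots of really good writers go to Big Sur. "
             "Plus the weather is warm and the ocean is so peaceful.")
exhibit_b = ("In the 1920s, American poet Robinson Jeffers meditated about Big Sur in "
             "poems like Bixby's Landing, 13 miles south of Carmel along Highway 1.")
print(syntactic_signature(exhibit_a))   # low proper-noun density
print(syntactic_signature(exhibit_b))   # noticeably higher
```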

In closing, I'd like to make a few broadly philosophical comments about the terrain we've covered here with our discussion of intrinsic quality above.  If you've spent time reading about "Web revolutions" and movements and fads (they're usually "revolutions") from thinkers like Shirky or any of a number of Web futurists, you're always led down the road toward democratization of content, and the "wisdom of crowds" type of ideas that tend naturally to undervalue or ignore individual expertise in favor of large collaborative projects, where content quality emerges out of the cumulative efforts of a group.  Whereas group-think is terrible in, say, entrepreneurial ventures (and, at least in lip service, is bad in large corporations), it's all the rage for the Web enthusiasts.  I mentioned before that an iconoclast like Lanier calls this the "hive mind" mentality, where lots of individually irrelevant (if not mindless) workers collectively can move mountains, creating Wikipedia, or developing open source software like Linux.  The Web ethos, in other words, doesn't seem too inviting for the philosophical themes introduced here:  a verifiable check on document quality (even if not perfect, it separates tripe like Exhibit A from something worthy of reading like Exhibit B), and along with it some conceptual tip of the hat to actual expertise.  It doesn't seem part of the Web culture, in other words, to insist that some blogs are made by experts on the topics they address, and many others are made by amateurs who have little insight, knowledge, or talent.  It's a kind of Web Elitism, in other words, and that seems very un-Web-like.

Only, it's not.  As with the example of Yelp, where a reviewer has a kind of "circumstantial" expertise if they've actually gone to the cafe in the Mission District and sat and had an Espresso and a Croissant, there's expertise and authority stamped all over the Web.  In fact, if you think about it, often what makes the Web work is that we've imported the skills and talents and knowledge of the real world into the cyber realm.  That's why Yelp works.  And so the notion of "authority" and "expertise" we're dealing with here is relatively unproblematic.  No one gripes that their car mechanic is an "expert," for instance;  rather, we're overjoyed when the person who fixes our ailing Volvo actually does have mechanical expertise--it saves us money, and helps assure a successful outcome.  Likewise, we don't read fiction in the New Yorker because we think it's a crapshoot whether it's any better than what someone pulled off the street outside our apartment could write.  Not that New Yorker fiction is somehow "better" in an objectionable, elitist way (or that the woman walking her dog out on the street couldn't be a fantastic short story writer), but only that the editors of the New Yorker should (we hope) have some taste for good fiction.  And the same goes for the editorial staff of the New York Times, or contributing writers to, say, Wired magazine.

We're accustomed to expecting quality in the real world, in other words, and so there's nothing particularly alarming about expecting or demanding it online, too.  For, from the fact that everyone can say anything about anything on the Web (which is the Web 2.0 motto, essentially), it simply doesn't follow that we all want to spend our day reading it.  For one, we can't, because there's simply too much content online these days.  But for two, and more importantly, we don't want to.  First, because life is short, and we'd rather read something that improved or enlightened or even properly amused or entertained us.  And second, because, as the recent backlash against the Web culture from Carr, Lanier, and others suggests, it's making us stupid.  And, of course, life should be too short for that.







Monday, December 30, 2013

The Triumph of Triviata

Almost as soon as user generated content became an acronym, two rival interpretations appeared among cultural critics and technologists and seemingly everyone else.  On the one hand, someone like Web guru turned NYU professor Clay Shirky (Here Comes Everybody, Cognitive Surplus) seized on the democratizing, collaborative possibilities of the social, Web 2.0 movement.  Whereas Big Media once told everyone what was important (epitomized in antediluvian declarations like Cronkite's "and that's the way it is"), the Web was making it possible now for us to tell each other what we cared about; what was important.  To someone like Shirky, or Stanford law professor Lawrence Lessig (Free Culture), or Harvard technology theorist Yochai Benkler (The Wealth of Networks), it seemed that the Web was a kind of information liberation movement, destined to make all those passive readers of yesterday tomorrow's writers and trend setters and innovators.  It wasn't simply that we had more options with UGC--more things to look at and to enjoy--it was that we had an entire, revolutionary, technological means for large-scale social change and improvement.  "What gives?" was missing the point, and borderline nonsensical.  "What's next?" was the only relevant question.  As the popular Microsoft ad of the time put it (ironically referring to someone sitting at a computer):  Where do you want to go today? The answer, to the Web 2.0 enthusiasts and visionaries, was a resounding anywhere.

On the other hand, folks began noticing before long that much of the content generated by all these newly liberated creators wasn't worth much, to put it bluntly.  The LA Times attempted to capitalize on the new Web culture by allowing anyone to comment and even contribute to its stories; this lasted a few days, until the sheer magnitude of silliness and irrelevance and tastelessness peppering its woebegone pages forced an about face, and they discontinued the feature in disgrace (albeit quietly).  Other media giants like the New York Times or the Wall Street Journal of course launched "Web 2.0" online versions with comments sections, but they were notably safeguarded from the "mob rules" type of scenario that embarrassed the LA Times.  In general, it became apparent that while anyone could say anything and publish it online, editorial standards in the traditional sense were more, not less, necessary in such an environment.

Blogging became ubiquitous, entering into our lexicon shortly after appearing first as "Web logs", and gave voice to the common person, to be sure.  But most blogs were silly missives written by uninformed amateurs who either borrowed from actual reporting to regurgitate or expound on ideas and stories, or simply neglected serious discussion altogether, journalistic or otherwise, in favor of mindless off-the-cuff chatter about their significant others, their sports cars, or other minutiae that few others found worthy of reading.  A few blogs became important in serious discussions; most of the millions of others were scarcely worth knowing about.  Still, they were, all of them, "published" on do-it-yourself blogging platforms like LiveJournal or Google's Blogger, and it was all readable to anyone who cared, and all UGC.  Similar observations apply here to amateur videos on YouTube, to "mashing up" content like songs by combining existing artists' singles, and on and on.  In short, sans the social change rhetoric, "UGC" was largely what one might expect by the end of the 2000s:  lots of amateurish, often inaccurate, often mendacious, and rarely publishable (in the traditional sense) written and multimedia content, everywhere.  Crap, in other words.

The sobering reality of Web 2.0 when judged by traditional media standards should not, in retrospect, have been much of a surprise.  Viewed statistically, any large sample of the population will generally not happen to be award-winning journalists, novelists, musicians, or movie makers.  That's life.  But the success stories, like Wikipedia, were perhaps a surprise.  Here, anonymous users collaborated in an open "Wiki" environment to produce encyclopedia entries, and as the project exploded in the early 2000s, with some famous exceptions, the quality of the articles appearing on Wikipedia seemed to confirm, not challenge, the idea that there could be "wisdom in crowds", and that Shirky et al. really were prescient in seeing the transformative social potential of Web 2.0.  Fair enough.  But notwithstanding the successes, there was a deeper problem emerging that would pose more fundamental challenges to the technological revolution of the Web.  To see it clearly and at its root, we'll need to return to the issue of search, and to Google search in particular.





Whoops! Idiocracy

In the last section, we surveyed the rise of search, focusing on (who else?) Google, and saw how Google's insight about human judgments in HTML links propelled Web search into the modern era.  In this vein, then, we can see the beginning of the entire social revolution (roughly, from Web 1.0 to Web 2.0 and on) as a story of the beginning of "real" Web search with Google's PageRank idea.  Yet we ended this feel-good section back where we started, with all the original worry about the Web making us stupid, a view given recent voice by folks like Carr and Lanier, and even more recently by the latest Atlantic Cities article on the dangers of photo sharing, fretting now about our memories and memory formation in the Instagram age (always, alas, worried about our brains online).  What gives?  This is our question.

Before answering it, though, it'll be helpful to review the general landscape we've been traversing.  Back to the beginning, then, we have:
(1) Increasingly, smart people are worrying about the downside of modern technological culture (basically, "Web culture").  Indeed, studies now emerging from cognitive psychology and neuroscience suggest that there's a real, actual threat to our cognitive selves on the Web (our brains and brain activities like memory, attention, and learning).
(2) As a corollary of (1), the picayune dream of something like instrumentalism--we use a technology as we wish, and it doesn't really change us in the process--is almost certainly false with respect to Web culture.
(3)  From (1) and (2), the Web seems to be changing us, and not entirely (or even mostly, depending on how moody one is) for the better.
(4) But the Web seems like the very paragon of progress, and indeed, we've been at pains in the last section to explain how the Web (or Web search with Google) is really all about people.  It's all about people-smarts, we've argued, and so how can something about us turn out to be bad for us?  Isn't the "Web" really just our own, ingenious way of compiling and making searchable and accessible all the content we think and write and communicate about, anyway?
(5) And so, from (1)-(4), we get our question:  what gives?

That's our summary, then.  And now we're in a position to address (5), or at least we've got enough of a review of the terrain to have a fresh go at it now.  To begin, let's make some more distinctions.

More Distinctions (or, Three Ways the Web Might be Bad).  These are general points about Web culture, and we might classify them roughly as (1) Bad Medium (2) Distracting Environment, and (3) Trivial Content.

(1) Bad Medium
For years, people have noted in anecdotes and general hunches or preferences the differences between physical books and electronic Web pages.  Back in 2000, for instance, in the halcyon days of the Web, noted researchers like John Seely Brown (who admittedly worked for Xerox) and Paul Duguid argued in The Social Life of Information that "learning" experiences from printed material seem to be of a qualitatively different sort than "learning" experiences we get from reading lighted bits on an artificial screen.  Books, somehow, are more immersive; we tend to engage a book, whereas reading text on a Web page we're tempted to skim instead.  We might call this an umbrella objection to taking the Web too seriously, right from the get-go, and I think there are some real teeth in it.  But onward...
(2) Distracting Environment
Many of Carr's points in his original Atlantic article "Is Google Making Us Stupid?" and later in his book The Shallows are (2)-type objections.  Roughly speaking, you can view Carr's point (and the research he points to that suggests his point is valid) as something akin to the well-known psychological result that people faced with endless choices tend to report less intrinsic satisfaction in their lives.  It's like that on the Web, roughly.  If I can read my email, take in a number of tweets, get Facebook updates, field some IM, and execute a dozen searches all in fifteen minutes, it's hard to see in practical terms how I'm doing anything, well, deep.  Any real cognitive activity that requires focus and concentration is already in pretty bad straits in this type of I-can-have-anything-all-the-time information environment.  And, again, for those tempted to play the instrumentalist card (where we argue that in theory we can concentrate, we just need to discipline ourselves online), we have a growing number of brain and behavioral studies surfacing that suggest the problem is actually intrinsic to the Web environment.  In other words, we can't just "try harder" to stay on track (though it's hard to see how this would hurt); there's something about our connection to information on the Web that actively militates against contemplation and concentration of the kind required to really, thoroughly engage or learn something.  As Carr summarizes our condition, we're in The Shallows.  And since we're online more and more, day after day, we're heading for more shallows.
(3)  Trivial Content
Many of Lanier's arguments in his You Are Not a Gadget are explorations of (3).  Likewise, someone like former tech-guy Andrew Keen advances objections of the Trivial Content sort in his The Cult of the Amateur.  As I think Lanier's observations are more trenchant, we'll stick mostly to his ideas.  Trivial Content is really at the heart of what I wish to advance in this piece, actually, so to this we'll turn in the next section.





Enter "Search"

You can throw around some impressive numbers talking about the Web these days:  a trillion Web pages (so says Wired founder Kevin Kelly), and as of this writing 1.59 billion of them indexed on search engines.  Google, of course, is the story here--as much today as a decade ago.  When its founders debuted the "BackRub" search engine on Stanford University's servers back in the late 1990s, within a year the software was VC funded and moving out of its academic roots and into commercial tech stardom.  Since then, many of the needle-in-a-haystack worries about finding information on the exponentially growing World Wide Web have become largely otiose.  Why?  Because, generally speaking, Google works.

But like many great ideas, the Google recipe for Web search is somewhat paradoxical.  On the one hand, Google--as a company and as a search technology--is the paragon of science, engineering, and numbers.  Indeed, the math-and-science ethos of Google is part of its corporate culture.  Visit the Googleplex--the sprawling campus in Mountain View, California where Google is headquartered--and you'll get a sense that everything from employee work schedules to seemingly cosmetic changes on its homepage to geek-talk about algorithms is subject to testing, to numbers.  Google is data-driven, as they say.  Data is collected about everything--both in the company and on the Web--and then analyzed to figure out what works.  Former CEO Eric Schmidt remarked once, tellingly, about his company that "in the end, it's all just counting."  And it is, of course.

On the other hand, though, what propelled Google to stardom as a search giant (and later as an advertising force) was the original insight of founders Larry Page and Sergey Brin--two Stanford computer science students at the time, as we all now know--that it's really people, and not purely data, that make Google shine.  PageRank, named after its inventor Larry Page, is what made Page's pre-Google "BackRub" system so impressive.  PageRank wasn't processing words and data from Web pages, but rather links, in the form of HTML back-links that connected Web page to Web page, making the Web, well, a "web."

Page's now-famous insight came from his academic interest in the graph-theoretic properties of collections of academic journal articles connected via author references, where the quality of a particular article could be judged by (roughly) examining references to it from articles with authors having known authority and credentials on the same topic.  Page simply imagined the then-nascent World Wide Web as another collection of articles (here: Web pages), and the HTML links connecting one to the other as the references.  From here, the notion of "quality" implicit in peer-reviewed journals could be imported into the Web context, and he had the germ of a revolution in Web search.
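For the curious, the recursive counting at the heart of this insight can be sketched in a few lines of Python.  The tiny link graph below is made up, and the damping factor is the standard textbook value rather than anything Google-specific; the point is just that a page's rank is built up, round after round, out of the ranks of the pages linking to it.

```python
# A back-of-the-envelope PageRank sketch: rank flows along links, recursively.
# The toy link graph and damping factor are illustrative, not Google's actual system.

links = {            # page -> pages it links out to
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

def pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1 - damping) / n for p in pages}
        for page, outlinks in links.items():
            share = damping * rank[page] / len(outlinks)   # split this page's vote among its links
            for target in outlinks:
                new_rank[target] += share
        rank = new_rank
    return rank

print(sorted(pagerank(links).items(), key=lambda kv: -kv[1]))  # "C", the most linked-to page, wins
```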

Of course it worked, and almost magically well.  When Page (and soon Brin) demo'd the BackRub prototype, simple queries like "Stanford" or "Berkeley" would return the homepages of Stanford University or The University of California at Berkeley.  (Yes, that's pretty much it.  But it worked.) It's a seemingly modest success today, but at the time, Web search was a relatively unimportant, boring part of the Web that used word-frequency calculations to match relevant Web pages to user queries.  Search worked okay this way, but it wasn't very accurate and it wasn't very exciting.  Scanning through pages of irrelevant results was a commonplace.

Most technologists and investors of the day therefore pictured search technology as a mere value-add to something else, and not a stand-alone application per se.  The so-called portal sites like Yahoo!, which used human experts to collect and categorize Web pages into a virtual "mall" for Web browsers and shoppers, were thought to be the present and future of the Web.  Search was simply one of the offerings on these large sites.

But the human element used by Yahoo! to classify Web pages was much more powerfully captured by Page and Brin algorithmically--by computer code--to leverage human smarts about quality to rank Web pages.  And this is the central paradox--while Google became the quintessential "scientific" company on the Web, it leaped to stardom with an insight that was all too human--people, not computers, are good at making judgments about content and quality.  And of course, with this insight, the little BackRub system bogging down Stanford's servers quickly became the big Google search giant.  Suddenly, almost overnight, search was all the rage.

Putting it a bit anachronistically, then, you could say Google was, from the beginning, a social networking technology--or at least a precursor.  The idea that the intelligence of people can be harnessed by computation led to more recent tech "revolutions" like Web 2.0.  For instance, in tagging systems like del.icio.us (now owned by Yahoo!), users searched people-generated tags or "tagsonomies" of Web pages.  Tagging systems were a transitional technology between the "Good Old Fashioned Web" of the late 1990s, with its portal sites and boring keyword search (like Yahoo!), and a more people-centered Web where what you find interesting (by "tagging" it) is made available for me, and you and I can then "follow" each other when I discover that you tag things I like to read.  Once this idea catches on, social networking sites like MySpace and later Facebook are, one might say, inevitable.

So by the mid-2000s, user-generated content (UGC) like the earlier del.icio.us, a host of user-driven or "voting" sites like Digg (where you could vote for or "digg" a Web page submitted on the site), and large collaboration projects like Wikipedia were simply transforming the Web.  Everywhere you looked, it seemed, people were creating new and often innovative content online.  As bandwidth increased, visual media sites for sharing photos and videos (e.g., YouTube) emerged and, within what seemed like months, became major Web sites.  And as Web users linked to all of this UGC, and Google's servers indexed it, and its PageRank-based algorithms searched it by exploiting the human links, Google's power was growing to almost Herculean proportions.  Like the sci-fi creature that gets stronger from the energy of the weapons you use to shoot it, every fad or trend or approach that took fire on the Web translated ineluctably into a more and more powerful Google.  By the end of the 2000s, it seemed every person on the planet with an Internet connection was "googling" things on the Web, to the tune of around 100 billion searches per month.

Excepting, perhaps, the idea of a perfect being like God, every other idea has its limits, and Google is no exception.  Enter, again, our troubling question:  how, if the Web is driven increasingly by human factors, and Google leverages such factors, can Google be making us stupid (as Carr puts it)?  Why need we be assured we're not "gadgets" (as Lanier puts it)?  If all this tech is really about people anyway, what gives?  "What gives?" is a good way of putting things, and it's to this question that we now turn.

Wednesday, December 18, 2013

Continuation of Things Past

The prior post is rough and this one promises to be choppy.  Some notes towards an article, that's all.

Deconstructing the Web

(1) The Web paradox is something like:  once you start treating people like information processing systems--and I'll explain how this works with the cognitive-social model on the Web--"deeper" and core creative intellectual acts lie outside your scope.  So the paradox is that all the information at your fingertips leads, in the end, to having less knowledge.  It's sort of like a law of human thinking, comparable at least metaphorically to a law of thermodynamics, where you can't get something for free.  You want lots of information?  You have the Web.  You want, as Carr puts it, concentration and contemplation?  You have to get off of the Web.
(2) None of this really matters--even if you accept the thesis here--if you have an instrumentalist view of technology; you won't see the danger or the problem.  But part of my argument is that there is no such thing as instrumentalism; the Web is paradigmatically non-instrumentalist.  In fact, you can go "realist" about the non-instrumentalism of the Web and point to actual brain science:  our brains are literally changing.  So it's not a philosophical debate.  It's true.
(3)  Getting all the positives of endless information without succumbing to the underlying cognitive-social information processing model is the Big Question.  There are two ways to approach this.
(a) Introduce a distinction between Web use and "full" or "natural" human thought and action.  A good example here is the distinction between using a network to discover a physical book (say, on Amazon), and actually reading and absorbing what the book says (say, by buying it and then reading it in the physical world).
(b) Change the Web.  This is an intriguing possibility, and I think there are a number of promising routes here.  Most of the thoughts I have on this matter involve a principle I "noticed" a few years ago on expertise.  Call it the "natural world" principle or I'll think of a better title, but here are some examples to motivate it:
(1) Someone writes a blog about driving Highway 101, which he does every summer.
(2) Someone writes a review on Yelp about the French cafe in the Mission District in San Francisco, and the reviewer spent the afternoon at the cafe just last week.
(3) Someone writes an article on Heisenberg's Uncertainty Principle or Sartre's Being and Nothingness on Wikipedia, and the person has a degree in mathematics or physics or just took a course on French Existentialists at the University of Kentucky (or wherever).

Revolution Cometh
In all of these examples, there's a principle of knowledge at work, and underlying this principle, there's one of, say, effort.  Someone did some actual work in every example.  For instance, the fellow with the travel blog actually drove the highway (it's long, it takes time).  Or, the customer at the cafe actually went there, and sat down, and ordered an Espresso and a Croissant.  The effort principle underlies the knowledge principle because, well, it takes effort to know things about the world.  And whenever people know things about the world and translate this knowledge into bits of information online, like with all communication we can learn (if not experientially, at least cognitively) from those bits, by reading them.  In this guise nothing is really that different from fifty years ago;  it's like looking at microfiche, say.  Doing research.  Learning.

But the effort principle is inextricably tied to the knowledge principle, and this is where this model departs from the current Web model.  For instance, something like "Web 2.0", or what Lanier pejoratively calls the "hive mind", pulls the effort and knowledge principles apart.  Here, a bunch of anonymous "Web resources" (people online) all chip in little bits of effort to make a finished product.  Like, say, a Wikipedia entry.  The big fallacy here is that there's something from nothing--no one ever really knows a ton about quantum mechanics, or atheistic existentialism.  The focus here is not on what an individual might know (an "expert") but rather on what many anonymous non-experts might collectively "know."  And this is where all the trouble starts; for the information processing model that gives rise to the negative conclusions of a Carr or a Lanier (or a New Yorker article about Facebook) is ideally suited to the cognitive-social model that ignores physical-world-expertise and the effort it takes in favor of anonymous Web resources.  If information is processed, hive-like, by so many resources, then--like any information processing device--the process is what ultimately matters, not the knowledge from experts.  Expertise emerges, somehow, out of the process of information processing.  Indeed, that what we call "expertise" is actually structural, and exploitable by algorithms, is precisely the idea driving the mega-search company Google.  We'll get to Google later.

So to conclude these thoughts for now, what's driving the negative conclusions of Lanier-Carr (to put their conclusion memorably:  "the Web is making us stupid") is our participation in an information processing model that is more suited for computers than for people.  As this is becoming our cognitive-social model, of course we're getting stupider, to the extent in fact that computation or information processing is not a complete account of human cognitive-social practices.  This point is why someone like Lanier--a computer scientist at Berkeley--can ask "Can you imagine an Einstein doing any interesting thinking in this [Web] environment?"  He's pointing out, simply, that innovation or true creativity or let's say "deep" things like what Einstein did have little in common with much of what passes for "thinking" on the Web today.  It's not just that lots of people are online and many people aren't Einsteins; it's that lots of people are online and they're all doing something shallow with their heads without even realizing it.  As Carr puts it so well in The Shallows, they're surfing instead of digging into ideas; skimming longish text for "bullet points", jumping from titillating idea to idea without ever engaging anything.  And, echoing Heidegger again, as the Web isn't simply an instrument we're using, but is in fact changing us, the question before us is whether the change is really good, and whether the cognitive-social model we're embracing is really helpful.

All the way back to the beginning of this, then, I want to suggest that far from steering us away from the Web (though this simple idea actually has legs, too, I think), what's really suggestive is how to encourage the knowledge-effort principle in the sorts of technologies we design, implement, and deploy online.  I use Yelp, for instance.  I use it because someone who actually visits a restaurant is a real-world "expert" for purposes of me choosing to spend an hour there.  It all lines up for an online experience, in this case.  They did the work, got the knowledge, and even if they're no Einstein, they're an expert about that place in the physical world (that cafe in San Francisco, with the great Espresso).

And likewise with other successes.  Wikipedia doesn't "work" relative to a traditional encyclopedia like Britannica because the "hive mind" pieced together little bits of mindless factoids about quantum theory, arriving at a decent exposition of Heisenberg's Uncertainty Principle (magic!).  It works because of all those little busy bees online, one of them had actual knowledge of physics (or was journalistic enough to properly translate the knowledge about physics from someone who did).

But again, the problem here is that the Web isn't really set up to capture this--in fact much of the Web implicitly squelches (or hides) real-world categories like knowledge and effort in favor of algorithms and processing.  When Google shows you the top stories for your keywords "health care crisis", you get a virtual editorial page constructed from the Google algorithm.  And when you key in "debt crisis" instead (you're all about crises this morning, turns out), you get another virtual editorial page, with different Web sites.  Everything is shallow and virtual, constructed with computation on the fly, and gone the moment you move to the next.  You're doomed, eventually, to start browsing and scanning and acting like an information processor with no deeper thoughts yourself.  So it's a hard problem to get "effort" and "knowledge" actually built into the technology model of the Web.  It takes a revolution, in other words.  And this starts with search.

Search is the Alpha and Omega




Tuesday, December 17, 2013

Help! The Web is Making Me Stupid (and I like it)

Nicholas Carr wrote a book in 2010 about how the Web threatens (yes "threatens", not "enhances") cognitive capabilities like concentration and learning.  His book, appropriately titled The Shallows, started out as an article that appeared in the Atlantic in 2008, appropriately titled Is Google Making Us Stupid?  In that article--and subsequently and in more depth in The Shallows--Carr suggested that the Web is "chipping away [our] capacity for concentration and contemplation."  [Reader:  "What's this about the Web? Oh no!  Wait, a text.  Who's Facebooking me?  Check out this video!  Wait, what's this about the Web?  Who's making us stupid??? Lol."]  Yes, maybe Carr has a point.

And he's not alone in sounding an increasingly vocal alarm about the potential downside of all this immersion in modern online technology--the Web.  After his provocative Atlantic article, a spate of other books and articles (many of them published, ironically, on the Web) started appearing:  the seminal You Are Not a Gadget in 2010 by computer scientist Jaron Lanier, and missives on the dangers of social networking, like Is Facebook Making Us Lonely? a couple of years later, in 2012 (again in The Atlantic), or the New Yorker's How Facebook Makes Us Unhappy earlier this year.
And the trend continues.  Witness the Atlantic Cities' latest warning shot about the explosion of online digital photographing, How Instagram Alters Your Memory.  Peruse this latest (remember--if only you can--that you won't read it that deeply) and you'll discover that as we're running around capturing ubiquitous snapshots of our lives--from the banal to the, well, less banal--we're offloading our memory and our natural immersion in natural environments to our digital devices.  Study after study indeed confirms a real (and generally negative) link between cognitive functioning and use of Web technologies.  And yet, we're all online, with no end in sight.  What gives?
We can ask the "what gives?" question in a slightly different way, or rather we can break it into a few parts to get a handle on all this (somewhat ironically) surface discussion of the Web and us.  To wit:
(a) Assuming all these articles--and the scientific studies they cite--are on to something, what makes the "Web" translate into a shallow "Human" experience?  What is it about modern digital technology that generates such an impoverished cognitive-social climate for us?
As a corollary to (a), we might ask the slightly self-referential or Escher-like question about why the "Web" just seems, to most of us, like the very opposite:  why does it seem to enhance our "smarts" and our abilities, from doing research based on Web searches to capturing moments with digital photography for Instagram?  Why, in other words, are we in the semi-delusional state of thinking we're increasing our powers overall, when science tells us that the situation is much different?  While we seem to gain access to information and "reach" with Web use, we appear to be losing "richness"--capacities that are traditionally associated with deep thinking and learning.  (Capacities, in other words, that we would seem to require, more so today than perhaps ever.)
(b) Swallowing the hard facts from (a), what are we to do about it?  At least two scenarios come to mind:  (1) "Do" less technology.  Go Amish, in other words.  Or failing that, read an actual book from time to time.  Couldn't hurt, right?
(2) Change technology or our relationship to technology itself.  This is an intriguing possibility, for a number of reasons.  One, as no less than the philosopher Heidegger once commented (in typical quasi-cryptic fashion), viewing any technology as merely instrumental is the paragon of naivete.  We make technology, then it goes about re-making us, as [] once remarked.  The words are more true today than ever.  And so, if we're stuck with technology, and it's true that the effects of technology on us are ineliminable (there is no true instrumentalism), then it follows that our salvation as it were must lie in some changes to technology itself.  This scenario might range from tinkering to revolution; it all depends on our innovativeness, our sense of a real and felt need for change, and of course our ability to concentrate on the problem long enough to propose and implement some solutions (please, Google, don't make us stupid so quickly that we can't solve the problem of Google making us stupid...).

In what follows, then, I'm going to take a look at (a) in a bit more detail.  The aim here will be to convince the reader beyond any reasonable doubt that there really is a problem, and that we're headed in the wrong direction, appearances to the contrary (perhaps).  And secondly, I'll be arguing that there's something like a creative and forward-looking (if only partial) solution to (b); namely, that once we understand the cognitive-social model we're implicitly adopting when (over)using the Web, we can re-design parts of the Web itself in ways that help mitigate or even reverse the damage we're doing, and in the process (and with a little serendipity) we might also help accelerate or usher in a tech revolution.  It's exciting stuff, in other words, so I hope we can all concentrate long enough to... (apologies, apologies).

On (a) - What's up with that?