Patrice Riemens on Tue, 7 Apr 2009 22:12:49 +0200 (CEST)
<nettime> Ippolita Collective: The Dark Side of Google, Chapter 6 (part 1)
NB this book and translation are published under Creative Commons license 2.0 (Attribution, Non Commercial, Share Alike). Commercial distribution requires the authorisation of the copyright holders: Ippolita Collective and Feltrinelli Editore, Milano (.it)

Ippolita Collective

The Dark Side of Google (continued)

Chapter 6. Quality, Quantity, Relation (part 1)

The Rise of Information

The information society is heterogeneous in the extreme: it uses network communication systems such as telephony, digitised versions of broadcast [*N1] and other pre-Web traditional media, like dailies, radio or television, and Internet-born ones like e-mail or P2P exchange platforms, all with gay abandon and without a second thought. A closer look, however, reveals that all these systems rest on one single resource: information. Within the specific domain of search engines, and thus of information retrieval, one can say that what constitutes information is the sum total of all extant web pages [*N2].

The quantitative and qualitative growth of these pages and of their content has been inordinate and continues to be so, simply because it has become so unbelievably easy to put content up on the Web. But contents are not isolated islands: they take shape within a multiplicity of relationships and links that bind together web pages, websites, issues, documents, and finally the contents themselves.

Direct and unmediated access to this mass of information is well-nigh impossible, even as a thought experiment: it would amount to browsing the Web by hand. This is why search engines exist: to filter the Web's complexity and to serve as an interface between the information and ourselves, by giving us search results we are happy with. In the preceding chapters we reviewed the principal working tools of a search engine, that is, the instruments Google, and other search companies, have put in place to scan web pages, to analyse and order them with the help of ranking algorithms, to archive them on appropriate hardware supports, and finally to return results to users according to their search queries (a minimal sketch of this pipeline follows below).

The quantity of web pages stored in memory is thus crucial for estimating the technical and economic potency of a search engine. The larger its 'capital' of searchable web pages, the higher a search engine will score on the reliability and completeness of its returns, though obviously only within the limits of the specified context. Yet however enormous a search engine's 'page capital' may be, it never will, and never could, be entirely complete and exhaustive, and no amount of time, money or technology invested in it can change that. It is absurd to think it possible to know, or, more down to earth, simply to copy and catalogue, the whole of the Internet: it would be like claiming to know the totality of the living world, including its constant mutations.

The information storage devices used by search engines like Google are like vessels: imagine we had to fill an enormous vessel with diminutive droplets (think of all the pages that constitute the Web's information). Assuming our vessel could contain them all, our task would be to capture and identify them all, one by one, in a systematic and repetitive manner.
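To make the pipeline just described a little more concrete, here is a minimal, self-contained sketch in Python: a handful of 'scanned' pages, an inverted index built from them, and a crude ordering of candidate pages by how many query terms they contain. The pages, the scoring rule and all names are illustrative assumptions, not a description of Google's actual crawler, index or ranking algorithms.

    from collections import defaultdict

    # 1. "Scanned" pages: a stand-in for whatever a crawler has harvested.
    pages = {
        "http://example.org/a": "the information society and its network communication systems",
        "http://example.org/b": "search engines filter information and serve as an interface to the web",
        "http://example.org/c": "dailies radio and television as traditional media",
    }

    # 2. Analyse and archive: an inverted index mapping each term
    #    to the pages that contain it.
    index = defaultdict(set)
    for url, text in pages.items():
        for term in text.lower().split():
            index[term].add(url)

    # 3. Order and return: score candidate pages by the number of query
    #    terms they contain (a crude stand-in for a ranking algorithm).
    def search(query):
        scores = defaultdict(int)
        for term in query.lower().split():
            for url in index.get(term, ()):
                scores[url] += 1
        return sorted(scores.items(), key=lambda item: item[1], reverse=True)

    for url, score in search("information search engines"):
        print(score, url)

Running the toy query returns page b before page a, simply because it matches more of the query terms; everything interesting about a real engine lies in replacing that last step with something far more elaborate.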
But if, on the other hand, we think there are more droplets than our vessel can contain, or that we cannot devise an algorithm to capture them all, or that the capture may be possible but slow, or even that the whole task may be hopelessly... endless, then we need to switch tactics. Especially as our data-droplets change with time: pages get modified, and resources jump from one address to another... At this stage we might decide to go only for the larger droplets, or to concentrate our efforts on those places where most droplets fall, or we could choose to collect only those droplets that interest us most, and then try to link them together in the way we think most relevant.

While search engine companies continue to chase the holy grail of cataloguing 'everything' on the Net, it might be better to take a more localised approach to the Web, or to accept that for any given 'search intention' there may well be many possible answers, and that among all these answers some may be 'better' because they conform to specific demands regarding speed or completeness. One should always keep in mind that the quality of results depends upon our subjective perception when it comes to being satisfied with a search return. And in order to accept or reject a search return, it is essential to apply our critical faculties and to be conscious of the subjectivity of our own viewpoint. To establish the trajectory one is really interested in, it is necessary to assume the existence of a closed and delimited network, a kind of world bounded only by our own personal requirements, while always bearing in mind that this is a subjective localisation, neither absolute nor constant in time.

From an analytical point of view, charting a network means being able to partition it, for examination, into sub-networks, which amounts to creating little localised and temporary worlds (Localised Closed Worlds, LCWs), each containing at least one answer to the search that has been launched (a minimal sketch of such a bounded sub-network follows below). Without that, many searches would go on with no end in sight, especially since the amount of data to be analysed goes well beyond the ability of any human being to capture it all: the undertaking would be a non-starter. Conversely, altering and specifying the query, and refining one's vantage point, will generate a trajectory more concordant with the point of departure of the search.

By looking at the Web as a closed and localised world, we also accept that the very dynamic of birth, growth and networked distribution of information (which may continue even after that information has become invalid) is an 'emergence' phenomenon, which is neither fortuitous nor without a cause. Emergence [*N3] is an occurrence which can be described in mathematical terms as an unexpected and unpredictable outburst of complexity. But it is foremost an event that generates situations which cannot be exhaustively described. To analyse and navigate an 'emerging universe' like the Web demands a permanent repositioning of oneself. This not only determines a 'closed and localised world' of abilities and expectations, but also an opening up towards new avenues of exploration (other worlds are always possible, outside one's own closed one), and thus the appreciation that results can only ever be fragmented and incomplete.
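The idea of a Localised Closed World can itself be sketched in a few lines of code: rather than charting the whole network, one carves out a small, bounded sub-network around a starting point, limited both in depth and in size. The link graph and the two limits below are purely hypothetical and serve only to illustrate the principle.

    from collections import deque

    links = {   # hypothetical "page -> pages it links to" graph
        "A": ["B", "C"],
        "B": ["C", "D"],
        "C": ["A", "E"],
        "D": ["F"],
        "E": [],
        "F": ["A"],
    }

    def localised_closed_world(start, max_depth=2, max_pages=4):
        """Collect a bounded neighbourhood of `start` by breadth-first search."""
        world = {start}
        queue = deque([(start, 0)])
        while queue and len(world) < max_pages:
            page, depth = queue.popleft()
            if depth == max_depth:
                continue
            for neighbour in links.get(page, []):
                if neighbour not in world:
                    world.add(neighbour)
                    queue.append((neighbour, depth + 1))
                    if len(world) >= max_pages:
                        break
        return world

    print(localised_closed_world("A"))   # e.g. {'A', 'B', 'C', 'D'}

The point is not the particular traversal, but the acceptance it encodes: the 'world' returned is deliberately partial, bounded by limits we chose ourselves, and a different starting point or different limits would yield a different world.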
Quantity and quality

Indexing by way of page accumulation is a quantitative phenomenon which does not in itself determine the quality of information on the Web: there, the prime objective is to capture all pages, not to make a selection. The relationships between pages give rise to emergence because they are generated on the basis of a simple criterion, the links existing between them. The quality of information thus springs from the typology of these relations, and is determined by their ability to trace trajectories, without any need to capture 'all' the information available. Quality therefore depends mostly on making a vantage point explicit through a particular search trajectory: basically, it is the surfers, the pirates, the users of the Web who determine, and also increase, the quality of information by establishing links between pages (a minimal sketch of how ranking can emerge from links alone follows at the end of this text). The power of accumulation of Google's algorithms is useful for this, but insufficient in itself.

The evaluation of the pages' content has been outsourced to algorithms, or rather to the companies controlling them. The whole Google phenomenon rests on our habit of trusting an entity of apparently unlimited power, able to offer us the chance of finding 'something' interesting and useful within its own 'capital' of resources, which is itself peddled as 'the whole Web'. The limits of this allegedly miraculous offer, however, remain hidden: not a word about what was not in that 'capital', or was there only in part, and especially not about what has been excised from it.

The thorny ethical and political problem attendant on the management and control of information still refuses to go away: who is there to guarantee the trustworthiness of an enterprise whose prime motive is profit, however 'good' it may be? Even though considerable economic resources and an outstanding technological infrastructure are put to the task of constantly improving the storage and retrieval of data, the political question posed by the accumulation of data in the hands of one single actor cannot and should not be sidestepped. Google represents an unheard-of concentration of private data, a source of immense power, yet devoid of any transparency. It is obvious that no privacy law can address and remedy this situation, still less the creation of ad hoc national or international bodies for the control of personal and sensitive data. The answer to the issue of the confidentiality of data can only lie in greater awareness and responsibility on the part of the individuals who create the Web as it is, through a process of self-information. Even if this is no easy road, it is the only one likely to be worth pursuing in the end.

(to be continued)

--------------------------

Translated by Patrice Riemens

This translation project is supported and facilitated by:
The Center for Internet and Society, Bangalore (http://cis-india.org)
The Tactical Technology Collective, Bangalore Office (http://www.tacticaltech.org)
Visthar, Dodda Gubbi post, Kothanyur-Bangalore (till March 31st, 2009) (http://www.visthar.org)
The Meyberg-Acosta Household, Pune (from April 2, 2009)
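As an illustration of the argument in 'Quantity and quality' above, that a measure of 'quality' can emerge from nothing but the links users establish between pages, here is a minimal power-iteration computation in the spirit of PageRank. The toy graph, the damping factor and the number of iterations are illustrative assumptions; a real ranking system is vastly more elaborate.

    links = {            # hypothetical "page -> pages it links to" graph
        "A": ["B", "C"],
        "B": ["C"],
        "C": ["A"],
        "D": ["C"],
    }

    def rank(links, damping=0.85, iterations=50):
        """Iteratively redistribute each page's score along its outgoing links."""
        pages = list(links)
        n = len(pages)
        scores = {p: 1.0 / n for p in pages}
        for _ in range(iterations):
            new = {p: (1.0 - damping) / n for p in pages}
            for page, outgoing in links.items():
                share = scores[page] / len(outgoing) if outgoing else 0.0
                for target in outgoing:
                    new[target] += damping * share
            scores = new
        return scores

    for page, score in sorted(rank(links).items(), key=lambda kv: -kv[1]):
        print(page, round(score, 3))

No page is inspected for its content: page C ends up on top simply because the other pages point to it, while D, which nobody links to, sinks to the bottom. The ordering is entirely a product of the link structure that users themselves have woven.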