Knowledge Organisation Strategy

One way to classify approaches to KO was suggested by Broughton, Hansson, Hjørland and López-Huertas (2005):

 

  1. The traditional approach to KO expressed by classification systems used in libraries and databases, including DDC, LCC and UDC (going back to about 1876).
  2. The facet-analytical approach founded by Ranganathan about 1933 and further developed by the British Classification Research Group
  3. The information retrieval tradition (IR) founded in the 1950s.
  4. User oriented / cognitive views gaining influence from the 1970s
  5. Bibliometric approaches following Garfield’s construction of the Science Citation Index in 1963
  6. The domain analytic approach (first formulated about 1994)
  7. Other approaches (among recent suggestions are semiotic approaches, “critical-hermeneutical” approaches, discourse-analytic approaches and genre-based approaches. An important trend is also an emphasis on document representations, document typology and description, markup languages, document architectures, etc.)

 

Each of the first six approaches (but not the seventh, “other approaches”) will be presented and discussed below.

 

 

The traditional approach

It is difficult to define “the traditional approach” because there is no unified theory that corresponds to this concept. If we disregard the other approaches to be introduced, what exists is mostly a variety of different practices and some scattered suggestions on how to organize knowledge. Even a single system such as the Dewey Decimal Classification (DDC) has used quite different principles in various editions (cf., Miksa, 1998). The classification researcher Vanda Broughton (2004, p. 143) wrote about one of the old established systems: “It is quite hard to discern any strong theoretical principles underlying LCC [Library of Congress Classification]”. Also some formulations by S. R. Ranganathan (e.g., 1951) suggest that “traditional” systems seem to lack a theoretical foundation (in his eyes, as opposed to his own approach).

 

Among the major figures in the history of KO who can be classified as “traditional” are Melvil Dewey (1851-1931) and Henry Bliss (1870-1955). Eugene Garfield wrote about Bliss: “His goals and aspirations were different from those of Melvil Dewey, whom he certainly surpassed in intellectual ability, but by whom he was dwarfed in organizational ability and drive. Dewey was a businessman, but he was in no sense as profound in his accomplishments.” (Garfield 1975, 252). This difference in the character of the two men is reflected in their approaches to knowledge organization, as also shown in Miksa’s (1998, pp. 42-45) presentation of the business perspective of Melvil Dewey. Dewey’s business approach is hardly an intellectual approach on which the field can find a theoretical foundation for KO understood as an academic discipline. His interest was not to find an optimal system to support users of libraries, but rather to find an efficient way to manage library collections. He was interested in developing a system that could be used in many libraries: a standardized way to manage library collections.

DDC should thus be seen as the dream of the library administrator rather than the dream of the library user. It is not designed for any specific collection and must be seen as a compromise between different collections and corresponding scholarly interests. In order to minimize the workload in libraries, the system is conservative in the sense that it often prefers to avoid changing its structure. In other words: internal consistency across editions has often taken priority over updating the system to bring it more in accordance with the surrounding society. The user does not get a detailed, realistic view of the relations between disciplines and fields of knowledge, but the library administrator gets a system in which most of the books are already classified by other libraries or agencies and which is used for both shelf arrangement and catalog searching. The library administrator may hire people from library schools who know the system and may apply this knowledge in all the libraries using DDC. The system thus also supports professional interests. It probably represents a rationalization of library work more than anything else. Its main quality may be that it represents a standard, not a system optimized for browsing or retrieval for any particular interest. It should be added that what is today called Library and Information Science, LIS, was termed library economy in 1876, when the system was first published, which is also an indication of the administrative rather than the academic goals of the system. This may also explain why systems designed on the basis of more modern principles have not succeeded in influencing practice in libraries.

 

Among the critics of the DDC is Bernd Frohmann, who wrote:

 


 

“Dewey emphasized more than once that his system maps no structure beyond its own; there is neither a “transcendental deduction” of its categories nor any reference to Cutter’s objective structure of social consensus. It is content-free: Dewey disdained any philosophical excogitation of the meaning of his class symbols, leaving the job of finding verbal equivalents to others. His innovation and the essence of the system lay in the notation. The DDC is a purely semiotic system of expanding nests of ten digits, lacking any referent beyond itself.

….
The conflict of interpretations over “subjects” became explicit in the battles between “bibliography” (an approach to subjects having much in common with Cutter’s) and Dewey’s “close classification”. William Fletcher spoke for the scholarly bibliographer…. Fletcher’s “subjects”, like Cutter’s, referred to the categories of a fantasized, stable social order, whereas Dewey’s subjects were elements of a semiological system of standardized, techno-bureaucratic administrative software for the library in its corporate, rather than high culture, incarnation”. (Frohmann 1994, 112-113).

 

 

The quote from Frohmann shows that there was criticism of the DDC as empty and rather non-academic already when Melvil Dewey published his system. Dewey’s attitude may have influenced library philosophy and practice. LIS professionals may have seen their work more as a syntactical activity than as an activity involving interpretation and analysis of meaning.

 

In order to identify an approach to KO which may deserve the label “the traditional approach”, we shall turn to other scholars, including Henry Bliss. An important characteristic of his thinking (and that of many contemporary KO thinkers) was the view that the sciences tend to reflect the order of nature and that library classification should reflect the order of knowledge as uncovered by science:

 

 

Natural order → Scientific classification → Library classification (KO)

 

 

The implication is that librarians, in order to classify books, should know about scientific developments. This should also be reflected in their education:

 

“Again from the standpoint of the higher education of librarians, the teaching of systems of classification . . . would be perhaps better conducted by including courses in the systematic encyclopedia and methodology of all the sciences, that is to say, outlines which try to summarize the most recent results in the relation to one another in which they are now studied together. . . .” (Ernest Cushing Richardson, quoted from Bliss, 1935, p. 2).

 

This important principle has been implicit in the management of research libraries and bibliographic databases such as MEDLINE, in which subject specialists are often hired to do the work of KO. The importance of subject knowledge has not been explicit in the following approaches to KO, except in domain analysis (and, outside LIS, in certain computer science approaches).

 

Among the other principles that may be attributed to the traditional approach to KO are:

 

  • Principle of controlled vocabulary
  • Cutter’s rule about specificity
  • Hulme’s principle of literary warrant (1911)
  • Principle of organizing from the general to the specific

 

The principle of controlled vocabulary is essentially a way of avoiding synonyms and homonyms as indexing terms by using a standardized vocabulary. Cutter’s rule states that it is always the most specific, most appropriate expressions that should be looked up in the vocabulary of notations and assigned to documents. In this way the expressions for the topics to be made retrievable are rendered most predictable. The term “literary warrant”, as well as the basic principle underlying this expression, was introduced by E. Wyndham Hulme (1911, p. 447). Hulme discusses whether, for example, the periodic system of chemistry should be used for book classification. He writes (pp. 46-47):

 

“In Inorganic Chemistry what has philosophy to offer? [Philosophy here meaning science, which produced the periodic system]. Merely a classification by the names of the elements for which practically no literature in book form exists. No monograph, for instance, has yet been published on the Chemistry of Iron or Gold.

. . .

Hence we must turn to our second alternative which bases definition upon a purely literary warrant. According to this principle definition is merely the result of an accurate survey and measurement of classes in literature. A class heading is warranted only when a literature in book form has been shown to exist, and the test of the validity of a heading is the degree of accuracy with which it describes the area of subject matter common to the class. Definition [of classes or subject headings], therefore, may be described as the plotting of areas pre-existing in literature. To this literary warrant a quantitative value can be assigned so soon as the bibliography of a subject has been definitely compiled. The real classifier of literature is the book-wright, the so-called book classifier is merely the recorder.” (Hulme 1911, pp. 46-47).

 

The principle of ordering from general subjects to specific subjects is generally acknowledged and may be related to an essentialist way of understanding.
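The first two principles above, controlled vocabulary and Cutter’s rule of specificity, can be illustrated with a small sketch. The synonym ring, the hierarchy and all terms below are hypothetical examples, not taken from any actual controlled vocabulary:

```python
# A minimal sketch of a controlled vocabulary (all terms are hypothetical).
# Synonym control maps variant expressions to one preferred (authorized)
# term; Cutter's rule says the most specific applicable term is assigned.

# Synonym control: variant expressions -> one preferred indexing term
SYNONYM_RING = {
    "heart attack": "myocardial infarction",
    "cardiac infarction": "myocardial infarction",
    "myocardial infarction": "myocardial infarction",
}

# A toy hierarchy ordered from general to specific (narrower -> broader)
BROADER = {
    "myocardial infarction": "heart diseases",
    "heart diseases": "cardiovascular diseases",
}

def authorized_term(expression: str) -> str:
    """Map a free-text expression to its controlled, preferred term."""
    return SYNONYM_RING.get(expression.lower(), expression.lower())

def index_document(topic: str) -> str:
    """Assign the most specific controlled term (Cutter's rule).
    The broader terms exist in the hierarchy but are NOT assigned:
    specificity demands the narrowest applicable heading."""
    return authorized_term(topic)

print(index_document("Heart attack"))            # -> myocardial infarction
print(BROADER[index_document("Heart attack")])   # -> heart diseases
```

The point of the sketch is only that synonym control and specificity are mechanical once the vocabulary exists; the intellectual work lies in building the vocabulary and its relations.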

 

Today, after more than 100 years of research and development in LIS, the “traditional” approach still has a strong position in KO and in many ways its principles still dominate.

 

The traditional approach, however, shows signs of a certain vagueness in its theoretical and methodological basis. Is it subject knowledge rather than competency in KO that marks the construction and administration of knowledge organizing systems? Often it seems to be assumed that the organization of knowledge is just a matter of “reading” the correct relations between concepts. There is not much indication of how this is done. Although debates about the philosophy of science, e.g. in relation to positivism, were not unknown among the founding fathers of knowledge organization, they were not particularly clear on this point, and the same is the case with the ordinary practice of KO. It is with the development of the domain-analytic approach that the question of the subjectivity and objectivity of KO is first built systematically into the methodological foundation of KO.

 

 

The facet-analytical approach

The date of the foundation of this approach may be chosen, for example, as the publication of S. R. Ranganathan’s Colon Classification in 1933. The approach has been further developed by, in particular, the British Classification Research Group. In many ways this approach has dominated what might be termed “modern classification theory.” The Bliss Bibliographic Classification, second edition (BC2), is probably today the theoretically most advanced system based on this theory (and has also contributed to the further development of this approach).

 

The best way to explain this approach is probably to explain its analytico-synthetic methodology. The meaning of the term “analysis” is: Breaking down each subject into its basic concepts. The meaning of the term synthesis is: Combining the relevant units and concepts to describe the subject matter of the information package in hand.

 

Given subjects (as they appear in, for example, book titles) are first analyzed into a few common categories, which are termed “facets”. Ranganathan proposed his PMEST formula: Personality, Matter, Energy, Space and Time:

 

  • Personality is the distinguishing characteristic of a subject.
  • Matter is the physical material of which a subject may be composed.
  • Energy is any action that occurs with respect to the subject.
  • Space is the geographic component of the location of a subject.
  • Time is the period associated with a subject.

 

The British Classification Research Group (CRG) expanded this list, but here we shall only consider the original one. The first assumption is that all subjects can be analyzed in a way that fits into these five categories. These categories are established before the books have been written and have arrived in the library. In other words, they are neither dynamically developed nor empirically given: they are logical, a priori categories. Each category (facet) has in principle its own classification or list of symbols. A given document is classified by taking one or more symbols from the appropriate facets and combining them according to certain rules. This combination is called notational synthesis. The idea is that the same building blocks can be used for all purposes. The underlying philosophical assumption is that elements do not change their meaning in different contexts. This assumption has never, as far as I know, been discussed in the literature. According to modern theories of meaning it is a rather problematic assumption.
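The analytico-synthetic procedure just described can be sketched in code. The facet tables and joining punctuation below are simplified illustrations loosely modeled on the Colon Classification; the real system uses far richer rules, so treat every symbol here as an assumption:

```python
# A toy sketch of notational synthesis in a faceted (PMEST) scheme.
# Facet tables and punctuation are simplified illustrations, not the
# actual Colon Classification schedules.

FACETS = {
    "Personality": {"libraries": "2", "medicine": "L"},
    "Matter": {"books": "1"},
    "Energy": {"cataloguing": "55", "treatment": "4"},
    "Space": {"India": "44", "UK": "56"},
    "Time": {"1950s": "N5"},
}

# Facet citation order and the connector symbol preceding each facet
ORDER = [("Personality", ""), ("Matter", ";"), ("Energy", ":"),
         ("Space", "."), ("Time", "'")]

def synthesize(subject: dict) -> str:
    """Combine symbols from each facet, in citation order, into one
    class number: the 'synthesis' step of the analytico-synthetic method."""
    notation = ""
    for facet, connector in ORDER:
        if facet in subject:
            notation += connector + FACETS[facet][subject[facet]]
    return notation

# Analysis step: "cataloguing of books in libraries in India in the 1950s"
analysis = {"Personality": "libraries", "Matter": "books",
            "Energy": "cataloguing", "Space": "India", "Time": "1950s"}
print(synthesize(analysis))  # -> 2;1:55.44'N5
```

Note how the sketch embodies the assumption criticized in the text: each symbol is taken to carry the same meaning regardless of which other symbols it is combined with.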

 

Ranganathan has had many followers in LIS. It has, however, been extremely difficult to trace critical examinations of this approach. Only very few researchers have had the broader knowledge that enabled them to consider this approach in relation to fields like philosophy and linguistics. Among the few who have done so is Moss (1964), who found that Ranganathan based his system of five categories on that of Aristotle without recognizing this. Another critical voice is Francis Miksa, who, for example, wrote:

 

“In the end, there is strong indication that Ranganathan’s use of faceted structure of subjects may well have represented his need to find more order and regularity, in the realm of subjects, than actually exist” (Miksa 1998, p. 73).

 

“Ranganathan vigorously pursued the goal of finding one best subject classification system” (Miksa 1998, p. 73).

 

Hjørland (2007b, 382-384) related the basic philosophy of facet analysis to the philosophy of semantic primitives and thus to a broader theory of semantics. According to his analysis, semantic elements are not direct attributes of language, but are related to models of reality, which are then expressed in language. Chemical compounds may, for example, be expressed in chemical formulae by chemical elements. Chemical elements are discovered and named by chemists; they are not given elements in natural languages. The names of the chemical elements are in this case the semantic primitives. Semantic relations, including the relation between elements and composed expressions, are thus connected to theories of reality.

 

S. R. Ranganathan wrote in his ‘Philosophy of Library Classification’ (1951):

 

“An enumerative scheme with a superficial foundation can be suitable and even economical for a closed system of knowledge. . . . What distinguishes the universe of current knowledge is that it is a dynamical continuum. It is ever growing; new branches may stem from any of its infinity of points at any time; they are unknowable at present. They can not therefore be enumerated here and now; nor can they be anticipated, their filiations can be determined only after they appear.” (Ranganathan 1951).

 

Ranganathan thus expresses the views:

  1. That enumerative systems have a superficial foundation
  2. That the discovery of new knowledge cannot be anticipated in an enumerative system
  3. That the discovery of new knowledge can be anticipated in a faceted system (based on the view that new knowledge is formed by combination of a priori existing categories)

 

These views reveal some basic assumptions in the facet-analytic approach. The difference between the theoretical foundations of enumerative systems compared to faceted systems is not that the former have a superficial foundation while the latter have a profound foundation. The basic questions in knowledge organization are shared by both approaches: How terms are selected and defined and their semantic relations established. This is not a purely logical matter, but largely an empirical question. While it is correct that it may be easier to combine existing elements to form new classes and thus easier to place new subjects in faceted systems, it is of course impossible for any system to anticipate the discovery of new knowledge. The belief that this should be possible reveals that part of the philosophy of facet analysis is without contact with the real world.

 

La Barre (2006) found that faceted techniques are increasingly being used in the design of web pages. A specific format, XFML, a simple XML format for exchanging metadata in the form of faceted hierarchies, has been developed (Van Dijck 2003). The technique is thus very much alive and in use.

 

 

The information retrieval tradition (IR)

Information retrieval (IR) and knowledge organization (KO) are normally considered two different – although strongly related – subfields within Library and Information Science (LIS), related to search labor and description labor, respectively (cf., Warner 2002). They are, however, trying to solve the same kind of problem: enabling users to find relevant information. For this reason we have to consider them competing approaches, and thus try to evaluate their relative strengths and weaknesses. The question then becomes: How can IR be characterized as an approach relative to the other approaches discussed?

 

One way to do this has been to make a distinction between the “physical paradigm” (or “system-driven paradigm”) on one side and the “user-oriented” or “cognitive paradigm” on the other. The IR tradition has been understood as “systems driven,” as if the system itself decides what to present to the users.

 

“In the conventional system-oriented view, a “perfect” system is defined as one that finds the best match between a user’s stated request and documents from a collection. This view has proven to be very limiting. It has led many researchers to focus only on how to improve various aspects of document representations and the matching algorithms. As a result, the system-oriented approach to IR tends to disregard users’ cognitive behaviors as well as the problem-solving context in which an IR process is being carried out. It has become evident that to succeed, IR researchers need to look beyond machine algorithms.” (Gruzd 2007, 758).

 

This distinction between “the system-oriented view” and “the user-oriented view” may, however, represent a misinterpretation. The difference between the Cranfield experiments and user-oriented views is first and foremost that the Cranfield experiments are based on expert evaluations of recall and precision, while the user-oriented views are based on users’ evaluations. It is never the technology that decides what is relevant. The technology is just constructed on the basis of some views of what is relevant and how this can be measured. Neither the system-oriented view nor the user-oriented view has considered the epistemological problem: How are answers to queries related to different theories or views?

 

Important in the IR-tradition have been, among others, the Cranfield experiments, which were founded in the 1950s, and the TREC experiments (Text REtrieval Conferences) starting in 1992. It was the Cranfield experiments that introduced the famous measures “recall” and “precision” as evaluation criteria for system efficiency. The Cranfield experiments found that classification systems like UDC and facet-analytic systems were less efficient compared to free-text searches or low-level indexing systems (“UNITERM”). According to Ellis (1996, 3-6), the Cranfield I test found the following results:

 

UNITERM                          82.0% recall

Alphabetical subject headings    81.5% recall

UDC                              75.6% recall

Facet classification scheme      73.8% recall

 

Although these results have been criticized and questioned, the IR-tradition became much more influential while library classification research lost influence. The dominant trend has been to regard only statistical averages. What has largely been neglected is to ask: Are there certain kinds of questions in relation to which other kinds of representation, for example, controlled vocabularies, may improve recall and precision?

 

Julian Warner has characterized the dominant IR-tradition with the phrase “query transformation,” meaning that systems automatically transform a query into a set of relevant references. He contrasts this principle with what he terms “selection power,” a principle that, according to him, has been valued in traditional library work (cf., Warner 2002).

 

Although thesauri were developed in the IR-tradition, this is the exception that confirms the rule: the IR approach may be characterized as generally sceptical of all forms of human interpretation, indexing and classification. Its focus has clearly been on free-text retrieval: the assumption that texts contain all the information needed to retrieve them. Recently Karen Sparck Jones (2005) wrote that traditional (pre-)classification is probably obsolete and may be replaced by new promising techniques such as relevance feedback. If Sparck Jones’ view is typical of the IR approach, then a criticism of this view may provide the basis of an alternative to the IR approach. In fact, two basic criticisms of relevance feedback can be summarized:

 

  • Relevance feedback is based on certain premises about users’ knowledge that are largely unexplored and may turn out to be highly unrealistic: If users do not have the necessary knowledge to classify a domain, they cannot distinguish relevant and non-relevant documents and are thus unable to provide useful feedback.

 

  • Relevance feedback represents unspecified and unclear semantic relations between documents considered relevant. Why prefer a kind of system implying unspecified relations rather than specified and user-controlled relations?
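Relevance feedback, as discussed above, is commonly sketched in its classic Rocchio formulation: the query vector is moved toward the centroid of documents judged relevant and away from those judged non-relevant. The term weights and parameter values below are illustrative assumptions, not a specific system's settings:

```python
# A minimal Rocchio relevance-feedback sketch (vector-space model).
#   q' = alpha*q + beta*mean(relevant) - gamma*mean(nonrelevant)
# Term weights and parameter values are illustrative assumptions.

def centroid(vectors):
    """Mean vector of a list of equal-length term-weight vectors."""
    n = len(vectors)
    dims = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dims)]

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Reformulate the query using the user's relevance judgements.
    If users misjudge relevance (the first criticism above), the query
    simply drifts toward the wrong region of term space."""
    rel_c = centroid(relevant)
    nrel_c = centroid(nonrelevant)
    new_q = [alpha * q + beta * r - gamma * s
             for q, r, s in zip(query, rel_c, nrel_c)]
    # Negative term weights are conventionally clipped to zero
    return [max(0.0, w) for w in new_q]

# Toy 3-term vocabulary; one relevant and one non-relevant document
query = [1.0, 0.0, 0.0]
relevant = [[0.0, 1.0, 0.0]]
nonrelevant = [[0.0, 0.0, 1.0]]
print(rocchio(query, relevant, nonrelevant))  # -> [1.0, 0.75, 0.0]
```

The sketch also makes the second criticism concrete: the reformulated query encodes only an averaged direction in term space, with no explicit, inspectable semantic relation between the documents judged relevant.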

 

In conclusion: the IR-tradition has generally been based on positivist assumptions: that optimal retrieval can be determined by retrieval tests without considering different views or “paradigms” and without considering text corpora as a merging of different views, each attaching different meanings to terms. In other words, it has mainly been based on statistical averages, and has neglected to investigate how different kinds of representation and algorithms may serve different views and interests.

 
