for Albert H. Kritzer CISG Database

of the Institute of International Commercial Law

at Pace University Law School



A Uniform International Sales Law Terminology
Vikki M. Rogers  *  and Albert H. Kritzer  **    1 

III. A New Frontier: Organization and Dissemination of Materials on International Sales Law

1. Methodologies Currently Established for Retrieval of International Sales Law
2. Problems with Current Information Retrieval Systems for International Sales Law
3. Uniform International Sales Law Indexing Language
4. Relationships Established Between the Descriptors
A. Does Free-Text Searching or Boolean Logic Eliminate the Need for an International Sales Law Indexing Language?

"Since the mere possession of writings does not give knowledge, how are we to extract from this almost incomprehensibly large collection of written records the knowledge that we need?"  79  [page 235]

Enormous strides have been taken to revolutionize the speed and manner in which information on international sales law is disseminated. The University of Freiburg's CISG online website,  80  directed by Professor Schlechtriem, was among the first, and remains a leading site for reporting hundreds of CISG cases. The cisgw3 web site of the Institute of International Commercial Law of the Pace University School of Law,  81  in addition to case presentations, offers researchers a bibliography on the CISG, the Principles of European Contract Law (PECL) and UNIDROIT Principles that exceeds 5,000 entries.

Uniquely, a large percentage of the resources that have been established - UNCITRAL, UNILEX, Pace, the Members of the Autonomous Network of CISG Websites - have offered their information on the Internet free of charge. This has enabled people from all geographical backgrounds to use these sources, a significant step toward the development of the global jurisconsultorium that is necessary to foster the uniform application of the law.

The challenge is that, in international sales law, more attention has been paid to the amount of information disseminated than to the manner in which it is presented. A predictable retrieval system is the next step needed to build solidly upon what the international community has already created. There is an incorrect presumption that researchers know which legal concepts they need information about and how to obtain that information. If the West approach had simply been to provide a comprehensive system of reporting without systematically classifying the information obtained, it is possible that [page 236] the ability to rely properly on precedent would have been strained by conflicts arising across jurisdictions from the inability to obtain necessary legal materials.  82  Honnold has recognized that the ability to retrieve information is just as important as the amount of information that is available:

"The development of a homogeneous body of law under the [Sales] Convention depends on the channels for the collection and sharing of judicial decisions and bibliographic material so that experience in each country can be evaluated and followed or rejected in other jurisdictions."  83 

1. Methodologies Currently Established for Retrieval of International Sales Law

There are three systems currently in place for the retrieval of materials on international sales law - the print index, computer-based Boolean searching and a system of organization based on substantive legal content. Each of these systems is analyzed in turn to determine whether any of them can provide the structure necessary for a framework within which to conceptualize international sales law.

(1) The traditional print index " . . . allows a researcher to locate relevant information in a collection of documents. The most common type lists subjects alphabetically, followed by a reference allowing the researcher to locate the information in the document collection."  84  Most books on international sales law, e.g., Honnold's Uniform Law for International Sales  85  and Schlechtriem's Kommentar zum Einheitlichen UN-Kaufrecht (CISG),  86  include print indexes. The problem with this retrieval system is that, at present, indexes are usually generated from a subjective list of terminology, so the concepts of a single uniform law are represented by different terms in different indexes.  87  Accordingly, information could be lost because the user does not have the benefit of a standardized terminology for categorizing the information. The comprehensiveness and style of these indexes also vary dramatically. In some indexes, broad category headings are used to encompass a multitude of [page 237] international and domestic law concepts, while other indexes are so detailed that information might be missed unless the user is aware of the nuances in the legal concepts.  88 
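The inconsistency can be made concrete with a minimal Python sketch, assuming a print index can be modeled as a mapping from headings to page references. The two indexes, their headings and their page numbers are hypothetical and are not drawn from Honnold's or Schlechtriem's books; the sketch only shows that an exact-heading lookup succeeds or fails depending on the heading the indexer happened to choose.

```python
# Two hypothetical print indexes: heading -> page references.
# Headings and page numbers are invented for illustration.
index_a = {
    "damages": [301],
    "avoidance of contract": [212, 245],
}
index_b = {
    "damages": [190],
    "rescission": [88, 140],
}


def render(index):
    """Lay the index out as a printed one would be: headings in alphabetical order."""
    return [f"{heading}, " + ", ".join(str(p) for p in index[heading])
            for heading in sorted(index)]


def look_up(index, heading):
    """Exact-heading lookup, as a reader scanning a print index would do it."""
    return index.get(heading, [])


print("\n".join(render(index_a)))
print(look_up(index_a, "avoidance of contract"))  # [212, 245]
print(look_up(index_b, "avoidance of contract"))  # [] although index_b covers the concept
```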

(2) The second system is computer-based and relies on Boolean searching as the major means of finding relevant information. Boolean searching, discussed in more detail infra, allows a user to search machine-readable files for keywords that best describe a topic. A defining feature of the Boolean system is that the user can combine keywords or phrases using the operators "and", "or" and "not." The problem, however, is that Boolean searching presupposes that users know exactly what they are looking for, because it relies on exact terminology. For example, a search for material on "acceptance of goods" will not retrieve documents that employ the term "taking delivery of goods". Unless the material in the database is "marked" in a certain way, Boolean searching does not take synonyms into account.
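The following minimal Python sketch illustrates the point, assuming a toy collection of three invented documents and a naive tokenizer that splits on whitespace. It implements the three Boolean operators as set tests on each document's words; no real database syntax (LEXIS, WESTLAW or otherwise) is implied.

```python
# Three invented documents; ids and texts are hypothetical.
documents = {
    1: "the buyer delayed taking delivery of goods at the port",
    2: "acceptance of goods by the buyer after inspection",
    3: "acceptance of the offer was sent by fax",
}


def words(text):
    """Naive tokenizer: lowercase, split on whitespace."""
    return set(text.lower().split())


def boolean_search(all_of=(), any_of=(), none_of=()):
    """AND over all_of, OR over any_of, NOT over none_of; returns matching ids."""
    hits = []
    for doc_id, text in documents.items():
        w = words(text)
        if not all(t in w for t in all_of):
            continue
        if any_of and not any(t in w for t in any_of):
            continue
        if any(t in w for t in none_of):
            continue
        hits.append(doc_id)
    return hits


# "acceptance AND goods" misses document 1, which says "taking delivery".
print(boolean_search(all_of=["acceptance", "goods"]))  # [2]
# "acceptance NOT offer"
print(boolean_search(all_of=["acceptance"], none_of=["offer"]))  # [2]
# Only a user who already knows the synonym can write "goods AND (acceptance OR delivery)".
print(boolean_search(all_of=["goods"], any_of=["acceptance", "delivery"]))  # [1, 2]
```

The last query retrieves both relevant documents only because the user supplied the synonym; the search itself contributes no knowledge of equivalent terms.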

Lawyers are accustomed to terminology derived from their domestic laws. Yet a lawyer using domestic terminology to obtain information on international concepts may not retrieve the information he or she is seeking (e.g., the phrase "rescission of the contract" may not retrieve CISG cases reflecting "avoidance of the contract"). Moreover, since many persons are not familiar with the range of commands that retrieval systems provide to refine searches (e.g., nested Boolean searching or relevancy ranking), document retrieval by laypersons can yield sparse and inconsistent results.

Web pages that include a Boolean search option usually also present the contents of the site as a list; sub-categories may be displayed after the top-level heading is "clicked." These lists frequently give only the title of each item, however, and not the category or term for the legal concept that the work represents. Browsing of this kind is therefore useful mainly when the user already knows the precise title of the document he or she is looking for.

(3) The third retrieval system in use today in international sales law moves away from traditional research methodologies and relies more heavily on the substantive content of the law to structure its classification system. It has taken two forms, both organized around the provisions of a law. Taking the CISG as an example, the first organizes its documents by legal issue under each Article; the second does so as well, but also adds an "Annotated Text Page" for each Article.

(a) UNILEX uses a system that organizes CISG cases under each CISG Article pursuant to a list of legal issues that could arise under that Article.  89  The user is led to a case abstract and copy of the text of the case. UNCITRAL [page 238] organizes its CLOUT abstracts in a similar manner, with a different coding system.  90 
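The shared structure of UNILEX and CLOUT, cases filed under an Article and then under a legal issue within that Article, can be sketched as a nested mapping. The issue labels and case identifiers below are hypothetical placeholders, not actual UNILEX issue headings or CLOUT abstract numbers, and the helper function is illustrative only.

```python
# Hypothetical outline: CISG Article -> legal issue -> case identifiers.
by_article = {
    "Art. 39": {
        "reasonable time for notice": ["case-A", "case-B"],
        "specificity of notice": ["case-C"],
    },
    "Art. 49": {
        "fundamental breach as ground for avoidance": ["case-B", "case-D"],
    },
}


def cases_for(article, issue=None):
    """Cases filed under an Article, optionally narrowed to one legal issue."""
    issues = by_article.get(article, {})
    if issue is not None:
        return list(issues.get(issue, []))
    found = []
    for case_ids in issues.values():
        for case_id in case_ids:
            if case_id not in found:  # a case may be filed under several issues
                found.append(case_id)
    return found


print(cases_for("Art. 39"))  # ['case-A', 'case-B', 'case-C']
print(cases_for("Art. 49", "fundamental breach as ground for avoidance"))  # ['case-B', 'case-D']
```

Every lookup starts from an Article number, so this structure serves best a user who already knows which provision governs the problem.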

(b) The Institute of International Commercial Law of the Pace University School of Law  91  applies the CLOUT coding system and accompanies it with Annotated Text Pages  92  for each individual Article of the CISG. These pages enable researchers to analyze the sum, i.e., the CISG Article, through all of its parts, i.e., the statute itself and its legislative history, scholarly commentaries, and case law. The pages also provide comparisons with the UNIDROIT Principles and PECL. The intent is to enable persons to access the contents of "books" of information on each element of the spectrum of issues present in the CISG by clicking on the materials most relevant to their research.

Both forms of the third retrieval system come closer to creating the architecture necessary for information retrieval; however, neither goes far enough to offer a system that will sufficiently aid the uniform conceptualization of all international sales law, at least not yet.

2. Problems with Current Information Retrieval Systems for International Sales Law

All of the aforementioned methods assume that the user has a sophisticated level of knowledge in researching international topics. Most lawyers do not have this knowledge. Collectively, the information is too scattered among the different resources; individually, none of the current resources has established a system adequate to ensure quick, thorough retrieval results when the number of CISG cases and commentaries grows into the tens of thousands. Moreover, foreign case law that is provided through these sources is often written in a language unknown to the reader.  93  [page 239]

A uniform system for information retrieval should be created to provide a framework for how the law itself should be viewed. In addition, a systematic, comprehensive case translation program must coexist with the information retrieval system to ensure that the information retrieved can actually be used.  94  These two elements have the potential to lift international sales law out of the domestic law paradigm. International sales law must be given a structural backbone so that it can stand autonomously. Creating the architecture for an information retrieval system that can counteract the homeward trend is the next hurdle for the international community. A uniform system for the retrieval of knowledge on international sales law is one of the elements required to achieve a true global jurisconsultorium and uniform application of international sales law.

This goal raises the next question: What should the blueprint for a uniform system of information retrieval look like? What are the first realistic steps toward establishing a framework for international sales law? As was the case over a century ago in the US, the answer is found in library science methodologies, specifically, the creation of an information retrieval thesaurus.

3. Uniform International Sales Law Indexing Language

A uniform system for information retrieval would help achieve a more consistent application of international sales law. The term "uniform system" suggests that in different media - print or computer-based - legal concepts would be indexed using the same controlled terminology. Ideally, all information sources would be merged to provide a "one-stop shop" for international sales law. Until that goal is realized, consistent, uniform classification of information in the various sources is the next best practical step.

There are two tools that could be used to create a uniform language for classifying information: classification schemas and information retrieval thesauri.  95  [page 240]

(1) The first option, a classification schema, assigns numbers to categories of information. Subjects are then classified by number (e.g., the Dewey Decimal System). This system does not provide the structure necessary to create an autonomous international vocabulary. Creating categories for legal topics could be a respectable beginning, but it will ultimately be a flawed route to the control of information because it allows too many domestic law ideas to be pushed into broad categories.
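A minimal Python sketch of a numbered classification scheme shows the weakness described above, assuming a decimal-style notation in which each class number extends its parent's. The class numbers, captions and document titles are hypothetical and are not taken from the Dewey Decimal System. Because each document is shelved under a single number, a CISG avoidance case and a domestic rescission case can end up in the same broad class.

```python
# Hypothetical decimal-style schedule: class number -> caption.
schedule = {
    "3": "Law",
    "3.4": "Contracts",
    "3.4.1": "Sales contracts",
    "3.4.1.2": "Remedies for breach",
}

# Each (hypothetical) document is assigned exactly one class number.
shelf = {
    "Avoidance under CISG Art. 49": "3.4.1.2",
    "Rescission under domestic sales law": "3.4.1.2",
    "Formation of sales contracts": "3.4.1",
}


def captions(class_number):
    """Captions from the broadest class down to class_number."""
    parts = class_number.split(".")
    prefixes = [".".join(parts[:i]) for i in range(1, len(parts) + 1)]
    return [schedule[p] for p in prefixes if p in schedule]


def shelved_under(class_number):
    """Titles whose number is class_number or falls beneath it."""
    return sorted(title for title, num in shelf.items()
                  if num == class_number or num.startswith(class_number + "."))


print(captions("3.4.1.2"))
print(shelved_under("3.4.1.2"))  # international and domestic material side by side
```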

(2) The second option, the creation of an information retrieval thesaurus for international sales law, is the most effective tool for organizing materials in this field of law. Unlike either the UNCITRAL Thesaurus on the CISG,  96  which provides a helpful outline of the contents of each Article of the Sales Convention, or Roget's Thesaurus, which provides lists of synonyms, the information retrieval thesaurus is a controlled vocabulary  97  containing all the possible subject headings for an index (called "descriptors") and charting the semantic relationships between the terms. The following highlights the main technical aspects of such a thesaurus; the subsequent discussion focuses on its use for indexing international sales law materials.

An information retrieval thesaurus can be created using either a deductive method (terms are extracted from documents, but no control over the terms is made until enough terms are gathered, and then relationships are assigned) or through an inductive method (terms are selected as they are encountered in documents; vocabulary control and relationships are applied at the outset).  98 
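The difference between the two methods lies in when vocabulary control is applied. The minimal Python sketch below assumes a hypothetical control list that maps candidate terms to preferred descriptors: the inductive builder resolves each term the moment it is met and sets unknown terms aside for an editor, while the deductive builder only gathers terms, leaving control and relationships to a later pass.

```python
# Hypothetical control list: candidate term -> preferred descriptor.
preferred = {
    "avoidance of contract": "avoidance of contract",
    "rescission of contract": "avoidance of contract",
}


def build_inductive(candidate_terms):
    """Inductive method: control is applied to each term as it is encountered."""
    vocabulary, for_editor = [], []
    for term in candidate_terms:
        if term in preferred:
            descriptor = preferred[term]
            if descriptor not in vocabulary:
                vocabulary.append(descriptor)
        elif term not in for_editor:
            for_editor.append(term)  # not admitted until an editor decides
    return vocabulary, for_editor


def build_deductive(candidate_terms):
    """Deductive method: gather every term first; control comes in a later pass."""
    return sorted(set(candidate_terms))


terms_found = ["rescission of contract", "avoidance of contract", "nachfrist"]
print(build_inductive(terms_found))  # (['avoidance of contract'], ['nachfrist'])
print(build_deductive(terms_found))  # all three terms, still uncontrolled
```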

For the creation of an international sales law thesaurus, an inductive method should be applied so as to separate domestic terms from international terms and select preferred descriptors from the outset. The scope of this thesaurus is international sales law; because definitions of this field are subjective, the range of its domain may vary. Generally, we can begin assembling descriptors for this thesaurus by deriving them from the CISG, the UNIDROIT Principles, the Principles of European Contract Law (PECL), the lex mercatoria, case law and scholarly commentaries on them, arbitration rules (inter alia, institutional rules and the UNCITRAL Model Law on International Commercial Arbitration) and Incoterms. Reference materials released by the United [page 241] Nations and other organizations and associations should also be incorporated. For example, the United Nations has published an "International Trade Law Terminology" in three languages, and the International Chamber of Commerce publishes "Key Words in International Trade" with terminology represented in five languages.

It is not necessary to consult every commentary on international sales law (numbering in the thousands) in creating this thesaurus. Rather, "key" books and articles should be consulted initially. Descriptors can be modified later, or new descriptors added, based on terms discovered through further research and indexing: the thesaurus is alive and can always be modified to reflect new legal thinking. Moreover, since it is a list reflecting international terminology, it should be annotated so that different jurisdictions can be assured that it reflects a balance of sources from different countries and legal cultures.

Although the thesaurus is premised on extracting terms from international sales law and then applying those terms to classify this law, the inclusion of commercial law terms from domestic laws should not be precluded. This feature will only enhance the influence the thesaurus could have on the goal of an autonomous interpretation of international sales law generally. By way of illustration: in the United States, a person researching international sales law who is not familiar with its domain would likely use the terminology of Article 2 of the UCC. If the thesaurus includes UCC terminology but directs the user to the terms that represent parallel legal concepts in international sales law, the researcher is more likely to get all the information needed and no longer relies on domestic law to find the answers to international legal questions. Incorporating domestic law terminology into the structure will influence the substantive development of the law  99  as well as make the search mechanisms derived from the thesaurus more user-friendly.

One of the unique attributes of the information retrieval thesaurus is that it establishes relationships among the terms. The relationships have the ability to control the terms that will denote legal concepts and also place each term within a framework delineating its position in the hierarchy of all of the other descriptors representing legal concepts. [page 242]

4. Relationships Established Between the Descriptors

Semantic relationships  100  - An information retrieval thesaurus denotes the permanent relationships arising from the definition of the subjects involved. There are three types of relationship:

a. Equivalence relationship  101  - it includes synonyms, quasi-synonyms (terms whose meanings may be regarded as different, but which are treated as equivalents for purposes of the thesaurus),  102  variant spellings, acronyms, full forms, and translations. For example:

  Entry Term - rescission of contract
         Use - avoidance of contract
  Descriptor - avoidance of contract
         Used for -    termination of contract
                 rescission of contract
                 renunciation of contract
                 repudiation of contract
                 cancellation of contract

(In its multilingual form, the thesaurus could also direct the user, for example, from the German equivalent of "avoidance of contract" (Rücktritt) to the English term, which would lead to all the information on the subject regardless of language.)

A further example:

   Entry Term - PECL
          Use - Principles of European Contract Law
   Descriptor - Principles of European Contract Law
          Used for (UF) -    PECL
                  Lando Principles
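The equivalence relationship can be sketched in Python by storing only the USE references and deriving the reciprocal UF lists from them, which keeps the two directions consistent. The entries reproduce the two examples above; matching entry terms in lowercase is an assumption of this sketch, not a feature of any particular thesaurus software.

```python
# USE references from the examples above: entry term (lowercased) -> descriptor.
use = {
    "termination of contract": "avoidance of contract",
    "rescission of contract": "avoidance of contract",
    "renunciation of contract": "avoidance of contract",
    "repudiation of contract": "avoidance of contract",
    "cancellation of contract": "avoidance of contract",
    "pecl": "Principles of European Contract Law",
    "lando principles": "Principles of European Contract Law",
}


def resolve(term):
    """Send an entry term to its descriptor; a descriptor resolves to itself."""
    return use.get(term.lower(), term)


def used_for(descriptor):
    """Reciprocal UF list, derived rather than stored, so it cannot drift."""
    return sorted(entry for entry, target in use.items() if target == descriptor)


print(resolve("PECL"))  # Principles of European Contract Law
print(resolve("rescission of contract"))  # avoidance of contract
print(used_for("avoidance of contract"))
```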

b. Hierarchical relationship  103  - represents broader and narrower terms for each descriptor that is in the thesaurus. The broader term provides the researcher with the context of the legal concept. For example:

  Descriptor - damages
         Broader Term (BT) -    remedies
         Narrower Term (NT) -    consequential damages
                 exemplary damages
                 liquidated damages
                 incidental damages [page 243]
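A minimal Python sketch of the hierarchical relationship, using the damages example above: each link is stored once (narrower term to broader term), NT lists are derived from it, and the chain of BTs supplies the context the text describes. Walking the NTs recursively is an assumed way a search interface might expand a query, not a requirement of the relationship itself.

```python
# BT links from the example above, stored once as narrower term -> broader term.
broader = {
    "damages": "remedies",
    "consequential damages": "damages",
    "exemplary damages": "damages",
    "liquidated damages": "damages",
    "incidental damages": "damages",
}


def narrower(term):
    """NT list, derived from the BT links."""
    return sorted(nt for nt, bt in broader.items() if bt == term)


def all_narrower(term):
    """Every descriptor below term, at any depth."""
    result = []
    for nt in narrower(term):
        result.append(nt)
        result.extend(all_narrower(nt))
    return result


def context(term):
    """Chain of broader terms, from the nearest upward."""
    chain = []
    while term in broader:
        term = broader[term]
        chain.append(term)
    return chain


print(context("liquidated damages"))  # ['damages', 'remedies']
print(all_narrower("remedies"))
```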

c. Associative relationship  104  - represented within the thesaurus by related term codes and covers "associations between descriptors that are neither equivalent nor hierarchical; yet the terms are semantically or conceptually associated to such an extent that the link between them should be made explicit in the thesaurus, on the grounds that it may suggest additional descriptors for use in indexing or retrieval."  105  For example:

                Descriptor - damages
                       Related Terms (RT) - calculation of damages
                               mitigation of damages
                               reduction in damages
                               proof of damages

Relationships such as these are explained in the Addendum to this paper and are defined and explained further in the ANSI/NISO Guidelines for the Construction, Format and Management of Monolingual Thesauri.  106  The ANSI/NISO Guidelines (and the Addendum to this paper) also illustrate other elements of thesaurus construction, including the "scope note": a note following a descriptor that explains its coverage, specialized usage, or rules for assigning it.  107  A scope note allows the creator to restrict how a term is applied. If a word has a certain "scope" in domestic law but a different "scope" in, for example, the CISG or the UNIDROIT Principles and the PECL, the scope note directs the user to apply the term only in the manner defined by the CISG or by these "restatements."
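A descriptor record can carry its scope note alongside its relationships. The Python sketch below assumes a simple record type and prints it with the SN, UF, BT, NT and RT tags familiar from thesaurus displays; the scope-note wording for "avoidance of contract" is a hypothetical illustration, not an entry from any existing thesaurus.

```python
from dataclasses import dataclass, field


@dataclass
class Descriptor:
    term: str
    scope_note: str = ""
    used_for: list = field(default_factory=list)
    broader: list = field(default_factory=list)
    narrower: list = field(default_factory=list)
    related: list = field(default_factory=list)


def display(d):
    """Render a record: the descriptor, its scope note, then its relationships."""
    lines = [d.term]
    if d.scope_note:
        lines.append(f"  SN {d.scope_note}")
    for tag, values in (("UF", d.used_for), ("BT", d.broader),
                        ("NT", d.narrower), ("RT", d.related)):
        lines.extend(f"  {tag} {v}" for v in values)
    return lines


avoidance = Descriptor(
    term="avoidance of contract",
    # Hypothetical scope note restricting the term to its CISG meaning.
    scope_note="Apply only to avoidance as provided in the CISG (e.g., Arts. 49 and 64).",
    used_for=["rescission of contract"],
    broader=["remedies"],
)
print("\n".join(display(avoidance)))
```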

English is today the most widely used language for writings on international sales law. An international sales law thesaurus may therefore begin with English; however, the mechanisms used to organize this field of law should not rely solely on the terminology of one language. Applying the relevant thesaurus standards, the information retrieval thesaurus should also include other languages so as to incorporate materials from around the world into the framework effectively. The International Organization for Standardization (ISO) has established a standard for the construction of multilingual thesauri.  108 

Software designed for the creation of information retrieval thesauri permits the creator of a thesaurus to generate broad subject categories for its terms. In this domain, these categories could be used most effectively by assigning terms to specific international legal instruments (or, more specifically, to the Article numbers within them), e.g., the CISG, UNIDROIT Principles or PECL. Similar categories can be created for terms derived from the rules of arbitral institutions or [page 244] Incoterms. By creating relationships between terms and assigning them to subject categories, the thesaurus designer opens up many possibilities for building search mechanisms and for tailoring the presentation of information to users' needs within the confines of a uniform terminology.
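One way to picture subject categories tied to instruments and Articles is a mapping from each descriptor to the provisions where its concept appears. In the minimal Python sketch below the provision numbers are real (CISG Arts. 47, 49, 63 and 64; UNIDROIT Principles Art. 7.3.1; PECL Art. 9:301), but the choice of descriptors and the mapping as a whole are a hypothetical illustration, not the categories of any existing thesaurus.

```python
# Hypothetical category assignments: descriptor -> {(instrument, provision)}.
categories = {
    "avoidance of contract": {("CISG", "Art. 49"), ("CISG", "Art. 64")},
    "termination of contract": {("UNIDROIT Principles", "Art. 7.3.1"),
                                ("PECL", "Art. 9:301")},
    "additional period for performance": {("CISG", "Art. 47"), ("CISG", "Art. 63")},
}


def descriptors_in(instrument, provision=None):
    """Descriptors assigned to an instrument, optionally to a single provision."""
    return sorted(
        term for term, cats in categories.items()
        if any(inst == instrument and provision in (None, prov) for inst, prov in cats)
    )


print(descriptors_in("CISG"))  # ['additional period for performance', 'avoidance of contract']
print(descriptors_in("PECL", "Art. 9:301"))  # ['termination of contract']
```

A search screen built on such categories could, for instance, offer a user who selects a single CISG Article only the descriptors assigned to that provision.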

A thesaurus is only the first step, but an essential one, in the creation of a uniform framework for the conceptualization of international sales law. The thesaurus becomes most useful when case law, scholarly commentaries and legislative history materials are indexed together for information retrieval. All these documents can be classified under descriptors from the thesaurus, with different descriptors assigned to different legal instruments as appropriate; those descriptors are then used in the index. For example, because the CISG and the UNIDROIT Principles and PECL assign different meanings to the terms "avoidance" and "termination":

The marvel of a thesaurus is that it can ensure that all information is "tagged" using the same terms, which will have the implicit effect of teaching lawyers to associate and categorize particular terms with either their domestic law or a particular international legal instrument. Consider the comments of Daniel P. Dabney in his article, The Curse of Thamus: An Analysis of Full-Text Legal Document Retrieval:

"Another effect of subject authority control [thesaurus control] in indexing may be an influence on the substantive development of the subject of the collection. For example, some of the terms that might be used as subject headings have connotations that implicitly comment on the subject matter so indexed. Consider, for example, that generations of lawyers and judges have found law relating to employment relations under the heading "Master and Servant." This subject heading no doubt seemed reasonable to the legal community of the turn of the century when the heading was incorporated into the West key number system. A different segment of the society of that period might have found it reasonable to put such material under the heading "Toiler and Leech," and colored fruitful perception of the topic in a different way. "Toiler and Leech" seems outrageous to us; "Master and Servant" seems merely archaic, but this is to a large extent the effect of familiarity. . . . The precoordination [page 245] of subject headings in a thesaurus also may affect the development of the literature by making it appear that certain ideas go together and others do not."  110 

This quote is not only an indication of the influence that a thesaurus can have on the perception, growth and development of concepts in the law, but further serves as a warning to the international sales law community as it works to create an information retrieval system. A methodology that is created to classify information must maintain a high level of flexibility to ensure that new legal thoughts are not recycled into archaic classification schemes. Descriptors should be periodically reviewed by practitioners and academics within this area of law to ensure that the terms are representative of current legal concepts, and are not, in effect, hindering the progression of the law.

It is now time to index all international sales law based on a uniform terminology derived from a suitable information retrieval thesaurus to influence the substantive development of the subject, so that courts and arbitral tribunals will place certain legal ideas together (international) and keep others apart (domestic and international).

A. Does Free-Text Searching or Boolean Logic Eliminate the Need for an International Sales Law Indexing Language?

"[A] revolution in legal research is taking place right now because of a technological change. . . . With computers, researchers can formulate their own word searches rather than rely entirely on the predetermined indexing of a digest."  111  Many people applaud the fact that free-text searching and Boolean logic have liberated researchers from the confines of an index.  112  Since so many research sources on international sales law are computerized and will continue to evolve in this format, it is worth asking whether an information retrieval thesaurus for indexing international sales law is needed at all at this stage.

Free-text searching and Boolean logic are tools used in the context of computer-based searching. "Full-text searching enables a researcher to search for every occurrence in the database of any word or combination of words without a pre-existing index."  113  "Boolean logic is a syntactical calculus used for the comparison of data items (words and numbers) and combinations of data items. . . . The power of the Boolean search is the ability to match items that have a specific [page 246] relationship within a document."  114  "In a full-text system, such as LEXIS or WESTLAW the use of these conjunctions allows the researcher to create a context - to specify a relationship between the terms for which the researcher is searching."  115  Although these systems have been praised because they do not rely on a pre-coordinated index, they have also been criticized because they do not provide the uninitiated researcher (a researcher unfamiliar with the conventions of database searching or unfamiliar with the subject he is seeking to research) with the tools to obtain all of the information he may need on a particular area of law.  116  It is important to consider the mechanics of computer-based research in order to understand why it is not well suited to retrieving legal concepts. "Information in legal databases is organized by words [which are] ... placed in a massive alphabetized list, and [their] location ... noted; this is called the concordance ... the computer essentially compares the words in our request to the concordance, and notes the documents that have the word combinations we have requested ... There is no discernible framework ... There is no overriding organization of concepts and rules. Searching for concepts and rules is something that computers are notoriously poor at doing."  117  [page 247]
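The concordance mechanism described in the quotation can be sketched in a few lines of Python: each word is mapped to the document and position where it occurs, and a proximity operator compares positions. The documents are invented, the tokenizer is deliberately naive, and the function names are hypothetical rather than those of any commercial system. The sketch matches strings only; it has no notion of the concept behind them.

```python
import re
from collections import defaultdict

# Three invented documents.
docs = {
    1: "The seller must deliver conforming goods.",
    2: "Goods were delivered late; the seller offered a price reduction.",
    3: "The buyer must pay the price.",
}


def build_concordance(documents):
    """Word -> list of (document id, position); sorted(keys) is the alphabetized list."""
    concordance = defaultdict(list)
    for doc_id, text in documents.items():
        for pos, word in enumerate(re.findall(r"[a-z]+", text.lower())):
            concordance[word].append((doc_id, pos))
    return concordance


def within(concordance, word1, word2, n):
    """Documents in which word1 and word2 occur within n words of each other."""
    hits = set()
    for d1, p1 in concordance.get(word1, []):
        for d2, p2 in concordance.get(word2, []):
            if d1 == d2 and abs(p1 - p2) <= n:
                hits.add(d1)
    return sorted(hits)


c = build_concordance(docs)
print(within(c, "seller", "goods", 5))  # [1, 2]
print(within(c, "seller", "goods", 4))  # [1]
print("delivery" in c)  # False: only "deliver" and "delivered" were indexed
```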

Whether one supports or criticizes Boolean or free-text searching, neither approach should be considered the final and most effective tool for creating a uniform information retrieval methodology for international sales law. Free-text searching assumes a certain level of knowledge of the terminology that must be used in the search. As mentioned supra, in most applications it has not been designed to handle synonyms or to take account of the legal background of the user, who may be using domestic terminology familiar to him or her.  118  These search mechanisms are useful in a national law context because the framework of the law is already understood, and terms can be used with a level of confidence  119  and security that they will produce complete and relevant research results. In the context of international sales law, a uniform terminology that represents legal concepts for the purposes of searching has yet to be created.  120  [page 248]

For international sales law, an index (based on the terms in the thesaurus) should be incorporated into search interfaces to allow the user to see and use the framework that has been created for the law.  121  Law librarians have recommended combining Boolean searching with editorial features (e.g., indexing).  122  Possibly, a "mark-up language," e.g., legal XML,  123  could be used to incorporate the relationships established in the thesaurus to ensure high recall  124  of relevant  125  documents. Whichever alternative is adopted, computer-assisted legal research in its present form does not justify abandoning the precoordinated index.
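How an index built on thesaurus descriptors could sit behind a search box can be sketched in Python, assuming documents have already been tagged with descriptors by indexers. A query term is first resolved through its USE reference and then expanded with its narrower terms, so a user who types a domestic term still reaches documents indexed under the international descriptor. The thesaurus fragment, the narrower terms and the tagged documents are all hypothetical, and this is only one possible design.

```python
# Hypothetical thesaurus fragment.
USE = {"rescission of contract": "avoidance of contract"}
NARROWER = {
    "avoidance of contract": ["partial avoidance", "avoidance of instalment contracts"],
}

# Hypothetical documents, each tagged by an indexer with descriptors.
tagged = {
    "doc-1": {"avoidance of contract"},
    "doc-2": {"avoidance of instalment contracts"},
    "doc-3": {"damages"},
}


def expand(query_term):
    """Resolve the USE reference, then add every narrower term below it."""
    start = USE.get(query_term, query_term)
    terms, stack = [start], [start]
    while stack:
        for nt in NARROWER.get(stack.pop(), []):
            terms.append(nt)
            stack.append(nt)
    return terms


def retrieve(query_term):
    """OR the expanded descriptors together against the descriptor index."""
    wanted = set(expand(query_term))
    return sorted(doc for doc, tags in tagged.items() if tags & wanted)


print(retrieve("rescission of contract"))  # ['doc-1', 'doc-2']
# The same term matched literally against the tags finds nothing:
print(sorted(d for d, t in tagged.items() if "rescission of contract" in t))  # []
```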

 79. Daniel P. Dabney, The Curse of Thamus: An Analysis of Full-Text Legal Document Retrieval, 78 Law Libr. J. 5, 12 (1986). In his article, Dabney includes part of Plato's Phaedrus. In Phaedrus, Socrates, in a conversation with Phaedrus, describes the legend of Theuth. Theuth was the Egyptian god who invented many arts (e.g., arithmetic, astronomy). His greatest discovery was writing. The King at the time, Thamus, who usually praised Theuth's inventions, did not approve of writing. He refused to teach it to his people.

"If men learn this, it will implant forgetfulness in their souls; they will cease to exercise memory because they rely on that which is written, calling things to remembrance no longer from within themselves, but by means of external marks. What you have discovered is a recipe not for memory, but for reminder. And it is no true wisdom that you offer your disciples, but only its semblance, for by telling them of many things without teaching them you will make them seem to know much, while for the most part they know nothing ..." Phaedrus 275 a-b.

If the conclusion of this story is correct, and we do not possess knowledge internally, but must seek knowledge from the writings we retrieve, Plato should have continued the conversation between Socrates and Phaedrus to evaluate the systems that should be created to access the knowledge that one is seeking (e.g., for international commercial law: international codes, case law, scholarly commentaries, legislative history). The story should also have analyzed the impact that the research tools used to access the writings would have on the manner in which we conceptualize the writings we uncover.

For a more modern view similar to Thamus', see comments by another state leader: "Much reading is an oppression of the mind and extinguishes the natural candle." William Penn quoted in Daniel Akst, On the Contrary: A Corner Office Has Little Room for Books, N.Y. Times, July 1, 2001, Business, at 4.

 80. See supra note 22.

 81. Id.

 82. Delgado & Stefancic, supra note 28, at 214.

 83. J.O. Honnold, supra note 2, at 127, as quoted in Ralph Amissah, Revisiting the Autonomous Contract (to be published). See also Leif Sevón, Observations, in International Uniform Law in Practice [Acts and Proceedings of the 3rd Congress on Private Law held by the International Institute for the Unification of Private Law (Rome 7-10 September 1997)] 135 (Oceana: New York, 1998) ("To be able to take account of decisions from other countries one has first to be aware of them"); in the same vein: "[p]roper reporting of decisions [is an] essential prerequisite for the proper working [of the rule of precedent]." René David, The Legal Systems of the World, in International Encyclopedia of Comparative Law, Martinus Nijhoff: The Hague 133 (1984).

 84. Carol M. Bast & Ransford C. Pyle, Legal Research in the Computer Age: A Paradigm Shift, 93 Law Libr. J. 285, 291 (2001).

 85. J. O. Honnold, Uniform Law for International Sales under the 1980 United Nations Convention (Kluwer Law International 1999).

 86. Peter Schlechtriem (ed.), Kommentar zum Einheitlichen UN-Kaufrecht-CISG [Commentary on the UN Convention on the International Sale of Goods (CISG)] (C.H. Beck 1998).

 87. See, e.g., Addendum.

 88. Id.

 89. UNILEX, edited by Professor Bonell, also includes a traditional print index and a table of cases organized by country.

 90. As discussed in A/CN.9/SER.C/GUIDE/1, paras. 18-19, "the Secretariat [of UNCITRAL] [has] publish[ed], based on classification schemes ("thesauri") separate indices for the UNCITRAL legal texts covered by CLOUT. The purpose of such indexes is to assist users of CLOUT in identifying cases relevant to a given issue by listing cases under the provision or sub-issue with which they deal." ‹›.

 91. Available at ‹›. This retrieval system will be improved by classifying its materials according to descriptor categories. These categories, derived from the information retrieval thesaurus, the Uniform International Sales Law Thesaurus that is currently being constructed, will provide a framework for the conceptualization of international sales law.

 92. Recognizing that the analysis of any CISG Article should combine the actual CISG Articles, case law, legislative history and scholarly commentary, the Pace database provides "Annotated Text Pages" that seek to integrate all of this information for each CISG Article at one source.

 93. See Franco Ferrari, Applying the CISG in a Truly Uniform Manner: Tribunale di Vigevano (Italy), 12 July 2000, Uniform Law Review, NS-Vol. VI., 203, 206 (Kluwer Law Publishing 2001) ("Resorting to foreign case law undoubtedly promotes the uniform application of the CISG. However, requiring interpreters to consider foreign decisions can create practical difficulties . . . foreign case law is often written in a language unknown to the interpreter").

 94. The QM Case Translation Programme was inaugurated on September 27, 2000. As of July 27, 2001, over 150 full texts of CISG cases in English or English translation have been entered on, or are being readied for entry on, the cisgw3 website. Additional case translations are being processed. The 150+ case translations cited include opinions of the Supreme Courts of Argentina (1 case), Austria (3 cases), France (4 cases), Germany (8 cases), Hungary (1 case), Israel [relevant excerpts only] (1 case), the Netherlands (1 case) and Switzerland (1 case), and of the Supreme Constitutional Court of Colombia (1 case). See supra note 22 for further reference to the Pace Law School and Queen Mary CISG Translation Programme, which has been designed to coexist with the information retrieval system.

 95. Paul Miller, I Say What I Mean, But Do I Mean What I Say? Ariadne Issue 23 (visited June 18, 2001) ‹›. See also J. Milstead, "How Do I Build a Thesaurus" (visited June 4, 2001) ‹› (prepared specifically for the American Society of Indexers web site) for information on the top-down and bottom-up methodologies for thesaurus construction.

 96. See supra note 65.

 97. By contrast, an uncontrolled vocabulary is essentially a list of words and phrases, which can be drawn from the information that is to be classified. Uncontrolled vocabularies lack structure and provide no mechanism for meeting the challenges of creating a multilingual, international vocabulary, one that must be freed from the confines of domestic legal connotations.

 98. National Information Standards Organization, Guidelines for the Construction, Format and Management of Monolingual Thesauri, ANSI/NISO Z39.19-1993 at 27.

 99. See generally Bowker & Star, supra note 77 at 141, stating that one benefit of the ICD (International Classification of Diseases) is that "it can be used in transnational comparisons, especially where there are radical local differences in belief, practice, and knowledge representation".

 100. See supra note 98 at 13.

 101. Id.

 102. Id. at 15.

 103. Id. at 16.

 104. Id.

 105. Id. at 3.

 106. See supra note 98.

 107. Id. at 3.

 108. Documentation - Guidelines for the establishment and development of multilingual thesauri, ISO 5964 (1985).

 109. See the Addendum to this Article for further comments on the thesaurus treatment of "avoidance" and "termination" under the CISG, the UNIDROIT Principles and the PECL.

 110. Dabney, supra note 79 at fn. 8.

 111. Bast & Pyle, supra note 84 at 285.

 112. Dabney, supra note 79 at 17 ("In full-text document retrieval, there is no human subject indexing").

 113. Robert C. Berring, Full-Text Databases and Legal Research: Backing into the Future, 1 High Tech. L. J. 27, 28 (1986).

 114. Descriptors can also be combined using Boolean operators. Free-text searching, for its part, can function without Boolean operators.

 115. See supra note 112.

 116. Dabney reports as follows on the current Lexis and Westlaw approach: "Both LEXIS and WESTLAW rely almost exclusively on the ability of the systems to recognize words supplied by the user. The difficulty with this approach is that there is an imperfect correspondence between words and ideas." Op. cit. at 17. Because judges and practitioners often use different words to describe the same concepts or ideas, West Publishing has tried to compensate by creating a "Full-Text Plus" system. "This system refers to the fact that the WESTLAW database contains the full text of cases plus the same text of headnotes and Digest summaries printed in the National Reporter System. West posits that this addition introduces 'normalized' language because the trained editor has again entered the picture. The uniform language in the headnote and syllabus are supposed to compensate for the imprecision of the judicial author. Thus, the searcher can formulate a search strategy knowing that his search phrase will be matched up both with the text of the judicial opinion and with the 'normalized' language introduced by West editors in the headnotes and case synopsis." Id. at fn. 68.
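The "Full-Text Plus" idea can be illustrated with a minimal sketch. It assumes a hypothetical case record with separate opinion and headnote fields and plain case-insensitive phrase matching; it is not WESTLAW's actual query engine. A search for the editor's normalized term succeeds through the headnote even when the judge used different words.

```python
from dataclasses import dataclass


@dataclass
class CaseRecord:
    citation: str
    opinion: str   # the judge's own wording
    headnote: str  # the editor's "normalized" wording (hypothetical field)


def matches(record: CaseRecord, phrase: str, include_headnote: bool = True) -> bool:
    """Case-insensitive phrase match against the opinion and, optionally, the headnote."""
    phrase = phrase.lower()
    fields = [record.opinion]
    if include_headnote:
        fields.append(record.headnote)
    return any(phrase in field.lower() for field in fields)


# Hypothetical record: the opinion itself never uses the word "avoidance".
case = CaseRecord(
    citation="Hypothetical v. Example",
    opinion="The buyer was entitled to rescind because the goods were unfit.",
    headnote="Avoidance of contract; fundamental breach; non-conforming goods.",
)

print(matches(case, "avoidance", include_headnote=False))  # False: full text only
print(matches(case, "avoidance"))                          # True: full text plus headnote
```

The gain depends entirely on the editor choosing the term the researcher will search for, which is the role a controlled indexing language is meant to play.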

 117. Barbara Bintliff, From Creativity to Computerese: Thinking Like a Lawyer in the Computer Age, 88 Law Library Journal 338, 346 (1996).

"LEXIS and WESTLAW have begun to develop concept-based systems and have introduced 'natural language' search interfaces as a step in this direction. We now have Freestyle and WIN, respectively. Natural language moves towards a conceptual search system, with a list of thousands of commonly used legal phrases indexed in addition to words. But natural language requires a complex search interface, which substitutes a series of mechanical judgments for our decision-making process. The computer program 'identifies' the 'concepts,' which are basically nouns or legal phrases, in the search request, and matches them against its inventory of words and legal phrases. The program identifies other documents with the same concepts and ranks its findings by statistical relevance - primarily by the number of times the concept occurs and how close to the beginning of the document it first occurs.

Like other computer searches, sometimes the results of natural-language searches are extraordinary, and sometimes they are worthless; usually they are somewhere in between. In any event, your ability to think in computerese and the underlying logic of the computer program determines the outcome of your research. This isn't the bias-free, untouched-by-human-hands results we expect of a computer, for many decisions are made for you by the computer program. Furthermore, many programmers are convinced that a better search, even for conceptual information, can be crafted using the Boolean techniques. One developer of CD-ROM-based legal materials stated that natural-language searching compared to Boolean searching is like using an automatic transmission versus a stick shift. 'You don't need to know anything about transmissions to drive an automatic, but all the race cars have stick shifts.'" Bintliff at 347 (citing Russ Armstrong, CD-ROM v. Law Books, Law-Lib Discussion List (Jan. 8, 1996), email at ‹›).
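The ranking step Bintliff describes (identify the "concepts" in the request, then order documents primarily by how often those concepts occur and how early they first appear) can be sketched as follows. This is a minimal illustration under stated assumptions: the concept inventory, the documents and the exact tie-breaking rule are hypothetical, and the code does not reproduce how Freestyle or WIN actually weigh these factors.

```python
import re
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str
    frequency: int       # total occurrences of the query's concepts in the document
    first_position: int  # word offset of the earliest occurrence


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


def occurrences(words: list[str], phrase: str) -> list[int]:
    """Word offsets at which a (possibly multi-word) phrase starts."""
    parts = phrase.split()
    n = len(parts)
    return [i for i in range(len(words) - n + 1) if words[i:i + n] == parts]


def identify_concepts(query: str, inventory: set[str]) -> list[str]:
    """Treat any inventory phrase found in the query as one of its 'concepts'."""
    words = tokenize(query)
    return sorted(c for c in inventory if occurrences(words, c))


def rank(query: str, docs: dict[str, str], inventory: set[str]) -> list[Hit]:
    concepts = identify_concepts(query, inventory)
    hits = []
    for doc_id, text in docs.items():
        words = tokenize(text)
        positions = [p for c in concepts for p in occurrences(words, c)]
        if positions:
            hits.append(Hit(doc_id, len(positions), min(positions)))
    # More occurrences rank higher; ties go to the earlier first occurrence.
    return sorted(hits, key=lambda h: (-h.frequency, h.first_position))


# Hypothetical inventory of legal phrases and a tiny document collection.
inventory = {"avoidance", "fundamental breach", "conformity of goods"}
docs = {
    "A": "The buyer declared avoidance after a fundamental breach. Avoidance was upheld.",
    "B": "Conformity of goods was disputed; the court later discussed avoidance.",
    "C": "The dispute concerned only a jurisdiction clause.",
}
for hit in rank("When is avoidance available for a fundamental breach?", docs, inventory):
    print(hit)
```

Every choice here (what counts as a concept, how frequency and position are combined) is made by the program rather than the researcher, which is the point of the passage quoted above.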

 118. WESTLAW now provides its users with the option of checking a thesaurus of "Related Terms" when conducting a search. It therefore permits users to search with broader terminology, increasing the chances of retrieving relevant information. Although WESTLAW does not currently account specifically for the terminology of international sales law, it is the sort of technology into which the Uniform International Sales Law Thesaurus could easily be incorporated.
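A "Related Terms" option of this kind can be sketched as query expansion over a small thesaurus that uses two standard thesaurus relationships: USE (non-preferred term to preferred descriptor) and RT (related term). The entries below are hypothetical and purely illustrative; they are not taken from the Uniform International Sales Law Thesaurus or from WESTLAW, and the OR syntax of the generated query is an assumption.

```python
# Hypothetical entries for illustration only; they are not taken from the
# Uniform International Sales Law Thesaurus or from WESTLAW.
USE = {  # non-preferred term -> preferred descriptor
    "rescission": "avoidance",
    "cancellation": "avoidance",
}
RELATED = {  # descriptor -> related terms (RT)
    "avoidance": ["termination", "fundamental breach"],
}


def expand(term: str) -> list[str]:
    """Return the preferred descriptor, the user's own term, and any related terms."""
    term = term.lower().strip()
    descriptor = USE.get(term, term)
    expanded = [descriptor]
    if term != descriptor:
        expanded.append(term)  # keep the user's wording so free-text matches still work
    expanded.extend(RELATED.get(descriptor, []))
    return expanded


def to_boolean_query(term: str) -> str:
    """Join the expanded terms with OR, quoting multi-word phrases."""
    return " OR ".join(f'"{t}"' if " " in t else t for t in expand(term))


print(to_boolean_query("Rescission"))
# avoidance OR rescission OR termination OR "fundamental breach"
```

A researcher trained in a domestic system who searches for "rescission" is thus steered to the CISG descriptor and to the corresponding UNIDROIT and PECL vocabulary without having to know either in advance.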

 119. This confidence is probably unjustified. "Several extensive studies have clearly documented a false sense of security on the part of computer researchers. One study commented that users felt that 'because the source is "technological," they are finding everything or, at the very least, finding the best materials.' . . . We have suspended our sense of disbelief when it comes to computers." Bintliff, supra note 117, at 349, quoting F.W. Lancaster et al., Searching Databases on CD-ROM: Comparison of the Results of End-User Searching with Results from Two Modes of Searching by Skilled Intermediaries, 33 RQ 370, 382 (1994).

 120. As Professor Germain puts the problem and a solution: ". . . Search engines are essentially of two kinds, human-mediated 'intellectual' indexes and 'robot' or automated indexes. In the intellectual indexes, individual web sites are classified by hand according to various classification schemes . . . 'Robot' or automated indexes use programs that download every page ... so that every word on every page can be indexed by a ... search engine ... An April 1998 study by the journal Science concludes that search engines are not thorough in finding relevant documents, because they each only index a fraction of the total documents available ... The lesson is not to rely on just one engine . . ." Claire M. Germain, Content and Quality of Legal Information and Data on the Internet with a Special Focus on the United States, 27 Int'l J. of Legal Info. 296 (1999) [citations omitted]. For more on difficulties associated with "intellectual" indexes and "robot" or automated indexes, see Section 6 of Graham Greenleaf et al., Moving Access to Law into the 21st Century (visited June 18, 2000) ‹›.
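Germain's "robot" or automated index, in which every word on every page is indexed, can be sketched as an inverted index. This is a minimal illustration assuming hypothetical page names and a crude lowercase word tokenizer; real search engines add crawling, stemming and ranking on top of it. The example also shows the limitation this Article addresses: a word-level index finds the page that says "avoidance" but not the page that discusses the closely related concept of "termination".

```python
import re
from collections import defaultdict


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map every word on every page to the set of pages containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for page, text in pages.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(page)
    return index


def search(index: dict[str, set[str]], *words: str) -> set[str]:
    """Pages containing every query word (an implicit Boolean AND)."""
    if not words:
        return set()
    result = set(index.get(words[0].lower(), set()))
    for word in words[1:]:
        result &= index.get(word.lower(), set())
    return result


# Hypothetical pages.
pages = {
    "page1": "Avoidance of the contract under CISG Article 49",
    "page2": "Termination under the UNIDROIT Principles",
    "page3": "CISG Article 25 defines fundamental breach",
}
index = build_index(pages)
print(sorted(search(index, "cisg", "article")))  # ['page1', 'page3']
print(search(index, "avoidance"))                 # {'page1'}: page2 is missed
```

A human-mediated index, or a thesaurus layered over the word index, is what would connect page2 to a search for "avoidance".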

 121. See, e.g., supra note 92.

 122. Dabney, supra note 69 at 34 ("The addition of good human indexing to CALR data bases is a promising approach to the problem of improving retrieval performance in such systems . . .").

 123. See Legal XML Standards Development Project at ‹›.

 124. Recall is the percentage of the total number of relevant documents in a database that are retrieved by the search being studied. See supra note 69 at 15.

 125. Relevance is the relationship between a question and a document that makes the document important to the person researching the question. Id. Dabney points out that as recall goes up, relevance goes down, and vice versa. This is a problem inherent in most CALR systems. Id. at 16.
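The definition of recall in note 124 reduces to simple arithmetic. The sketch below uses hypothetical case identifiers to compute it alongside the share of retrieved documents that are actually relevant (the measure the information-retrieval literature usually calls precision). This makes the trade-off in note 125 visible: the broader search finds every relevant case but returns as many irrelevant ones.

```python
def recall(retrieved: set[str], relevant: set[str]) -> float:
    """Percentage of all relevant documents in the database that the search retrieved."""
    if not relevant:
        raise ValueError("recall is undefined when no document is relevant")
    return 100 * len(retrieved & relevant) / len(relevant)


def precision(retrieved: set[str], relevant: set[str]) -> float:
    """Percentage of the retrieved documents that are relevant."""
    if not retrieved:
        raise ValueError("precision is undefined when nothing is retrieved")
    return 100 * len(retrieved & relevant) / len(retrieved)


# Hypothetical database: four cases are relevant to the researcher's question.
relevant = {"case1", "case2", "case3", "case4"}
narrow_search = {"case1", "case2"}
broad_search = {"case1", "case2", "case3", "case4", "case5", "case6", "case7", "case8"}

for name, retrieved in [("narrow", narrow_search), ("broad", broad_search)]:
    print(name, recall(retrieved, relevant), precision(retrieved, relevant))
# narrow 50.0 100.0
# broad 100.0 50.0
```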
