Towards a Standard Measure of Index Density

All rights reserved. For permission to reprint contact Dan Connolly and Cynthia Landeen
Copyright © 2000 by Dan Connolly and Cynthia Landeen

The Problem

"How thoroughly is that book indexed?" There are many answers to this, including: "Very." "There are four entries per page." "There are 10 pages of index." "I don't know." "It's a six percent index." "The index has 750 lines." This variation represents a problem; there is no standard, accurate measure of how thoroughly a book has been indexed. While many of the answers above actually answer the question, they do so without precision. Others answer a different question entirely, or offer no useful information.

Most indexers would probably tell you that "four entries per page" is the most useful, while others might say that the "six percent index" answer is the most accurate. In part one of this article we'll explain why neither answer is precise (nor, given that, particularly useful). Mulvany (1994) touches upon the problem when she explains why one index, despite appearances, measured as more dense than another: "The critical difference between these two indexes is a matter of typography and layout." (p. 66). Although this begins to define the problem, it is only a beginning; there is more to it. A truly precise density measure takes into account the relationship of the locators to the text in addition to typography and layout.

Background - Density by Line and Page Measurements

Anderson (1971) measures index density by comparing lines in the index with lines in the text. She gives an example of the process: "Suppose that the 12-page index with 50 lines per page (600 lines) is for a book with 300 pages, and 40 lines per page (12,000 lines); the index will amount to 5 percent of the book." (p. 121) Mulvany (1994), on the other hand, measures index density using pages of the index and pages of the text: "Assume that a book with 200 indexable text pages has 10 extra pages reserved for the index. If we divide 10 by 200, we end up with 5 percent." (p. 64) The two processes give differing answers for the same text, depending on whether lines or pages are used for the measurement. For example, using the figures given by Anderson but measuring density by pages (12 index pages divided by 300 text pages), we would find a 4% measurement of index density, rather than 5%. What we really have here are two similar, but slightly different, methods of calculating the size, but truly not the density, of an index. Both have the same drawbacks, obvious to indexers, which continue to be overlooked for lack of a better method.
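The disagreement between the two percentage measures can be checked with a few lines of arithmetic. The sketch below uses only the figures from Anderson's example; the function and variable names are ours, chosen for illustration.

```python
def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole

# Figures from Anderson's example: a 12-page index at 50 lines per page,
# for a 300-page book at 40 lines per page.
index_pages, index_lines_per_page = 12, 50
text_pages, text_lines_per_page = 300, 40

# Line percentage: index lines divided by text lines.
line_pct = percent(index_pages * index_lines_per_page,
                   text_pages * text_lines_per_page)

# Page percentage: index pages divided by text pages.
page_pct = percent(index_pages, text_pages)

print(f"line percentage: {line_pct:.1f}%")
print(f"page percentage: {page_pct:.1f}%")
```

The same book and the same index give 5% by lines and 4% by pages, which is the discrepancy described above.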

The page percentage measure (index pages divided by text pages) takes no account of differences in the typographical style of either the text or the index (trim size, type size, leading, columns, margins, etc). In her work, Anderson also rejects the page percentage method for being less accurate, and in fact, acknowledges that both methods lack precision: "For a more accurate estimate, lay-out of the index would have to be taken into account. Also, those rare indexes with only one column would have to be rated lower, and those with three or four columns higher, than the usual two-column index." (p. 6) The line percentage measure, while accounting for lines in the text, also doesn’t consider differences in typographical styles between the index and the text. Neither measure considers locators, the most precise indicator of density.

Many indexers intuitively understand "four entries per page" as a conceptually useful measure of density. Mulvany even takes her page percentage measures and translates them into a range of entries per page. This, while better, is not precise enough. Typographical elements of the text continue to be left completely out of the equation. An accurate relationship between the locator and the text still has not been described. In addition, there has always been a bit of ambiguity, among indexers, about the term "entries" although we would postulate that "locators" is the term that is meant. But, what of the text page itself? What is the trim size? What is the type size and leading? How many text columns are there? How many words? Even leaving out mention of illustrative material, the number of words on a page can vary immensely, depending on all of these factors. Immensely.

This focus on lines and pages is, finally, inadequate. To date, no accurate method has been proposed that allows one to specify the relationship between index size and the text which it references, with anything approaching meaning. It is the aim of this article to present a more precise measure of index density, based on as much information as can be accurately, easily and quickly obtained, in order to provide a practical tool for indexers, and editors, to both plan and evaluate indexes.

This article is not intended to be a comment on the appropriateness of density as a qualitative measure of an index. There are many factors that ultimately contribute to index density, including constraints in space and scope that are beyond the control of the indexer. No statistic can summarize the appropriate exhaustivity or specificity of an index, concepts that more fully summarize the usefulness and quality of the index, and the depth of coverage of the text's contents by the indexer. Rather, this new measure is intended only to put a reliable figure forward as a descriptive, quantitative measure of an index, one component of a thorough index review.

Development of L/L Density Measure

Pre-conditions and limitations

In developing our measure, we felt we needed to satisfy several pre-conditions and limitations. First, the measure should be easy to implement and calculate. Second, the measure should be convenient, i.e., it should be able to be calculated fairly quickly. Third, the measure should be intuitive, or if not, at least conceptually easy to understand. Fourth, the measure should be accurate, measuring what it is intended to measure, while taking into account the typographical considerations of each book. Finally, it should be consistent, so that the results can be used to compare and contrast the indexing depth and treatment of subjects in indexes of different books.

Measurements and Variables

We first looked at what elements of a book's text and index were measurable. For the text, we determined that we could measure the following variables: characters, words, lines, and pages. We rejected the notion of counting characters as not meeting most of our pre-conditions. Pages, as a measurement, were also dismissed as too gross a measurement, especially given the wide fluctuation in the typography and design of a page from book to book, and even within a book. This left words and lines for consideration. Both were practical and possible.

In indexes, we knew there were also variables to be measured: characters, locators, headings, lines, columns, and pages. We settled upon locators as yielding the most accurate information, and as being the most representative, while still being practical. For both text and index, we believed sampling would yield an accurate estimate of variables. For the text, we decided to count variables over two pages, then average them, for a usable measure. For the index, a count of two columns, averaged and multiplied by the number of columns, would yield an accurate estimate for the average index. For longer indexes (more than 10 pages), a third column could be counted and averaged with the others.

Finally, we needed to determine what this measure of density would look like. Would it be a percentage, as the previous methods were? A category rating (a scale of 1-10, for instance)? A whole number? Expressing the result as a percentage had disadvantages. It could easily be confused with the existing page percentage measure. Also, a percentage is an abstract representation, one not easily "envisioned" by the average person (indexer or not). We finally decided that the measure would simply be a number, and that what it measured would be appended to the end as a descriptor. All that was left was to decide what the final measure would be–lines (of text) per locator or words (of text) per locator.

The Investigation

Initially, we liked words (of text) per locator. We knew that a count of words was more specific than a count of lines (just as lines were more specific than pages, as Anderson noted). We set about measuring some books using this method. Here is where we hit the stumbling block. In counting words (of text) from differing sections of the same book, we came up with disturbingly disparate numbers. This eventually pointed up the surprising fact that words were too variable to consistently yield accurate measurements. Although we had obtained our word counts by selecting whole pages to count (ignoring illustrative material, for the time being), the variation in word size, and therefore word count, from one page to another was too high, creating an inaccurate measure. In addition, if there were other content (tables, photos, graphs, etc.), it became even less possible to create an accurate statistic. Finally, we found the word counting process tiresome.

We then considered lines. Lines (of text) per locator now appeared to offer the best solution. It met all of our pre-conditions for a good density measure: easy to implement and calculate (counting lines is not onerous and does not involve major calculation); convenient (it is not time-consuming); conceptually easy to understand (how many lines of text, on average, before one finds an indexable piece of information); accurate (takes into account the text density of individual books); and consistent (the results can be used across books, to compare and contrast the indexing depth and treatment of subjects).

We also did some research to verify that there was consistency to line length in printed matter. Our initial impression was that words varied from about 9 to about 11 per line. Happily, The Chicago Manual of Style verified that line length should be from 65 to 70 characters (up to twelve 5.8-character words). Words into Type indicated about 21 to 24 picas per line (6 picas per inch) for literary material, which yields a similar character-length.

The Method

Here are the steps to the method that we settled on, with example figures from Indexing Books in parentheses:

Lines/Locator (L/L) Density Measurement


Text Index Put it together

Some comments on this method. First, we decided to ignore all illustrative material in the book. We could conceive of no method that would take it into account simply, conveniently, and with anything resembling accuracy. We have proposed a way of dealing with that material, but find that it is imperfect at best: sample 10 random pages of the text and determine how many contain illustrative material. Indicate the level of illustrative material by means of L (0-2), M (3-7), or H (8-10) following the density measure, like so: 4.5 L/L (L). Because we ignore illustrative material but still wish to assign an indexing depth to the whole book, we chose to count the lines of a full page of text (one that contains no illustrations). Blank lines on the page selected (between sections or paragraphs, for instance) should also be counted. Finally, a line in our definition spans only one column, so that a double-columned layout has twice the "lines."
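The illustrative-material indicator described above amounts to a simple lookup on a 10-page sample. The following is a minimal sketch of that rule; the function names are hypothetical, and the output format simply imitates the "4.5 L/L (L)" example.

```python
def illustration_level(pages_with_illustrations):
    """Map a count of illustrated pages (out of a 10-page sample) to L, M, or H."""
    if not 0 <= pages_with_illustrations <= 10:
        raise ValueError("count must be between 0 and 10 for a 10-page sample")
    if pages_with_illustrations <= 2:
        return "L"   # 0-2 pages contain illustrative material
    if pages_with_illustrations <= 7:
        return "M"   # 3-7 pages
    return "H"       # 8-10 pages

def format_density(lines_per_locator, level):
    """Append the illustration indicator to the density figure."""
    return f"{lines_per_locator:.1f} L/L ({level})"

# Example: 1 of the 10 sampled pages carries an illustration.
label = format_density(4.5, illustration_level(1))
print(label)
```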

For the locator count, we chose to disregard cross-references of any kind. We decided that cross-references are, more accurately, references to the index, not to the text. They don't directly refer a user to information in the text, but only to where the user can find such a reference in the index.
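Putting the pieces together, the calculation can be sketched roughly as follows. This is an illustration under stated assumptions, not a definitive implementation: the function names are ours, the index figures (63 and 59 locators in two sampled columns, 38 index columns) and the 10,360-line text total are the Indexing Books figures quoted later in this article, and cross-references are assumed to have been left out of the column counts already.

```python
def estimate_text_lines(lines_on_sampled_pages, text_pages):
    """Average the line counts of the sampled full text pages, times page count."""
    average = sum(lines_on_sampled_pages) / len(lines_on_sampled_pages)
    return average * text_pages

def estimate_locators(locators_in_sampled_columns, index_columns):
    """Average the locator counts of the sampled columns, times column count."""
    average = sum(locators_in_sampled_columns) / len(locators_in_sampled_columns)
    return average * index_columns

def lines_per_locator(text_lines, locators):
    """The L/L density: lines of text per index locator."""
    return text_lines / locators

# Index side: two sampled columns of 63 and 59 locators, 38 columns in all.
locators = estimate_locators([63, 59], 38)

# Text side: the article reports the total directly (10,360 lines), so it is
# used as given rather than rebuilt from per-page samples.
density = lines_per_locator(10_360, locators)

print(f"{locators:.0f} locators, {density:.1f} L/L")
```

A lower figure means a denser index: fewer lines of text go by, on average, before the next locator.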

The L/L Measure In Action

Table 1 depicts the density measurements of 15 books, using both the page % method and the L/L method. Each book has been ranked from most to least dense under each method. Of particular note are the books whose titles are set in boldface. These showed most clearly the differences between methods, which arise from attention to the relationship of typographical elements of the text and index, and to the counting of locators.

Book#  Title                                   Page %  Rank  L/L    Rank
1      A Bride's Passage                       6%      6     2.8    2
2      Chance and Change                       2%      12    10.4   11
3      Chicago Manual of Style, 14th ed.       6%      7     4      4
4      The Creative Priority                   2%      14    12.2   13
5      The First Moderns                       11%     2     3.4    3
6      How to Build and Use Greenhouses        1%      15    20     15
7      In the Spirit of Happiness              4%      9     6.2    8
8      Indexing Books                          7%      5     4.5    5
9      Joy of Cooking                          8%      4     13.6   14
10     Lasso the Wind                          4%      10    6.4    9
11     McGraw-Hill Manual of Style             8%      3     9.7    10
12     The Oxford Dictionary of Quotations     54%     1     1      1
13     The Rat Pack                            2%      11    6.1    7
14     Travels in Alaska                       2%      13    10.4   12
15     What's the Economy Trying to Tell You?  4%      8     4.9    6

Table 1. Comparison of Page % and Line/Locator methods of obtaining an index density measurement, with corresponding rank order (most to least dense).

The most astonishing change in the measure of index density in these books must be that of Joy of Cooking (1967). From a ranking as the 4th densest book in the page % measure, it falls to the next-to-least dense when using the L/L method. A close examination of the text and index reveals why. They are both double-columned and of the same font size and leading–an unusual combination since indexes are nearly always of smaller size and more columns than text. If considered logically, this density (13.6 L/L) makes perfect sense, since the recipes within the book are in the range of about 10-20 lines indicating approximately one locator per recipe.

Other significantly changed books include The Rat Pack, which has a three-column index, accounting for its "increase" in density (more locators per index page), and The McGraw-Hill Manual of Style.
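The size of each book's change in rank can be read straight off Table 1. The short sketch below copies the two rank columns from the table and sorts the books by how far they moved; the variable names are ours, and a positive shift means the book ranks as less dense under L/L than under page %.

```python
# (title, page % rank, L/L rank), copied from Table 1.
table1 = [
    ("A Bride's Passage", 6, 2),
    ("Chance and Change", 12, 11),
    ("Chicago Manual of Style, 14th ed.", 7, 4),
    ("The Creative Priority", 14, 13),
    ("The First Moderns", 2, 3),
    ("How to Build and Use Greenhouses", 15, 15),
    ("In the Spirit of Happiness", 9, 8),
    ("Indexing Books", 5, 5),
    ("Joy of Cooking", 4, 14),
    ("Lasso the Wind", 10, 9),
    ("McGraw-Hill Manual of Style", 3, 10),
    ("The Oxford Dictionary of Quotations", 1, 1),
    ("The Rat Pack", 11, 7),
    ("Travels in Alaska", 13, 12),
    ("What's the Economy Trying to Tell You?", 8, 6),
]

# Rank shift = L/L rank minus page % rank.
shifts = [(title, ll_rank - page_rank) for title, page_rank, ll_rank in table1]

# Largest movements first, regardless of direction.
shifts.sort(key=lambda item: abs(item[1]), reverse=True)

for title, shift in shifts[:4]:
    print(f"{shift:+3d}  {title}")
```

Joy of Cooking and the McGraw-Hill Manual of Style head the list, each falling in rank, while The Rat Pack rises four places.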

The L/L Measure and Precision

The main advantage to the new measurement technique is its precision. One of the ways this can be shown is to take an existing index, add "simulated" locators using the space available, and show how the L/L method would indicate density change, while the page % measurement would always remain the same.

We previously estimated the number of locators in the index for Indexing Books at 2,318. However, if you look at the index, you can see that there is almost always enough space to add locators for each index heading without adding lines or pages to the index. Using the two columns on page 310 as our sample, and adding only one locator per heading, we will have added 65 locators to our locator count.

This would change the locator count in this way:
from: (63 + 59) / 2 = 61; 61 × 38 = 2,318 entry locators
to: (95 + 92) / 2 = 93.5; 93.5 × 38 = 3,553 entry locators

It would then affect the L/L measure in this way:
from: 10,360 lines / 2,318 locators = 4.5 lines/locator
to: 10,360 lines / 3,553 locators = 2.9 lines/locator

There is actually enough "white space" on page 310 in the existing index to add an additional 147 single page locators. This would increase the locators on page 310 to 335. If the entire index reflected this type of change (adding about 147 locators per page), this would further change the L/L density measurement:

to: 10,360 lines / 6,365 locators = 1.6 lines/locator

But, throughout this comprehensive, in-depth re-indexing, the page % measure of this book remains the same (7%).
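The three scenarios above can be recomputed in one pass. This sketch uses the figures given in the text (10,360 text lines, 38 index columns, and 122, 187, and 335 locators on sample page 310); treating page 310 as two columns follows the description of "the two columns on page 310," and the names are ours.

```python
TEXT_LINES = 10_360
INDEX_COLUMNS = 38
COLUMNS_ON_SAMPLE_PAGE = 2  # the sample is "the two columns on page 310"

def density_from_sample_page(locators_on_page):
    """Scale the sample page's locators to the whole index and return (total, L/L)."""
    per_column = locators_on_page / COLUMNS_ON_SAMPLE_PAGE
    total_locators = per_column * INDEX_COLUMNS
    return total_locators, TEXT_LINES / total_locators

scenarios = {
    "as published (63 + 59)": 63 + 59,
    "one locator added per heading (95 + 92)": 95 + 92,
    "white space filled (335 on the page)": 335,
}

results = {}
for label, on_page in scenarios.items():
    total, density = density_from_sample_page(on_page)
    results[label] = (total, round(density, 1))
    print(f"{label}: {total:,.0f} locators, {density:.1f} lines/locator")
```

The index occupies the same number of pages in every scenario, so a page-based percentage cannot distinguish them, whereas the L/L figure moves from 4.5 to 2.9 to 1.6.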

Testing

Test users of the L/L measure report that each measurement for a single book took less than five minutes in all cases. As convenience was a major pre-condition to satisfy, we were pleased to see this. They found the directions easy to use.

Conclusion and Further Study

The L/L density measure is clearly an improvement over the previously accepted methods (page % and line %). While sacrificing a little speed, it adds precision (reflecting the relationship between locators and text) and reliability (by being meaningfully transferable from book to book). The problem of dealing with illustrative material remains. Adding illustrative material indicators (L, M, H) is a poor solution, but no easy solution recommends itself.

The practical application of the measure is the next step in making it accepted and of use to the publishing/indexing community. Here are some potential applications that might benefit indexers and other publishing professionals. Some require more extensive surveys of printed materials using the L/L density measure (which we will share in Part 2 of this article) to develop reliable and accurate averages, while some are available immediately.

Instructors of indexing can use the measure to evaluate the depth of indexing in students' indexes. Standardized, or pre-screened texts will not be necessary, as the measure can be easily applied to any written text (variable-width text on the Internet may present an issue in this regard; the measure at this time is designed for use with books only). Indexers can use the measure when discussing projects with, or seeking consultative advice on projects from, fellow indexers. A standard measure promotes a common language and understanding, so that advice might be targeted specifically to that text.

Knowledgeable editors can specify the desired index density with greater precision when contracting the index. This specificity aids the indexer in writing the index, especially as she will be able to set target locator goals (per page, per chapter) early in the indexing process.

Further study on the relationship between index headings and locators will yield valuable information about index density, both for planning and implementation purposes. For instance, if an editor states she has room for a 10-page indented style index, with 40 lines per column and 2 columns per page, the indexer can figure the approximate maximum number of headings (10 × 2 × 40 = 800). Taking it one step further, a closer approximation has already been proposed by M.D. Law (1970), who has written a nice analysis of the problem in which she "suggest[s] you allow two-thirds or possibly three-quarters as many headings as the maximum." By closely analyzing average locators per heading using a broad and comprehensive sampling of books in a variety of fields and formats, we can make some interesting statements on the approximate number of locators needed, thus improving the indexer's ability to space-plan the index as she proceeds.

Mulvany (1994) provides a convenient table that translates the page percentage measure of index size into a projection of possible entries per page for various types of books. The connection she makes is not entirely well supported and she qualifies this table by stating that "every book must be evaluated on its own terms" (p. 67). More study needs to be conducted with the L/L measure in order to establish average L/L measures for books by field and format. This information could be used to establish indexing costs, time, and standards. More discussion, hypothesizing, and testing in this area would be of great benefit to the quantitative analysis of indexes.
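The space-planning arithmetic in the editor's example (a 10-page index with 2 columns per page and 40 lines per column) is easy to sketch, together with Law's two-thirds to three-quarters allowance. The function names are ours, and rounding the allowance down to a whole heading is our assumption.

```python
def max_headings(index_pages, columns_per_page, lines_per_column):
    """Upper bound on headings: one heading per available index line."""
    return index_pages * columns_per_page * lines_per_column

def law_allowance(maximum):
    """Law's suggested range: two-thirds to three-quarters of the maximum."""
    low = maximum * 2 // 3    # rounded down to whole headings (assumption)
    high = maximum * 3 // 4
    return low, high

ceiling = max_headings(index_pages=10, columns_per_page=2, lines_per_column=40)
low, high = law_allowance(ceiling)

print(f"maximum headings: {ceiling}")
print(f"planning range:   {low} to {high}")
```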

Dan Connolly is a freelance book indexer in Barrington, Rhode Island. He operates Word for Word Book Services and is the leader of the IndexStudents community, a discussion/activity center for indexer training and education. His email address is dan@wfwbooks.com and his website is www.wfwbooks.com.

Cynthia Landeen, Ph.D. is a freelance indexer in Eugene, Oregon. She is the sole proprietor of In.dex.trous, a back-of-the-book indexing service, is the coordinator for the Speaker’s Bureau for the Pacific Northwest Chapter, and is active in the Eugene peer-review community. Her email address is bookindexer@att.net, and her website is currently under construction.

Sources Consulted

Anderson, M.D. Book Indexing. Cambridge University Press, 1971.

Anderson, M.D. "Making an Index to a Specified Length." The Indexer 7, no. 3 (Spring 1971): 121-122.

Law, M.D. "Introduction to Book Indexing." The Indexer 7, no. 2 (Autumn 1970): 46-48.

Mulvany, Nancy C. Indexing Books. Chicago: University of Chicago Press, 1994.