Controlled Vocabularies

The need for controlled vocabularies in medical computing systems is widely recognized. Builders of medical informatics applications need controlled medical vocabularies to support their applications, and it is to their advantage to use available standards. For that to be possible, these standards need to address the requirements of their intended users. Over the past decade, medical informatics researchers have begun to articulate some of these requirements:

1994 Criteria for Controlled Medical Terminology

Domain completeness

No vocabulary can claim complete coverage of the domain of medicine; however, the MED provides complete coverage of its stated domain: terminologies of selected ancillary systems. Domain completeness is possible in theory, since there is no inherent limitation on the size of the network with regard to the number of nodes in the network as a whole, the number of nodes in a class, the depth of a node in the hierarchy, the number of relations in the network, or the number of relations involving any one node. In practice, the present MUMPS implementation has sufficient room for growth to include the terminologies needed for at least several years.


Synonymy

Since we cannot expect users of the vocabulary to remember all of the terms, there must be provision for including synonyms in the vocabulary. Synonymy is present in a straightforward manner in the MED. The top node in the MED, Medical Entity, has the literal attribute "synonym"; every MED term inherits this attribute, which can be filled with alternate names. For example, the 25,510 ICD9 terms include 7,070 synonyms. Synonyms need not be unique. Thus, "MI" can be a synonym of both "myocardial infarction" and "mitral insufficiency".
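The non-unique synonym behavior described above can be sketched as a lookup index. This is only an illustration: the `Concept` class, the integer identifiers, and the `build_synonym_index` and `lookup` helpers are hypothetical names, not part of the MED.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List


@dataclass
class Concept:
    """A vocabulary concept with a preferred name and optional synonyms."""
    concept_id: int  # hypothetical identifier, chosen only for this sketch
    name: str
    synonyms: List[str] = field(default_factory=list)


def build_synonym_index(concepts):
    """Map each lower-cased name or synonym to the ids of the concepts it may denote."""
    index = defaultdict(set)
    for concept in concepts:
        for label in [concept.name, *concept.synonyms]:
            index[label.lower()].add(concept.concept_id)
    return index


def lookup(index, text):
    """Return the sorted ids of every concept that `text` could refer to."""
    return sorted(index.get(text.lower(), set()))


concepts = [
    Concept(1, "myocardial infarction", ["MI"]),
    Concept(2, "mitral insufficiency", ["MI"]),
]
index = build_synonym_index(concepts)

# A shared synonym resolves to more than one concept; a full name resolves to one.
print(lookup(index, "MI"))
print(lookup(index, "mitral insufficiency"))
```

Because a synonym may point at several concepts, a lookup returns a list of candidates rather than a single answer; choosing among them is left to the user or the application.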

Multiple classification

A vocabulary can be a simple collection of all possible terms; however, this can be extremely unwieldy for both retrieval and maintenance. All of the existing vocabularies use some hierarchical classification scheme. A strict hierarchy does not allow a term to belong to more than one class, so it is too inflexible.

Consistency of views

It should be noted that the use of multiple classification can create a problem: by phrasing a query differently, one can reach the same term by several paths. The views obtained along those paths must not be inconsistent with one another.

Explicit relationships

The meanings of inter-concept relationships must be clear. For example, the relationship between staphylococcal pneumonia and pneumonia is differentiated from the relationship between staphylococcal pneumonia and staphylococcus: the former is a class relation and the latter is an etiologic relation.

The above three criteria are inherent features of a semantic network and were therefore achieved by definition in the MED. In particular, the directed acyclic graph model permits multiple classification, so this criterion can be met by the MED.
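A minimal sketch of a semantic network with named relationships may make these three criteria concrete. The `SemanticNetwork` class, the relation name "caused-by", and the parent classes "staphylococcal infection", "lung disease", and "bacterial infection" are assumptions made for illustration; only the pneumonia and staphylococcus examples come from the text.

```python
from collections import defaultdict


class SemanticNetwork:
    """Concepts joined by named, directed relationships."""

    def __init__(self):
        # relations[source][relation_name] -> set of target concepts
        self.relations = defaultdict(lambda: defaultdict(set))

    def relate(self, source, relation, target):
        """Record an explicit, named relationship from source to target."""
        self.relations[source][relation].add(target)

    def targets(self, source, relation):
        """Return the concepts that `source` points to through `relation`."""
        rels = self.relations.get(source)
        return set(rels.get(relation, set())) if rels else set()

    def ancestors(self, concept):
        """Collect every class reachable by following "is-a" links upward."""
        seen = set()
        stack = [concept]
        while stack:
            for parent in self.targets(stack.pop(), "is-a"):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


net = SemanticNetwork()
# Multiple classification: one term, two parent classes.
net.relate("staphylococcal pneumonia", "is-a", "pneumonia")
net.relate("staphylococcal pneumonia", "is-a", "staphylococcal infection")
# Explicit relationship: etiology is kept separate from class membership.
net.relate("staphylococcal pneumonia", "caused-by", "staphylococcus")
net.relate("pneumonia", "is-a", "lung disease")
net.relate("staphylococcal infection", "is-a", "bacterial infection")

print(sorted(net.ancestors("staphylococcal pneumonia")))
print(net.targets("staphylococcal pneumonia", "caused-by"))
```

Because the term is stored once and its classes are computed from the same links, it looks the same whichever parent is used to reach it, which is the sense of consistency of views used here.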

Nonvagueness and Nonambiguity

Concepts in the terminology must be complete in meaning, and vocabulary terms must not be ambiguous, that is, must not refer to more than one concept. If a term is ambiguous, then at least two disparate types of data are stored under the same term, directly affecting query specificity. The MED provided for an expansion of the definition of laboratory tests, in effect making them less vague.


Nonredundancy

There must be no redundancy in the vocabulary. That is, there must be only one way in which each concept can be expressed. Allowing two terms to refer to the same concept will reduce query sensitivity. It was originally believed that redundancy could be detected by comparing the set of semantic relations of a proposed term with the set of each existing term and, when an identical match was found, suggesting that redundancy was present. There are pragmatic reasons why such an approach is impractical. First, since the MED contains semantic information in only limited domains, detecting redundant terms would require a significant scaling up of the present vocabulary maintenance effort. Second, the presence of identical semantic descriptions in the MED turns out to be the method used to detect new classes of terms.
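The originally proposed check, comparing the semantic relations of a proposed term against those of each existing term, can be sketched as a set comparison. The function name `find_possible_duplicates`, the relation names, and the laboratory test entries are all hypothetical and chosen only to show the idea.

```python
def find_possible_duplicates(proposed_relations, existing_terms):
    """Return names of existing terms whose relation set equals the proposed one.

    Each relation is a (relation_name, target) pair. An identical set is only a
    hint: as the text notes, it may instead signal a new class of terms.
    """
    target = frozenset(proposed_relations)
    return sorted(
        name
        for name, relations in existing_terms.items()
        if frozenset(relations) == target
    )


# Illustrative data, not taken from the MED.
existing_terms = {
    "serum glucose test": {("measures", "glucose"), ("specimen", "serum")},
    "serum sodium test": {("measures", "sodium"), ("specimen", "serum")},
}

proposed = {("specimen", "serum"), ("measures", "glucose")}
print(find_possible_duplicates(proposed, existing_terms))
```

The comparison can only be as good as the semantic descriptions behind it, which is why limited semantic coverage makes the approach impractical at scale.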


1998 Desiderata for Controlled Medical Vocabularies


Desideratum I: Content

        Must seek to provide breadth and depth.

        Atoms versus molecules: One approach to increase content is to add terms as they are encountered. An alternative approach is to enumerate all the atoms of the terminology and allow users to combine them into necessary coded terms.

        A formal methodology is needed.

        Structure must not limit size.

Desideratum II: Concept Orientation

        Concepts, not terms

        Terms must correspond to at least one meaning (nonvague), and

        no more than one meaning (nonambiguous) (Higher-level are ambiguous), and

        one concept per meaning (nonredundant)

        May have multiple context-dependent meanings

Desideratum III: Concept Permanence

        Old concepts can't be deleted

             - Example: non-A-non-B hepatitis

        Names can be changed as long as meaning doesn't change (retronyms)

             - Example: transvenous pacemaker.
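One way to picture concept permanence is a store that retires concepts instead of deleting them and lets a name change while the identifier stays fixed. This is a sketch under assumptions: the `Vocabulary` and `ConceptRecord` classes, the identifiers 101 and 102, and the earlier name "pacemaker" are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class ConceptRecord:
    concept_id: int
    name: str
    active: bool = True


class Vocabulary:
    """Stores concepts permanently; there is deliberately no delete method."""

    def __init__(self):
        self._concepts = {}

    def add(self, concept_id, name):
        if concept_id in self._concepts:
            raise ValueError("identifiers are never reused")
        self._concepts[concept_id] = ConceptRecord(concept_id, name)

    def rename(self, concept_id, new_name):
        """Change the name only; the identifier and meaning stay the same."""
        self._concepts[concept_id].name = new_name

    def retire(self, concept_id):
        """Mark a concept obsolete while keeping it for old data."""
        self._concepts[concept_id].active = False

    def get(self, concept_id):
        return self._concepts[concept_id]


vocab = Vocabulary()
vocab.add(101, "non-A-non-B hepatitis")
vocab.add(102, "pacemaker")  # assumed earlier name, for illustration only

vocab.retire(101)                           # obsolete, but still retrievable
vocab.rename(102, "transvenous pacemaker")  # retronym: same concept, new name

print(vocab.get(101))
print(vocab.get(102))
```

Records coded with identifier 101 years ago can still be interpreted, because the retired concept remains in the vocabulary.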

Desideratum IV: Nonsemantic Concept Identifiers

        Don't use the name.

        Don't use a code that will run out of room.

        Don't use a hierarchical code.

        Meaningless integer (+/- check digit).
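A meaningless integer with a check digit can be sketched as a sequential counter plus a checksum. The notes do not say which check-digit scheme is intended; the Luhn algorithm is used below purely as an assumption, and the function names are hypothetical.

```python
import itertools


def luhn_check_digit(number: int) -> int:
    """Compute the Luhn check digit for a non-negative integer payload."""
    total = 0
    # Walk the payload digits right to left, doubling every other digit
    # starting with the rightmost one.
    for position, char in enumerate(reversed(str(number))):
        digit = int(char)
        if position % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return (10 - total % 10) % 10


def with_check_digit(number: int) -> int:
    """Append the check digit to the payload."""
    return number * 10 + luhn_check_digit(number)


def is_valid(identifier: int) -> bool:
    """Verify that the last digit matches the check digit of the rest."""
    payload, check = divmod(identifier, 10)
    return luhn_check_digit(payload) == check


# Identifiers come from a plain counter, so they carry no name or hierarchy,
# and Python integers do not run out of room.
_counter = itertools.count(1)


def new_concept_id() -> int:
    return with_check_digit(next(_counter))


first = new_concept_id()
second = new_concept_id()
print(first, is_valid(first))
print(second, is_valid(second))
```

The check digit only guards against transcription errors; it says nothing about what the concept means or where it sits in any hierarchy.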

Desideratum V: Polyhierarchy

        Almost universal agreement

        Needed for locating concepts through 'tree walking'.

        Needed for inferencing.

        Needed for "essence".

        Example: diseases of the liver which also involve the kidney.

Desideratum VI: Formal Definitions

        Support understanding.

        Support maintenance.

        Structured and controlled (not narrative).

        Represented through relationships within the vocabulary.

        Definitional versus assertional knowledge.

        Additional effort minimal and will pay off.

Desideratum VII: Reject "Not Elsewhere Classified"

        Can never have a formal definition.

        Vocabulary changes induce semantic drift.

        There are valid alternatives.

Desideratum VIII: Multiple Granularities

        Different levels for different purposes.

        Uncertainty is allowed, imprecision is not (we must be precise about our uncertainty).

Desideratum IX: Multiple Consistent Views

        Multiple views for multiple purposes.

        Must not lead to inconsistency.

Desideratum X: Representing Context

        Needed: a grammar to show usage.

        "What is sensible to say".

        Consider modeling "events".

Desideratum XI: Graceful Evolution

        Will always need to fix mistakes

        Medical knowledge will grow

        Bad reasons:

             - Redundancy

             - Major name changes

             - Code reuse

             - Changed codes

        Good reasons:

             - Simple addition

             - Refinement

             - Minor name changes

             - Precoordination

             - Disambiguation

             - Obsolescence

             - Discovered redundancy

Desideratum XII: Recognize Redundancy

        Synonyms are good.

        Redundant concepts are bad.

        Redundant expressions are inevitable.


             - Year 1: "Pneumonia", "Left Lower Lobe"

             - Year 2: "Left Lower Lobe Pneumonia"
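The Year 1 / Year 2 example can be sketched as a normalization step: if the vocabulary records which atoms make up a precoordinated concept, data coded as the combination can be recognized as equivalent to data coded with the single concept. The `normalize` function and the `precoordinated` table are hypothetical, and only exact combinations are handled.

```python
def normalize(expression, precoordinated):
    """Reduce a coded expression to a canonical set of concept names.

    If the atoms of the expression exactly match the recorded composition of a
    precoordinated concept, that single concept is returned instead.
    """
    atoms = frozenset(expression)
    if atoms in precoordinated:
        return frozenset({precoordinated[atoms]})
    return atoms


# Composition recorded when the precoordinated concept is added in Year 2.
precoordinated = {
    frozenset({"Pneumonia", "Left Lower Lobe"}): "Left Lower Lobe Pneumonia",
}

year1_record = ["Pneumonia", "Left Lower Lobe"]  # coded as a combination
year2_record = ["Left Lower Lobe Pneumonia"]     # coded as a single concept

# Both records reduce to the same canonical form, so a query finds both.
print(normalize(year1_record, precoordinated) == normalize(year2_record, precoordinated))
```

The redundant expression is not forbidden; the vocabulary simply needs enough information to recognize it.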

Practical Considerations

    Think "Concept Orientation"

    Have an editorial policy

    Consider explicit definitions

    Nonsemantic identifiers

    Include "is-a" attribute

    Pay attention to maintenance issues

    Include data dictionary in terminology

Attempt to synthesize requirements expressed in the literature for the past decade

Prepared for the IMIA Working Group 6 Jacksonville, Florida, January 1997

Methods of Information in Medicine, 37(4/5), 1998, J.J.Cimino


The use of standard, controlled medical vocabularies for coding patient information is a well-established procedure for U.S. health care providers. The most familiar of these vocabularies are ICD9-CM, UMLS, SNOMED, LOINC, and READ.