Controlled Vocabularies
The need for controlled vocabularies in medical computing systems is widely recognized. Builders of medical informatics applications need controlled medical vocabularies to support their applications, and it is to their advantage to use available standards. For that to be possible, these standards need to address the requirements of their intended users. Over the past decade, medical informatics researchers have begun to articulate some of these requirements:
No vocabulary can claim complete coverage of the domain of medicine; however, the MED provides complete coverage of its stated domain: the terminologies of selected ancillary systems. Domain completeness is possible in theory, since there is no inherent limit on the size of the network with regard to the number of nodes in the network as a whole, the number of nodes in a class, the depth of a node in the hierarchy, the number of relations in the network, or the number of relations involving any one node. In practice, the present MUMPS implementation has sufficient room for growth to include the terminologies needed for at least several years.
Since we cannot expect users of the vocabulary to remember all of the terms, there must be provision for including synonyms in the vocabulary. Synonymy is present in a straightforward manner in the MED. The top node in the MED, Medical Entity, has the literal attribute "synonym"; every MED term inherits this attribute, which can be filled with alternate names. For example, the 25,510 ICD9 terms include 7,070 synonyms. Synonyms need not be unique. Thus, "MI" can be a synonym of both "myocardial infarction" and "mitral insufficiency".
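The non-unique synonym behavior described above can be illustrated with a small lookup table. This is only a sketch under stated assumptions: the class name SynonymIndex and its methods are hypothetical and are not part of the MED or its MUMPS implementation.

```python
from collections import defaultdict


class SynonymIndex:
    """Hypothetical index mapping a synonym to every concept that carries it."""

    def __init__(self):
        # One synonym string may point at several concepts.
        self._by_name = defaultdict(set)

    def add(self, concept, synonym):
        # Store synonyms case-insensitively so "MI" and "mi" match.
        self._by_name[synonym.lower()].add(concept)

    def lookup(self, name):
        # Return all candidate concepts; the caller must disambiguate.
        return sorted(self._by_name.get(name.lower(), set()))


index = SynonymIndex()
index.add("myocardial infarction", "MI")
index.add("mitral insufficiency", "MI")
print(index.lookup("MI"))
```

Because a lookup can return more than one concept, the synonym helps a user find candidate terms, but the stored data should still be coded with a single unambiguous concept.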
A vocabulary can be a simple collection of all possible terms; however, this can be extremely unwieldy for both retrieval and maintenance. All of the existing vocabularies use some hierarchical classification scheme. A strict hierarchy does not allow a term to belong to more than one class, so it is too inflexible. It should be noted that the use of multiple classification can create a problem: by phrasing a query differently, one can follow several paths to the same term.
The meanings of inter-concept relationships must be clear. For example, the relationship between staphylococcal pneumonia and pneumonia is distinguished from the relationship between staphylococcal pneumonia and staphylococcus: the former is a class relation and the latter is an etiologic relation.
The above three criteria are inherent features of a semantic network and were therefore achieved by definition in the MED. The directed acyclic graph model permits multiple classification. Therefore, this criterion can be met by the MED.
Concepts in the terminology must be complete in meaning, and vocabulary terms must not be ambiguous, that is, they must not refer to more than one concept. If a term is ambiguous, then at least two disparate types of data are stored under the same term, directly affecting query specificity. The MED provided for an expansion of the definitions of laboratory tests, in effect making them less vague.
There must be no redundancy in the vocabulary. That is, there must be only one way in which each concept can be expressed. Allowing two terms to refer to the same concept will reduce query sensitivity. It was originally believed that redundancy could be detected by comparing the set of semantic relations of a proposed term with the set of each existing term and, when an identical match was found, suggesting that redundancy was present. There are pragmatic reasons why such an approach is impractical. First, since the MED contains semantic information in only limited domains, detecting redundant terms would require a significant scaling up of the present vocabulary maintenance effort. Second, the presence of identical semantic descriptions in the MED turns out to be the method used to detect new classes of terms.
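A minimal sketch of these ideas follows: typed relations that keep class links separate from etiologic links, multiple classification in a directed acyclic graph, and the comparison of semantic descriptions. The terms, relation names, and function names are illustrative assumptions, not actual MED content. As the text notes, an identical description is only a hint: it may indicate redundancy or a candidate new class.

```python
from collections import defaultdict

# Hypothetical mini-network: term -> set of (relation, target) pairs.
NETWORK = {
    "pneumonia": {("is-a", "lung disease")},
    "staphylococcal infection": {("is-a", "infectious disease")},
    "staphylococcal pneumonia": {
        ("is-a", "pneumonia"),                  # class relation
        ("is-a", "staphylococcal infection"),   # second class: multiple classification
        ("has-etiology", "staphylococcus"),     # etiologic relation, not a class
    },
    "staph pneumonia": {
        ("is-a", "pneumonia"),
        ("is-a", "staphylococcal infection"),
        ("has-etiology", "staphylococcus"),
    },
}


def parents(term):
    """Direct classes of a term: only 'is-a' links count."""
    return sorted(t for rel, t in NETWORK.get(term, ()) if rel == "is-a")


def ancestors(term):
    """All classes reachable through 'is-a' links in the DAG."""
    seen, stack = set(), [term]
    while stack:
        for parent in parents(stack.pop()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def identical_descriptions(network):
    """Group terms whose full sets of semantic relations match exactly."""
    groups = defaultdict(list)
    for term, relations in network.items():
        if relations:  # terms with no description cannot be compared
            groups[frozenset(relations)].append(term)
    return [sorted(group) for group in groups.values() if len(group) > 1]


print(ancestors("staphylococcal pneumonia"))
print(identical_descriptions(NETWORK))
```

Note that "staphylococcus" never appears among the ancestors, because the etiologic relation is kept distinct from the class relation.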
Must seek to provide breadth and depth. Atoms versus molecules: One approach to increasing content is to add terms as they are encountered. An alternative approach is to enumerate all the atoms of the terminology and allow users to combine them into the necessary coded terms. A formal methodology is needed.
Structure must not limit size.
Concepts, not terms. Terms must correspond to at least one meaning (nonvague) and no more than one meaning (nonambiguous; higher-level terms are ambiguous), with one concept per meaning (nonredundant).
May have multiple context-dependent meanings
Old concepts can't be deleted
- Example: non-A-non-B hepatitis
Names can be changed as long as meaning doesn't change (retronyms)
- Example: transvenous pacemaker.
Don't use the name. Don't use a code that will run out of room. Don't use a hierarchical code.
Meaningless integer (+/- check digit).
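As an illustration of a meaningless integer identifier with an optional check digit, the sketch below appends a check digit computed with the Luhn (mod 10) algorithm. The notes do not say which check-digit scheme is intended, so the choice of Luhn and the function names are assumptions made only for this example.

```python
def check_digit(sequence_number):
    """Luhn (mod 10) check digit for a non-negative integer (assumed scheme)."""
    if sequence_number < 0:
        raise ValueError("sequence number must be non-negative")
    total = 0
    # Walk the digits right to left, doubling every second one,
    # starting with the rightmost digit of the payload.
    for position, char in enumerate(reversed(str(sequence_number))):
        digit = int(char)
        if position % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return (10 - total % 10) % 10


def make_identifier(sequence_number):
    """Append the check digit; the result carries no meaning about the concept."""
    return sequence_number * 10 + check_digit(sequence_number)


def is_valid(identifier):
    """True when the last digit matches the check digit of the rest."""
    return identifier >= 0 and identifier % 10 == check_digit(identifier // 10)


code = make_identifier(7992739871)
print(code, is_valid(code))
```

The identifier says nothing about the concept's name or its position in a hierarchy, so a concept can be renamed or reclassified without changing its code; the check digit only helps catch transcription errors.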
Almost universal agreement. Needed for locating concepts through 'tree walking'. Needed for inferencing. Needed for "essence".
Example: diseases of the liver which also involve the kidney.
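The example query above can be sketched with a small "is-a" graph in which a concept may have more than one parent. The integer identifiers, concept names, and classifications below are hypothetical and chosen only to illustrate the idea; they are not drawn from any real vocabulary.

```python
from collections import defaultdict

# Hypothetical concepts keyed by meaningless integers.
NAMES = {
    1: "disease",
    2: "liver disease",
    3: "kidney disease",
    4: "viral hepatitis",
    5: "hepatorenal syndrome",
    6: "glomerulonephritis",
}

# child -> list of parents; concept 5 has two parents (polyhierarchy).
IS_A = {1: [], 2: [1], 3: [1], 4: [2], 5: [2, 3], 6: [3]}


def descendants(root):
    """All concepts below root, found by walking the tree downward."""
    children = defaultdict(list)
    for child, parent_ids in IS_A.items():
        for parent in parent_ids:
            children[parent].append(child)
    found, stack = set(), [root]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def in_both(class_a, class_b):
    """Concepts classified under both classes."""
    return sorted(descendants(class_a) & descendants(class_b))


for concept_id in in_both(2, 3):
    print(concept_id, NAMES[concept_id])
```

In a strict hierarchy the shared concept would have to sit under only one of the two classes, and the query would miss it from the other side.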
Support understanding. Support maintenance. Structured and controlled (not narrative). Represented through relationships within the vocabulary. Definitional versus assertional knowledge.
Additional effort minimal and will pay off.
Can never have a formal definition. Vocabulary changes induce semantic drift.
There are valid alternatives.
Different levels for different purposes.
Uncertainty is allowed, imprecision is not (we must be precise about our
uncertainty).
Multiple views for multiple purposes.
Must not lead to inconsistency.
Needed: a grammar to show usage. "What is sensible to say".
Consider modeling "events".
Will always need to fix mistakes
Medical knowledge will grow
Bad reasons:
- Redundancy
- Major name changes
- Code reuse
- Changed codes
Good reasons:
- Simple addition
Synonyms are good. Redundant concepts are bad. Redundant expressions are inevitable. Example:
Think "Concept Orientation"
Have an editorial policy
Consider explicit definitions
Nonsemantic identifiers
Include "is-a" attribute
Pay attention to maintenance issues
Include data dictionary in terminology
Attempt to synthesize requirements expressed in the literature for the past decade
Prepared for the IMIA Working Group 6, Jacksonville, Florida, January 1997
Methods of Information in Medicine, 37(4/5), 1998, J.J. Cimino
The use of standard, controlled medical vocabularies for coding patient information is a well-established procedure for U.S. health care providers. The most familiar of these vocabularies are ICD9-CM, UMLS, SNOMED, LOINC, and READ.