A Brief History of SNOMED CT - Healthcare Information Technology

SNOMED CT, which stands for the “Systematized Nomenclature of Medicine – Clinical Terms,” is a comprehensive clinical terminology designed for computers and the digital exchange of information. It has been gaining importance since its first release in 2002, and it is required for the certification of electronic health record (EHR) systems in the U.S. for Stage 2 Meaningful Use. It has been adopted by 24 countries as the de facto standard for healthcare information exchange. The World Health Organization is in the process of modifying the structure of ICD-11 so that it can be more tightly integrated with SNOMED CT.

The need for a medical terminology that was designed for computer applications and that could evolve as a multinational standard goes back many years. In the 1990s the GALEN Project was perhaps the largest multinational effort to create a common language of medicine for computational use. In the U.S. the College of American Pathologists developed a precursor of SNOMED CT called SNOMED RT (RT stands for Reference Terminology) that was originally published in the late 1990s. Prior to that the College of American Pathologists had published SNOP, SNOMED, SNOMED II, and SNOMED International. The terminology was developed in large part by members of the College of American Pathologists and later through contributions from Kaiser Permanente West.

SNOMED RT was designed to be a concept-oriented reference terminology with logically defined hierarchies and concept interrelationships. It was developed under tight editorial policies and included a computational process that identified missing logical relationships between concepts and potential errors in the concept modeling process. Some of the challenges with developing SNOMED RT were related to this process, which was referred to as “auto-classification.” The algorithms started from logical relationships that were already in place, weighted toward the higher levels of the hierarchy, and passed these relationships on to subordinate concepts. Thus an accurately modeled concept would assume the qualities of its parent terms in the hierarchy, and the system would identify new potential parent (is_a) and other relationships between concepts. The system was based on an artificial intelligence construct referred to as “Description Logics.” However, any inaccuracies assigned to concepts higher in the hierarchy would also be inherited downstream. Many times these errors were subtle, despite an editing process (referred to as “modeling”) in which two human editors independently reviewed and edited concept relationships. The auto-classification process has the ability to “amplify” modeling errors, making them easier to identify at lower points in the hierarchy.
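The inheritance behavior described above can be illustrated with a small Python sketch. This is not a description logic classifier and not the tooling used for SNOMED RT; it is a toy model that only shows how attributes asserted high in an is_a hierarchy reach every subordinate concept. The concept names, the attribute names, and the helper functions are all hypothetical.

```python
# Toy is_a hierarchy (hypothetical concepts): child -> list of parents.
IS_A = {
    "bacterial pneumonia": ["pneumonia"],
    "pneumonia": ["lung disease"],
    "lung disease": ["disease"],
    "disease": [],
}

# Attributes stated directly on a concept by a human modeler (hypothetical).
STATED = {
    "lung disease": {("finding_site", "lung")},
    "bacterial pneumonia": {("causative_agent", "bacterium")},
}


def ancestors(concept, is_a):
    """Return every concept reachable from `concept` by following is_a links."""
    seen = set()
    stack = list(is_a.get(concept, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, []))
    return seen


def inherited_attributes(concept, is_a, stated):
    """Combine a concept's own attributes with those of all its ancestors."""
    attrs = set(stated.get(concept, set()))
    for ancestor in ancestors(concept, is_a):
        attrs |= stated.get(ancestor, set())
    return attrs


def descendants(concept, is_a):
    """Return every concept that inherits from `concept`."""
    return {c for c in is_a if concept in ancestors(c, is_a)}


if __name__ == "__main__":
    # Simulate a single modeling error near the top of the hierarchy.
    flawed = dict(STATED)
    flawed["lung disease"] = {("finding_site", "heart")}
    for name in sorted(descendants("lung disease", IS_A)):
        print(name, sorted(inherited_attributes(name, IS_A, flawed)))
```

In this sketch, one incorrect attribute on "lung disease" shows up on both of its descendants, which is the sense in which an error is amplified: a subtle mistake at the top becomes an obviously wrong statement on a specific concept lower down.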

SNOMED RT was scheduled for release in the fall of 2000, but the auto-classification process had produced too many errors, which was creating tension between the U.S. and U.K. delegates on the SNOMED International Editorial Board and Authority. In June of 2000, Dr. Michael Stearns, the SNOMED International Director, was tasked with overseeing a large-scale edit of the SNOMED RT hierarchies so they would be in a form suitable for release. Approximately 44 physicians and other domain experts on three continents (North America, Europe and South America) contributed to the effort to identify logical errors between concepts in the SNOMED RT hierarchy. The group focused on the upper levels of the hierarchy, and after several months of intense work the result was acceptable to the SNOMED International editorial board. SNOMED RT was then published and served as one of two precursors of SNOMED CT.

In parallel with the efforts of the College of American Pathologists, the United Kingdom had developed what were referred to as the “Read Codes” after their author, Dr. James Read, a general medical practitioner in Loughborough, UK. The royal colleges of medicine and the British Medical Association contributed to the Read Codes, which became mandatory for use in electronic health records in the UK in 1989. Input from specialty societies helped the Read Codes grow to over 200,000 clinical concepts, covering a wide range of diseases, symptoms, observations, and other medical concepts. Their rich content and mandatory use in the UK made them an attractive candidate to serve as the international terminology standard for healthcare. The name of the Read Codes was changed to “Clinical Terms” following their acquisition by the NHS. Although Clinical Terms was by far the most widely used and field-tested clinical terminology designed for electronic health records, it suffered from some potential shortcomings. The overall technical structure of the Read Codes was not seen (by many but not all) to meet the needs of modern computation-based systems in the way that SNOMED RT and the GALEN Project did.

The College of American Pathologists and the British National Health Service (NHS) decided the best course would be to take advantage of the rich, field-tested content within Clinical Terms and the logical structure of SNOMED RT, and the decision was made in 1999 to merge the two terminologies and form SNOMED CT. The SNOMED International editorial board was joined by members of the NHS, and the “SNOMED International Authority” was formed to provide additional oversight and direction. In 1999 Michael Stearns, MD was recruited from the National Cancer Institute, where he had been developing an ontology for cancer research using the same description-logic-based tools that were in use by SNOMED RT content modelers.

Clinical content and technical committees were then formed, and through a number of international meetings in the U.S. and U.K. a merger strategy was identified and tools were created. This process sorted concepts into three groups: matching concepts, which were then merged; related concepts, whose relationships were then defined; and concepts that required further review. The same group of over 40 subject experts conducted the modeling of hundreds of thousands of medical concepts over a three-year period, and SNOMED CT was released in 2002. It currently has over 330,000 distinct clinical concepts, over 700,000 logical concept interrelationships, and 1.5 million synonyms (referred to as “descriptions” in SNOMED CT).
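The three-way sorting of concepts during the merger can be sketched in Python as follows. The article does not say how the actual merge tools decided that two concepts matched, so the rules here (exact match after normalizing case and whitespace, and shared words as a sign of a related concept) are assumptions chosen only to make the idea concrete. The function names and sample terms are hypothetical.

```python
def normalize(term):
    """Lowercase a term and collapse repeated whitespace."""
    return " ".join(term.lower().split())


def triage(source_terms, target_terms):
    """Sort source terms into matched, related, and needs-review groups.

    Assumed rules (not taken from the actual merge tools):
      - matched: identical to a target term after normalization
      - related: shares at least one word with a target term
      - review:  anything else, left for a human editor
    """
    target_by_norm = {normalize(t): t for t in target_terms}
    matched, related, review = [], [], []
    for term in source_terms:
        norm = normalize(term)
        if norm in target_by_norm:
            matched.append((term, target_by_norm[norm]))
            continue
        words = set(norm.split())
        candidates = sorted(
            original
            for key, original in target_by_norm.items()
            if words & set(key.split())
        )
        if candidates:
            related.append((term, candidates))
        else:
            review.append(term)
    return matched, related, review


if __name__ == "__main__":
    clinical_terms = ["Myocardial  Infarction", "Viral pneumonia", "Gout"]
    snomed_rt = ["myocardial infarction", "Pneumonia", "Asthma"]
    matched, related, review = triage(clinical_terms, snomed_rt)
    print("matched:", matched)
    print("related:", related)
    print("review: ", review)
```

A real merge would need far more than string comparison, since two terminologies can use different words for the same concept and the same words for different ones, which is why the related and review groups still went to human subject experts.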

Since that time it has been made mandatory for use in the United Kingdom for disease reporting by primary care providers. In 2013 the U.S. government required that SNOMED CT be included in EHR systems in order for them to be certified for Stage 2 Meaningful Use. This has markedly increased awareness of SNOMED CT among EHR companies and clinicians in the U.S. As noted above, the World Health Organization is working toward making ICD-11 highly compatible with SNOMED CT.

Challenges with SNOMED CT remain, however. It is a very large terminology, and subsets have been created that allow clinicians to efficiently choose SNOMED CT terms from drop-down menus. Thus far, the majority of implementations do not take advantage of SNOMED CT’s deeper and most valuable features. These include its logical interrelationships to other concepts (referred to as attributes), the ability to store clinical statements as post-coordinated expressions, and the use of the SNOMED CT subset mechanism. The initial use of SNOMED CT will no doubt be limited. As more individuals become familiar with the deeper logic and capabilities of SNOMED CT, it will greatly improve the value of the terabytes of digital health information that are being stored in healthcare systems on a daily basis.
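To make the idea of a post-coordinated expression more concrete, the following Python sketch combines a focus concept with attribute and value pairs and renders them as text. The layout is loosely modeled on the style of SNOMED CT compositional grammar (focus concept, a colon, then attribute = value refinements), but it is a simplified illustration and not a conforming implementation. The class names are hypothetical, and the codes are placeholders, not real SNOMED CT identifiers.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Concept:
    """A coded concept. Codes used below are placeholders, not real identifiers."""
    code: str
    term: str

    def render(self):
        return f"{self.code} |{self.term}|"


@dataclass
class Expression:
    """A focus concept refined by zero or more (attribute, value) pairs."""
    focus: Concept
    refinements: list = field(default_factory=list)

    def refine(self, attribute, value):
        """Add one attribute = value refinement and return self for chaining."""
        self.refinements.append((attribute, value))
        return self

    def render(self):
        if not self.refinements:
            return self.focus.render()
        parts = ", ".join(
            f"{attribute.render()} = {value.render()}"
            for attribute, value in self.refinements
        )
        return f"{self.focus.render()} : {parts}"


if __name__ == "__main__":
    # Placeholder codes for illustration only.
    fracture = Concept("X-001", "Fracture of bone")
    finding_site = Concept("X-002", "Finding site")
    femur = Concept("X-003", "Bone structure of femur")

    expression = Expression(fracture).refine(finding_site, femur)
    print(expression.render())
```

The point of the sketch is that a clinical statement with no single pre-built code can still be recorded from existing concepts, and because the parts remain coded, software can later query on the finding site or the focus concept separately.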

Additional information about SNOMED CT, including links to free full text articles on SNOMED CT, is available here.

For further information or comments please contact Michael Stearns, MD at mcjstearns@gmail.com