HISO 10099 NZ International Patient Summary (NZIPS) draft standard for public comment

I read the NZIPS ‘draft standard’ with an increasing sense of outrage. This is a long post, so you may wish to skip straight to the conclusion and then, once you in turn are very angry (likely with what I’ve said), work through the rest at your leisure.

A very basic critique

  1. The reference ‘standard’ is an ISO document (https://www.iso.org/standard/79491.html) that I cannot read unless I fork out 178 Swiss Francs. So much for any sort of open interoperability.

  2. The HISO 10099 ‘standard’ is presented as a slide deck converted into a PDF: https://consult.health.govt.nz/hiso/hiso-10099-2022-nzips/supporting_documents/hiso10099nzipsdraft20220509.pdf Where is the actual document? If I search on the website, I get nothing: https://www.health.govt.nz/search/results/10099%3A2022. Are our standards now PowerPoint presentations?

  3. HISO 10083:2020 is referenced. Its title is “Accelerating the shift to a fully interoperable digital health ecosystem”. The slide deck (‘slide 4’; are we really citing a standard by slide number?!) notes “This mahi is about establishing a standard for representing and communicating personal health information in an agreed format in terms of its content, structure and coding.”

  4. Let’s think about the requirements for being fully interoperable in communicating health information. The core of clinical decision making is medical science. As we understand medical science in the 21st century, this requires a process of (a) identifying problems and creating explanatory hypotheses; (b) testing these hypotheses (‘models’) adequately; and (c) provisionally accepting successful explanations as ‘true’ so they can be acted upon, with continued vigilance. Now how does the slide deck support this process?

  5. It claims to render the HL7 FHIR International Patient Summary Implementation Guide! But that guide does not itself claim to be anything other than “minimal and non-exhaustive; specialty-agnostic and condition-independent; but still clinically relevant”. Can this constitute a basis for full interoperability?

  6. If our aim is full interoperability, then we need more than just a common data dictionary. We also need to (a) maintain data fidelity; (b) provide provenance; and (c) represent the process of clinical medicine adequately—or at the very least, not misrepresent this process. We also need some sort of international consensus on what data elements must be present. So let’s look at how the IPS implementation guide shapes up.

  7. There are three required sections: Medication Summary, Allergies and Intolerances, and a Problem List. The header has four items (Subject, Author, Attester, Custodian). Everything else is optional (with some ‘recommended’). The page is singularly devoid of useful links: the HL7.org link is a circular reference, and the IPS Wiki link is broken. The document does not in fact provide any sort of global specification—it merely says this is desirable, and then provides a link to the US Core Implementation Guide.

  8. The current version (via the above link) is here, as of 2022-05-13. Its stated objective is to “[define] the minimum set of constraints on the FHIR resources to create the US Core Profiles”. This is a bizarre document. First, note the split into Client and Server capabilities. If you click on the separate links, the only mandatory section under ‘Client’ is 12.1.1.2, which is self-referential! Similarly, under ‘Server’ there is 12.2.1.2, also simply a self-reference. Think about this: everything is optional. Hardly a good start.

  9. With this under our belts, let’s examine references in the core to the required sections. I’ll start with the (US version of) AllergyIntolerance. There are three mandatory components: the patient, a coded representation of what they’re allergic to, and a ‘clinical status’. This last field has three quite sensible options: active, inactive or resolved. The associated rubric defines these respectively as “experiencing/at risk”, “no longer at risk” or “A reaction to the identified substance has been clinically reassessed by testing or re-exposure and is considered no longer to be present…”.

  10. It’s interesting to contrast the above with the slide deck (slides 23–24) where allergies and adverse reactions are discussed. There are four mandatory elements apart from the patient: the substance, a ‘propensity type code’, a ‘propensity agent category code’, and a date (which is confusingly labelled as ‘optional’, despite also being ‘mandatory’).
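Taking the two descriptions in points 9 and 10 at face value, the mismatch can be written down as a simple set comparison. This is only a sketch: the short labels below are names I have made up for illustration, not official element names from either document.

```python
# Mandatory allergy elements as described in points 9 and 10 above.
# The labels are illustrative assumptions, not official element names.
US_CORE_MANDATORY = {"patient", "substance", "clinical_status"}
NZIPS_DECK_MANDATORY = {
    "patient",
    "substance",
    "propensity_type_code",
    "propensity_agent_category_code",
    "date",
}


def compare_mandatory(a: set[str], b: set[str]) -> dict[str, set[str]]:
    """Return the elements shared by both sets and those unique to each."""
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


diff = compare_mandatory(US_CORE_MANDATORY, NZIPS_DECK_MANDATORY)
for key in ("shared", "only_a", "only_b"):
    print(key, sorted(diff[key]))
```

On this reading, only the patient and the substance are mandatory in both.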

Well, there goes international harmonisation. But I have bigger fish to fry.

What should be in the core?
Quite apart from the seeming lack of international harmonisation apparent from the above, there are several issues with the specification. The fundamental question here is:

As a clinician, what do I need in order to make a well-reasoned decision?

I’d suggest that for each data element I’m presented with, I need sufficient information that is also trustworthy. If I’m given untrustworthy data about say “penicillin allergy” then I may kill the patient (the anaphylaxis was falsely flagged as ‘resolved’) or give suboptimal therapy (the patient is no longer allergic to penicillin, which I avoided inappropriately). I’d also prefer not to be confused by irrelevant or conflicting detail.

In other words, core components of a penicillin allergy include not just a substance and an (optional!) certainty, but also the nature, severity, precisely who made the assertion, and their basis for making the assertion. I’d suggest that for any clinical datum, we need:

  1. The nature of the condition (What’s the problem?)
  2. Its severity (How bad is it?)
  3. Associated meta-data: Who made the assertion? Where? When? (Can I trust the source?)
  4. How sure were they? Why?
  5. How is this datum (a) supported and (b) disputed?
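As a rough sketch of what these five requirements might look like as a data structure, here is one possibility in Python. Every class name, field name and the 0 to 1 certainty scale is an assumption of mine for illustration; none of it comes from NZIPS, IPS or FHIR.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Assertion:
    """Who made the assertion, where, when, how sure they were, and why."""
    asserter: str
    location: str
    asserted_on: date
    certainty: float  # assumed scale: 0.0 (no confidence) to 1.0 (certain)
    basis: str        # e.g. "observed reaction" or "patient report"

    def __post_init__(self) -> None:
        if not 0.0 <= self.certainty <= 1.0:
            raise ValueError("certainty must lie between 0.0 and 1.0")


@dataclass
class ClinicalDatum:
    nature: str            # 1. What's the problem?
    severity: str          # 2. How bad is it?
    assertion: Assertion   # 3 and 4. Source, certainty and basis
    supported_by: list[str] = field(default_factory=list)  # 5(a)
    disputed_by: list[str] = field(default_factory=list)   # 5(b)

    def is_contested(self) -> bool:
        """True if any evidence disputing this datum has been recorded."""
        return bool(self.disputed_by)


# Hypothetical example record; all values are invented.
datum = ClinicalDatum(
    nature="penicillin allergy",
    severity="severe",
    assertion=Assertion(
        asserter="Dr Example",
        location="Example Hospital",
        asserted_on=date(2022, 5, 13),
        certainty=0.9,
        basis="observed reaction",
    ),
)
print(datum.is_contested())
```

The point of the sketch is that provenance, certainty and supporting or disputing evidence travel with the datum itself, rather than being optional extras.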

As I’ve already noted, partial or untrustworthy data may mislead the clinician and cause harm.

Implications—and a deeper dive
The above analysis has substantial implications for the entire project of establishing a patient summary. These are:

  • A summary must adequately represent each of the core data elements it contains.
  • All clinical data elements share a lot in common.
  • Failure to accommodate these two points invites harm to patients.

With this in mind, let’s move to the NZ FHIR “Sandbox” that is intended to demonstrate implementation of such a clinical record. I’ll start with the Allergy/Intolerance specification.

The resource appears fairly comprehensive in what it describes: an identifier, clinicalStatus, verificationStatus, type, category, criticality, code, patient, encounter, onset, recordedDate, recorder, asserter, lastOccurrence, note and reaction. Of substantial clinical relevance are the primary source of the information (asserter); our old friends clinicalStatus (active|inactive|resolved) and verificationStatus (unconfirmed|confirmed|refuted|entered-in-error); type (allergy|intolerance); ‘criticality’ (low|high|unable-to-assess); and a SNOMED CT code that references the substance. The reaction field is complex: it again represents the substance, plus one or more manifestations, each in turn with an AllergyIntoleranceReactionGpsUvIps field that seems constrained to one of 24 possible reactions, and a severity (mild|moderate|severe).
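To make the coded fields concrete, here is a small Python sketch that checks a record against the value lists quoted above. Note the assumptions: the record is a deliberately flattened dictionary rather than real FHIR JSON (where several of these fields are nested coded structures), the function name is mine, and the SNOMED CT codes are left as placeholders rather than guessed.

```python
# Allowed values, exactly as listed in the paragraph above.
ALLOWED = {
    "clinicalStatus": {"active", "inactive", "resolved"},
    "verificationStatus": {"unconfirmed", "confirmed", "refuted", "entered-in-error"},
    "type": {"allergy", "intolerance"},
    "criticality": {"low", "high", "unable-to-assess"},
}
SEVERITIES = {"mild", "moderate", "severe"}


def check_record(record: dict) -> list[str]:
    """Return a list of problems found in a simplified allergy record."""
    problems = []
    for field_name, allowed in ALLOWED.items():
        value = record.get(field_name)
        if value is not None and value not in allowed:
            problems.append(f"{field_name}: unexpected value {value!r}")
    for i, reaction in enumerate(record.get("reaction", [])):
        severity = reaction.get("severity")
        if severity is not None and severity not in SEVERITIES:
            problems.append(f"reaction[{i}].severity: unexpected value {severity!r}")
    return problems


# Hypothetical, flattened example; codes are placeholders, not real SNOMED CT.
example = {
    "clinicalStatus": "active",
    "verificationStatus": "confirmed",
    "type": "allergy",
    "criticality": "high",
    "code": "<SNOMED CT code for the substance>",
    "asserter": "Dr Example",
    "reaction": [
        {"manifestation": "<SNOMED CT code for the reaction>", "severity": "severe"},
    ],
}
print(check_record(example))
```

A check like this confirms only that each value comes from the permitted list. It says nothing about whether the combination of values makes clinical sense.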

How does this gel with my stated requirement above? Let’s say it again:

As a clinician, what do I need in order to make a well-reasoned decision?

I seem able to describe e.g. a severe (life-threatening?) allergic reaction to penicillin on a specific date and code this as anaphylaxis [disorder]. It seems feasible for me to identify other individuals who have similarly coded such a reaction. There is, however, also a fair bit of incoherence here:

  • We have both ‘criticality’ (low|high|?) and ‘severity’ (mild|moderate|severe). Oh! One refers to specific incidents, the other is ‘overall’. But how do we compose this?
  • We have a number of fields that seem redundant. If I know the nature of the substance, then why do I also require an AllergyIntoleranceCategory?
  • What if, as often happens, I’m unsure which box to check: ‘allergy’ or ‘intolerance’?
  • What if my reaction doesn’t fit into the 24 pre-coded options?
  • The 24 options are also incoherent. They are a heady mix of patient-reported symptoms (‘tight chest’), clinical findings (‘bronchospasm’), diagnostic labels (‘Stevens-Johnson syndrome’, ‘vasculitis’) and clinical syndromes (‘anaphylaxis’).
  • Specifically, urticaria, bronchospasm and hypotension (Oops! No hypotension!!) are themselves features of anaphylaxis, yet they sit in the same list alongside it. And why do we have ‘Weal’ and ‘Urticaria’ specified separately?

This taxonomic nightmare has two main implications:

  1. Coding will be incoherent and incomplete
  2. Clinicians will be confused.

It may be possible to tease out reason from the structure—but why not make a more well-reasoned structure in the first place?

A well-reasoned alternative
It’s easy to criticise, but for any criticism to be of value, it needs to supply a solid alternative. I could in fact delve even deeper into a multitude of other issues with the NZIPS proposal, but many of them are encapsulated above. For example, the section on PROBLEMS (slides 19–20) shows many of the same ‘Allergy/ADR’ issues already discussed. Instead, let’s look at two things: first, a deeper underlying issue, and then a solution.

The core of the deeper issue can be concisely stated as follows:

The message is not the data model.

In other words, it is inappropriate to equate “a set of messages” with “a comprehensive data model”. FHIR seems deliberately ambiguous about this. The documentation states:

The philosophy behind FHIR is to build a base set of resources that, either by themselves or when combined, satisfy the majority of common use cases. FHIR resources aim to define the information contents and structure for the core information set that is shared by most implementations.

On the one hand, FHIR can be seen as a wrapper for HL7 v3 and its reference information model; but the above ‘pragmatic’ statement deliberately shies away from an underlying data model. Nope, it’s just a set of resources. (Perhaps to skirt around the failure of HL7 v3 owing to issues of complexity?)

This observation cuts to the heart of the problem with the NZIPS specification. Implicit in the entire document is the assumption that the FHIR ‘model’ (a messaging model) somehow specifies the data model—without a model ever being apparent! And the use of ‘FHIR’ to ‘glue’ disparate databases together (however basic or sophisticated this is) also seems to fly in the face of it representing a data model. It’s just ‘resources’.

It is correct that there must be ‘impedance matching’—the messaging system must be adequate to convey information without data loss or colouration—and the databases or other communicating entities must have adequate internal representation of data on both ends, or information will be lost. There is also potentially a ‘lowest common denominator’ problem here.

But tacitly assuming that a messaging model defines the data model is wrong in several ways. Let’s look at what a data model requires:

  1. It should be normalized. In the 1970s Codd originated data normalization, epitomised in third normal form. Although subsequent attempts have been made to formalise ‘NoSQL’, these tend not to last (cf. Hadoop’s current issues) and when properly implemented, are still built around relational models with subsequent, careful denormalisation. A corollary of normalization is that the sort of redundant/conflicting design we’ve explored above should be prevented.

  2. It should be contextually relevant. There are often several ways that a relational structure can be built, but failure to understand the context will result in inefficiencies that are often order-of-magnitude (or greater) problems.

  3. It should be representationally adequate. I’ve already described several key features of scientific medicine above—including the ability to make, test and provisionally accept hypotheses, with a clear representation of provenance. A large part of this embodies causal assertions, and their testing. I have struggled to find even a hint of this in the FHIR model—and no wonder. It’s just messaging, at the end of the day. But it’s extremely complex messaging.

  4. It must be amenable to extension, while maintaining backward compatibility—but if the design is solid, this should almost never be required. Tacking on a table or even revising a structural component is usually an admission that the basic design was bad. The bones must be solid. (This is a huge issue with hierarchical structures that is often brushed under the carpet—similar to the way FHIR invokes its 80/20 rule).
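To make the normalization point in item 1 concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The table and column names are my own assumptions, not part of any standard. The idea is that a substance’s category is stored once, against the substance, so that it cannot disagree from one allergy record to the next (the redundancy I complained about earlier with AllergyIntoleranceCategory).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE substance (
    substance_id INTEGER PRIMARY KEY,
    name         TEXT NOT NULL UNIQUE,
    category     TEXT NOT NULL            -- stored once, here only
);
CREATE TABLE allergy_record (
    record_id    INTEGER PRIMARY KEY,
    patient_id   TEXT NOT NULL,
    substance_id INTEGER NOT NULL REFERENCES substance(substance_id),
    asserter     TEXT NOT NULL,
    recorded_on  TEXT NOT NULL
);
""")

# Hypothetical data: one substance, two patients' records that refer to it.
substance_id = conn.execute(
    "INSERT INTO substance (name, category) VALUES (?, ?)",
    ("penicillin", "medication"),
).lastrowid
conn.executemany(
    "INSERT INTO allergy_record (patient_id, substance_id, asserter, recorded_on) "
    "VALUES (?, ?, ?, ?)",
    [
        ("patient-A", substance_id, "Dr Example", "2022-05-13"),
        ("patient-B", substance_id, "Dr Sample", "2022-06-02"),
    ],
)

# The category is derived by a join, never copied onto each record.
rows = conn.execute("""
    SELECT r.patient_id, s.name, s.category
    FROM allergy_record AS r
    JOIN substance AS s USING (substance_id)
    ORDER BY r.record_id
""").fetchall()
print(rows)
```

A message format may well flatten this join into a single payload for transport. That is fine, as long as nobody mistakes the flattened payload for the model.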

It can be debated at length whether FHIR is up to its task—as a messaging protocol.

What it is not is a data* model. And it is extremely silly to try to amalgamate a lot of complex messages into something that behaves like a data model. Yet this seems to be the intention on which NZIPS is predicated.

There is thus a huge hole here. What is needed is a specification of the underlying data structures that “can be messaged between”. A subsidiary requirement is then to determine whether FHIR is up to the task.

Without such a data model—which I must again emphasise, is distinct from a messaging model, however complex the latter might be—it is unreasonable to expect any such attempts to add much value. It is likely that they will harm clinical medicine, especially in the long term.

What should this model look like then?

I’ve already mentioned core components above, emphasising a normalized relational structure with unambiguous representation of clinical reasoning and uncertainty, and with built-in causal inference. The ‘core’ should be adequate in terms of the who, what, where, when, why and how of crucial data elements, rather than butchering or mutilating these.

Happy to discuss this in more detail, but I’ve already gone on too long. I haven’t even talked about how we can anticipate that Bayesian reasoning will (or should) become predominant in the next few decades—and how inadequately the FHIR model will support this. I’ve hardly touched on the “fat-tail” problem with the 80/20 rule.

Conclusion

  1. Any good standard should be well-specified and open source. This isn’t.
  2. It is unreasonable to assume that a messaging standard tacitly defines a relational data model. This puts the cart before the horse. Even more dangerous is embracing an implicit, complex and wildly denormalized ‘model’.
  3. Clinicians require clinically adequate representation of core data elements, in order to do their job. This standard doesn’t provide these.
  4. There seems to be gross dysharmony between the NZIPS and the admittedly rather vague IPS, as emphasised when comparing it with the US version.
  5. Messaging should be simple. Building baroque complexity into messaging results in a multiplicity of issues—well illustrated above.

Clinicians will bear the brunt of this inadequate enterprise.

My 2c, Dr Jo.


*† Subsequent to comments by @pkjordan, I modified the text at these two points to clarify that I’m talking about data models rather than their instantiation (2/6/2022 19:36).