Databases for health

From what I’ve seen, the only thing driving one database type over another in the world of EHR/PAS/other health systems is the sales team behind the product. Almost every commercial health IT product on the market is old software whose design decisions were made many years ago, long before FHIR, and sometimes even HL7, was conceived. Most are closed systems where the customer (us) was never intended to have much access to the core database.
However, we like to have full database access, as this allows us to enhance functionality where the product is lacking (where contractually permitted), and to understand and transfer the raw data so that it can be stored usefully in our data warehouses and analysed by the wider health sector.

For example, the FHIR-based system I was involved with deploying recently used a NoSQL back end purely because the (non-health) vendor suggested it would be easier to implement with FHIR - and it was, from their perspective, because they didn’t need to understand our (highly normalised) data in order to build the database schema. Later in the project it dawned on them that they couldn’t present the data we needed without building relations within the NoSQL database - thereby duplicating data many times over (to create pseudo-relationships in an inherently non-relational schema) and adding workload to many routine processes (e.g. to ensure that each instance of duplicated data is maintained appropriately). This extra complexity comes with a small ongoing financial cost for the extra disk space and processing time required daily, doesn’t scale well, and negates the efficiency benefits of a non-relational database.
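The maintenance burden of that duplication can be sketched in a few lines. This is a hypothetical illustration (table and column names are invented, and SQLite stands in for both kinds of store): the relational design holds the patient’s name once, while the document-style design embeds a copy in every encounter document, so a single name change leaves every copy stale until each one is separately rewritten.

```python
import sqlite3, json

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Relational design: the patient's name is stored exactly once.
cur.execute("CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE encounter (id INTEGER PRIMARY KEY, "
            "patient_id INTEGER REFERENCES patient(id), note TEXT)")
cur.execute("INSERT INTO patient VALUES (1, 'Jane Doe')")
cur.executemany("INSERT INTO encounter VALUES (?, 1, ?)",
                [(i, f"visit {i}") for i in range(100)])

# Document-style design: the name is embedded (duplicated) in every document.
cur.execute("CREATE TABLE encounter_doc (id INTEGER PRIMARY KEY, doc TEXT)")
cur.executemany("INSERT INTO encounter_doc VALUES (?, ?)",
                [(i, json.dumps({"patient_name": "Jane Doe", "note": f"visit {i}"}))
                 for i in range(100)])

# A name change touches one row in the relational design...
cur.execute("UPDATE patient SET name = 'Jane Smith' WHERE id = 1")

# ...but every duplicated copy must be found and rewritten in the document design.
stale = cur.execute(
    "SELECT COUNT(*) FROM encounter_doc WHERE doc LIKE '%Jane Doe%'"
).fetchone()[0]
print(stale)  # 100 documents still hold the old name
```

The "ensure each instance of duplicate data is maintained" workload mentioned above is exactly the second UPDATE pass this design would need.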

1 Like

Thanks for being the voice of reason/friendly bucket of coolish water here Peter, as usual.

In my analysis of the meeting up of specifications, implementations, and user-facing functionality I always have the Reference Model for Open Distributed Processing - RM-ODP in the back of my mind. It’s an ISO/ITU-T set of specifications from the 1990s, and aside from its insistence on being “object oriented”, its main conceptual approach to the post-networked computing world still provides the same analytical power today, in my view.

Its first question is “From what viewpoint are you viewing the system?”
[These are my mental encapsulations of them - more formal wording & perhaps a different emphasis is provided via the links below]

Enterprise (what do the people and organisations need to communicate about),

Information (data structures and standards for data representation and interchange, derived from the Enterprise requirements),

Computational (the APIs, for specific purposes, facing outwards from systems that support the transfer and use of the information across the Enterprise),

Technology (what technology stacks are available, affordable, understood, or already in use, and what do you need to build bespoke),

Engineering (how do I structure the existing and new systems together, down in the computers and networks - or up in the cloud - for security and efficiency, and what kinds of data conversion are needed when the information flows from one system to another).

Most of the time, most of the people who need a distributed system should be working in the Enterprise viewpoint, and should not need to care about much more than the Information, and sometimes the Computational, viewpoints. The Enterprise viewpoint also allows structuring for roles, processes and the like.

The Technology and Engineering viewpoints should be “transparent” to Enterprise stakeholders (in RM-ODP language), meaning, not visible or relevant to any but those who are responsible for designing the gnarly tangle of integration, interoperation, location and platforms that make up the system produced for the Enterprises’ well-specified needs.

(The RM-ODP main site seems to have HTTPS certificate issues at the moment: https://www.rm-odp.net/ - it works if you tell your browser not to worry - and Wikipedia has a good overview: https://en.wikipedia.org/wiki/RM-ODP).

This is a highly developed, technology-neutral way of talking about distributed enterprise systems that encapsulates the 1st year computer science dictums: separation of concerns, modularity, and information hiding.

Keith

1 Like

Hi Hamish

The ‘obvious’ solution—that will never happen—is a relatively simple, properly normalised relational model that is open source and free,* mandated, based on well-reasoned standards (or where these standards have a huge amount of batshit craziness in them, like SNOMED CT, a sensible subset) and a common data dictionary, and that is very loosely coupled to the functional layer above it.

Oh, it may happen in about 50 years time, after everyone has repeated NPfIT a dozen times, and blown uncountable billions out of their bottoms for the very reasons explored in the above discussion :frowning:

Cheers, Jo.

*Free as in GPL, not ‘free beer’.

2 Likes

Hi Jo, all,

I know you have probably done some reasonable normalisation of some health data models into RDBMS table schemas in the past. But if you did database studies in 1980s Computer Science courses (and I don’t think they even teach this now) you’ll know that normalisation is a mathematical concept for getting the right trade-off between low data-storage redundancy and low cost of joining tables, while keeping the cardinality constraints you need. You can choose, from lowest join cost to highest: unnormalised, 3rd Normal Form, Boyce-Codd Normal Form, 4th or 5th Normal Form… or you can just wing it and risk getting redundant copies of data that are not replicated correctly, or forcing an unnecessary join every time you make a query. But if you start with a logical model of the data and its relationships, you can use a methodology, or an algorithm (implemented by tools), to split the set of relations you want to store into tables with uniqueness constraints and foreign keys into related tables (or map an Object-Oriented model to relations, which is now fairly well established, and is part of the reason people no longer teach or learn the proof criteria for the Normal Forms).
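As a concrete (and deliberately tiny) illustration of that trade-off, here is a hypothetical drug/prescription example, with all names invented: in the flat table the drug-to-class fact is repeated once per patient, while the 3NF decomposition stores it once, at the price of a join to reassemble the original row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Unnormalised: drug_class depends only on drug, so it is repeated
# for every patient who receives that drug (redundancy + update anomalies).
cur.execute("CREATE TABLE rx_flat (patient TEXT, drug TEXT, drug_class TEXT)")
cur.executemany("INSERT INTO rx_flat VALUES (?, ?, ?)", [
    ("Ann", "amoxicillin", "penicillin"),
    ("Bob", "amoxicillin", "penicillin"),
    ("Cat", "amoxicillin", "penicillin"),
])

# 3NF decomposition: the functional dependency drug -> drug_class gets its
# own table, so each fact is stored exactly once.
cur.execute("CREATE TABLE drug (name TEXT PRIMARY KEY, drug_class TEXT)")
cur.execute("CREATE TABLE rx (patient TEXT, drug TEXT REFERENCES drug(name))")
cur.execute("INSERT INTO drug VALUES ('amoxicillin', 'penicillin')")
cur.executemany("INSERT INTO rx VALUES (?, ?)",
                [("Ann", "amoxicillin"), ("Bob", "amoxicillin"), ("Cat", "amoxicillin")])

# The price of normalisation: a join is now needed to reassemble the row.
rows = cur.execute(
    "SELECT rx.patient, rx.drug, drug.drug_class "
    "FROM rx JOIN drug ON rx.drug = drug.name"
).fetchall()
print(len(rows))  # 3 reassembled rows, but 'penicillin' is stored only once
```

Which side of that price you want to pay is exactly the choice the normal forms parameterise.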

I was first exposed to a heuristic method (NIAM) for mapping fact-based models to tables to create an asserted “Optimal Normal Form”, which was not rigorously defined, but could be placed between 3rd and 4th Normal Form, which is where most database experts aim in order to balance the cost of too many table joins against the need for fully populated rows. The next year I was shown how to prove that a schema met any normal form’s criteria. This is something I have never used in practical database design: the utility of expressing data schemas as ER diagrams, UML class diagrams, or ORM, and then re-using a table-generation algorithm which guarantees those properties, is so high that I wouldn’t want to do it by hand ever again.

In short: there is no “properly normalised” data schema. Sometimes you want a highly unnormalised schema (one table for everything) for real-time retrieval benefits; this requires external verification of the cardinalities/uniqueness that could otherwise be encoded into the table design if you allowed for joins. Sometimes (usually for research purposes) you want 5th Normal Form, where every row in every table is fully populated, but you have a lot more tables and have to do several joins to get at all the data you want (and write longer and more complex SQL queries).

It’s tempting, when you start out having to use the SQL hammer directly, to treat every data modelling problem as a table-design nail. I think the successor of NIAM and Optimal Normal Form - Object Role Modelling - is probably still the easiest tool for domain people to use to get visual models mapped appropriately into normalised table designs. ORM is still out there (initially hidden in Visio, and later in Microsoft’s database design tooling), and was developed and taken into industry by Terry Halpin, based on the foundations of NIAM. But ER- or OO-to-table design automation is widely available out of the box, and then you don’t have to risk making the mistakes you may be prone to if you don’t know how to mathematically prove which Normal Form you’re producing.

cheers,
Keith

2 Likes

I added the :exploding_head: reaction specifically for your post! The more I learn about databases, the less I realise I truly understand…

1 Like

A lot of implementations now use DB Views generated by pre-optimized Table-Valued Functions or Stored Procedures invoked by Application DB Access Layers and converted to Business Layer Classes via Object Relational Mappers (ORMs). This enables a separation of concerns that leaves DB design and optimization to the technical experts who require deep knowledge of the relevant DBMS tooling.
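A minimal sketch of that layering, using SQLite and a hand-rolled mapper in place of a full DBMS and ORM (all table, view and class names here are invented): the DB expert owns the view, and the application only ever talks to a data-access function that returns business-layer objects.

```python
import sqlite3
from dataclasses import dataclass

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE encounter (id INTEGER PRIMARY KEY, patient_id INTEGER, ward TEXT);
    INSERT INTO patient VALUES (1, 'Jane Doe');
    INSERT INTO encounter VALUES (10, 1, 'ED'), (11, 1, 'Cardiology');

    -- The DB expert owns and optimises this view; the application
    -- never sees (or depends on) the base tables.
    CREATE VIEW v_patient_encounters AS
        SELECT e.id AS encounter_id, p.name AS patient_name, e.ward
        FROM encounter e JOIN patient p ON p.id = e.patient_id;
""")

@dataclass
class PatientEncounter:  # business-layer class, mapped from the view's rows
    encounter_id: int
    patient_name: str
    ward: str

def get_encounters(patient_name: str) -> list[PatientEncounter]:
    """Data-access layer: queries the view, returns business objects."""
    rows = conn.execute(
        "SELECT encounter_id, patient_name, ward FROM v_patient_encounters "
        "WHERE patient_name = ? ORDER BY encounter_id",
        (patient_name,),
    ).fetchall()
    return [PatientEncounter(*row) for row in rows]

print([e.ward for e in get_encounters("Jane Doe")])  # ['ED', 'Cardiology']
```

If the DB expert later restructures the base tables, only the view definition changes; the access layer and business classes are untouched.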

Therefore, IMO, this is not a health informatics question per se, although obviously of interest to the more technically-minded members of this community. How information is organized at the Application Business Layer and presented by UIs is far more significant.

1 Like

As usual, Peter makes a more pertinent, and diplomatic response.

I felt like a pious curmudgeon about 10 minutes after posting that screed… I’ll try and pause before hitting the send button next time.

cheers,
K

1 Like

I think you’re being a little hard on yourself there, @keithduddy. I always enjoy your sagacious contributions.

From an informatics perspective, much depends on how one wishes to query and analyze data and whether that method is tethered to the persistence format. Filtering data is generally best done at the DB layer (e.g. by SQL variants & objects in the case of RDBMS) and then structuring the data that is returned into a format that aids transformation to application layer information models and use by domain-specific languages. A good example of that is the SNOMED CT distribution which many of us persist in a RDBMS, but transform into a format that facilitates ECL queries.
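A toy sketch of that split, with an invented miniature hierarchy standing in for a real terminology release: the set-oriented filtering happens at the DB layer in SQL, and the returned rows are then restructured into an in-memory form suited to subsumption-style queries (roughly what ECL’s “<<” descendant-or-self operator needs).

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
# A toy "is-a" relationship table, standing in for a terminology release
# persisted in an RDBMS (codes and hierarchy are invented for illustration).
conn.execute("CREATE TABLE isa (child TEXT, parent TEXT)")
conn.executemany("INSERT INTO isa VALUES (?, ?)", [
    ("bacterial_pneumonia", "pneumonia"),
    ("viral_pneumonia", "pneumonia"),
    ("pneumonia", "lung_disease"),
])

# Step 1: filter at the DB layer (cheap, set-oriented).
rows = conn.execute("SELECT child, parent FROM isa").fetchall()

# Step 2: restructure into an application-layer form suited to
# subsumption queries, which row-at-a-time SQL handles awkwardly.
children = defaultdict(set)
for child, parent in rows:
    children[parent].add(child)

def descendants_or_self(code: str) -> set[str]:
    """Roughly what an ECL '<< code' query needs to answer."""
    result = {code}
    for c in children[code]:
        result |= descendants_or_self(c)
    return result

print(sorted(descendants_or_self("pneumonia")))
# ['bacterial_pneumonia', 'pneumonia', 'viral_pneumonia']
```

The persistence format (RDBMS rows) and the query format (an in-memory hierarchy) serve different purposes, which is exactly the untethering described above.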

1 Like

All the more reason to employ fit-for-purpose standards and methods for storing and managing healthcare data. IMHO, one shouldn’t need to go down to the physical persistence level for data querying, filtering, linking etc. Just pick a solution that offers standardised APIs for storage and querying - FHIR and openEHR. With the latter you get your entire EHR in a vendor-neutral and future-proof manner, and use FHIR to exchange data with the (huge number of) non-openEHR systems - Ref. Plus you can bind codes from a terminology of choice (e.g. SNOMED, LOINC) to certain data elements of interest, which allows for querying based on terminology. If there’s any database operation that you cannot perform via the APIs, the API specs need to be extended and implemented. Besides, touching data at the database level, if not careful or tool-supported, may end up in inconsistencies (e.g. broken referential integrity).

@atalagk APIs do not represent the Data Access Layer of distributed software applications. APIs are software contracts that express capabilities executed at the underlying business layers of an application. If data is persisted in a database then requests have to be made to that DB and they will be executed by the RDBMS. Whether that is done by dynamically generated queries or calls to DB programmability objects, such as stored procedures, is a software design concern - but to provide a clue, numerous tests over the years have shown that, if performance is a concern (and when is it not?), facilitating stored execution plans on the DB server is a no-brainer.
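SQLite has no stored procedures, but the underlying principle - one stable SQL text whose compiled statement (and, on a full RDBMS, its stored execution plan) can be reused, versus a freshly concatenated string that must be re-parsed on every call - can be sketched like this. The schema and the LOINC-style codes are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (patient_id INTEGER, code TEXT, value REAL)")
conn.executemany("INSERT INTO obs VALUES (?, ?, ?)",
                 [(1, "8480-6", 120.0), (1, "8462-4", 80.0), (2, "8480-6", 135.0)])

# Anti-pattern: a fresh SQL string every call forces a re-parse
# (and invites SQL injection).
def bp_dynamic(pid: int):
    return conn.execute(
        f"SELECT value FROM obs WHERE patient_id = {pid} AND code = '8480-6'"
    ).fetchall()

# Parameterised: one SQL text, so the driver can cache and reuse the
# compiled statement - analogous to a stored procedure's cached plan.
SYSTOLIC = "SELECT value FROM obs WHERE patient_id = ? AND code = '8480-6'"
def bp_prepared(pid: int):
    return conn.execute(SYSTOLIC, (pid,)).fetchall()

print(bp_prepared(1))  # [(120.0,)]
print(bp_prepared(2))  # [(135.0,)]
```

On a server RDBMS the same design decision shows up as stored procedures or prepared statements versus dynamically generated SQL, with the performance implications noted above.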

Hi @pkjordan, thanks for the feedback. I can’t say I don’t agree with you on this. Though some (including myself) would argue that modern cloud-based software applications using microservices-based architectures do use APIs to do all sorts of things with good performance, scalability and reliability. It really boils down to what we mean by “databases for health”. If it’s a fully-fledged EHR service, (I think) I’d stand by my argument. If it’s a specific service or a monolithic system, I’d agree that lower-level access to the persistence layer would be more practical. Without defining the “database” here I don’t think anyone will win the argument :wink:

Many openEHR implementations, including the open source EHRbase, use RDBMSes!

openEHR has a specific query language (AQL) which under the hood transforms queries into SQL at the DB level. See: https://discourse.openehr.org/t/why-use-aql-instead-of-sql-in-openehr/2719/3
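As a very rough, hypothetical illustration of that kind of translation (the table layout, column names and simplified AQL fragment are all invented, and real engines such as EHRbase are far more sophisticated than this), a single CONTAINS-plus-WHERE query might map onto a two-table join like so:

```python
# Hypothetical, much-simplified translator: an AQL-ish query with one
# CONTAINS clause becomes a SQL join between a composition table and an
# entry table keyed by archetype id, with the WHERE path resolved via a
# JSON path over the stored entry data. Illustration only.

def aql_to_sql(archetype_id: str, where_path: str, op: str, value) -> str:
    """Translate: SELECT ... FROM EHR e CONTAINS OBSERVATION o[archetype_id]
    WHERE o/<where_path> <op> <value>   into SQL over two invented tables."""
    lit = f"'{value}'" if isinstance(value, str) else str(value)
    return (
        "SELECT c.ehr_id, e.data "
        "FROM composition c JOIN entry e ON e.composition_id = c.id "
        f"WHERE e.archetype_id = '{archetype_id}' "
        f"AND json_extract(e.data, '$.{where_path}') {op} {lit}"
    )

sql = aql_to_sql("openEHR-EHR-OBSERVATION.blood_pressure.v2",
                 "systolic.magnitude", ">", 140)
print(sql)
```

The point is the shape of the mapping: CONTAINS becomes a join constrained by archetype id, and the AQL path becomes an expression over the persisted structure, so the query author never touches the physical schema.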

Interesting @atalagk. Is there a ‘standard’ RDBMS schema for persisting archetypes? As you’re probably aware, there is a group within HL7 that’s creating such a schema for the FHIR Resource Model (SQL on FHIR). I was well aware of AQL - in fact its primary author Ma Chunlan introduced herself to me at MedInfo last year. Parsing DSLs like AQL (and SNOMED ECL) and mapping them to SQL is always a challenge unless there is open source tooling to assist. I found this to be an insightful discussion on the AQL v SQL issue and this on mapping AQL to SQL. I wonder how a DB expert from outside of the tribal affiliations of the healthcare domain might view this?

1 Like

Hi @pkjordan, as per the second thread you linked there’s no standard RDBMS schema, but in the ones I have worked with or looked at (such as Ocean, which uses SQL Server, and EHRbase, which defaults to PostgreSQL) the tables correspond to EHR RM entities (e.g. a table for COMPOSITION, another for OBSERVATION, etc.). I found THIS from the EHRbase GitHub repo, which seems to be related to the DB schema / mappings.
You can also find information about AQL builder tooling HERE.
Hope this is useful. I’m going to the openEHR conference at the beginning of November and will try to get more up-to-date information on this. Seemingly, AQL is far from perfect, but it works like a charm for simpler queries, and the CONTAINS clause removes the need for multiple joins.

1 Like

Thanks @atalagk. I look forward to hearing and/or reading your takeaways from the openEHR conference. Perhaps we can create a thread on this forum for that and reflections from the attendees of other recent digital health standards conferences (e.g. Matt Valentine and I who are currently in Seoul at the SNOMED International Business Meeting & Expo).

1 Like

@pkjordan sounds like a good idea - I’ll create a new thread on my return.

1 Like

hi @atalagk , thanks for your thoughts :slight_smile:
i think openEHR is interesting and keen to hear more about your experience

1 Like

It's fun to come back to this topic and read all of the comments

As I see it, they completely validate my position i.e. I’m shouting at clouds.

You will recall that I mentioned:

a relatively simple, properly normalised relational model

In response I obtained:

  • A lecture from @keithduddy on normalisation theory that completely fails (apologies, but it does) to get where Ted Codd was coming from.
  • An implementation note from @pkjordan.
  • Lots of discussion about “implementing archetypes” in RDBMSes.
  • Hints that because perfection is unobtainable, we should accept any old thing.

Hey guys, did you not spot the word ‘model’?

Would you accept the concept that the starting point should be a properly designed model?

Hey, there’s a huge one. Cumulonimbus, I think.

Damn cloud!!

Dr Jo.

2 Likes

Nice model shape :smiley: Looks like it has a lot of potential (some scary, some interesting, some plain ornery). What would your starting point be?

My starting points are always principles:

  1. Keep it simple.
  2. Define your terms.
  3. “Science it”. Build it around a good model of how science works: start with identifying your problems, and then iteratively create, test, and refute models.

This is the complete antithesis of (a) almost all approaches currently taken to building EHRs; and (b) Our current governmental initiatives, which are manifestly busy destroying health care in New Zealand, likely for decades to come.

These are cirrostratus…

Damn clouds!!

Dr Jo