Monday, January 27, 2014

Character Sets in HL7v2

If you work with HL7, you need to understand UTF-8. Unless you see another character set specified in field MSH-18 of the message header, that's what you use. A typical HL7 message will have an empty MSH-18 field, and by default all visible ASCII characters (hexadecimal 20-7E), which UTF-8 encodes byte for byte the same way, are legal.
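To see what a given message actually declares, you can pull MSH-18 out of the header yourself. The following is a minimal sketch in Python, not a full parser: the function name `declared_charset` and the `"ASCII"` fallback label are my own choices for illustration, and it assumes the message starts with an MSH segment and uses carriage returns (or plain newlines) between segments.

```python
def declared_charset(message: str, default: str = "ASCII") -> str:
    """Return the first MSH-18 value of an HL7 v2 message, or `default` if empty."""
    # Segments are normally separated by carriage returns; tolerate \n and \r\n too.
    header = message.replace("\r\n", "\r").replace("\n", "\r").split("\r", 1)[0]
    if not header.startswith("MSH") or len(header) < 4:
        raise ValueError("message does not start with an MSH segment")

    field_sep = header[3]  # MSH-1 is the field separator character itself
    fields = header.split(field_sep)

    # fields[0] is "MSH" and fields[1] is MSH-2, so MSH-n sits at fields[n - 1].
    value = fields[17] if len(fields) > 17 else ""

    # MSH-18 may repeat; the repetition separator is the 2nd encoding character.
    rep_sep = fields[1][1] if len(fields) > 1 and len(fields[1]) > 1 else "~"
    return value.split(rep_sep)[0].strip() or default
```

For a header that ends at MSH-12, this returns the fallback label; for one carrying `UNICODE UTF-8` in the eighteenth field, it returns that string.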

That simplifies things, but there's still a lot to watch out for, especially at the boundaries between systems. You may be sending messages from Windows to a Linux server, or you may be editing files in a text editor that's configured for a different character set and feeding those into your message client.

There's excellent material on what every programmer needs to know about character sets:

Version 2 Character Sets and Encoding from Health Intersections, a paper by someone who had just finished rewriting his v2 parser.

The Unicode FAQ from Unicode.org

What Every Programmer Should Know About Unicode (no excuses!)

All of this creates the need for an HL7 message that can test how a system handles the complete set of visible characters. Start with the UTF-8 Test File from W3C, take any simple HL7 message, and add a note (NTE) segment for every legal character, like this:

NTE|2||0020   SPACE
NTE|2||0021 ! EXCLAMATION MARK
NTE|2||0022 " QUOTATION MARK
NTE|2||0023 # NUMBER SIGN
NTE|2||0024 $ DOLLAR SIGN
NTE|2||0025 % PERCENT SIGN
NTE|2||0026 & AMPERSAND
NTE|2||0027 ' APOSTROPHE
NTE|2||0028 ( LEFT PARENTHESIS
NTE|2||0029 ) RIGHT PARENTHESIS
NTE|2||002A * ASTERISK
NTE|2||002B + PLUS SIGN
NTE|2||002C , COMMA
NTE|2||002D - HYPHEN-MINUS
NTE|2||002E . FULL STOP
NTE|2||002F / SOLIDUS
NTE|2||0030 0 DIGIT ZERO
NTE|2||0031 1 DIGIT ONE
NTE|2||0032 2 DIGIT TWO
NTE|2||0033 3 DIGIT THREE


...and so on.
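Typing these out by hand gets old fast, so here is a minimal Python sketch that generates them. The function name `nte_segments` is hypothetical, the set ID of 2 simply mirrors the listing above, and the character names come from Python's standard `unicodedata` module. Note that it writes every character raw, as the listing does, so HL7 delimiters such as `|`, `^`, `~`, `\` and `&` are not escaped.

```python
import unicodedata


def nte_segments(first: int = 0x20, last: int = 0x7E, set_id: int = 2) -> list[str]:
    """Build one NTE segment per code point in the inclusive range first..last."""
    segments = []
    for cp in range(first, last + 1):
        ch = chr(cp)
        name = unicodedata.name(ch)  # e.g. "EXCLAMATION MARK" for U+0021
        # Layout mirrors the listing: 4-digit hex code, the character, its name.
        segments.append(f"NTE|{set_id}||{cp:04X} {ch} {name}")
    return segments


if __name__ == "__main__":
    # Printed one per line for reading; join with "\r" when building a real message.
    for segment in nte_segments():
        print(segment)
```

The default range covers the 95 visible ASCII characters. Passing a wider range works for any code point that has a Unicode name; `unicodedata.name` raises `ValueError` for unnamed ones such as control characters.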

You can download a simple lab result test message here or create your own with other character sets and neat tricks like embedding escape sequences and changing encoding mid-stream (but for the sake of the rest of us who might have to read your data please don't). Then feed that into an HL7 client program, such as the free and open source HAPI Test Panel, and run it end to end through the system. If it comes out the other side the same way it went in, you're golden. Happy encoding!
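Checking that the message "comes out the same way it went in" is easier to trust when a script does the comparing. Below is a rough Python sketch, assuming you can capture the raw bytes that were sent and the raw bytes that arrived; the function name `first_difference` is made up for this example. It compares bytes rather than decoded text on purpose, so a silent re-encoding along the way shows up as a mismatch.

```python
def first_difference(sent: bytes, received: bytes):
    """Return (segment_index, sent_segment, received_segment) for the first
    mismatching segment, or None if both messages are byte-for-byte identical."""
    sent_segments = sent.split(b"\r")      # HL7 v2 segments end with a carriage return
    received_segments = received.split(b"\r")

    for i in range(max(len(sent_segments), len(received_segments))):
        a = sent_segments[i] if i < len(sent_segments) else None
        b = received_segments[i] if i < len(received_segments) else None
        if a != b:
            return i, a, b
    return None
```

A result of `None` means you're golden. Anything else points at the first segment that changed, and a `None` on one side of the tuple means that side ran out of segments.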
