WebSphere MQ Telemetry Transport and UTF-8

UTF-8 is an efficient encoding of Unicode character-strings that optimizes the encoding of ASCII characters in support of text-based communications.

The WebSphere MQ Telemetry Transport protocol uses a subset of UTF-8: only single-byte (non-extended) characters are supported.

The UTF string format is shown in the table below.

bit          7 6 5 4 3 2 1 0
byte 1       Message Length MSB
byte 2       Message Length LSB
bytes 3 ...  Encoded Character Data

Message Length is the number of bytes of encoded string characters, not the number of characters. For ASCII strings, however, the two are the same. The format of an encoded character for ASCII codes 0x01 to 0x7F is shown in the table below.

bit  7 6 5 4 3 2 1 0
     0 ASCII code of character (bits 6 to 0)
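
The layout described above (a two-byte length followed by one byte per ASCII character) can be sketched in Java as follows. This is a minimal illustration, not part of any WebSphere MQ API: the class name MqttUtfEncoder and the method encode are hypothetical, and the sketch assumes the input contains only ASCII codes 0x01 to 0x7F.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not a WebSphere MQ class.
class MqttUtfEncoder {

    // Builds: [length MSB][length LSB][one byte per ASCII character].
    static byte[] encode(String text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // Assumption of this sketch: only ASCII codes 0x01 to 0x7F are accepted.
            if (c < 0x01 || c > 0x7F) {
                throw new IllegalArgumentException("Unsupported character at index " + i);
            }
        }
        byte[] data = text.getBytes(StandardCharsets.US_ASCII);
        if (data.length > 0xFFFF) {
            throw new IllegalArgumentException("String does not fit a two-byte length");
        }
        byte[] out = new byte[2 + data.length];
        out[0] = (byte) (data.length >> 8);   // Message Length MSB
        out[1] = (byte) (data.length & 0xFF); // Message Length LSB
        System.arraycopy(data, 0, out, 2, data.length);
        return out;
    }

    public static void main(String[] args) {
        StringBuilder hex = new StringBuilder();
        for (byte b : encode("AB")) {
            hex.append(String.format("%02X ", b));
        }
        System.out.println(hex.toString().trim()); // prints 00 02 41 42
    }
}
```

Because every accepted character occupies exactly one byte, the length written to the first two bytes equals the character count, as noted above for ASCII strings.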

For example, the ASCII text string OTWP is encoded in UTF-8 as shown in the table below.

bit                                 7 6 5 4 3 2 1 0
byte 1  Message Length MSB (0x00)   0 0 0 0 0 0 0 0
byte 2  Message Length LSB (0x04)   0 0 0 0 0 1 0 0
byte 3  'O' (0x4F)                  0 1 0 0 1 1 1 1
byte 4  'T' (0x54)                  0 1 0 1 0 1 0 0
byte 5  'W' (0x57)                  0 1 0 1 0 1 1 1
byte 6  'P' (0x50)                  0 1 0 1 0 0 0 0
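
Reading the six bytes of this example back into a string could look like the sketch below. The class name MqttUtfDecoder and the method decode are hypothetical names chosen for illustration, and the sketch assumes the buffer holds only ASCII codes 0x01 to 0x7F after the length prefix.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not a WebSphere MQ class.
class MqttUtfDecoder {

    static String decode(byte[] buf) {
        if (buf.length < 2) {
            throw new IllegalArgumentException("Missing two-byte length prefix");
        }
        // Combine MSB and LSB into an unsigned 16-bit length.
        int length = ((buf[0] & 0xFF) << 8) | (buf[1] & 0xFF);
        if (buf.length - 2 < length) {
            throw new IllegalArgumentException("Buffer shorter than declared length");
        }
        for (int i = 2; i < 2 + length; i++) {
            // A signed byte <= 0 means either 0x00 or bit 7 set;
            // both fall outside the 0x01 to 0x7F range this sketch accepts.
            if (buf[i] <= 0) {
                throw new IllegalArgumentException("Unsupported byte at index " + i);
            }
        }
        return new String(buf, 2, length, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] example = {0x00, 0x04, 0x4F, 0x54, 0x57, 0x50};
        System.out.println(decode(example)); // prints OTWP
    }
}
```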

The Java data stream methods DataOutputStream.writeUTF() and DataInputStream.readUTF() use this format: a two-byte length followed by the encoded characters.
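
As a quick check of this correspondence, the following sketch writes OTWP with DataOutputStream.writeUTF() into an in-memory buffer and reads it back with DataInputStream.readUTF(). The class name WriteUtfDemo and its two helper methods are hypothetical. Note that the Java methods use a modified form of UTF-8, which differs from standard UTF-8 for the null character and for supplementary characters; for ASCII codes 0x01 to 0x7F the bytes are the same.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class for illustration only.
class WriteUtfDemo {

    // Encodes text with writeUTF() and returns the raw bytes.
    static byte[] viaWriteUtf(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Decodes a length-prefixed string with readUTF().
    static String viaReadUtf(byte[] encoded) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded))) {
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] encoded = viaWriteUtf("OTWP");
        StringBuilder hex = new StringBuilder();
        for (byte b : encoded) {
            hex.append(String.format("%02X ", b));
        }
        System.out.println(hex.toString().trim()); // prints 00 04 4F 54 57 50
        System.out.println(viaReadUtf(encoded));    // prints OTWP
    }
}
```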

Related concepts
WebSphere MQ Telemetry Transport
Related reference
WebSphere MQ Telemetry Transport topic name