Planning for MBCS Data

The use of multi-byte character set (MBCS) data with Oracle, DB2, or IBM® DB2® for z/OS® has specific database considerations, which are covered in the Cúram Third-Party Tools Installation Guide for Windows and the Cúram Third-Party Tools Installation Guide for UNIX. In addition, MBCS support with DB2 or DB2 for z/OS requires specific Cúram configuration, which affects the behavior of the Data Manager.

Cúram support for MBCS data with DB2 and DB2 for z/OS is enabled out-of-the-box. This ensures error-free operation both for users whose languages require MBCS data and for users who introduce MBCS data by copying and pasting from other applications. This support entails expanding the size of string columns in the database, because DB2 column sizes are specified in bytes, and a given number of bytes does not necessarily hold the same number of characters when MBCS data is used. This is explained in more detail in the Cúram Third-Party Tools Installation Guide for Windows and the Cúram Third-Party Tools Installation Guide for UNIX. However, these default expansion settings may not be appropriate for your installation: if you use only Western languages (i.e., SBCS data), consider disabling this support, and if you do use MBCS data, consider reducing the default expansion factor. Whether the Data Manager applies database expansion is controlled by the curam.db.multibyte.expansion property in Bootstrap.properties, and the amount of expansion (a factor from 1.0 to 4.0) is set by the curam.db.multibyte.default.factor property in the same file. Both properties are described in Cúram Configuration Parameters.
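As a rough illustration of what the expansion factor does, the Python sketch below computes the number of bytes allocated for a string column whose length is declared in characters. It assumes that the Data Manager multiplies the declared length by the factor and rounds up; the function and constant names are hypothetical and not part of Cúram, and the actual DDL generation may differ. The expansion_enabled flag stands in for curam.db.multibyte.expansion, and factor stands in for curam.db.multibyte.default.factor.

```python
import math

# Supported range for the expansion factor, as stated for
# curam.db.multibyte.default.factor.
MIN_FACTOR = 1.0
MAX_FACTOR = 4.0


def expanded_byte_length(declared_chars: int, factor: float,
                         expansion_enabled: bool = True) -> int:
    """Hypothetical sketch: bytes to allocate for a column declared in characters.

    Assumes length * factor, rounded up; this is not Cúram's actual code.
    """
    if not expansion_enabled:
        # Expansion disabled: the column keeps its declared size
        # (appropriate only for SBCS data).
        return declared_chars
    if not MIN_FACTOR <= factor <= MAX_FACTOR:
        raise ValueError(
            f"factor must be between {MIN_FACTOR} and {MAX_FACTOR}, got {factor}")
    # Round up so a fractional result never reduces capacity.
    return math.ceil(declared_chars * factor)
```

For example, with the maximum factor of 4.0 a 100-character column is allocated 400 bytes, whereas a factor of 1.5 allocates 150 bytes.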

To guarantee that MBCS data never causes processing errors, the maximum expansion factor is the out-of-the-box default. However, for many languages and data profiles it is unlikely that every character in every database column requires multiple bytes, or that every character requires the maximum of 4 bytes. Because the maximum expansion factor has costs in disk space, network overhead, memory utilization, buffer pool performance, CPU utilization, and so on, it is best to choose an expansion factor that balances resource utilization and performance against the risk of application errors caused by data overruns. There are no strict rules for striking this balance, but the following considerations can help you choose a reasonable expansion factor, which your testing should then confirm.

The number of characters that require multiple bytes varies with your language, locale, and encoding. For instance, if you use English with only a few special characters (e.g., smart quotes), you will need very little expansion. If you use a language that shares the Latin alphabet but adds some characters of its own (e.g., German), you will need more space for MBCS data. A language such as Chinese, whose characters lie at the higher end of the Unicode range, requires more bytes per character; this is offset by the fact that such a language may need fewer characters per word, because each character can convey more information than a typical Latin letter. In other words, consider the average number of bytes required per character, per word, and so on. This average is typically only a rough estimate, because character usage varies with a number of factors, e.g., the data context: data that is mostly numeric (phone numbers), mostly textual (names), or free-form (comments). You should therefore add a safety margin when choosing your expansion factor.
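One way to ground this estimate is to measure representative samples of your own data. The sketch below assumes a UTF-8 database encoding; it takes the worst per-sample average of bytes per character, multiplies it by a safety margin, and clamps the result to the supported range of 1.0 to 4.0. The helper names and the default margin of 1.25 are illustrative assumptions, not Cúram recommendations.

```python
import math


def bytes_per_char(text: str, encoding: str = "utf-8") -> float:
    """Average encoded bytes per character for one sample string."""
    if not text:
        return 1.0
    return len(text.encode(encoding)) / len(text)


def suggest_factor(samples: list, safety: float = 1.25,
                   encoding: str = "utf-8") -> float:
    """Hypothetical helper: worst observed bytes/char times a safety margin.

    The result is rounded up to one decimal place and clamped to 1.0..4.0.
    """
    worst = max(bytes_per_char(s, encoding) for s in samples)
    factor = math.ceil(worst * safety * 10) / 10
    return min(max(factor, 1.0), 4.0)
```

For example, ASCII-only English samples yield 1.3 with the default margin, and a Chinese sample such as "中文" (3 bytes per character in UTF-8) yields 3.8. Taking the worst sample rather than the overall average is a deliberately conservative choice; substitute samples drawn from your own columns (phone numbers, names, comments) to see how much the profiles differ.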

You can also control the expansion factor at a finer-grained level in the modeling environment by specifying the Multibyte_Expansion_Factor option on a string domain, an entity string attribute, or both, which may be appropriate for your customizations. See the Cúram Modeling Reference Guide for more information on setting these options. Fine-grained settings may be necessary because large expansion factors can exceed various DB2 and DB2 for z/OS limits on the size of rows, indexes, and so on (see the relevant DB2 or DB2 for z/OS SQL reference for details of these limits).
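Before applying large factors, it can help to estimate whether an entity's expanded string columns still fit within a row-size limit. The sketch below is hypothetical: it assumes that an attribute-level Multibyte_Expansion_Factor overrides a domain-level one, which in turn overrides the global default, and it takes the row limit as a parameter that you should look up in the DB2 SQL reference for your page size. It ignores non-string columns and DB2's per-column storage overhead, so treat the result only as a rough check.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StringColumn:
    name: str
    declared_chars: int
    # Hypothetical stand-ins for Multibyte_Expansion_Factor set in the model.
    attribute_factor: Optional[float] = None
    domain_factor: Optional[float] = None


def effective_factor(col: StringColumn, default_factor: float) -> float:
    # Assumed precedence: attribute, then domain, then the global default.
    if col.attribute_factor is not None:
        return col.attribute_factor
    if col.domain_factor is not None:
        return col.domain_factor
    return default_factor


def expanded_row_bytes(columns: List[StringColumn], default_factor: float) -> int:
    """Sum of expanded string column sizes (no DB2 overhead included)."""
    return sum(math.ceil(c.declared_chars * effective_factor(c, default_factor))
               for c in columns)


def fits(columns: List[StringColumn], default_factor: float,
         row_limit_bytes: int) -> bool:
    """True if the expanded string columns fit within the given limit."""
    return expanded_row_bytes(columns, default_factor) <= row_limit_bytes
```

For example, columns COMMENTS (1000 characters), NAME (100 characters, domain factor 2.0), and CODE (20 characters, attribute factor 1.0) total 4220 bytes with a 4.0 default, which exceeds a hypothetical 4000-byte limit; setting an attribute factor of 2.0 on COMMENTS reduces the total to 2220 bytes.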