

Collating Elements vs. Characters

POSIX generalizes the notion of a character to that of a collating element. It defines a collating element to be "a sequence of one or more bytes defined in the current collating sequence as a unit of collation."

This generalizes the notion of a character in two ways. First, a single character can map into two or more collating elements. For example, the German `ß' collates as the collating element `s' followed by another collating element `s'. Second, two or more characters can map into one collating element. For example, the Spanish `ll' collates after `l' and before `m'.
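The two cases above can be illustrated with a minimal sketch. It does not use any real locale data: the table `ORDER`, the mapping `EXPAND`, and the functions `collating_elements` and `sort_key` are hypothetical names made up for this example, and a single toy collating sequence mixes the German and Spanish rules purely for illustration.

```python
import string

# Hypothetical collating sequence (an assumption, not real locale data):
# the 26 lowercase letters, with the two-character element "ll"
# placed after "l" and before "m".
ORDER = []
for ch in string.ascii_lowercase:
    ORDER.append(ch)
    if ch == "l":
        ORDER.append("ll")
WEIGHT = {elem: i for i, elem in enumerate(ORDER)}

# Hypothetical expansion: one character that collates as two elements.
EXPAND = {"ß": ["s", "s"]}


def collating_elements(word):
    """Split word into collating elements, trying a two-character match first."""
    elems = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if word[i] in EXPAND:
            # One character maps to several collating elements.
            elems.extend(EXPAND[word[i]])
            i += 1
        elif len(pair) == 2 and pair in WEIGHT:
            # Two characters map to one collating element.
            elems.append(pair)
            i += 2
        else:
            elems.append(word[i])
            i += 1
    return elems


def sort_key(word):
    """Return the list of weights for word; raises KeyError on unknown characters."""
    return [WEIGHT[e] for e in collating_elements(word)]
```

With this table, `collating_elements("llama")` yields `ll`, `a`, `m`, `a`, and `collating_elements("straße")` yields `s`, `t`, `r`, `a`, `s`, `s`, `e`. Sorting `llama`, `luz`, and `mano` with `key=sort_key` places `luz` before `llama`, whereas a plain character-by-character sort places `llama` first.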

Since POSIX's "collating element" preserves the essential idea of a "character," we use the latter, more familiar, term in this document.

