sites

public wiki contents of suckless.org
git clone git://git.suckless.org/sites

commit bfea05be7798690d75fa3b547e9908d77aa8796d
parent ab029cafc41c976c061eed2e49367e0400fd8fd2
Author: Laslo Hunhold <laslo@hunhold.de>
Date:   Sat,  3 Jan 2026 11:40:55 +0100

Update context for libunistring on the libgrapheme page

Some of the points raised in this old rant are not true (anymore) or
were imprecise/wrong regarding libunistring. Thank you Bruno Haible for
reaching out about this!

Signed-off-by: Laslo Hunhold <dev@frign.de>

Diffstat:
M libs.suckless.org/libgrapheme/index.md | 28 +++++++++++++++-------------
1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/libs.suckless.org/libgrapheme/index.md b/libs.suckless.org/libgrapheme/index.md
@@ -152,19 +152,21 @@
 embedded applications. The problem can be easily seen when looking at the
 sizes of the respective libraries: The ICU library (libicudata.a, libicui18n.a,
 libicuio.a, libicutest.a, libicutu.a, libicuuc.a) is around 38MB and libunistring
-(libunistring.a) is around 2MB, which is unacceptable for static
-linking. Both take many minutes to compile even on a good computer and
-require a lot of dependencies, including Python for ICU. On
-the other hand libgrapheme (libgrapheme.a) only weighs in at around 300K
-and is compiled (including Unicode data parsing and compression) in
-under a second, requiring nothing but a C99 compiler and POSIX make(1).
-
-Some libraries, like libutf8proc and libunistring, are incorrect by
-basing their API on assumptions that haven't been true for years
-(e.g. offering stateless grapheme cluster segmentation even though the
-underlying algorithm is not stateless). As an additional factor,
-libutf8proc's UTF-8-decoder is unsafe, as it allows overlong encodings
-that can be easily used for exploits.
+(libunistring.a) is around 2MB. Both take many minutes to compile even on
+a good computer, and ICU depends on Python, among others. On the other hand,
+libgrapheme (libgrapheme.a) only weighs in at around 400K and is compiled
+(including Unicode data parsing and compression) in under a second,
+requiring nothing but a C99 compiler and POSIX make(1).
+
+Some libraries, like libutf8proc, are incorrect by basing their API on
+assumptions that haven't been true for years (e.g. offering stateless
+grapheme cluster segmentation even though the underlying algorithm is
+not stateless). As an additional factor, libutf8proc's UTF-8-decoder
+is unsafe, as it allows overlong encodings that can be easily used for
+exploits. While libunistring has expanded their API offering e.g.
+u8_grapheme_next() and u8_grapheme_prev() that are standard conformant,
+its API still contains not-explicitly deprecated functions assuming
+an older data model, for instance uc_is_grapheme_break().
 
 While ICU and libunistring offer a lot of functions and the weight mostly
 comes from locale-data provided by the Unicode standard, which is applied
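
For readers unfamiliar with the "overlong encodings" mentioned in the changed text: an overlong encoding spends more bytes on a code point than UTF-8's shortest form needs, for example 0xC0 0xAF for '/' (U+002F). A decoder that accepts such sequences can let characters slip past byte-level filters. The following is a minimal sketch in C99 of a strict decoder that rejects them. The function name decode_strict() and its interface are hypothetical and chosen for illustration only; this is not the code of libgrapheme, libutf8proc, or any other library named above.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper for illustration, not taken from any library.
 * Decodes one code point from the first bytes of s (at most len bytes).
 * Returns the number of bytes consumed, or 0 if the sequence is
 * truncated, malformed, overlong, a surrogate, or above U+10FFFF.
 */
static size_t
decode_strict(const unsigned char *s, size_t len, uint_least32_t *cp)
{
	/* smallest code point that legitimately needs n bytes */
	static const uint_least32_t min[] = { 0, 0x0, 0x80, 0x800, 0x10000 };
	uint_least32_t c;
	size_t n, i;

	if (len == 0)
		return 0;

	/* the lead byte determines the sequence length */
	if (s[0] < 0x80) {
		n = 1; c = s[0];
	} else if ((s[0] & 0xE0) == 0xC0) {
		n = 2; c = s[0] & 0x1F;
	} else if ((s[0] & 0xF0) == 0xE0) {
		n = 3; c = s[0] & 0x0F;
	} else if ((s[0] & 0xF8) == 0xF0) {
		n = 4; c = s[0] & 0x07;
	} else {
		return 0; /* stray continuation byte or invalid lead byte */
	}

	if (len < n)
		return 0; /* truncated sequence */

	/* every following byte must have the form 10xxxxxx */
	for (i = 1; i < n; i++) {
		if ((s[i] & 0xC0) != 0x80)
			return 0;
		c = (c << 6) | (s[i] & 0x3F);
	}

	/* the overlong check: the value must actually need n bytes */
	if (c < min[n])
		return 0;
	if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
		return 0;

	*cp = c;
	return n;
}
```

With this sketch, the single byte 0x2F decodes to U+002F and consumes one byte, while the two-byte sequence 0xC0 0xAF is rejected with a return value of 0, because the decoded value 0x2F is below the two-byte minimum of 0x80.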