Update context for libunistring on the libgrapheme page - sites

commit bfea05be7798690d75fa3b547e9908d77aa8796d
parent ab029cafc41c976c061eed2e49367e0400fd8fd2
Author: Laslo Hunhold <laslo@hunhold.de>
Date:   Sat,  3 Jan 2026 11:40:55 +0100

Update context for libunistring on the libgrapheme page

Some of the points raised in this old rant are not true (anymore) or
were imprecise/wrong regarding libunistring. Thank you Bruno Haible for
reaching out about this!

Signed-off-by: Laslo Hunhold <dev@frign.de>

Diffstat:
M libs.suckless.org/libgrapheme/index.md  | 28 +++++++++++++++-------------

1 file changed, 15 insertions(+), 13 deletions(-)
diff --git a/libs.suckless.org/libgrapheme/index.md b/libs.suckless.org/libgrapheme/index.md
@@ -152,19 +152,21 @@ embedded applications.
 The problem can be easily seen when looking at the sizes of the respective
 libraries: The ICU library (libicudata.a, libicui18n.a, libicuio.a,
 libicutest.a, libicutu.a, libicuuc.a) is around 38MB and libunistring
-(libunistring.a) is around 2MB, which is unacceptable for static
-linking. Both take many minutes to compile even on a good computer and
-require a lot of dependencies, including Python for ICU. On
-the other hand libgrapheme (libgrapheme.a) only weighs in at around 300K
-and is compiled (including Unicode data parsing and compression) in
-under a second, requiring nothing but a C99 compiler and POSIX make(1).
-
-Some libraries, like libutf8proc and libunistring, are incorrect by
-basing their API on assumptions that haven't been true for years
-(e.g. offering stateless grapheme cluster segmentation even though the
-underlying algorithm is not stateless). As an additional factor,
-libutf8proc's UTF-8-decoder is unsafe, as it allows overlong encodings
-that can be easily used for exploits.
+(libunistring.a) is around 2MB. Both take many minutes to compile even on
+a good computer, and ICU depends on Python, among others. On the other hand,
+libgrapheme (libgrapheme.a) only weighs in at around 400K and is compiled
+(including Unicode data parsing and compression) in under a second,
+requiring nothing but a C99 compiler and POSIX make(1).
+
+Some libraries, like libutf8proc, are incorrect by basing their API on
+assumptions that haven't been true for years (e.g. offering stateless
+grapheme cluster segmentation even though the underlying algorithm is
+not stateless). As an additional factor, libutf8proc's UTF-8 decoder
+is unsafe, as it allows overlong encodings that can be easily used for
+exploits. While libunistring has expanded its API, offering e.g.
+u8_grapheme_next() and u8_grapheme_prev(), which are standard-conformant,
+it still contains functions that are not explicitly deprecated and assume
+an older data model, for instance uc_is_grapheme_break().
 
 While ICU and libunistring offer a lot of functions and the weight mostly
 comes from locale-data provided by the Unicode standard, which is applied

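The changed paragraph mentions overlong encodings as a safety problem in a UTF-8 decoder. As a rough illustration of what rejecting them involves, here is a minimal sketch in C. The function name `utf8_decode_strict` and its return convention are hypothetical, chosen for this example only; it is not the decoder of libgrapheme, libutf8proc or libunistring. The idea is that each sequence length has a minimum code point, and any value below that minimum could have been encoded in fewer bytes and is therefore overlong.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical strict decoder (illustration only, not a library API).
 * Decodes one code point from s[0..len) into *cp and returns the number
 * of bytes consumed, or 0 if the sequence is truncated, malformed,
 * overlong, a surrogate, or above U+10FFFF.
 */
size_t
utf8_decode_strict(const unsigned char *s, size_t len, uint32_t *cp)
{
	/* lead-byte pattern and valid code point range per sequence length */
	static const struct {
		unsigned char mask, lead;
		uint32_t min, max;
	} seq[] = {
		{ 0x80, 0x00, 0x000000, 0x00007F }, /* 1 byte  */
		{ 0xE0, 0xC0, 0x000080, 0x0007FF }, /* 2 bytes */
		{ 0xF0, 0xE0, 0x000800, 0x00FFFF }, /* 3 bytes */
		{ 0xF8, 0xF0, 0x010000, 0x10FFFF }, /* 4 bytes */
	};
	uint32_t val;
	size_t n, i;

	if (len == 0)
		return 0;

	/* classify the lead byte; n is the number of continuation bytes */
	for (n = 0; n < 4; n++)
		if ((s[0] & seq[n].mask) == seq[n].lead)
			break;
	if (n == 4)
		return 0; /* stray continuation byte or invalid lead byte */

	/* payload bits of the lead byte */
	val = s[0] & ~seq[n].mask;

	for (i = 1; i <= n; i++) {
		if (i >= len || (s[i] & 0xC0) != 0x80)
			return 0; /* truncated or not a continuation byte */
		val = (val << 6) | (s[i] & 0x3F);
	}

	/* val < min means a shorter encoding exists, i.e. overlong */
	if (val < seq[n].min || val > seq[n].max)
		return 0;
	if (val >= 0xD800 && val <= 0xDFFF)
		return 0; /* UTF-16 surrogates are not valid in UTF-8 */

	*cp = val;
	return n + 1;
}
```

With this sketch, the two-byte sequence `C0 AF` is rejected, because it decodes to U+002F ('/'), which fits in one byte. A decoder that accepts it lets a '/' slip past byte-level filters that only look for `2F`.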