commit 42e58c7d3a921540f5d901b80a0cc75e234b02e9
parent c7021f101ce95bff58157fb32c50d204cf8569b2
Author: Laslo Hunhold <dev@frign.de>
Date: Wed, 22 Dec 2021 15:20:27 +0100
Add a remark on standard conformance in README
Signed-off-by: Laslo Hunhold <dev@frign.de>
Diffstat:
6 files changed, 36 insertions(+), 5 deletions(-)
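Note on the paragraph this commit adds to README and libgrapheme.7: the gap between codepoints and user-perceived characters can be shown without the library. The following is a minimal sketch, not code from libgrapheme. The helper name count_codepoints is hypothetical, and it assumes its input is well-formed UTF-8.

```c
#include <stddef.h>

/*
 * count_codepoints is a hypothetical helper, not part of libgrapheme.
 * It counts the codepoints in a well-formed UTF-8 buffer by counting
 * every byte that is not a continuation byte (bit pattern 10xxxxxx).
 */
static size_t
count_codepoints(const char *str, size_t len)
{
	size_t i, n = 0;

	for (i = 0; i < len; i++) {
		if (((unsigned char)str[i] & 0xC0) != 0x80) {
			n++;
		}
	}

	return n;
}
```

For the 3-byte decomposed string "e\xCC\x81" (U+0065 followed by U+0301 COMBINING ACUTE ACCENT), count_codepoints returns 2, although a reader sees a single character. Code that treats each codepoint as a character would split it in two.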
diff --git a/README b/README
@@ -7,6 +7,13 @@ up of user-perceived characters (so-called "grapheme clusters") that are
made up of one or more Unicode codepoints, which in turn are encoded in
one or more bytes in an encoding like UTF-8.
+There is a widespread misconception that it is enough to simply
+determine the codepoints in a string and treat them as user-perceived
+characters to be Unicode-compliant. While this may work in some cases,
+the assumption quickly breaks, especially for non-Western languages and
+decomposed Unicode strings, where user-perceived characters are usually
+represented using multiple codepoints.
+
Despite the complicated multilevel structure of Unicode strings,
libgrapheme provides methods to work with them at the byte-level (i.e.
UTF-8 ‘char’ arrays) while also providing codepoint-level methods.
@@ -28,6 +35,19 @@ Afterwards enter the following command to build and install libgrapheme
make install
+Conformance
+-----------
+The libgrapheme library is compliant with the Unicode 14.0.0
+specification (September 2021).
+
+To ensure conformance, libgrapheme includes hundreds of tests, including
+all of those in the test data provided with the standard, which is
+parsed automatically. The tests can be run with
+
+ make test
+
+to check standard conformance.
+
Usage
-----
Include the header grapheme.h in your code and link against libgrapheme
diff --git a/man/grapheme_decode_utf8.3 b/man/grapheme_decode_utf8.3
@@ -1,4 +1,4 @@
-.Dd 2021-12-19
+.Dd 2021-12-22
.Dt GRAPHEME_DECODE_UTF8 3
.Os suckless.org
.Sh NAME
diff --git a/man/grapheme_encode_utf8.3 b/man/grapheme_encode_utf8.3
@@ -1,4 +1,4 @@
-.Dd 2021-12-17
+.Dd 2021-12-22
.Dt GRAPHEME_ENCODE_UTF8 3
.Os suckless.org
.Sh NAME
diff --git a/man/grapheme_is_character_break.3 b/man/grapheme_is_character_break.3
@@ -1,4 +1,4 @@
-.Dd 2021-12-18
+.Dd 2021-12-22
.Dt GRAPHEME_IS_CHARACTER_BREAK 3
.Os suckless.org
.Sh NAME
diff --git a/man/grapheme_next_character_break.3 b/man/grapheme_next_character_break.3
@@ -1,4 +1,4 @@
-.Dd 2021-12-18
+.Dd 2021-12-22
.Dt GRAPHEME_NEXT_CHARACTER_BREAK 3
.Os suckless.org
.Sh NAME
diff --git a/man/libgrapheme.7 b/man/libgrapheme.7
@@ -1,4 +1,4 @@
-.Dd 2021-12-19
+.Dd 2021-12-22
.Dt LIBGRAPHEME 7
.Os suckless.org
.Sh NAME
@@ -18,11 +18,22 @@ see
that are made up of one or more Unicode codepoints, which in turn
are encoded in one or more bytes in an encoding like UTF-8.
.Pp
+There is a widespread misconception that it is enough to simply
+determine the codepoints in a string and treat them as user-perceived
+characters to be Unicode-compliant.
+While this may work in some cases, the assumption quickly breaks,
+especially for non-Western languages and decomposed Unicode strings,
+where user-perceived characters are usually represented using multiple
+codepoints.
+.Pp
Despite this complicated multilevel structure of Unicode strings,
.Nm
provides methods to work with them at the byte-level (i.e. UTF-8
.Sq char
arrays) while also offering codepoint-level methods.
+.Pp
+Every documented function's manual page provides a self-contained
+example illustrating the possible usage.
.Sh SEE ALSO
.Xr grapheme_decode_utf8 3 ,
.Xr grapheme_encode_utf8 3 ,