Add a remark on standard conformance in README - libgrapheme

commit 42e58c7d3a921540f5d901b80a0cc75e234b02e9
parent c7021f101ce95bff58157fb32c50d204cf8569b2
Author: Laslo Hunhold <dev@frign.de>
Date:   Wed, 22 Dec 2021 15:20:27 +0100

Add a remark on standard conformance in README

Signed-off-by: Laslo Hunhold <dev@frign.de>

Diffstat:
M README  | 20 ++++++++++++++++++++
M man/grapheme_decode_utf8.3  | 2 +-
M man/grapheme_encode_utf8.3  | 2 +-
M man/grapheme_is_character_break.3  | 2 +-
M man/grapheme_next_character_break.3  | 2 +-
M man/libgrapheme.7  | 13 ++++++++++++-

6 files changed, 36 insertions(+), 5 deletions(-)
diff --git a/README b/README
@@ -7,6 +7,13 @@ up of user-perceived characters (so-called "grapheme clusters") that are
 made up of one or more Unicode codepoints, which in turn are encoded in
 one or more bytes in an encoding like UTF-8.
 
+There is a widespread misconception that it was enough to simply
+determine codepoints in a string and treat them as user-perceived
+characters to be Unicode compliant. While this may work in some cases,
+this assumption quickly breaks, especially for non-Western languages and
+decomposed Unicode strings where user-perceived characters are usually
+represented using multiple codepoints.
+
 Despite the complicated multilevel structure of Unicode strings,
 libgrapheme provides methods to work with them at the byte-level (i.e.
 UTF-8 ‘char’ arrays) while also providing codepoint-level methods.
@@ -28,6 +35,19 @@ Afterwards enter the following command to build and install libgrapheme
 
 	make install
 
+Conformance
+-----------
+The libgrapheme library is compliant with the Unicode 14.0.0
+specification (September 2021).
+
+To ensure conformance, libgrapheme includes hundreds of tests including
+all provided with the standard-provided test-data that is parsed
+automatically. The tests can be run with
+
+	make test
+
+to check standard conformance.
+
 Usage
 -----
 Include the header grapheme.h in your code and link against libgrapheme
diff --git a/man/grapheme_decode_utf8.3 b/man/grapheme_decode_utf8.3
@@ -1,4 +1,4 @@
-.Dd 2021-12-19
+.Dd 2021-12-22
 .Dt GRAPHEME_DECODE_UTF8 3
 .Os suckless.org
 .Sh NAME
diff --git a/man/grapheme_encode_utf8.3 b/man/grapheme_encode_utf8.3
@@ -1,4 +1,4 @@
-.Dd 2021-12-17
+.Dd 2021-12-22
 .Dt GRAPHEME_ENCODE_UTF8 3
 .Os suckless.org
 .Sh NAME
diff --git a/man/grapheme_is_character_break.3 b/man/grapheme_is_character_break.3
@@ -1,4 +1,4 @@
-.Dd 2021-12-18
+.Dd 2021-12-22
 .Dt GRAPHEME_IS_CHARACTER_BREAK 3
 .Os suckless.org
 .Sh NAME
diff --git a/man/grapheme_next_character_break.3 b/man/grapheme_next_character_break.3
@@ -1,4 +1,4 @@
-.Dd 2021-12-18
+.Dd 2021-12-22
 .Dt GRAPHEME_NEXT_CHARACTER_BREAK 3
 .Os suckless.org
 .Sh NAME
diff --git a/man/libgrapheme.7 b/man/libgrapheme.7
@@ -1,4 +1,4 @@
-.Dd 2021-12-19
+.Dd 2021-12-22
 .Dt LIBGRAPHEME 7
 .Os suckless.org
 .Sh NAME
@@ -18,11 +18,22 @@ see
 that are made up of one or more Unicode codepoints, which in turn
 are encoded in one or more bytes in an encoding like UTF-8.
 .Pp
+There is a widespread misconception that it was enough to simply
+determine codepoints in a string and treat them as user-perceived
+characters to be Unicode compliant.
+While this may work in some cases, this assumption quickly breaks,
+especially for non-Western languages and decomposed Unicode strings
+where user-perceived characters are usually represented using multiple
+codepoints.
+.Pp
 Despite this complicated multilevel structure of Unicode strings,
 .Nm
 provides methods to work with them at the byte-level (i.e. UTF-8
 .Sq char
 arrays) while also offering codepoint-level methods.
+.Pp
+Every documented function's manual page provides a self-contained
+example illustrating the possible usage.
 .Sh SEE ALSO
 .Xr grapheme_decode_utf8 3 ,
 .Xr grapheme_encode_utf8 3 ,
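
The paragraph added to README and libgrapheme.7 in this commit says that counting codepoints is not the same as counting user-perceived characters. A minimal sketch of that point follows. It does not use libgrapheme itself. The helper `count_codepoints` is a hypothetical name introduced only for this illustration, and it assumes well-formed UTF-8 input without validating it.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of libgrapheme): count the codepoints
 * in a UTF-8 buffer by counting every byte that is not a continuation
 * byte (bit pattern 10xxxxxx). Assumes well-formed input; no
 * validation is performed.
 */
size_t
count_codepoints(const char *str, size_t len)
{
	size_t i, n = 0;

	for (i = 0; i < len; i++) {
		if (((unsigned char)str[i] & 0xC0) != 0x80) {
			n++;
		}
	}

	return n;
}
```

Called on the decomposed string "e\xCC\x81" (U+0065 followed by the combining acute accent U+0301, 3 bytes), this returns 2, although a reader sees the single character "é". The precomposed form "\xC3\xA9" (U+00E9, 2 bytes) yields 1. A grapheme-cluster-aware function treats both forms as one user-perceived character, which is the distinction the commit text draws.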
