commit 1774b5430fe46d8d5511075d3cd644716ad4c3c8
parent 5939cf21cdb050e1c9bce964a30c9ad94f7440b9
Author: Laslo Hunhold <dev@frign.de>
Date: Thu, 6 Oct 2022 22:57:31 +0200
Update README
Signed-off-by: Laslo Hunhold <dev@frign.de>
Diffstat:
M README | 55 ++++++++++++++++++++++++++++++-------------------------
1 file changed, 30 insertions(+), 25 deletions(-)
diff --git a/README b/README
@@ -1,25 +1,34 @@
libgrapheme
===========
-The libgrapheme library provides functions to properly handle Unicode
-strings according to the Unicode specification. Unicode strings are made
-up of user-perceived characters (so-called "grapheme clusters") that are
-made up of one or more Unicode codepoints, which in turn are encoded in
-one or more bytes in an encoding like UTF-8.
-
-There is a widespread misconception that it was enough to simply
-determine codepoints in a string and treat them as user-perceived
-characters to be Unicode compliant. While this may work in some cases,
-this assumption quickly breaks, especially for non-Western languages and
-decomposed Unicode strings where user-perceived characters are usually
-represented using multiple codepoints.
-
-Despite the complicated multilevel structure of Unicode strings,
-libgrapheme provides methods to work with them at the byte-level (i.e.
-UTF-8 ‘char’ arrays) while also providing codepoint-level methods.
-
-See libgrapheme(7) to get started and try out the self-contained examples
-given on the manual pages for each function.
+libgrapheme is an extremely simple freestanding C99 library providing
+utilities for properly handling strings according to the latest Unicode
+standard 15.0.0. It offers fully Unicode compliant
+
+ - grapheme cluster (i.e. user-perceived character) segmentation
+ - word segmentation
+ - sentence segmentation
+ - detection of permissible line break opportunities
+ - case detection (lower-, upper- and title-case)
+ - case conversion (to lower-, upper- and title-case)
+
+on UTF-8 strings and codepoint arrays, which both can also be
+null-terminated.
+
+The necessary lookup-tables are automatically generated from the Unicode
+standard data (contained in the tarball) and heavily compressed. Over
+10,000 automatically generated conformance tests and over 150 unit tests
+ensure conformance and correctness.
+
+There is no complicated build-system involved and it's all done using one
+POSIX-compliant Makefile. All you need is a C99 compiler, given the
+lookup-table-generators and compressors are also written in C99. The
+resulting library is freestanding and thus not even dependent on a
+standard library to be present at runtime, making it a suitable choice
+for bare metal applications.
+
+It is also way smaller and much faster than the other established
+Unicode string libraries (ICU, GNU's libunistring, libutf8proc).
Requirements
------------
@@ -38,15 +47,11 @@ Afterwards enter the following command to build and install libgrapheme
Conformance
-----------
The libgrapheme library is compliant with the Unicode 15.0.0
-specification (September 2022).
-
-To ensure conformance, libgrapheme includes hundreds of tests including
-all provided with the standard-provided test-data that is parsed
-automatically. The tests can be run with
+specification (September 2022). The tests can be run with
make test
-to check standard conformance.
+to check standard conformance and correctness.
Usage
-----