libgrapheme

unicode string library
git clone git://git.suckless.org/libgrapheme

commit 1774b5430fe46d8d5511075d3cd644716ad4c3c8
parent 5939cf21cdb050e1c9bce964a30c9ad94f7440b9
Author: Laslo Hunhold <dev@frign.de>
Date:   Thu,  6 Oct 2022 22:57:31 +0200

Update README

Signed-off-by: Laslo Hunhold <dev@frign.de>

Diffstat:
M README | 55 ++++++++++++++++++++++++++++++-------------------------
1 file changed, 30 insertions(+), 25 deletions(-)

diff --git a/README b/README
@@ -1,25 +1,34 @@
 libgrapheme
 ===========
 
-The libgrapheme library provides functions to properly handle Unicode
-strings according to the Unicode specification. Unicode strings are made
-up of user-perceived characters (so-called "grapheme clusters") that are
-made up of one or more Unicode codepoints, which in turn are encoded in
-one or more bytes in an encoding like UTF-8.
-
-There is a widespread misconception that it was enough to simply
-determine codepoints in a string and treat them as user-perceived
-characters to be Unicode compliant. While this may work in some cases,
-this assumption quickly breaks, especially for non-Western languages and
-decomposed Unicode strings where user-perceived characters are usually
-represented using multiple codepoints.
-
-Despite the complicated multilevel structure of Unicode strings,
-libgrapheme provides methods to work with them at the byte-level (i.e.
-UTF-8 ‘char’ arrays) while also providing codepoint-level methods.
-
-See libgrapheme(7) to get started and try out the self-contained examples
-given on the manual pages for each function.
+libgrapheme is an extremely simple freestanding C99 library providing
+utilities for properly handling strings according to the latest Unicode
+standard 15.0.0. It offers fully Unicode compliant
+
+ - grapheme cluster (i.e. user-perceived character) segmentation
+ - word segmentation
+ - sentence segmentation
+ - detection of permissible line break opportunities
+ - case detection (lower-, upper- and title-case)
+ - case conversion (to lower-, upper- and title-case)
+
+on UTF-8 strings and codepoint arrays, which both can also be
+null-terminated.
+
+The necessary lookup-tables are automatically generated from the Unicode
+standard data (contained in the tarball) and heavily compressed. Over
+10,000 automatically generated conformance tests and over 150 unit tests
+ensure conformance and correctness.
+
+There is no complicated build-system involved and it's all done using one
+POSIX-compliant Makefile. All you need is a C99 compiler, given the
+lookup-table-generators and compressors are also written in C99. The
+resulting library is freestanding and thus not even dependent on a
+standard library to be present at runtime, making it a suitable choice
+for bare metal applications.
+
+It is also way smaller and much faster than the other established
+Unicode string libraries (ICU, GNU's libunistring, libutf8proc).
 
 Requirements
 ------------
@@ -38,15 +47,11 @@ Afterwards enter the following command to build and install libgrapheme
 Conformance
 -----------
 The libgrapheme library is compliant with the Unicode 15.0.0
-specification (September 2022).
-
-To ensure conformance, libgrapheme includes hundreds of tests including
-all provided with the standard-provided test-data that is parsed
-automatically. The tests can be run with
+specification (September 2022). The tests can be run with
 
 	make test
 
-to check standard conformance.
+to check standard conformance and correctness.
 
 Usage
 -----
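The paragraphs removed by this commit distinguish bytes, codepoints and user-perceived characters (grapheme clusters). The following is a minimal, self-contained C99 sketch of the first two levels only. It is not part of libgrapheme: utf8_decode() and utf8_count_codepoints() are hypothetical helpers written for illustration, and they assume well-formed UTF-8 input without validating it. They show that a single user-perceived character such as a decomposed "ë" (U+0065 followed by U+0308 COMBINING DIAERESIS) occupies two codepoints and three bytes.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical illustration helpers, not part of libgrapheme. They assume
 * well-formed UTF-8 and do no validation (overlong forms, surrogates and
 * stray continuation bytes are not rejected).
 */

/*
 * Decode the codepoint starting at s (at most len bytes available), store
 * it in *cp and return the number of bytes consumed. If len is 0, *cp is
 * set to 0 and 0 is returned. A sequence cut short by len yields U+FFFD
 * REPLACEMENT CHARACTER and consumes the remaining len bytes.
 */
size_t
utf8_decode(const char *s, size_t len, uint_least32_t *cp)
{
	const unsigned char *u = (const unsigned char *)s;
	size_t i, n;

	if (len == 0) {
		*cp = 0;
		return 0;
	}
	if (u[0] < 0x80) {
		/* 0xxxxxxx: ASCII, a single byte */
		*cp = u[0];
		return 1;
	} else if ((u[0] & 0xE0) == 0xC0) {
		/* 110xxxxx: lead byte of a two-byte sequence */
		*cp = u[0] & 0x1F;
		n = 2;
	} else if ((u[0] & 0xF0) == 0xE0) {
		/* 1110xxxx: lead byte of a three-byte sequence */
		*cp = u[0] & 0x0F;
		n = 3;
	} else {
		/* anything else is treated as a four-byte lead (11110xxx) */
		*cp = u[0] & 0x07;
		n = 4;
	}
	if (n > len) {
		*cp = 0xFFFD;
		return len;
	}
	for (i = 1; i < n; i++) {
		/* each continuation byte (10xxxxxx) contributes six bits */
		*cp = (*cp << 6) | (u[i] & 0x3F);
	}
	return n;
}

/*
 * Count the codepoints in len bytes of well-formed UTF-8: every byte that
 * is not a continuation byte (10xxxxxx) starts a new codepoint.
 */
size_t
utf8_count_codepoints(const char *s, size_t len)
{
	size_t i, count = 0;

	for (i = 0; i < len; i++) {
		if (((unsigned char)s[i] & 0xC0) != 0x80) {
			count++;
		}
	}
	return count;
}
```

For the decomposed "ë", utf8_count_codepoints("e\xCC\x88", 3) returns 2 although a reader sees one character, while the precomposed form "\xC3\xAB" (U+00EB) is one codepoint in two bytes. Likewise, the family emoji built from three person emoji joined by U+200D ZERO WIDTH JOINER is 18 bytes and 5 codepoints, yet forms a single extended grapheme cluster. Finding those cluster boundaries requires the segmentation rules of Unicode Standard Annex #29, which is what libgrapheme's grapheme cluster segmentation handles and this sketch deliberately does not; see libgrapheme(7) for the library's own interface.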