libgrapheme

unicode string library
git clone git://git.suckless.org/libgrapheme

commit 79ff57ed9cab260e7051d1a9a5e4135921776acd
parent cdaeb8c0b808cd3f708f2cefd62d767ee82144ef
Author: Laslo Hunhold <dev@frign.de>
Date:   Sat, 17 Oct 2020 18:26:51 +0200

Add sentence explaining the need for grapheme cluster handling

Thanks Silvan for suggesting this!

Signed-off-by: Laslo Hunhold <dev@frign.de>

Diffstat:
M man/libgrapheme.7 | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/man/libgrapheme.7 b/man/libgrapheme.7
@@ -98,12 +98,13 @@ this way and represents an abstract character is called a
 .Dq grapheme cluster .
 .Pp
 In many applications it is necessary to count the number of
-user-perceived characters, i.e. grapheme clusters, in a string. This is
-pretty simple with ASCII-strings, where you just count the number of
-bytes (as each byte is a code point and each code point is a grapheme
-cluster). With Unicode-strings, it is a common mistake to simply adapt
-the ASCII-approach and count the number of code points. This is wrong,
-as, for example, the sequence
+user-perceived characters, i.e. grapheme clusters, in a string. A good
+example for this is a terminal text editor, which needs to properly align
+characters on a grid. This is pretty simple with ASCII-strings, where you
+just count the number of bytes (as each byte is a code point and each
+code point is a grapheme cluster). With Unicode-strings, it is a common
+mistake to simply adapt the ASCII-approach and count the number of code
+points. This is wrong, as, for example, the sequence
 .Dq 0x41 0x308 0x304 ,
 while made up of 3 code points, is a single grapheme cluster and
 represents the user-perceived character
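
The counting mistake described in the man page text can be illustrated with a small C sketch. It is not part of this commit and deliberately does not call libgrapheme; the names count_bytes, count_code_points, example and check_example are hypothetical, and the input is assumed to be valid UTF-8. It applies the two naive approaches to the sequence 0x41 0x308 0x304, which is a single grapheme cluster according to the man page.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* ASCII approach: one byte per character. */
size_t
count_bytes(const char *s)
{
	return strlen(s);
}

/*
 * Naive Unicode approach: count code points in a valid UTF-8 string
 * by counting every byte that is not a continuation byte (10xxxxxx).
 */
size_t
count_code_points(const char *s)
{
	size_t n = 0;

	for (; *s != '\0'; s++) {
		if (((unsigned char)*s & 0xC0) != 0x80)
			n++;
	}

	return n;
}

/* U+0041 U+0308 U+0304 encoded as UTF-8 (1 + 2 + 2 bytes). */
const char example[] = "A\xCC\x88\xCC\x84";

void
check_example(void)
{
	/* Neither count yields 1, the number of grapheme clusters. */
	assert(count_bytes(example) == 5);
	assert(count_code_points(example) == 3);
}
```

Both functions overcount here: a terminal text editor that reserved five or three grid cells for this string would misalign everything after it, which is why cluster boundaries have to be determined by the Unicode segmentation rules rather than by byte or code point counts.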