libgrapheme

unicode string library
git clone git://git.suckless.org/libgrapheme

commit 08b2c8e4e5222c04f3304595720d195a98ac7e8a
parent b5b82936c9a8231467ff4481626d3b710940fb03
Author: Laslo Hunhold <dev@frign.de>
Date:   Tue, 14 Dec 2021 14:06:23 +0100

Make lg_utf8_*() NULL-agnostic

The special cases of NULL buffers and allocated zero-length buffers
(malloc(0) does not necessarily return NULL!) can be gracefully
handled:

  lg_grapheme_nextbreak(NULL) -> 0
  lg_grapheme_isbreak(cp1, cp2, NULL) -> run without state
  lg_utf8_decode(NULL, 0, &cp) -> 0, cp=invalid (we consumed nothing
                                                 and the cp is invalid)
  lg_utf8_encode(cp, NULL, 0) -> number of bytes needed (good for a
                                 dry-run!)

While the lg_grapheme_*() functions already handled these cases well,
this commit amends the lg_utf8_*() functions to do the same.

Signed-off-by: Laslo Hunhold <dev@frign.de>

Diffstat:
M src/utf8.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/src/utf8.c b/src/utf8.c
@@ -52,10 +52,10 @@ lg_utf8_decode(const uint8_t *s, size_t n, uint_least32_t *cp)
 {
 	size_t off, i;
 
-	if (n == 0) {
+	if (s == NULL || n == 0) {
 		/* a sequence must be at least 1 byte long */
 		*cp = LG_CODEPOINT_INVALID;
-		return 1;
+		return 0;
 	}
 
 	/* identify sequence type with the first byte */
@@ -145,8 +145,12 @@ lg_utf8_encode(uint_least32_t cp, uint8_t *s, size_t n)
 			break;
 		}
 	}
-	if (1 + off > n) {
-		/* specified buffer is too small to store sequence */
+	if (1 + off > n || s == NULL || n == 0) {
+		/*
+		 * specified buffer is too small to store sequence or
+		 * the caller just wanted to know how many bytes the
+		 * codepoint needs by passing a NULL-buffer.
+		 */
 		return 1 + off;
 	}
 
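
The dry-run case from the commit message, where lg_utf8_encode(cp, NULL, 0) returns the number of bytes needed, allows a two-pass pattern: ask for the size, allocate, then encode. The sketch below is only an illustration under stated assumptions. sketch_utf8_encode() is a hypothetical stand-in that mimics the return-value contract described above; it is not the library's implementation. encode_alloc() is likewise a made-up helper name, and the substitution of U+FFFD for invalid codepoints is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for lg_utf8_encode(), NOT the library's code:
 * returns the number of bytes the codepoint needs and writes them to s
 * only if s is non-NULL and n is large enough.
 */
static size_t
sketch_utf8_encode(uint_least32_t cp, uint8_t *s, size_t n)
{
	/* lead-byte markers indexed by sequence length (index 0 unused) */
	static const uint8_t lead[] = { 0x00, 0x00, 0xC0, 0xE0, 0xF0 };
	size_t len, i;

	if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
		cp = 0xFFFD; /* assumption: replace invalid codepoints */
	}
	len = (cp < 0x80) ? 1 : (cp < 0x800) ? 2 : (cp < 0x10000) ? 3 : 4;

	if (s == NULL || len > n) {
		/* dry run or buffer too small: only report the size */
		return len;
	}

	/* continuation bytes (10xxxxxx), filled from the end */
	for (i = len - 1; i > 0; i--) {
		s[i] = (uint8_t)(0x80 | (cp & 0x3F));
		cp >>= 6;
	}
	/* the remaining high bits go into the lead byte */
	s[0] = (uint8_t)(lead[len] | cp);

	return len;
}

/* Two-pass pattern: query the size first, then allocate and encode. */
static uint8_t *
encode_alloc(uint_least32_t cp, size_t *len)
{
	uint8_t *buf;
	size_t written;

	*len = sketch_utf8_encode(cp, NULL, 0); /* dry run, nothing written */
	if ((buf = malloc(*len)) == NULL) {
		return NULL;
	}
	written = sketch_utf8_encode(cp, buf, *len);
	assert(written == *len);
	(void)written; /* silence unused warning when NDEBUG is set */

	return buf;
}
```

A caller would invoke encode_alloc(0x20AC, &len), use the len bytes in the returned buffer, and free() it afterwards. Because a too-small buffer and a NULL buffer yield the same return value, the caller can also retry with a larger buffer whenever the returned size exceeds the n it passed.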