Make lg_utf8_*() NULL-agnostic - libgrapheme

commit 08b2c8e4e5222c04f3304595720d195a98ac7e8a
parent b5b82936c9a8231467ff4481626d3b710940fb03
Author: Laslo Hunhold <dev@frign.de>
Date:   Tue, 14 Dec 2021 14:06:23 +0100

Make lg_utf8_*() NULL-agnostic

The special cases of NULL buffers and allocated zero-length buffers
(malloc(0) does not necessarily return NULL!) can be gracefully
handled:

  lg_grapheme_nextbreak(NULL) -> 0
  lg_grapheme_isbreak(cp1, cp2, NULL) -> run without state
  lg_utf8_decode(NULL, 0, &cp) -> 0, cp=invalid (we consumed nothing
                                                 and the cp is invalid)
  lg_utf8_encode(cp, NULL, 0) -> number of bytes needed (good for a
                                 dry-run!)

While the lg_grapheme_*() functions already handled these cases well,
this commit amends the lg_utf8_*() functions to do the same.

Signed-off-by: Laslo Hunhold <dev@frign.de>
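
A minimal sketch of how a caller might rely on the decode contract
described above. It does not link against libgrapheme:
sketch_utf8_decode() is a hypothetical, simplified stand-in (no overlong
or surrogate checks) that only mirrors the "NULL or empty input consumes
0 bytes" rule, and SKETCH_CODEPOINT_INVALID is assumed to be U+FFFD.
count_codepoints() is likewise an illustrative helper, not a library
function.

```c
#include <stddef.h>
#include <stdint.h>

/* assumption: the invalid marker is U+FFFD (REPLACEMENT CHARACTER) */
#define SKETCH_CODEPOINT_INVALID UINT32_C(0xFFFD)

/*
 * Hypothetical stand-in for lg_utf8_decode(), NOT the library's code.
 * Returns the number of bytes consumed; 0 for NULL or empty input.
 */
static size_t
sketch_utf8_decode(const uint8_t *s, size_t n, uint_least32_t *cp)
{
	size_t len, i;

	if (s == NULL || n == 0) {
		/* nothing to read, so nothing was consumed */
		*cp = SKETCH_CODEPOINT_INVALID;
		return 0;
	}

	/* identify sequence length and payload bits from the first byte */
	if (s[0] < 0x80) {
		len = 1;
		*cp = s[0];
	} else if ((s[0] & 0xE0) == 0xC0) {
		len = 2;
		*cp = s[0] & 0x1F;
	} else if ((s[0] & 0xF0) == 0xE0) {
		len = 3;
		*cp = s[0] & 0x0F;
	} else if ((s[0] & 0xF8) == 0xF0) {
		len = 4;
		*cp = s[0] & 0x07;
	} else {
		/* stray continuation byte or invalid lead byte */
		*cp = SKETCH_CODEPOINT_INVALID;
		return 1;
	}

	for (i = 1; i < len; i++) {
		if (i >= n || (s[i] & 0xC0) != 0x80) {
			/* truncated or malformed: consume the bytes seen so far */
			*cp = SKETCH_CODEPOINT_INVALID;
			return i;
		}
		*cp = (*cp << 6) | (s[i] & 0x3F);
	}

	return len;
}

/* count codepoints; a return value of 0 ends the loop naturally */
static size_t
count_codepoints(const uint8_t *s, size_t n)
{
	uint_least32_t cp;
	size_t off = 0, ret, count = 0;

	/* avoid pointer arithmetic on NULL, but still pass NULL through */
	while ((ret = sketch_utf8_decode((s != NULL) ? s + off : NULL,
	                                 n - off, &cp)) > 0) {
		count++;
		off += ret;
	}

	return count;
}
```

With a return value of 0 for "nothing consumed", the loop needs no
separate check for an empty or NULL buffer. Had the empty case returned
1, as before this commit, the caller would have been told that a byte
was consumed from a buffer that has none.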

Diffstat:
M src/utf8.c  | 12 ++++++++----

1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/src/utf8.c b/src/utf8.c
@@ -52,10 +52,10 @@ lg_utf8_decode(const uint8_t *s, size_t n, uint_least32_t *cp)
 {
 	size_t off, i;
 
-	if (n == 0) {
+	if (s == NULL || n == 0) {
 		/* a sequence must be at least 1 byte long */
 		*cp = LG_CODEPOINT_INVALID;
-		return 1;
+		return 0;
 	}
 
 	/* identify sequence type with the first byte */
@@ -145,8 +145,12 @@ lg_utf8_encode(uint_least32_t cp, uint8_t *s, size_t n)
 			break;
 		}
 	}
-	if (1 + off > n) {
-		/* specified buffer is too small to store sequence */
+	if (1 + off > n || s == NULL || n == 0) {
+		/*
+		 * specified buffer is too small to store sequence or
+		 * the caller just wanted to know how many bytes the
+		 * codepoint needs by passing a NULL-buffer.
+		 */
 		return 1 + off;
 	}
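
The dry-run mentioned in the commit message could be used for a
two-pass "measure, then encode" pattern. The following is a sketch
under assumptions, not the library's implementation:
sketch_utf8_encode() is a hypothetical stand-in that only mirrors the
return-value contract (always report the bytes needed, write only when
they fit), maps surrogates and out-of-range values to U+FFFD as an
assumption, and encode_alloc() is an illustrative helper.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for lg_utf8_encode(), NOT the library's code.
 * Returns the number of bytes the codepoint needs; writes them only
 * if s is non-NULL and n is large enough.
 */
static size_t
sketch_utf8_encode(uint_least32_t cp, uint8_t *s, size_t n)
{
	/* lead-byte prefixes indexed by sequence length */
	static const uint8_t lead[] = { 0x00, 0x00, 0xC0, 0xE0, 0xF0 };
	size_t len, i;

	if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
		/* assumption: unencodable values become U+FFFD */
		cp = 0xFFFD;
	}
	len = (cp < 0x80) ? 1 : (cp < 0x800) ? 2 : (cp < 0x10000) ? 3 : 4;

	if (s == NULL || len > n) {
		/* dry-run or buffer too small: only report the size */
		return len;
	}

	/* fill continuation bytes from the back, 6 bits each */
	for (i = len - 1; i > 0; i--) {
		s[i] = (uint8_t)(0x80 | (cp & 0x3F));
		cp >>= 6;
	}
	/* the remaining high bits go into the lead byte */
	s[0] = (uint8_t)(lead[len] | cp);

	return len;
}

/* pass 1 asks for the size with a NULL buffer, pass 2 encodes */
static uint8_t *
encode_alloc(uint_least32_t cp, size_t *len)
{
	uint8_t *buf;

	*len = sketch_utf8_encode(cp, NULL, 0);
	if ((buf = malloc(*len)) == NULL) {
		return NULL;
	}
	sketch_utf8_encode(cp, buf, *len);

	return buf;
}
```

The caller owns the returned buffer and is expected to free() it. The
same size query also works for summing the space needed by a whole
array of codepoints before a single allocation.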

	libgrapheme unicode string library
	git clone git://git.suckless.org/libgrapheme