Base32 (RFC 4648) (C)
Topics: bit manipulation, encoding
Problem
Implement standard Base32 (RFC 4648) encode and decode over caller buffers (no
allocation), using the alphabet A-Z2-7 with = padding.
size_t b32_encode(const unsigned char *in, size_t inlen, char *out, size_t outcap);
long b32_decode(const char *in, size_t inlen, unsigned char *out, size_t outcap);
How it works
Base32 regroups bits in chunks of five. Take 5 input bytes (40 bits) and slice
them into 8 groups of 5 bits, each becoming one character. A final group of fewer
than 5 bytes produces fewer characters, and the rest of the 8-character block is
filled with =.
The byte-to-char counts for a final group are fixed: 1 byte gives 2 chars (+6 pad),
2 bytes give 4 (+4 pad), 3 bytes give 5 (+3 pad), 4 bytes give 7 (+1 pad), 5 bytes
give 8 (no pad). Decoding reverses this, and the number of = tells you how many
real bytes the last block carried.
What "malformed" means for decode
- length not a multiple of 8,
- a character outside the alphabet (other than trailing
=),
- an invalid padding count (only 0, 1, 3, 4, 6 pad characters are legal),
- padding anywhere but the end.
The RFC test vectors: f→MY======, fo→MZXQ====, foo→MZXW6===,
foob→MZXW6YQ=, fooba→MZXW6YTB, foobar→MZXW6YTBOI======.
Key concepts
- 5-bit packing is the whole job: shift bytes into a 40-bit accumulator, then
read it out 5 bits at a time. The padding is the bookkeeping that makes a partial
final group reversible.
- It is the harder sibling of Base64 (6-bit groups) and Hex (4-bit groups); the
awkward 5-bit width is exactly what makes the padding interesting.
Run
cc -std=c11 -O2 -Wall tests.c -o /tmp/base32 && /tmp/base32
Sign in to submit your solution.