c - How to determine if a byte is null in a word -
i reading "strlen" source code glibc, , trick developers found speed read n bytes n size of long word, instead of reading 1 byte @ each iteration.
i assume long word has 4 bytes.
the tricky part every "chunk" of 4 bytes function reads can contain null byte, @ each iteration, function has check if there null byte in chunk. like
if (((longword - lomagic) & ~longword & himagic) != 0) { /* null byte found */ }
where longword
chunk of data , himagic
, lowmagic
magical values defined as:
himagic = 0x80808080l; lomagic = 0x01010101l;
here comment thoses values
/* bits 31, 24, 16, , 8 of number zero. call these bits "holes." note there hole left of each byte, @ end: bits: 01111110 11111110 11111110 11111111 bytes: aaaaaaaa bbbbbbbb cccccccc dddddddd 1-bits make sure carries propagate next 0-bit. 0-bits provide holes carries fall into. */
how trick of finding null byte work?
from famous "bit twiddling hacks" page sean eron anderson, description of used in glibc
implementation you're referring (anderson calls algorithm hasless(v, 1)
):
the subexpression
(v - 0x01010101ul)
, evaluates high bit set in byte whenever corresponding byte in v 0 or greater0x80
. sub-expression~v & 0x80808080ul
evaluates high bits set in bytes byte of v doesn't have high bit set (so byte less0x80
). finally, anding these 2 sub-expressions result high bits set bytes in v zero, since high bits set due value greater0x80
in first sub-expression masked off second.
it appears comment(s) in glibc
source confusing because doesn't apply code doing anymore - it's describing have been implementation of algorithm anderson describes before hasless(v, 1)
algorithm described.
Comments
Post a Comment