As others have said, have a length field in with the string. This also has advan...

joosters · on May 19, 2015

Handling strings with (ptr, length) also means that some string copies can be entirely avoided. Text can be chopped, shared and extended by adjusting the two parameters, while the underlying storage remains untouched.

e.g. a really basic example for a web server would be splitting up a URL into a path and query string: both strings can use the underlying URL without any copying.
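To make the URL example concrete, here is a minimal sketch in C. The `str_slice` type and the `split_url` function are hypothetical names invented for illustration, not from any particular web server. Both output slices point into the original URL's storage, so nothing is copied.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical (ptr, length) string view; it does not own its storage. */
typedef struct {
    const char *ptr;
    size_t len;
} str_slice;

/* Split url at the first '?'. Both outputs alias url's bytes; no copy is made.
   If there is no '?', the query is an empty slice at the end of the URL. */
void split_url(str_slice url, str_slice *path, str_slice *query)
{
    const char *q = memchr(url.ptr, '?', url.len);
    if (q == NULL) {
        *path = url;
        query->ptr = url.ptr + url.len;
        query->len = 0;
    } else {
        path->ptr = url.ptr;
        path->len = (size_t)(q - url.ptr);
        query->ptr = q + 1;                    /* skip the '?' itself */
        query->len = url.len - path->len - 1;
    }
}
```

The catch is that the slices are not null-terminated, so they have to be passed around with their lengths (e.g. printed with `%.*s`) rather than handed to functions that expect a C string.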

JoeAltmaier · on May 19, 2015

Lots to add to that if you want: sanity-check the length against the external buffer limit; move a word, a dword, or a 128-bit SIMD register at a time; consider alignment; not use deprecated movs instruction.

In fact, I've never seen the best alignment-sensitive solution written. It would load two large words, shift them to the target alignment if needed, store, load, and repeat. This would guarantee aligned fetches and stores and still do whole-bus operations.

I've waited to see an instruction to do this in any machine ever (why do we have to hand-code this kind of thing, when the processor chip KNOWS the best way to get it done?) I've waited 20 years.
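The load-two-words, shift, store loop described above could look roughly like this in C. It is only a sketch under stated assumptions: the function name `shifted_copy` is made up, the machine is assumed to be little-endian, and the source is modelled as an aligned `uint64_t` array plus a byte offset so that every load and every store is an aligned 64-bit access.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy n_words 64-bit words that start byte_off bytes (0..7) into the aligned
   array src, storing them to the aligned array dst.
   Assumptions: little-endian byte order; when byte_off != 0, src must have
   n_words + 1 readable words, because each output word straddles two inputs. */
void shifted_copy(uint64_t *dst, const uint64_t *src,
                  size_t byte_off, size_t n_words)
{
    if (n_words == 0)
        return;
    if (byte_off == 0) {                  /* already aligned: plain word copy */
        for (size_t i = 0; i < n_words; i++)
            dst[i] = src[i];
        return;
    }
    unsigned rs = (unsigned)(8 * byte_off);   /* bits to drop from the low word */
    unsigned ls = 64 - rs;                    /* bits to take from the high word */
    uint64_t lo = src[0];
    for (size_t i = 0; i < n_words; i++) {
        uint64_t hi = src[i + 1];             /* one aligned load per iteration */
        dst[i] = (lo >> rs) | (hi << ls);     /* merge into one aligned store */
        lo = hi;
    }
}
```

A real routine would also have to handle the unaligned head and tail bytes and a misaligned destination, which this sketch leaves out.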

nostrademons · on May 20, 2015

You mean vectorizing memcmp/memcpy to work word-at-a-time instead of byte-at-a-time? Google does this when compiling their binaries, and I think Facebook does too. I thought the latter had open-sourced it with Folly but couldn't find a code pointer. I've heard rumors that LLVM can sometimes vectorize them to use SIMD instructions when available, too.

It's hard to do this safely for strcpy/strcmp because you might read past the end of the buffer when trying to test against a null terminator. memcmp/memcpy and length-prefixed blocks let you use a Duff's-Device-like construct to test only the last word byte-by-byte.
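As a rough illustration of why a known length helps, here is a sketch of a word-at-a-time equality check. The name `mem_equal` is hypothetical, it only reports equal or not equal rather than the ordering that `memcmp` returns, and it uses a plain byte loop for the tail instead of a true Duff's Device. Because the length is known up front, no load ever goes past the end of either buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if the n bytes at a and b are equal, 0 otherwise. */
int mem_equal(const void *a, const void *b, size_t n)
{
    const unsigned char *pa = a;
    const unsigned char *pb = b;

    /* Compare eight bytes at a time while a whole word remains.
       memcpy into a local keeps the load well-defined for any alignment;
       compilers typically turn it into a single load instruction. */
    while (n >= sizeof(uint64_t)) {
        uint64_t wa, wb;
        memcpy(&wa, pa, sizeof wa);
        memcpy(&wb, pb, sizeof wb);
        if (wa != wb)
            return 0;
        pa += sizeof wa;
        pb += sizeof wb;
        n -= sizeof wa;
    }

    /* Fewer than eight bytes left: finish byte by byte. */
    while (n-- > 0) {
        if (*pa++ != *pb++)
            return 0;
    }
    return 1;
}
```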

JoeAltmaier · on May 20, 2015

In a paged system, reading past the end of the string, but no further than the last whole aligned word containing the null terminator, will never page fault. So that can still work.
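A sketch of that idea, assuming 8-byte words and a hypothetical function name `aligned_strlen`: scan byte by byte up to an aligned address, then load whole aligned words and test each for a zero byte. Note the loud caveat: the word load can read bytes past the terminator, which ISO C treats as undefined behaviour and which tools like AddressSanitizer will report. It is shown only to illustrate the paging argument, since an aligned word never straddles a page boundary.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES  0x0101010101010101ULL
#define HIGHS 0x8080808080808080ULL

size_t aligned_strlen(const char *s)
{
    const char *p = s;

    /* Byte-by-byte until p sits on an 8-byte boundary. */
    while (((uintptr_t)p & 7) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    /* Aligned 8-byte loads. Each one stays inside a single page, but may
       read past the terminator (outside what ISO C guarantees). */
    for (;;) {
        uint64_t w;
        memcpy(&w, p, sizeof w);
        if ((w - ONES) & ~w & HIGHS)   /* nonzero iff some byte of w is zero */
            break;
        p += sizeof w;
    }

    /* The terminator is somewhere in this word; find it byte by byte. */
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```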

userbinator · on May 20, 2015

> not use deprecated movs instruction

> I've waited to see an instruction to do this in any machine ever (why do we have to hand-code this kind of thing, when the processor chip KNOWS the best way to get it done?) I've waited 20 years.

Look up "enhanced REP MOVSB"; this link may also be interesting reading: https://software.intel.com/en-us/forums/topic/275765

REP STOS (memset) has also gotten the same boost throughout the generations of x86, and if the trend continues I'd expect REP CMPS and LODS to get the same treatment. These string instructions are tiny (1-2 bytes) and yet very powerful; their greatest advantage is that they don't take up the astoundingly large amount of space in the icache that some extremely micro-optimised routines (e.g. those with ridiculous amounts of loop unrolling) do.
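For anyone who wants to try it, a copy built on REP MOVSB can be written with GCC/Clang inline assembly roughly as below. This is a sketch, not a tuned routine: the name `copy_rep_movsb` is made up, it assumes the direction flag is clear (as the common x86 calling conventions require on function entry), and it falls back to `memcpy` on other compilers or architectures. Whether it actually beats the library `memcpy` depends on the CPU and on the copy size.

```c
#include <stddef.h>
#include <string.h>

/* Copy n bytes from src to dst; the regions must not overlap. */
void *copy_rep_movsb(void *dst, const void *src, size_t n)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    void *d = dst;
    /* REP MOVSB copies (E/R)CX bytes from [(E/R)SI] to [(E/R)DI],
       advancing both pointers; all three registers are modified. */
    __asm__ volatile ("rep movsb"
                      : "+D"(d), "+S"(src), "+c"(n)
                      :
                      : "memory");
    return dst;
#else
    return memcpy(dst, src, n);   /* portable fallback */
#endif
}
```

The whole copy loop is a two-byte instruction, which is the icache argument in a nutshell.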