As others have said, keep a length field with the string. This also has advantages other than making buffer overflows a lot harder, such as making string copying faster and easier. For example, let's have a skeletonized view of a normal string copy routine in assembly (disclaimer: my assembly is rusty, so this may not be completely right. void where prohibited):

        push rax ; save our registers
        push rdi
        push rsi
        mov rsi, location ; get the pointer to the right place
        mov rdi, destination

    beginning:
        mov al, [rsi] ; load one source byte into the register so we can test it
        test al, al ; AND the byte with itself; if it's zero, it'll set the zero flag
        jz done ; we hit the terminator, we're done
        movsb ; copy the byte, increment the rsi and rdi registers (direction flag clear)
        jmp beginning

    done:
        mov byte [rdi], 0 ; write the terminator, which the loop above never copies
        pop rsi ; restore our registers
        pop rdi
        pop rax
		
In contrast, with a length parameter, we can do

        push rcx ; using a different register here
        push rdi
        push rsi
        mov rsi, location ; same as before
        mov rcx, length ; moving our length into the counter register
        mov rdi, destination

        cld ; okay our first change. Clearing the direction flag so the copy
            ; goes from the first byte to the end
        rep movsb ; it'll repeat the movsb command rcx times, and then carry on
	    
        pop rsi ; restoring our registers
        pop rdi
        pop rcx

Having the length of strings means you can have much more concise code. It makes loops easier, makes your code cleaner, and in some environments, gives a speed boost.
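
For anyone who would rather see the length-field idea in C than in assembly, here is a minimal sketch. The names `lstring` and `lstring_copy` are made up for illustration and are not from any library. Because the length is known up front, the overflow check is a single comparison and the copy is a single `memcpy`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical length-carrying string: the length travels with the bytes. */
struct lstring {
    size_t len;
    char  *data;   /* not required to be NUL-terminated */
};

/* Copy src into dst, whose buffer holds dst_cap bytes.
 * Returns 0 on success, -1 if src would not fit. */
static int lstring_copy(struct lstring *dst, size_t dst_cap,
                        const struct lstring *src)
{
    assert(dst != NULL && src != NULL && dst->data != NULL);

    if (src->len > dst_cap)
        return -1;              /* bounds check is one compare, no scanning */

    if (src->len > 0)
        memcpy(dst->data, src->data, src->len);
    dst->len = src->len;
    return 0;
}
```

The caller still has to pass the real capacity of the destination buffer; the sketch only shows that no scan for a terminator is needed.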


Handling strings with (ptr, length) also means that some string copies can be entirely avoided. Text can be chopped, shared and extended by adjusting the two parameters, while the underlying storage remains untouched.

e.g. a really basic example for a web server would be splitting up a URL into a path and query string: both strings can use the underlying URL without any copying.
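
A rough sketch of that URL split in C, assuming a hypothetical `strview` type (pointer plus length) and a hypothetical `split_url` helper. Both resulting views point into the original URL bytes, so nothing is copied or allocated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical (ptr, length) view; it does not own the bytes it points at. */
struct strview {
    const char *ptr;
    size_t      len;
};

/* Split url at the first '?'. path and query both alias url's storage.
 * If there is no '?', query comes back empty. */
static void split_url(struct strview url,
                      struct strview *path, struct strview *query)
{
    assert(url.ptr != NULL && path != NULL && query != NULL);

    const char *q = memchr(url.ptr, '?', url.len);
    if (q == NULL) {
        *path = url;
        query->ptr = url.ptr + url.len;   /* empty view at the end */
        query->len = 0;
        return;
    }
    path->ptr  = url.ptr;
    path->len  = (size_t)(q - url.ptr);
    query->ptr = q + 1;                   /* skip the '?' itself */
    query->len = url.len - path->len - 1;
}
```

The catch is lifetime: the views are only valid while the underlying URL buffer is, and they are not NUL-terminated, so they cannot be handed to functions that expect C strings.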


Lots to add to that if you want: sanity-check the length against the external buffer limit; move a word/dword/128-bit SSE register at a time; consider alignment; not use deprecated movs instruction.

In fact, I've never seen the best alignment-sensitive solution written. It would load 2 large words, shift them to target alignment if needed, store, load, repeat. This would guarantee aligned fetch/store and still do whole-bus operations.

I've waited to see an instruction to do this in any machine ever (why do we have to hand-code this kind of thing, when the processor chip KNOWS the best way to get it done?) I've waited 20 years.
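
The load, shift, store loop described above might look roughly like this in C. This is only a sketch under loud assumptions: a little-endian machine, an 8-byte-aligned destination, a whole number of 8-byte words, and a source whose surrounding aligned words are readable (the first and last loads touch a few bytes outside the source range). `copy_shifted` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 8-byte load/store through memcpy, which avoids aliasing problems;
 * compilers typically turn these into single move instructions. */
static uint64_t load64(const unsigned char *p)
{
    uint64_t w;
    memcpy(&w, p, sizeof w);
    return w;
}

static void store64(unsigned char *p, uint64_t w)
{
    memcpy(p, &w, sizeof w);
}

/* Copy nwords 8-byte words from src (any alignment) to dst (8-byte aligned)
 * using only aligned loads and aligned stores. ASSUMES little-endian. */
static void copy_shifted(unsigned char *dst, const unsigned char *src,
                         size_t nwords)
{
    assert(((uintptr_t)dst & 7) == 0);

    size_t k = (uintptr_t)src & 7;        /* source misalignment in bytes */
    const unsigned char *s = src - k;     /* round source down to alignment */

    if (k == 0) {                         /* already aligned: plain word copy */
        for (size_t i = 0; i < nwords; i++)
            store64(dst + 8 * i, load64(s + 8 * i));
        return;
    }

    unsigned lo = (unsigned)(8 * k);      /* bits to drop from the first word */
    unsigned hi = 64 - lo;                /* bits to take from the second word */
    uint64_t prev = load64(s);
    for (size_t i = 0; i < nwords; i++) {
        uint64_t next = load64(s + 8 * (i + 1));
        store64(dst + 8 * i, (prev >> lo) | (next << hi));
        prev = next;                      /* each source word is loaded once */
    }
}
```

A real routine would also need head and tail handling for unaligned destinations and odd lengths, plus a big-endian variant with the shifts swapped.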


You mean vectorizing memcmp/memcpy to work word-at-a-time instead of byte-at-a-time? Google does this when compiling their binaries, and I think Facebook does too. I thought the latter had open-sourced it with Folly but couldn't find a code pointer. I've heard rumors that LLVM can sometimes vectorize them to use SIMD instructions when available, too.

It's hard to do this safely for strcpy/strcmp because you might read past the end of the buffer when trying to test against a null terminator. memcmp/memcpy and length-prefixed blocks let you use a Duff's-Device-like construct to test only the last word byte-by-byte.
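
To make the known-length case concrete, here is a minimal sketch of a word-at-a-time equality check. It uses a plain byte loop for the tail instead of a Duff's-Device-style switch, and it only answers "equal or not", since ordering by whole words would depend on endianness. `blocks_equal` is a hypothetical name, not a library function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if the two n-byte blocks are equal, 0 otherwise. */
static int blocks_equal(const void *a, const void *b, size_t n)
{
    const unsigned char *pa = a;
    const unsigned char *pb = b;
    assert(n == 0 || (pa != NULL && pb != NULL));

    /* Bulk: compare 8 bytes per iteration. The length tells us exactly
     * how many whole words exist, so we never read past the end. */
    while (n >= sizeof(uint64_t)) {
        uint64_t wa, wb;
        memcpy(&wa, pa, sizeof wa);
        memcpy(&wb, pb, sizeof wb);
        if (wa != wb)
            return 0;
        pa += sizeof wa;
        pb += sizeof wb;
        n  -= sizeof wa;
    }

    /* Tail: at most 7 leftover bytes, compared one at a time. */
    while (n > 0) {
        if (*pa++ != *pb++)
            return 0;
        n--;
    }
    return 1;
}
```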


In a paged system, reading past the end of the string, but no further than the last whole aligned word containing the null terminator, will never page fault. So that can still work.
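
The arithmetic behind that claim can be checked directly. The sketch below assumes 8-byte words and 4096-byte pages (both are assumptions; the argument holds whenever the page size is a multiple of the word size) and verifies that the aligned word containing any address sits entirely inside that address's page.

```c
#include <assert.h>
#include <stdint.h>

#define WORD_SIZE 8u      /* assumed word size in bytes */
#define PAGE_SIZE 4096u   /* assumed page size in bytes */

/* Page number that contains addr. */
static uintptr_t page_of(uintptr_t addr)
{
    return addr / PAGE_SIZE;
}

/* Returns 1 if the aligned word containing addr lies entirely within
 * the page containing addr, 0 otherwise. */
static int word_stays_in_page(uintptr_t addr)
{
    assert(PAGE_SIZE % WORD_SIZE == 0);

    uintptr_t first = addr & ~(uintptr_t)(WORD_SIZE - 1); /* word start */
    uintptr_t last  = first + (WORD_SIZE - 1);            /* word end */
    return page_of(first) == page_of(addr) &&
           page_of(last)  == page_of(addr);
}
```

This only covers page faults. Reading bytes beyond the terminator is still outside what the C standard allows, and memory checkers will typically flag it, so routines that rely on the trick are usually written in assembly or with compiler-specific care.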


> not use deprecated movs instruction

> I've waited to see an instruction to do this in any machine ever (why do we have to hand-code this kind of thing, when the processor chip KNOWS the best way to get it done?) I've waited 20 years.

Look up "enhanced REP MOVSB"; this link may also be interesting reading: https://software.intel.com/en-us/forums/topic/275765

REP STOS (memset) has also gotten the same boost throughout the generations of x86, and if the trend continues I'd expect REP CMPS and LODS to get the same treatment. These string instructions are tiny (1-2 bytes) and yet very powerful; their greatest advantage is that they don't take up the astoundingly large amount of space in the icache that some extremely micro-optimised routines (e.g. ones with ridiculous amounts of loop unrolling) do.



