Thursday, June 18, 2009

MySQL C API

I've been rewriting my code generator in C and utilizing the MySQL C API to insert the codes into the database after they've been generated. I keep rewriting pieces of the code and getting remarkable speed increases. (Moore's law!) Here are some things I've discovered along the way:

  • WRITE TO A MYSQL SOCKET IF YOU CAN. There was a 33% increase in speed between writing to the socket and writing to the TCP stack on 127.0.0.1. I only pointed the MySQL connection to localhost so that I could grab the traffic with wireshark and determine why I was getting some funky SQL (Does anyone know of a DBI trace for the C API? I love using it under Perl...) A 33% speed increase is PHENOMENAL. For those of you not in the know, Unix systems have a filetype known as a "socket" which utilizes a nearly identical system API to the networking stack. It is much faster though since it doesn't have any network overhead (even going through the loopback interface has some network overhead).
  • Use prepared statements, and prepare them as few times as possible. I know, I know... this sounds like a no-brainer if you've ever used DBI under perl. I will tell you though... using prepared statements under MySQL is MUCH harder. You basically have to pass in a pointer address as your bind parameter and then copy your execution values into the pointer's address space. Since the address is passed to the bind once, then the bind parameters are passed to the statement, they can't be updated. I was trying to update the pointer to point to my next code to be inserted, but it had to be a strncpy into the memory space (slow and expensive! but still faster than not using a prepared statement). I finally took the plunge last night and just prepared the statement once and then I pass it around.
  • Sort your insertions if you're doing a lot of them in a row. Doing the sort before you begin inserting is a great help (10% speed increase for me). The MySQL C API still lacks an "execute_array" piece of functionality. Perl is also lacking this, although the DBI::MySQL driver tricks you into thinking it exists (it still just does a whole bunch of straight inserts... yeck).
There's more, but it'll have to be a later post since I'm getting back to work on it right now.