Saturday, April 12, 2014

Auto Increment and InnoDB

Auto Increment is one of the neat pieces of functionality that MySQL has.  It is not implemented the way other database platforms do it.  Typically, a platform like T-SQL (Postgres, Microsoft) will create an internal UDF/SP that does work behind the scenes in a sequence table.

When MySQL initially created Auto Increment, they did so for MyISAM.  When it came time to implemented in a transactional storage engine, they changed the game a little.  It is both good and bad.

To get performance MySQL stores the Auto-Inc current value for a table in memory.  It doesn't actually flush the current value of the counter until some event forces MySQL to do so (a server shutdown/restart or flush perhaps.)  I have seen MySQL crash in the middle of operation and reset the counter back to the last flush and in weird circumstances start handing out the same IDs generating duplicate keys.

Further, because InnoDB is transactional, and the counter is cached in memory until a flush, this is one of the few times where when generating a new sequence will cause InnoDB to do a full table lock for just a split microsecond.

Microsecond you say? That's hardly a horrible amount of time to wait, and to a human that is true. To a computer thousands of updates per second, each microsecond can feel like a life time.  We don't want our databases feeling old ahead of their time.

I present to you...

The counter UDF/SP.

DELIMITER ;;
DROP FUNCTION IF EXISTS calc_id;;
CREATE FUNCTION calc_id(_n BIGINT) RETURNS bigint(20)
    DETERMINISTIC
BEGIN

declare n BIGINT;
declare x BIGINT;

set x = (floor(_n / @@auto_increment_increment)) * @@auto_increment_increment;

set x = x + IF( (@@auto_increment_offset = @@auto_increment_increment),0,@@auto_increment_offset);

if x<=_n then
  set x=x+@@auto_increment_increment;

end if ;
return x;
END
;;
DELIMITER ;



DELIMITER ;;
DROP FUNCTION IF EXISTS get_id;;
CREATE FUNCTION get_id(_seq_idx char(32)) RETURNS bigint(20)
    DETERMINISTIC
BEGIN
  UPDATE sequences SET seq_id= @_seq_id:= calc_id(seq_id) where seq_idx = _seq_idx;
  return @_seq_id;
END ;;
DELIMITER ;


DROP TABLE IF EXISTS sequences;
CREATE TABLE sequences (
  seq_idx char(32) NOT NULL,
  seq_id bigint(20) NOT NULL DEFAULT 0,
  PRIMARY KEY (seq_idx)
) ENGINE=InnoDB;


This little snippet of code has been tested for several years in multiple large scale environments.  I am sure there are ways to improve some of the internal logic, but when it can handle six digits of updates per second on a MySQL (ndb) cluster or five digits of updates per second on a single InnoDB master who cares?

Insert into sequences VALUES('users_pk_idx',0);
Select get_id('users_pk_idx');


This code carries some nifty side effects that eventually developers smile about.
  • To make sure it is happy in any replication environment, I always advise the developers to call the get_id() from code.  Then s/he will explicitly insert the PK value into the query.
  • This means the developer actually knows sequence value ahead of time.  They can plan other code and strategies around knowing the ID before it is inserted.
  • You no longer have to call mysql_insert_id() to get the last insert ID, because you already know what it is.
  • There is no more global locking of the table for microseconds to milliseconds (depending on how many tps you are having on the server)
  • It also follows the protocols for auto-increment ID skipping, so if you are in multi-master environments you can keep that.  
  • The auto-increment number becomes atomic and is as good and durable as to how you setup InnoDB to be. :-)
It is a little more work on the front end, but once you get used to it, you start to like the flexibility of it.  You can even maintain counters for fields that don't need to be PKs, or even UNIQUES explicitly.



No comments:

Post a Comment