MySQL and more

Tuesday, September 11, 2012

What's the innodb main thread really doing?

I've been trying to debug a lot of mutex issues lately, and the output from SHOW ENGINE INNODB STATUS is one of the first places to look for details.  I noticed that there are many times when I see the following:


Main thread process no. 25188, id 1199122752, state: doing background drop tables

Drop tables?  That shouldn't be happening under normal operation.  Looking at the command counters for com_drop_table and com_alter_table (which drops tables as well) shows no activity.  Why is it doing that?

Well, the answer is that it actually isn't.  To find out why I had to dig through the source code.

The main innodb thread is a special thread that just runs in a loop to do various kinds of work.  In older versions of innodb it did just about everything, and that was often a source of problems.  It was once responsible for keeping the transaction logs flushed as well as handling asyncronous io requests and much more.  A stall in any number of places would cause the others to get far behind.

Since then many tasks handled by the main thread have been split out into separate threads, which means even if there's a stall in one, others can continue to function and help make sure the server keeps responding.

Despite this separation of responsibilities, the main thread continues to be critical for the health of the server.

In MySQL 5.5.13 Percona the code shows it can have the following states ( from storage/innobase/srv/srv0srv.c ):


reserving kernel mutex
doing background drop tables
sleeping
making checkpoint
doing insert buffer merge
flushing buffer pool pages
waiting for buffer pool flush to end
master purge
flushing log
suspending
waiting for server activity
archiving log (if log archive is on)
I'm not sure if this list is exhaustive.

So -- in this case what does "doing background drop tables" mean, especially if it can be observed when no tables are being dropped?  It appears that the it's not actually doing anything -- the code sets the state, but then it has to enter a mutex before it can actually do the work.  In my case I'm pretty sure the main thread is actually sitting around waiting to own the mutex.

The real issue is that someone else is holding kernel_mutex.  This is clear from the output elsewhere in the status:


--Thread 1380165952 has waited at lock0lock.c line 3205 for 1.0000 seconds the semaphore:
Mutex at 0x11cb3d08 '&kernel_mutex', lock var 1

In this case I think the main thread simply shows a symtom of the problem and not the actual cause.  I'm still working on finding the cause which means trying to reproduce this on another system.  It's also possible that this has already been fixed by MySQL 5.6 scalability improvements.

No comments:

Followers