The Up-to-Date Reference Manual of InnoDB
Updated October 31, 2002: Added notes on new isolation levels
READ COMMITTED and READ UNCOMMITTED in 4.0.5 to sections 8 and 16
Updated October 14, 2002: MySQL-3.23.53 is released
Updated October 2, 2002: MySQL-4.0.4 is released
Updated September 30, 2002: Added to section 8.5 a note that CREATE TABLE
commits the current InnoDB transaction if the MySQL binlogging is used
Updated August 27, 2002: Added to sections 4.2 and 4.3 tips on new commands
SET UNIQUE_CHECKS=0 and SET FOREIGN_KEY_CHECKS=0
Updated August 12, 2002: Improved section 8.8 about how to cope with
deadlocks
InnoDB provides MySQL with a transaction-safe table handler with
commit, rollback, and crash recovery capabilities. InnoDB does
locking on the row level and also provides an Oracle-style
consistent non-locking read in SELECTs.
These features increase multiuser concurrency and performance.
There is no need for lock escalation in InnoDB,
because row level locks in InnoDB fit in a very small space.
InnoDB is the first MySQL table type to support FOREIGN KEY
constraints.
In SQL queries you can freely mix InnoDB type tables with other
table types of MySQL, even within the same query.
InnoDB has been designed for maximum performance when processing large data volumes. Its CPU efficiency is probably not matched by any other disk-based relational database engine.
Technically, InnoDB is a complete database backend placed under MySQL.
InnoDB has its own buffer pool for caching data and indexes in main
memory. InnoDB stores its tables and indexes in a tablespace, which
may consist of several files. This is different from, for example,
MyISAM
tables where each table is stored as a separate file.
InnoDB tables can be of any size, even on those operating
systems where file size is limited to 2 GB.
InnoDB is currently (October 2001) used in production at several large database sites requiring high performance. The famous Internet news site Slashdot.org runs on InnoDB and also uses the MySQL replication feature with InnoDB tables. Mytrix, Inc. stores over 1 TB of data in InnoDB, and another site handles an average load of 800 inserts/updates per second in InnoDB.
InnoDB tables are included in the MySQL source distribution starting from 3.23.34a and are activated in the MySQL -Max binary. For Windows the -Max binaries are contained in the standard distribution.
If you have downloaded a binary version of MySQL that includes support for InnoDB, simply follow the instructions of the MySQL manual for installing a binary version of MySQL. If you already have a recent version, MySQL-3.23.4x, installed, then the simplest way to install MySQL -Max with the same version number 3.23.4x is to replace the server executable mysqld with the corresponding executable in the -Max distribution. MySQL and MySQL -Max differ only in the server executable.
To compile MySQL-3.23 with InnoDB support, download MySQL-3.23.34a or newer
version from the
MySQL website
and configure MySQL
with the
--with-innodb
option. See the
MySQL manual
about installing a MySQL source distribution. In MySQL-4.0 InnoDB
is compiled in by default.
cd /path/to/source/of/mysql-3.23.47
./configure --with-innodb
To use InnoDB in MySQL-Max-3.23 you have to specify InnoDB startup
options in your my.cnf or my.ini file. The minimal way
to modify it is to add to the [mysqld]
section the line
innodb_data_file_path=ibdata:30M

but to get good performance it is best that you specify options as recommended below in the section 'InnoDB startup options'.
InnoDB is distributed under the GNU GPL License Version 2 (of June 1991). In the source distribution of MySQL, InnoDB appears as a subdirectory.
To use InnoDB tables in MySQL-Max-3.23 you MUST specify configuration
parameters
in the [mysqld]
section of
the configuration file my.cnf, or on Windows optionally in
my.ini.
At the minimum, in 3.23 you must specify
innodb_data_file_path
where you give the names and the sizes
of data files. If you do not
mention innodb_data_home_dir
in your my.cnf
the default is that InnoDB creates these data files to the
datadir
of MySQL.
If you specify innodb_data_home_dir
as an empty
string, then you can give absolute paths to your data files.
In MySQL-4.0 you do not need to specify even
innodb_data_file_path
: the default for 4.0 is to create
an auto-extending 10 MB file ibdata1 to the datadir
of MySQL. (In MySQL-4.0.0 and 4.0.1 the data file is 64 MB and not
auto-extending.)
But to get good performance you MUST explicitly set the InnoDB parameters listed below in the examples.
Starting from versions 3.23.50 and 4.0.2 InnoDB allows the last
data file on the innodb_data_file_path
line
to be specified as auto-extending. The syntax for
innodb_data_file_path
is then the following:
pathtodatafile:sizespecification;pathtodatafile:sizespec;...;pathtodatafile:sizespec[:autoextend[:max:sizespecification]]

If you specify the last data file with the autoextend option, InnoDB will extend the last data file if it runs out of free space in the tablespace. The increment is 8 MB at a time. An example:
innodb_data_home_dir =
innodb_data_file_path = /ibdata/ibdata1:100M:autoextend

instructs InnoDB to create just a single data file whose initial size is 100 MB and which is extended in 8 MB blocks when space runs out. If the disk becomes full, you may want to add another data file on another disk, for example. Then you have to look up the size of ibdata1, round the size down to the closest multiple of 1024 * 1024 bytes (= 1 MB), and specify the rounded size of ibdata1 explicitly in innodb_data_file_path.
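The rounding step above is plain integer arithmetic. As a sketch, suppose (hypothetically) that ls -l reports ibdata1 as 1036500992 bytes:

```shell
# Divide the byte count by 1024 * 1024 = 1048576; integer
# division truncates, which rounds down to whole megabytes.
expr 1036500992 / 1048576   # prints 988
```

The result, 988, is the value to write into innodb_data_file_path as ibdata1:988M.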
After that you can add another data file:
innodb_data_home_dir =
innodb_data_file_path = /ibdata/ibdata1:988M;/disk2/ibdata2:50M:autoextend

Be cautious on file systems where the maximum file size is 2 GB! InnoDB is not aware of the OS maximum file size. On those file systems you might want to specify the max size for the data file:
innodb_data_home_dir = innodb_data_file_path = /ibdata/ibdata1:100M:autoextend:max:2000M
A simple my.cnf example. Suppose you have a computer with 128 MB RAM and one hard disk. Below is an example of possible configuration parameters in my.cnf or my.ini for InnoDB. We assume you are running MySQL-Max-3.23.50 or later, or MySQL-4.0.2 or later.
This example
suits most users, both on Unix and Windows,
who do not want to distribute InnoDB data files
and log files on several disks. This creates an auto-extending
data file ibdata1 and two InnoDB log files
ib_logfile0 and ib_logfile1 to the
datadir
of MySQL (typically /mysql/data).
Also the small archived InnoDB log file
ib_arch_log_0000000000 ends up in the datadir
.
[mysqld]
# You can write your other MySQL server options here
# ...
# Data file(s) must be able to hold your data and indexes.
# Make sure you have enough free disk space.
innodb_data_file_path = ibdata1:10M:autoextend
# Set buffer pool size to 50 - 80 % of your computer's memory
set-variable = innodb_buffer_pool_size=70M
set-variable = innodb_additional_mem_pool_size=10M
# Set the log file size to about 25 % of the buffer pool size
set-variable = innodb_log_file_size=20M
set-variable = innodb_log_buffer_size=8M
# Set ..flush_log_at_trx_commit to 0 if you can afford losing
# some last transactions
innodb_flush_log_at_trx_commit=1
Check that the MySQL server
has the rights to create files in datadir
.
Note that on some file systems data files must be < 2 GB! The combined size of the log files must be < 4 GB. The combined size of the data files must be >= 10 MB.
When you for the first time create an InnoDB database, it is best that you start the MySQL server from the command prompt. Then InnoDB will print the information about the database creation to the screen, and you see what is happening. See below in section 3 what the printout should look like. For example, in Windows you can start mysqld-max.exe with:
your-path-to-mysqld>mysqld-max --console
Where to put my.cnf or my.ini in Windows? The rules for Windows are the following: place my.cnf in the root directory of the C: drive, or my.ini in the Windows directory (the value of WINDIR, typically C:\WINDOWS or C:\WINNT); only one of the two files should be used. You can use the SET
command of MS-DOS to print the value of WINDIR.
Where to specify options in Unix? On Unix mysqld reads options from the following files, if they exist, in the following order: /etc/my.cnf (global options), my.cnf in the MySQL datadir (server-specific options), the file named with --defaults-extra-file=..., and .my.cnf in the user's home directory (user-specific options). The datadir was determined by a ./configure option when mysqld was compiled (typically /usr/local/mysql/data for a binary installation or /usr/local/var for a source installation).
If you are not sure from where mysqld reads its my.cnf
or my.ini, you can give the path as the first command-line
option to the server:
mysqld --defaults-file=your_path_to_my_cnf
.
InnoDB forms the directory path to a data file by textually
concatenating innodb_data_home_dir and the data file name or
path given in innodb_data_file_path,
adding a slash or backslash in between if needed.
If the keyword innodb_data_home_dir
is not mentioned
in my.cnf at all, the default for it is the 'dot' directory
./
which means the datadir
of MySQL.
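As an illustration of how the path is formed, consider the following my.cnf fragment (the directory name is hypothetical):

```
innodb_data_home_dir = /usr/local/mysql/data
innodb_data_file_path = ibdata1:50M;ibdata2:50M
```

InnoDB would then use the data files /usr/local/mysql/data/ibdata1 and /usr/local/mysql/data/ibdata2.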
An advanced my.cnf example. Suppose you have a Linux computer with 2 GB RAM and three 60 GB hard disks (at directory paths `/', `/dr2' and `/dr3'). Below is an example of possible configuration parameters in my.cnf for InnoDB.
Note that InnoDB does not create directories:
you have to create them yourself.
Use the Unix or MS-DOS mkdir
command to create
the data and log group home directories.
[mysqld]
# You can write your other MySQL server options here
# ...
innodb_data_home_dir =
# Data files must be able to hold your data and indexes
innodb_data_file_path = /ibdata/ibdata1:2000M;/dr2/ibdata/ibdata2:2000M:autoextend
# Set buffer pool size to 50 - 80 % of your computer's memory,
# but make sure on Linux x86 total memory usage is < 2 GB
set-variable = innodb_buffer_pool_size=1G
set-variable = innodb_additional_mem_pool_size=20M
innodb_log_group_home_dir = /dr3/iblogs
# .._log_arch_dir must be the same as .._log_group_home_dir
innodb_log_arch_dir = /dr3/iblogs
set-variable = innodb_log_files_in_group=3
# Set the log file size to about 15 % of the buffer pool size
set-variable = innodb_log_file_size=150M
set-variable = innodb_log_buffer_size=8M
# Set ..flush_log_at_trx_commit to 0 if you can afford losing
# some last transactions
innodb_flush_log_at_trx_commit=1
set-variable = innodb_lock_wait_timeout=50
#innodb_flush_method=fdatasync
#set-variable = innodb_thread_concurrency=5
Note that we have placed the two data files on different disks. InnoDB will fill the tablespace formed by the data files from bottom up. In some cases it will improve the performance of the database if all data is not placed on the same physical disk. Putting log files on a different disk from data is very often beneficial for performance. You can also use raw disk partitions (raw devices) as data files. In some Unixes they speed up i/o. See section 12.1 about how to specify them in my.cnf.
Warning: on Linux x86 you must be careful you do not set memory usage too high. glibc will allow the process heap to grow over thread stacks, which will crash your server. It is a risk if the value of
innodb_buffer_pool_size + key_buffer + max_connections * (sort_buffer + record_buffer) + max_connections * 2 MB

is close to 2 GB or exceeds 2 GB. Each thread will use a stack (often 2 MB, but in MySQL AB binaries only 256 kB) and in the worst case also sort_buffer + record_buffer of additional memory.
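As a worked example with hypothetical values, suppose innodb_buffer_pool_size = 1000M, key_buffer = 200M, max_connections = 200, and sort_buffer = record_buffer = 1M:

```
1000M + 200M + 200 * (1M + 1M) + 200 * 2M
= 1000M + 200M + 400M + 400M
= 2000M
```

This is right at the 2 GB danger limit on Linux x86, so at least one of the values should be lowered.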
How to tune other mysqld server parameters? For detailed information on how to tune other MySQL server parameters, see the MySQL manual. Typical values which suit most users are:
skip-locking
set-variable = max_connections=200
set-variable = record_buffer=1M
set-variable = sort_buffer=1M
# Set key_buffer to 5 - 50 % of your RAM depending on how much
# you use MyISAM tables, but keep key_buffer + InnoDB buffer
# pool size < 80 % of your RAM
set-variable = key_buffer=...
Note that some parameters are given using the numeric my.cnf
parameter format: set-variable = innodb_... = 123, while others
(string and boolean parameters) use another format:
innodb_... = ....
The meanings of the configuration parameters are the following:
innodb_data_home_dir
The common part of the directory path for all InnoDB data files. If you do not mention this option in my.cnf, InnoDB will use the MySQL datadir as the default. You can also specify this as an empty string, in which case you can use absolute file paths in innodb_data_file_path.

innodb_data_file_path
Paths to individual data files and their sizes. The full directory path to each data file is acquired by concatenating innodb_data_home_dir to the paths specified here. The file sizes are specified in megabytes, hence the 'M' after the size specification above. InnoDB also understands the abbreviation 'G', 1G meaning 1024M. Starting from 3.23.44 you can set the file size bigger than 4 GB on those operating systems which support big files. On some operating systems files must be < 2 GB. The sum of the sizes of the data files must be at least 10 MB. In MySQL-3.23 this parameter must always be specified in my.cnf. In MySQL-4.0.2 and later, if you do not specify it, the default is to create a 16 MB auto-extending data file ibdata1 in the datadir of MySQL. You can also use raw disk partitions as data files. See section 12.1 about how to specify them in my.cnf.

innodb_mirrored_log_groups
Number of identical copies of log groups we keep for the database. Currently this should be set to 1. Numeric my.cnf parameter format.

innodb_log_group_home_dir
Directory path to InnoDB log files. Must be set the same as innodb_log_arch_dir. If you do not specify any InnoDB log parameters, the default is to create two 5 MB files ib_logfile... in the datadir of MySQL.

innodb_log_files_in_group
Number of log files in the log group. InnoDB writes to the files in a circular fashion. Value 3 is recommended here. Numeric my.cnf parameter format.

innodb_log_file_size
Size of each log file in a log group in megabytes. If n is the number of log files you specified for the log group, then sensible values range from 1M to 1/n-th of the size of the buffer pool specified below. The bigger the value, the less checkpoint flush activity is needed in the buffer pool, saving disk i/o. But bigger log files also mean that recovery will be slower in case of a crash. The combined size of the log files must be < 4 GB. Numeric my.cnf parameter format.

innodb_log_buffer_size
The size of the buffer which InnoDB uses to write log to the log files on disk. Sensible values range from 1M to 8M. A big log buffer allows large transactions to run without a need to write the log to disk until the transaction commit. Thus, if you have big transactions, making the log buffer big will save disk i/o. Numeric my.cnf parameter format.

innodb_flush_log_at_trx_commit
Normally this is set to 1, meaning that at a transaction commit the log is flushed to disk, so that the modifications made by the transaction become permanent and survive a database crash. If you are willing to compromise this safety, and you are running small transactions, you may set this to 0 to reduce disk i/o to the logs. The default value of this parameter is 0.

innodb_log_arch_dir
The directory where fully written log files would be archived if we used log archiving. The value of this parameter should currently be set the same as innodb_log_group_home_dir.

innodb_log_archive
This value should currently be set to 0. As recovery from a backup is done by MySQL using its own log files, there is currently no need to archive InnoDB log files. The default value of this parameter is 0.

innodb_buffer_pool_size
The size of the memory buffer InnoDB uses to cache data and indexes of its tables. The bigger you set this, the less disk i/o is needed to access data in tables. On a dedicated database server you may set this parameter up to 80 % of the machine physical memory size. Do not set it too large, though, because competition for the physical memory may cause paging in the operating system. Numeric my.cnf parameter format.

innodb_additional_mem_pool_size
Size of a memory pool InnoDB uses to store data dictionary information and other internal data structures. A sensible value for this might be 2M, but the more tables you have in your application, the more you will need to allocate here. If InnoDB runs out of memory in this pool, it will start to allocate memory from the operating system and write warning messages to the MySQL error log. Numeric my.cnf parameter format.

innodb_file_io_threads
Number of file i/o threads in InnoDB. Normally this should be 4, but on Windows disk i/o may benefit from a larger number. Numeric my.cnf parameter format.

innodb_lock_wait_timeout
Timeout in seconds an InnoDB transaction may wait for a lock before being rolled back. InnoDB automatically detects transaction deadlocks in its own lock table and rolls back the transaction. If you use the LOCK TABLES command, or other transaction-safe table handlers than InnoDB in the same transaction, then a deadlock may arise which InnoDB cannot notice. In cases like this the timeout is useful to resolve the situation. The default value of this parameter is 50 seconds. Numeric my.cnf parameter format.

innodb_flush_method
This is only relevant on Unix. The default value for this is fdatasync. Another option is O_DSYNC. This affects only log flushing; data files in Unix are always flushed with fsync. InnoDB versions starting from 3.23.40b in Unix use fsync if you specify fdatasync, and O_SYNC if you specify O_DSYNC. The 'data' versions are not used because there have been problems with them on many Unix flavors.

innodb_force_recovery
Warning: this option should only be defined in an emergency situation when you want to dump your tables from a corrupt database! Possible values are 1 - 6. See below in the section 'Forcing recovery' about the meanings of the values. As a safety measure InnoDB prevents a user from modifying data when this option is > 0. This option is available starting from version 3.23.44. Numeric my.cnf parameter format.

innodb_fast_shutdown
By default, InnoDB does a full purge and an insert buffer merge before a shutdown. These operations can take minutes, or in extreme cases even hours. If you set this parameter to 1, then InnoDB skips these operations at shutdown. Available starting from 3.23.44 and 4.0.1. The default value of this parameter is 1 starting from 3.23.50.

innodb_thread_concurrency
InnoDB tries to keep the number of operating system threads concurrently inside InnoDB below or equal to the limit given in this parameter. The default value for this parameter is 8. If you have low performance and innodb_monitor reveals many threads waiting for semaphores, then you may have thread thrashing and should try setting this parameter lower. If you have a computer with many processors and disks, you can try setting this value higher to better utilize the resources of your computer. A value of 'number of processors + number of disks' is recommended. Available starting from 3.23.44 and 4.0.1. Numeric my.cnf parameter format.
Suppose you have installed MySQL and have edited my.cnf so that it contains the necessary InnoDB configuration parameters. Before starting MySQL you should check that the directories you have specified for InnoDB data files and log files exist and that you have access rights to those directories. InnoDB cannot create directories, only files. Check also you have enough disk space for the data and log files.
It is best to run the MySQL server mysqld from the command prompt when you create an InnoDB database, not from the safe_mysqld wrapper or as a Windows service. When you run from a command prompt you see what mysqld prints and what is happening.
When you now start the MySQL server, InnoDB will start creating your data files and log files. InnoDB will print something like the following:
heikki@donna:~/mysql-3.23.48/sql> mysqld
020204 23:17:12  InnoDB: The first specified data file /dr2/tmp/heikki/data/ibdata1 did not exist:
InnoDB: a new database to be created!
InnoDB: Setting file /dr2/tmp/heikki/data/ibdata1 size to 20 MB
InnoDB: Database physically writes the file full: wait...
020204 23:17:16  InnoDB: Data file /dr2/tmp/heikki/data/ibdata2 did not exist: new to be created
InnoDB: Setting file /dr2/tmp/heikki/data/ibdata2 size to 200 MB
InnoDB: Database physically writes the file full: wait...
020204 23:17:41  InnoDB: Data file /dr2/tmp/heikki/data/ibdata3 did not exist: new to be created
InnoDB: Setting file /dr2/tmp/heikki/data/ibdata3 size to 1000 MB
InnoDB: Database physically writes the file full: wait...
020204 23:21:37  InnoDB: Log file ./ib_logfile0 did not exist: new to be created
InnoDB: Setting log file ./ib_logfile0 size to 10 MB
InnoDB: Database physically writes the file full: wait...
020204 23:21:39  InnoDB: Log file ./ib_logfile1 did not exist: new to be created
InnoDB: Setting log file ./ib_logfile1 size to 10 MB
InnoDB: Database physically writes the file full: wait...
020204 23:21:41  InnoDB: Log file ./ib_logfile2 did not exist: new to be created
InnoDB: Setting log file ./ib_logfile2 size to 10 MB
InnoDB: Database physically writes the file full: wait...
InnoDB: Doublewrite buffer not found: creating new
InnoDB: Doublewrite buffer created
InnoDB: Creating foreign key constraint system tables
InnoDB: Foreign key constraint system tables created
020204 23:21:45  InnoDB: Started
mysqld: ready for connections
A new InnoDB database has now been created. You can connect to the MySQL
server with the usual MySQL client programs like mysql.
When you shut down the MySQL server with mysqladmin shutdown
,
InnoDB output will be like the following:
020204 23:34:45  mysqld: Normal shutdown
020204 23:34:45  InnoDB: Starting shutdown...
020204 23:34:47  InnoDB: Shutdown completed
020204 23:34:47  mysqld: Shutdown Complete
You can now look at the data files and logs directories and you will see the files created. The log directory will also contain a small file named ib_arch_log_0000000000. That file resulted from the database creation, after which InnoDB switched off log archiving. When MySQL is again started, the output will be like the following:
heikki@donna:~/mysql-3.23.48/sql> mysqld
020204 23:34:27  InnoDB: Started
mysqld: ready for connections
If InnoDB prints an operating system error in a file operation, look from section 13.2 what that error code means. Usually the problem is one of the following: the MySQL server does not have the rights to create files in the specified directories, or there is an error in the values you gave for
innodb_data_home_dir
or innodb_data_file_path.
If something goes wrong in an InnoDB database creation, you should delete all files created by InnoDB: all data files, all log files, and the small archived log file. In case you already created some InnoDB tables, also delete the corresponding .frm files for these tables from the MySQL database directories. Then you can try the InnoDB database creation again. It is best to start the MySQL server from a command prompt so that you see what is happening.
Normally you have to shut down the database server before
shutting down your computer. You can do this from the command line
with the command mysqladmin shutdown
.
On Windows NT and 2000 you can install the MySQL server also
as a Windows service. In that case the MySQL startup happens
automatically at the computer boot time, or you can start it with
the MS-DOS command NET START MySQL
or from the Services
menu of your operating system.
If you run MySQL as a service, you can shut down the MySQL server
with the MS-DOS
NET STOP MySQL
command, or from the Services menu of
your operating system. You can also let the operating system
automatically shut down MySQL at the computer shutdown time.
In MySQL versions < 3.23.47 the Windows operating system only waited for
a few seconds for the InnoDB shutdown to complete, and killed
the database server process if the time limit was exceeded.
Then at the next startup InnoDB had to do a crash recovery.
Starting from the MySQL version 3.23.48, the operating system will wait
longer for the InnoDB shutdown to complete.
If you notice that the operating system does not wait long enough for
the InnoDB shutdown to complete, it is safest to run the MySQL server
from an MS-DOS prompt, and shut it down with mysqladmin shutdown
.
On Windows NT (but not on Windows 2000), at the computer shutdown,
there is the problem that Windows NT by default only waits
for 20 seconds for a service to shut down, and after
that kills the service process.
You can increase this default by opening the Registry Editor
\winnt\system32\regedt32.exe
and editing the value of WaitToKillServiceTimeout
at HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control
in the Registry tree. Give the new larger value in milliseconds.
Suppose you have started the MySQL client with the command
mysql test
.
To create a table in the InnoDB format you must specify
TYPE = InnoDB
in the table creation SQL command:
CREATE TABLE CUSTOMER (A INT, B CHAR (20), INDEX (A)) TYPE = InnoDB;
This SQL command will create a table and an index on column A
in the InnoDB tablespace consisting of the data files you specified
in my.cnf. In addition MySQL will create a file
CUSTOMER.frm in the MySQL database directory test.
Internally, InnoDB will add to its own data dictionary an entry
for table 'test/CUSTOMER'
. Thus you can create a table
of the same name CUSTOMER
in another database of MySQL, and
the table names will not collide inside InnoDB.
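A minimal sketch of this, using a hypothetical second database test2:

```sql
CREATE DATABASE test2;
CREATE TABLE test2.CUSTOMER (A INT, B CHAR (20), INDEX (A)) TYPE = InnoDB;
```

Inside InnoDB the two tables are stored as 'test/CUSTOMER' and 'test2/CUSTOMER', so the names do not collide.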
You can query the amount of free space in the InnoDB tablespace
by issuing the table status command of MySQL for any table you have
created with TYPE = InnoDB
. Then the amount of free
space in the tablespace appears in the table comment section in the
output of SHOW
. An example:
SHOW TABLE STATUS FROM test LIKE 'CUSTOMER';
Note that the statistics SHOW
gives about InnoDB tables
are only approximate: they are used in SQL optimization. Table and
index reserved sizes in bytes are accurate, though.
Also take care not to delete or
add .frm files to your InnoDB database manually: use
CREATE TABLE
and DROP TABLE
commands.
InnoDB has its own internal data dictionary, and you will get problems
if the MySQL .frm files are out of 'sync' with the InnoDB
internal data dictionary.
By default, MySQL always starts a new connection in the autocommit mode
which automatically commits every SQL statement you run.
To use transactions, you can switch the autocommit off with the SQL command
SET AUTOCOMMIT = 0
and use COMMIT
and ROLLBACK
to commit or rollback your transaction. If you want to leave the autocommit
on, you can enclose your transactions between BEGIN
and COMMIT
or ROLLBACK
.
heikki@hundin:~/mysql/client> mysql test
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 5 to server version: 3.23.50-log

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> CREATE TABLE CUSTOMER (A INT, B CHAR (20), INDEX (A)) TYPE = InnoDB;
Query OK, 0 rows affected (0.00 sec)

mysql> BEGIN;
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO CUSTOMER VALUES (10, 'Heikki');
Query OK, 1 row affected (0.00 sec)

mysql> COMMIT;
Query OK, 0 rows affected (0.00 sec)

mysql> SET AUTOCOMMIT=0;
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO CUSTOMER VALUES (15, 'John');
Query OK, 1 row affected (0.00 sec)

mysql> ROLLBACK;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT * FROM CUSTOMER;
+------+--------+
| A    | B      |
+------+--------+
|   10 | Heikki |
+------+--------+
1 row in set (0.00 sec)

mysql>
In APIs like PHP, Perl DBI/DBD, JDBC, ODBC, or the standard C call interface
of MySQL, send the transaction control statements like "COMMIT"
to
the MySQL server as strings like any other
SQL statements, e.g., "SELECT..."
or "INSERT..."
.
APIs often contain separate
special commit-transaction methods, but since transaction support is
still relatively young in MySQL, those do not always work with
all versions of the APIs.
Note that you cannot convert MySQL system tables in the mysql
database, like 'user' or 'host', to the InnoDB type.
The system tables must always be of the MyISAM type.
If you want all your tables to be created in the InnoDB type, you can, starting from the MySQL version 3.23.43, add the line

default-table-type=innodb

to the [mysqld] section of your my.cnf or my.ini.
InnoDB does not have a special optimization for separate index creation.
Therefore it does not pay to export and import the table and create indexes
afterwards.
The fastest way to alter a table to InnoDB is to do the inserts
directly to an InnoDB table, that is, use
ALTER TABLE ... TYPE=INNODB
,
or create an empty InnoDB table with identical definitions and insert
the rows with INSERT INTO ... SELECT * FROM ...
.
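For example, either of the following performs the conversion (mytable, newtable, and oldtable are hypothetical names, and the column list must match your own table definition):

```sql
-- Alternative 1: convert in place
ALTER TABLE mytable TYPE = INNODB;

-- Alternative 2: create an empty InnoDB table with an identical
-- definition and copy the rows into it
CREATE TABLE newtable (A INT, B CHAR (20), INDEX (A)) TYPE = INNODB;
INSERT INTO newtable SELECT * FROM oldtable;
```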
If you have UNIQUE
constraints on secondary keys, starting from
3.23.52 you
can speed up a table import by turning the uniqueness checks off
for a while in the import session:
SET UNIQUE_CHECKS=0;

For big tables this saves a lot of disk i/o because InnoDB can then use its insert buffer to write secondary index records in a batch.
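A complete import session following this tip might look as below; the INSERT is a placeholder for your actual import statements, and the checks are switched back on afterwards:

```sql
SET UNIQUE_CHECKS=0;
INSERT INTO newtable SELECT * FROM oldtable;
SET UNIQUE_CHECKS=1;
```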
To get better control over the insertion process, it may be good to insert big tables in pieces:
INSERT INTO newtable SELECT * FROM oldtable WHERE yourkey > something AND yourkey <= somethingelse;
After all data has been inserted you can rename the tables.
During the conversion of big tables you should set the InnoDB buffer pool size big to reduce disk i/o. Not bigger than 80 % of the physical memory, though. You should set InnoDB log files big, and also the log buffer large.
Make sure you do not run out of tablespace: InnoDB tables take a lot
more space than MyISAM tables. If an ALTER TABLE
runs out
of space, it will start a rollback, and that can take hours if it is
disk-bound.
In inserts InnoDB uses the insert buffer to merge secondary index records
to indexes in batches. That saves a lot of disk i/o. In rollback no such
mechanism is used, and the rollback can take 30 times longer than the insertion.
In the case of a runaway rollback, if you do not have valuable data in your just imported table, starting from version 3.23.53 and 4.0.3 you can use the trick explained in section 6.1 to stop the runaway rollback.
Starting from version 3.23.43b InnoDB features foreign key constraints. InnoDB is the first MySQL table type which allows you to define foreign key constraints to guard the integrity of your data.
The syntax of a foreign key constraint definition in InnoDB:
[CONSTRAINT symbol] FOREIGN KEY [id] (index_col_name, ...)
    REFERENCES table_name (index_col_name, ...)
    [ON DELETE CASCADE | ON DELETE SET NULL | RESTRICT]

Both tables have to be of the InnoDB type and there must be an index where the foreign key and the referenced key are listed as the first columns. InnoDB does not auto-create indexes on foreign keys or referenced keys: you have to create them explicitly.
Corresponding columns in the foreign key and the referenced key must have similar internal data types inside InnoDB so that they can be compared without a type conversion. The size and the signedness of integer types have to be the same. The length of string types need not be the same.
Starting from version 3.23.50, InnoDB does not check foreign key constraints on those foreign key or referenced key values which contain a NULL column.
Starting from version 3.23.50 you can also associate the
ON DELETE CASCADE
or ON DELETE SET NULL
clause with the foreign key constraint.
If ON DELETE CASCADE
is specified, and a row in the parent
table is deleted, then InnoDB automatically deletes also all those rows
in the child table whose foreign key values are equal to
the referenced key value in the parent row. If
ON DELETE SET NULL
is specified, the child rows are automatically updated
so that the columns in the
foreign key are set to the SQL NULL value.
An example:
CREATE TABLE parent(id INT NOT NULL,
                    PRIMARY KEY (id)) TYPE=INNODB;
CREATE TABLE child(id INT, parent_id INT,
                   INDEX par_ind (parent_id),
                   FOREIGN KEY (parent_id) REFERENCES parent(id)
                   ON DELETE CASCADE) TYPE=INNODB;
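A minimal sketch of the cascading delete in action, using the parent and child tables above:

```sql
INSERT INTO parent VALUES (1);
INSERT INTO child VALUES (1, 1);

DELETE FROM parent WHERE id = 1;

SELECT * FROM child;
```

Because the child row's parent_id referenced the deleted parent row, InnoDB removes the child row too, and the final SELECT returns an empty result set.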
If MySQL gives the error number 1005 from a CREATE TABLE
statement, and the error message string refers to errno 150, then
the table creation failed because a foreign key constraint was not
correctly formed.
Similarly, if an ALTER TABLE fails and the error refers to errno 150, that means a foreign key definition for the altered table would be incorrectly formed.
Starting from version 3.23.50 InnoDB allows you to add a new foreign key constraint to a table through

ALTER TABLE yourtablename ADD [CONSTRAINT symbol] FOREIGN KEY [id] (...)
            REFERENCES table_name (index_col_name, ...)
            [ON DELETE CASCADE | ON DELETE SET NULL | RESTRICT]

Remember to create the required indexes first, though. You can also add a self-referential foreign key constraint to a table using ALTER TABLE.
If you want to import several dumps of tables, but the dumps are not correctly ordered for foreign keys, starting from 3.23.52 and 4.0.3 you can turn the foreign key checks off for a while in the import session:

SET FOREIGN_KEY_CHECKS=0;

This allows you to import the tables in any order, and also speeds up the import.
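As a sketch, such an import session might look like the following (the dump file names are hypothetical, and SOURCE is the mysql client command for running a script):

```sql
-- Hypothetical import session: child.sql references parent.sql,
-- but we load the dumps in the 'wrong' order.
SET FOREIGN_KEY_CHECKS=0;
SOURCE child.sql;
SOURCE parent.sql;
-- Switch the checks back on for normal operation.
SET FOREIGN_KEY_CHECKS=1;
```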
Starting from 3.23.50 the InnoDB parser also allows backquotes around table and column names in a FOREIGN KEY ... REFERENCES ... clause. Starting from 4.0.5 the InnoDB parser is aware of the option lower_case_table_names, if you give it in my.cnf.
In InnoDB versions < 3.23.50, ALTER TABLE or CREATE INDEX should not be used in connection with tables which have foreign key constraints or which are referenced in foreign key constraints: any ALTER TABLE removes all foreign key constraints defined for the table. You should not use ALTER TABLE on the referenced table either, but use DROP TABLE and CREATE TABLE to modify the schema. When MySQL does an ALTER TABLE it may internally use RENAME TABLE, and that will confuse the foreign key constraints which refer to the table. A CREATE INDEX statement is in MySQL processed as an ALTER TABLE, and these restrictions apply also to it.
When doing foreign key checks InnoDB sets shared row level locks on child or parent records it has to look at. InnoDB checks foreign key constraints immediately: the check is not deferred to transaction commit.
InnoDB allows you to drop any table even though that would break the foreign key constraints which reference the table. When you drop a table the constraints which were defined in its create statement are also dropped.
If you re-create a table which was dropped, it has to have a definition which conforms to the foreign key constraints referencing it. It must have the right column names and types, and it must have indexes on the referenced keys, as stated above. If these are not satisfied, MySQL returns error number 1005 and refers to errno 150 in the error message string.
Starting from version 3.23.50 InnoDB returns the foreign key definitions of a table when you call

SHOW CREATE TABLE yourtablename

Then also mysqldump produces correct definitions of tables to the dump file, and does not forget about the foreign keys.
You can also list the foreign key constraints for a table T with

SHOW TABLE STATUS FROM yourdatabasename LIKE 'T'

The foreign key constraints are listed in the table comment of the output.
If you specify an auto-increment column for a table, then the InnoDB table handle in the data dictionary will contain a special counter called the auto-increment counter which is used in assigning new values for the column. The auto-increment counter is only stored in main memory, not on disk.
InnoDB uses the following algorithm to initialize the auto-increment counter. After a database startup, when a user for the first time does an insert to a table T or calls SHOW TABLE STATUS where the table T is shown, InnoDB executes

SELECT MAX(auto-inc-column) FROM T FOR UPDATE;

and assigns that value, incremented by one, to the column and to the auto-increment counter of the table. If the table is empty, the value 1 is assigned. Note that in this initialization we do a normal x-locking read on the table, and the lock lasts to the end of the transaction.
InnoDB follows the same procedure in initializing the auto-increment counter for a freshly created table.
If the user specifies in an insert the value 0 for the auto-increment column, then InnoDB treats the row as if the value had not been specified, and generates a new value for it.
After the auto-increment counter has been initialized, if a user inserts a row where he explicitly specifies the column value, and the value is bigger than the current counter value, then the counter is set to the specified column value. If the user does not explicitly specify a value, then InnoDB increments the counter by one and assigns its new value to the column.
When accessing the auto-increment counter InnoDB uses a special table level lock, the AUTO-INC lock, which it keeps to the end of the current SQL statement, not to the end of the transaction. This special lock release strategy was introduced to improve concurrency for inserts into a table containing an auto-increment column. Two transactions cannot have the AUTO-INC lock on the same table simultaneously.
Note that you may get gaps in the auto-increment column number sequence if you roll back transactions which have got numbers from the counter.
The behavior of the auto-increment mechanism is not defined if a user gives a negative value to the column or if the value becomes bigger than the maximum integer that can be stored in the specified integer type.
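The rules above can be illustrated with a small sketch (the table and its contents are hypothetical):

```sql
CREATE TABLE animals(id INT NOT NULL AUTO_INCREMENT,
                     name CHAR(30),
                     PRIMARY KEY (id)) TYPE=INNODB;

-- The value 0 is treated as if no value had been specified:
-- InnoDB generates a value from the counter.
INSERT INTO animals VALUES (0, 'dog');

-- An explicit value bigger than the current counter value
-- sets the counter to that value.
INSERT INTO animals VALUES (100, 'cat');

-- No explicit value: the counter is incremented by one,
-- so this row gets the id 101.
INSERT INTO animals (name) VALUES ('fish');
```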
To add a new data file to the tablespace you have to
shut down your MySQL database, edit the my.cnf file, adding a
new file to innodb_data_file_path
, and then start MySQL
again.
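For example, assuming an existing 1000 MB data file ibdata1, you might extend the tablespace with a hypothetical second file ibdata2 in the [mysqld] section of my.cnf (note that the sizes of the existing files must stay as they are; you only append new files at the end of the list):

```
# Old setting:
#   innodb_data_file_path = ibdata1:1000M
# New setting with an added data file:
innodb_data_file_path = ibdata1:1000M;ibdata2:500M
```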
Currently you cannot remove a data file from InnoDB. To decrease the size of your database you have to use mysqldump to dump all your tables, create a new database, and import your tables to the new database.
If you want to change the number or the size of your InnoDB log files, you have to shut down MySQL and make sure that it shuts down without errors. Then copy the old log files into a safe place, just in case something went wrong in the shutdown and you need them to recover the database. Then delete the old log files from the log file directory, edit my.cnf, and start MySQL again. InnoDB will tell you at startup that it is creating new log files.
The key to safe database management is taking regular backups.
InnoDB Hot Backup is an online backup tool you can use to backup your InnoDB database while it is running. InnoDB Hot Backup does not require you to shut down your database and it does not set any locks or disturb your normal database processing. InnoDB Hot Backup is a non-free additional tool whose annual license fee is 400 euros per computer where the MySQL server is run. See the InnoDB Hot Backup homepage for detailed information and screenshots.
If you are able to shut down your MySQL server, then to take a 'binary' backup of your database you have to do the following:
At database sites requiring high availability you can use the MySQL replication feature to keep a copy of your database. Replication works also with InnoDB type tables.
In addition to taking the binary backups described above, you should also regularly take dumps of your tables with mysqldump. The reason for this is that a binary file may become corrupted without you noticing it. Dumped tables are stored into text files which are human-readable and much simpler than database binary files. Seeing table corruption from dumped files is easier, and since their format is simpler, the chance of serious data corruption in them is smaller.
A good idea is to take the dumps at the same time you take a binary backup of your database. You have to shut out all clients from your database to get a consistent snapshot of all your tables into your dumps. Then you can take the binary backup, and you will then have a consistent snapshot of your database in two formats.
To be able to recover your InnoDB database to the present from the binary backup described above, you have to run your MySQL database with the binlogging of MySQL switched on. Then you can apply the binlog to the backup database to achieve point-in-time recovery:
mysqlbinlog yourhostname-bin.123 | mysql
To recover from a crash of your MySQL server process, the only thing you have to do is to restart it. InnoDB will automatically check the logs and perform a roll-forward of the database to the present. InnoDB will automatically roll back uncommitted transactions which were present at the time of the crash. During recovery, mysqld will print out something like the following:
heikki@donna:~/mysql-3.23.48/sql> mysqld
020204 23:08:31  InnoDB: Database was not shut down normally.
InnoDB: Starting recovery from log files...
InnoDB: Starting log scan based on checkpoint at
InnoDB: log sequence number 0 177573790
InnoDB: Doing recovery: scanned up to log sequence number 0 177638912
InnoDB: Doing recovery: scanned up to log sequence number 0 177704448
InnoDB: Doing recovery: scanned up to log sequence number 0 177769984
InnoDB: Doing recovery: scanned up to log sequence number 0 177835520
InnoDB: Doing recovery: scanned up to log sequence number 0 177901056
InnoDB: Doing recovery: scanned up to log sequence number 0 177966592
InnoDB: Doing recovery: scanned up to log sequence number 0 178032128
InnoDB: Doing recovery: scanned up to log sequence number 0 178097664
InnoDB: Doing recovery: scanned up to log sequence number 0 178163200
InnoDB: Doing recovery: scanned up to log sequence number 0 178228736
InnoDB: After this prints a line for every 10th scan sweep:
InnoDB: Doing recovery: scanned up to log sequence number 0 178884096
...
InnoDB: Doing recovery: scanned up to log sequence number 0 193302016
InnoDB: Doing recovery: scanned up to log sequence number 0 193957376
InnoDB: Doing recovery: scanned up to log sequence number 0 194612736
020204 23:08:40  InnoDB: Starting an apply batch of log records
to the database...
InnoDB: Progress in percents: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42
43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67
68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92
93 94 95 96 97 98 99
InnoDB: Apply batch completed
InnoDB: Doing recovery: scanned up to log sequence number 0 195268096
InnoDB: Doing recovery: scanned up to log sequence number 0 195923456
...
InnoDB: Doing recovery: scanned up to log sequence number 0 203132416
InnoDB: Doing recovery: scanned up to log sequence number 0 203787776
InnoDB: Doing recovery: scanned up to log sequence number 0 204443136
InnoDB: 5 uncommitted transaction(s) which must be rolled back
InnoDB: Trx id counter is 0 129792
InnoDB: Starting rollback of uncommitted transactions
InnoDB: Rolling back trx with id 0 129400
InnoDB: Rolling back of trx id 0 129400 completed
InnoDB: Rolling back trx with id 0 129217
InnoDB: Rolling back of trx id 0 129217 completed
InnoDB: Rolling back trx with id 0 129098
InnoDB: Rolling back of trx id 0 129098 completed
InnoDB: Rolling back trx with id 0 128743
InnoDB: Rolling back of trx id 0 128743 completed
InnoDB: Rolling back trx with id 0 127939
InnoDB: Rolling back of trx id 0 127939 completed
InnoDB: Rollback of uncommitted transactions completed
020204 23:08:51  InnoDB: Starting an apply batch of log records
to the database...
InnoDB: Progress in percents: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42
43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67
68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92
93 94 95 96 97 98 99
InnoDB: Apply batch completed
InnoDB: Last MySQL binlog file offset 0 40418561, file name ./donna-bin.001
020204 23:08:53  InnoDB: Flushing modified pages from the buffer pool...
020204 23:09:03  InnoDB: Started
mysqld: ready for connections
If your database gets corrupted or your disk fails, you have to do the recovery from a backup. In the case of corruption, you should first find a backup which is not corrupted. From a backup do the recovery from the general log files of MySQL according to instructions in the MySQL manual.
In some cases of database corruption it is enough just to dump, drop,
and recreate one or a few corrupt tables. You can use the
CHECK TABLE
SQL command to check if a table is
corrupt, though CHECK TABLE
naturally cannot detect all
kinds of corruption. You can use innodb_tablespace_monitor
to check the integrity of the file space management inside the data files.
In some cases apparent database page corruption is actually due to the operating system which has corrupted its own file cache, and the data on disk may be ok. It is best first to try rebooting your computer. It may remove the errors which appeared as database page corruption.
If there is database page corruption, you may want to dump your tables from the database with SELECT INTO OUTFILE; usually most of the data is intact and correct. But the corruption may cause SELECT * FROM table, or InnoDB background operations, to crash or assert, or even the InnoDB roll-forward recovery to crash. Starting from InnoDB version 3.23.44 there is a my.cnf option with which you can force InnoDB to start up, and you can also prevent background operations from running, so that you will be able to dump your tables. For example, you can set

set-variable = innodb_force_recovery = 4

in my.cnf.
The alternatives for innodb_force_recovery
are listed below.
The database must not otherwise be used with these options!
As a safety measure InnoDB prevents a user from doing INSERT, UPDATE, or DELETE when this option is > 0. Starting from version 3.23.53 and 4.0.4 you are allowed to DROP or CREATE a table even if forced recovery is used. If you know that a certain table is causing a crash in rollback, you can drop it. You can use this also to stop a runaway rollback caused by a failing mass import or ALTER TABLE: you can kill the mysqld process and use the my.cnf option innodb_force_recovery=3 to bring your database up without the rollback. Then DROP the table which is causing the runaway rollback.
A bigger number below means that all precautions of lower numbers are included. If you are able to dump your tables with an option at most 4, then you are relatively safe that only some data on corrupt individual pages is lost. Option 6 is more dramatic, because database pages are left in an obsolete state, which in turn may introduce more corruption into B-trees and other database structures.
(One of the lower options, for example, tries to make SELECT * FROM table jump over corrupt index records and pages, which helps in dumping tables.)
InnoDB implements a checkpoint mechanism called a fuzzy checkpoint. InnoDB flushes modified database pages from the buffer pool in small batches; there is no need to flush the buffer pool in one single batch, which would in practice stop the processing of user SQL statements for a while.
In crash recovery InnoDB looks for a checkpoint label written to the log files. It knows that all modifications to the database before the label are already present on the disk image of the database. Then InnoDB scans the log files forward from the place of the checkpoint applying the logged modifications to the database.
InnoDB writes to the log files in a circular fashion. All committed modifications which make the database pages in the buffer pool different from the images on disk must be available in the log files in case InnoDB has to do a recovery. This means that when InnoDB starts to reuse a log file in the circular fashion, it has to make sure that the database page images on disk already contain the modifications logged in the log file InnoDB is going to reuse. In other words, InnoDB has to make a checkpoint and often this involves flushing of modified database pages to disk.
The above explains why making your log files very big may save disk i/o in checkpointing. It can make sense to set the total size of the log files as big as the buffer pool or even bigger. The drawback in big log files is that crash recovery can last longer because there will be more log to apply to the database.
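As an illustration, with a 256 MB buffer pool one might size the logs as follows in the [mysqld] section of my.cnf; the values below are only an example, not a recommendation from the text above:

```
set-variable = innodb_buffer_pool_size=256M
# Two log files whose combined size matches the buffer pool,
# trading longer crash recovery for less checkpoint i/o.
set-variable = innodb_log_files_in_group=2
set-variable = innodb_log_file_size=128M
```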
InnoDB data and log files are binary-compatible on all platforms
if the floating point number format on the machines is the same.
You can move an InnoDB database simply by copying all the relevant
files, which we already listed in the previous section on backing up
a database. If the floating point formats on the machines are
different but you have not used FLOAT
or DOUBLE
data types in your tables then the procedure is the same: just copy
the relevant files. If the formats are different and your tables
contain floating point data, you have to use mysqldump
and mysqlimport to move those tables.
A performance tip is to switch off the autocommit when you import data into your database, assuming your tablespace has enough space for the big rollback segment the big import transaction will generate. Do the commit only after importing a whole table or a segment of a table.
In the InnoDB transaction model the goal has been to combine the best properties of a multiversioning database with traditional two-phase locking. InnoDB does locking on row level and runs queries by default as non-locking consistent reads, in the style of Oracle. The lock table in InnoDB is stored so space-efficiently that lock escalation is not needed: typically several users are allowed to lock every row in the database, or any random subset of the rows, without InnoDB running out of memory.
In InnoDB all user activity happens inside transactions. If the autocommit mode is used in MySQL, then each SQL statement will form a single transaction. MySQL always starts a new connection with the autocommit mode switched on.
If the autocommit mode is switched off with SET AUTOCOMMIT = 0, then we can think that a user always has a transaction open. If he issues the SQL COMMIT or ROLLBACK statement, it ends the current transaction and a new one starts. Both statements will release all InnoDB locks that were set during the current transaction. A COMMIT means that the changes made in the current transaction are made permanent and become visible to other users. A ROLLBACK, on the other hand, cancels all modifications made by the current transaction.
If the connection has AUTOCOMMIT = 1, then the user can still perform a multi-statement transaction by starting it with BEGIN and ending it with COMMIT or ROLLBACK.
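As a sketch (the table and column names here are hypothetical):

```sql
-- A multi-statement transaction under AUTOCOMMIT = 1:
-- either both statements take effect, or neither does.
BEGIN;
INSERT INTO accounts VALUES (1, 100);
UPDATE accounts SET balance = balance - 10 WHERE id = 1;
COMMIT;    -- or ROLLBACK; to cancel both statements
```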
In terms of the SQL-1992 transaction isolation levels,
the InnoDB default is REPEATABLE READ
.
Starting from version 4.0.5, InnoDB offers all 4 different
transaction isolation levels described by the SQL-1992 standard.
You can set the default isolation level for all connections
in the [mysqld]
section of my.cnf:
transaction-isolation = {READ-UNCOMMITTED | READ-COMMITTED | REPEATABLE-READ | SERIALIZABLE}
A user can change the isolation level of a single session or of all new incoming connections with the SQL statement

SET [SESSION | GLOBAL] TRANSACTION ISOLATION LEVEL
    {READ UNCOMMITTED | READ COMMITTED | REPEATABLE READ | SERIALIZABLE}

Note that there are no hyphens in level names in the SQL syntax. If you specify the keyword GLOBAL in the above statement, it will determine the initial isolation level of new incoming connections, but will not change the isolation level of old connections.
Any user is free to change the isolation level of his session, even
in the middle of a transaction.
In versions < 3.23.50 SET TRANSACTION
had no effect
on InnoDB tables. In versions < 4.0.5 only REPEATABLE READ
and SERIALIZABLE
were available.
You can query the global and session transaction isolation levels with:
SELECT @@global.tx_isolation; SELECT @@tx_isolation;
In row level locking InnoDB uses so-called next-key locking. That means that besides index records, InnoDB can also lock the 'gap' before an index record to block insertions by other users immediately before the index record. A next-key lock means a lock which locks an index record and the gap before it. A gap lock means a lock which only locks a gap before some index record.
A detailed description of each isolation level in InnoDB:
READ UNCOMMITTED: This is also called 'dirty read': non-locking SELECTs are performed so that we do not look at a possible earlier version of a record; thus they are not 'consistent' reads under this isolation level. Otherwise this level works like READ COMMITTED.
READ COMMITTED: A somewhat Oracle-like isolation level. All SELECT ... FOR UPDATE and SELECT ... LOCK IN SHARE MODE statements only lock the index records, NOT the gaps before them, and thus allow free inserting of new records next to locked records. UPDATE and DELETE statements which use a unique index with a unique search condition only lock the index record found, not the gap before it. But in range-type UPDATE and DELETE, InnoDB must still set next-key or gap locks and block insertions by other users into the gaps covered by the range. This is necessary since 'phantom rows' have to be blocked for MySQL replication and recovery to work. Consistent reads behave like in Oracle: each consistent read, even within the same transaction, sets and reads its own fresh snapshot.
REPEATABLE READ: This is the default isolation level of InnoDB. SELECT ... FOR UPDATE, SELECT ... LOCK IN SHARE MODE, UPDATE, and DELETE which use a unique index with a unique search condition only lock the index record found, not the gap before it. Otherwise these operations employ next-key locking, locking the index range scanned with next-key or gap locks, and block new insertions by other users. In consistent reads there is an important difference from the previous isolation level: at this level all consistent reads within the same transaction read the same snapshot, established by the first read. This convention means that if you issue several plain SELECTs within the same transaction, these SELECTs are consistent also with respect to each other.

SERIALIZABLE: This level is like the previous one, but all plain SELECTs are implicitly converted to SELECT ... LOCK IN SHARE MODE.
A consistent read means that InnoDB uses its multiversioning to present to a query a snapshot of the database at a point in time. The query will see the changes made by exactly those transactions that committed before that point of time, and no changes made by later or uncommitted transactions. The exception to this rule is that the query will see the changes made by the transaction itself which issues the query.
If you are running with the default REPEATABLE READ
isolation level, then all consistent reads
within the same transaction read the snapshot established by the
first such read in that transaction.
You can get a fresher snapshot for your queries by committing
the current transaction and after that issuing new queries.
Consistent read is the default mode in which InnoDB processes
SELECT
statements in READ COMMITTED
and REPEATABLE READ
isolation levels.
A consistent read does not set any locks
on the tables it accesses, and therefore other users are free to
modify those tables at the same time a consistent read is being performed
on the table.
A consistent read is not convenient in some circumstances. Suppose you want to add a new row into your table CHILD, and make sure that the child already has a parent in table PARENT. Suppose you use a consistent read to read the table PARENT and indeed see the parent of the child in the table. Can you now safely add the child row to table CHILD? No, because it may happen that meanwhile some other user has deleted the parent row from the table PARENT, and you are not aware of that. The solution is to perform the SELECT in a locking mode, LOCK IN SHARE MODE:
SELECT * FROM PARENT WHERE NAME = 'Jones' LOCK IN SHARE MODE;
Performing a read in share mode means that we read the latest available data and set a shared mode lock on the rows we read. If the latest data belongs to a yet uncommitted transaction of another user, we will wait until that transaction commits. A shared mode lock prevents others from updating or deleting the row we have read. After we see that the above query returns the parent 'Jones', we can safely add his child to table CHILD and commit our transaction. This example shows how to implement referential integrity in your application code.
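Put together, the whole check-then-insert sequence might look like the following sketch (the column names are hypothetical):

```sql
SET AUTOCOMMIT = 0;
-- Lock the parent row so that no one can delete it under us.
SELECT * FROM PARENT WHERE NAME = 'Jones' LOCK IN SHARE MODE;
-- The parent exists and is locked: the insert is now safe.
INSERT INTO CHILD (NAME, PARENT_NAME) VALUES ('Tim', 'Jones');
COMMIT;
```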
Let us look at another example: we have an integer counter field in a table CHILD_CODES which we use to assign a unique identifier to each child we add to table CHILD. Obviously, using a consistent read to read the present value of the counter is not a good idea, since then two users of the database may see the same value for the counter, and we will get a duplicate key error when we add the two children with the same identifier to the table. Using LOCK IN SHARE MODE in the read is not a good solution either, because if two users read the counter at the same time, then at least one of them will end up in deadlock when he tries to update the counter. In this case there are two good ways to implement the reading and incrementing of the counter: (1) update the counter first by incrementing it by 1 and only after that read it, or (2) read the counter first with a lock mode FOR UPDATE, and increment after that:
SELECT COUNTER_FIELD FROM CHILD_CODES FOR UPDATE;
UPDATE CHILD_CODES SET COUNTER_FIELD = COUNTER_FIELD + 1;
A SELECT ... FOR UPDATE reads the latest available data, setting exclusive locks on each row it reads. Thus it sets the same locks a searched SQL UPDATE would set on the rows.
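Approach (1), updating first and reading afterwards, could be sketched as follows within one transaction:

```sql
-- The exclusive row lock taken by the UPDATE serializes
-- concurrent users of the counter without a deadlock.
UPDATE CHILD_CODES SET COUNTER_FIELD = COUNTER_FIELD + 1;
-- A read within the same transaction sees the transaction's
-- own change, i.e. the freshly incremented value.
SELECT COUNTER_FIELD FROM CHILD_CODES;
```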
In row level locking InnoDB uses an algorithm called next-key locking. InnoDB does the row level locking so that when it searches or scans an index of a table, it sets shared or exclusive locks on the index records it encounters. Thus the row level locks are more precisely called index record locks.
The locks InnoDB sets on index records may also affect the 'gap'
before that index record. If a user has a shared or exclusive
lock on record R in an index, then another user cannot insert
a new index record immediately before R in the index order.
This locking of gaps is done to prevent the so-called phantom
problem. Suppose I want to read and lock all children with identifier
bigger than 100 from table CHILD
,
and update some field in the selected rows.
SELECT * FROM CHILD WHERE ID > 100 FOR UPDATE;
Suppose there is an index on table CHILD on column ID. Our query will scan that index starting from the first record where ID is bigger than 100. Now, if the locks set on the index records did not lock out inserts made in the gaps, a new child might meanwhile be inserted into the table. If I now execute, in my transaction,
SELECT * FROM CHILD WHERE ID > 100 FOR UPDATE;
again, I will see a new child in the result set the query returns. This is against the isolation principle of transactions: a transaction should be able to run so that the data it has read does not change during the transaction. If we regard a set of rows as a data item, then the new 'phantom' child would break this isolation principle.
When InnoDB scans an index it can also lock the gap after the last record in the index. Just that happens in the previous example: the locks set by InnoDB will prevent any insert into the table where ID would be bigger than 100.
You can use next-key locking to implement a uniqueness check in your application: if you read your data in share mode and do not see a duplicate for a row you are going to insert, then you can safely insert your row and know that the next-key lock set on the successor of your row during the read will prevent anyone meanwhile inserting a duplicate for your row. Thus the next-key locking allows you to 'lock' the non-existence of something in your table.
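For instance, a uniqueness check against table CHILD might be sketched like this (assuming a unique identifier column ID and the value 42 as an example):

```sql
-- Read in share mode: if no duplicate exists, the next-key lock
-- set on the successor record blocks concurrent inserts of ID = 42.
SELECT * FROM CHILD WHERE ID = 42 LOCK IN SHARE MODE;
-- Empty result: we can insert without racing against a duplicate.
INSERT INTO CHILD (ID) VALUES (42);
```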
SELECT ... FROM ...: this is a consistent read, reading a snapshot of the database and setting no locks, unless the transaction isolation level is set to SERIALIZABLE, in which case the read sets shared next-key locks on the index records it encounters.
SELECT ... FROM ... LOCK IN SHARE MODE: sets shared locks on all index records the read encounters.
SELECT ... FROM ... FOR UPDATE: sets exclusive locks on all index records the read encounters.
INSERT INTO ... VALUES (...): sets an exclusive lock on the inserted row; note that this lock is not a next-key lock and does not prevent other users from inserting into the gap before the inserted row. If a duplicate key error occurs, it sets a shared lock on the duplicate index record.
If you have specified an AUTO_INCREMENT column on a table, then in initializing the auto-increment counter InnoDB sets an exclusive lock on the end of the index associated with the auto-increment column. In accessing the auto-increment counter InnoDB uses a specific table lock mode, AUTO-INC, where the lock lasts only to the end of the current SQL statement, not to the end of the transaction.
INSERT INTO T SELECT ... FROM S WHERE ...: sets an exclusive (non-next-key) lock on each row inserted into T. It does the search on S as a consistent read, but sets shared locks on S if the MySQL logging is on. InnoDB has to set locks in the latter case because in roll-forward recovery from a backup every SQL statement has to be executed in exactly the same way as it was done originally.
CREATE TABLE ... SELECT ...
performs the SELECT
as a consistent read or with shared locks, like in the previous
item.
REPLACE
is done like an insert if there is no collision
on a unique key. Otherwise, an exclusive lock is placed
on the row which has to be updated.
UPDATE ... SET ... WHERE ...
: sets an exclusive
lock on every record the search encounters.
DELETE FROM ... WHERE ...
: sets an exclusive
lock on every record the search encounters.
If a FOREIGN KEY constraint is defined on a table, any insert, update, or delete which requires checking of the constraint condition sets shared record level locks on the records it looks at to check the constraint. InnoDB also sets these locks in the case where the constraint fails.
LOCK TABLES ...: sets table locks. In the implementation, the MySQL layer of code sets these locks. The automatic deadlock detection of InnoDB cannot detect deadlocks where such table locks are involved: see the next section below. See also section 14 'InnoDB restrictions' about the following: since MySQL does not know about row level locks, it is possible that you get a table lock on a table where another user currently has row level locks. But that does not put transaction integrity into danger.
SHOW TABLE STATUS applied to a table with an auto-increment column sets an exclusive row level lock on the high end of the auto-increment index. This means that also SHOW TABLE STATUS can cause a deadlock of transactions, something which may surprise users. Starting from version 3.23.50, it fetches the auto-increment column value without setting any locks, except in the case where there have been no inserts to the table after the database startup.
You can switch off the autocommit mode with SET AUTOCOMMIT=0. In the autocommit mode MySQL does a commit after each SQL statement, if that statement did not return an error.
The following SQL statements cause an implicit commit of the current transaction: CREATE TABLE (if MySQL binlogging is used), ALTER TABLE, BEGIN, CREATE INDEX, DROP DATABASE, DROP TABLE, RENAME TABLE, TRUNCATE, LOCK TABLES, UNLOCK TABLES.
The CREATE TABLE statement in InnoDB is processed as a single transaction. This means that a ROLLBACK from the user does not undo CREATE TABLE statements the user made during his transaction.
If the connection is closed before you have issued the final COMMIT of your transaction, then MySQL will roll back your transaction.
InnoDB automatically detects a deadlock of transactions and rolls back a transaction or transactions to prevent the deadlock. Starting from version 4.0.5, InnoDB will try to pick small transactions to roll back. The size of a transaction is determined by the number of rows it has inserted, updated, or deleted. Previous to 4.0.5, InnoDB always rolled back the transaction whose lock request was the last one to build a deadlock, that is, a cycle in the waits-for graph of transactions.
InnoDB cannot detect deadlocks where a lock set by a MySQL LOCK TABLES statement is involved, or where a lock set in a table handler other than InnoDB is involved. You have to resolve these situations using innodb_lock_wait_timeout, set in my.cnf.
When InnoDB performs a complete rollback of a transaction, all the locks of the transaction are released. However, if just a single SQL statement is rolled back as a result of an error, some of the locks set by the SQL statement may be preserved. This is because InnoDB stores row locks in a format where it cannot afterwards know which was set by which SQL statement.
Suppose you are running on the default REPEATABLE READ isolation level.
When you issue a consistent read, that is, an ordinary SELECT
statement, InnoDB will give your transaction a timepoint according
to which your query sees the database. Thus, if transaction B deletes
a row and commits after your timepoint was assigned, then you will
not see the row deleted. Similarly with inserts and updates.
You can advance your timepoint by committing your transaction
and then doing another SELECT.
This is called multiversioned concurrency control.
                User A                        User B

           set autocommit=0;             set autocommit=0;
time
|          SELECT * FROM t;
|          empty set
|                                        INSERT INTO t VALUES (1, 2);
|
v          SELECT * FROM t;
           empty set
                                         COMMIT;
           SELECT * FROM t;
           empty set
           COMMIT;
           SELECT * FROM t;
           ---------------------
           |    1    |    2    |
           ---------------------
Thus user A sees the row inserted by B only when B has committed the insert, and A has committed his own transaction so that the timepoint is advanced past the commit of B.
If you want to see the 'freshest' state of the database, you should use
the isolation level READ COMMITTED. Or, you can use a locking read:
SELECT * FROM t LOCK IN SHARE MODE;
Deadlocks are a classic problem in transactional databases, but they are not dangerous unless they are so frequent that you cannot run certain transactions at all. Normally you have to write your applications so that they are always prepared to re-issue a transaction if it gets rolled back because of a deadlock.
InnoDB uses automatic row level locking. You can get deadlocks even in the case of transactions which just insert or delete a single row. That is because these operations are not really 'atomic': they automatically set locks on the (possibly several) index records of the row inserted/deleted.
You can cope with deadlocks and reduce the number of them with the following tricks:
Use SHOW INNODB STATUS in MySQL versions >= 3.23.52 and
>= 4.0.3 to determine the cause of the latest deadlock. That can help
you to tune your application to avoid deadlocks.
If you use locking reads SELECT ... FOR UPDATE or
SELECT ... LOCK IN SHARE MODE, try using a lower isolation level
such as READ COMMITTED.
Use EXPLAIN SELECT to determine that MySQL picks appropriate
indexes for your queries.
If you are going to allow a SELECT to return data from an old
snapshot, do not add the clause FOR UPDATE or LOCK IN SHARE MODE
to it. Using the READ COMMITTED isolation level is good here, because
each consistent read within the same transaction reads from its own
fresh snapshot.
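For example, in 4.0.5 and up the isolation level can be set for the session; the exact syntax and the supported levels depend on your server version, so check section 8 and your version's manual:

```sql
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
```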
Serialize your transactions with table level locks:
LOCK TABLES t1 WRITE, t2 READ, ... ; [do something with tables t1 and t2
here]; UNLOCK TABLES. Table level locks make your transactions queue
nicely, and deadlocks are avoided. Note that LOCK TABLES implicitly
starts a transaction, just like the command BEGIN, and
UNLOCK TABLES implicitly ends the transaction in a COMMIT.
1. If the Unix top or the Windows Task Manager shows that the CPU usage percentage with your workload is less than 70 %, your workload is probably disk-bound. Maybe you are making too many transaction commits, or the buffer pool is too small. Making the buffer pool bigger can help, but do not set it bigger than 80 % of physical memory.
2. Wrap several modifications into one transaction. InnoDB must flush the log to disk at each transaction commit, if that transaction made modifications to the database. Since the rotation speed of a disk is typically at most 167 revolutions/second, that constrains the number of commits to the same 167/second per user if the disk does not fool the operating system.
3. If you can afford the loss of some of the latest committed transactions, you can
set the my.cnf parameter innodb_flush_log_at_trx_commit
to zero. InnoDB tries to flush the log anyway once per second, though the flush
is not guaranteed.
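A minimal my.cnf fragment for this tip; the [mysqld] section is the usual place for these parameters:

```
[mysqld]
innodb_flush_log_at_trx_commit=0
```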
4. Make your log files big, the combined size of the log files even as big as the buffer pool. When InnoDB has written the log files full, it has to write the modified contents of the buffer pool to disk in a checkpoint. Small log files will cause many unnecessary disk writes. The drawback in big log files is that recovery time will be longer.
5. The log buffer should also be quite big, say 8 MB.
6. Use the VARCHAR column type instead of CHAR
if you are storing variable-length strings or
the column may contain many NULLs. A CHAR(n) column
always takes n bytes to store data, even if the string is shorter
or its value is NULL. Smaller tables fit better in the buffer pool
and reduce disk i/o.
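For example (a hypothetical table; country codes are always two characters, so CHAR is fine there):

```sql
CREATE TABLE address (
  id      INT NOT NULL PRIMARY KEY,
  street  VARCHAR(100),   -- variable-length: short values take less space
  country CHAR(2)         -- truly fixed-length data can stay CHAR
) TYPE = InnoDB;
```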
7. (Relevant from 3.23.41 up.)
In some versions of Linux and other Unixes, flushing files to disk with the Unix
fsync and other similar methods is surprisingly slow.
The default method InnoDB uses is the fsync function.
If you are not satisfied with the database write performance, you may
try setting innodb_flush_method in my.cnf to O_DSYNC,
though the O_DSYNC option seems to be slower on most systems.
8. When importing data into InnoDB, make sure that MySQL does not have
autocommit=1 on; then every insert requires a log flush to disk.
Put the line
set autocommit=0;
before your plain SQL import file, and
commit;
after it. If you use the mysqldump option --opt, you will get dump
files which are fast to import also into an InnoDB table, even without wrapping
them in the above set autocommit=0; ... commit; wrappers.
9. Beware of big rollbacks of mass inserts: InnoDB uses the insert buffer to save disk i/o in inserts, but in a corresponding rollback no such mechanism is used. A disk-bound rollback can take 30 times the time of the corresponding insert. If a runaway rollback hits, you can use the trick explained in section 6.1 to stop it.
10. Beware also of other big disk-bound operations.
Use DROP TABLE or TRUNCATE (from MySQL-4.0 up) to empty a table, not
DELETE FROM yourtable.
11. Use the multi-line INSERT to reduce
communication overhead between the client and the server if you need
to insert many rows:
INSERT INTO yourtable VALUES (1, 2), (5, 5);
This tip is of course valid for inserts into any table type, not just InnoDB.
12. If you have UNIQUE constraints on
secondary keys, starting from 3.23.52 and 4.0.3 you
can speed up table imports by turning the uniqueness checks off
for a while in the import session:
SET UNIQUE_CHECKS=0;
For big tables this saves a lot of disk i/o because InnoDB can then use its insert buffer to write secondary index records in a batch.
13. If you have FOREIGN KEY constraints
in your tables, starting from 3.23.52 and 4.0.3 you
can speed up table imports by turning the foreign key checks off
for a while in the import session:
SET FOREIGN_KEY_CHECKS=0;
For big tables this can save a lot of disk i/o.
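Combining tips 8, 12, and 13, an import session could be wrapped as in the following sketch (SET UNIQUE_CHECKS and SET FOREIGN_KEY_CHECKS require 3.23.52 / 4.0.3):

```sql
SET AUTOCOMMIT=0;
SET UNIQUE_CHECKS=0;
SET FOREIGN_KEY_CHECKS=0;
-- ... the INSERT statements of the dump file go here ...
SET FOREIGN_KEY_CHECKS=1;
SET UNIQUE_CHECKS=1;
COMMIT;
```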
Starting from version 3.23.42 InnoDB includes InnoDB Monitors which print information on the InnoDB internal state. Starting from versions 3.23.52 and 4.0.3 you can use a new SQL command
SHOW INNODB STATUS
to fetch the output of the standard InnoDB Monitor to the SQL client. The data is useful in performance tuning.
Another way to use InnoDB Monitors is to let them continuously write
data to the standard output of the server mysqld.
When switched on, InnoDB Monitors print data
(note: the MySQL client will not print anything)
about once every 15 seconds. An easy way to use them is to
start the mysqld executable from a command prompt; otherwise
the output is directed to the error log file of the MySQL
server, 'yourhostname'.err (on Windows, mysql.err).
On Windows you must start mysqld-max from an MS-DOS prompt
with the --console option to direct the output to the command
prompt window.
The printed information includes data on: semaphore waits, transactions and their lock waits, file i/o, the insert buffer and the adaptive hash index, the log, the buffer pool, and row operations.
You can make the standard InnoDB Monitor write to the standard output of mysqld through the following SQL command:
CREATE TABLE innodb_monitor(a int) type = innodb;
and stop it by
DROP TABLE innodb_monitor;
The CREATE TABLE syntax is just a way to pass a command
to the InnoDB engine through the MySQL SQL parser: the created
table is not relevant at all for the InnoDB Monitor. If you shut down
the database when the monitor is running, and you want to start
the monitor again, you have to drop the table before you can issue
a new CREATE TABLE to start the monitor.
In a similar way you can start innodb_lock_monitor, which is
otherwise the same as innodb_monitor but also prints a
lot of lock information. A separate innodb_tablespace_monitor
prints a list of created file segments existing in the tablespace
and also validates the tablespace allocation data structures.
Starting from 3.23.44 there is innodb_table_monitor, with which
you can print the contents of the internal data dictionary
of InnoDB.
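For example, the lock monitor is started and stopped the same way as the standard monitor; only the table name differs:

```sql
CREATE TABLE innodb_lock_monitor(a INT) TYPE = innodb;
-- mysqld now prints lock information about every 15 seconds
DROP TABLE innodb_lock_monitor;
```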
A sample output of the InnoDB Monitor from version 3.23.52:
=====================================
020805 22:07:41 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 3 seconds
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 194, signal count 193
--Thread 7176 has waited at ../include/btr0btr.ic line 28 for 0.00 seconds the semaphore:
X-lock on RW-latch at 44d980bc created in file buf0buf.c line 354
a writer (thread id 7176) has reserved it in mode wait exclusive
number of readers 1, waiters flag 1
Last time read locked in file ../include/btr0btr.ic line 28
Last time write locked in file ../include/btr0btr.ic line 28
Mutex spin waits 0, rounds 0, OS waits 0
RW-shared spins 77, OS waits 33; RW-excl spins 188, OS waits 161
------------
TRANSACTIONS
------------
Trx id counter 0 657853517
Purge done for trx's n:o < 0 657853429 undo n:o < 0 80
Total number of lock structs in row lock hash table 22
020805 22:07:36 LATEST DETECTED DEADLOCK:
*** (1) TRANSACTION:
TRANSACTION 0 657853503, ACTIVE 0 sec, OS thread id 15373 inserting
LOCK WAIT 3 lock struct(s), heap size 336
MySQL thread id 6, query id 3741 localhost heikki update
insert into ibtest11b (D, B, C) values (5, 'khdkkkk' ,'khdkkkk')
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 104865 n bits 208 table test/ibtest11b index PRIMARY trx id 0 657853503 lock_mode X waiting
Record lock, heap no 1 RECORD: info bits 0 0: len 9; hex 73757072656d756d00; asc supremum.;;
*** (2) TRANSACTION:
TRANSACTION 0 657853500, ACTIVE 0 sec, OS thread id 11275 setting auto-inc lock
19 lock struct(s), heap size 2672, undo log entries 5
MySQL thread id 2, query id 3750 localhost heikki update
insert into ibtest11b (D, B, C) values (5, 'khD' ,'khD')
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 0 page no 104865 n bits 200 table test/ibtest11b index PRIMARY trx id 0 657853500 lock_mode X
Record lock, heap no 1 RECORD: info bits 0 0: len 9; hex 73757072656d756d00; asc supremum.;;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
TABLE LOCK table test/ibtest11b trx id 0 657853500 lock_mode AUTO-INC waiting
*** WE ROLL BACK TRANSACTION (2)
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0 657853516, ACTIVE 5 sec, OS thread id 15373 setting auto-inc lock
LOCK WAIT 1 lock struct(s), heap size 336
MySQL thread id 6, query id 3895 localhost heikki update
insert into ibtest11b (D, B, C) values (5, 'khdkkkk' ,'khdkkkk')
------- TRX HAS BEEN WAITING 5 SEC FOR THIS LOCK TO BE GRANTED:
TABLE LOCK table test/ibtest11b trx id 0 657853516 lock_mode AUTO-INC waiting
------------------
---TRANSACTION 0 657853514, ACTIVE 5 sec, OS thread id 11275 inserting
LOCK WAIT 13 lock struct(s), heap size 2672, undo log entries 2
MySQL thread id 2, query id 3898 localhost heikki update
insert into ibtest11d (D, B, C) values (5, 'khdkkkk' ,'khdkkkk')
------- TRX HAS BEEN WAITING 5 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 104879 n bits 384 table test/ibtest11d index B trx id 0 657853514 lock_mode X gap type lock waiting
Record lock, heap no 130 RECORD: info bits 32 0: len 9; hex 6b48646b6b6b6b6b6b; asc kHdkkkkkk;; 1:
------------------
---TRANSACTION 0 657853512, ACTIVE 5 sec, OS thread id 14348 updating or deleting
20 lock struct(s), heap size 2672, undo log entries 175
MySQL thread id 5, query id 3874 localhost heikki updating
delete from ibtest11a where A = 215
--------
FILE I/O
--------
I/O thread 0 state: waiting for i/o request
I/O thread 1 state: waiting for i/o request
I/O thread 2 state: waiting for i/o request
I/O thread 3 state: waiting for i/o request
Pending normal aio reads: 0, aio writes: 0,
 ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0
Pending flushes (fsync) log: 0; buffer pool: 0
272 OS file reads, 56 OS file writes, 29 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf for space 0: size 1, free list len 5, seg size 7,
0 inserts, 0 merged recs, 0 merges
Hash table size 124633, used cells 1530, node heap has 4 buffer(s)
2895.70 hash searches/s, 126.62 non-hash searches/s
---
LOG
---
Log sequence number 19 3267291494
Log flushed up to   19 3267283711
Last checkpoint at  19 3266545677
0 pending log writes, 0 pending chkp writes
30 log i/o's done, 0.00 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total memory allocated 82593970; in additional pool allocated 1406336
Buffer pool size   1920
Free buffers       1711
Database pages     205
Modified db pages  39
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages read 178, created 27, written 50
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
Buffer pool hit rate 1000 / 1000
--------------
ROW OPERATIONS
--------------
1 queries inside InnoDB, 0 queries in queue; main thread: purging
Number of rows inserted 2008, updated 264, deleted 162, read 9
0.00 inserts/s, 0.00 updates/s, 14.66 deletes/s, 0.00 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================
Some notes on the output:
If the output shows many queries waiting in the queue inside InnoDB,
setting innodb_thread_concurrency smaller than the default 8
can help in such situations.
Since InnoDB is a multiversioned database, it must keep information of old versions of rows in the tablespace. This information is stored in a data structure we call a rollback segment after an analogous data structure in Oracle.
InnoDB internally adds two fields to each row stored in the database. A 6-byte field tells the transaction identifier for the last transaction which inserted or updated the row. Also a deletion is internally treated as an update where a special bit in the row is set to mark it as deleted. Each row also contains a 7-byte field called the roll pointer. The roll pointer points to an undo log record written to the rollback segment. If the row was updated, then the undo log record contains the information necessary to rebuild the content of the row before it was updated.
InnoDB uses the information in the rollback segment to perform the undo operations needed in a transaction rollback. It also uses the information to build earlier versions of a row for a consistent read.
Undo logs in the rollback segment are divided into insert and update undo logs. Insert undo logs are only needed in transaction rollback and can be discarded as soon as the transaction commits. Update undo logs are used also in consistent reads, and they can be discarded only after there is no transaction present for which InnoDB has assigned a snapshot that in a consistent read could need the information in the update undo log to build an earlier version of a database row.
You must remember to commit your transactions regularly, including those transactions which only issue consistent reads. Otherwise InnoDB cannot discard data from the update undo logs, and the rollback segment may grow too big, filling up your tablespace.
The physical size of an undo log record in the rollback segment is typically smaller than the corresponding inserted or updated row. You can use this information to calculate the space needed for your rollback segment.
In our multiversioning scheme a row is not physically removed from the database immediately when you delete it with an SQL statement. Only when InnoDB can discard the update undo log record written for the deletion can it also physically remove the corresponding row and its index records from the database. This removal operation is called a purge, and it is quite fast, usually taking the same order of time as the SQL statement which did the deletion.
MySQL stores its data dictionary information of tables
in .frm files in database directories. But every InnoDB type table
also has its own entry in the InnoDB internal data dictionary
inside the tablespace. When MySQL drops a table or a database,
it has to delete both the .frm file or files and
the corresponding entries inside the InnoDB data dictionary.
This is the reason why you cannot move InnoDB tables between
databases simply by moving the .frm files, and why
DROP DATABASE did not work for InnoDB type tables
in MySQL versions <= 3.23.43.
Every InnoDB table has a special index called the clustered index
where the data of the rows is stored. If you define a
PRIMARY KEY on your table, then the index of
the primary key will be the clustered index.
If you do not define a PRIMARY KEY for
your table, MySQL will pick the first UNIQUE
key with only NOT NULL columns as the primary key,
and InnoDB will use the index of that key as the clustered index.
If there is no such key in the table,
InnoDB will internally generate a clustered index
where the rows are ordered by the row id InnoDB assigns
to the rows in such a table. The row id is a 6-byte field which
monotonically increases as new rows are inserted. Thus the rows
ordered by the row id will be physically in the insertion order.
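For example, in the following hypothetical table the index of the primary key becomes the clustered index, and the rows are stored physically in id order:

```sql
CREATE TABLE customer (
  id   INT NOT NULL,
  name CHAR(50),
  PRIMARY KEY (id)   -- this index is the clustered index
) TYPE = InnoDB;
```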
Accessing a row through the clustered index is fast, because the row data will be on the same page where the index search leads us. In many databases the data is traditionally stored on a different page from the index record. If a table is large, the clustered index architecture often saves a disk i/o when compared to the traditional solution.
The records in non-clustered indexes (we also call them secondary indexes) in InnoDB contain the primary key value for the row. InnoDB uses this primary key value to search for the row from the clustered index. Note that if the primary key is long, the secondary indexes will use more space.
InnoDB compares CHAR and VARCHAR strings
of different length so that the remaining length in the shorter
string is thought of as padded with spaces.
All indexes in InnoDB are B-trees where the index records are stored in the leaf pages of the tree. The default size of an index page is 16 kB. When new records are inserted, InnoDB tries to leave 1 / 16 of the page free for future insertions and updates of the index records.
If index records are inserted in a sequential (ascending or descending) order, the resulting index pages will be about 15/16 full. If records are inserted in a random order, then the pages will be 1/2 - 15/16 full. If the fillfactor of an index page drops below 1/2, InnoDB will try to contract the index tree to free the page.
It is a common situation in a database application that the primary key is a unique identifier and new rows are inserted in the ascending order of the primary key. Thus the insertions to the clustered index do not require random reads from a disk.
On the other hand, secondary indexes are usually non-unique and insertions happen in a relatively random order into secondary indexes. This would cause a lot of random disk i/o's without a special mechanism used in InnoDB.
If an index record should be inserted into a non-unique secondary index, InnoDB checks whether the secondary index page is already in the buffer pool. If that is the case, InnoDB does the insertion directly to the index page. But if the index page is not found in the buffer pool, InnoDB inserts the record into a special insert buffer structure. The insert buffer is kept so small that it fits entirely in the buffer pool, and insertions can be made to it very fast.
The insert buffer is periodically merged to the secondary index trees in the database. Often we can merge several insertions to the same page of the index tree, and hence save disk i/o's. It has been measured that the insert buffer can speed up insertions to a table up to 15 times.
If a database fits almost entirely in main memory, then the fastest way to perform queries on it is to use hash indexes. InnoDB has an automatic mechanism which monitors index searches made to the indexes defined for a table, and if InnoDB notices that queries could benefit from building of a hash index, such an index is automatically built.
But note that the hash index is always built based on an existing B-tree index on the table. InnoDB can build a hash index on a prefix of any length of the key defined for the B-tree, depending on what search pattern InnoDB observes on the B-tree index. A hash index can be partial: it is not required that the whole B-tree index is cached in the buffer pool. InnoDB will build hash indexes on demand to those pages of the index which are often accessed.
In a sense, through the adaptive hash index mechanism InnoDB adapts itself to ample main memory, coming closer to the architecture of main memory databases.
InnoDB stores fixed-length character columns, such as
CHAR(10), in a fixed length format also internally.
From VARCHAR columns InnoDB truncates trailing
spaces. Note that MySQL may internally convert CHAR columns
to VARCHAR. See the MySQL manual about 'Silent column
specification changes'.
InnoDB uses simulated asynchronous disk i/o built into InnoDB: InnoDB creates a number of i/o threads to take care of i/o operations, such as read-ahead.
Starting from 3.23.40b InnoDB uses a novel file flush technique called 'doublewrite'. It adds safety to crash recovery after an operating system crash or a power outage, and improves performance on most Unix flavors by reducing the need for fsync operations.
Doublewrite means that InnoDB before writing pages to a data file first writes them to a contiguous tablespace area called the doublewrite buffer. Only after the write and the flush to the doublewrite buffer has completed, InnoDB writes the pages to their proper positions in the data file. If the operating system crashes in the middle of a page write, InnoDB will in recovery find a good copy of the page from the doublewrite buffer.
Starting from 3.23.41
you can also use a raw disk partition (a raw device) as a data file.
When you create a new data file you have
to put the keyword newraw immediately after the data
file size in innodb_data_file_path. The partition must be
>= the size you specify. Note that 1M in InnoDB is
1024 x 1024 bytes, while in disk specifications 1 MB usually means
1 000 000 bytes.
innodb_data_home_dir=
innodb_data_file_path=/dev/hdd1:3Gnewraw;/dev/hdd2:2Gnewraw
When you start the database again you MUST change the keyword
to raw. Otherwise InnoDB will write over your
partition! Starting from 3.23.44, as a safety measure InnoDB
prevents a user from modifying data when any partition with
newraw is specified. After you have added a new
partition, shut down the database, edit my.cnf replacing
newraw with raw, and restart.
innodb_data_home_dir=
innodb_data_file_path=/dev/hdd1:3Graw;/dev/hdd2:2Graw
By using a raw disk you can on Windows and on some Unixes perform non-buffered i/o.
There are two read-ahead heuristics in InnoDB: sequential read-ahead and random read-ahead. In sequential read-ahead InnoDB notices that the access pattern to a segment in the tablespace is sequential. Then InnoDB will post in advance a batch of reads of database pages to the i/o system. In random read-ahead InnoDB notices that some area in a tablespace seems to be in the process of being fully read into the buffer pool. Then InnoDB posts the remaining reads to the i/o system.
The data files you define in the configuration file form the tablespace of InnoDB. The files are simply catenated to form the tablespace; there is no striping in use. Currently you cannot directly instruct where the space is allocated for your tables, except by using the following fact: from a newly created tablespace InnoDB will allocate space starting from the low end.
The tablespace consists of database pages whose default size is 16 kB. The pages are grouped into extents of 64 consecutive pages. The 'files' inside a tablespace are called segments in InnoDB. The name of the rollback segment is somewhat misleading because it actually contains many segments in the tablespace.
For each index in InnoDB we allocate two segments: one is for non-leaf nodes of the B-tree, the other is for the leaf nodes. The idea here is to achieve better sequentiality for the leaf nodes, which contain the data.
When a segment grows inside the tablespace, InnoDB allocates the first 32 pages to it individually. After that InnoDB starts to allocate whole extents to the segment. InnoDB can add to a large segment up to 4 extents at a time to ensure good sequentiality of data.
Some pages in the tablespace contain bitmaps of other pages, and therefore a few extents in an InnoDB tablespace cannot be allocated to segments as a whole, but only as individual pages.
When you issue a query SHOW TABLE STATUS FROM ... LIKE ...
to ask for available free space in the tablespace, InnoDB will
report the space which is certainly usable in totally free extents
of the tablespace. InnoDB always reserves some extents for
clean-up and other internal purposes; these reserved extents are not
included in the free space.
When you delete data from a table, InnoDB will contract the corresponding B-tree indexes. Whether that frees individual pages or extents to the tablespace, so that the freed space is available for other users, depends on the pattern of deletes. Dropping a table or deleting all rows from it is guaranteed to release the space to other users, but remember that deleted rows can be physically removed only in a purge operation after they are no longer needed in transaction rollback or consistent read.
If there are random insertions into or deletions from the indexes
of a table, the indexes
may become fragmented. By fragmentation we mean that the physical ordering
of the index pages on the disk is not close to the alphabetical ordering
of the records on the pages, or that there are many unused pages in the
64-page blocks which were allocated to the index.
It can speed up index scans if you
periodically use mysqldump to dump the table to
a text file, drop the table, and reload it from the dump.
Another way to do the defragmenting is to alter the table type to
MyISAM and back to InnoDB again.
Note that a MyISAM table must fit in a single file
on your operating system.
If the insertions to an index are always ascending and records are deleted only from the end, then the file space management algorithm of InnoDB guarantees that fragmentation in the index will not occur.
The error handling in InnoDB is not always the same as specified in the ANSI SQL standards. According to the ANSI standard, any error during an SQL statement should cause the rollback of that statement. InnoDB sometimes rolls back only part of the statement, and in some cases the whole transaction. The following list specifies the error handling of InnoDB.
If you run out of file space in the tablespace, you get a
'Table is full' error and InnoDB rolls back the SQL statement.
A duplicate key error rolls back only the insert of that particular
row, even in a statement like INSERT INTO ... SELECT ....
This will probably change so that the SQL statement will be rolled
back if you have not specified the IGNORE option in your
statement.
In Unix, to print the meaning of an operating system error number, use the perror program which comes with the MySQL distribution.
The following table provides a list of some common Linux system error codes.
The following table provides a list of some common Windows system error codes.
You cannot use ALTER TABLE or CREATE INDEX to modify tables which have
foreign key constraints or which are referenced by foreign key constraints.
Use DROP TABLE and CREATE TABLE instead.
Do not convert MySQL system tables such as 'user' or 'host' to the
InnoDB type. The system tables must always be of the MyISAM type.
In SELECT COUNT(*) FROM T, InnoDB has to
scan one index of the table, which will take some time if the table
is not entirely in the buffer pool. To get a fast count you have to
use a counter table you create yourself, and let your application update
it according
to the inserts and deletes it does. A way to eliminate the bottleneck
posed by lock waits for the counter is to create a whole set of counters.
The application can choose one at random each time. To get the count,
just sum the counters: SELECT SUM(counter_column) FROM
your_counter_table.
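A sketch of the counter-table workaround (the table and column names are hypothetical):

```sql
CREATE TABLE t_counter (slot INT NOT NULL, counter_column INT) TYPE = InnoDB;
-- The application picks one slot at random for each insert it does:
UPDATE t_counter SET counter_column = counter_column + 1 WHERE slot = 3;
-- The total count is the sum over all the slots:
SELECT SUM(counter_column) FROM t_counter;
```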
InnoDB does not support the clause AUTO_INCREMENT=... in a
CREATE TABLE statement. This clause is used to set the first value
for an auto-increment column (the default first value is 1). Workaround:
insert the first row with the auto-inc column value explicitly specified.
After that InnoDB starts incrementing from that value.
SHOW TABLE STATUS does not give accurate statistics
on InnoDB tables, except for the physical size reserved by the table.
The row count is only a rough estimate used in SQL optimization.
LOAD TABLE FROM MASTER does not yet work for
InnoDB tables. A workaround is to alter the table to MyISAM in the master,
do the load, and then alter it back to InnoDB in the master.
If you create an index on a prefix of a column, as in
CREATE TABLE T (A CHAR(20), B INT, INDEX T_IND (A(5))) TYPE = InnoDB;
InnoDB will internally create an index on the whole column, not just the
prefix.
INSERT DELAYED is not supported for InnoDB tables.
The MySQL LOCK TABLES operation does not know of InnoDB
row level locks set in already completed SQL statements: this means that
you can get a table lock on a table even if there still exist transactions
of other users which have row level locks on the same table. Thus
your operations on the table may have to wait if they collide with
these locks of other users. Also a deadlock is possible. However,
this does not endanger transaction integrity, because the row level
locks set by InnoDB will always take care of the integrity.
Also, a table lock prevents other transactions from acquiring more
row level locks (in a conflicting lock mode) on the table.
You cannot create an index on a BLOB or TEXT column.
DELETE FROM TABLE does not regenerate the table but instead
deletes all rows, one by one, which is not that fast. In future versions
of MySQL you can use TRUNCATE, which is fast.
Instead of using DROP DATABASE with InnoDB tables, one has to drop
the individual InnoDB tables in the database with DROP TABLE.
This restriction is removed in versions >= 3.23.44.
The database page size is defined by the constants
UNIV_PAGE_SIZE and UNIV_PAGE_SIZE_SHIFT in univ.i.
The maximum row length is slightly less than half of a database page
in versions <= 3.23.39a of InnoDB. Starting from source
release 3.23.39b (but not yet in the MySQL-Max 3.23.40 binary release)
BLOB and
TEXT columns are allowed to be < 4 GB, and the total row length must also be
< 4 GB. InnoDB does not store fields whose size is <= 128 bytes on separate
pages. After InnoDB has modified the row by storing long fields on
separate pages, the remaining length of the row must be less
than half a database page. The maximum key length is 500 bytes.
On Windows, start mysqld-max from an MS-DOS prompt with the
--console option to direct the output to the MS-DOS prompt
window.
When diagnosing problems, use innodb_monitor to
print information about the internal state of InnoDB. If the problem
is with locks, use innodb_lock_monitor. If the
problem is in creation of tables or other data dictionary
operations, use innodb_table_monitor to
print the contents of the InnoDB internal data dictionary.
If you suspect that a table is corrupt, run CHECK TABLE on that table.
A specific problem with tables is that MySQL keeps its own
data dictionary information in .frm files it stores
in the database directories, while InnoDB stores its own
information in InnoDB's own data dictionary inside the
data files. If you move .frm files around, or
use DROP DATABASE in MySQL versions < 3.23.44,
or the server crashes in the middle of a data dictionary
operation, then the .frm files may end up
out of sync with the InnoDB internal data dictionary.
A symptom of an out-of-sync data dictionary is
that a CREATE TABLE statement fails.
Then you should look into the error log. If it says
that the table already exists inside the InnoDB
internal data dictionary, then you have an orphaned
table inside the InnoDB data files, without a corresponding
.frm file.
InnoDB: Error: table test/parent already exists in InnoDB internal
InnoDB: data dictionary. Have you deleted the .frm file
InnoDB: and not used DROP TABLE? Have you used DROP DATABASE
InnoDB: for InnoDB tables in MySQL version <= 3.23.43?
InnoDB: See the Restrictions section of the InnoDB manual.
InnoDB: You can drop the orphaned table inside InnoDB by
InnoDB: creating an InnoDB table with the same name in another
InnoDB: database and moving the .frm file to the current database.
InnoDB: Then MySQL thinks the table exists, and DROP TABLE will
InnoDB: succeed.
You can drop the orphaned table by following the above instructions in the error log.
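The recovery steps from the error log can be sketched as follows (database and table names follow the example message; the datadir path is a typical Unix default and may differ on your system):

```sql
-- 1. Create an InnoDB table with the same name in another database;
--    the column definition is arbitrary
CREATE DATABASE recover_db;
CREATE TABLE recover_db.parent (dummy INT) TYPE = InnoDB;

-- 2. Outside of MySQL, move the .frm file to the original database
--    directory, e.g. on Unix:
--    mv /var/lib/mysql/recover_db/parent.frm /var/lib/mysql/test/parent.frm

-- 3. MySQL now thinks the table exists in 'test', and the drop succeeds,
--    removing the orphaned table from the InnoDB data dictionary:
DROP TABLE test.parent;
```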
Another symptom of an out-of-sync data dictionary is that MySQL prints an error that it cannot open a file yourtablename.InnoDB:
ERROR 1016: Can't open file: 'child2.InnoDB'. (errno: 1)
From the error log you find:
InnoDB: Cannot find table test/child2 from the internal data dictionary
InnoDB: of InnoDB though the .frm file for the table exists. Maybe you
InnoDB: have deleted and recreated InnoDB data files but have forgotten
InnoDB: to delete the corresponding .frm files of InnoDB tables?
This means that there is an orphaned .frm file without a corresponding table inside InnoDB. You can drop the orphaned .frm file by deleting it manually.
If MySQL crashes in the middle of an ALTER TABLE operation, you
may end up with an orphaned temporary table inside the InnoDB tablespace.
With innodb_table_monitor you will see a table whose
name is #sql..., but since MySQL does not allow
accessing any table with such a name, you cannot dump or drop it.
The solution is to use a special mechanism available starting
from version 3.23.48 of InnoDB.
If you have an orphaned table #sql... inside
the tablespace, then by issuing
CREATE TABLE `rsql..._recover_innodb_tmp_table`(...) type = innodb;
where the table definition is similar to that of the temporary table,
you can make InnoDB rename the orphaned table to
`rsql..._recover_innodb_tmp_table`. Then you can dump and
drop the renamed table. The backquotes around the table name are needed
because a temporary table name contains the character '-'.
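As a sketch, suppose innodb_table_monitor shows an orphaned table named #sql-1234_5 (a hypothetical name) with a single INT column; the recovery could then look like:

```sql
-- The table definition must be similar to the orphaned temporary table;
-- the name suffix below is hypothetical, and the backquotes are required
-- because the name contains '-'
CREATE TABLE `rsql-1234_5_recover_innodb_tmp_table` (a INT) TYPE = InnoDB;

-- InnoDB renames the orphan to this name; dump it if needed, then:
DROP TABLE `rsql-1234_5_recover_innodb_tmp_table`;
```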
MySQL/InnoDB-4.0.5, November 15(?), 2002
MySQL/InnoDB-3.23.53, October 14, 2002
MySQL/InnoDB-4.0.4, October 2, 2002
MySQL/InnoDB-4.0.3, August 28, 2002
MySQL/InnoDB-3.23.52, August 16, 2002
MySQL/InnoDB-4.0.2, July 10, 2002
MySQL/InnoDB-3.23.51, June 12, 2002
MySQL/InnoDB-3.23.50, April 23, 2002
MySQL/InnoDB-3.23.49, February 17, 2002
MySQL/InnoDB-3.23.48, February 9, 2002
MySQL/InnoDB-3.23.47, December 28, 2001
MySQL/InnoDB-4.0.1, December 23, 2001
MySQL/InnoDB-3.23.46, November 30, 2001
MySQL/InnoDB-3.23.45, November 23, 2001
MySQL/InnoDB-3.23.44, November 2, 2001
MySQL/InnoDB-3.23.43, October 4, 2001
MySQL/InnoDB-3.23.42, September 9, 2001
MySQL/InnoDB-3.23.41, August 13, 2001:
MySQL/InnoDB-3.23.40, July 16, 2001:
MySQL/InnoDB-3.23.39, June 13, 2001:
MySQL/InnoDB-3.23.38, May 12, 2001:
Contact information of Innobase Oy, producer of the InnoDB engine:
Website: www.innodb.com
Email: Heikki.Tuuri@innodb.com
Phone: 358-9-6969 3250 (office), 358-40-5617367 (mobile)
Innobase Oy Inc.
World Trade Center Helsinki
Aleksanterinkatu 17
P.O. Box 800
00101 Helsinki
Finland
The GPL license text: see a separate web page.