Go to the first, previous, next, last section, table of contents.

7 MySQL テーブル型

MySQL Version 3.23.6では、ISAM、HEAP そして MyISAMの三つの基本的なテーブル形式を使うことができます。もっと新しいバージョンでは、コンパイルする方法によってInnoDB や BDB といった形式もサポートしています。

新しいテーブルを作成する際にどのテーブル形式を使用するか指定すればよろしいわけですが、無指定の場合は通常 MyISAM になります。

MySQL will always create a `.frm' file to hold the table and column definitions. The table's index and data will be stored in one or more other files, depending on the table type.

If you try to use a table type that is not compiled-in or activated, MySQL will instead create a table of type MyISAM. This behaviour is convenient when you want to copy tables between MySQL servers that support different table types. (Perhaps your master server supports transactional storage engines for increased safety, while the slave servers use only non-transactional storage engines for greater speed.)

This automatic change of table types can be confusing for new MySQL users. We plan to fix this by introducing warnings in the new client-server protocol in version 4.1 and generating a warning when a table type is automatically changed.

ALTER TABLE 文を使用すれば、テーブルを違う形式に変更できます。「6.5.4 ALTER TABLE 構文」節参照.

MySQL では、二つの違う種類のテーブルをサポートしていることに注意してください。一つはトランザクションセーフのテーブル(BDB, InnoDB)、もう一つはトランザクションを持たないテーブル (HEAP, ISAM, MERGE, and MyISAM).

トランザクションセーフのテーブル(Transaction Safe Tables) の利点は(TST):

Safer. Even if MySQL crashes or you get hardware problems, you can get your data back, either by automatic recovery or from a backup the transaction log.
You can combine many statements and accept these all in one go with the COMMIT command.
You can execute ROLLBACK to ignore your changes (if you are not running in auto-commit mode).
If an update fails, all your changes will be restored. (With NTST tables all changes that have taken place are permanent)
Can provide better concurrency if the table gets many updates concurrently with reads.

Note that to use InnoDB tables you have to use at least the innodb_data_file_path startup option. 「7.5.2 InnoDB 起動オプション」節参照.

非トランザクションセーフ(Not Transaction Safe Tables)のテーブルの利点は(NTST):

Much faster as there is no transaction overhead.
Will use less disk space as there is no overhead of transactions.
Will use less memory to do updates.

You can combine TST and NTST tables in the same statements to get the best of both worlds.

7.1 `MyISAM` テーブル (3.23.0以上)

MyISAM は、MySQL Version 3.23 でのデフォルトのテーブル形式です．これは ISAM コードを基にし、多くの便利な拡張機能を持っています。

インデックスは .MYI (MYIndex) 拡張子のつくファイルに保存され、データは、 .MYD (MYData) 拡張子のつくファイルに保存されます。 myisamchk ユーティリティを使用して、 MyISAM テーブルの検査・修復が可能です。「4.4.6.7 Using myisamchk for Crash Recovery」節参照. myisampack コマンドを使用して、MyISAM を圧縮して小さくすることが可能です「4.7.4 myisampack, MySQL の圧縮された読み込み専用テーブルジェネレータ」節参照.

The following is new in MyISAM:

There is a flag in the MyISAM file that indicates whether the table was closed correctly. If mysqld is started with --myisam-recover, MyISAM tables will automatically be checked and/or repaired on open if the table wasn't closed properly.
You can INSERT new rows in a table that doesn't have free blocks in the middle of the datafile, at the same time other threads are reading from the table (concurrent insert). An free block can come from an update of a dynamic length row with much data to a row with less data or when deleting rows. When all free blocks are used up, all future inserts will be concurrent again.
大ファイル (63 bit) のサポート。ただし、filesystems/operating systems が大ファイルをサポートしている場合。
全データは、下位バイトが先にかかれます。これは、データを、マシン・OS 非依存にしました。 The only requirement is that the machine uses two's-complement signed integers (as every machine for the last 20 years has) and IEEE floating-point format (also totally dominant among mainstream machines). The only area of machines that may not support binary compatibility are embedded systems (because they sometimes have peculiar processors). There is no big speed penalty in storing data low byte first; the bytes in a table row is normally unaligned and it doesn't take that much more power to read an unaligned byte in order than in reverse order. The actual fetch-column-value code is also not time critical compared to other code.
全数値キーは高位バイトが先に書かれます。これはインデックスの圧縮率を良くします。
Internal handling of one AUTO_INCREMENT column. MyISAM will automatically update this on INSERT/UPDATE. The AUTO_INCREMENT value can be reset with myisamchk. This will make AUTO_INCREMENT columns faster (at least 10%) and old numbers will not be reused as with the old ISAM. Note that when an AUTO_INCREMENT is defined on the end of a multi-part-key the old behaviour is still present.
When inserted in sorted order (as when you are using an AUTO_INCREMENT column) the key tree will be split so that the high node only contains one key. This will improve the space utilisation in the key tree.
BLOB と TEXT フィールドにインデックスが張れます
NULL 値をインデックスの張られたフィールドに許します． This takes 0-1 bytes/key.
現在、キーの最大長はデフォルトで 500 バイトです。 (再コンパイルで変更可能). In cases of keys longer than 250 bytes, a bigger key block size than the default of 1024 bytes is used for this key.
1テーブルあたりの最大のキーの数は 32個がデフォルトです. これは myisamchk を再コンパイル無しに、64個まで増やすことが出来ます.
myisamchk を --update-state オプションで実行すると、 myisamchk はテーブルにチェックをおこなったマークをつけます。 myisamchk --fast は、このマークが無いテーブルだけをチェックします。
myisamchk -a stores statistics for key parts (and not only for whole keys as in ISAM).
Dynamic size rows will now be much less fragmented when mixing deletes with updates and inserts. This is done by automatically combining adjacent deleted blocks and by extending blocks if the next block is deleted.
myisampack は BLOB と VARCHAR フィールドをパックすることが可能です。
You can use put the datafile and index file on different directories to get more speed (with the DATA/INDEX DIRECTORY="path" option to CREATE TABLE). 「6.5.3 CREATE TABLE 構文」節参照.

MyISAM also supports the following things, which MySQL will be able to use in the near future:

真の VARCHAR 型のサポート; a VARCHAR column starts with a length stored in 2 bytes.
Tables with VARCHAR may have fixed or dynamic record length.
VARCHAR and CHAR may be up to 64K. All key segments have their own language definition. This will enable MySQL to have different language definitions per column.
A hashed computed index can be used for UNIQUE. This will allow you to have UNIQUE on any combination of columns in a table. (You can't search on a UNIQUE computed index, however.)

Note that index files are usually much smaller with MyISAM than with ISAM. This means that MyISAM will normally use less system resources than ISAM, but will need more CPU time when inserting data into a compressed index.

mysqld に与える以下のオプションは、 MyISAM テーブルの振る舞いを変更できます. 「4.5.6.4 SHOW VARIABLES」節参照.

Option	Description
`--myisam-recover=#`	壊れたテーブルを自動でリカバリ.
`-O myisam_sort_buffer_size=#`	リカバリに使用されるバッファ
`--delay-key-write=ALL`	Don't flush key buffers between writes for any MyISAM table
`-O myisam_max_extra_sort_file_size=#`	Used to help MySQL to decide when to use the slow but safe key cache index create method. Note that this parameter is given in megabytes before 4.0.3 and in bytes starting from this version.
`-O myisam_max_sort_file_size=#`	Don't use the fast sort index method to created index if the temporary file would get bigger than this. Note that this parameter is given in megabytes before 4.0.3 and in bytes starting from this version.
`-O bulk_insert_buffer_size=#`	Size of tree cache used in bulk insert optimisation. Note that this is a limit per thread!

automatic recovery は mysqld を --myisam-recover=# オプション (3.23.26以上) で起動したときに有効になります. 「4.1.1 mysqld コマンド行オプション」節参照. On open, the table is checked if it's marked as crashed or if the open count variable for the table is not 0 and you are running with --skip-external-locking.(4.0.3以前では --skip-locking) 上のどれかに当てはまる場合、以下を行ないます。

テーブルはエラーのためにチェックされます.
もしエラーが見つかったなら、そのテーブルの fast repair (sort はするけれどもデータファイルの再構築は行なわない) を行ないます.
もし修復がデータファイルのエラーのために失敗したら (例えば duplicate key error)、再びリペアを試みますが、この時は、データファイルを再構築します.
もしリペアが失敗したら、さらにもう一度、古い修復方法(ソート無しに、レコード単位に書く)で、修復を試みます。この方法は、いかなるタイプのエラーでも、少ないディスクで修復可能です。

If the recover wouldn't be able to recover all rows from a previous completed statement and you didn't specify FORCE as an option to myisam-recover, then the automatic repair will abort with an error message in the error file:

Error: Couldn't repair table: test.g00pages

この FORCE を使用した場合には、エラーメッセージは以下のようになります：

Warning: Found 344 of 354 rows when repairing ./test/g00pages

もしあなたが自動修復を BACKUP で実行しているなら、あなたは、データベースのディレクトリから `tablename-datetime.BAK' のようなファイルを自動的にバックアップメディアに移動する cron スクリプトを作るべきです。

「4.1.1 mysqld コマンド行オプション」節参照.

7.1.1 Space Needed for Keys

MySQL can support different index types, but the normal type is ISAM or MyISAM. これらは B-tree index を使用します。インデックスファイルのサイズは、 (key_length+4)/0.67 でおおざっぱに計算できます。 (This is for the worst case when all keys are inserted in sorted order and we don't have any compressed keys.)

文字インデックスは圧縮されます。 If the first index part is a string, it will also be prefix compressed. Space compression makes the index file smaller than the above figures if the string column has a lot of trailing space or is a VARCHAR column that is not always used to the full length. Prefix compression is used on keys that start with a string. Prefix compression helps if there are many strings with an identical prefix.

In MyISAM tables, you can also prefix compress numbers by specifying PACK_KEYS=1 when you create the table. This helps when you have many integer keys that have an identical prefix when the numbers are stored high-byte first.

7.1.2 `MyISAM` Table Formats

MyISAM supports 3 different table types. Two of them are chosen automatically depending on the type of columns you are using. The third, compressed tables, can only be created with the myisampack tool.

When you CREATE or ALTER a table you can for tables that doesn't have BLOBs force the table format to DYNAMIC or FIXED with the ROW_FORMAT=# table option. In the future you will be able to compress/decompress tables by specifying ROW_FORMAT=compressed | default to ALTER TABLE. 「6.5.3 CREATE TABLE 構文」節参照.

7.1.2.1 Static (Fixed-length) Table Characteristics

This is the default format. It's used when the table contains no VARCHAR, BLOB, or TEXT columns.

このフォーマットは、最も単純、かつ、安全なフォーマットです。これは, Disk 上に作られるテーブルの中で、最も速いフォーマットでもあります。これはディスク上のデータを見つけやすいからです。 When looking up something with an index and static format it is very simple. Just multiply the row number by the row length.

Also, when scanning a table it is very easy to read a constant number of records with each disk read.

The security is evidenced if your computer crashes when writing to a fixed-size MyISAM file, in which case myisamchk can easily figure out where each row starts and ends. So it can usually reclaim all records except the partially written one. Note that in MySQL all indexes can always be reconstructed:

全ての CHAR, NUMERIC, DECIMAL フィールドは、そのフィールド長に足りない部分にはスペースが埋められます。
とても速い
キャッシュしやすい。
クラッシュの後再構築しやすい。なぜならレコードが固定された位置に割り当てられているから。
Doesn't have to be reorganised (with myisamchk) unless a huge number of records are deleted and you want to return free disk space to the operating system.
通常、動的テーブルよりも多くのディスク容量が必要。

7.1.2.2 Dynamic Table Characteristics

This format is used if the table contains any VARCHAR, BLOB, or TEXT columns or if the table was created with ROW_FORMAT=dynamic.

この形式は少し複雑です。なぜならそれぞれのレコードが、レコードがどのぐらいの長さを持っているかを記録するヘッダーを持っているからです。 One record can also end up at more than one location when it is made longer at an update.

OPTIMIZE table か myisamchk を使用して、テーブルのフラグメンテーションを修正することが可能です。 If you have static data that you access/change a lot in the same table as some VARCHAR or BLOB columns, it might be a good idea to move the dynamic columns to other tables just to avoid fragmentation:

全ての文字フィールドが動的になります（ただし4byteより短い物は除きます）
それぞれのレコードの先頭には、フィールドの状態を表すビット・マップがきます。このビットは、どの文字型フィールドが空文字('')なのか、どの数値フィールドがゼロなのかを示します。 (これはフィールドの値が NULL 値とは違います)。もし、文字型フィールドの文字列の長さ(後に続く空白は取り除かれる)がゼロであったり、あるいは、数値フィールドの値がゼロであった場合は、そのフィールドはビット・マップにマークされ、値はディスクには保存されません。空文字ではない場合は、文字列のバイト数がビット・マップに記録され、文字列自身がフィールドに保存されます。
通常、固定長のテーブルよりも少ないディスク容量ですみます
それぞれのレコードは、要求されただけのスペースを使用します。もしあるレコードが大きくなると、要求された分、それを多くの断片に分けます。この結果、レコードのフラグメンテーションが発生します。
If you update a row with information that extends the row length, the row will be fragmented. In this case, you may have to run myisamchk -r from time to time to get better performance. Use myisamchk -ei tbl_name for some statistics.
Not as easy to reconstruct after a crash, because a record may be fragmented into many pieces and a link (fragment) may be missing.
期待されるレコードの長さは：
```
3
+ (フィールド数 + 7) / 8
+ (char フィールドの数)
+ 数値フィールドをパックしたサイズ
+ 文字の長さ
+ (NULL フィールドの数 + 7) / 8
```
There is a penalty of 6 bytes for each link. A dynamic record is linked whenever an update causes an enlargement of the record. Each new link will be at least 20 bytes, so the next enlargement will probably go in the same link. If not, there will be another link. You may check how many links there are with myisamchk -ed. All links may be removed with myisamchk -r.

7.1.2.3 圧縮テーブルの特徴

これは、myisampack ツール (pack_isam は ISAM テーブル用)に

全ての MySQL ディストリビューション ( MySQL が GPL になる以前の物も含) は、 myisampack で圧縮されたテーブルを読むことが可能です。
圧縮テーブルはディスクスペースを節約します。これは disk usage を少なくするので、遅いディスク(CD-ROM など)にはとても好都合です。
それぞれのレコード別に圧縮されます(アクセスのオーバーヘッドがとても少ない)。レコードのヘッダは固定で(1-3 Bytes)、テーブル中の最大の大きさのレコードに依存します。 Each column is compressed differently. Some of the compression types are:
- There is usually a different Huffman table for each column.
- Suffix space compression.
- Prefix space compression.
- Numbers with value 0 are stored using 1 bit.
- もし整数型のフィールドが小さな値に収まっているならば、そのフィードは、できる限り小さなフィールド型に保存されます。例えば、BIGINT フィールド(8bytes)の値全てが 0 から 255 に収まっていれば、TINYINT フィールド (1 byte) に保存されます。
- If a column has only a small set of possible values, the column type is converted to ENUM.
- A column may use a combination of the above compressions.
固定長、可変長のレコードを扱うことができます
myisamchk で、伸長できます。

7.1.3 `MyISAM` table problems.

The file format that MySQL uses to store data has been extensively tested, but there are always circumstances that may cause database tables to become corrupted.

7.1.3.1 壊れた `MyISAM` テーブル

いくら MyISAM テーブルがとても信頼できるとは言え(SQL文を返す前にテーブルへの変更を全て書き出す)、以下のような事が起きた場合には、テーブルが壊れるかもしれません：

mysqld プロセスが書き込み途中で kill された
コンピュータの不意のシャットダウン(例えば電源断)
ハードウェアのエラー
稼働中のテーブルに対して外部プログラム(myisamchkのような)を使用
MySQL か MyISAM コードのソフトウェアのバグ

壊れたテーブルの、典型的な症状は：

そのテーブルからデータを選択中に Incorrect key file for table: '...'. Try to repair it エラーが出る
クエリがテーブル内のレコードを見つけれなかったり、不完全なデータを返したり。

CHECK TABLE 命令を使用して、テーブルが OK か否かを確かめることができます。「4.4.4 CHECK TABLE 構文 (3.23.13以上)」節参照.

REPAIR TABLE で不正なテーブルを修復することが可能です. 「4.4.5 REPAIR TABLE 構文 (3.23.14以上)」節参照. mysqld が動作していないときに、 myisamchk コマンドを使用してテーブルを修復することも可能です。 myisamchk syntax.

もし多くの不整合がテーブルに起きたならば、その理由を探すべきです！「A.4.1 What To Do If MySQL Keeps Crashing」節参照.

In this case the most important thing to know is if the table got corrupted if the mysqld died (one can easily verify this by checking if there is a recent row restarted mysqld in the mysqld error file). If this isn't the case, then you should try to make a test case of this. 「E.1.6 Making a Test Case When You Experience Table Corruption」節参照.

7.1.3.2 Clients is using or hasn't closed the table properly

Each MyISAM `.MYI' file has in the header a counter that can be used to check if a table has been closed properly.

If you get the following warning from CHECK TABLE or myisamchk:

# clients is using or hasn't closed the table properly

this means that this counter has come out of sync. This doesn't mean that the table is corrupted, but means that you should at least do a check on the table to verify that it's okay.

The counter works as follows:

The first time a table is updated in MySQL, a counter in the header of the index files is incremented.
The counter is not changed during further updates.
When the last instance of a table is closed (because of a FLUSH or because there isn't room in the table cache) the counter is decremented if the table has been updated at any point.
When you repair the table or check the table and it was okay, the counter is reset to 0.
To avoid problems with interaction with other processes that may do a check on the table, the counter is not decremented on close if it was 0.

In other words, the only ways this can go out of sync are:

The MyISAM tables are copied without a LOCK and FLUSH TABLES.
MySQL has crashed between an update and the final close. (Note that the table may still be okay, as MySQL always issues writes for everything between each statement.)
Someone has done a myisamchk --recover or myisamchk --update-stateon a table that was in use by mysqld.
Many mysqld servers are using the table and one has done a REPAIR or CHECK of the table while it was in use by another server. In this setup the CHECK is safe to do (even if you will get the warning from other servers), but REPAIR should be avoided as it currently replaces the datafile with a new one, which is not signaled to the other servers.

7.2 `MERGE` Tables (3.23.25以上)

MERGE tables are new in MySQL Version 3.23.25. The code is still in gamma, but should be resonable stable.

MERGE テーブルは(MRG_MyISAM テーブルとしても知られています) 同一の MyISAM をひとまとめにして使用することができるテーブルです。そのテーブルの集合体に対しては、SELECT, DELETE, UPDATE のみが可能です。もし MERGE テーブルを DROP すると、MERGE の定義だけが破棄されます。

Note that DELETE FROM merge_table used without a WHERE will only clear the mapping for the table, not delete everything in the mapped tables. (We plan to fix this in 4.1).

'identical tables' とは、全てのテーブルが同一のフィールド構造とキーの情報を有するという意味で使用しています。 You can't merge tables in which the columns are packed differently, doesn't have exactly the same columns, or have the keys in different order. However, some of the tables can be compressed with myisampack. 「4.7.4 myisampack, MySQL の圧縮された読み込み専用テーブルジェネレータ」節参照.

MERGE テーブルを作成すると、.frm テーブル定義ファイルと .MRG テーブルリストファイルが作成されます。 .MRG はインデックスファイル (.MYI ファイル) のリスト(これらは一つとして扱われる)だけが含まれます。全ての使用されるテーブルは、同じデータベース内に存在しなければなりません(MERGEテーブルも同様)。

For the moment, you need to have SELECT, UPDATE, and DELETE privileges on the tables you map to a MERGE table.

MERGE テーブルは以下の問題の解決を助けます:

ログテーブルの管理を安易にします。 For example, you can put data from different months into separate files, compress some of them with myisampack, and then create a MERGE to use these as one.
より速く。 You can split a big read-only table based on some criteria and then put the different table part on different disks. A MERGE table on this could be much faster than using the big table. (You can, of course, also use a RAID to get the same kind of benefits.)
Do more efficient searches. If you know exactly what you are looking after, you can search in just one of the split tables for some queries and use a MERGE table for others. You can even have many different MERGE tables active, with possible overlapping files.
More efficient repairs. It's easier to repair the individual files that are mapped to a MERGE file than trying to repair a really big file.
Instant mapping of many files as one. A MERGE table uses the index of the individual tables. It doesn't need to maintain an index of its one. This makes MERGE table collections VERY fast to make or remap. Note that you must specify the key definitions when you create a MERGE table!.
If you have a set of tables that you join to a big table on demand or batch, you should instead create a MERGE table on them on demand. This is much faster and will save a lot of disk space.
Go around the file-size limit for the operating system.
You can create an alias/synonym for a table by just using MERGE over one table. There shouldn't be any really notable performance impacts of doing this (only a couple of indirect calls and memcpy() calls for each read).

MERGE の欠点は：

MERGE には、同一構造の MyISAM テーブルしか使用できません。
ver.4.0.3未満では AUTO_INCREMENT フィールドは、INSERT 時に、自動で更新されません。
REPLACE は動作しません。
MERGE テーブルはより多くのファイル・デスクリプタを使用します。 If you are using a MERGE table that maps over 10 tables and 10 users are using this, you are using 10*10 + 10 file descriptors. (10 datafiles for 10 users and 10 shared index files.)
キーの読み込みが遅い。 When you do a read on a key, the MERGE storage engine will need to issue a read on all underlying tables to check which one most closely matches the given key. If you then do a "read-next" then the MERGE storage engine will need to search the read buffers to find the next key. Only when one key buffer is used up, the storage engine will need to read the next key block. This makes MERGE keys much slower on eq_ref searches, but not much slower on ref searches. 「5.2.1 EXPLAIN 構文 (SELECTについての情報を得る)」節参照.
You can't do DROP TABLE, ALTER TABLE, DELETE FROM table_name without a WHERE clause, REPAIR TABLE, TRUNCATE TABLE, OPTIMIZE TABLE, or ANALYZE TABLE on any of the table that is mapped by a MERGE table that is "open". If you do this, the MERGE table may still refer to the original table and you will get unexpected results. The easiest way to get around this deficiency is to issue the FLUSH TABLES command, ensuring no MERGE tables remain "open".

MERGE テーブルを作るときは、UNION(list-of-tables) に、どのテーブルを一つのようにして扱うかを指定する必要があります。 Optionally you can specify with INSERT_METHOD if you want insert for the MERGE table to happen in the first or last table in the UNION list. もし INSERT_METHOD や NO を指定しなかったなら、 MERGE テーブルに対する全ての INSERT コマンドはエラーになります。

The following example shows you how to use MERGE tables:

CREATE TABLE t1 (a INT AUTO_INCREMENT PRIMARY KEY, message CHAR(20));
CREATE TABLE t2 (a INT AUTO_INCREMENT PRIMARY KEY, message CHAR(20));
INSERT INTO t1 (message) VALUES ("Testing"),("table"),("t1");
INSERT INTO t2 (message) VALUES ("Testing"),("table"),("t2");
CREATE TABLE total (a INT AUTO_INCREMENT PRIMARY KEY, message CHAR(20))
             TYPE=MERGE UNION=(t1,t2) INSERT_METHOD=LAST;

直接、手動で `.MRG' ファイルを作ることも可能です：

shell> cd /mysql-data-directory/current-database
shell> ls -1 t1.MYI t2.MYI > total.MRG
shell> mysqladmin flush-tables

Now you can do things like:

mysql> SELECT * FROM total;
+---+---------+
| a | message |
+---+---------+
| 1 | Testing |
| 2 | table   |
| 3 | t1      |
| 1 | Testing |
| 2 | table   |
| 3 | t2      |
+---+---------+

Note that the a column, though declared as PRIMARY KEY, is not really unique, as MERGE table cannot enforce uniqueness over a set of underlying MyISAM tables.

To remap a MERGE table you can do one of the following:

DROP the table and re-create it
Use ALTER TABLE table_name UNION(...)
Change the `.MRG' file and issue a FLUSH TABLE on the MERGE table and all underlying tables to force the storage engine to read the new definition file.

7.2.1 `MERGE` Table Problems

The following are the known problems with MERGE tables:

A MERGE table cannot maintain UNIQUE constraints over the whole table. When you do INSERT, the data goes into the first or last table (according to INSERT_METHOD=xxx) and this MyISAM table ensures that the data are unique, but it knows nothing about others MyISAM tables.
DELETE FROM merge_table used without a WHERE will only clear the mapping for the table, not delete everything in the mapped tables.
RENAME TABLE on a table used in an active MERGE table may corrupt the table. This will be fixed in MySQL 4.0.x.
Creation of a table of type MERGE doesn't check if the underlying tables are of compatible types. If you use MERGE tables in this fashion, you are very likely to run into strange problems.
If you use ALTER TABLE to first add an UNIQUE index to a table used in a MERGE table and then use ALTER TABLE to add a normal index on the MERGE table, the key order will be different for the tables if there was an old non-unique key in the table. This is because ALTER TABLE puts UNIQUE keys before normal keys to be able to detect duplicate keys as early as possible.
The range optimiser can't yet use MERGE table efficiently and may sometimes produce non-optimal joins. This will be fixed in MySQL 4.0.x.
DROP TABLE on a table that is in use by a MERGE table will not work on Windows because the MERGE storage engine does the table mapping hidden from the upper layer of MySQL. Because Windows doesn't allow you to drop files that are open, you first must flush all MERGE tables (with FLUSH TABLES) or drop the MERGE table before dropping the table. We will fix this at the same time we introduce VIEWs.

7.3 `ISAM` Tables

You can also use the deprecated ISAM table type. This will disappear rather soon (probably in MySQL 5.0) because MyISAM is a better implementation of the same thing. ISAM uses a B-tree index. The index is stored in a file with the `.ISM' extension, and the data is stored in a file with the `.ISD' extension. You can check/repair ISAM tables with the isamchk utility. 「4.4.6.7 Using myisamchk for Crash Recovery」節参照.

ISAM has the following features/properties:

Compressed and fixed-length keys
Fixed and dynamic record length
16 keys with 16 key parts/key
Max key length 256 (default)
Data is stored in machine format; this is fast, but is machine/OS dependent.

Most of the things true for MyISAM tables are also true for ISAM tables. 「7.1 MyISAM テーブル (3.23.0以上)」節参照. The major differences compared to MyISAM tables are:

ISAM tables are not binary portable across OS/Platforms.
Can't handle tables > 4G.
Only support prefix compression on strings.
Smaller key limits.
Dynamic tables get more fragmented.
Tables are compressed with pack_isam rather than with myisampack.

If you want to convert an ISAM table to a MyISAM table so that you can use utilities such as mysqlcheck, use an ALTER TABLE statement:

mysql> ALTER TABLE tbl_name TYPE = MYISAM;

The embedded MySQL versions doesn't support ISAM tables.

7.4 `HEAP` Tables (3.23.0以上)

HEAP テーブルは hashed index を使用し、メモリ内に保存されます。これはとても速くなりますが、もし MySQL がクラッシュする時にはその内容が全て失われます。 HEAP はテンポラリのテーブルとしてはとても便利です！

MySQL 内部 HEAP テーブルは、100% ダイナッミック・ハッシングを使用しています（オーバーフローエリア無しに）。 There is no extra space needed for free lists. HEAP tables also don't have problems with delete + inserts, which normally is common with hashed tables:

mysql> CREATE TABLE test TYPE=HEAP SELECT ip,SUM(downloads) AS down
    ->                   FROM log_table GROUP BY ip;
mysql> SELECT COUNT(ip),AVG(down) FROM test;
mysql> DROP TABLE test;

Here are some things you should consider when you use HEAP tables:

You should always use specify MAX_ROWS in the CREATE statement to ensure that you accidentally do not use all memory.
Indexes will only be used with = and <=> (but are VERY fast).
HEAP tables can only use whole keys to search for a row; compare this to MyISAM tables where any prefix of the key can be used to find rows.
HEAP テーブルは固定長のレコードです。
HEAP は BLOB/TEXT 型をサポートしません。
HEAP は AUTO_INCREMENT フィールドをサポートしません。
MySQL 4.0.2 以前では、HEAP は NULL が許されているフィールドにインデックスを張れません。
HEAP テーブルに、一意ではないキーを持たせることが可能です。 (this isn't common for hashed tables).
HEAP テーブルは全てのクライアントに共有されます (他のテーブルと同じように)
You can't search for the next entry in order (that is, to use the index to do an ORDER BY).
Data for HEAP tables are allocated in small blocks. The tables are 100% dynamic (on inserting). No overflow areas and no extra key space are needed. Deleted rows are put in a linked list and are reused when you insert new data into the table.
You need enough extra memory for all HEAP tables that you want to use at the same time.
メモリを解放するには, DELETE FROM heap_table, TRUNCATE heap_table , DROP TABLE heap_table を実行すべきです.
MySQL cannot find out approximately how many rows there are between two values (this is used by the range optimiser to decide which index to use). This may affect some queries if you change a MyISAM table to a HEAP table.
To ensure that you accidentally don't do anything foolish, you can't create HEAP tables bigger than max_heap_table_size.

HEAP テーブル内の１レコードに必要とされるメモリは：

SUM_OVER_ALL_KEYS(max_length_of_key + sizeof(char*) * 2)
+ ALIGN(length_of_row+1, sizeof(char*))

sizeof(char*) is 4 on 32-bit machines and 8 on 64-bit machines.

7.5 `InnoDB` テーブル (3.23.6以上)

7.5.1 InnoDB テーブル概要

InnoDB provides MySQL with a transaction-safe (ACID compliant) storage engine with commit, rollback, and crash recovery capabilities. InnoDB does locking on row level and also provides an Oracle-style consistent non-locking read in SELECTs. These features increase multiuser concurrency and performance. There is no need for lock escalation in InnoDB, because row level locks in InnoDB fit in very small space. InnoDB テーブルは MySQL のテーブルタイプの中では初めて、 FOREIGN KEY 制約をサポートします。

InnoDB has been designed for maximum performance when processing large data volumes. Its CPU efficiency is probably not matched by any other disk-based relational database engine.

Technically, InnoDB is a complete database backend placed under MySQL. InnoDB has its own buffer pool for caching data and indexes in main memory. InnoDB stores its tables and indexes in a tablespace, which may consist of several files. This is different from, for example, MyISAM tables where each table is stored as a separate file. InnoDB tables can be of any size also on those operating systems where file-size is limited to 2 GB.

最新の InnoDB についての情報は http://www.innodb.com/. The most up-to-date version of the InnoDB manual is always placed there, and you can also order commercial licenses and support for InnoDB.

InnoDB is currently (October 2001) used in production at several large database sites requiring high performance. The famous Internet news site Slashdot.org runs on InnoDB. Mytrix, Inc. stores over 1 TB of data in InnoDB, and another site handles an average load of 800 inserts/updates per second in InnoDB.

InnoDB はバージョン 3.23.34a から、 MySQL のソース配布に含まれるようになり、そして、MySQL-max バイナリで有効になりました。 For Windows the -Max binaries are contained in the standard distribution.

If you have downloaded a binary version of MySQL that includes support for InnoDB, simply follow the instructions of the MySQL manual for installing a binary version of MySQL. If you already have MySQL-3.23 installed, then the simplest way to install MySQL -Max is to replace the server executable `mysqld' with the corresponding executable in the -Max distribution. MySQL and MySQL -Max differ only in the server executable. 「2.2.10 Installing a MySQL Binary Distribution」節参照.

InnoDB をサポートするように MySQL をコンパイルするには、MySQL-3.23.34a 以上をダウンロードし、 --with-innodb オプションで MySQL を configure します。「2.3 MySQL ソースディストリビューションのインストール」節参照.

cd /path/to/source/of/mysql-3.23.37
./configure --with-innodb

InnoDB を使用するためには、`my.cnf' か `my.ini' ファイルに InnoDB 起動オプションを指定しなくてはなりません。必要最低限の変更は、 [mysqld] セクションに以下のラインを加えることです

innodb_data_file_path=ibdata:30M

but to get good performance it is best that you specify options as recommended. 「7.5.2 InnoDB 起動オプション」節参照.

InnoDB is distributed under the GNU GPL License Version 2 (of June 1991). In the source distribution of MySQL, InnoDB appears as a subdirectory.

7.5.2 InnoDB 起動オプション

MySQL-3.23.37 から、オプションの接頭語が innobase_... から innodb_... に変わりました！

MySQL-Max-3.23 で InnoDB テーブルを使用するためには、設定ファイル `my.cnf' (Windows では `my.ini') の [mysqld] セクションに設定をしなくてはなりません。「4.1.2 `my.cnf' オプションファイル」節参照.

+ At the minimum, in 3.23 you must specify innodb_data_file_path + where you specify the names and the sizes of datafiles. If you do + not mention innodb_data_home_dir in `my.cnf' the default + is to create these files to the datadir of MySQL. + If you specify innodb_data_home_dir as an empty string, + then you can give absolute paths to your data files in + innodb_data_file_path. In MySQL-4.0 you do not need to specify even innodb_data_file_path: the default for it is to create + an auto-extending 10 MB file `ibdata1' to the datadir of MySQL. (In MySQL-4.0.0 and 4.0.1 the datafile is 64 MB and not auto-extending.) + If you don't want to use InnoDB tables, you can add the + skip-innodb option to your MySQL option file. +

ver.3.23 では、InnoDB を使用するために唯一必要なパラメタはデータファイルの大きさを指定する innodb_data_file_path です。 If you do not mention innodb_data_home_dir in `my.cnf' the default is to create these files to the datadir of MySQL. If you specify innodb_data_home_dir as an empty string, then you can give absolute paths to your data files in innodb_data_file_path. In MySQL-4.0 you do not need to specify even innodb_data_file_path: the default for it is to create an auto-extending 10 MB file `ibdata1' to the datadir of MySQL. (In MySQL-4.0.0 and 4.0.1 the datafile is 64 MB and not auto-extending.)

If you don't want to use InnoDB tables, you can add the skip-innodb option to your MySQL option file.

他のオプションはよりよいパフォーマンスを得たい場合に設定します。

Starting from versions 3.23.50 and 4.0.2 InnoDB allows the last datafile on the innodb_data_file_path line to be specified as auto-extending. The syntax for innodb_data_file_path is then the following:

pathtodatafile:sizespecification;pathtodatafile:sizespecification;...
...  ;pathtodatafile:sizespecification[:autoextend[:max:sizespecification]]

If you specify the last datafile with the autoextend option, InnoDB will extend the last datafile if it runs out of free space in the tablespace. The increment is 8 MB at a time. An example:

innodb_data_home_dir =
innodb_data_file_path = /ibdata/ibdata1:100M:autoextend

instructs InnoDB to create just a single datafile whose initial size is 100 MB and which is extended in 8 MB blocks when space runs out. If the disk becomes full you may want to add another data file to another disk, for example. Then you have to look the size of `ibdata1', round the size downward to the closest multiple of 1024 * 1024 bytes (= 1 MB), and specify the rounded size of `ibdata1' explicitly in innodb_data_file_path. After that you can add another datafile:

innodb_data_home_dir =
innodb_data_file_path = /ibdata/ibdata1:988M;/disk2/ibdata2:50M:autoextend

Be cautious on filesystems where the maximum file-size is 2 GB! InnoDB is not aware of the OS maximum file-size. On those filesystems you might want to specify the max size for the datafile:

innodb_data_home_dir =
innodb_data_file_path = /ibdata/ibdata1:100M:autoextend:max:2000M

A simple `my.cnf' example. Suppose you have a computer with 128 MB RAM and one hard disk. Below is an example of possible configuration parameters in `my.cnf' or `my.ini' for InnoDB. We assume you are running MySQL-Max-3.23.50 or later, or MySQL-4.0.2 or later. This example suits most users, both on Unix and Windows, who do not want to distribute InnoDB datafiles and log files on several disks. This creates an auto-extending data file `ibdata1' and two InnoDB log files `ib_logfile0' and `ib_logfile1' to the datadir of MySQL (typically `/mysql/data'). Also the small archived InnoDB log file `ib_arch_log_0000000000' ends up in the datadir.

[mysqld]
# You can write your other MySQL server options here
# ...
#                                  Data file(s) must be able to
#                                  hold your data and indexes.
#                                  Make sure you have enough
#                                  free disk space.
innodb_data_file_path = ibdata1:10M:autoextend
#                                  Set buffer pool size to
#                                  50 - 80 % of your computer's
#                                  memory
set-variable = innodb_buffer_pool_size=70M
set-variable = innodb_additional_mem_pool_size=10M
#                                  Set the log file size to about
#                                  25 % of the buffer pool size
set-variable = innodb_log_file_size=20M
set-variable = innodb_log_buffer_size=8M
#                                  Set ..flush_log_at_trx_commit
#                                  to 0 if you can afford losing
#                                  some last transactions 
innodb_flush_log_at_trx_commit=1

Check that the MySQL server has the rights to create files in datadir.

Note that datafiles must be < 2G in some file systems! The combined size of the log files must be < 4G. The combined size of datafiles must be >= 10 MB.

When you for the first time create an InnoDB database, it is best that you start the MySQL server from the command prompt. Then InnoDB will print the information about the database creation to the screen, and you see what is happening. See below next section what the printout should look like. For example, in Windows you can start `mysqld-max.exe' with:

your-path-to-mysqld>mysqld-max --console

Where to put `my.cnf' or `my.ini' in Windows? The rules for Windows are the following:

Only one of `my.cnf' or `my.ini' should be created.
The `my.cnf' file should be placed in the root directory of the drive `C:'.
The `my.ini' file should be placed in the WINDIR directory, e.g, `C:\WINDOWS' or `C:\WINNT'. You can use the SET command of MS-DOS to print the value of WINDIR.
If your PC uses a boot loader where the `C:' drive is not the boot drive, then your only option is to use the `my.ini' file.

Where to specify options in Unix? On Unix `mysqld' reads options from the following files, if they exist, in the following order:

`/etc/my.cnf' Global options.
`COMPILATION_DATADIR/my.cnf' Server-specific options.
`defaults-extra-file' The file specified with --defaults-extra-file=....
`~/.my.cnf' User-specific options.

`COMPILATION_DATADIR' is the MySQL data directory which was specified as a ./configure option when `mysqld' was compiled (typically `/usr/local/mysql/data' for a binary installation or `/usr/local/var' for a source installation).

If you are not sure from where `mysqld' reads its `my.cnf' or `my.ini', you can give the path as the first command-line option to the server: mysqld --defaults-file=your_path_to_my_cnf.

InnoDB forms the directory path to a datafile by textually catenating innodb_data_home_dir to a datafile name or path in innodb_data_file_path, adding a possible slash or backslash in between if needed. If the keyword innodb_data_home_dir is not mentioned in `my.cnf' at all, the default for it is the 'dot' directory `./' which means the datadir of MySQL.

An advanced `my.cnf' example. あなたの機械が Linux で 512M の RAM と 20GB のハードディスクだと仮定します。 (ディレクトリパスは `/', `/dr2', `/dr3') 以下はその場合の `my.cnf' の InnoDB パラメターの例です.

Note that InnoDB does not create directories: you have to create them yourself. Use the Unix or MS-DOS mkdir command to create the data and log group home directories.

[mysqld]
# You can write your other MySQL server options here
# ...
innodb_data_home_dir =
#                                  Data files must be able to
#                                  hold your data and indexes
innodb_data_file_path = /ibdata/ibdata1:2000M;/dr2/ibdata/ibdata2:2000M:autoextend
#                                  Set buffer pool size to
#                                  50 - 80 % of your computer's
#                                  memory, but make sure on Linux
#                                  x86 total memory usage is
#                                  < 2 GB
set-variable = innodb_buffer_pool_size=1G
set-variable = innodb_additional_mem_pool_size=20M
innodb_log_group_home_dir = /dr3/iblogs
#                                  .._log_arch_dir must be the same
#                                  as .._log_group_home_dir
innodb_log_arch_dir = /dr3/iblogs
set-variable = innodb_log_files_in_group=3
#                                  Set the log file size to about
#                                  15 % of the buffer pool size
set-variable = innodb_log_file_size=150M
set-variable = innodb_log_buffer_size=8M
#                                  Set ..flush_log_at_trx_commit to
#                                  0 if you can afford losing
#                                  some last transactions 
innodb_flush_log_at_trx_commit=1
set-variable = innodb_file_io_threads=4
set-variable = innodb_lock_wait_timeout=50
#innodb_flush_method=fdatasync
#innodb_fast_shutdown=1
#set-variable = innodb_thread_concurrency=5

Note that we have placed the two datafiles on different disks. InnoDB will fill the tablespace formed by the datafiles from bottom up. In some cases it will improve the performance of the database if all data is not placed on the same physical disk. Putting log files on a different disk from data is very often beneficial for performance. You can also use raw disk partitions (raw devices) as datafiles. In some Unixes they speed up I/O. See the manual section on InnoDB file space management about how to specify them in `my.cnf'.

Warning: on Linux x86 you must be careful you do not set memory usage too high. glibc will allow the process heap to grow over thread stacks, which will crash your server. It is a risk if the value of

innodb_buffer_pool_size + key_buffer +
max_connections * (sort_buffer + read_buffer_size) + max_connections * 2 MB

is close to 2 GB or exceeds 2 GB. Each thread will use a stack (often 2 MB, but in MySQL AB binaries only 256 kB) and in the worst case also sort_buffer + read_buffer_size additional memory.

How to tune other `mysqld' server parameters? Typical values which suit most users are:

skip-locking
set-variable = max_connections=200
set-variable = read_buffer_size=1M
set-variable = sort_buffer=1M
#                                  Set key_buffer to 5 - 50%
#                                  of your RAM depending on how
#                                  much you use MyISAM tables, but
#                                  keep key_buffer + InnoDB
#                                  buffer pool size < 80% of
#                                  your RAM
set-variable = key_buffer=...

Note that some parameters are given using the numeric `my.cnf' parameter format: set-variable = innodb... = 123, others (string and boolean parameters) with another format: innodb_... = ... .

The meanings of the configuration parameters are the following:

Option	Description
`innodb_data_home_dir`	全ての InnoDB データファイルの、共通トップディレクトリのパス。 If you do not mentioned this option in `my.cnf' the default is the `datadir` of MySQL. You can specify this also as an empty string, in which case you can use absolute file paths in `innodb_data_file_path`.
`innodb_data_file_path`	単一のデータファイル(individual data files)のパスとそのサイズ。 (訳注： innodb は保存すべきデータをいくつかのファイルに分割して書き込みます。この 'individual data files' は、その、実際にデータを保存している個々のファイルの事を指しています。) それぞれのデータファイルへのフルパスは、`innodb_data_home_dir` で指定されたパスと連結されることで求められます。ファイルサイズはメガバイトで与えられ、上記のように 'M' がサイズの後にきます。ファイルサイズは 4000M より大きくしてはいけません。ほとんどのオペレーティングシステムでは 2000M より大きなファイルを扱えません。 InnoDB は 'G' の省略形も認識します。 1G は 1024M になります。バージョン 3.23.44 からは、オペーレーティングシステムが large ファイルを扱えるのであれば、4GB 以上の指定が可能です。いくつかのオペレーティングシステムは一つのファイルは 2GB 未満の制約があります。データファイルのサイズの合計は、少なくとも、10MB 以上必要です。
`innodb_mirrored_log_groups`	データベースのために維持しておくログのグループのコピーの数。現在、これは 1 しかセットできません。
`innodb_log_group_home_dir`	InnoDB ログファイルのディレクトリのパス。
`innodb_log_files_in_group`	log group 内の、ログファイルの数。 InnoDB はログファイルを、ローテートするやり方で書きます。 3 が推奨値です。
`innodb_log_file_size`	log group 内の、各ログファイルの大きさ(Mega bytes)。 Sensible values range from 1M to 1/nth of the size of the buffer pool specified below, where n is the number of log files in the group. The bigger the value, the less checkpoint flush activity is needed in the buffer pool, saving disk I/O. But bigger log files also mean that recovery will be slower in case of a crash. The combined size of log files must be < 4 GB on 32-bit computers.
`innodb_log_buffer_size`	InnoDB が、disk上のログファイルにログを書き出すために使用する、バッファのサイズ。 Sensible values range from 1M to 8M. A big log buffer allows large transactions to run without a need to write the log to disk until the transaction commit. Thus, if you have big transactions, making the log buffer big will save disk I/O.
`innodb_flush_log_at_trx_commit`	Normally this is set to 1, meaning that at a transaction commit the log is flushed to disk, and the modifications made by the transaction become permanent, and survive a database crash. If you are willing to compromise this safety, and you are running small transactions, you may set this to 0 to reduce disk I/O to the logs.
`innodb_log_arch_dir`	The directory where fully written log files would be archived if we used log archiving. ＊現在のところ、これには、 `innodb_log_group_home_dir` と同じ値をセットしなくてはなりません。＊
`innodb_log_archive`	This value should currently be set to 0. As recovery from a backup is done by MySQL using its own log files, there is currently no need to archive InnoDB log files.
`innodb_buffer_pool_size`	InnoDB がデータやテーブルのインデックスををキャッシュするために使用するメモリのサイズ。大きな値をセットすると、テーブルのデータへのアクセスに必要なディスク i/o が少なくなります。データベースサーバ専用のマシンでは、このパラメタを物理メモリの 80% までセットしてもかまいません。物理メモリの競合がオペレーティングシステムのページングの原因になるかもしれないので、あまりに大きすぎる値は与えないように。
`innodb_additional_mem_pool_size`	Size of a memory pool InnoDB uses to store data dictionary information and other internal data structures. A sensible value for this might be 2M, but the more tables you have in your application the more you will need to allocate here. If InnoDB runs out of memory in this pool, it will start to allocate memory from the operating system, and write warning messages to the MySQL error log.
`innodb_file_io_threads`	Number of file I/O threads in InnoDB. Normally, this should be 4, but on Windows disk I/O may benefit from a larger number.
`innodb_lock_wait_timeout`	Timeout in seconds an InnoDB transaction may wait for a lock before being rolled back. InnoDB は自動的にトランザクションのデッド・ロックがテーブルに起きた事を感知し、そのトランザクションをロール・バックします。 If you use `LOCK TABLES` command, or other transaction-safe storage engines than InnoDB in the same transaction, then a deadlock may arise which InnoDB cannot notice. In cases like this the timeout is useful to resolve the situation.
`innodb_flush_method`	(3.23.40 以上からの機能.) The default value for this is `fdatasync`. Another option is `O_DSYNC`.

7.5.3 InnoDB テーブルの保存先の作成

あなたが MySQL を既にインストールしており、`my.cnf' には InnoDB configuration パラメタが記述されていると仮定します。 MySQL を起動する前に、あなたが指定している InnoDB データファイルとログファイルを保存するディレクトリが存在するか、そして、それらのディレクトリのパーミッションが正しいかを確認するべきです。 InnoDB はディレクトリを自動では作成できません。ファイルのみです。データとログファイルを保存するために十分なディスクの空きがあるかもチェックしてください。

MySQL を起動すると、InnoDB は data file と log file を作成します。 InnoDB は以下のようなメッセージを出力します：

~/mysqlm/sql > mysqld
InnoDB: The first specified datafile /home/heikki/data/ibdata1
did not exist:
InnoDB: a new database to be created!
InnoDB: Setting file /home/heikki/data/ibdata1 size to 134217728
InnoDB: Database physically writes the file full: wait...
InnoDB: datafile /home/heikki/data/ibdata2 did not exist:
new to be created
InnoDB: Setting file /home/heikki/data/ibdata2 size to 262144000
InnoDB: Database physically writes the file full: wait...
InnoDB: Log file /home/heikki/data/logs/ib_logfile0 did not exist:
new to be created
InnoDB: Setting log file /home/heikki/data/logs/ib_logfile0 size to 5242880
InnoDB: Log file /home/heikki/data/logs/ib_logfile1 did not exist:
new to be created
InnoDB: Setting log file /home/heikki/data/logs/ib_logfile1 size to 5242880
InnoDB: Log file /home/heikki/data/logs/ib_logfile2 did not exist:
new to be created
InnoDB: Setting log file /home/heikki/data/logs/ib_logfile2 size to 5242880
InnoDB: Started
mysqld: ready for connections

新しく InnoDB データベースが、これで作成されました。 mysql のような MySQL クライアントを使用して、MySQL サーバに接続することが可能です。 `mysqladmin shutdown' で MySQL をシャットダウンしたときには、 InnoDB は以下のようなメッセージを出力します：

010321 18:33:34  mysqld: Normal shutdown
010321 18:33:34  mysqld: Shutdown Complete
InnoDB: Starting shutdown...
InnoDB: Shutdown completed

You can now look at the datafiles and logs directories and you will see the files created. ログディレクトリには、`ib_arch_log_0000000000' という名前の、小さなファイルが含まれているはずです。 That file resulted from the database creation, after which InnoDB switched off log archiving. MySQL が次回に起動したとき、出力は次のようになります：

~/mysqlm/sql > mysqld
InnoDB: Started
mysqld: ready for connections

7.5.3.1 InnoDB データベース作成に失敗した場合

If InnoDB prints an operating system error in a file operation, usually the problem is one of the following:

You did not create InnoDB data or log directories.
`mysqld' does not have the rights to create files in those directories.
`mysqld' does not read the right `my.cnf' or `my.ini' file, and consequently does not see the options you specified.
The disk is full or a disk quota is exceeded.
You have created a subdirectory whose name is equal to a datafile you specified.
There is a syntax error in innodb_data_home_dir or innodb_data_file_path.

もし InnoDB database 作成時になにか問題が起きたならば、 InnoDB によって作成された全てのファイルを削除すべきです。 This means all datafiles, all log files, the small archived log file, and in the case you already did create some InnoDB tables, delete also the corresponding `.frm' files for these tables from the MySQL database directories. Then you can try the InnoDB database creation again.

7.5.4 InnoDB テーブルの作成

mysql test として MySQL クライアントを実行したとします。 InnoDB 形式のテーブルを作成するためには、あなたは SQL コマンドのテーブル作成文に、TYPE = InnoDB を指定しなくてはなりません。

CREATE TABLE CUSTOMER (A INT, B CHAR (20), INDEX (A)) TYPE = InnoDB;

この SQL コマンドは、`my.cnf' で定義された InnoDB のテーブル空間に存在するデータファイル内に、一つのテーブルと、一つのインデックス (A フィールドに張られた)を作成します。 MySQL は `CUSTOMER.frm' ファイルを MuSQL データベースディレクトリ `test' に作成します。内部では、InnoDB は、InnoDB 自身のデータディレクトリをもち、そこに 'test/CUSTOMER' テーブルのエントリを追加します。よって、MySQL 内の違うデータベース内に、同じ CUSTOMER という名前を持つテーブルを作成することが可能で、もちろんこの名前は、 InnoDB 内でも他とは衝突しません。

MySQL の table status コマンドを使用して、TYPE = InnoDB で作成したテーブル全てに対して、InnoDB のテーブル空間の未使用量がどれくらいあるかを出すことができます。テーブル空間の未使用領域の総量は、SHOW で出力されたテーブルの comment セクションに現われます。例：

SHOW TABLE STATUS FROM test LIKE 'CUSTOMER'

SHOW を使用して得られた InnoDB のテーブルの情報は概算です；それらは SQL オプティマイゼイションで使用されます。ただし、テーブルとインデックスに割り当てられているサイズ(bytes)は正確です。

7.5.4.1 MyISAM テーブルを InnoDB テーブルに変換

InnoDB does not have a special optimization for separate index creation. Therefore it does not pay to export and import the table and create indexes afterwards. 素早くテーブルを InnoDB に変換する方法は、 InnoDB に直接 insert することです。テーブルの変換には、ALTER TABLE ... TYPE=INNODB を使用するか、空の InnoDB テーブルを作成してデータを INSERT INTO ... SELECT * FROM ... でインサートします。

insert 時においては、大きなテーブルの場合は、いくつかにテーブルを分割して行うとよいでしょう：

INSERT INTO newtable SELECT * FROM oldtable
   WHERE yourkey > something AND yourkey <= somethingelse;

全てのデータが挿入できたら、テーブル名を変更します。

大きなテーブルを変換している間は、InnoDB のバッファプールサイズを大きくすることで、disk I/O を軽減できます。ただし、物理メモリの 80% 以上をバッファに割り当てないように。 InnoDB log ファイルのサイズもログのバッファも大きく取るべきです。

確実に、テーブル空間より大きくならないようにします； InnoDB テーブルは MyISAM テーブルよりも多くの disk を使用します。もし ALTER TABLE が取っている領域よりも超えた場合には、ロール・バックが開始されます。これはディスク・バウンドのために (大きなファイルならば) 数時間かかるかもしれません。インサートする場合には、InnoDB はインサート・バッファを使用します。これはセカンダリ・インデックス・レコードをインデックスに一度にマージさせるために使用されます。このため多くの disk I/O が節約されます。ロール・バックにおいては、インサートで使用されるようなメカニズムは無く、ロール・バックは、インサートの約30倍、時間がかかります。

このような手に負えないロール・バックのばあい、もし、価値のあるデータがあなたの InnoDB データベース内に一つもなければ、データベースのプロセスをキルして全ての InnoDB データファイルとログファイルを消去し、全ての InnoDB テーブルの `.frm' ファイルを消し、そしてもう一度試みる方が良いでしょう。とてつもなく多くの disk I/O が完了するのを待つよりも。

7.5.4.2 外部キー制約 (Foreign Key Constraints) (3.23.44以上)

InnoDB バージョン 3.23.43b 以上では foreign key constraints を持ちます. InnoDB is the first MySQL table type which allows you to define foreign key constraints to guard the integrity of your data.

The syntax of a foreign key constraint definition in InnoDB:

[CONSTRAINT symbol] FOREIGN KEY (index_col_name, ...)
                  REFERENCES table_name (index_col_name, ...)
                  [ON DELETE {CASCADE | SET NULL | NO ACTION
                              | RESTRICT}]
                  [ON UPDATE {CASCADE | SET NULL | NO ACTION
                              | RESTRICT}]

Both tables have to be InnoDB type and there must be an index where the foreign key and the referenced key are listed as the FIRST columns. InnoDB does not auto-create indexes on foreign keys or referenced keys: you have to create them explicitly.

Corresponding columns in the foreign key and the referenced key must have similar internal data types inside InnoDB so that they can be compared without a type conversion. The size and the signedness of integer types has to be the same. The length of string types need not be the same. If you specify a SET NULL action, make sure you have not declared the columns in the child table NOT NULL.

If MySQL gives the error number 1005 from a CREATE TABLE statement, and the error message string refers to errno 150, then the table creation failed because a foreign key constraint was not correctly formed. Similarly, if an ALTER TABLE fails and it refers to errno 150, that means a foreign key definition would be incorrectly formed for the altered table.

Starting from version 3.23.50 you can also associate the ON DELETE CASCADE or ON DELETE SET NULL clause with the foreign key constraint. Starting from version 4.0.8 you can use also similar ON UPDATE actions.

If ON DELETE CASCADE is specified, and a row in the parent table is deleted, then InnoDB automatically deletes also all those rows in the child table whose foreign key values are equal to the referenced key value in the parent row. If ON DELETE SET NULL is specified, the child rows are automatically updated so that the columns in the foreign key are set to the SQL NULL value.

Starting from version 3.23.50, InnoDB does not check foreign key constraints on those foreign key or referenced key values which contain a NULL column.

Starting from version 3.23.50 the InnoDB parser allows you to use backquotes (`) around table and column names in the FOREIGN KEY ... REFERENCES ... clause but the InnoDB parser is not yet aware of the option lower_case_table_names you can specify in `my.cnf'.

例:

CREATE TABLE parent(id INT NOT NULL, PRIMARY KEY (id)) TYPE=INNODB;
CREATE TABLE child(id INT, parent_id INT, INDEX par_ind (parent_id),
             FOREIGN KEY (parent_id) REFERENCES parent(id)
             ON DELETE SET NULL
) TYPE=INNODB;

(訳注：両方のテーブルとも、InnoDB でなくてはなりません。さらに foreign key と the referenced key に指定される項目は、インデックス中の定義の中では、最初の項目として、指定されていなければなりません。上記の例では、FOREIGN KEY で指定されている V, U は、インデックスの最初の項目に指定されています。)

Starting from version 3.23.50 InnoDB allows you to add a new foreign key constraint to a table through

ALTER TABLE yourtablename
ADD [CONSTRAINT symbol] FOREIGN KEY (...) REFERENCES anothertablename(...)
[on_delete_and_on_update_actions]

Remember to create the required indexes first, though. In InnoDB versions < 3.23.50 ALTER TABLE or CREATE INDEX should not be used in connection with tables which have foreign key constraints or which are referenced in foreign key constraints: Any ALTER TABLE removes all foreign key constrainst defined for the table. You should not use ALTER TABLE to the referenced table either, but use DROP TABLE and CREATE TABLE to modify the schema. When MySQL does an ALTER TABLE it may internally use RENAME TABLE, and that will confuse the foreign key costraints which refer to the table. A CREATE INDEX statement is in MySQL processed as an ALTER TABLE, and these restrictions apply also to it.

When doing foreign key checks InnoDB sets shared row level locks on child or parent records it has to look at. InnoDB checks foreign key constraints immediately: the check is not deferred to transaction commit.

InnoDB allows you to drop any table even though that would break the foreign key constraints which reference the table. When you drop a table the constraints which were defined in its create statement are also dropped.

If you re-create a table which was dropped, it has to have a definition which conforms to the foreign key constraints referencing it. It must have the right column names and types, and it must have indexes on the referenced keys, as stated above. If these are not satisfied, MySQL returns error number 1005 and refers to errno 150 in the error message string.

Starting from version 3.23.50 InnoDB returns the foreign key definitions of a table when you call

SHOW CREATE TABLE yourtablename

Then also `mysqldump' produces correct definitions of tables to the dump file, and does not forget about the foreign keys.

You can also list the foreign key constraints for a table T with

SHOW TABLE STATUS FROM yourdatabasename LIKE 'T'

The foreign key constraints are listed in the table comment of the output.

7.5.5 InnoDB データとログの追加と削除

バージョン 3.23.50以上の3.23, 4.0.2以上では InnoDB の最後のデータファイルを autoextend に指定することが可能です。それ未満のバージョンでは、データファイルを追加することでテーブルの空間を大きくすることが可能です。テーブルの空間を大きくするためには、新しいデータファイルを追加します。これを行なうには、MySQL サーバーを一度シャットダウンし、 `my.cnf' ファイルを編集して新しいデータファイルを innodb_data_file_path に追加し、MySQL サーバを起動します。

今のところ、データファイルを InnoDB から削除することはできません。データベースのサイズを小さくするには、一度 mysqldump でテーブルをダンプし、新しくデータベースを作成し、ダンプしたテーブルを取り込みます。

もし InnoDB のログファイルのサイズを変更したいならば、 MySQL サーバを停止しなくてはなりません(エラー無しで確実に止まるようにしてください)。なにかシャットダウン時に問題があった場合には、古いログファイルを安全な場所にコピーし、データベースの修復をしましょう。そして、古いログファイルを log ファイルのディレクトリから消去し、 `my.cnf' ファイルを編集した後、MySQL サーバを起動します。 InnoDB は新しいログファイルを作成したことを告げるでしょう。

7.5.6 InnoDB データベースのバックアップと修復

The key to safe database management is taking regular backups.

InnoDB Hot Backup is an online backup tool you can use to backup your InnoDB database while it is running. InnoDB Hot Backup does not require you to shut down your database and it does not set any locks or disturb your normal database processing. InnoDB Hot Backup is a non-free additional tool which is not included in the standard MySQL distribution. See the InnoDB Hot Backup homepage http://www.innodb.com/hotbackup.html for detailed information and screenshots.

If you are able to shut down your MySQL server, then to take a 'binary' backup of your database you have to do the following:

MySQL を停止。エラー無しで確実に止まるように。
全てのデータファイルを安全な場所にコピー。
全ての InnoDB ログファイルを安全な場所にコピー。
`my.cnf' 設定ファイルを安全な場所にコピー。
InnoDB テーブルの `.frm' ファイルを安全な場所にコピー。

上のバイナリバックアップに加えて、`mysqldump' で通常のテーブルのダンプも取るべきです。この理由は、バイナリファイルはあなたの知らないところでおかしくなっているかもしれないからです。テキストファイルのダンプされたテーブルは人間が読むことができ、そしてバイナリファイルよりもずっと簡素です。ダンプされたファイルからテーブルのおかしくなった箇所を見つけるのは容易で、そして、致命的なデータの不正を少なくするチャンスでもあります。

データベースのバイナリバックアップと同時に、ダンプを取ることは、よい考えです。全テーブルを現在の状況と矛盾無くダンプするためには、全てのクライアントをデータベースからシャット・アウトするべきです。 Then you can take the binary backup, and you will then have a consistent snapshot of your database in two formats.

To be able to recover your InnoDB database to the present from the binary backup described above, you have to run your MySQL database with the general logging and log archiving of MySQL switched on. Here by the general logging we mean the logging mechanism of the MySQL server which is independent of InnoDB logs.

MySQL サーバプロセスのクラッシュからのリカバリを行なうには、 MySQL をリスタートすることがただ一つの方法です。 InnoDB は自動でログをチェックし、データベースの roll-forward を行ないます。 InnoDB は、クラッシュ時にコッミトされていないトランザクションを自動的にロールバックします。リカバリの間、InnoDB は以下のような出力をします：

~/mysqlm/sql > mysqld
InnoDB: Database was not shut down normally.
InnoDB: Starting recovery from log files...
InnoDB: Starting log scan based on checkpoint at
InnoDB: log sequence number 0 13674004
InnoDB: Doing recovery: scanned up to log sequence number 0 13739520
InnoDB: Doing recovery: scanned up to log sequence number 0 13805056
InnoDB: Doing recovery: scanned up to log sequence number 0 13870592
InnoDB: Doing recovery: scanned up to log sequence number 0 13936128
...
InnoDB: Doing recovery: scanned up to log sequence number 0 20555264
InnoDB: Doing recovery: scanned up to log sequence number 0 20620800
InnoDB: Doing recovery: scanned up to log sequence number 0 20664692
InnoDB: 1 uncommitted transaction(s) which must be rolled back
InnoDB: Starting rollback of uncommitted transactions
InnoDB: Rolling back trx no 16745
InnoDB: Rolling back of trx no 16745 completed
InnoDB: Rollback of uncommitted transactions completed
InnoDB: Starting an apply batch of log records to the database...
InnoDB: Apply batch completed
InnoDB: Started
mysqld: ready for connections

If your database gets corrupted or your disk fails, you have to do the recovery from a backup. In the case of corruption, you should first find a backup which is not corrupted. From a backup do the recovery from the general log files of MySQL according to instructions in the MySQL manual.

7.5.6.1 Checkpoints

InnoDB は fuzzy checkpoint と呼ばれるチェックポイントのメカニズムを持っています。 InnoDB will flush modified database pages from the buffer pool in small batches, there is no need to flush the buffer pool in one single batch, which would in practice stop processing of user SQL statements for a while.

クラッシュ・リカバリ時には、InnoDB はログファイルに書かれたチェックポイント・ラベルを見ます。データベースのディスク・イメージ上にそのラベルが反映させられる前に、それはデータベースの全ての変更を知っています。 It knows that all modifications to the database before the label are already present on the disk image of the database. Then InnoDB scans the log files forward from the place of the checkpoint applying the logged modifications to the database.

InnoDB はログファイルを循環して使用します。全てのコミットされた変更分(変更だから当然バッファプール内のデータベースのページとディスク上のイメージが違う)は、InnoDB がリカバリを実行しなければならない時に、ログファイルに存在しなければなりません。(でなければリカバリを行なうことができません。) これは、InnoDB がログファイルを循環的に再利用しようとした時に、ディスク上のデータベースページのイメージに、既にログファイルに記録されていた変更分が含まれていなければならないということです。言い換えれば、InnoDB はチェックポイントを作成しなければならず、そしてこれは変更されたデータベースのページをディスクにフラッシュすることをしばしば引き起こすことになります。 InnoDB writes to the log files in a circular fashion. All committed modifications which make the database pages in the buffer pool different from the images on disk must be available in the log files in case InnoDB has to do a recovery. This means that when InnoDB starts to reuse a log file in the circular fashion, it has to make sure that the database page images on disk already contain the modifications logged in the log file InnoDB is going to reuse. In other words, InnoDB has to make a checkpoint and often this involves flushing of modified database pages to disk.

上に述べた事柄は、なぜログファイルをとても大きく作成することがチェックポイント作成において disk I/O を減らすことになるのかの理由です。 The above explains why making your log files very big may save disk I/O in checkpointing. It can make sense to set the total size of the log files as big as the buffer pool or even bigger. The drawback in big log files is that crash recovery can last longer because there will be more log to apply to the database.

7.5.7 InnoDB データベースを他の機械に移動する

InnoDB データとログファイルは、もし、その機械の浮動小数点数のフォーマットが同じであれば、全プラットフォームでバイナリ互換を持ちます。 InnoDB は、単純に、関連ファイルを全てコピーすることで移動可能です。 (関連ファイルは前節で述べられています)　

InnoDB データベースは、全てのファイルを単純にコピーするだけで、移動が可能です。ファイルは、前述のデータベースのバックアップの節で述べたものです。もし浮動小数点数のフォーマットが違う機械であったとしても、テーブルに FLOAT や DOUBLE の型を使用していないのであれば、手順は同じです：すなわち単にファイルをコピーするだけです。もし浮動小数点数のフォーマットが違う機械で、テーブルに浮動小数点数を保存している場合は、 `mysqldump' と `mysqlimport' コマンドを使用してテーブルを移動しなくてはなりません。

A performance tip is to switch off auto-commit mode when you import data into your database, assuming your tablespace has enough space for the big rollback segment the big import transaction will generate. Do the commit only after importing a whole table or a segment of a table.

7.5.8 InnoDB Transaction Model

In the InnoDB transaction model the goal has been to combine the best properties of a multi-versioning database to traditional two-phase locking. InnoDB does locking on row level and runs queries by default as non-locking consistent reads, in the style of Oracle. The lock table in InnoDB is stored so space-efficiently that lock escalation is not needed: typically several users are allowed to lock every row in the database, or any random subset of the rows, without InnoDB running out of memory.

In InnoDB all user activity happens inside transactions. もし、autocommit mode が MySQL で使用されているなら、それぞれの SQL 文は単一の transaction となります。 MySQL always starts a new connection with the autocommit mode switched on.

もし autocommit mode が SET AUTOCOMMIT = 0 で off にされたなら、ユーザーは常にトランザクションが可能な状態となります。もし SQL の COMMIT or ROLLBACK 文を実行すると、現在のトランザクションは完結し、新しいトランザクションが開始します。どちらの文も全ての InnoDB のロック(カレントのトランザクションにセットされたロック)を外します。 A COMMIT means that the changes made in the current transaction are made permanent and become visible to other users. A ROLLBACK, on the other hand, cancels all modifications made by the current transaction.

If the connection has AUTOCOMMIT = 1, then the user can still perform a multi-statement transaction by starting it with BEGIN and ending it with COMMIT or ROLLBACK.

In terms of the SQL-1992 transaction isolation levels, the InnoDB default is REPEATABLE READ. Starting from version 4.0.5, InnoDB offers all 4 different transaction isolation levels described by the SQL-1992 standard. You can set the default isolation level for all connections in the [mysqld] section of `my.cnf':

transaction-isolation = {READ-UNCOMMITTED | READ-COMMITTED
                         | REPEATABLE-READ | SERIALIZABLE}

A user can change the isolation level of a single session or all new incoming connections with the

SET [SESSION | GLOBAL] TRANSACTION ISOLATION LEVEL
                       {READ UNCOMMITTED | READ COMMITTED
                        | REPEATABLE READ | SERIALIZABLE}

SQL statement. Note that there are no hyphens in level names in the SQL syntax. If you specify the keyword GLOBAL in the above statement, it will determine the initial isolation level of new incoming connections, but will not change the isolation level of old connections. Any user is free to change the isolation level of his session, even in the middle of a transaction. In versions < 3.23.50 SET TRANSACTION had no effect on InnoDB tables. In versions < 4.0.5 only REPEATABLE READ and SERIALIZABLE were available.

You can query the global and session transaction isolation levels with:

SELECT @@global.tx_isolation;
SELECT @@tx_isolation;

In row level locking InnoDB uses so-called next-key locking. That means that besides index records, InnoDB can also lock the 'gap' before an index record to block insertions by other users immediately before the index record. A next-key lock means a lock which locks an index record and the gap before it. A gap lock means a lock which only locks a gap before some index record.

A detailed description of each isolation level in InnoDB:

READ UNCOMMITTED This is also called 'dirty read': non-locking SELECTs are performed so that we do not look at a possible earlier version of a record; thus they are not 'consistent' reads under this isolation level; otherwise this level works like READ COMMITTED.
READ COMMITTED Somewhat Oracle-like isolation level. All SELECT ... FOR UPDATE and SELECT ... LOCK IN SHARE MODE statements only lock the index records, NOT the gaps before them, and thus allow free inserting of new records next to locked records. UPDATE and DELETE which use a unique index with a unique search condition, only lock the index record found, not the gap before it. But still in range type UPDATE and DELETE InnoDB must set next-key or gap locks and block insertions by other users to the gaps covered by the range. This is necessary since 'phantom rows' have to be blocked for MySQL replication and recovery to work. Consistent reads behave like in Oracle: each consistent read, even within the same transaction, sets and reads its own fresh snapshot.
REPEATABLE READ This is the default isolation level of InnoDB. SELECT ... FOR UPDATE, SELECT ... LOCK IN SHARE MODE, UPDATE, and DELETE which use a unique index with a unique search condition, only lock the index record found, not the gap before it. Otherwise these operations employ next-key locking, locking the index range scanned with next-key or gap locks, and block new insertions by other users. In consistent reads there is an important difference from the previous isolation level: in this level all consistent reads within the same transaction read the same snapshot established by the first read. This convention means that if you issue several plain SELECTs within the same transaction, these SELECTs are consistent also with respect to each other.
SERIALIZABLE This level is like the previous one, but all plain SELECTs are implicitly converted to SELECT ... LOCK IN SHARE MODE.

7.5.8.1 Consistent Read

A consistent read means that InnoDB uses its multi-versioning to present to a query a snapshot of the database at a point in time. The query will see the changes made by exactly those transactions that committed before that point of time, and no changes made by later or uncommitted transactions. The exception to this rule is that the query will see the changes made by the transaction itself which issues the query.

If you are running with the default REPEATABLE READ isolation level, then all consistent reads within the same transaction read the snapshot established by the first such read in that transaction. You can get a fresher snapshot for your queries by committing the current transaction and after that issuing new queries.

Consistent read is the default mode in which InnoDB processes SELECT statements in READ COMMITTED and REPEATABLE READ isolation levels. A consistent read does not set any locks on the tables it accesses, and therefore other users are free to modify those tables at the same time a consistent read is being performed on the table.

7.5.8.2 Locking Reads

consistent read は、いくつかの状況下では好都合ではありません。

あなたが新しいレコードを CHILD テーブルに加えたいとします。この child はすでに PARENT テーブルにある parent を確実に持っているものとします。

そしてあなたが PARENT テーブルを読むために、consistent read を使用するとします。そのテーブル内に、たしかに、その child の parent を見たとします。

この場合、その child のレコードを安全に CHILD テーブルに追加することが可能でしょうか？

答えは No 。なぜなら、あなたが気がつかない内に、他のユーザーがその parent のレコードを PARENT テーブルから消してしまうかも知れないからです。

この解決として、SELECT を locking mode で動作させる方法があります。 LOCK IN SHARE MODE.

SELECT * FROM PARENT WHERE NAME = 'Jones' LOCK IN SHARE MODE;

Performing a read in share mode means that we read the latest available data, and set a shared mode lock on the rows we read. If the latest data belongs to a yet uncommitted transaction of another user, we will wait until that transaction commits. A shared mode lock prevents others from updating or deleting the row we have read. After we see that the above query returns the parent 'Jones', we can safely add his child to table CHILD, and commit our transaction. This example shows how to implement referential integrity in your application code.

Let us look at another example: we have an integer counter field in a table CHILD_CODES which we use to assign a unique identifier to each child we add to table CHILD. Obviously, using a consistent read or a shared mode read to read the present value of the counter is not a good idea, since then two users of the database may see the same value for the counter, and we will get a duplicate key error when we add the two children with the same identifier to the table.

In this case there are two good ways to implement the reading and incrementing of the counter: (1) update the counter first by incrementing it by 1 and only after that read it, or (2) read the counter first with a lock mode FOR UPDATE, and increment after that:

SELECT COUNTER_FIELD FROM CHILD_CODES FOR UPDATE;
UPDATE CHILD_CODES SET COUNTER_FIELD = COUNTER_FIELD + 1;

A SELECT ... FOR UPDATE will read the latest available data setting exclusive locks on each row it reads. Thus it sets the same locks a searched SQL UPDATE would set on the rows.

7.5.8.3 Next-key Locking: Avoiding the Phantom Problem

row level のロッキングにおいては、InnoDB は next-key locking と呼ばれるアルゴリズムを使用します。 InnoDB does the row level locking so that when it searches or scans an index of a table, it sets shared or exclusive locks on the index records in encounters. Thus the row level locks are more precisely called index record locks.

The locks InnoDB sets on index records also affect the 'gap' before that index record. If a user has a shared or exclusive lock on record R in an index, then another user cannot insert a new index record immediately before R in the index order. This locking of gaps is done to prevent the so-called phantom problem. Suppose I want to read and lock all children with identifier bigger than 100 from table CHILD, and update some field in the selected rows.

SELECT * FROM CHILD WHERE ID > 100 FOR UPDATE;

CHILD テーブルの ID フィールドにインデックスが張られているとします。 Our query will scan that index starting from the first record where ID is bigger than 100. Now, if the locks set on the index records would not lock out inserts made in the gaps, a new child might meanwhile be inserted to the table. If now I in my transaction execute

SELECT * FROM CHILD WHERE ID > 100 FOR UPDATE;

again, I will see a new child in the result set the query returns. This is against the isolation principle of transactions: a transaction should be able to run so that the data it has read does not change during the transaction. If we regard a set of rows as a data item, then the new 'phantom' child would break this isolation principle.

When InnoDB scans an index it can also lock the gap after the last record in the index. Just that happens in the previous example: the locks set by InnoDB will prevent any insert to the table where ID would be bigger than 100.

You can use next-key locking to implement a uniqueness check in your application: if you read your data in share mode and do not see a duplicate for a row you are going to insert, then you can safely insert your row and know that the next-key lock set on the successor of your row during the read will prevent anyone meanwhile inserting a duplicate for your row. Thus the next-key locking allows you to 'lock' the non-existence of something in your table.

7.5.8.4 Locks Set by Different SQL Statements in InnoDB

SELECT ... FROM ... : this is a consistent read, reading a snapshot of the database and setting no locks.
SELECT ... FROM ... LOCK IN SHARE MODE : sets shared next-key locks on all index records the read encounters.
SELECT ... FROM ... FOR UPDATE : sets exclusive next-key locks on all index records the read encounters.
INSERT INTO ... VALUES (...) : sets an exclusive lock on the inserted row; note that this lock is not a next-key lock and does not prevent other users from inserting to the gap before the inserted row. If a duplicate key error occurs, sets a shared lock on the duplicate index record.
INSERT INTO T SELECT ... FROM S WHERE ... sets an exclusive (non-next-key) lock on each row inserted into T. Does the search on S as a consistent read, but sets shared next-key locks on S if the MySQL logging is on. InnoDB has to set locks in the latter case because in roll-forward recovery from a backup every SQL statement has to be executed in exactly the same way as it was done originally.
CREATE TABLE ... SELECT ... performs the SELECT as a consistent read or with shared locks, like in the previous item.
REPLACE is done like an insert if there is no collision on a unique key. Otherwise, an exclusive next-key lock is placed on the row which has to be updated.
UPDATE ... SET ... WHERE ... : sets an exclusive next-key lock on every record the search encounters.
DELETE FROM ... WHERE ... : sets an exclusive next-key lock on every record the search encounters.
If a FOREIGN KEY constraint is defined on a table, any insert, update, or delete which requires checking of the constraint condition sets shared record level locks on the records it looks at to check the constraint. Also in the case where the constraint fails, InnoDB sets these locks.
LOCK TABLES ... : sets table locks. In the implementation the MySQL layer of code sets these locks. The automatic deadlock detection of InnoDB cannot detect deadlocks where such table locks are involved: see the following section. Also, since MySQL does know about row level locks, it is possible that you get a table lock on a table where another user currently has row level locks. But that does not put transaction integerity into danger. 「7.5.13 InnoDB テーブルの制限」節参照.

7.5.8.5 How to cope with deadlocks?

Deadlocks are a classic problem in transactional databases, but they are not dangerous, unless they are so frequent that you cannot run certain transactions at all. Normally you have to write your applications so that they are always prepared to re-issue a transaction if it gets rolled back because of a deadlock.

InnoDB uses automatic row level locking. You can get deadlocks even in the case of transactions which just insert or delete a single row. That is because these operations are not really 'atomic': they automatically set locks on the (possibly several) index records of the row inserted/deleted.

You can cope with deadlocks and reduce the number of them with the following tricks:

Use SHOW INNODB STATUS in MySQL versions >= 3.23.52 and >= 4.0.3 to determine the cause of the latest deadlock. That can help you to tune your application to avoid deadlocks.
Always be prepared to re-issue a transaction if it fails in a deadlock. Deadlocks are not dangerous. Just try again.
Commit your transactions often. Small transactions are less prone to collide.
If you are using locking reads SELECT ... FOR UPDATE or ... LOCK IN SHARE MODE, try using a lower isolation level READ COMMITTED.
Access your tables and rows in a fixed order. Then transactions will form nice queues, and do not deadlock.
Add good indexes to your tables. Then your queries need to scan less index records and consequently set less locks. Use EXPLAIN SELECT to determine that MySQL picks appropriate indexes for your queries.
Use less locking: if you can afford a SELECT to return data from an old snapshot, do not add the clause FOR UPDATE or LOCK IN SHARE MODE to it. Using READ COMMITTED isolation level is good here, because each consistent read within the same transaction reads from its own fresh snapshot.
If nothing helps, serialize your transactions with table level locks: LOCK TABLES t1 WRITE, t2 READ, ... ; [do something with tables t1 and t2 here]; UNLOCK TABLES. Table level locks make you transactions to queue nicely, and deadlocks are avoided. Note that LOCK TABLES implicitly starts a transaction, just like the command BEGIN, and UNLOCK TABLES implicitly ends the transaction in a COMMIT.
Another solution to serialize transactions is to create an auxiliary 'semaphore' table where there is just a single row. Each transaction updates that row before accessing other tables. In that way all transactions happen in a serial fashion. Note that then also the InnoDB instant deadlock detection algorithm works, because the serializing lock is a row level lock. In MySQL table level locks we have to resort to the timeout method to resolve a deadlock.

7.5.8.6 Deadlock Detection and Rollback

InnoDB automatically detects a deadlock of transactions and rolls back a transaction or transactions to prevent the deadlock. Starting from version 4.0.5, InnoDB will try to pick small transactions to roll back. The size of a transaction is determined by the number of rows it has inserted, updated, or deleted. Previous to 4.0.5, InnoDB always rolled back the transaction whose lock request was the last one to build a deadlock, that is, a cycle in the waits-for graph of transactions.

InnoDB cannot detect deadlocks where a lock set by a MySQL LOCK TABLES statement is involved, or if a lock set in another storage engine than InnoDB is involved. You have to resolve these situations using innodb_lock_wait_timeout set in `my.cnf'.

When InnoDB performs a complete rollback of a transaction, all the locks of the transaction are released. However, if just a single SQL statement is rolled back as a result of an error, some of the locks set by the SQL statement may be preserved. This is because InnoDB stores row locks in a format where it cannot afterwards know which was set by which SQL statement.

7.5.8.7 An Example of How the Consistent Read Works in InnoDB

Suppose you are running on the default REPEATABLE READ isolation level. When you issue a consistent read, that is, an ordinary SELECT statement, InnoDB will give your transaction a timepoint according to which your query sees the database. Thus, if transaction B deletes a row and commits after your timepoint was assigned, then you will not see the row deleted. Similarly with inserts and updates.

You can advance your timepoint by committing your transaction and then doing another SELECT.

This is called multi-versioned concurrency control.

                  User A                 User B

              SET AUTOCOMMIT=0;      SET AUTOCOMMIT=0;
time
|             SELECT * FROM t;
|             empty set
|                                    INSERT INTO t VALUES (1, 2);
|
v             SELECT * FROM t;
              empty set
                                     COMMIT;

              SELECT * FROM t;
              empty set;

              COMMIT;

              SELECT * FROM t;
              ---------------------
              |    1    |    2    |
              ---------------------

Thus user A sees the row inserted by B only when B has committed the insert, and A has committed his own transaction so that the timepoint is advanced past the commit of B.

If you want to see the ``freshest'' state of the database, you should use a locking read:

SELECT * FROM t LOCK IN SHARE MODE;

7.5.8.8 パフォーマンス・チューニング Tips

1. もし Unix `top' や Windows `Task Manager' が CPU 使用率を 70% 未満に表示している場合、おそらく、disk アクセスに処理が取られています。とても多くのトランザクションのコミットを作成しているか、バッファプールが小さいのでしょう。バッファプールを大きくすれば良くなりますが、しかし、バッファプールは物理メモリの 80% より大きくしないように。

2. Wrap several modifications into one transaction. InnoDB must flush the log to disk at each transaction commit, if that transaction made modifications to the database. Since the rotation speed of a disk is typically at most 167 revolutions/second, that constrains the number of commits to the same 167/second if the disk does not fool the operating system.

3. If you can afford the loss of some latest committed transactions, you can set the `my.cnf' parameter innodb_flush_log_at_trx_commit to zero. InnoDB tries to flush the log anyway once in a second, though the flush is not guaranteed.

4. Make your log files big, even as big as the buffer pool. When InnoDB has written the log files full, it has to write the modified contents of the buffer pool to disk in a checkpoint. Small log files will cause many unnecessary disk writes. The drawback in big log files is that recovery time will be longer.

5. Also the log buffer should be quite big, say 8 MB.

6. (Relevant from 3.23.39 up.) Linux や　Unix のいくつかのバージョンでは、disk のファイルのフラッシュに Unix fdatasync やそれに似た方法を使用しますが、これは驚くほど遅いです。 InnoDB のデフォルトの方法は、fdatasync 関数を使用します。もしデータベースの書き込みのパフォーマンスに満足しない場合には、 `my.cnf' ファイルで innodb_flush_method を O_DSYNC にセットしてもかまいません。しかし O_DSYNC はほとんどのシステムでは遅いようです。

7. InnoDB にデータを流し込む場合には、 MySQL の設定が autocommit=1 になっていないようにします。それぞれの全ての insert が log をディスクにフラッシュすることを要求するからです。取り込む SQL の最初に

SET AUTOCOMMIT=0;

を追加し、最後に

COMMIT;

を書きます。

もし `mysqldump' を --opt オプションで使用しているなら、上記のように SET AUTOCOMMIT=0; ... COMMIT; でダンプファイルを囲まないでも、早く InnoDB テーブルにダンプを取り込むことが可能です。

8. 大量のinsertの大きなロールバックに気をつけなさい； InnoDB は insert の時には、disk I/O を少なくするために insert buffer を使用します。しかし、ロールバックには同じような機構は使用されません。ロールバック時のディスク・バウンドは、対応するinsertに比べて 30倍の時間がかかります。データベースのプロセスを kill することは、何の解決にもなりません。なぜなら、データベースがスタートアップした時、再びロールバックが開始されるからです。制御しきれないロールバックから免れる方法は、バッファプールを増やすか(こうすればロールバックは CPU-bound でおさまり、速く終わります)、 InnoDB 全体を消し去ってしまうかしかありません。

9. Beware also of other big disk-bound operations. Use DROP TABLE or TRUNCATE (from MySQL-4.0 up) to empty a table, not DELETE FROM yourtable.

10. もしたくさんのレコードをインサートする必要があるならば、サーバーとクライアントのコミュニケーションのオーバーヘッドを軽減するために、マルチライン INSERT を使用します：

INSERT INTO yourtable VALUES (1, 2), (5, 5);

この方法は InnoDB だけではなく、他のテーブル型にインサートする時も使用できます。

7.5.8.9 InnoDB モニタ

バージョン 3.23.41 から、InnoDB は InnoDB Monitor を含むようになりました。これは InnoDB 内部の状態を表示するものです。このスイッチを on にすると、MySQL サーバは 15 秒毎に標準出力にデータを出力するようになります。 (note: the MySQL client will not print anything) このデータはパフォーマンス・チューニングに便利です。

On Windows you must start mysqld-max from a MS-DOS prompt with the --standalone --console options to direct the output to the MS-DOS prompt window.

There is a separate innodb_lock_monitor which prints the same information as innodb_monitor plus information on locks set by each transaction.

The printed information includes data on:

table and record locks held by each active transaction,
lock waits of a transactions,
semaphore waits of threads,
pending file I/O requests,
buffer pool statistics, and
purge and insert buffer merge activity of the main thread of InnoDB.

InnoDB モニタは、以下の SQL コマンドでスタートできます：

CREATE TABLE innodb_monitor(a int) type = innodb;

停止は：

DROP TABLE innodb_monitor;

CREATE TABLE 文は、MySQL SQL パーサーを通して InnoDB エンジンにアクセスするためだけに使用されているにすぎません：作成されたテーブルは InnoDB モニタとは関係しません。もし、モニタ実行中にデータベースを停止し、そのあと、モニタを再び起動したいなら、あなたはモニタを起動するための新しい CREATE TABLE 文を実行する前に、前に作ったテーブルを破棄しなくてはなりません。この構文は将来のリリースで変わるかもしれません。

InnoDB モニタの出力サンプル:

================================
010809 18:45:06 INNODB MONITOR OUTPUT
================================
--------------------------
LOCKS HELD BY TRANSACTIONS
--------------------------
LOCK INFO:
Number of locks in the record hash table 1294
LOCKS FOR TRANSACTION ID 0 579342744
TABLE LOCK table test/mytable trx id 0 582333343 lock_mode IX

RECORD LOCKS space id 0 page no 12758 n bits 104 table test/mytable index
PRIMARY trx id 0 582333343 lock_mode X
Record lock, heap no 2 PHYSICAL RECORD: n_fields 74; 1-byte offs FALSE;
info bits 0
 0: len 4; hex 0001a801; asc ;; 1: len 6; hex 000022b5b39f; asc ";;
 2: len 7; hex 000002001e03ec; asc ;; 3: len 4; hex 00000001;
...
-----------------------------------------------
CURRENT SEMAPHORES RESERVED AND SEMAPHORE WAITS
-----------------------------------------------
SYNC INFO:
Sorry, cannot give mutex list info in non-debug version!
Sorry, cannot give rw-lock list info in non-debug version!
-----------------------------------------------------
SYNC ARRAY INFO: reservation count 6041054, signal count 2913432
4a239430 waited for by thread 49627477 op. S-LOCK file NOT KNOWN line 0
Mut ex 0 sp 5530989 r 62038708 sys 2155035;
rws 0 8257574 8025336; rwx 0 1121090 1848344
-----------------------------------------------------
CURRENT PENDING FILE I/O'S
--------------------------
Pending normal aio reads:
Reserved slot, messages 40157658 4a4a40b8
Reserved slot, messages 40157658 4a477e28
...
Reserved slot, messages 40157658 4a4424a8
Reserved slot, messages 40157658 4a39ea38
Total of 36 reserved aio slots
Pending aio writes:
Total of 0 reserved aio slots
Pending insert buffer aio reads:
Total of 0 reserved aio slots
Pending log writes or reads:
Reserved slot, messages 40158c98 40157f98
Total of 1 reserved aio slots
Pending synchronous reads or writes:
Total of 0 reserved aio slots
-----------
BUFFER POOL
-----------
LRU list length 8034
Free list length 0
Flush list length 999
Buffer pool size in pages 8192
Pending reads 39
Pending writes: LRU 0, flush list 0, single page 0
Pages read 31383918, created 51310, written 2985115
----------------------------
END OF INNODB MONITOR OUTPUT
============================
010809 18:45:22 InnoDB starts purge
010809 18:45:22 InnoDB purged 0 pages

出力に関するいくつかの注釈:

もし LOCKS HELD BY TRANSACTIONS セクションにロック待ちがレポートされているなら、あなたのアプリケーションがロックの競合を起こしています。出力はトランザクションのデッドロックの理由をたどるための手助けになります。
SYNC INFO セクションは予約されたセマフォを報告します。ただし、もし `univ.i' ファイルの UNIV_SYNC_DEBUG を有効にして、InnoDB をコンパイルしているならば。
Section SYNC ARRAY INFO reports threads waiting for a semaphore and statistics on how many times threads have needed a spin or a wait on a mutex or a rw-lock semaphore. A big number of threads waiting for semaphores may be a result of disk I/O, or contention problems inside InnoDB. Contention can be due to heavy parallelism of queries, or problems in operating system thread scheduling.
CURRENT PENDING FILE I/O'S セクションは、延期されたファイル I/O リクエストの一覧をだします。 A large number of these indicates that the workload is disk I/O -bound.
BUFFER POOL セクションは、ページの読み込みと書き出しの統計情報を与えます。 You can calculate from these numbers how many datafile I/Os your queries are currently doing.

7.5.9 Implementation of Multi-versioning

InnoDB は multiversioned database なので、テーブル空間中のレコードの古いバージョン情報が保持されなくてはなりません。 This information is stored in a data structure we call a rollback segment after an analogous data structure in Oracle.

InnoDB は内部では、データベースに保存されているそれぞれのレコードに対して 2 つのフィールドを付加しています。 A 6-byte field tells the transaction identifier for the last transaction which inserted or updated the row. Also a deletion is internally treated as an update where a special bit in the row is set to mark it as deleted. それぞれのレコードには、roll pointer と呼ばれる 7-byte のフィールドも含みます。 The roll pointer points to an undo log record written to the rollback segment. If the row was updated, then the undo log record contains the information necessary to rebuild the content of the row before it was updated.

InnoDB uses the information in the rollback segment to perform the undo operations needed in a transaction rollback. It also uses the information to build earlier versions of a row for a consistent read.

ロールバックセグメントの中の Undo log は、 insert と update の undo log に分けれれます。 Insert undo log は、トランザクションのロールバックの時にだけ必要とされ、そのトランザクションがコミットするやいなやすぐに破棄されます。 Update undo logs are used also in consistent reads, and they can be discarded only after there is no transaction present for which InnoDB has assigned a snapshot that in a consistent read could need the information in the update undo log to build an earlier version of a database row.

You must remember to commit your transactions regularly, also those transactions which only issue consistent reads. Otherwise InnoDB cannot discard data from the update undo logs, and the rollback segment may grow too big, filling up your tablespace.

The physical size of an undo log record in the rollback segment is typically smaller than the corresponding inserted or updated row. You can use this information to calculate the space need for your rollback segment.

In our multi-versioning scheme a row is not physically removed from the database immediately when you delete it with an SQL statement. Only when InnoDB can discard the update undo log record written for the deletion, it can also physically remove the corresponding row and its index records from the database. This removal operation is called a purge, and it is quite fast, usually taking the same order of time as the SQL statement which did the deletion.

7.5.10 テーブルとインデックスの構造

MySQL は、テーブルの data dictionary 情報を、データベースディレクトリ内の、 `.frm' ファイルに保存します。しかし、InnoDB タイプの全てのテーブルは、 InnoDB の内部のデータ・ディクショナリを (InnoDBの)テーブル空間に持っています。 MySQL がテーブルやデータベースを破棄する時は、 `.frm' ファイルとそれに対応する InnoDB のデータ・ディクショナリ内のエントリが消去されます。これは、`.frm' ファイルを移動するだけでは InnoDB テーブルを違うデータベースに移動することができない理由であり、また、MySQL 3.23.43 以下のバージョンで、InnoDB 型のテーブルでは DROP DATABASE が動作しなかった理由でもあります。

Every InnoDB table has a special index called the clustered index where the data of the rows is stored. If you define a PRIMARY KEY on your table, then the index of the primary key will be the clustered index.

もしテーブルにプライマリ・キーを定義しなかった場合には、 InnoDB は内部ではクラスタド・インデックス(clustered index)を作成します。そこでは、InnoDB がそのテーブルのレコードに割り当てた row id をもとにレコードが整理されます。 row id は 6-byte のフィールドで、新しいレコードが挿入されると単純に数が増加していきます。ということで、row id のよって整理されているレコードは、物理的にインサートされた順になっています。

Accessing a row through the clustered index is fast, because the row data will be on the same page where the index search leads us. In many databases the data is traditionally stored on a different page from the index record. If a table is large, the clustered index architecture often saves a disk I/O when compared to the traditional solution.

The records in non-clustered indexes (we also call them secondary indexes), in InnoDB contain the primary key value for the row. InnoDB uses this primary key value to search for the row from the clustered index. Note that if the primary key is long, the secondary indexes will use more space.

7.5.10.1 Physical Structure of an Index

InnoDB の全てのインデックスは B-Tree で、インデックスのレコードは、ツリーのリーフページ (原文 leaf page. ブロックと言い換える or not?) に保存されます。インデックスページのデフォルトの大きさは 16kB です。新しいレコードがインサートされた時、InnoDB はページの 1/16 を、将来のインデックスレコードの insert や update に備えて free にしようとします。

If index records are inserted in a sequential (ascending or descending) order, the resulting index pages will be about 15/16 full. If records are inserted in a random order, then the pages will be 1/2 - 15/16 full. If the fillfactor of an index page drops below 1/2, InnoDB will try to contract the index tree to free the page.

7.5.10.2 Insert Buffering

プライマリ・キーが一意で、新しいレコードがそのプライマリー・キーの並びの順番で insert されるというのは、データベースのアプリケーションではよくある事です。この場合は、clustered index に対するインサートはディスクからのランダムリードを要求しません。

これに対して、secondary indexes は通常非ユニークで、 secondary indexes に対するインサートされる順番はばらばらです。 This would cause a lot of random disk I/O's without a special mechanism used in InnoDB.

If an index record should be inserted to a non-unique secondary index, InnoDB checks if the secondary index page is already in the buffer pool. If that is the case, InnoDB will do the insertion directly to the index page. But, if the index page is not found from the buffer pool, InnoDB inserts the record to a special insert buffer structure. The insert buffer is kept so small that it entirely fits in the buffer pool, and insertions can be made to it very fast.

The insert buffer is periodically merged to the secondary index trees in the database. Often we can merge several insertions on the same page in of the index tree, and hence save disk I/Os. It has been measured that the insert buffer can speed up insertions to a table up to 15 times.

7.5.10.3 Adaptive Hash Indexes

If a database fits almost entirely in main memory, then the fastest way to perform queries on it is to use hash indexes. InnoDB has an automatic mechanism which monitors index searches made to the indexes defined for a table, and if InnoDB notices that queries could benefit from building of a hash index, such an index is automatically built.

But note that the hash index is always built based on an existing B-tree index on the table. InnoDB can build a hash index on a prefix of any length of the key defined for the B-tree, depending on what search pattern InnoDB observes on the B-tree index. A hash index can be partial: it is not required that the whole B-tree index is cached in the buffer pool. InnoDB will build hash indexes on demand to those pages of the index which are often accessed.

In a sense, through the adaptive hash index mechanism InnoDB adapts itself to ample main memory, coming closer to the architecture of main memory databases.

7.5.10.4 Physical Record Structure

InnoDB 中のそれぞれのインデックスレコードは、6 bytes のヘッダを含みます。 The header is used to link consecutive records together, and also in the row level locking.
Records in the clustered index contain fields for all user-defined columns. 1つの 6-byte フィールドが transaction id のために存在し、 1つの 7-byte フィールドが roll pointer のために存在します。
If the user has not defined a primary key for a table, then each clustered index record contains also a 6-byte row id field.
Each secondary index record contains also all the fields defined for the clustered index key.
A record contains also a pointer to each field of the record. If the total length of the fields in a record is < 128 bytes, then the pointer is 1 byte, else 2 bytes.

7.5.10.5 How an Auto-increment Column Works in InnoDB

データベースがスタートアップした後、ユーザーが auto-increment が定義されているテーブル T に対して初めて、insert を行なう時、もしユーザーが auto-increment のフィールドに明示的に値を与えなければ、 InnoDB は SELECT MAX(auto-inc-column) FROM T を実行し、そしてその結果得た値に１を加えた数を、そのフィールドとテーブルの auto-increment counter に与えます。これをもって、テーブル T の auto-increment counter が初期化されたと言います。

InnoDB follows the same procedure in initializing the auto-increment counter for a freshly created table.

もしユーザーが auto-increment のフィールドに 0 をインサートしたなら、 InnoDB はそれを、値が与えられなかったものとして扱うことに注意してください。

auto-increment counter が初期化された後、もしユーザーがauto-increment のフィールドに明示的に値を与えてレコードを insert すれば、その値が現在の auto-increment counter より大きければ auto-increment counter はその値にセットされます。もしユーザーが値を明示して与えなければ、InnODB は auto-increment counter を１つ増やし、そしてその値をフィールドの値としてセットします。

The auto-increment mechanism, when assigning values from the counter, bypasses locking and transaction handling. Therefore you may also get gaps in the number sequence if you roll back transactions which have got numbers from the counter.

The behaviour of auto-increment is not defined if a user gives a negative value to the column or if the value becomes bigger than the maximum integer that can be stored in the specified integer type.

7.5.11 File Space Management and Disk I/O

7.5.11.1 Disk I/O

ディスク I/O では、InnoDB は非同期 I/O を使用します. On Windows NT it uses the native asynchronous I/O provided by the operating system. On Unix, InnoDB uses simulated asynchronous I/O built into InnoDB: InnoDB creates a number of I/O threads to take care of I/O operations, such as read-ahead. In a future version we will add support for simulated aio on Windows NT and native aio on those versions of Unix which have one.

On Windows NT InnoDB uses non-buffered I/O. That means that the disk pages InnoDB reads or writes are not buffered in the operating system file cache. This saves some memory bandwidth.

Starting from 3.23.41 InnoDB uses a novel file flush technique called doublewrite. It adds safety to crash recovery after an operating system crash or a power outage, and improves performance on most Unix flavors by reducing the need for fsync operations.

Doublewrite means that InnoDB before writing pages to a datafile first writes them to a contiguous tablespace area called the doublewrite buffer. Only after the write and the flush to the doublewrite buffer has completed, InnoDB writes the pages to their proper positions in the datafile. If the operating system crashes in the middle of a page write, InnoDB will in recovery find a good copy of the page from the doublewrite buffer.

3.23.41 からは、データファイルにロー・ディスク(raw disk)パーティションが使用できますが、これはまだテスト段階です。新しいデータファイルを作成するとき、 innodb_data_file_path に指定するデータファイルのサイズのすぐ後に、 newraw キーワードを付加すれば実現されます。パーティションのサイズは、指定したサイズ以上(>=) でなければなりません。 InnoDB の 1M は、1024 x 1024 bytes を示します。現実のディスク製品の表記では、1000 000 bytes を 1MB と表示することに注意してください。

innodb_data_file_path=hdd1:5Gnewraw;hdd2:2Gnewraw

設定を記述してデータベースを再起動した後は、あなたはキーワードを raw に書き換えなければなりません。書き換えを忘れると、InnoDB はそのパーティションを新規に上書きしてしまいます！

innodb_data_file_path=hdd1:5Graw;hdd2:2Graw

By using a raw disk you can on some Unixes perform unbuffered I/O.

There are two read-ahead heuristics in InnoDB: sequential read-ahead and random read-ahead. In sequential read-ahead InnoDB notices that the access pattern to a segment in the tablespace is sequential. Then InnoDB will post in advance a batch of reads of database pages to the I/O system. In random read-ahead InnoDB notices that some area in a tablespace seems to be in the process of being fully read into the buffer pool. Then InnoDB posts the remaining reads to the I/O system.

7.5.11.2 File Space Management

設定ファイルに記述したデータファイルを InnoDB のテーブル空間を作ります。データファイルは単純に連結されてテーブル空間になります。 striping は使用されません。 Currently you cannot directly instruct where the space is allocated for your tables, except by using the following fact: from a newly created tablespace InnoDB will allocate space starting from the low end.

テーブル空間は、デフォルトサイズが 16kB のデータベースページから成ります。それらページは 64個までの連続する部分にグループ化されます。その、テーブル空間内の 'まとまり' は、InnoDB ではセグメントと呼ばれます。 The pages are grouped into extents of 64 consecutive pages. The 'files' inside a tablespace are called segments in InnoDB. The name of the rollback segment is somewhat misleading because it actually contains many segments in the tablespace.

InnoDB 中のそれぞれのインデックスには、2つのセグメントが割り当てられます；一つは B-tree の非リーフノードに、もう一つはリーフのノードに。 The idea here is to achieve better sequentiality for the leaf nodes, which contain the data.

テーブル空間内であるセグメントが大きくなった場合、 InnoDB は最初の 32 ページを個々に割り当てます。そののち、InnoDB は全領域をそのセグメントに割り当はじめます。 InnoDB はデータがより良く連続することを確実にするために、領域を 4つまで、一度に大きなセグメントに追加することが出来ます。 When a segment grows inside the tablespace, InnoDB allocates the first 32 pages to it individually. After that InnoDB starts to allocate whole extents to the segment. InnoDB can add to a large segment up to 4 extents at a time to ensure good sequentiality of data.

テーブル空間内のいくつかのページには、他のページのビットマップが含まれ、その結果、InnoDB テーブル空間内の数個の領域は、全てをセグメントに割り当てることができず、個々のページ毎に割り当てられます。 Some pages in the tablespace contain bitmaps of other pages, and therefore a few extents in an InnoDB tablespace cannot be allocated to segments as a whole, but only as individual pages.

テーブル空間の available free space を知るためにあなたが SHOW TABLE STATUS FROM ... LIKE ... クエリを実行した時、 InnoDB は、テーブル空間内の、確実に使用できるフリーな領域の合計をレポートします。 InnoDB は、clean-up と他の内部的な使用目的のために、常にいくつかの領域をリザーブします。これらのリザーブされた領域は free space には含まれません。

When you delete data from a table, InnoDB will contract the corresponding B-tree indexes. It depends on the pattern of deletes if that frees individual pages or extents to the tablespace, so that the freed space is available for other users. Dropping a table or deleting all rows from it is guaranteed to release the space to other users, but remember that deleted rows can be physically removed only in a purge operation after they are no longer needed in transaction rollback or consistent read.

7.5.11.3 テーブルのデフラグメント

If there are random insertions or deletions in the indexes of a table, the indexes may become fragmented. By fragmentation we mean that the physical ordering of the index pages on the disk is not close to the alphabetical ordering of the records on the pages, or that there are many unused pages in the 64-page blocks which were allocated to the index.

定期的に mysqldump を使用してテーブルをテキストファイルにダンプし、テーブルを削除し、ダンプファイルからテーブルを作り直す事を行なうことで、インデックスの走査を速くすることが可能です。デフラグメントを行なう他の方法は、テーブルのタイプを ALTER で一度 MyISAM にし、再び InnoDB に変更することです。 Note that a MyISAM table must fit in a single file on your operating system.

If the insertions to and index are always ascending and records are deleted only from the end, then the file space management algorithm of InnoDB guarantees that fragmentation in the index will not occur.

7.5.12 エラーハンドリング

The error handling in InnoDB is not always the same as specified in the ANSI SQL standards. According to the ANSI standard, any error during an SQL statement should cause the rollback of that statement. InnoDB sometimes rolls back only part of the statement, or the whole transaction. The following list specifies the error handling of InnoDB.

もしテーブルスペース中のファイルの空間を使いきれば、 MySQL 'Table is full' エラーとなり、 InnoDB は SQL 文をロール・バックします。
トランザクションのデッドロックやロック待ちのタイムアウトの場合は、 InnoDB はトランザクション全てをロール・バックします。
A duplicate key error only rolls back the insert of that particular row, even in a statement like INSERT INTO ... SELECT .... This will probably change so that the SQL statement will be rolled back if you have not specified the IGNORE option in your statement.
'row too long' エラーは、その SQL 文をロール・バックします。
Other errors are mostly detected by the MySQL layer of code, and they roll back the corresponding SQL statement.

7.5.13 InnoDB テーブルの制限

Warning: MySQL システムテーブルを MyISAM から InnoDB テーブルに 変換してはいけません！ これはサポートされていません；もし変換してしまうと、バックアップから前のシステムテーブルを復旧させるかあるいは mysql_install_db スクリプトを使用してシステムテーブルを再構築しない限り、MySQL は起動できません。
SHOW TABLE STATUS は、 InnoDB テーブルに関する統計を正確に返しません (ただしテーブルが使用している物理サイズは正確)。 The row count is only a rough estimate used in SQL optimisation.
フィールドの先頭部分にユニークなインデックスを作成しようとすると、エラーになります：
```
CREATE TABLE T (A CHAR(20), B INT, UNIQUE (A(5))) TYPE = InnoDB;
```
もし、フィールドの先頭部分に、非ユニークインデックスを作成しようとすると、 InnoDB はそのフィールドの全体にインデックスを張ります。
INSERT DELAYED は InnoDB テーブルではサポートされていません。
MySQL LOCK TABLES オペレーションは、 InnoDB のロー・レベル・ロック (row level lock) を感知しません。これは、たとえ他のユーザーによるトランザクション(row lovel lock をしている)があるテーブルに存在していたとしても、あなたが同じテーブルに対して、テーブル・ロックをかけれるということです。 Thus your operations on the table may have to wait if they collide with these locks of other users. Also a deadlock is possible. However, this does not endanger transaction integrity, because the row level locks set by InnoDB will always take care of the integrity. Also, a table lock prevents other transactions from acquiring more row level locks (in a conflicting lock mode) on the table.
BLOB や TEXT フィールドにインデックスを張ることができません。
1つのテーブルには 1000個以上のフィールドを持つことができません。
DELETE FROM TABLE does not regenerate the table but instead deletes all rows, one by one, which is not that fast. In future versions of MySQL you can use TRUNCATE which is fast.
InnoDB のデフォルトのデータベースのページサイズは 16kB です。コンパイルし直すことで、これは 8kB から　64kB まで変更可能です。 The maximun row length is slightly less than half of a database page in versions <= 3.23.40 of InnoDB. バージョン 3.23.41 からは、 BLOB と TEXT フィールドは 4GB 未満まで許されます。 1 レコードの最大容量も 4GB 未満までです。 InnoDB does not store fields whose size is <= 128 bytes on separate pages. After InnoDB has modified the row by storing long fields on separate pages, the remaining length of the row must be less than half a database page. The maximun key length is 7000 bytes.
いくつかのオペレーティングシステムでは、データファイルは < 2GB となります。 The combined size of log files must be < 4 GB.
テーブル空間のサイズの最大値は 40億データベースページです。これは 1つのテーブルの最大のサイズでもあります。テーブル空間の最低限のサイズは 10MB です。

7.5.14 InnoDB Change History

7.5.14.1 MySQL/InnoDB-4.1.0, April x, 2003

InnoDB now supports up to 64 GB of buffer pool memory in a Windows 32-bit Intel computer. This is possible because InnoDB can use the AWE extension of Windows to address memory over the 4 GB limit of a 32-bit process. A new startup variable innodb_buffer_pool_awe_mem_mb enables AWE and sets the size of the buffer pool in megabytes.
Reduced the size of buffer headers and the lock table. InnoDB uses 2 % less memory.

7.5.14.2 MySQL/InnoDB-3.23.56, March xx, 2003

Fixed a major bug in InnoDB query optimization: queries of type SELECT ... WHERE indexcolumn < x and SELECT ... WHERE indexcolumn > x could cause a table scan even if the selectivity would have been very good.

7.5.14.3 MySQL/InnoDB-4.0.12, March xx, 2003

In crash recovery InnoDB now prints the progress in percents of a transaction rollback.

7.5.14.4 MySQL/InnoDB-4.0.11, February 25, 2003

Fixed a bug introduced in 4.0.10: SELECT ... FROM ... ORDER BY ... DESC could hang in an infinite loop.
An outstanding bug: SET FOREIGN_KEY_CHECKS=0 is not replicated properly in the MySQL replication.

7.5.14.5 MySQL/InnoDB-4.0.10, February 4, 2003

In INSERT INTO t1 SELECT ... FROM t2 WHERE ... MySQL previously set a table level read lock on t2. This lock is now removed.
Increased SHOW INNODB STATUS max printed length to 200 kB.
Fixed a major bug in InnoDB query optimization: queries of type SELECT ... WHERE indexcolumn < x and SELECT ... WHERE indexcolumn > x could cause a table scan even if the selectivity would have been very good.
Fixed a bug: purge could cause a hang in a BLOB table where the primary key index tree was of height 1. Symptom: semaphore waits caused by an X-latch set in btr_free_externally_stored_field().
Fixed a bug: using InnoDB HANDLER commands on a fresh handle crashed mysqld in ha_innobase::change_active_index().
Fixed a bug: if MySQL estimated a query in the middle of a SELECT statement, InnoDB could hang on the adaptive hash index latch in btr0sea.c.
Fixed a bug: InnoDB could report table corruption and assert in page_dir_find_owner_slot() if an adaptive hash index search coincided with purge or an insert.
Fixed a bug: some file system snapshot tool in Windows 2000 could cause an InnoDB file write to fail with error 33 ERROR_LOCK_VIOLATION. In synchronous writes InnoDB now retries the write 100 times at 1 second intervals.
Fixed a bug: REPLACE INTO t1 SELECT ... did not work if t1 has an auto-inc column.
An outstanding bug: SET FOREIGN_KEY_CHECKS=0 is not replicated properly in the MySQL replication.

7.5.14.6 MySQL/InnoDB-3.23.55, January 24, 2003

In INSERT INTO t1 SELECT ... FROM t2 WHERE ... MySQL previously set a table level read lock on t2. This lock is now removed.
Fixed a bug: if the combined size of InnoDB log files was >= 2 GB in a 32-bit computer, InnoDB would write log in a wrong position. That could make crash recovery and InnoDB Hot Backup to fail in log scan.
Fixed a bug: index cursor restoration could theoretically fail.
Fixed a bug: an assertion in btr0sea.c, in function btr_search_info_update_slow could theoretically fail in a race of 3 threads.
Fixed a bug: purge could cause a hang in a BLOB table where the primary key index tree was of height 1. Symptom: semaphore waits caused by an X-latch set in btr_free_externally_stored_field().
Fixed a bug: if MySQL estimated a query in the middle of a SELECT statement, InnoDB could hang on the adaptive hash index latch in btr0sea.c.
Fixed a bug: InnoDB could report table corruption and assert in page_dir_find_owner_slot() if an adaptive hash index search coincided with purge or an insert.
Fixed a bug: some file system snapshot tool in Windows 2000 could cause an InnoDB file write to fail with error 33 ERROR_LOCK_VIOLATION. In synchronous writes InnoDB now retries the write 100 times at 1 second intervals.
An outstanding bug: SET FOREIGN_KEY_CHECKS=0 is not replicated properly in the MySQL replication. The fix will appear in 4.0.11 and will probably not be backported to 3.23.
Fixed bug in InnoDB page0cur.c in function page_cur_search_with_match which caused InnoDB to remain on the same page forever. This bug is evident only in tables with more than one page.

7.5.14.7 MySQL/InnoDB-4.0.9, January 14, 2003

Removed the warning message: 'InnoDB: Out of memory in additional memory pool.'
Fixed a bug: if the combined size of InnoDB log files was >= 2 GB in a 32-bit computer, InnoDB would write log in a wrong position. That could make crash recovery and InnoDB Hot Backup to fail.
Fixed a bug: index cursor restoration could theoretically fail.

7.5.14.8 MySQL/InnoDB-4.0.8, January 7, 2003

InnoDB now supports also FOREIGN KEY (...) REFERENCES ...(...) [ON UPDATE CASCADE | ON UPDATE SET NULL | ON UPDATE RESTRICT | ON UPDATE NO ACTION].
Tables and indexes now reserve 4 % less space in the tablespace. Also existing tables reserve less space. By upgrading to 4.0.8 you will see more free space in "InnoDB free" in SHOW TABLE STATUS.
Fixed bugs: updating the PRIMARY KEY of a row would generate a foreign key error on all FOREIGN KEYs which referenced secondary keys of the row to be updated. Also, if a referencing FOREIGN KEY constraint only referenced the first columns in an index, and there were more columns in that index, updating the additional columns generated a foreign key error.
Fixed a bug: if an index contains some column twice, and that column is updated, the table will become corrupt. From now on InnoDB prevents creation of such indexes.
Fixed a bug: removed superfluous error 149 and 150 printouts from the .err log when a locking SELECT caused a deadlock or a lock wait timeout.
Fixed a bug: an assertion in btr0sea.c, in function btr_search_info_update_slow could theoretically fail in a race of 3 threads.
Fixed a bug: one could not switch a session transaction isolation level back to REPEATABLE READ after setting it to something else.

7.5.14.9 MySQL/InnoDB-4.0.7, December 26, 2002

InnoDB in 4.0.7 is essentially the same as in 4.0.6.

7.5.14.10 MySQL/InnoDB-4.0.6, December 19, 2002

Since innodb_log_arch_dir has no relevance under MySQL, there is no need to specify it any more in my.cnf.
LOAD DATA INFILE in AUTOCOMMIT=1 mode no longer does implicit commits for each 1 MB of written binlog.
Fixed a bug introduced in 4.0.4: LOCK TABLES ... READ LOCAL should not set row locks on the rows read. This caused deadlocks and lock wait timeouts in mysqldump.
Fixed two bugs introduced in 4.0.4: in AUTO_INCREMENT, REPLACE could cause the counter to be left 1 too low. A deadlock or a lock wait timeout could cause the same problem.
Fixed a bug: TRUNCATE on a TEMPORARY table crashed InnoDB.
Fixed a bug introduced in 4.0.5: if binlogging was not switched on, INSERT INTO ... SELECT ... or CREATE TABLE ... SELECT ... could cause InnoDB to hang on a semaphore created in btr0sea.c, line 128. Workaround: switch binlogging on.
Fixed a bug: in replication issuing SLAVE STOP in the middle of a multi-statement transaction could cause that SLAVE START would only perform a part of the transaction. A similar error could occur if the slave crashed and was restarted.

7.5.14.11 MySQL/InnoDB-3.23.54, December 12, 2002

Fixed a bug: the InnoDB range estimator greatly exaggerated the size of a short index range if the paths to the endpoints of the range in the index tree happened to branch already in the root. This could cause unnecessary table scans in SQL queries.
Fixed a bug: ORDER BY could fail if you had not created a primary key to a table, but had defined several indexes of which at least one was a UNIQUE index with all its columns declared as NOT NULL.
Fixed a bug: a lock wait timeout in connection with ON DELETE CASCADE could cause corruption in indexes.
Fixed a bug: if a SELECT was done with a unique key from a primary index, and the search matched to a delete-marked record, InnoDB could erroneously return the NEXT record.
Fixed a bug introduced in 3.23.53: LOCK TABLES ... READ LOCAL should not set row locks on the rows read. This caused deadlocks and lock wait timeouts in mysqldump.
Fixed a bug: if an index contains some column twice, and that column is updated, the table will become corrupt. From now on InnoDB prevents creation of such indexes.

7.5.14.12 MySQL/InnoDB-4.0.5, November 18, 2002

InnoDB now supports also transaction isolation levels READ COMMITTED and READ UNCOMMITTED. READ COMMITTED more closely emulates Oracle and makes porting of applications from Oracle to MySQL easier.
Deadlock resolution is now selective: we try to pick as victims transactions with less modified or inserted rows.
FOREIGN KEY definitions are now aware of the lower_case_table_names setting in my.cnf.
SHOW CREATE TABLE does not output the database name to a FOREIGN KEY definition if the referred table is in the same database as the table.
InnoDB does a consistency check to most index pages before writing them to a data file.
If you set innodb_force_recovery > 0, InnoDB tries to jump over corrupt index records and pages when doing SELECT * FROM table. This helps in dumping.
InnoDB now again uses asynchronous unbuffered i/o in Windows 2000 and XP; only unbuffered simulated async i/o in NT, 95/98/ME.
Fixed a bug: the InnoDB range estimator greatly exaggerated the size of a short index range if the paths to the endpoints of the range in the index tree happened to branch already in the root. This could cause unnecessary table scans in SQL queries. The fix will also be backported to 3.23.54.
Fixed a bug present in 3.23.52, 4.0.3, 4.0.4: InnoDB startup could take very long or even crash on some Win 95/98/ME computers.
Fixed a bug: the AUTO-INC lock was held to the end of the transaction if it was granted after a lock wait. This could cause unnecessary deadlocks.
Fixed a bug: if SHOW INNODB STATUS, innodb_monitor, or innodb_lock_monitor had to print several hundred transactions in one report, and the output became truncated, InnoDB would hang, printing to the error log many waits for a mutex created at srv0srv.c, line 1621.
Fixed a bug: SHOW INNODB STATUS on Unix always reported average file read size as 0 bytes.
Fixed a potential bug in 4.0.4: InnoDB now does ORDER BY ... DESC like MyISAM.
Fixed a bug: DROP TABLE could cause crash or a hang if there was a rollback concurrently running on the table. The fix will only be backported to 3.23 if this appears a real problem for users.
Fixed a bug: ORDER BY could fail if you had not created a primary key to a table, but had defined several indexes of which at least one was a UNIQUE index with all its columns declared as NOT NULL.
Fixed a bug: a lock wait timeout in connection with ON DELETE CASCADE could cause corruption in indexes.
Fixed a bug: if a SELECT was done with a unique key from a primary index, and the search matched to a delete-marked record, InnoDB could return the NEXT record.
Outstanding bugs: in 4.0.4 two bugs were introduced to AUTO_INCREMENT. REPLACE can cause the counter to be left 1 too low. A deadlock or a lock wait timeout can cause the same problem. These will be fixed in 4.0.6.

7.5.14.13 MySQL/InnoDB-3.23.53, October 9, 2002

We again use unbuffered disk i/o to data files in Windows. Win XP and Win 2000 read performance seems to be very poor with normal i/o.
Tuned range estimator so that index range scans are preferred over full index scans.
Allow dropping and creating a table even if innodb_force_recovery is set. One can use this to drop a table which would cause a crash in rollback or purge, or if a failed table import causes a runaway rollback in recovery.
Fixed a bug present in 3.23.52, 4.0.3, 4.0.4: InnoDB startup could take very long or even crash on some Win 95/98/ME computers.
Fixed a bug: fast shutdown (which is the default) sometimes was slowed down by purge and insert buffer merge.
Fixed a bug: doing a big SELECT from a table where no rows were visible in a consistent read could cause a very long (> 600 seconds) semaphore wait in btr0cur.c line 310.
Fixed a bug: the AUTO-INC lock was held to the end of the transaction if it was granted after a lock wait. This could cause unnecessary deadlocks.
Fixed a bug: if you created a temporary table inside LOCK TABLES, and used that temporary table, that caused an assertion failure in ha_innobase.cc.
Fixed a bug: if SHOW INNODB STATUS, innodb_monitor, or innodb_lock_monitor had to print several hundred transactions in one report, and the output became truncated, InnoDB would hang, printing to the error log many waits for a mutex created at srv0srv.c, line 1621.
Fixed a bug: SHOW INNODB STATUS on Unix always reported average file read size as 0 bytes.

7.5.14.14 MySQL/InnoDB-4.0.4, October 2, 2002

We again use unbuffered disk i/o in Windows. Win XP and Win 2000 read performance seems to be very poor with normal i/o.
Increased the max key length of InnoDB tables from 500 to 1024 bytes.
Increased the table comment field in SHOW TABLE STATUS so that up to 16000 characters of foreign key definitions can be printed there.
The auto-increment counter is no longer incremented if an insert of a row immediately fails in an error.
Allow dropping and creating a table even if innodb_force_recovery is set. One can use this to drop a table which would cause a crash in rollback or purge, or if a failed table import causes a runaway rollback in recovery.
Fixed a bug: Using ORDER BY primarykey DESC in 4.0.3 causes an assertion failure in btr0pcur.c, line 203.
Fixed a bug: fast shutdown (which is the default) sometimes was slowed down by purge and insert buffer merge.
Fixed a bug: doing a big SELECT from a table where no rows were visible in a consistent read could cause a very long (> 600 seconds) semaphore wait in btr0cur.c line 310.
Fixed a bug: if the MySQL query cache was used, it did not get invalidated by a modification done by ON DELETE CASCADE or ...SET NULL.
Fixed a bug: if you created a temporary table inside LOCK TABLES, and used that temporary table, that caused an assertion failure in ha_innodb.cc.
Fixed a bug: if you set innodb_flush_log_at_trx_commit to 1, SHOW VARIABLES would show its value as 16 million.

7.5.14.15 MySQL/InnoDB-4.0.3, August 28, 2002

Removed unnecessary deadlocks when inserts have to wait for a locking read, update, or delete to release its next-key lock.
The MySQL HANDLER SQL commands now work also for InnoDB type tables. InnoDB does the HANDLER reads always as consistent reads. HANDLER is a direct access path to read individual indexes of tables. In some cases HANDLER can be used as a substitute of server-side cursors.
Fixed a bug in 4.0.2: even a simple insert could crash the AIX version.
Fixed a bug: if you used in a table name characters whose code is > 127, in DROP TABLE InnoDB could assert on line 155 of pars0sym.c.
Compilation from source now provides a working version both on HP-UX-11 and HP-UX-10.20. The source of 4.0.2 worked only on 11, and the source of 3.23.52 only on 10.20.
Fixed a bug: if compiled on 64-bit Solaris, InnoDB produced a bus error at startup.

7.5.14.16 MySQL/InnoDB-3.23.52, August 16, 2002

The feature set of 3.23 will be frozen from this version on. New features will go the 4.0 branch, and only bug fixes will be made to the 3.23 branch.
Many CPU-bound join queries now run faster. On Windows also many other CPU-bound queries run faster.
A new SQL command SHOW INNODB STATUS returns the output of the InnoDB Monitor to the client. The InnoDB Monitor now prints detailed info on the latest detected deadlock.
InnoDB made the SQL query optimizer to avoid too much index-only range scans and choose full table scans instead. This is now fixed.
"BEGIN" and "COMMIT" are now added in the binlog around transactions. The MySQL replication now respects transaction borders: a user will no longer see half transactions in replication slaves.
A replication slave now prints in crash recovery the last master binlog position it was able to recover to.
A new setting innodb_flush_log_at_trx_commit=2 makes InnoDB to write the log to the operating system file cache at each commit. This is almost as fast as the setting innodb_flush_log_at_trx_commit=0, and the setting 2 also has the nice feature that in a crash where the operating system does not crash, no committed transaction is lost. If the operating system crashes or there is a power outage, then the setting 2 is no safer than the setting 0.
Added checksum fields to log blocks.
SET FOREIGN_KEY_CHECKS=0 helps in importing tables in an arbitrary order which does not respect the foreign key rules.
SET UNIQUE_CHECKS=0 speeds up table imports into InnoDB if you have UNIQUE constraints on secondary indexes.
SHOW TABLE STATUS now lists also possible ON DELETE CASCADE or ON DELETE SET NULL in the comment field of the table.
When CHECK TABLE is run on any InnoDB type table, it now checks also the adaptive hash index for all tables.
If you defined ON DELETE CASCADE or SET NULL and updated the referenced key in the parent row, InnoDB deleted or updated the child row. This is now changed to conform to SQL-92: you get the error 'Cannot delete parent row'.
Improved the auto-increment algorithm: now the first insert or SHOW TABLE STATUS initializes the auto-inc counter for the table. This removes almost all surprising deadlocks caused by SHOW TABLE STATUS.
Aligned some buffers used in reading and writing to data files. This allows using unbuffered raw devices as data files in Linux.
Fixed a bug: If you updated the primary key of a table so that only the case of characters changed, that could cause assertion failures, mostly in page0page.ic line 515.
Fixed a bug: If you delete or update a row referenced in a foreign key constraint and the foreign key check has to wait for a lock, then the check may report an erroneous result. This affects also the ON DELETE... operation.
Fixed a bug: A deadlock or a lock wait timeout error in InnoDB causes InnoDB to roll back the whole transaction, but MySQL could still write the earlier SQL statements to the binlog, even though InnoDB rolled them back. This could, for example, cause replicated databases to get out-of-sync.
Fixed a bug: If the database happened to crash in the middle of a commit, then the recovery might leak tablespace pages.
Fixed a bug: If you specified a non-latin1 character set in my.cnf, then, in contrary to what is stated in the manual, in a foreign key constraint a string type column had to have the same length specification in the referencing table and the referenced table.
Fixed a bug: DROP TABLE or DROP DATABASE could fail if there simultaneously was a CREATE TABLE running.
Fixed a bug: If you configured the buffer pool bigger than 2 GB in a 32-bit computer, InnoDB would assert in buf0buf.ic line 214.
Fixed a bug: on 64-bit computers updating rows which contained the SQL NULL in some column could cause the undo log and the ordinary log to become corrupt.
Fixed a bug: innodb_log_monitor caused a hang if it suppressed lock prints for a page.
Fixed a bug: in the HP-UX-10.20 version mutexes would leak and cause race conditions and crashes in any part of InnoDB code.
Fixed a bug: if you ran in the AUTOCOMMIT mode, executed a SELECT, and immediately after that a RENAME TABLE, then RENAME would fail and MySQL would complain about error 1192.
Fixed a bug: if compiled on 64-bit Solaris, InnoDB produced a bus error at startup.

7.5.14.17 MySQL/InnoDB-4.0.2, July 10, 2002

InnoDB is essentially the same as InnoDB-3.23.51.
If no innodb_data_file_path is specified, InnoDB at the database creation now creates a 10 MB auto-extending data file ibdata1 to the datadir of MySQL. In 4.0.1 the file was 64 MB and not auto-extending.

7.5.14.18 MySQL/InnoDB-3.23.51, June 12, 2002

Fixed a bug: a join could result in a seg fault in copying of a BLOB or TEXT column if some of the BLOB or TEXT columns in the table contained SQL NULL values.
Fixed a bug: if you added self-referential foreign key constraints with ON DELETE CASCADE to tables and a row deletion caused InnoDB to attempt the deletion of the same row twice because of a cascading delete, then you got an assertion failure.
Fixed a bug: if you use MySQL 'user level locks' and close a connection, then InnoDB may assert in ha_innobase.cc, line 302.

7.5.14.19 MySQL/InnoDB-3.23.50, April 23, 2002

InnoDB now supports an auto-extending last data file. You do not need to preallocate the whole data file at the database startup.
Made several changes to facilitate the use of the InnoDB Hot Backup tool. It is a separate non-free tool you can use to take online backups of your database without shutting down the server or setting any locks.
If you want to run the InnoDB Hot Backup tool on an auto-extending data file you have to upgrade it to version ibbackup-0.35.
The log scan phase in crash recovery will now run much faster.
Starting from this server version, the hot backup tool truncates unused ends in the backup InnoDB data files.
To allow the hot backup tool to work, on Windows we no longer use unbuffered i/o or native async i/o; instead we use the same simulated async i/o as on Unix.
You can now define the ON DELETE CASCADE or ON DELETE SET NULL clause on foreign keys.
FOREIGN KEY constraints now survive ALTER TABLE and CREATE INDEX.
We suppress the FOREIGN KEY check if any of the column values in the foreign key or referenced key to be checked is the SQL NULL. This is compatible with Oracle, for example.
SHOW CREATE TABLE now lists also foreign key constraints. Also mysqldump no longer forgets about foreign keys in table definitions.
You can now add a new foreign key constraint with ALTER TABLE ... ADD CONSTRAINT FOREIGN KEY (...) REFERENCES ... (...).
FOREIGN KEY definitions now allow backquotes around table and column names.
MySQL command SET TRANSACTION ISOLATION LEVEL ... has now the following effect on InnoDB tables: if a transaction is defined as SERIALIZABLE then InnoDB conceptually adds LOCK IN SHARE MODE to all consistent reads. If a transaction is defined to have any other isolation level, then InnoDB obeys its default locking strategy which is REPEATABLE READ.
SHOW TABLE STATUS no longer sets an x-lock at the end of an auto-increment index if the auto-increment counter has already been initialized. This removes in almost all cases the surprising deadlocks caused by SHOW TABLE STATUS.
Fixed a bug: in a CREATE TABLE statement the string 'foreign' followed by a non-space character confused the FOREIGN KEY parser and caused table creation to fail with errno 150.

7.5.14.20 MySQL/InnoDB-3.23.49, February 17, 2002

Fixed a bug: if you called DROP DATABASE for a database on which there simultaneously were running queries, the MySQL server could crash or hang. Crashes fixed, but a full fix has to wait some changes in the MySQL layer of code.
Fixed a bug: on Windows one had to put the database name in lower case for DROP DATABASE to work. Fixed in 3.23.49: case no longer matters on Windows. On Unix the database name remains case-sensitive.
Fixed a bug: if one defined a non-latin1 character set as the default character set, then definition of foreign key constraints could fail in an assertion failure in dict0crea.c, reporting an internal error 17.

7.5.14.21 MySQL/InnoDB-3.23.48, February 9, 2002

Tuned the SQL optimizer to favor more often index searches over table scans.
Fixed a performance problem when several large SELECT queries are run concurrently on a multiprocessor Linux computer. Large CPU-bound SELECT queries will now also generally run faster on all platforms.
If MySQL binlogging is used, InnoDB now prints after crash recovery the latest MySQL binlog file name and the position in that file (= byte offset) InnoDB was able to recover to. This is useful, for example, when resynchronizing a master and a slave database in replication.
Added better error messages to help in installation problems.
One can now recover also MySQL temporary tables which have become orphaned inside the InnoDB tablespace.
InnoDB now prevents a FOREIGN KEY declaration where the signedness is not the same in the referencing and referenced integer columns.
Fixed a bug: calling SHOW CREATE TABLE or SHOW TABLE STATUS could cause memory corruption and make mysqld to crash. Especially at risk was mysqldump, because it calls frequently SHOW CREATE TABLE.
Fixed a bug: if on Unix you did an ALTER TABLE to an InnoDB table and simultaneously did queries to it, mysqld could crash with an assertion failure in row0row.c, line 474.
Fixed a bug: if inserts to several tables containing an auto-inc column were wrapped inside one LOCK TABLES, InnoDB asserted in lock0lock.c.
In 3.23.47 we allowed several NULLS in a UNIQUE secondary index. But CHECK TABLE was not relaxed: it reports the table as corrupt. CHECK TABLE no longer complains in this situation.
Fixed a bug: on Sparc and other high-endian processors SHOW VARIABLES showed innodb_flush_log_at_trx_commit and other boolean-valued startup parameters always OFF even if they were switched on.
Fixed a bug: if you ran mysqld-max-nt as a service on Windows NT/2000, the service shutdown did not always wait long enough for the InnoDB shutdown to finish.

7.5.14.22 MySQL/InnoDB-3.23.47, December 28, 2001

Recovery happens now faster, especially in a lightly loaded system, because background checkpointing has been made more frequent.
InnoDB allows now several similar key values in a UNIQUE secondary index if those values contain SQL NULLs. Thus the convention is now the same as in MyISAM tables.
InnoDB gives a better row count estimate for a table which contains BLOBs.
In a FOREIGN KEY constraint InnoDB is now case-insensitive to column names, and in Windows also to table names.
InnoDB allows a FOREIGN KEY column of CHAR type to refer to a column of VARCHAR type, and vice versa. MySQL silently changes the type of some columns between CHAR and VARCHAR, and these silent changes do not hinder FOREIGN KEY declaration any more.
Recovery has been made more resilient to corruption of log files.
Unnecessary statistics calculation has been removed from queries which generate a temporary table. Some ORDER BY and DISTINCT queries will now run much faster.
MySQL now knows that the table scan of an InnoDB table is done through the primary key. This will save a sort in some ORDER BY queries.
The maximum key length of InnoDB tables is again restricted to 500 bytes. The MySQL interpreter is not able to handle longer keys.
The default value of innodb_lock_wait_timeout was changed from infinite to 50 seconds, the default value of innodb_file_io_threads from 9 to 4.

7.5.14.23 MySQL/InnoDB-4.0.1, December 23, 2001

InnoDB is the same as in 3.23.47.
In 4.0.0 the MySQL interpreter did not know the syntax LOCK IN SHARE MODE. This has been fixed.
In 4.0.0 multi-table delete did not work for transactional tables. This has been fixed.

7.5.14.24 MySQL/InnoDB-3.23.46, November 30, 2001

This is the same as 3.23.45.

7.5.14.25 MySQL/InnoDB-3.23.45, November 23, 2001

This is a bugfix release.
In versions 3.23.42-.44 when creating a table on Windows you have to use lower case letters in the database name to be able to access the table. Fixed in 3.23.45.
InnoDB now flushes stdout and stderr every 10 seconds: if these are redirected to files, the file contents can be better viewed with an editor.
Fixed an assertion failure in .44, in trx0trx.c, line 178 when you drop a table which has the .frm file but does not exist inside InnoDB.
Fixed a bug in the insert buffer. The insert buffer tree could get into an inconsistent state, causing a crash, and also crashing the recovery. This bug could appear especially in large table imports or alterations.
Fixed a bug in recovery: InnoDB could go into an infinite loop constantly printing a warning message that it cannot find free blocks from the buffer pool.
Fixed a bug: when you created a temporary table of the InnoDB type, and then used ALTER TABLE to it, the MySQL server could crash.
Prevented creation of MySQL system tables 'mysql.user', 'mysql.host', or 'mysql.db', in the InnoDB type.
Fixed a bug which can cause an assertion failure in 3.23.44 in srv0srv.c, line 1728.

7.5.14.26 MySQL/InnoDB-3.23.44, November 2, 2001

You can define foreign key constraints on InnoDB tables. An example: FOREIGN KEY (col1) REFERENCES table2(col2).
You can create > 4 GB data files in those file systems that allow it.
Improved InnoDB monitors, including a new innodb_table_monitor which allows you to print the contents of the InnoDB internal data dictionary.
DROP DATABASE will now work also for InnoDB tables.
Accent characters in the default character set latin1 will be ordered according to the MySQL ordering.<br> NOTE: if you are using latin1 and have inserted characters whose code is > 127 to an indexed CHAR column, you should run CHECK TABLE on your table when you upgrade to 3.23.43, and drop and reimport the table if CHECK TABLE reports an error!
InnoDB will calculate better table cardinality estimates.
Change in deadlock resolution: in .43 a deadlock rolls back only the SQL statement, in .44 it will roll back the whole transaction.
Deadlock, lock wait timeout, and foreign key constraint violations (no parent row, child rows exist) now return native MySQL error codes 1213, 1205, 1216, 1217, respectively.
A new my.cnf parameter innodb_thread_concurrency helps in performance tuning in high concurrency environments.
A new my.cnf option innodb_force_recovery will help you in dumping tables from a corrupted database.
A new my.cnf option innodb_fast_shutdown will speed up shutdown. Normally InnoDB does a full purge and an insert buffer merge at shutdown.
Raised maximum key length to 7000 bytes from a previous limit of 500 bytes.
Fixed a bug in replication of auto-inc columns with multiline inserts.
Fixed a bug when the case of letters changes in an update of an indexed secondary column.
Fixed a hang when there are > 24 data files.
Fixed a crash when MAX(col) is selected from an empty table, and col is a not the first column in a multi-column index.
Fixed a bug in purge which could cause crashes.

7.5.14.27 MySQL/InnoDB-3.23.43, October 4, 2001

This is essentially the same as InnoDB-3.23.42.

7.5.14.28 MySQL/InnoDB-3.23.42, September 9, 2001

Fixed a bug which corrupted the table if the primary key of a > 8000-byte row was updated.
There are now 3 types of InnoDB Monitors: innodb_monitor, innodb_lock_monitor, and innodb_tablespace_monitor. innodb_monitor now prints also buffer pool hit rate and the total number of rows inserted, updated, deleted, read.
Fixed a bug in RENAME TABLE.
Fixed a bug in replication with an auto-increment column.

7.5.14.29 MySQL/InnoDB-3.23.41, August 13, 2001

Support for < 4 GB rows. The previous limit was 8000 bytes.
Use the doublewrite file flush method.
Raw disk partitions supported as data files.
InnoDB Monitor.
Several hang bugs fixed and an ORDER BY bug ('Sort aborted') fixed.

7.5.14.30 MySQL/InnoDB-3.23.40, July 16, 2001

Only a few rare bugs fixed.

7.5.14.31 MySQL/InnoDB-3.23.39, June 13, 2001

CHECK TABLE now works for InnoDB tables.
A new my.cnf parameter innodb_unix_file_flush_method introduced. It can be used to tune disk write performance.
An auto-increment column now gets new values past the transaction mechanism. This saves CPU time and eliminates transaction deadlocks in new value assignment.
Several bug fixes, most notably the rollback bug in 3.23.38.

7.5.14.32 MySQL/InnoDB-3.23.38, May 12, 2001

The new syntax SELECT ... LOCK IN SHARE MODE is introduced.
InnoDB now calls fsync after every disk write and calculates a checksum for every database page it writes or reads, which will reveal disk defects.
Several bug fixes.

7.5.15 InnoDB Contact Information

Contact information of Innobase Oy, producer of the InnoDB engine. Web site: http://www.innodb.com/. E-mail: Heikki.Tuuri@innodb.com

phone: 358-9-6969 3250 (office) 358-40-5617367 (mobile)
Innobase Oy Inc.
World Trade Center Helsinki
Aleksanterinkatu 17
P.O.Box 800
00101 Helsinki
Finland

7.6 `BDB` or `BerkeleyDB` Tables

7.6.1 Overview of `BDB` Tables

BerkeleyDB, available at http://www.sleepycat.com/ has provided MySQL with a transactional storage engine. Support for this storage engine is included in the MySQL source distribution starting from version 3.23.34 and is activated in the MySQL-Max binary. This storage engine is typically called BDB for short.

BDB tables may have a greater chance of surviving crashes and are also capable of COMMIT and ROLLBACK operations on transactions. The MySQL source distribution comes with a BDB distribution that has a couple of small patches to make it work more smoothly with MySQL. You can't use a non-patched BDB version with MySQL.

We at MySQL AB are working in close cooperation with Sleepycat to keep the quality of the MySQL/BDB interface high.

When it comes to supporting BDB tables, we are committed to help our users to locate the problem and help creating a reproducible test case for any problems involving BDB tables. Any such test case will be forwarded to Sleepycat who in turn will help us find and fix the problem. As this is a two-stage operation, any problems with BDB tables may take a little longer for us to fix than for other storage engines. However, as the BerkeleyDB code itself has been used by many other applications than MySQL, we don't envision any big problems with this. 「1.4.1 Support Offered by MySQL AB」節参照.

7.6.2 Installing `BDB`

If you have downloaded a binary version of MySQL that includes support for BerkeleyDB, simply follow the instructions for installing a binary version of MySQL. 「2.2.10 Installing a MySQL Binary Distribution」節参照. 「4.7.5 mysqld-max, An Extended mysqld Server」節参照.

To compile MySQL with Berkeley DB support, download MySQL Version 3.23.34 or newer and configure MySQL with the --with-berkeley-db option. 「2.3 MySQL ソースディストリビューションのインストール」節参照.

cd /path/to/source/of/mysql-3.23.34
./configure --with-berkeley-db

Please refer to the manual provided with the BDB distribution for more updated information.

Even though Berkeley DB is in itself very tested and reliable, the MySQL interface is still considered beta quality. We are actively improving and optimising it to get it stable very soon.

7.6.3 `BDB` startup options

If you are running with AUTOCOMMIT=0 then your changes in BDB tables will not be updated until you execute COMMIT. Instead of commit you can execute ROLLBACK to forget your changes. 「6.7.1 BEGIN/COMMIT/ROLLBACK 構文」節参照.

If you are running with AUTOCOMMIT=1 (the default), your changes will be committed immediately. You can start an extended transaction with the BEGIN WORK SQL command, after which your changes will not be committed until you execute COMMIT (or decide to ROLLBACK the changes).

The following options to mysqld can be used to change the behaviour of BDB tables:

Option	Description
`--bdb-home=directory`	Base directory for `BDB` tables. This should be the same directory you use for `--datadir`.
`--bdb-lock-detect=#`	Berkeley lock detect. One of (`DEFAULT`, `OLDEST`, `RANDOM`, or `YOUNGEST`).
`--bdb-logdir=directory`	Berkeley DB log file directory.
`--bdb-no-sync`	Don't synchronously flush logs.
`--bdb-no-recover`	Don't start Berkeley DB in recover mode.
`--bdb-shared-data`	Start Berkeley DB in multi-process mode (Don't use `DB_PRIVATE` when initialising Berkeley DB)
`--bdb-tmpdir=directory`	Berkeley DB temporary file directory.
`--skip-bdb`	Disable usage of `BDB` tables.
`-O bdb_max_lock=1000`	Set the maximum number of locks possible. 「4.5.6.4 `SHOW VARIABLES`」節参照.

If you use --skip-bdb, MySQL will not initialise the Berkeley DB library and this will save a lot of memory. Of course, you cannot use BDB tables if you are using this option. If you try to create a BDB table, MySQL will instead create a MyISAM table.

Normally you should start mysqld without --bdb-no-recover if you intend to use BDB tables. This may, however, give you problems when you try to start mysqld if the BDB log files are corrupted. 「2.4.2 MySQL サーバー起動時の問題」節参照.

With bdb_max_lock you can specify the maximum number of locks (10000 by default) you can have active on a BDB table. You should increase this if you get errors of type bdb: Lock table is out of available locks or Got error 12 from ... when you have do long transactions or when mysqld has to examine a lot of rows to calculate the query.

You may also want to change binlog_cache_size and max_binlog_cache_size if you are using big multi-line transactions. 「6.7.1 BEGIN/COMMIT/ROLLBACK 構文」節参照.

7.6.4 Characteristics of `BDB` tables:

To be able to rollback transactions, the BDB storage engine maintains log files. For maximum performance you should place these on another disk than your databases by using the --bdb-logdir option.
MySQL performs a checkpoint each time a new BDB log file is started, and removes any log files that are not needed for current transactions. One can also run FLUSH LOGS at any time to checkpoint the Berkeley DB tables. For disaster recovery, one should use table backups plus MySQL's binary log. 「4.4.1 データベースのバックアップ」節参照. Warning: If you delete old log files that are in use, BDB will not be able to do recovery at all and you may lose data if something goes wrong.
MySQL requires a PRIMARY KEY in each BDB table to be able to refer to previously read rows. If you don't create one, MySQL will create an maintain a hidden PRIMARY KEY for you. The hidden key has a length of 5 bytes and is incremented for each insert attempt.
If all columns you access in a BDB table are part of the same index or part of the primary key, then MySQL can execute the query without having to access the actual row. In a MyISAM table the above holds only if the columns are part of the same index.
The PRIMARY KEY will be faster than any other key, as the PRIMARY KEY is stored together with the row data. As the other keys are stored as the key data + the PRIMARY KEY, it's important to keep the PRIMARY KEY as short as possible to save disk and get better speed.
LOCK TABLES works on BDB tables as with other tables. If you don't use LOCK TABLE, MySQL will issue an internal multiple-write lock on the table to ensure that the table will be properly locked if another thread issues a table lock.
Internal locking in BDB tables is done on page level.
SELECT COUNT(*) FROM table_name is slow as BDB tables doesn't maintain a count of the number of rows in the table.
Sequential scanning is slower than with MyISAM tables as the data in BDB tables stored in B-trees and not in a separate datafile.
The application must always be prepared to handle cases where any change of a BDB table may make an automatic rollback and any read may fail with a deadlock error.
Keys are not prefix or suffix-compressed like keys in MyISAM tables. In other words, the key information will take a little more space in BDB tables compared to MyISAM tables.
There are often holes in the BDB table to allow you to insert new rows in the middle of the key tree. This makes BDB tables somewhat larger than MyISAM tables.
The optimiser needs to know an approximation of the number of rows in the table. MySQL solves this by counting inserts and maintaining this in a separate segment in each BDB table. If you don't issue a lot of DELETE or ROLLBACK statements, this number should be accurate enough for the MySQL optimiser, but as MySQL only stores the number on close, it may be incorrect if MySQL dies unexpectedly. It should not be fatal even if this number is not 100% correct. One can update the number of rows by executing ANALYZE TABLE or OPTIMIZE TABLE. 「4.5.2 ANALYZE TABLE Syntax」節参照 . 「4.5.1 OPTIMIZE TABLE 構文」節参照.
If you get full disk with a BDB table, you will get an error (probably error 28) and the transaction should roll back. This is in contrast with MyISAM and ISAM tables where mysqld will wait for enough free disk before continuing.

7.6.5 Things we need to fix for `BDB` in the near future:

It's very slow to open many BDB tables at the same time. If you are going to use BDB tables, you should not have a very big table cache (like >256) and you should use --no-auto-rehash with the mysql client. We plan to partly fix this in 4.0.
SHOW TABLE STATUS doesn't yet provide that much information for BDB tables.
Optimise performance.
Change to not use page locks at all when we are scanning tables.

7.6.6 Operating systems supported by `BDB`

Currently we know that the BDB storage engine works with the following operating systems:

Linux 2.x Intel
Solaris SPARC
Caldera (SCO) OpenServer
Caldera (SCO) UnixWare 7.0.1

It doesn't work with the following operating systems:

Linux 2.x Alpha
Max OS X

Note: The above list is not complete; we will update it as we receive more information.

If you build MySQL with support for BDB tables and get the following error in the log file when you start mysqld:

bdb: architecture lacks fast mutexes: applications cannot be threaded
Can't init dtabases

This means that BDB tables are not supported for your architecture. In this case you must rebuild MySQL without BDB table support.

7.6.7 Restrictions on `BDB` Tables

Here follows the restrictions you have when using BDB tables:

BDB tables store in the `.db' file the path to the file as it was created. (This was done to be able to detect locks in a multi-user environment that supports symlinks). The effect of this is that BDB tables are not movable between directories!
When taking backups of BDB tables, you have to either use mysqldump or take a backup of all table_name.db files and the BDB log files. The BDB log files are the files in the base data directory named log.XXXXXXXXXX (ten digits); The BDB storage engine stores unfinished transactions in the log files and requires these logs to be present when mysqld starts.

7.6.8 Errors That May Occur When Using `BDB` Tables

If you get the following error in the hostname.err log when starting mysqld:
```
bdb:  Ignoring log file: .../log.XXXXXXXXXX: unsupported log version #
```
it means that the new BDB version doesn't support the old log file format. In this case you have to delete all BDB logs from your database directory (the files with names that have the format log.XXXXXXXXXX) and restart mysqld. We would also recommend you to do a mysqldump --opt of your old BDB tables, delete the old tables, and restore the dump.
If you are not running in auto-commit mode and delete a table that is referenced in another transaction, you may get the following error messages in your MySQL error log:
```
001119 23:43:56  bdb:  Missing log fileid entry
001119 23:43:56  bdb:  txn_abort: Log undo failed for LSN:
                       1 3644744: Invalid
```
This is not fatal but we don't recommend that you delete tables if you are not in auto-commit mode, until this problem is fixed (the fix is not trivial).

Go to the first, previous, next, last section, table of contents.