[待] mysql 函数

时间函数

名称 介绍
FROM_UNIXTIME -
DATE_FORMAT -

字串

名称 介绍
LENGTH -
SOUNDEX -

数学

名称 介绍
RAND() -
ENCRYPT() -

原文地址 : [待] mysql 函数
本站是作者语雀文档的镜像站, 如对文章有任何疑问请移步语雀进行 提问

my.cnf 参数调优

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
[mysqld]
# bin log
log_bin=mysql-bin

# charset
init-connect='SET NAMES utf8mb4'
character-set-server=utf8mb4

skip-name-resolve
back_log=300

max_connections=1258
max_connect_errors=6000
open_files_limit=65535
table_open_cache=1024
max_allowed_packet=500M
binlog_cache_size=1M
max_heap_table_size=8M
tmp_table_size=128M

参数说明

数据库属于IO密集型的应用程序,主要职责是数据的管理和存储工作。从内存中读取数据的速度远远大于从磁盘上读取数据,所以优化数据库就是尽可能的将磁盘IO转化为内存IO。接下来介绍的是IO相关参数(缓存参数):

skip-name-resolve:

如果启用此参数, mysql日志显示[Warning] IP address ‘xxxx’ could not be resolved: Name or service not known,那是因为mysql默认会反向解析DNS,对于访问者Mysql不会判断是hosts还是ip都会进行dns反向解析,频繁地查询数据库和权限检查,这大大增加了数据库的压力,导致数据库连接缓慢,严重的时候甚至死机,出现“连接数据库时出错”等字样。

back_log :

back_log 指MySQL暂时停止响应新的请求之前,多少个请求可以被存在堆栈中。(如果MySQL的连接达到max_connection后,新来的请求将会被存在堆栈中,以等待某一连接释放资源)

注:back_log超过TCP/IP连接的侦听队列的大小则无效,查看命令:

cat /proc/sys/net/ipv4/tcp_max_syn_backlog

wait_timeout(单位:秒) :

mysql客户端与服务端的连接最长闲置时间,到达后将会被强行关闭。MySQL默认的wait_timeout为8个小时。

interactive_timeout:服务器关闭交互式连接前等待的秒数。

wait_timeout:服务器关闭非交互连接之前等待活动的秒数。(心跳?)

这两个参数必须配合使用。单独设置wait_timeout则无效。

max_connections :

max_connections是指MySQL的最大连接数,如果服务器的并发连接请求量比较大,建议调高此值,以增加并发连接数据量。

MySQL服务器允许的最大连接数量: 16384

max_user_connection:

max_user_connections:是指每个数据库用户的最大连接。

默认值0:表示不受限制

max_used_connections:指MySQL服务启动到此刻,同一时刻并行连接的最大值。查看方式:

show status like “max_used_connections”

thread_concurrency:

thread_concurrency 对MySQL的性能影响很大,错误的设置thread_concurrency的值会导致MySQL不能充分利用CPU。

thread_concurrency应设为CPU核数的2倍。比如,2个双核的CPU,thread_concurrency的值应为2_2_2=8

全局缓存

key_buffer_size :

key_buffer_size是用于索引块的缓冲区大小,调整该值的大小能够更好的处理索引(对所有读和多重写),严格来说它决定了数据库索引的处理速度,尤其是读索引块的速度

检查状态值:Key_read_requests和Key_read的比值,检查该值设置是否合理, Key_read:Key_read_requests应尽可能小

MySQL > show status like “Key_read%”;

+——————-+————-+

| Variable_name     | Value       |

+——————-+————-+

| Key_read_requests | 35072256038 |

| Key_reads         | 593121872   |

+——————-+————-+

一般未命中率在0.1以下比较好

innodb_buffer_pool_size :

缓冲池是数据和索引缓存的地方。

检查状态值:Innodb_buffer_pool_read_requests 和Innodb_buffer_pool_reads 可计算出该值设置的是否合理, (Innodb_buffer_pool_read_requests – Innodb_buffer_pool_reads) / Innodb_buffer_pool_read_requests * 100% 计算缓存命中率,缓存命中率越高越好

mysql> show status like “innodb_buffer_pool_r%”;

+—————————————+————-+

| Variable_name                         | Value       |

+—————————————+————-+

| Innodb_buffer_pool_read_requests      | 10585231877 |

| Innodb_buffer_pool_reads              | 2083477     |

+—————————————+————-+

2 rows in set (0.01 sec)

innodb_additional_mem_pool_size:

设置了InnoDB存储引擎用来存放数据字典信息以及一些内部数据的内存空间大小

查看MySQL的 error 日志,如果发现很多warning警告就需要调大该值

innodb_log_buffer_size:

innodb_log_buffer_size:设置InnoDB将日志写入日志磁盘文件前的缓冲大小。大的日志缓冲允许事务运行时不需要将日志保存到磁盘直到事务被提交(commit)。大的日志缓冲可以减少磁盘I/O,默认值为8M,一般来说不建议超过32M

注:innodb_flush_log_trx_commit参数对innodb log的写入性能有非常关键的影响,默认值为1。

1
2
3
0:log buffer中的数据将以每秒一次的频率写入到log file中,且同时会进行文件系统到磁盘的同步操作,但是每个事务的commit并不会触发任何log buffer 到log file的刷新或者文件系统到磁盘的刷新操作;
1:在每次事务提交的时候将log buffer 中的数据都会写入到log file,同时也会触发文件系统到磁盘的同步;
2:事务提交会触发log buffer到log file的刷新,但并不会触发磁盘文件系统到磁盘的同步。此外,每秒会有一次文件系统到磁盘同步操作。

query_cache_size

query_cache_size主要用来缓存mysql的结果集,所以仅针对于select语句。致命缺陷是当某个表的数据有任何变化,都会导致所有引用该表的select语句在query cache中的缓存数据失效。当数据变化非常多的情况下,不建议使用。

query_cache_size和query_cache_type需要配合使用,query_cache_type设置在何种场景下使用,0(off)表示完全不使用,1(on)除显示要求不使用外均使用,2(demond)只有显示要求才使用

局部缓存

MySQL还为每个连接设置连接缓存,每个连接包含线程堆栈,网络缓存等,默认的大小为256K 。事务开始之后,则需要更多的空间。运行较小的查询可能仅给指定的线程增加少量的内存消耗,然而如果对数据表做复杂的操作例如全表扫描、排序或者临时表,则需要分配read_buffer_size、sort_buffer_size、read_rnd_buffer_size、tmp_table_size 大小的内存空间,不过它们是在需要的时候分配,并且在操作完成之后就释放了。tmp_table_size 可以达到MySQL所能分配给这个操作的最大内存空间。

read__buffer_size :

read_buffer_size是MySQL读入缓存区大小。对表进行顺序扫描的请求将分配一个读入缓冲区。

sort_buffer_size

sort_buffer_size是MySQL执行排序使用的缓冲大小

read_rnd_buffer_size

read_rnd_buffer_size是MySQL随机读缓冲区大小。

tmp_table_size

tmp_table_size是MySQL的heap(堆)表缓冲大小。需要与max_heap_table_size参数一起调整。

record_buffer

record_buffer每个进行顺序扫描的线程为其扫描的表分配一个缓冲区。如果做了很多顺序扫描,可以显示的改变该值

其他缓存

table_open_cache

mysql每打开一个表,都会读入一些数据到table_open_cache缓存中,当mysql在这个缓存中找不到相应的信息时,才会去磁盘读取

注:如果table_open_cache设置很大时,如果系统处理不了那么多文件描述符,那么就会出现客户端失效,连接不上。

mysql> show status like ‘Open%tables’;

+—————+——-+

| Variable_name | Value |

+—————+——-+

| Open_tables   | 0     |

| Opened_tables | 0     |

+—————+——-+

2 rows in set (0.00 sec)

open_tables表示当前打开的表缓存数,如果执行flush tables操作,则此系统会关闭一些当前没有使用的表缓存而使得此状态值减小;

opend_tables表示曾经打开的表缓存数,会一直进行累加,如果执行flush tables操作,值不会减小。

thread_cache_size

默认值8,表示可以重新利用保存在缓存中线程的数量,当断开连接时如果缓存中还有空间,那么客户端的线程将被放在缓存中。如果线程重新被请求,那么请求将从缓存中读取,如果缓存中是空或者是新的请求那么这个线程将被重新创建,如果很多新的线程增加这个值可以改善系统的性能,根据物理内存设置如下规则:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
# 1G  —> 8  


# 2G —> 16


# 3G —> 32


# 大于3G —> 64

mysql> show status like 'thread%';
+-------------------+-------+
| Variable_name | Value |
+-------------------+-------+
| Threads_cached | 12 | <-当前被缓存的空闲线程的数量
| Threads_connected | 11 | <-处于连接状态的线程
| Threads_created | 38 | <-服务启动以来,建立了多少线程
| Threads_running | 1 | <-正使用中的线程(查询数据,传输数据等)
+-------------------+-------+
4 rows in set (0.00 sec)


mysql> show status like '%connection%';
+-----------------------------------+-------+
| Variable_name | Value |
+-----------------------------------+-------+
| Connections | 52997 | ->服务启动以来,历史连接数
| Max_used_connections | 29 |
+-----------------------------------+-------+
2 rows in set (0.01 sec)

通过以下公式判断设置值是否合适?命中率超过90%,设置合理
(Connections - Treads_created) / Connections * 100%

参考


原文地址 : my.cnf 参数调优
本站是作者语雀文档的镜像站, 如对文章有任何疑问请移步语雀进行 提问

mysql - 全文检索

全文检索在 MySQL 中就是一个 FULLTEXT 类型索引。FULLTEXT 索引用于   MyISAM 表(mysql 5.6+ 可以用于 innodb),可以在 CREATE TABLE 时或之后使用 ALTER TABLE 或 CREATE INDEX 在 CHAR、 VARCHAR 或 TEXT 列上创建。对于大的数据库,将数据装载到一个没有 FULLTEXT 索引的表中,然后再使用 ALTER TABLE   (或 CREATE INDEX) 创建索引,这将是非常快的。将数据装载到一个已经有 FULLTEXT 索引的表中,将是非常慢的。

1.使用Mysql全文检索fulltext的先决条件

表的类型必须是MyISAM 或者 mysql 5.6+ 使用 innodb

建立全文检索的字段类型必须是char,varchar,text

2.建立全文检索先期配置

由于Mysql的默认配置是索引的词的长度是4,所以要支持中文单字的话,首先更改这个.

*Unix 用户要修改 my.cnf,一般此文件在 /etc/my.cnf,如果没有找到,先查找一下find / -name 'my.cnf'

在[mysqld] 位置内加入:

阅读更多

[mysql] 13.9 Full-Text 搜索函数

1
2
3
4
5
6
7
8
9
[MATCH (col1,col2,...) AGAINST (expr [search_modifier])]

search_modifier:
{
IN NATURAL LANGUAGE MODE
| IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION
| IN BOOLEAN MODE
| WITH QUERY EXPANSION
}

MySQL has support for full-text indexing and searching:

-
A full-text index in MySQL is an index of type FULLTEXT.

-
Full-text indexes can be used only with InnoDB or MyISAM tables, and can be created only for CHAR, VARCHAR, or TEXT columns.

-
As of MySQL 5.7.6, MySQL provides a built-in full-text ngram parser that supports Chinese, Japanese, and Korean (CJK), and an installable MeCab full-text parser plugin for Japanese. Parsing differences are outlined in  “ngram Full-Text Parser”  and  “MeCab Full-Text Parser Plugin”

-
A FULLTEXT index definition can be given in the CREATE TABLE statement when a table is created, or added later using ALTER TABLE or CREATE INDEX.

-
For large data sets, it is much faster to load your data into a table that has no FULLTEXT index and then create the index after that, than to load data into a table that has an existing FULLTEXT index.

Full-text searching is performed using MATCH() ... AGAINST syntax. MATCH() takes a comma-separated list that names the columns to be searched. AGAINST takes a string to search for, and an optional modifier that indicates what type of search to perform. The search string must be a string value that is constant during query evaluation. This rules out, for example, a table column because that can differ for each row.

There are three types of full-text searches:

-
A natural language search interprets the search string as a phrase in natural human language (a phrase in free text). There are no special operators. The stopword list applies. For more information about stopword lists, see[Section 13.9.4, “Full-Text Stopwords”].

Full-text searches are natural language searches if the IN NATURAL LANGUAGE MODE modifier is given or if no modifier is given. For more information, see [Section 13.9.1, “Natural Language Full-Text Searches”].

-
A boolean search interprets the search string using the rules of a special query language. The string contains the words to search for. It can also contain operators that specify requirements such that a word must be present or absent in matching rows, or that it should be weighted higher or lower than usual. Certain common words (stopwords) are omitted from the search index and do not match if present in the search string. The IN BOOLEAN MODE modifier specifies a boolean search. For more information, see [Section 13.9.2, “Boolean Full-Text Searches”].

-
A query expansion search is a modification of a natural language search. The search string is used to perform a natural language search. Then words from the most relevant rows returned by the search are added to the search string and the search is done again. The query returns the rows from the second search. The IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION or WITH QUERY EXPANSION modifier specifies a query expansion search. For more information, see [Section 13.9.3, “Full-Text Searches with Query Expansion”].

For information about FULLTEXT query performance, see [Section 9.3.4, “Column Indexes”].

For more information about InnoDB FULLTEXT indexes, see [Section 15.8.10, “InnoDB FULLTEXT Indexes”].

Constraints on full-text searching are listed in [Section 13.9.5, “Full-Text Restrictions”].

The [myisam_ftdump] utility dumps the contents of a MyISAM full-text index. This may be helpful for debugging full-text queries. See [Section 5.6.2, “myisam_ftdump — Display Full-Text Index information”].

1. 自然语言全文搜索

By default or with the IN NATURAL LANGUAGE MODE modifier, the MATCH() function performs a natural language search for a string against a text collection. A collection is a set of one or more columns included in a FULLTEXTindex. The search string is given as the argument to AGAINST(). For each row in the table, MATCH() returns a relevance value; that is, a similarity measure between the search string and the text in that row in the columns named in the MATCH() list.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
mysql> CREATE TABLE articles (
id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
title VARCHAR(200),
body TEXT,
FULLTEXT (title,body)
) ENGINE=InnoDB;
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO articles (title,body) VALUES
('MySQL Tutorial','DBMS stands for DataBase ...'),
('How To Use MySQL Well','After you went through a ...'),
('Optimizing MySQL','In this tutorial we will show ...'),
('1001 MySQL Tricks','1\. Never run mysqld as root. 2\. ...'),
('MySQL vs. YourSQL','In the following database comparison ...'),
('MySQL Security','When configured properly, MySQL ...');
Query OK, 6 rows affected (0.00 sec)
Records: 6 Duplicates: 0 Warnings: 0

mysql> SELECT * FROM articles
WHERE MATCH (title,body)
AGAINST ('database' IN NATURAL LANGUAGE MODE);
+----+-------------------+------------------------------------------+
| id | title | body |
+----+-------------------+------------------------------------------+
| 1 | MySQL Tutorial | DBMS stands for DataBase ... |
| 5 | MySQL vs. YourSQL | In the following database comparison ... |
+----+-------------------+------------------------------------------+
2 rows in set (0.00 sec)

By default, the search is performed in case-insensitive fashion. To perform a case-sensitive full-text search, use a binary collation for the indexed columns. For example, a column that uses the latin1 character set of can be assigned a collation of latin1_bin to make it case sensitive for full-text searches.

When MATCH() is used in a WHERE clause, as in the example shown earlier, the rows returned are automatically sorted with the highest relevance first. Relevance values are nonnegative floating-point numbers. Zero relevance means no similarity. Relevance is computed based on the number of words in the row (document), the number of unique words in the row, the total number of words in the collection, and the number of rows that contain a particular word.

Note

The term “document” may be used interchangeably with the term “row”, and both terms refer to the indexed part of the row. The term “collection” refers to the indexed columns and encompasses all rows.

To simply count matches, you could use a query like this:

1
2
3
4
5
6
7
8
9
mysql> SELECT COUNT(*) FROM articles
WHERE MATCH (title,body)
AGAINST ('database' IN NATURAL LANGUAGE MODE);
+----------+
| COUNT(*) |
+----------+
| 2 |
+----------+
1 row in set (0.00 sec)

You might find it quicker to rewrite the query as follows:

1
2
3
4
5
6
7
8
9
10
mysql> SELECT
COUNT(IF(MATCH (title,body) AGAINST ('database' IN NATURAL LANGUAGE MODE), 1, NULL))
AS count
FROM articles;
+-------+
| count |
+-------+
| 2 |
+-------+
1 row in set (0.03 sec)

The first query does some extra work (sorting the results by relevance) but also can use an index lookup based on the WHERE clause. The index lookup might make the first query faster if the search matches few rows. The second query performs a full table scan, which might be faster than the index lookup if the search term was present in most rows.

For natural-language full-text searches, the columns named in the MATCH() function must be the same columns included in some FULLTEXT index in your table. For the preceding query, note that the columns named in the MATCH() function (title and body) are the same as those named in the definition of the article table’s FULLTEXTindex. To search the title or body separately, you would create separate FULLTEXT indexes for each column.

You can also perform a boolean search or a search with query expansion. These search types are described in[Section 13.9.2, “Boolean Full-Text Searches”], and [Section 13.9.3, “Full-Text Searches with Query Expansion”].

A full-text search that uses an index can name columns only from a single table in the MATCH() clause because an index cannot span multiple tables. For MyISAM tables, a boolean search can be done in the absence of an index (albeit more slowly), in which case it is possible to name columns from multiple tables.

The preceding example is a basic illustration that shows how to use the MATCH() function where rows are returned in order of decreasing relevance. The next example shows how to retrieve the relevance values explicitly. Returned rows are not ordered because the SELECT statement includes neither WHERE nor ORDER BY clauses:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
mysql> SELECT id, MATCH (title,body)
AGAINST ('Tutorial' IN NATURAL LANGUAGE MODE) AS score
FROM articles;
+----+---------------------+
| id | score |
+----+---------------------+
| 1 | 0.22764469683170319 |
| 2 | 0 |
| 3 | 0.22764469683170319 |
| 4 | 0 |
| 5 | 0 |
| 6 | 0 |
+----+---------------------+
6 rows in set (0.00 sec)

The following example is more complex. The query returns the relevance values and it also sorts the rows in order of decreasing relevance. To achieve this result, specify MATCH() twice: once in the SELECT list and once in theWHERE clause. This causes no additional overhead, because the MySQL optimizer notices that the two MATCH()calls are identical and invokes the full-text search code only once.

1
2
3
4
5
6
7
8
9
10
11
12
13
mysql> SELECT id, body, MATCH (title,body) AGAINST
('Security implications of running MySQL as root'
IN NATURAL LANGUAGE MODE) AS score
FROM articles WHERE MATCH (title,body) AGAINST
('Security implications of running MySQL as root'
IN NATURAL LANGUAGE MODE);
+----+-------------------------------------+-----------------+
| id | body | score |
+----+-------------------------------------+-----------------+
| 4 | 1\. Never run mysqld as root. 2\. ... | 1.5219271183014 |
| 6 | When configured properly, MySQL ... | 1.3114095926285 |
+----+-------------------------------------+-----------------+
2 rows in set (0.00 sec)

The MySQL FULLTEXT implementation regards any sequence of true word characters (letters, digits, and underscores) as a word. That sequence may also contain apostrophes ('), but not more than one in a row. This means that aaa'bbb is regarded as one word, but aaa''bbb is regarded as two words. Apostrophes at the beginning or the end of a word are stripped by the FULLTEXT parser; 'aaa'bbb' would be parsed as aaa'bbb.

The built-in FULLTEXT parser determines where words start and end by looking for certain delimiter characters; for example,  (space), , (comma), and . (period). If words are not separated by delimiters (as in, for example, Chinese), the built-in FULLTEXT parser cannot determine where a word begins or ends. To be able to add words or other indexed terms in such languages to a FULLTEXT index that uses the built-in FULLTEXT parser, you must preprocess them so that they are separated by some arbitrary delimiter such as ". Alternatively, as of MySQL 5.7.6, you can create FULLTEXT indexes using the ngram parser plugin (for Chinese, Japanese, or Korean) or the MeCab parser plugin (for Japanese).

It is possible to write a plugin that replaces the built-in full-text parser. For details, see [Section 26.2, “The MySQL Plugin API”]. For example parser plugin source code, see the plugin/fulltext directory of a MySQL source distribution.

Some words are ignored in full-text searches:

-
Any word that is too short is ignored. The default minimum length of words that are found by full-text searches is three characters for InnoDB search indexes, or four characters for MyISAM. You can control the cutoff by setting a configuration option before creating the index: innodb_ft_min_token_size configuration option for InnoDB search indexes, or ft_min_word_len for MyISAM.

Note

This behavior does not apply to FULLTEXT indexes that use the ngram parser. For the ngram parser, token length is defined by the ngram_token_size option.

-
Words in the stopword list are ignored. A stopword is a word such as “the” or “some” that is so common that it is considered to have zero semantic value. There is a built-in stopword list, but it can be overridden by a user-defined list. The stopword lists and related configuration options are different for InnoDB search indexes and MyISAM ones. Stopword processing is controlled by the configuration options innodb_ft_enable_stopword,innodb_ft_server_stopword_table, and innodb_ft_user_stopword_table for InnoDB search indexes, andft_stopword_file for MyISAM ones.

See [Section 13.9.4, “Full-Text Stopwords”] to view default stopword lists and how to change them. The default minimum word length can be changed as described in [Section 13.9.6, “Fine-Tuning MySQL Full-Text Search”].

Every correct word in the collection and in the query is weighted according to its significance in the collection or query. Thus, a word that is present in many documents has a lower weight, because it has lower semantic value in this particular collection. Conversely, if the word is rare, it receives a higher weight. The weights of the words are combined to compute the relevance of the row. This technique works best with large collections.

MyISAM Limitation

For very small tables, word distribution does not adequately reflect their semantic value, and this model may sometimes produce bizarre results for search indexes on MyISAM tables. For example, although the word “MySQL”is present in every row of the articles table shown earlier, a search for the word in a MyISAM search index produces no results:

1
2
3
4
mysql> SELECT * FROM articles
WHERE MATCH (title,body)
AGAINST ('MySQL' IN NATURAL LANGUAGE MODE);
Empty set (0.00 sec)

The search result is empty because the word “MySQL” is present in at least 50% of the rows, and so is effectively treated as a stopword. This filtering technique is more suitable for large data sets, where you might not want the result set to return every second row from a 1GB table, than for small data sets where it might cause poor results for popular terms.

The 50% threshold can surprise you when you first try full-text searching to see how it works, and makes InnoDBtables more suited to experimentation with full-text searches. If you create a MyISAM table and insert only one or two rows of text into it, every word in the text occurs in at least 50% of the rows. As a result, no search returns any results until the table contains more rows. Users who need to bypass the 50% limitation can build search indexes onInnoDB tables, or use the boolean search mode explained in [Section 13.9.2, “Boolean Full-Text Searches”].

2. 布尔值全文搜索

MySQL can perform boolean full-text searches using the IN BOOLEAN MODE modifier. With this modifier, certain characters have special meaning at the beginning or end of words in the search string. In the following query, the +and - operators indicate that a word must be present or absent, respectively, for a match to occur. Thus, the query retrieves all the rows that contain the word “MySQL” but that do not contain the word “YourSQL”:

1
2
3
4
5
6
7
8
9
10
11
mysql> SELECT * FROM articles WHERE MATCH (title,body)
AGAINST ('+MySQL -YourSQL' IN BOOLEAN MODE);
+----+-----------------------+-------------------------------------+
| id | title | body |
+----+-----------------------+-------------------------------------+
| 1 | MySQL Tutorial | DBMS stands for DataBase ... |
| 2 | How To Use MySQL Well | After you went through a ... |
| 3 | Optimizing MySQL | In this tutorial we will show ... |
| 4 | 1001 MySQL Tricks | 1\. Never run mysqld as root. 2\. ... |
| 6 | MySQL Security | When configured properly, MySQL ... |
+----+-----------------------+-------------------------------------+

Note

In implementing this feature, MySQL uses what is sometimes referred to as implied Boolean logic, in which

-
+ stands for AND

-
- stands for NOT

-
[no operator] implies OR

Boolean full-text searches have these characteristics:

-
They do not automatically sort rows in order of decreasing relevance.

-
InnoDB tables require a FULLTEXT index on all columns of the MATCH() expression to perform boolean queries. Boolean queries against a MyISAM search index can work even without a FULLTEXT index, although a search executed in this fashion would be quite slow.

-
The minimum and maximum word length full-text parameters apply to FULLTEXT indexes created using the built-in FULLTEXT parser and MeCab parser plugin. innodb_ft_min_token_size and innodb_ft_max_token_size are used for InnoDB search indexes. ft_min_word_len and ft_max_word_len are used for MyISAM search indexes.

Minimum and maximum word length full-text parameters do not apply to FULLTEXT indexes created using the ngram parser. ngram token size is defined by the ngram_token_size option.

-
The stopword list applies, controlled by innodb_ft_enable_stopword, innodb_ft_server_stopword_table, and innodb_ft_user_stopword_table for InnoDB search indexes, and ft_stopword_file for MyISAM ones.

-
InnoDB full-text search does not support the use of multiple operators on a single search word, as in this example: '++apple'. Use of multiple operators on a single search word returns a syntax error to standard out. MyISAM full-text search will successfully process the same search ignoring all operators except for the operator immediately adjacent to the search word.

-
InnoDB full-text search only supports leading plus or minus signs. For example, InnoDB supports '+apple' but does not support 'apple+'. Specifying a trailing plus or minus sign causes InnoDB to report a syntax error.

-
InnoDB full-text search does not support the use of a leading plus sign with wildcard ('+*'), a plus and minus sign combination ('+-'), or leading a plus and minus sign combination ('+-apple'). These invalid queries return a syntax error.

-
InnoDB full-text search does not support the use of the @ symbol in boolean full-text searches. The @ symbol is reserved for use by the @distance proximity search operator.

-
They do not use the 50% threshold that applies to MyISAM search indexes.

The boolean full-text search capability supports the following operators:

-
+

A leading or trailing plus sign indicates that this word must be present in each row that is returned. InnoDB only supports leading plus signs.

-
-

A leading or trailing minus sign indicates that this word must not be present in any of the rows that are returned. InnoDB only supports leading minus signs.

Note: The - operator acts only to exclude rows that are otherwise matched by other search terms. Thus, a boolean-mode search that contains only terms preceded by - returns an empty result. It does not return “all rows except those containing any of the excluded terms.”

-
(no operator)

By default (when neither + nor - is specified), the word is optional, but the rows that contain it are rated higher. This mimics the behavior of MATCH() ... AGAINST() without the IN BOOLEAN MODE modifier.

-
@*distance*

This operator works on InnoDB tables only. It tests whether two or more words all start within a specified distance from each other, measured in words. Specify the search words within a double-quoted string immediately before the@*distance* operator, for example, MATCH(col1) AGAINST('"word1 word2 word3" @8' IN BOOLEAN MODE)

-
> <

These two operators are used to change a word’s contribution to the relevance value that is assigned to a row. The> operator increases the contribution and the < operator decreases it. See the example following this list.

-
( )

Parentheses group words into subexpressions. Parenthesized groups can be nested.

-
~

A leading tilde acts as a negation operator, causing the word’s contribution to the row’s relevance to be negative. This is useful for marking “noise” words. A row containing such a word is rated lower than others, but is not excluded altogether, as it would be with the - operator.

-
*

The asterisk serves as the truncation (or wildcard) operator. Unlike the other operators, it is appended to the word to be affected. Words match if they begin with the word preceding the * operator.

If a word is specified with the truncation operator, it is not stripped from a boolean query, even if it is too short or a stopword. Whether a word is too short is determined from the innodb_ft_min_token_size setting for InnoDBtables, or ft_min_word_len for MyISAM tables. These options are not applicable to FULLTEXT indexes that use the ngram parser.

The wildcarded word is considered as a prefix that must be present at the start of one or more words. If the minimum word length is 4, a search for '+*word* +the*' could return fewer rows than a search for '+*word* +the', because the second query ignores the too-short search term the.

-
"

A phrase that is enclosed within double quote (") characters matches only rows that contain the phrase literally, as it was typed. The full-text engine splits the phrase into words and performs a search in the FULLTEXT index for the words. Nonword characters need not be matched exactly: Phrase searching requires only that matches contain exactly the same words as the phrase and in the same order. For example, "test phrase" matches "test, phrase".

If the phrase contains no words that are in the index, the result is empty. The words might not be in the index because of a combination of factors: if they do not exist in the text, are stopwords, or are shorter than the minimum length of indexed words.

The following examples demonstrate some search strings that use boolean full-text operators:

-
'apple banana'

Find rows that contain at least one of the two words.

-
'+apple +juice'

Find rows that contain both words.

-
'+apple macintosh'

Find rows that contain the word “apple”, but rank rows higher if they also contain “macintosh”.

-
'+apple -macintosh'

Find rows that contain the word “apple” but not “macintosh”.

-
'+apple ~macintosh'

Find rows that contain the word “apple”, but if the row also contains the word “macintosh”, rate it lower than if row does not. This is “softer” than a search for '+apple -macintosh', for which the presence of “macintosh” causes the row not to be returned at all.

-
'+apple +(>turnover <strudel)'

Find rows that contain the words “apple” and “turnover”, or “apple” and “strudel” (in any order), but rank “apple turnover” higher than “apple strudel”.

-
'apple*'

Find rows that contain words such as “apple”, “apples”, “applesauce”, or “applet”.

-
'"some words"'

Find rows that contain the exact phrase “some words” (for example, rows that contain “some words of wisdom” but not “some noise words”). Note that the " characters that enclose the phrase are operator characters that delimit the phrase. They are not the quotation marks that enclose the search string itself.

InnoDB full-text search is modeled on the Sphinx full-text search engine, and the algorithms used are based onBM25 and TF-IDF ranking algorithms. For these reasons, relevancy rankings for InnoDB boolean full-text search may differ from MyISAM relevancy rankings.

InnoDB uses a variation of the “term frequency-inverse document frequency” (TF-IDF) weighting system to rank a document’s relevance for a given full-text search query. The TF-IDF weighting is based on how frequently a word appears in a document, offset by how frequently the word appears in all documents in the collection. In other words, the more frequently a word appears in a document, and the less frequently the word appears in the document collection, the higher the document is ranked.

How Relevancy Ranking is Calculated

The term frequency (TF) value is the number of times that a word appears in a document. The inverse document frequency (IDF) value of a word is calculated using the following formula, where total_records is the number of records in the collection, and matching_records is the number of records that the search term appears in.

${IDF} = log10( ${total_records} / ${matching_records} )

When a document contains a word multiple times, the IDF value is multiplied by the TF value:

${TF} * ${IDF}

Using the TF and IDF values, the relevancy ranking for a document is calculated using this formula:

${rank} = ${TF} _ ${IDF} _ ${IDF}

The formula is demonstrated in the following examples.

This example demonstrates the relevancy ranking calculation for a single-word search.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
mysql> CREATE TABLE articles (
id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
title VARCHAR(200),
body TEXT,
FULLTEXT (title,body)
) ENGINE=InnoDB;
Query OK, 0 rows affected (1.04 sec)

mysql> INSERT INTO articles (title,body) VALUES
('MySQL Tutorial','This database tutorial ...'),
("How To Use MySQL",'After you went through a ...'),
('Optimizing Your Database','In this database tutorial ...'),
('MySQL vs. YourSQL','When comparing databases ...'),
('MySQL Security','When configured properly, MySQL ...'),
('Database, Database, Database','database database database'),
('1001 MySQL Tricks','1\. Never run mysqld as root. 2\. ...'),
('MySQL Full-Text Indexes', 'MySQL fulltext indexes use a ..');
Query OK, 8 rows affected (0.06 sec)
Records: 8 Duplicates: 0 Warnings: 0

mysql> SELECT id, title, body, MATCH (title,body) AGAINST ('database' IN BOOLEAN MODE)
AS score FROM articles ORDER BY score DESC;
+----+------------------------------+-------------------------------------+---------------------+
| id | title | body | score |
+----+------------------------------+-------------------------------------+---------------------+
| 6 | Database, Database, Database | database database database | 1.0886961221694946 |
| 3 | Optimizing Your Database | In this database tutorial ... | 0.36289870738983154 |
| 1 | MySQL Tutorial | This database tutorial ... | 0.18144935369491577 |
| 2 | How To Use MySQL | After you went through a ... | 0 |
| 4 | MySQL vs. YourSQL | When comparing databases ... | 0 |
| 5 | MySQL Security | When configured properly, MySQL ... | 0 |
| 7 | 1001 MySQL Tricks | 1\. Never run mysqld as root. 2\. ... | 0 |
| 8 | MySQL Full-Text Indexes | MySQL fulltext indexes use a .. | 0 |
+----+------------------------------+-------------------------------------+---------------------+
8 rows in set (0.00 sec)

There are 8 records in total, with 3 that match the “database” search term. The first record (id 6) contains the search term 6 times and has a relevancy ranking of 1.0886961221694946. This ranking value is calculated using a TF value of 6 (the “database” search term appears 6 times in record id 6) and an IDF value of 0.42596873216370745, which is calculated as follows (where 8 is the total number of records and 3 is the number of records that the search term appears in):

${IDF} = log10( 8 / 3 ) = 0.42596873216370745

The TF and IDF values are then entered into the ranking formula:

${rank} = ${TF} _ ${IDF} _ ${IDF}

Performing the calculation in the MySQL command-line client returns a ranking value of 1.088696164686938.

1
2
3
4
5
6
7
mysql> SELECT 6*log10(8/3)*log10(8/3);
+-------------------------+
| 6*log10(8/3)*log10(8/3) |
+-------------------------+
| 1.088696164686938 |
+-------------------------+
1 row in set (0.00 sec)

Note

You may notice a slight difference in the ranking values returned by the SELECT ... MATCH ... AGAINSTstatement and the MySQL command-line client (1.0886961221694946 versus 1.088696164686938). The difference is due to how the casts between integers and floats/doubles are performed internally by InnoDB (along with related precision and rounding decisions), and how they are performed elsewhere, such as in the MySQL command-line client or other types of calculators.

This example demonstrates the relevancy ranking calculation for a multiple-word full-text search based on thearticles table and data used in the previous example.

If you search on more than one word, the relevancy ranking value is a sum of the relevancy ranking value for each word, as shown in this formula:

${rank} = ${TF} _ ${IDF} _ ${IDF} + ${TF} _ ${IDF} _ ${IDF}

Performing a search on two terms (‘mysql tutorial’) returns the following results:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
mysql> SELECT id, title, body, MATCH (title,body)  AGAINST ('mysql tutorial' IN BOOLEAN MODE)
AS score FROM articles ORDER BY score DESC;
+----+------------------------------+-------------------------------------+----------------------+
| id | title | body | score |
+----+------------------------------+-------------------------------------+----------------------+
| 1 | MySQL Tutorial | This database tutorial ... | 0.7405621409416199 |
| 3 | Optimizing Your Database | In this database tutorial ... | 0.3624762296676636 |
| 5 | MySQL Security | When configured properly, MySQL ... | 0.031219376251101494 |
| 8 | MySQL Full-Text Indexes | MySQL fulltext indexes use a .. | 0.031219376251101494 |
| 2 | How To Use MySQL | After you went through a ... | 0.015609688125550747 |
| 4 | MySQL vs. YourSQL | When comparing databases ... | 0.015609688125550747 |
| 7 | 1001 MySQL Tricks | 1\. Never run mysqld as root. 2\. ... | 0.015609688125550747 |
| 6 | Database, Database, Database | database database database | 0 |
+----+------------------------------+-------------------------------------+----------------------+
8 rows in set (0.00 sec)

In the first record (id 8), ‘mysql’ appears once and ‘tutorial’ appears twice. There are six matching records for ‘mysql’ and two matching records for ‘tutorial’. The MySQL command-line client returns the expected ranking value when inserting these values into the ranking formula for a multiple word search:

1
2
3
4
5
6
7
mysql> SELECT (1*log10(8/6)*log10(8/6)) + (2*log10(8/2)*log10(8/2));
+-------------------------------------------------------+
| (1*log10(8/6)*log10(8/6)) + (2*log10(8/2)*log10(8/2)) |
+-------------------------------------------------------+
| 0.7405621541938003 |
+-------------------------------------------------------+
1 row in set (0.00 sec)

Note

The slight difference in the ranking values returned by the SELECT ... MATCH ... AGAINST statement and the MySQL command-line client is explained in the preceding example.

3. Full-Text Searches with Query Expansion

Full-text search supports query expansion (and in particular, its variant “blind query expansion”). This is generally useful when a search phrase is too short, which often means that the user is relying on implied knowledge that the full-text search engine lacks. For example, a user searching for “database” may really mean that “MySQL”, “Oracle”, “DB2”, and “RDBMS” all are phrases that should match “databases” and should be returned, too. This is implied knowledge.

Blind query expansion (also known as automatic relevance feedback) is enabled by adding WITH QUERY EXPANSION or IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION following the search phrase. It works by performing the search twice, where the search phrase for the second search is the original search phrase concatenated with the few most highly relevant documents from the first search. Thus, if one of these documents contains the word “databases” and the word “MySQL”, the second search finds the documents that contain the word“MySQL” even if they do not contain the word “database”. The following example shows this difference:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
mysql> SELECT * FROM articles
WHERE MATCH (title,body)
AGAINST ('database' IN NATURAL LANGUAGE MODE);
+----+-------------------+------------------------------------------+
| id | title | body |
+----+-------------------+------------------------------------------+
| 1 | MySQL Tutorial | DBMS stands for DataBase ... |
| 5 | MySQL vs. YourSQL | In the following database comparison ... |
+----+-------------------+------------------------------------------+
2 rows in set (0.00 sec)

mysql> SELECT * FROM articles
WHERE MATCH (title,body)
AGAINST ('database' WITH QUERY EXPANSION);
+----+-----------------------+------------------------------------------+
| id | title | body |
+----+-----------------------+------------------------------------------+
| 5 | MySQL vs. YourSQL | In the following database comparison ... |
| 1 | MySQL Tutorial | DBMS stands for DataBase ... |
| 3 | Optimizing MySQL | In this tutorial we will show ... |
| 6 | MySQL Security | When configured properly, MySQL ... |
| 2 | How To Use MySQL Well | After you went through a ... |
| 4 | 1001 MySQL Tricks | 1\. Never run mysqld as root. 2\. ... |
+----+-----------------------+------------------------------------------+
6 rows in set (0.00 sec)

Another example could be searching for books by Georges Simenon about Maigret, when a user is not sure how to spell “Maigret”. A search for “Megre and the reluctant witnesses” finds only “Maigret and the Reluctant Witnesses”without query expansion. A search with query expansion finds all books with the word “Maigret” on the second pass.

Note

Because blind query expansion tends to increase noise significantly by returning nonrelevant documents, use it only when a search phrase is short.

4. 全文禁止词

The stopword list is loaded and searched for full-text queries using the server character set and collation (the values of the character_set_server and collation_server system variables). False hits or misses might occur for stopword lookups if the stopword file or columns used for full-text indexing or searches have a character set or collation different from character_set_server or collation_server.

Case sensitivity of stopword lookups depends on the server collation. For example, lookups are case insensitive if the collation is latin1_swedish_ci, whereas lookups are case sensitive if the collation is latin1_general_cs orlatin1_bin.

Stopwords for InnoDB Search Indexes

InnoDB has a relatively short list of default stopwords, because documents from technical, literary, and other sources often use short words as keywords or in significant phrases. For example, you might search for “to be or not to be” and expect to get a sensible result, rather than having all those words ignored.

To see the default InnoDB stopword list, query the INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD table.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
mysql> SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD;
+-------+
| value |
+-------+
| a |
| about |
| an |
| are |
| as |
| at |
| be |
| by |
| com |
| de |
| en |
| for |
| from |
| how |
| i |
| in |
| is |
| it |
| la |
| of |
| on |
| or |
| that |
| the |
| this |
| to |
| was |
| what |
| when |
| where |
| who |
| will |
| with |
| und |
| the |
| www |
+-------+
36 rows in set (0.00 sec)

To define your own stopword list for all InnoDB tables, define a table with the same structure as theINNODB_FT_DEFAULT_STOPWORD table, populate it with stopwords, and set the value of theinnodb_ft_server_stopword_table option to a value in the form *db_name*/*table_name* before creating the full-text index. The stopword table must have a single VARCHAR column named value. The following example demonstrates creating and configuring a new global stopword table for InnoDB.

– Create a new stopword table

1
2
mysql> CREATE TABLE my_stopwords(value VARCHAR(30)) ENGINE = INNODB;
Query OK, 0 rows affected (0.01 sec)

– Insert stopwords (for simplicity, a single stopword is used in this example)

1
2
mysql> INSERT INTO my_stopwords(value) VALUES ('Ishmael');
Query OK, 1 row affected (0.00 sec)

– Create the table

1
2
3
4
5
6
7
mysql> CREATE TABLE opening_lines (
id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
opening_line TEXT(500),
author VARCHAR(200),
title VARCHAR(200)
) ENGINE=InnoDB;
Query OK, 0 rows affected (0.01 sec)

– Insert data into the table

1
2
3
4
5
6
7
8
9
10
11
mysql> INSERT INTO opening_lines(opening_line,author,title) VALUES
('Call me Ishmael.','Herman Melville','Moby-Dick'),
('A screaming comes across the sky.','Thomas Pynchon','Gravity\'s Rainbow'),
('I am an invisible man.','Ralph Ellison','Invisible Man'),
('Where now? Who now? When now?','Samuel Beckett','The Unnamable'),
('It was love at first sight.','Joseph Heller','Catch-22'),
('All this happened, more or less.','Kurt Vonnegut','Slaughterhouse-Five'),
('Mrs. Dalloway said she would buy the flowers herself.','Virginia Woolf','Mrs. Dalloway'),
('It was a pleasure to burn.','Ray Bradbury','Fahrenheit 451');
Query OK, 8 rows affected (0.00 sec)
Records: 8 Duplicates: 0 Warnings: 0

– Set the innodb_ft_server_stopword_table option to the new stopword table

1
2
mysql> SET GLOBAL innodb_ft_server_stopword_table = 'test/my_stopwords';
Query OK, 0 rows affected (0.00 sec)

– Create the full-text index (which rebuilds the table if no FTS_DOC_ID column is defined)

1
2
3
mysql> CREATE FULLTEXT INDEX idx ON opening_lines(opening_line);
Query OK, 0 rows affected, 1 warning (1.17 sec)
Records: 0 Duplicates: 0 Warnings: 1

Verify that the specified stopword (‘Ishmael’) does not appear by querying the words inINFORMATION_SCHEMA.INNODB_FT_INDEX_TABLE.

Note

By default, words less than 3 characters in length or greater than 84 characters in length do not appear in anInnoDB full-text search index. Maximum and minimum word length values are configurable using theinnodb_ft_max_token_size and innodb_ft_min_token_size variables. This default behavior does not apply to the ngram parser plugin. ngram token size is defined by the ngram_token_size option.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
mysql> SET GLOBAL innodb_ft_aux_table='test/opening_lines';
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT word FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_TABLE LIMIT 15;
+-----------+
| word |
+-----------+
| across |
| all |
| burn |
| buy |
| call |
| comes |
| dalloway |
| first |
| flowers |
| happened |
| herself |
| invisible |
| less |
| love |
| man |
+-----------+
15 rows in set (0.00 sec)

To create stopword lists on a table-by-table basis, create other stopword tables and use theinnodb_ft_user_stopword_table option to specify the stopword table that you want to use before you create the full-text index.

Stopwords for MyISAM Search Indexes

The stopword file is loaded and searched using latin1 if character_set_server is ucs2, utf16, utf16le, or utf32.

To override the default stopword list for MyISAM tables, set the ft_stopword_file system variable. (See [Section 6.1.5, “Server System Variables”].) The variable value should be the path name of the file containing the stopword list, or the empty string to disable stopword filtering. The server looks for the file in the data directory unless an absolute path name is given to specify a different directory. After changing the value of this variable or the contents of the stopword file, restart the server and rebuild your FULLTEXT indexes.

The stopword list is free-form, separating stopwords with any nonalphanumeric character such as newline, space, or comma. Exceptions are the underscore character (_) and a single apostrophe (') which are treated as part of a word. The character set of the stopword list is the server’s default character set; see [Section 11.1.3.2, “Server Character Set and Collation”].

The following table shows the default list of stopwords for MyISAM search indexes. In a MySQL source distribution, you can find this list in the storage/myisam/ft_static.c file.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
| a's | able | about | above | according |
| accordingly | across | actually | after | afterwards |
| again | against | ain't | all | allow |
| allows | almost | alone | along | already |
| also | although | always | am | among |
| amongst | an | and | another | any |
| anybody | anyhow | anyone | anything | anyway |
| anyways | anywhere | apart | appear | appreciate |
| appropriate | are | aren't | around | as |
| aside | ask | asking | associated | at |
| available | away | awfully | be | became |
| because | become | becomes | becoming | been |
| before | beforehand | behind | being | believe |
| below | beside | besides | best | better |
| between | beyond | both | brief | but |
| by | c'mon | c's | came | can |
| can't | cannot | cant | cause | causes |
| certain | certainly | changes | clearly | co |
| com | come | comes | concerning | consequently |
| consider | considering | contain | containing | contains |
| corresponding | could | couldn't | course | currently |
| definitely | described | despite | did | didn't |
| different | do | does | doesn't | doing |
| don't | done | down | downwards | during |
| each | edu | eg | eight | either |
| else | elsewhere | enough | entirely | especially |
| et | etc | even | ever | every |
| everybody | everyone | everything | everywhere | ex |
| exactly | example | except | far | few |
| fifth | first | five | followed | following |
| follows | for | former | formerly | forth |
| four | from | further | furthermore | get |
| gets | getting | given | gives | go |
| goes | going | gone | got | gotten |
| greetings | had | hadn't | happens | hardly |
| has | hasn't | have | haven't | having |
| he | he's | hello | help | hence |
| her | here | here's | hereafter | hereby |
| herein | hereupon | hers | herself | hi |
| him | himself | his | hither | hopefully |
| how | howbeit | however | i'd | i'll |
| i'm | i've | ie | if | ignored |
| immediate | in | inasmuch | inc | indeed |
| indicate | indicated | indicates | inner | insofar |
| instead | into | inward | is | isn't |
| it | it'd | it'll | it's | its |
| itself | just | keep | keeps | kept |
| know | known | knows | last | lately |
| later | latter | latterly | least | less |
| lest | let | let's | like | liked |
| likely | little | look | looking | looks |
| ltd | mainly | many | may | maybe |
| me | mean | meanwhile | merely | might |
| more | moreover | most | mostly | much |
| must | my | myself | name | namely |
| nd | near | nearly | necessary | need |
| needs | neither | never | nevertheless | new |
| next | nine | no | nobody | non |
| none | noone | nor | normally | not |
| nothing | novel | now | nowhere | obviously |
| of | off | often | oh | ok |
| okay | old | on | once | one |
| ones | only | onto | or | other |
| others | otherwise | ought | our | ours |
| ourselves | out | outside | over | overall |
| own | particular | particularly | per | perhaps |
| placed | please | plus | possible | presumably |
| probably | provides | que | quite | qv |
| rather | rd | re | really | reasonably |
| regarding | regardless | regards | relatively | respectively |
| right | said | same | saw | say |
| saying | says | second | secondly | see |
| seeing | seem | seemed | seeming | seems |
| seen | self | selves | sensible | sent |
| serious | seriously | seven | several | shall |
| she | should | shouldn't | since | six |
| so | some | somebody | somehow | someone |
| something | sometime | sometimes | somewhat | somewhere |
| soon | sorry | specified | specify | specifying |
| still | sub | such | sup | sure |
| t's | take | taken | tell | tends |
| th | than | thank | thanks | thanx |
| that | that's | thats | the | their |
| theirs | them | themselves | then | thence |
| there | there's | thereafter | thereby | therefore |
| therein | theres | thereupon | these | they |
| they'd | they'll | they're | they've | think |
| third | this | thorough | thoroughly | those |
| though | three | through | throughout | thru |
| thus | to | together | too | took |
| toward | towards | tried | tries | truly |
| try | trying | twice | two | un |
| under | unfortunately | unless | unlikely | until |
| unto | up | upon | us | use |
| used | useful | uses | using | usually |
| value | various | very | via | viz |
| vs | want | wants | was | wasn't |
| way | we | we'd | we'll | we're |
| we've | welcome | well | went | were |
| weren't | what | what's | whatever | when |
| whence | whenever | where | where's | whereafter |
| whereas | whereby | wherein | whereupon | wherever |
| whether | which | while | whither | who |
| who's | whoever | whole | whom | whose |
| why | will | willing | wish | with |
| within | without | won't | wonder | would |
| wouldn't | yes | yet | you | you'd |
| you'll | you're | you've | your | yours |
| yourself | yourselves | zero |   |   |

5. 全文限制

-
Full-text searches are supported for InnoDB and MyISAM tables only.

-
Full-text searches are not supported for partitioned tables. See [Section 20.6, “Restrictions and Limitations on Partitioning”].

-
Full-text searches can be used with most multibyte character sets. The exception is that for Unicode, the utf8character set can be used, but not the ucs2 character set. Although FULLTEXT indexes on ucs2 columns cannot be used, you can perform IN BOOLEAN MODE searches on a ucs2 column that has no such index.

The remarks for utf8 also apply to utf8mb4, and the remarks for ucs2 also apply to utf16, utf16le, and utf32.

-
Ideographic languages such as Chinese and Japanese do not have word delimiters. Therefore, the built-in full-text parser cannot determine where words begin and end in these and other such languages.

In MySQL 5.7.6, a character-based ngram full-text parser that supports Chinese, Japanese, and Korean (CJK), and a word-based MeCab parser plugin that supports Japanese are provided for use with InnoDB and MySIAM tables.

-
Although the use of multiple character sets within a single table is supported, all columns in a FULLTEXT index must use the same character set and collation.

-
The MATCH() column list must match exactly the column list in some FULLTEXT index definition for the table, unless this MATCH() is IN BOOLEAN MODE on a MyISAM table. For MyISAM tables, boolean-mode searches can be done on nonindexed columns, although they are likely to be slow.

-
The argument to AGAINST() must be a string value that is constant during query evaluation. This rules out, for example, a table column because that can differ for each row.

-
Index hints are more limited for FULLTEXT searches than for non-FULLTEXT searches. See [Section 9.9.4, “Index Hints”].

-
For InnoDB, all DML operations (INSERT, UPDATE, DELETE) involving columns with full-text indexes are processed at transaction commit time. For example, for an INSERT operation, an inserted string is tokenized and decomposed into individual words. The individual words are then added to full-text index tables when the transaction is committed. As a result, full-text searches only return committed data.

-
The ‘%’ character is not a supported wildcard character for full-text searches.

MySQL’s full-text search capability has few user-tunable parameters. You can exert more control over full-text searching behavior if you have a MySQL source distribution because some changes require source code modifications. See [Section 2.9, “Installing MySQL from Source”].

Full-text search is carefully tuned for effectiveness. Modifying the default behavior in most cases can actually decrease effectiveness. Do not alter the MySQL sources unless you know what you are doing.

Most full-text variables described in this section must be set at server startup time. A server restart is required to change them; they cannot be modified while the server is running.

Some variable changes require that you rebuild the FULLTEXT indexes in your tables. Instructions for doing so are given later in this section.

Configuring Minimum and Maximum Word Length

The minimum and maximum lengths of words to be indexed are defined by the innodb_ft_min_token_size andinnodb_ft_max_token_size for InnoDB search indexes, and ft_min_word_len and ft_max_word_len forMyISAM ones.

Note

Minimum and maximum word length full-text parameters do not apply to FULLTEXT indexes created using the ngram parser. ngram token size is defined by the ngram_token_size option.

After changing any of these options, rebuild your FULLTEXT indexes for the change to take effect. For example, to make two-character words searchable, you could put the following lines in an option file:

1
2
3
[mysqld]
innodb_ft_min_token_size=2
ft_min_word_len=2

Then restart the server and rebuild your FULLTEXT indexes. For MyISAM tables, note the remarks regarding[myisamchk] in the instructions that follow for rebuilding MyISAM full-text indexes.

Configuring the Natural Language Search Threshold

For MyISAM search indexes, the 50% threshold for natural language searches is determined by the particular weighting scheme chosen. To disable it, look for the following line in storage/myisam/ftdefs.h:

1
#define GWS_IN_USE GWS_PROB

Change that line to this:

1
#define GWS_IN_USE GWS_FREQ

Then recompile MySQL. There is no need to rebuild the indexes in this case.

Note

By making this change, you severely decrease MySQL’s ability to provide adequate relevance values for the MATCH() function. If you really need to search for such common words, it would be better to search using IN BOOLEAN MODE instead, which does not observe the 50% threshold.

Modifying Boolean Full-Text Search Operators

To change the operators used for boolean full-text searches on MyISAM tables, set the ft_boolean_syntax system variable. (InnoDB does not have an equivalent setting.) This variable can be changed while the server is running, but you must have the SUPER privilege to do so. No rebuilding of indexes is necessary in this case. See[Section 6.1.5, “Server System Variables”], which describes the rules governing how to set this variable.

Character Set Modifications

For the built-in full-text parser, you can change the set of characters that are considered word characters in several ways, as described in the following list. After making the modification, rebuild the indexes for each table that contains any FULLTEXT indexes. Suppose that you want to treat the hyphen character (‘-‘) as a word character. Use one of these methods:

-
Modify the MySQL source: In storage/innobase/handler/ha_innodb.cc (for InnoDB), or instorage/myisam/ftdefs.h (for MyISAM), see the true_word_char() and misc_word_char() macros. Add '-'to one of those macros and recompile MySQL.

-
Modify a character set file: This requires no recompilation. The true_word_char() macro uses a “character type”table to distinguish letters and numbers from other characters. . You can edit the contents of the <ctype><map>array in one of the character set XML files to specify that '-' is a “letter.” Then use the given character set for yourFULLTEXT indexes. For information about the <ctype><map> array format, see [Section 11.3.1, “Character Definition Arrays”].

-
Add a new collation for the character set used by the indexed columns, and alter the columns to use that collation. For general information about adding collations, see [Section 11.4, “Adding a Collation to a Character Set”]. For an example specific to full-text indexing, see [Section 13.9.7, “Adding a Collation for Full-Text Indexing”].

Rebuilding InnoDB Full-Text Indexes

If you modify full-text variables that affect indexing (innodb_ft_min_token_size, innodb_ft_max_token_size,innodb_ft_server_stopword_table, innodb_ft_user_stopword_table, innodb_ft_enable_stopword,ngram_token_size you must rebuild your FULLTEXT indexes after making the changes. Modifying theinnodb_ft_min_token_size, innodb_ft_max_token_size, or ngram_token_size variables, which cannot be set dynamically, require restarting the server and rebuilding the indexes.

To rebuild the FULLTEXT indexes for an InnoDB table, use ALTER TABLE with the DROP INDEX and ADD INDEXoptions to drop and re-create each index.

Optimizing InnoDB Full-Text Indexes

Running OPTIMIZE TABLE on a table with a full-text index rebuilds the full-text index, removing deleted Document IDs and consolidating multiple entries for the same word, where possible.

To optimize a full-text index, enable innodb_optimize_fulltext_only and run OPTIMIZE TABLE.

1
2
3
4
5
6
7
8
9
10
mysql> set GLOBAL innodb_optimize_fulltext_only=ON;
Query OK, 0 rows affected (0.01 sec)

mysql> OPTIMIZE TABLE opening_lines;
+--------------------+----------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+--------------------+----------+----------+----------+
| test.opening_lines | optimize | status | OK |
+--------------------+----------+----------+----------+
1 row in set (0.01 sec)

To avoid lengthy rebuild times for full-text indexes on large tables, you can use theinnodb_ft_num_word_optimize option to perform the optimization in stages. Theinnodb_ft_num_word_optimize option defines the number of words that are optimized each time OPTIMIZE TABLE is run. The default setting is 2000, which means that 2000 words are optimized each time OPTIMIZE TABLEis run. Subsequent OPTIMIZE TABLE operations continue from where the preceding OPTIMIZE TABLE operation ended.

Rebuilding MyISAM Full-Text Indexes

If you modify full-text variables that affect indexing (ft_min_word_len, ft_max_word_len, or ft_stopword_file), or if you change the stopword file itself, you must rebuild your FULLTEXT indexes after making the changes and restarting the server.

To rebuild the FULLTEXT indexes for a MyISAM table, it is sufficient to do a QUICK repair operation:

1
mysql> REPAIR TABLE *`tbl_name`* QUICK;

Alternatively, use ALTER TABLE as just described. In some cases, this may be faster than a repair operation.

Each table that contains any FULLTEXT index must be repaired as just shown. Otherwise, queries for the table may yield incorrect results, and modifications to the table will cause the server to see the table as corrupt and in need of repair.

If you use [myisamchk] to perform an operation that modifies MyISAM table indexes (such as repair or analyze), theFULLTEXT indexes are rebuilt using the default full-text parameter values for minimum word length, maximum word length, and stopword file unless you specify otherwise. This can result in queries failing.

The problem occurs because these parameters are known only by the server. They are not stored in MyISAM index files. To avoid the problem if you have modified the minimum or maximum word length or stopword file values used by the server, specify the same ft_min_word_len, ft_max_word_len, and ft_stopword_file values for[myisamchk] that you use for [mysqld]. For example, if you have set the minimum word length to 3, you can repair a table with [myisamchk] like this:

shell> myisamchk –recover –ft_min_word_len=3 tbl_name.MYI

To ensure that [myisamchk] and the server use the same values for full-text parameters, place each one in both the [mysqld] and [myisamchk] sections of an option file:

1
2
3
4
5
[mysqld]
ft_min_word_len=3

[myisamchk]
ft_min_word_len=3

An alternative to using [myisamchk] for MyISAM table index modification is to use the REPAIR TABLE, ANALYZE TABLE, OPTIMIZE TABLE, or ALTER TABLE statements. These statements are performed by the server, which knows the proper full-text parameter values to use.

7. Adding a Collation for Full-Text Indexing

This section describes how to add a new collation for full-text searches using the built-in full-text parser. The sample collation is like latin1_swedish_ci but treats the '-' character as a letter rather than as a punctuation character so that it can be indexed as a word character. General information about adding collations is given in [Section 11.4, “Adding a Collation to a Character Set”]; it is assumed that you have read it and are familiar with the files involved.

To add a collation for full-text indexing, use this procedure:

1.
Add a collation to the Index.xml file. The collation ID must be unused, so choose a value different from 1000 if that ID is already taken on your system.

2.
Declare the sort order for the collation in the latin1.xml file. In this case, the order can be copied fromlatin1_swedish_ci:

00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F

10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F

20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F

30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F

40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F

50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F

60 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F

50 51 52 53 54 55 56 57 58 59 5A 7B 7C 7D 7E 7F

80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F

90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F

A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF

B0 B1 B2 B3 B4 B5 B6 B7 B8 B9 BA BB BC BD BE BF

41 41 41 41 5C 5B 5C 43 45 45 45 45 49 49 49 49

44 4E 4F 4F 4F 4F 5D D7 D8 55 55 55 59 59 DE DF

41 41 41 41 5C 5B 5C 43 45 45 45 45 49 49 49 49

44 4E 4F 4F 4F 4F 5D F7 D8 55 55 55 59 59 DE FF

3.
Modify the ctype array in latin1.xml. Change the value corresponding to 0x2D (which is the code for the '-'character) from 10 (punctuation) to 01 (small letter). In the following array, this is the element in the fourth row down, third value from the end.

00

20 20 20 20 20 20 20 20 20 28 28 28 28 28 20 20

20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20

48 10 10 10 10 10 10 10 10 10 10 10 10 01 10 10

84 84 84 84 84 84 84 84 84 84 10 10 10 10 10 10

10 81 81 81 81 81 81 01 01 01 01 01 01 01 01 01

01 01 01 01 01 01 01 01 01 01 01 10 10 10 10 10

10 82 82 82 82 82 82 02 02 02 02 02 02 02 02 02

02 02 02 02 02 02 02 02 02 02 02 10 10 10 10 20

10 00 10 02 10 10 10 10 10 10 01 10 01 00 01 00

00 10 10 10 10 10 10 10 10 10 02 10 02 00 02 01

48 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10

10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10

01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01

01 01 01 01 01 01 01 10 01 01 01 01 01 01 01 02

02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02

02 02 02 02 02 02 02 10 02 02 02 02 02 02 02 02

4.
Restart the server.

5.
To employ the new collation, include it in the definition of columns that are to use it:

1
2
3
4
5
6
7
8
mysql> DROP TABLE IF EXISTS t1;
Query OK, 0 rows affected (0.13 sec)

mysql> CREATE TABLE t1 (
a TEXT CHARACTER SET latin1 COLLATE latin1_fulltext_ci,
FULLTEXT INDEX(a)
) ENGINE=InnoDB;
Query OK, 0 rows affected (0.47 sec)

6.Test the collation to verify that hyphen is considered as a word character:

1
2
3
4
5
6
7
8
9
10
11
mysql> INSERT INTO t1 VALUEs ('----'),('....'),('abcd');
Query OK, 3 rows affected (0.22 sec)
Records: 3 Duplicates: 0 Warnings: 0

mysql> SELECT * FROM t1 WHERE MATCH a AGAINST ('----' IN BOOLEAN MODE);
+------+
| a |
+------+
| ---- |
+------+
1 row in set (0.00 sec)

8. ngram Full-Text Parser

The built-in MySQL full-text parser uses the white space between words as a delimiter to determine where words begin and end, which is a limitation when working with ideographic languages that do not use word delimiters. To address this limitation, MySQL provides an ngram full-text parser that supports Chinese, Japanese, and Korean (CJK). The ngram full-text parser is supported for use with InnoDB and MyISAM.

Note

MySQL also provides a MeCab full-text parser plugin for Japanese, which tokenizes documents into meaningful words. For more information, see [Section 13.9.9, “MeCab Full-Text Parser Plugin”].

An ngram is a contiguous sequence of n characters from a given sequence of text. The ngram parser tokenizes a sequence of text into a contiguous sequence of n characters. For example, you can tokenize “abcd” for different values of n using the ngram full-text parser.

n=1: ‘a’, ‘b’, ‘c’, ‘d’

n=2: ‘ab’, ‘bc’, ‘cd’

n=3: ‘abc’, ‘bcd’

n=4: ‘abcd’

The ngram full-text parser, introduced in MySQL 5.7.6, is a built-in server plugin. As with other built-in server plugins, it is automatically loaded when the server is started.

The full-text search syntax described in [Section 13.9, “Full-Text Search Functions”] applies to the ngram parser plugin. Differences in parsing behavior are described in this section. Full-text-related configuration options, except for minimum and maximum word length options (innodb_ft_min_token_size, innodb_ft_max_token_size,ft_min_word_len, ft_max_word_len) are also applicable.

Configuring ngram Token Size

The ngram parser has a default ngram token size of 2 (bigram). For example, with a token size of 2, the ngram parser parses the string “abc def” into four tokens: “ab”, “bc”, “de” and “ef”.

ngram token size is configurable using the ngram_token_size configuration option, which has a minimum value of 1 and maximum value of 10.

Typically, ngram_token_size is set to the size of the largest token that you want to search for. If you only intend to search for single characters, set ngram_token_size to 1. A smaller token size produces a smaller full-text search index, and faster searches. If you need to search for words comprised of more than one character, setngram_token_size accordingly. For example, “Happy Birthday” is “生日快乐” in simplified Chinese, where “生日” is “birthday”, and “快乐” translates as “happy”. To search on two-character words such as these, setngram_token_size to a value of 2 or higher.

As a read-only variable, ngram_token_size may only be set as part of a startup string or in a configuration file:

  • Startup string:
1
mysqld --ngram_token_size=2
  • Configuration file:
1
2
[mysqld]
ngram_token_size=2

Note

The following minimum and maximum word length configuration options are ignored for FULLTEXT indexes that use the ngram parser: innodb_ft_min_token_size, innodb_ft_max_token_size, ft_min_word_len, andft_max_word_len.

Creating a FULLTEXT Index that Uses the ngram Parser

To create a FULLTEXT index that uses the ngram parser, specify WITH PARSER ngram with CREATE TABLE, ALTER TABLE, or CREATE INDEX.

The following example demonstrates creating a table with an ngram FULLTEXT index, inserting sample data (Simplified Chinese text), and viewing tokenized data in the INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHEtable.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
mysql> USE test;

mysql> CREATE TABLE articles (
id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
title VARCHAR(200),
body TEXT,
FULLTEXT (title,body) WITH PARSER ngram
) ENGINE=InnoDB CHARACTER SET utf8mb4;

mysql> SET NAMES utf8mb4;

INSERT INTO articles (title,body) VALUES
('数据库管理','在本教程中我将向你展示如何管理数据库'),
('数据库应用开发','学习开发数据库应用程序');

mysql> SET GLOBAL innodb_ft_aux_table="test/articles";

mysql> SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE ORDER BY doc_id, position;

To add a FULLTEXT index to an existing table, you can use ALTER TABLE or CREATE INDEX. For example:

1
2
3
4
5
6
7
CREATE TABLE articles (
id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
title VARCHAR(200),
body TEXT
) ENGINE=InnoDB CHARACTER SET utf8;

ALTER TABLE articles ADD FULLTEXT INDEX ft_index (title,body) WITH PARSER ngram;

Or:

1
CREATE FULLTEXT INDEX ft_index ON articles (title,body) WITH PARSER ngram;

ngram Parser Space Handling

The ngram parser eliminates spaces when parsing. For example:

-
“ab cd” is parsed to “ab”, “cd”

-
“a bc” is parsed to “bc”

ngram Parser Stopword Handling

The built-in MySQL full-text parser compares words to entries in the stopword list. If a word is equal to an entry in the stopword list, the word is excluded from the index. For the ngram parser, stopword handling is performed differently. Instead of excluding tokens that are equal to entries in the stopword list, the ngram parser excludes tokens that contain stopwords. For example, assuming ngram_token_size=2, a document that contains “a,b” is parsed to “a,” and “,b”. If a comma (“,”) is defined as a stopword, both “a,” and “,b” are excluded from the index because they contain a comma.

By default, the ngram parser uses the default stopword list, which contains a list of English stopwords. For a stopword list applicable to Chinese, Japanese, or Korean, you must create your own. For information about creating a stopword list, see [Section 13.9.4, “Full-Text Stopwords”].

Stopwords greater in length than ngram_token_size are ignored.

For natural language mode search, the search term is converted to a union of ngram terms. For example, the string “abc” (assuming ngram_token_size=2) is converted to “ab bc”. Given two documents, one containing “ab” and the other containing “abc”, the search term “ab bc” matches both documents.

For boolean mode search, the search term is converted to an ngram phrase search. For example, the string ‘abc’ (assuming ngram_token_size=2) is converted to ‘“ab bc”’. Given two documents, one containing ‘ab’ and the other containing ‘abc’, the search phrase ‘“ab bc”’ only matches the document containing ‘abc’.

Because an ngram FULLTEXT index contains only ngrams, and does not contain information about the beginning of terms, wildcard searches may return unexpected results. The following behaviors apply to wildcard searches using ngram FULLTEXT search indexes:

-
If the prefix term of a wildcard search is shorter than ngram token size, the query returns all indexed rows that contain ngram tokens starting with the prefix term. For example, assuming ngram_token_size=2, a search on “a*”returns all rows starting with “a”.

-
If the prefix term of a wildcard search is longer than ngram token size, the prefix term is converted to an ngram phrase and the wildcard operator is ignored. For example, assuming ngram_token_size=2, an “abc*” wildcard search is converted to “ab bc”.

Phrase searches are converted to ngram phrase searches. For example, The search phrase “abc” is converted to“ab bc”, which returns documents containing “abc” and “ab bc”.

The search phrase “abc def” is converted to “ab bc de ef”, which returns documents containing “abc def” and “ab bc de ef”. A document that contains “abcdef” is not returned.

9. MeCab Full-Text Parser Plugin

The built-in MySQL full-text parser uses the white space between words as a delimiter to determine where words begin and end, which is a limitation when working with ideographic languages that do not use word delimiters. To address this limitation for Japanese, MySQL provides a MeCab full-text parser plugin. The MeCab full-text parser plugin is supported for use with InnoDB and MyISAM.

Note

MySQL also provides an ngram full-text parser plugin that supports Japanese. For more information, see[Section 13.9.8, “ngram Full-Text Parser”].

The MeCab full-text parser plugin, introduced in MySQL 5.7.6, is a full-text parser plugin for Japanese that tokenizes a sequence of text into meaningful words. For example, MeCab tokenizes “データベース管理” (“Database Management”) into “データベース” (“Database”) and “管理” (“Management”). By comparison, the [ngram] full-text parser tokenizes text into a contiguous sequence of n characters, where n represents a number between 1 and 10.

In addition to tokenizing text into meaningful words, MeCab indexes are typically smaller than ngram indexes, and MeCab full-text searches are generally faster. One drawback is that it may take longer for the MeCab full-text parser to tokenize documents, compared to the ngram full-text parser.

The full-text search syntax described in [Section 13.9, “Full-Text Search Functions”] applies to the MeCab parser plugin. Differences in parsing behavior are described in this section. Full-text related configuration options are also applicable.

For additional information about the MeCab parser, refer to the MeCab Documentation on the Google Developerssite.

Installing the MeCab Parser Plugin

The MeCab parser plugin requires mecab and mecab-ipadic.

On supported Fedora, Debian and Ubuntu platforms (except Ubuntu 12.04 where the system mecab version is too old), MySQL dynamically links to the system mecab installation if it is installed to the default location. On other supported Unix-like platforms, libmecab.so is statically linked in libpluginmecab.so, which is located in the MySQL plugin directory. mecab-ipadic is included in MySQL binaries and is located in *MYSQL_HOME*\lib\mecab.

You can install mecab and mecab-ipadic using a native package management utility (on Fedora, Debian, and Ubuntu), or you can build mecab and mecab-ipadic from source. For information about installing mecab andmecab-ipadic using a native package management utility, see [Installing MeCab From a Binary Distribution (Optional)]”). If you want to build mecab and mecab-ipadic from source, see [Building MeCab From Source (Optional)]”).

On Windows, libmecab.dll is found in the MySQL bin directory. mecab-ipadic is located in*MYSQL_HOME*/lib/mecab.

To install and configure the MeCab parser plugin, perform the following steps:

1.
In the MySQL configuration file, set the mecab_rc_file configuration option to the location of the mecabrcconfiguration file, which is the configuration file for MeCab. If you are using the MeCab package distributed with MySQL, the mecabrc file is located in MYSQL_HOME/lib/mecab/etc/.

1
2
[mysqld]
loose-mecab-rc-file=MYSQL_HOME/lib/mecab/etc/mecabrc


The loose prefix is an [option modifier]. The mecab_rc_file option is not recognized by MySQL until the MeCaB parser plugin is installed but it must be set before attempting to install the MeCaB parser plugin. The loose prefix allows you restart MySQL without encountering an error due to an unrecognized variable.

If you use your own MeCab installation, or build MeCab from source, the location of the mecabrc configuration file may differ.

For information about the MySQL configuration file and its location, see [Section 5.2.6, “Using Option Files”].

2.
Also in the MySQL configuration file, set the minimum token size to 1 or 2, which are the values recommended for use with the MeCab parser. For InnoDB tables, minimum token size is defined by the innodb_ft_min_token_sizeconfiguration option, which has a default value of 3. For MyISAM tables, minimum token size is defined by ft_min_word_len, which has a default value of 4.

[mysqld]

innodb_ft_min_token_size=1

3.
Modify the mecabrc configuration file to specify the dictionary you want to use. The mecab-ipadic package distributed with MySQL binaries includes three dictionaries (ipadic_euc-jp, ipadic_sjis, and ipadic_utf-8). The mecabrc configuration file packaged with MySQL contains and entry similar to the following:

dicdir =  /path/to/mysql/lib/mecab/lib/mecab/dic/ipadic_euc-jp

To use the ipadic_utf-8 dictionary, for example, modify the entry as follows:

dicdir=MYSQL_HOME/lib/mecab/dic/ipadic_utf-8

If you are using your own MeCab installation or have built MeCab from source, the default dicdir entry in the mecabrc file will differ, as will the dictionaries and their location.

Note

After the MeCab parser plugin is installed, you can use the mecab_charset status variable to view the character set used with MeCab. The three MeCab dictionaries provided with the MySQL binary support the following character sets.

-
The ipadic_euc-jp dictionary supports the ujis and eucjpms character sets.

-
The ipadic_sjis dictionary supports the sjis and cp932 character sets. cp932 support was added in MySQL 5.7.7.

-
The ipadic_utf-8 dictionary supports the utf8 and utf8mb4 character sets. utf8mb4 support was added in MySQL 5.7.7.

mecab_charset``ipadic_utf-8``utf8``utf8mb4``mecab_charset``utf8
4.
Restart MySQL.

5.
Install the MeCab parser plugin:

The MeCab parser plugin is installed using INSTALL PLUGIN syntax. The plugin name is mecab, and the shared library name is libpluginmecab.so. For additional information about installing plugins, see [Section 6.5.2, “Installing and Uninstalling Plugins”].

INSTALL PLUGIN mecab SONAME ‘libpluginmecab.so’;

Once installed, the MeCab parser plugin loads at every normal MySQL restart.

6.
Verify that the MeCab parser plugin is loaded using the SHOW PLUGINS statement.

1
mysql> SHOW PLUGINS;


A mecab plugin should appear in the list of plugins.

Creating a FULLTEXT Index that uses the MeCab Parser

To create a FULLTEXT index that uses the mecab parser, specify WITH PARSER ngram with CREATE TABLE, ALTER TABLE, or CREATE INDEX.

This example demonstrates creating a table with a mecab FULLTEXT index, inserting sample data, and viewing tokenized data in the INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE table:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
mysql> USE test;

mysql> CREATE TABLE articles (
id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
title VARCHAR(200),
body TEXT,
FULLTEXT (title,body) WITH PARSER mecab
) ENGINE=InnoDB CHARACTER SET utf8;

mysql> SET NAMES utf8;

mysql> INSERT INTO articles (title,body) VALUES
('データベース管理','このチュートリアルでは、私はどのようにデータベースを管理する方法を紹介します'),
('データベースアプリケーション開発','データベースアプリケーションを開発することを学ぶ');

mysql> SET GLOBAL innodb_ft_aux_table="test/articles";

mysql> SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_CACHE ORDER BY doc_id, position;

To add a FULLTEXT index to an existing table, you can use ALTER TABLE or CREATE INDEX. For example:

1
2
3
4
5
6
7
CREATE TABLE articles (
id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
title VARCHAR(200),
body TEXT
) ENGINE=InnoDB CHARACTER SET utf8;

ALTER TABLE articles ADD FULLTEXT INDEX ft_index (title,body) WITH PARSER mecab;

Or:

1
CREATE FULLTEXT INDEX ft_index ON articles (title,body) WITH PARSER mecab;

MeCab Parser Space Handling

The MeCab parser uses spaces as separators in query strings. For example, the MeCab parser tokenizes ‘データベース管理’ as ‘データベース’ and ‘管理’.

MeCab Parser Stopword Handling

By default, the MeCab parser uses the default stopword list, which contains a short list of English stopwords. For a stopword list applicable to Japanese, you must create your own. For information about creating stopword lists, see[Section 13.9.4, “Full-Text Stopwords”].

For natural language mode search, the search term is converted to a union of tokens. For example, ‘データベース管理’ is converted to ‘データベース 管理’ .

1
SELECT COUNT(*) FROM articles WHERE MATCH(title,body) AGAINST('データベース管理' IN NATURAL LANGUAGE MODE);

For boolean mode search, the search term is converted to a search phrase. For example, ‘データベース管理’ is converted to ‘“データベース 管理”‘.

1
SELECT COUNT(*) FROM articles WHERE MATCH(title,body) AGAINST('データベース管理' IN BOOLEAN MODE);

Wildcard search terms are not tokenized. A search on ‘データベース管理*’ is performed on the prefix, ‘データベース管理’.

1
SELECT COUNT(*) FROM articles WHERE MATCH(title,body) AGAINST('データベース*' IN BOOLEAN MODE);

Phrases are tokenized. For example, “データベース管理” is tokenized as “データベース 管理”.

SELECT COUNT(*) FROM articles WHERE MATCH(title,body) AGAINST(‘“データベース管理”‘ IN BOOLEAN MODE);

Installing MeCab From a Binary Distribution (Optional)

This section describes how to install mecab and mecab-ipadic from a binary distribution using a native package management utility. For example, on Fedora, you can use Yum to perform the installation:

1
yum mecab-devel

On Debian or Ubuntu, you can perform an APT installation:

1
2
apt-get install mecab
apt-get install mecab-ipadic

Installing MeCab From Source (Optional)

If you want to build mecab and mecab-ipadic from source, basic installation steps are provided below. For additional information, refer to the MeCab documentation.

1.
Download the tar.gz packages for mecab and mecab-ipadic from http://taku910.github.io/mecab/#download. As of February, 2016, the latest available packages are mecab-0.996.tar.gz and mecab-ipadic-2.7.0-20070801.tar.gz.

2.
Install mecab:

1
2
3
4
5
6
7
tar zxfv mecab-0.996.tar
cd mecab-0.996
./configure
make
make check
su
make install

3.
Install mecab-ipadic:

1
2
3
4
5
6
tar zxfv mecab-ipadic-2.7.0-20070801.tar
cd mecab-ipadic-2.7.0-20070801
./configure
make
su
make install

4.
Compile MySQL using the WITH_MECAB CMake option. Set the WITH_MECAB option to system if you have installedmecab and mecab-ipadic to the default location.

-DWITH_MECAB=system

If you defined a custom installation directory, set WITH_MECAB to the custom directory. For example:

-DWITH_MECAB=/path/to/mecab


原文地址 : [mysql] 13.9 Full-Text 搜索函数
本站是作者语雀文档的镜像站, 如对文章有任何疑问请移步语雀进行 提问

Mysql 二进制日志 binlog 介绍以及使用

介绍

binlog 就是binary log,二进制日志文件,这个文件记录了MySQL所有的DML操作。通过binlog日志我们可以做数据恢复,增量备份,主主复制和主从复制等等。对于开发者可能对binlog并不怎么关注,但是对于运维或者架构人员来讲是非常重要的。

开启

MySQL 5.7这个版本默认是不开启binlog日志的,具体的开启方式如下

my.conf主配置文件中直接添加三行

1
2
3
4
5
[mysqld]
log_bin=mysql-bin

# for mysql 5.7+
server-id=1
1
log-bin=mysql-bin

mysql 会根据这个配置自动设置 log_bin 为 on 状态,自动设置 log_bin_index 文件为你指定的文件名后跟 .index

server-id 这个参数对于 5.7+ 版本是必须指定的, 为了便于区分, 一般主服务器会设置为 1, 其他服务器根据数量多少以及网段来进行相应的定义.

配置好了之后重启 mysql 便可以查看二进制日志

1
2
3
4
5
6
7
8
9
10
11
mysql root@192.168.1.21:(none)> show variables like "%log_bin%"
+---------------------------------+--------------------------------+
| Variable_name | Value |
+---------------------------------+--------------------------------+
| log_bin | ON |
| log_bin_basename | /var/lib/mysql/mysql-bin |
| log_bin_index | /var/lib/mysql/mysql-bin.index |
| log_bin_trust_function_creators | OFF |
| log_bin_use_v1_row_events | OFF |
| sql_log_bin | ON |
+---------------------------------+--------------------------------+

下面我们可以找到这个目录 /var/lib/mysql 来看一下

1
2
3
4
5
6
...
-rw-r----- 1 mysql mysql 2729 3月 26 11:38 mysql-bin.000001
-rw-r----- 1 mysql mysql 188510 3月 26 11:44 mysql-bin.000002
-rw-r----- 1 mysql mysql 97946 3月 26 11:46 mysql-bin.000003
-rw-r----- 1 mysql mysql 83 3月 26 11:44 mysql-bin.index
...

至此便开启了 binlog

查询主服务器的状态

1
2
3
4
5
6
mysql root@192.168.1.21:(none)> show master status
+------------------+----------+--------------+------------------+-------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000003 | 5765231 | | | |
+------------------+----------+--------------+------------------+-------------------+

查询当前 binlog 的事件状态

1
2
3
4
5
6
7
8
9
10
11
12
13
mysql root@192.168.1.21:(none)> show binlog events
+------------------+------+----------------+-----------+-------------+---------------------------------------+
| Log_name | Pos | Event_type | Server_id | End_log_pos | Info |
+------------------+------+----------------+-----------+-------------+---------------------------------------+
| mysql-bin.000001 | 4 | Format_desc | 1 | 123 | Server ver: 5.7.29-log, Binlog ver: 4 |
| mysql-bin.000001 | 123 | Previous_gtids | 1 | 154 | |
| mysql-bin.000001 | 154 | Anonymous_Gtid | 1 | 219 | SET @@SESSION.GTID_NEXT= 'ANONYMOUS' |
| mysql-bin.000001 | 219 | Query | 1 | 294 | BEGIN |
| mysql-bin.000001 | 294 | Table_map | 1 | 653 | table_id: 109 (play_v1.user_profile) |
| mysql-bin.000001 | 653 | Update_rows | 1 | 2675 | table_id: 109 flags: STMT_END_F |
| mysql-bin.000001 | 2675 | Xid | 1 | 2706 | COMMIT /* xid=13 */ |
| mysql-bin.000001 | 2706 | Stop | 1 | 2729 | |
+------------------+------+----------------+-----------+-------------+---------------------------------------+

参考


原文地址 : Mysql 二进制日志 binlog 介绍以及使用
本站是作者语雀文档的镜像站, 如对文章有任何疑问请移步语雀进行 提问

mysql 模式匹配

匹配规则

模式 描述
. 匹配任意单个字符
[] 匹配括号内的任何字符, 也可以匹配一定范围的字符, [0-9] 数字
* 匹配该符号前面字符的零个或多个实例
+ 匹配该符号前面字符的一个或多个实例
? 匹配该符号前面字符的零个或多个
^ 跟在符号后边的内容必须为值的开头
$ 跟在符号前边的内容必须为值的结尾
&#124; 或匹配
{n} 重复N次
{n,m} n-m次

示例

like

1
2
3
4
# 不区分大小写
[SELECT caluse] WHERE column LIKE '%JAVA';
# 要想找出正好包含n个字符的名字,使用“_”模式字符:
[SELECT caluse] WHERE column LIKE '____';

REGEXP

1
2
3
4
5
6
# 文章中有 v 的, 不区分大小写
[SELECT caluse] WHERE column REGEXP 'v';
# v开始
[SELECT caluse] WHERE column REGEXP '^v';
# v结束
[SELECT caluse] WHERE column REGEXP 'v$';

MATCH AGAINST

1
2
3
4
SELECT url FROM document WHERE MATCH(title, content) AGAINST ('java');

# IN BOOLEAN MODE, 它使用的语法与搜索引擎使用的语法相同
SELECT url FROM document WHERE MATCH(content) AGAINST ('+java -MYSQL' IN BOOLEAN MODE );

运算符

符号 说明
+ 单词必须在被匹配的索引中
- 单词不允许出现在被匹配的索引中
~ 单词的相关程度逆置
< 降低该单词对整个匹配模式的贡献
> 提高该单词对整个匹配模式的贡献
* 该运算符紧跟在所修饰的单词之后, 它于通配符很相似, 匹配以指定的单词开头的任意单词
() 将单词分组为子表达式

原文地址 : mysql 模式匹配
本站是作者语雀文档的镜像站, 如对文章有任何疑问请移步语雀进行 提问

mysql 事务

事务处理在各种管理系统中都有着广泛的应用,比如人员管理系统,很多同步数据库操作大都需要用到事务处理。比如说,在人员管理系统中,你删除一个人员,你即需要删除人员的基本资料,也要删除和该人员相关的信息,如信箱,文章等等,这样,这些数据库操作语句就构成一个事务!
删除的SQL语句

阅读更多

mysql 运算符

运算符

运算符 上下文 描述
+ 算术 加法
- 算术 减法
* 算术 乘法
/ 算术 除法
= 比较 相等
<>, != 比较 不相等
< 比较 小于
> 比较 大于
<= 比较 小于或等于
>= 比较 大于或等于
AND 逻辑
OR 逻辑
NOT 逻辑

运算符优先级

  1. BINARY

  2. NOT, !

    • (负号)
  3. *, /, %

  4. +, -

  5. <<, >>

  6. &

  7. |

  8. <, <=, >, >=, =, <=>, <>, IN, IS, LIKE, REGEXP, RLIKE

  9. BETWEEN

  10. AND, &&

  11. OR, ||

null 运算符

1
SELECT title FROM books WHERE author IS NOT NULL

原文地址 : mysql 运算符
本站是作者语雀文档的镜像站, 如对文章有任何疑问请移步语雀进行 提问

[转]MySQL配置文件my.cnf 例子最详细翻译

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
#BEGIN CONFIG INFO
#DESCR: 4GB RAM, 只使用InnoDB, ACID, 少量的连接, 队列负载大
#TYPE: SYSTEM
#END CONFIG INFO

#
# 此mysql配置文件例子针对4G内存。
# 主要使用INNODB
#处理复杂队列并且连接数量较少的mysql服务器
#
# 将此文件复制到/etc/my.cnf 作为全局设置,
# mysql-data-dir/my.cnf 作为服务器指定设置
# (@localstatedir@ for this installation) 或者放入
# ~/.my.cnf 作为用户设置.
#
# 在此配置文件中, 你可以使用所有程序支持的长选项.
# 如果想获悉程序支持的所有选项
# 请在程序后加上"--help"参数运行程序.
#
# 关于独立选项更多的细节信息可以在手册内找到
#

#
# 以下选项会被MySQL客户端应用读取.
# 注意只有MySQL附带的客户端应用程序保证可以读取这段内容.
# 如果你想你自己的MySQL应用程序获取这些值
# 需要在MySQL客户端库初始化的时候指定这些选项

#
[client]
#password = [your_password]
port = @MYSQL_TCP_PORT@
socket = @MYSQL_UNIX_ADDR@

# *** 应用定制选项 ***

#
# MySQL 服务端
#
[mysqld]

# 一般配置选项
port = @MYSQL_TCP_PORT@
socket = @MYSQL_UNIX_ADDR@

# back_log 是操作系统在监听队列中所能保持的连接数,
# 队列保存了在MySQL连接管理器线程处理之前的连接.
# 如果你有非常高的连接率并且出现"connection refused" 报错,
# 你就应该增加此处的值.
# 检查你的操作系统文档来获取这个变量的最大值.
# 如果将back_log设定到比你操作系统限制更高的值,将会没有效果
back_log = 50

# 不在TCP/IP端口上进行监听.
# 如果所有的进程都是在同一台服务器连接到本地的mysqld,
# 这样设置将是增强安全的方法
# 所有mysqld的连接都是通过Unix sockets 或者命名管道进行的.
# 注意在windows下如果没有打开命名管道选项而只是用此项
# (通过 "enable-named-pipe" 选项) 将会导致mysql服务没有任何作用!
#skip-networking

# MySQL 服务所允许的同时会话数的上限
# 其中一个连接将被SUPER权限保留作为管理员登录.
# 即便已经达到了连接数的上限.
max_connections = 100
# 每个客户端连接最大的错误允许数量,如果达到了此限制.
# 这个客户端将会被MySQL服务阻止直到执行了"FLUSH HOSTS" 或者服务重启
# 非法的密码以及其他在链接时的错误会增加此值.
# 查看 "Aborted_connects" 状态来获取全局计数器.
max_connect_errors = 10

# 所有线程所打开表的数量.
# 增加此值就增加了mysqld所需要的文件描述符的数量
# 这样你需要确认在[mysqld_safe]中 "open-files-limit" 变量设置打开文件数量允许至少4096
table_cache = 2048

# 允许外部文件级别的锁. 打开文件锁会对性能造成负面影响
# 所以只有在你在同样的文件上运行多个数据库实例时才使用此选项(注意仍会有其他约束!)
# 或者你在文件层面上使用了其他一些软件依赖来锁定MyISAM表
#external-locking

# 服务所能处理的请求包的最大大小以及服务所能处理的最大的请求大小(当与大的BLOB字段一起工作时相当必要)
# 每个连接独立的大小.大小动态增加
max_allowed_packet = 16M

# 在一个事务中binlog为了记录SQL状态所持有的cache大小
# 如果你经常使用大的,多声明的事务,你可以增加此值来获取更大的性能.
# 所有从事务来的状态都将被缓冲在binlog缓冲中然后在提交后一次性写入到binlog中
# 如果事务比此值大, 会使用磁盘上的临时文件来替代.
# 此缓冲在每个连接的事务第一次更新状态时被创建
binlog_cache_size = 1M

# 独立的内存表所允许的最大容量.
# 此选项为了防止意外创建一个超大的内存表导致永尽所有的内存资源.
max_heap_table_size = 64M

# 排序缓冲被用来处理类似ORDER BY以及GROUP BY队列所引起的排序
# 如果排序后的数据无法放入排序缓冲,
# 一个用来替代的基于磁盘的合并分类会被使用
# 查看 "Sort_merge_passes" 状态变量.
# 在排序发生时由每个线程分配
sort_buffer_size = 8M

# 此缓冲被使用来优化全联合(full JOINs 不带索引的联合).
# 类似的联合在极大多数情况下有非常糟糕的性能表现,
# 但是将此值设大能够减轻性能影响.
# 通过 "Select_full_join" 状态变量查看全联合的数量
# 当全联合发生时,在每个线程中分配
join_buffer_size = 8M

# 我们在cache中保留多少线程用于重用
# 当一个客户端断开连接后,如果cache中的线程还少于thread_cache_size,
# 则客户端线程被放入cache中.
# 这可以在你需要大量新连接的时候极大的减少线程创建的开销
# (一般来说如果你有好的线程模型的话,这不会有明显的性能提升.)
thread_cache_size = 8

# 此允许应用程序给予线程系统一个提示在同一时间给予渴望被运行的线程的数量.
# 此值只对于支持 thread_concurrency() 函数的系统有意义( 例如Sun Solaris).
# 你可可以尝试使用 [CPU数量]*(2..4) 来作为thread_concurrency的值
thread_concurrency = 8

# 查询缓冲常被用来缓冲 SELECT 的结果并且在下一次同样查询的时候不再执行直接返回结果.
# 打开查询缓冲可以极大的提高服务器速度, 如果你有大量的相同的查询并且很少修改表.
# 查看 "Qcache_lowmem_prunes" 状态变量来检查是否当前值对于你的负载来说是否足够高.
# 注意: 在你表经常变化的情况下或者如果你的查询原文每次都不同,
# 查询缓冲也许引起性能下降而不是性能提升.
query_cache_size = 64M

# 只有小于此设定值的结果才会被缓冲
# 此设置用来保护查询缓冲,防止一个极大的结果集将其他所有的查询结果都覆盖.
query_cache_limit = 2M

# 被全文检索索引的最小的字长.
# 你也许希望减少它,如果你需要搜索更短字的时候.
# 注意在你修改此值之后,
# 你需要重建你的 FULLTEXT 索引
ft_min_word_len = 4

# 如果你的系统支持 memlock() 函数,你也许希望打开此选项用以让运行中的mysql在在内存高度紧张的时候,数据在内存中保持锁定并且防止可能被swapping out
# 此选项对于性能有益
#memlock

# 当创建新表时作为默认使用的表类型,
# 如果在创建表示没有特别执行表类型,将会使用此值
default_table_type = MYISAM

# 线程使用的堆大小. 此容量的内存在每次连接时被预留.
# MySQL 本身常不会需要超过64K的内存
# 如果你使用你自己的需要大量堆的UDF函数
# 或者你的操作系统对于某些操作需要更多的堆,
# 你也许需要将其设置的更高一点.
thread_stack = 192K

# 设定默认的事务隔离级别.可用的级别如下:
# READ-UNCOMMITTED, READ-COMMITTED, REPEATABLE-READ, SERIALIZABLE
transaction_isolation = REPEATABLE-READ

# 内部(内存中)临时表的最大大小
# 如果一个表增长到比此值更大,将会自动转换为基于磁盘的表.
# 此限制是针对单个表的,而不是总和.
tmp_table_size = 64M

# 打开二进制日志功能.
# 在复制(replication)配置中,作为MASTER主服务器必须打开此项
# 如果你需要从你最后的备份中做基于时间点的恢复,你也同样需要二进制日志.
log-bin=mysql-bin

# 如果你在使用链式从服务器结构的复制模式 (A->B->C),
# 你需要在服务器B上打开此项.
# 此选项打开在从线程上重做过的更新的日志,
# 并将其写入从服务器的二进制日志.
#log_slave_updates

# 打开全查询日志. 所有的由服务器接收到的查询 (甚至对于一个错误语法的查询)
# 都会被记录下来. 这对于调试非常有用, 在生产环境中常常关闭此项.
#log

# 将警告打印输出到错误log文件. 如果你对于MySQL有任何问题
# 你应该打开警告log并且仔细审查错误日志,查出可能的原因.
#log_warnings

# 记录慢速查询. 慢速查询是指消耗了比 "long_query_time" 定义的更多时间的查询.
# 如果 log_long_format 被打开,那些没有使用索引的查询也会被记录.
# 如果你经常增加新查询到已有的系统内的话. 一般来说这是一个好主意,
log_slow_queries

# 所有的使用了比这个时间(以秒为单位)更多的查询会被认为是慢速查询.
# 不要在这里使用"1", 否则会导致所有的查询,甚至非常快的查询页被记录下来(由于MySQL 目前时间的精确度只能达到秒的级别).
long_query_time = 2

# 在慢速日志中记录更多的信息.
# 一般此项最好打开.
# 打开此项会记录使得那些没有使用索引的查询也被作为到慢速查询附加到慢速日志里
log_long_format

# 此目录被MySQL用来保存临时文件.例如,
# 它被用来处理基于磁盘的大型排序,和内部排序一样.
# 以及简单的临时表.
# 如果你不创建非常大的临时文件,将其放置到 swapfs/tmpfs 文件系统上也许比较好
# 另一种选择是你也可以将其放置在独立的磁盘上.
# 你可以使用";"来放置多个路径
# 他们会按照roud-robin方法被轮询使用.
#tmpdir = /tmp


# *** 复制有关的设置


# 唯一的服务辨识号,数值位于 1 到 2^32-1之间.
# 此值在master和slave上都需要设置.
# 如果 "master-host" 没有被设置,则默认为1, 但是如果忽略此选项,MySQL不会作为master生效.
server-id = 1

# 复制的Slave (去掉master段的注释来使其生效)
#
# 为了配置此主机作为复制的slave服务器,你可以选择两种方法:
#
# 1) 使用 CHANGE MASTER TO 命令 (在我们的手册中有完整描述) -
# 语法如下:
#
# CHANGE MASTER TO MASTER_HOST=<host>, MASTER_PORT=<port>,
# MASTER_USER=<user>, MASTER_PASSWORD=<password> ;
#
# 你需要替换掉 <host>, <user>, <password> 等被尖括号包围的字段以及使用master的端口号替换<port> (默认3306).
#
# 例子:
#
# CHANGE MASTER TO MASTER_HOST='125.564.12.1', MASTER_PORT=3306,
# MASTER_USER='joe', MASTER_PASSWORD='secret';
#
# 或者
#
# 2) 设置以下的变量. 不论如何, 在你选择这种方法的情况下, 然后第一次启动复制(甚至不成功的情况下,
# 例如如果你输入错密码在master-password字段并且slave无法连接),
# slave会创建一个 master.info 文件,并且之后任何对于包含在此文件内的参数的变化都会被忽略
# 并且由 master.info 文件内的内容覆盖, 除非你关闭slave服务, 删除 master.info 并且重启slave 服务.
# 由于这个原因,你也许不想碰一下的配置(注释掉的) 并且使用 CHANGE MASTER TO (查看上面) 来代替
#
# 所需要的唯一id号位于 2 和 2^32 - 1之间
# (并且和master不同)
# 如果master-host被设置了.则默认值是2
# 但是如果省略,则不会生效
#server-id = 2
#
# 复制结构中的master - 必须
#master-host = <hostname>
#
# 当连接到master上时slave所用来认证的用户名 - 必须
#master-user = <username>
#
# 当连接到master上时slave所用来认证的密码 - 必须
#master-password = <password>
#
# master监听的端口.
# 可选 - 默认是3306
#master-port = <port>

# 使得slave只读.只有用户拥有SUPER权限和在上面的slave线程能够修改数据.
# 你可以使用此项去保证没有应用程序会意外的修改slave而不是master上的数据
#read_only


#*** MyISAM 相关选项


# 关键词缓冲的大小, 一般用来缓冲MyISAM表的索引块.
# 不要将其设置大于你可用内存的30%,
# 因为一部分内存同样被OS用来缓冲行数据
# 甚至在你并不使用MyISAM 表的情况下, 你也需要仍旧设置起 8-64M 内存由于它同样会被内部临时磁盘表使用.
key_buffer_size = 32M

# 用来做MyISAM表全表扫描的缓冲大小.
# 当全表扫描需要时,在对应线程中分配.
read_buffer_size = 2M

# 当在排序之后,从一个已经排序好的序列中读取行时,行数据将从这个缓冲中读取来防止磁盘寻道.
# 如果你增高此值,可以提高很多ORDER BY的性能.
# 当需要时由每个线程分配
read_rnd_buffer_size = 16M

# MyISAM 使用特殊的类似树的cache来使得突发插入
# (这些插入是,INSERT SELECT, INSERT VALUES (), (), , 以及 LOAD DATA
# INFILE) 更快. 此变量限制每个进程中缓冲树的字节数.
# 设置为 0 会关闭此优化.
# 为了最优化不要将此值设置大于 "key_buffer_size".
# 当突发插入被检测到时此缓冲将被分配.
bulk_insert_buffer_size = 64M

# 此缓冲当MySQL需要在 REPAIR, OPTIMIZE, ALTER 以及 LOAD DATA INFILE 到一个空表中引起重建索引时被分配.
# 这在每个线程中被分配.所以在设置大值时需要小心.
myisam_sort_buffer_size = 128M

# MySQL重建索引时所允许的最大临时文件的大小 (当 REPAIR, ALTER TABLE 或者 LOAD DATA INFILE).
# 如果文件大小比此值更大,索引会通过键值缓冲创建(更慢)
myisam_max_sort_file_size = 10G

# 如果被用来更快的索引创建索引所使用临时文件大于制定的值,那就使用键值缓冲方法.
# 这主要用来强制在大表中长字串键去使用慢速的键值缓冲方法来创建索引.
myisam_max_extra_sort_file_size = 10G

# 如果一个表拥有超过一个索引, MyISAM 可以通过并行排序使用超过一个线程去修复他们.
# 这对于拥有多个CPU以及大量内存情况的用户,是一个很好的选择.
myisam_repair_threads = 1

# 自动检查和修复没有适当关闭的 MyISAM 表.
myisam_recover


# 默认关闭 Federated
skip-federated

# *** BDB 相关选项 ***

# 如果你运行的MySQL服务有BDB支持但是你不准备使用的时候使用此选项. 这会节省内存并且可能加速一些事.
skip-bdb


# *** INNODB 相关选项 ***

# 如果你的MySQL服务包含InnoDB支持但是并不打算使用的话,
# 使用此选项会节省内存以及磁盘空间,并且加速某些部分
#skip-innodb

# 附加的内存池被InnoDB用来保存 metadata 信息
# 如果InnoDB为此目的需要更多的内存,它会开始从OS这里申请内存.
# 由于这个操作在大多数现代操作系统上已经足够快, 你一般不需要修改此值.
# SHOW INNODB STATUS 命令会显示当先使用的数量.
innodb_additional_mem_pool_size = 16M

# InnoDB使用一个缓冲池来保存索引和原始数据, 不像 MyISAM.
# 这里你设置越大,你在存取表里面数据时所需要的磁盘I/O越少.
# 在一个独立使用的数据库服务器上,你可以设置这个变量到服务器物理内存大小的80%
# 不要设置过大,否则,由于物理内存的竞争可能导致操作系统的换页颠簸.
# 注意在32位系统上你每个进程可能被限制在 2-3.5G 用户层面内存限制,
# 所以不要设置的太高.
innodb_buffer_pool_size = 2G

# InnoDB 将数据保存在一个或者多个数据文件中成为表空间.
# 如果你只有单个逻辑驱动保存你的数据,一个单个的自增文件就足够好了.
# 其他情况下.每个设备一个文件一般都是个好的选择.
# 你也可以配置InnoDB来使用裸盘分区 - 请参考手册来获取更多相关内容
innodb_data_file_path = ibdata1:10M:autoextend

# 设置此选项如果你希望InnoDB表空间文件被保存在其他分区.
# 默认保存在MySQL的datadir中.
#innodb_data_home_dir = <directory>

# 用来同步IO操作的IO线程的数量. This value is
# 此值在Unix下被硬编码为4,但是在Windows磁盘I/O可能在一个大数值下表现的更好.
innodb_file_io_threads = 4

# 如果你发现InnoDB表空间损坏, 设置此值为一个非零值可能帮助你导出你的表.
# 从1开始并且增加此值知道你能够成功的导出表.
#innodb_force_recovery=1

# 在InnoDb核心内的允许线程数量.
# 最优值依赖于应用程序,硬件以及操作系统的调度方式.
# 过高的值可能导致线程的互斥颠簸.
innodb_thread_concurrency = 16

# 如果设置为1 ,InnoDB会在每次提交后刷新(fsync)事务日志到磁盘上,
# 这提供了完整的ACID行为.
# 如果你愿意对事务安全折衷, 并且你正在运行一个小的食物, 你可以设置此值到0或者2来减少由事务日志引起的磁盘I/O
# 0代表日志只大约每秒写入日志文件并且日志文件刷新到磁盘.
# 2代表日志写入日志文件在每次提交后,但是日志文件只有大约每秒才会刷新到磁盘上.
innodb_flush_log_at_trx_commit = 1

# 加速InnoDB的关闭. 这会阻止InnoDB在关闭时做全清除以及插入缓冲合并.
# 这可能极大增加关机时间, 但是取而代之的是InnoDB可能在下次启动时做这些操作.
#innodb_fast_shutdown

# 用来缓冲日志数据的缓冲区的大小.
# 当此值快满时, InnoDB将必须刷新数据到磁盘上.
# 由于基本上每秒都会刷新一次,所以没有必要将此值设置的太大(甚至对于长事务而言)

innodb_log_buffer_size = 8M

# 在日志组中每个日志文件的大小.
# 你应该设置日志文件总合大小到你缓冲池大小的25%~100%
# 来避免在日志文件覆写上不必要的缓冲池刷新行为.
# 不论如何, 请注意一个大的日志文件大小会增加恢复进程所需要的时间.
innodb_log_file_size = 256M

# 在日志组中的文件总数.
# 通常来说2~3是比较好的.
innodb_log_files_in_group = 3

# InnoDB的日志文件所在位置. 默认是MySQL的datadir.
# 你可以将其指定到一个独立的硬盘上或者一个RAID1卷上来提高其性能
#innodb_log_group_home_dir

# 在InnoDB缓冲池中最大允许的脏页面的比例.
# 如果达到限额, InnoDB会开始刷新他们防止他们妨碍到干净数据页面.
# 这是一个软限制,不被保证绝对执行.
innodb_max_dirty_pages_pct = 90

# InnoDB用来刷新日志的方法.
# 表空间总是使用双重写入刷新方法
# 默认值是 "fdatasync", 另一个是 "O_DSYNC".
#innodb_flush_method=O_DSYNC

# 在被回滚前,一个InnoDB的事务应该等待一个锁被批准多久.
# InnoDB在其拥有的锁表中自动检测事务死锁并且回滚事务.
# 如果你使用 LOCK TABLES 指令, 或者在同样事务中使用除了InnoDB以外的其他事务安全的存储引擎
# 那么一个死锁可能发生而InnoDB无法注意到.
# 这种情况下这个timeout值对于解决这种问题就非常有帮助.
innodb_lock_wait_timeout = 120


[mysqldump]
# 不要在将内存中的整个结果写入磁盘之前缓存. 在导出非常巨大的表时需要此项
quick

max_allowed_packet = 16M

[mysql]
no-auto-rehash

# 仅仅允许使用键值的 UPDATEs 和 DELETEs .
#safe-updates

[isamchk]
key_buffer = 512M
sort_buffer_size = 512M
read_buffer = 8M
write_buffer = 8M

[myisamchk]
key_buffer = 512M
sort_buffer_size = 512M
read_buffer = 8M
write_buffer = 8M

[mysqlhotcopy]
interactive-timeout

[mysqld_safe]
# 增加每个进程的可打开文件数量.
# 警告: 确认你已经将全系统限制设定的足够高!
# 打开大量表需要将此值设b
open-files-limit = 8192

原文地址 : [转]MySQL配置文件my.cnf 例子最详细翻译
本站是作者语雀文档的镜像站, 如对文章有任何疑问请移步语雀进行 提问

mysql - 自带的命令行工具说明及用法

MYSQL服务器和服务器启动脚本

-
mysqld

MySQL服务器

-
mysqld_safe, mysql.server

服务器启动脚本

直接运行mysqld程序来启动MySQL服务的方法很少见,mysqld_safe脚本会在启动MySQL服务器后继续监控其运行情况,并在其死机时重新启动它。用mysqld_safe脚本来启动MySQL服务器的做法在BSD风格的unix系统上很常见,非BSD风格的UNIX系统中的mysql.server脚本其实也是调用mysqld_safe脚本去启动MySQL服务器的

-
mysqld-max

扩展

-
mysqld_multi

可以启动或停止系统上安装的多个服务器

-
mysql_install_db

初始化数据目录和初始数据库

访问服务器的客户程序

-
mysql

命令行客户程序,用于交互式或以批处理模式执行SQL语句

-
mysqladmin

用于管理功能的客户程序, 管理数据库服务器, 包括数据库的创建和删除

-
mysqlcheck

执行表维护操作

-
mysqldump

负责数据库备份, 将数据库或表的定义和内容转储到文件

-
mysqlhotcopy

负责数据库热备份

-
mysqlimport

导入数据文件

-
mysqlshow

显示信息数据库和表的相关信息

独立于服务器操作的工具程序

-
myisamchk

描述、检查、优化和维护MyISAM表的实用工具

-
myisampack

产生压缩、只读的表

-
mysqlbinlog

是处理二进制日志文件的实用工具

-
perror

显示错误代码的含义


原文地址 : mysql - 自带的命令行工具说明及用法
本站是作者语雀文档的镜像站, 如对文章有任何疑问请移步语雀进行 提问