Working with Boolean Queries in MySQL/PHP

Boolean mode queries became available in MySQL in version 4, and allow expressions to make use of a complex set of boolean rules to refine their searches. These queries are very powerful when applied to fulltext searching and sorting of results. The MySQL/PHP search tool I've created is very basic, and only takes advantage of the most basic operators in Boolean mode searching. This document goes a bit more into depth on the power available to Boolean mode queries.

The basic format of a boolean mode query (a subset of fulltext queries) is as follows:

SELECT * FROM mytable WHERE MATCH(title,content)

This query performs a SELECT retrieving all fields in "mytable" where the fields "title" and "content" are MATCHed AGAINST your search terms, with the additional specification to place the query into boolean mode. Without the "IN BOOLEAN MODE" modifier, you are performing a natural full text search which will search for your search terms as a phrase. This natural language search is useful, since it applies a list of common "stop words" which it eliminates from your search, as well as ignoring any term present in more than 50 percent of the rows.

All full text searches are aided by providing a FULLTEXT index on the searched table - but this is not required, simply more efficient. Fulltext searches can only be performed on MySQL tables of the type MyISAM.

MySQL's Boolean Operators

These definitions are modified from the MySQL 5.0 manual pages on fulltext boolean searching.

+ (plus)
Indicates that a term is REQUIRED to be present in each returned row.
- (minus)
Indicates that the term must NOT be present in each row returned. Only acts to exclude rows which are matched by other search terms. Used by itself, the minus operator will return an empty set.
(no operator)
If no operator is specified, the term is optional. Its presence, however, will give that row a higher relevancy in the search results.
< (less than) and > (greater than)
Used to increase or decrease the relevancy ranking of a term. > increases the value of that term and < decreases that value.
( ) (parentheses)
Parentheses are used to group subexpressions, and can be nested.
~ (tilde)
Similar to the minus operator, the tilde causes the presence of a term to decrease the relevancy of a returned row, but does not exclude that row from the results.
* (asterisk)
The asterisk is a wildcard operator - it must be appended to a word, instead of preceding it, and will match any term which begins with that sequence of characters.
“ ” (quotation characters)
Quotations will require the enclosed phrase to be matched exactly as typed. Prior to MySQL 5.03, word characters and non-word characters were matched exactly. As of MySQL 5.03 and later, non-word characters no longer need to match exactly. Example: "test phrase" can now return "test,phrase".

Applying MySQL's Boolean Operators

Combining these operators in various ways can provide logically precise or highly relevant searches. For example, in the following query, we want to find records which are about searching, but we're not really interested in discussions of search terms - we're more interested in search engines. However, we know that these terms are frequently grouped together. We don't really want to exclude anything, since these terms appear together quite frequently - but "terms" is definitely less important to us.

SELECT * FROM mytable WHERE MATCH(title,content)
AGAINST('+search* ~term' IN BOOLEAN MODE)

The above query will return all records containing the word "search" including terms stemming from "search", such as "searching", "searched", and so on. Records which also include the word "term" will be included in the search, but will be less relevant. Boolean mode searching does not, however, automatically sort by relevance. In order to order this by relevance, you will need to add an ORDER BY specification to the query:

SELECT * FROM mytable WHERE MATCH(title,content)
AGAINST('+search* ~term' IN BOOLEAN MODE) ORDER BY relevance DESC

The query will now be sorted providing the best results first - that is, those results containing the required phrase the most times and the undesired term the fewest times.

Warning: include(includes/amazon468x60.php) [function.include]: failed to open stream: No such file or directory in /home/joedolso/www/www/boolean-query-in-mysql.php on line 108

Warning: include(includes/amazon468x60.php) [function.include]: failed to open stream: No such file or directory in /home/joedolso/www/www/boolean-query-in-mysql.php on line 108

Warning: include() [function.include]: Failed opening 'includes/amazon468x60.php' for inclusion (include_path='.:/usr/local/php53/lib/php') in /home/joedolso/www/www/boolean-query-in-mysql.php on line 108

A particularly confusing thing to understand is how MySQL Boolean searching applies relevance. In order of relevance, the operators are + (absolutely required), > (preferred), < (less important), [no operator] (optional), ~ (undesirable, but not excluded), and - (excluded entirely). Deciding on your preference between < and ~ can be quite difficult. What's important is that < will increase the relevance of a term, where as the ~ will actually decrease the relevance. The < simply increases it very minimally. Either ~ or - will return no results when used by themselves - < will still return something.

For detailed information on MySQL's boolean relevancy scoring, I'd recommend reading Using Fulltext Indexes in MySQL - Part 2, Boolean searches, by Ian Gilfillan. Part 1 is also valuable, but his discussion of relevancy can be particularly useful!

For Further Reading:

If you want better information on MySQL's advanced uses and techniques, I recommend buying a book such as High Performance MySQL, by Jeremy Zawodny and Derek Balling or the MySQL Cookbook, by Paul DuBois. Either of these will give you great information - the first book is more about specific methods for improving the performance of your MySQL queries, the second will give you valuable solutions to a wide variety of MySQL programming problems.