Do you need more than one second to choose a PHP search engine?


There’s a new article at O’Reilly’s OnLamp.com, “search engine showdown”, that compares search engines for your site. Here’s a short excerpt to give a general idea of what each of them is about:

iSearch has an excellent range of options for the needs of nearly any site, yet the core functions are encrypted and highly unchangeable. Also, in testing, the spider would trap itself in a loop or unreachable page every 20 minutes or so, making a cron-based update most unreliable.


MnogoSearch is quite powerful and versatile, but unlike most of its PHP-minded competitors, it must be compiled before usage and has the most substantial learning curve. It is immediately compatible with every major database, including SQLite, and comes with front ends for PHP, C, and Perl. There is a command-line interface to perform all maintenance and indexing; once you have configured it correctly, it is also useful for automation. It has a wide variety of features, including searches of your site, FTP archive searches, news article and newspaper searches, and more.

PHPDig uses a MySQL database, building a glossary with words from the pages you index. The search result displays the pages ranked by keyword density. Though PHPDig’s fame and clean code would suggest otherwise, this search engine is far from being one of the best available. The indexing speed is quite slow, especially in comparison with MnogoSearch or RiSearch. It’s overflowing with features and plugins for any format of data and has built-in index scheduling routines.
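To make the "ranked by keyword density" idea concrete, here is a toy sketch. It is not PHPDig's actual code (PHPDig is PHP on top of MySQL); it is written in Python, keeps everything in memory, and every name in it (tokenize, build_index, search, the sample pages) is made up for illustration. Density here is assumed to mean occurrences of the query word divided by the total number of words on the page.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each page URL to (per-word counts, total word count).

    This stands in for the 'glossary' of words built while indexing;
    a real engine would store it in a database, not a dict.
    """
    index = {}
    for url, text in pages.items():
        words = tokenize(text)
        index[url] = (Counter(words), len(words))
    return index


def search(index, query):
    """Return (url, density) pairs for pages containing the word, densest first."""
    word = query.lower()
    results = []
    for url, (counts, total) in index.items():
        if total and counts[word]:
            # Assumed definition: share of the page's words that match the query.
            results.append((url, counts[word] / total))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample pages, just to exercise the functions.
pages = {
    "/a": "php search engine for php sites",
    "/b": "a long page that mentions php only once among many other words",
    "/c": "nothing relevant here",
}
index = build_index(pages)
ranking = search(index, "php")
```

With these sample pages, "/a" ranks above "/b" because the word makes up a larger share of a shorter page, and "/c" is left out entirely. That also hints at the weakness of pure density ranking: short pages that repeat a word win easily.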

RiSearch is powerful and has a very fast search script, designed to work with hundreds of megabytes of text data. It does not use libraries or databases but is Perl code with PHP front ends. RiSearch is surprisingly fast to search for a file-based storage back end. However, this affects the search result relevancy, which is poorer than other options. It is therefore better for finding unique phrases, like names of species, than for searching concepts.

Sphider is PHP code that uses MySQL for indexing pages. It works for sites up to 20,000 pages. It also works great as a tool for site analysis, such as finding broken links and gathering statistics about the site. It has an efficient back end and search algorithm, but its crawling methods function poorly.

TSEP causes a long delay when executing the crawler if the data to index is extensive. This was a problem on one server with time-out/keep-alive of 8/15, though adding ignore_user_abort() to the top of indexer.php bypasses it.

My personal experience with PHPDig was entirely positive: indexing was really fast. Today, though, I’d rather stick with Lucene and Zend_Search_Lucene, and I advise you to do the same.