| Item | Description |
| --- | --- |
| UTF-8 support | Index and search text in Chinese, Cyrillic, Georgian, Hebrew and other character sets. |
| Media support | Index and search images, audio and video. EXIF and ID3 information is also indexed and thus becomes searchable. Thumbnails are shown for images, and media can be opened with player software. |
| Multiple database support | Individual configuration and activation of databases for Admin, Search User and Suggest URL User. Supports multiple table sets in each database, the MySQL query cache, an individual index for each database, and individual or bulk search across predefined databases. |
| Result cache | Much shorter response times for queries that are already cached. A controller keeps the 'Most Popular Queries' permanently in the cache. Separate caches for text and media results. Admin configurable. |
| Multithreaded indexing | To reduce indexing time, 1 to 10 parallel threads can be activated in the Admin settings. |
| Stemming algorithms for 15 languages | Admin selectable for: Bulgarian, Chinese, Czech, Dutch, English, Finnish, French, German, Greek, Hungarian, Italian, Portuguese, Russian, Spanish and Swedish. |
| RDF, RSD, RSS and Atom feed support | Index and search feed content, including RDF 'Dublin Core' tags. Obey or ignore 'preferred' tags in RSD feeds. Follow or ignore CDATA sections. |
| Various search modes | Wildcard search, tolerant search, strict search, search within a single domain, search all links of a site, and (link-specific) media search. |
| 9 different modes for sorting the result list | Admin selectable: by relevance (weight %); by hit count in full text; most popular links on top; by index date; by URL name; main URL (domain) on top; like Google (top 2 per URL); promoted domain on top; links containing promoted catchwords on top. |
| Same results for queries with and without quotes | Admin selectable: delivers the same results for queries like d'information ↔ information and dei'largi ↔ largi. Also admin selectable: equalization of different quote characters such as ' ` ´. |
| Extensive user statistics | Search log, most popular text links, most popular media links, user IP, country code, host name, last queried, top keywords, etc. |
| Segmentation of Chinese and Korean words | Divides phrases like 帽子和服装 into the base words 帽子, 和 and 服装, so that each of them becomes searchable. Dictionaries with 106,800 radicals. |
| Segmentation of Japanese words | Segmentation of text written in 5,724 kanji (new, old and half-width forms), hiragana, katakana and jinmeiyo characters. |
| Transliteration of Latin characters into their Greek equivalents | Transforms the query input alla to find ἀλλὰ and baptismatos to find βαπτίσματος. Queries containing Greek vowels without accents are also accepted. |
| Indexing of password-protected sites | Also indexes .htaccess-protected sites (HTTP Basic authentication). Up to 3 different zones can be registered and indexed. |
| Index framesets and iframes | If enabled, both options index HTML and image frames. Not available for dynamically loaded frames (e.g. via JavaScript). |
| Follow redirects and canonical links | The indexer automatically follows redirects and canonical links. |
| Indexing of RAR and ZIP compressed files and archives | Supports compressed (X)HTML, XML and PDF files, all kinds of feeds, and frames and iframes inside archives. Links found in the compressed files are followed. |
| Included converter for PDF, ODT, ODS, CSV and XLS files | Also converts non-Latin text such as Arabic, Cyrillic, Chinese, Greek and Hebrew. Links found in the converted files are followed. |
| Debug mode | Offers detailed information during index/re-index: new links, keywords, frames and media found per link. |
| Follow Sitemap files | If available, sitemap.xml as well as gzip-compressed sitemap files are used to follow the links of a site. If a `<sitemapindex>` element is detected, multiple Sitemap files are processed as well. |
| Common word lists (stop words) for 24 languages | Admin selectable for: Arabic, Bengali, Bulgarian, Catalan, Czech, Danish, Dutch, English, Farsi, Finnish, French, Greek, German, Hindi, Hungarian, Italian, Norwegian, Polish, Portuguese, Romanian, Russian, Spanish, Swedish and Turkish. |
| Erase & re-index | Individual (site-specific) or bulk update of the database. |
| Use of whitelist and blacklist | A page must contain any / all of the words in the whitelist. A blacklist can also be selected in Admin; it holds words that prevent indexing of any page containing these forbidden words. |
| Ignore parts of a site (driven by `<div>` id values) | A common list of div id values is used to ignore parts of a page. Content between `<div id='this_value'>` and `</div>` is ignored, but links inside it are still followed. Multiple and nested divs are handled. |
| User URL suggestion | Users may suggest new sites for inclusion in the database. Includes Admin approval and rejection, plus a manager for banned domains. |
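The 'Result cache' row does not describe the eviction policy. As one possible reading, the sketch below is an LRU cache in which the most frequently requested queries are exempt from eviction. The class name `ResultCache` and the parameters `capacity` and `pinned_top` are hypothetical; separate text and media caches would simply be two instances.

```python
from collections import Counter, OrderedDict


class ResultCache:
    """LRU cache for query results (hypothetical sketch).

    The `pinned_top` most frequently requested queries are never evicted,
    approximating a 'Most Popular Queries' controller.
    """

    def __init__(self, capacity: int = 100, pinned_top: int = 10):
        self.capacity = capacity
        self.pinned_top = pinned_top
        self.entries: OrderedDict = OrderedDict()  # oldest (LRU) entry first
        self.hits: Counter = Counter()              # request count per query

    def _pinned(self) -> set:
        return {q for q, _ in self.hits.most_common(self.pinned_top)}

    def get(self, query: str):
        """Count the request and return cached results, or None on a miss."""
        self.hits[query] += 1
        if query in self.entries:
            self.entries.move_to_end(query)  # mark as most recently used
            return self.entries[query]
        return None

    def put(self, query: str, results) -> None:
        self.entries[query] = results
        self.entries.move_to_end(query)
        pinned = self._pinned()
        # Evict least recently used, non-pinned entries until within capacity.
        while len(self.entries) > self.capacity:
            victim = next((q for q in self.entries if q not in pinned), None)
            if victim is None:  # everything left is pinned; allow overflow
                break
            del self.entries[victim]
```

Pinning by request count keeps popular queries resident even when they have not been asked recently, which plain LRU would not guarantee.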
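A thread pool is one straightforward way to run the 1 to 10 parallel indexing workers mentioned in the 'Multithreaded indexing' row. In this sketch, `index_urls` and the caller-supplied `index_page` callable are hypothetical stand-ins for the indexer's fetch-and-parse step.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")


def index_urls(urls: List[str], index_page: Callable[[str], T],
               threads: int = 4) -> Dict[str, T]:
    """Run `index_page` on every URL using a bounded thread pool."""
    threads = max(1, min(10, threads))  # clamp to the 1-10 range from the table
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map returns results in input order, so zip pairs them correctly.
        return dict(zip(urls, pool.map(index_page, urls)))
```

Threads suit this workload because indexing is dominated by network I/O, during which Python threads release the interpreter lock.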
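The 'like Google (top 2 per URL)' sort mode is not specified in detail. One plausible interpretation, sketched below, is host crowding: results are ranked by score and at most two results per host are kept. The function name `top_n_per_host` and the `(url, score)` tuple format are assumptions.

```python
from collections import defaultdict
from typing import List, Tuple
from urllib.parse import urlsplit


def top_n_per_host(results: List[Tuple[str, float]],
                   n: int = 2) -> List[Tuple[str, float]]:
    """Rank by descending score, keeping at most `n` results per host."""
    ranked = sorted(results, key=lambda r: r[1], reverse=True)
    per_host: defaultdict = defaultdict(int)  # results kept so far per host
    kept = []
    for url, score in ranked:
        host = urlsplit(url).hostname or ""
        if per_host[host] < n:
            per_host[host] += 1
            kept.append((url, score))
    return kept
```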
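The 'Same results for queries with and without quotes' row can be approximated in two steps: map quote variants to one apostrophe, then drop an elided prefix such as d' or dei'. This is a minimal sketch; the function names are hypothetical, and the extra quote variant U+2019 is an assumption beyond the three characters listed in the table. The same normalization would need to be applied at both index and query time.

```python
from typing import List

# Quote variants equalized to a plain apostrophe. The table lists ' ` ´;
# U+2019 (right single quotation mark) is an added assumption.
QUOTE_VARIANTS = "`´\u2019"
_TO_APOSTROPHE = str.maketrans({ch: "'" for ch in QUOTE_VARIANTS})


def normalize_quotes(text: str) -> str:
    """Replace every quote variant with an ASCII apostrophe."""
    return text.translate(_TO_APOSTROPHE)


def strip_elision(word: str) -> str:
    """Keep only the part after the last apostrophe: d'information -> information."""
    word = normalize_quotes(word)
    _, sep, tail = word.rpartition("'")
    return tail if sep and tail else word  # no apostrophe, or trailing one


def query_terms(query: str) -> List[str]:
    """Lower-cased, quote-normalized terms of a query."""
    return [strip_elision(w).lower() for w in query.split()]
```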
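The table does not name the algorithm behind 'Segmentation of Chinese and Korean words'. A common dictionary-based technique is maximum matching. The sketch below uses backward maximum matching with a tiny hypothetical dictionary `DICT`, not the product's dictionaries. The example phrase is ambiguous because 和服 ("kimono") is also a word: forward matching would split it into 帽子 / 和服 / 装, while backward matching recovers 帽子 / 和 / 服装.

```python
from typing import List, Set


def segment_bmm(text: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Split `text` into words by backward maximum matching.

    Scans from the end and, at each position, takes the longest dictionary
    word (up to `max_len` characters) ending there. A character not covered
    by any entry becomes a single-character token.
    """
    words: List[str] = []
    end = len(text)
    while end > 0:
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, end), 0, -1):
            candidate = text[end - size:end]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                end -= size
                break
    words.reverse()  # words were collected right-to-left
    return words


# Hypothetical mini-dictionary, including the ambiguous entry 和服.
DICT = {"帽子", "和", "服装", "和服"}
```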
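The Greek transliteration row combines two ideas: mapping Latin letters to Greek ones, and comparing words without accents. The sketch below uses Unicode decomposition (NFD) to strip accents and breathings. The letter tables `DIGRAPHS` and `LETTERS` are a simplified, hypothetical scheme: "e" and "o" map only to epsilon and omicron, so words spelled with eta or omega are not matched.

```python
import unicodedata

# Hypothetical, simplified Latin -> Greek mapping. Digraphs are tried before
# single letters; "e"/"o" map only to epsilon/omicron (eta/omega not handled).
DIGRAPHS = {"th": "θ", "ph": "φ", "ch": "χ", "ps": "ψ", "ks": "ξ"}
LETTERS = {
    "a": "α", "b": "β", "g": "γ", "d": "δ", "e": "ε", "z": "ζ", "i": "ι",
    "k": "κ", "l": "λ", "m": "μ", "n": "ν", "x": "ξ", "o": "ο", "p": "π",
    "r": "ρ", "s": "σ", "t": "τ", "u": "υ", "y": "υ", "f": "φ",
}


def latin_to_greek(text: str) -> str:
    """Transliterate Latin letters; characters without a mapping pass through."""
    text = text.casefold()
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in DIGRAPHS:
            out.append(DIGRAPHS[pair])
            i += 2
        else:
            out.append(LETTERS.get(text[i], text[i]))
            i += 1
    return "".join(out)


def greek_key(text: str) -> str:
    """Accent-insensitive key: drop combining marks, casefold, unify final sigma."""
    decomposed = unicodedata.normalize("NFD", text)
    bare = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return bare.casefold().replace("ς", "σ")


def query_matches(query: str, greek_word: str) -> bool:
    """Match a Latin or (accented or unaccented) Greek query against a Greek word."""
    return greek_key(latin_to_greek(query)) == greek_key(greek_word)
```

Greek input passes through `latin_to_greek` unchanged, so a single comparison path serves Latin queries and unaccented Greek queries alike.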
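For the 'Indexing of password-protected sites' row, HTTP Basic authentication sends `user:password` Base64-encoded in an `Authorization` header. The sketch below picks credentials from up to three registered zones by longest matching URL prefix. The zone format (URL prefix mapped to a user/password pair) and the names `headers_for` and `MAX_ZONES` are hypothetical.

```python
import base64
from typing import Dict, Tuple

MAX_ZONES = 3  # the table allows up to 3 registered zones


def basic_auth_header(user: str, password: str) -> Dict[str, str]:
    """Build an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def headers_for(url: str, zones: Dict[str, Tuple[str, str]]) -> Dict[str, str]:
    """Return auth headers for `url` using the most specific matching zone."""
    if len(zones) > MAX_ZONES:
        raise ValueError(f"at most {MAX_ZONES} zones can be registered")
    matches = [prefix for prefix in zones if url.startswith(prefix)]
    if not matches:
        return {}  # public URL: no credentials sent
    user, password = zones[max(matches, key=len)]
    return basic_auth_header(user, password)
```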
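The 'Follow Sitemap files' row can be illustrated with the standard library. Sitemaps use the sitemaps.org namespace, and a `<sitemapindex>` root lists child sitemaps rather than pages. In this sketch, `collect_page_urls` and its caller-supplied `fetch` function (URL to raw bytes) are hypothetical. Gzip input is detected by its magic bytes.

```python
import gzip
import xml.etree.ElementTree as ET
from typing import Callable, List, Tuple

# Namespace defined by the sitemaps.org protocol.
SM_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def parse_sitemap(data: bytes) -> Tuple[str, List[str]]:
    """Return ("urlset" or "sitemapindex", list of <loc> values)."""
    if data[:2] == b"\x1f\x8b":  # gzip magic bytes: decompress first
        data = gzip.decompress(data)
    root = ET.fromstring(data)
    kind = root.tag.replace(SM_NS, "")
    locs = [el.text.strip() for el in root.iter(SM_NS + "loc") if el.text]
    return kind, locs


def collect_page_urls(start_url: str, fetch: Callable[[str], bytes]) -> List[str]:
    """Follow sitemap index files and gather all page URLs."""
    pending, visited, pages = [start_url], set(), []
    while pending:
        url = pending.pop()
        if url in visited:  # guard against index files that reference each other
            continue
        visited.add(url)
        kind, locs = parse_sitemap(fetch(url))
        if kind == "sitemapindex":
            pending.extend(locs)  # child sitemaps, possibly gzip-compressed
        else:
            pages.extend(locs)
    return pages
```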
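The 'Use of whitelist and blacklist' row amounts to set tests on a page's words. This is a minimal sketch assuming whole-word, case-insensitive matching and treating an empty whitelist as "no restriction". The function names are hypothetical.

```python
import re
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Case-folded set of Unicode word tokens."""
    return set(re.findall(r"\w+", text.casefold()))


def may_index(page_text: str, whitelist: List[str], blacklist: List[str],
              require_all: bool = False) -> bool:
    words = tokenize(page_text)
    white = {w.casefold() for w in whitelist}
    black = {w.casefold() for w in blacklist}
    if words & black:   # any forbidden word blocks indexing
        return False
    if not white:       # assumption: empty whitelist means no restriction
        return True
    if require_all:     # page must contain every whitelist word
        return white <= words
    return bool(words & white)  # page must contain at least one whitelist word
```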
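The 'Ignore parts of a site' row requires skipping text inside selected divs while still collecting their links, including when divs are nested. A depth counter handles the nesting. This sketch uses Python's `html.parser`; the class name `DivFilterParser` is hypothetical.

```python
from html.parser import HTMLParser
from typing import Iterable


class DivFilterParser(HTMLParser):
    """Collect page text and links, skipping text inside ignored <div> ids.

    Links inside ignored divs are still collected. Nested divs are tracked
    with a depth counter, so an inner </div> does not end the skip early.
    """

    def __init__(self, ignored_ids: Iterable[str]):
        super().__init__()
        self.ignored_ids = set(ignored_ids)
        self.skip_depth = 0  # > 0 while inside an ignored div
        self.text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])  # followed even when ignored
        if tag == "div":
            if self.skip_depth:
                self.skip_depth += 1  # nested div inside an ignored region
            elif attrs.get("id") in self.ignored_ids:
                self.skip_depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.text.append(data.strip())
```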