|UTF-8 and UTF-16 support
||The indexer and the search support Chinese, Cyrillic, Georgian, Hebrew and other character sets.
|Support for non-ASCII domains
||'Internationalized Domain Names' (IDN) like 'http://президент.рф/' and 'http://müller.de/' are accepted and processed.
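As a rough illustration of what accepting an IDN involves, the sketch below converts a Unicode host name to the ASCII (Punycode) form used on the wire and back again. It relies on Python's built-in `idna` codec; the function names are hypothetical and nothing here is taken from the product's own code.

```python
def to_ascii_host(host: str) -> str:
    """Convert an internationalized host name to its ASCII (Punycode) form."""
    # Python's built-in "idna" codec converts the name label by label.
    return host.encode("idna").decode("ascii")


def to_unicode_host(host: str) -> str:
    """Convert an ASCII (Punycode) host name back to Unicode for display."""
    return host.encode("ascii").decode("idna")
```

With this sketch, `to_ascii_host("müller.de")` yields `xn--mller-kva.de`, which is the name a crawler would actually resolve.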
||Index and search for images, audio and video. EXIF and ID3 information is also indexed and thereby becomes searchable. Thumbnails are shown for images, and media can be opened with player software.
|Multiple database support
||Individual configuration and activation of databases for Admin, Search User and Suggest URL User. Support of multiple table sets in each db, MySQL query cache, individual index for each db, individual or bulk search in predefined databases.
Individual Admin settings for each db and each set of tables.
||Extremely reduced response time for queries already cached. Controller to keep the 'Most Popular Queries' always in cache. Separate caches for text and media results. Admin configurable.
|Follow Sitemap files
||If available, sitemap.xml files, including gzip-compressed ones, are used to follow the links of a site. If <sitemapindex . . . >
is detected, multiple Sitemap files are processed as well.
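The sitemap handling described above can be sketched as follows. This is a minimal illustration, not the product's implementation: it assumes the standard sitemaps.org namespace, detects gzip by its magic bytes, and takes a caller-supplied `fetch` function (hypothetical) that maps a URL to raw bytes.

```python
import gzip
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def parse_sitemap(raw: bytes):
    """Return (kind, locations) for one sitemap document.

    kind is "index" for a <sitemapindex> file (locations are further
    sitemap files) or "urlset" for a plain sitemap (locations are pages).
    """
    # gzip streams start with the magic bytes 0x1f 0x8b.
    if raw[:2] == b"\x1f\x8b":
        raw = gzip.decompress(raw)
    root = ET.fromstring(raw)
    kind = "index" if root.tag == SITEMAP_NS + "sitemapindex" else "urlset"
    locations = [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]
    return kind, locations


def collect_urls(start_url: str, fetch, max_files: int = 50):
    """Follow a sitemap, and any index it contains, to a flat list of page URLs."""
    pending, seen, pages = [start_url], set(), []
    while pending and len(seen) < max_files:
        url = pending.pop(0)
        if url in seen:
            continue
        seen.add(url)
        kind, locations = parse_sitemap(fetch(url))
        if kind == "index":
            pending.extend(locations)  # more sitemap files to read
        else:
            pages.extend(locations)    # actual page URLs
    return pages
```

The `max_files` limit is an assumption added to keep a malformed or self-referencing index from looping forever.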
||Re-indexing can be performed automatically and repeated at a selected interval. Admin-selectable intervals are 3 hours, 12 hours, 1 day, 1 week, or 1 month.
||To reduce indexing time, 1 to 10 parallel threads can be activated in Admin settings.
||When invoking this option, the admin may select a suitable level for the next index procedure. Only URLs matching the selected level will then be re-indexed.
|Erase & Re-index
||Individual (site specific) or bulk update of database.
|RDF, RSD, RSS and Atom feed support
||Index and search of feed content, including RDF 'Dublin Core' tags. Obey / ignore 'preferred' tags in RSD feeds.
Follow / ignore CDATA directives.
|Various search modes
||Search with wildcards, Tolerant search, Search strict, Search only in one domain, Search all links of a site, Search for media (link-specific).
|Show thumbnails of each page presented in text results
||Admin selectable. This feature presents a web shot as part of the text result listing. The web shots are created during the index procedure for URLs that were indexed on the Internet.
|Prevent indexing of known malware and phishing pages
||This feature relies on a Google web service to prevent indexing of pages suspected of containing malware or phishing content.
|9 different modes of
sorting the text results
|Admin selectable: -By relevance (weight %) -By hit counts in full text -Most popular links on top -By index date -By URL names -Main URL (domain) on top -Like Google (Top 2 per URL) -Promoted domain on top -Links holding promoted catchwords on top.
|5 different modes of
sorting the media results
|Admin selectable: -By title (alphabetic) -By file suffix
-By image size -By 'Last queried' -By 'Most popular'.
|Same results for queries typed with pure vowels
or with accents
|Will deliver the same results for queries like: cafe and café.
To be activated in Admin backend.
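One common way to get identical results for cafe and café is to fold accents on both the indexed text and the query. The sketch below uses Unicode decomposition for this; it is an assumption about the general technique, not a description of how the product does it.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Remove combining diacritical marks so 'café' and 'cafe' compare equal."""
    # NFD splits an accented letter into a base letter plus combining mark(s).
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose whatever is left into the usual NFC form.
    return unicodedata.normalize("NFC", stripped)
```

Applying `fold_accents` to both sides before comparison makes the match independent of how the user typed the query.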
|Same results for queries with and without quotes
||Will deliver the same results for queries like:
d'information <-> information, dei'largi <-> largi
Also Admin selectable: Equalize the different quotes like: ' ` ´
|Same text results for queries with and without ligatures
||Admin selectable; will deliver the same results for queries like:
cœur and coeur. Implemented for Latin ligatures in Unicode (Latin-derived alphabets) and for ligatures used only in phonetic transcription. Medieval ligatures are not taken into consideration.
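Ligature equalization can be illustrated with a simple translation table. The table below is a small hypothetical sample; the actual list of supported ligatures is not given in the text.

```python
# Small illustrative table, not the product's real list.
LIGATURES = {
    "œ": "oe", "Œ": "OE",
    "æ": "ae", "Æ": "AE",
    "\ufb01": "fi",  # LATIN SMALL LIGATURE FI
    "\ufb02": "fl",  # LATIN SMALL LIGATURE FL
}
_TABLE = str.maketrans(LIGATURES)


def expand_ligatures(text: str) -> str:
    """Replace each ligature with its component letters."""
    return text.translate(_TABLE)
```

Expanding ligatures at both index time and query time means cœur and coeur end up as the same keyword.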
|Extensive user statistics
||Search log, Most popular text links, Most popular media links, User IP, Country code, Host name, Last queried, Top keywords, etc.
|Segmentation of Chinese and Korean words
||Will divide phrases like 帽子和服装 into the base words 帽子, 和 and 服装, so that each becomes searchable. Dictionaries with 106,800 radicals.
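Dictionary-based segmentation is often done with forward maximum matching: at each position, take the longest dictionary word that fits. The text does not say which algorithm is used, so the sketch below is only an assumption, with a three-word toy dictionary standing in for the real one.

```python
def segment(text: str, dictionary: set, max_len: int = 4) -> list:
    """Split text into dictionary words using forward maximum matching."""
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words
```

Called as `segment("帽子和服装", {"帽子", "和", "服装"})`, this returns the three base words from the example. A greedy matcher of this kind can split differently once the dictionary contains overlapping words, which is one reason real segmenters tend to be more elaborate.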
|Segmentation of Japanese words
||Segmentation of 5,724 kanji (new, old and half width), hiragana, katakana and jinmeiyo characters of the Japanese writing systems.
|Transliterate Latin characters into their Greek equivalents
||Transforms the query input alla to find ἀλλὰ, and baptismatos to find βαπτίσματος.
Also accepts queries containing Greek vowels without accents.
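A transliteration of this kind can be sketched as a letter mapping followed by accent-insensitive comparison. The mapping below is illustrative only (the scheme actually used is not stated), and the function names are hypothetical.

```python
import re
import unicodedata

# Illustrative mapping only; the real transliteration scheme is not stated.
DIGRAPHS = {"th": "θ", "ph": "φ", "ch": "χ", "ps": "ψ"}
LETTERS = {
    "a": "α", "b": "β", "g": "γ", "d": "δ", "e": "ε", "z": "ζ",
    "i": "ι", "k": "κ", "l": "λ", "m": "μ", "n": "ν", "x": "ξ",
    "o": "\u03bf",  # Greek omicron, escaped because it looks like Latin o
    "p": "π", "r": "ρ", "s": "σ", "t": "τ", "u": "υ",
}


def latin_to_greek(query: str) -> str:
    """Map Latin letters to unaccented Greek letters."""
    query = query.lower()
    out, i = [], 0
    while i < len(query):
        pair = query[i:i + 2]
        if pair in DIGRAPHS:
            out.append(DIGRAPHS[pair])
            i += 2
        else:
            out.append(LETTERS.get(query[i], query[i]))
            i += 1
    # A sigma at the end of a word takes its final form.
    return re.sub(r"σ\b", "\u03c2", "".join(out))


def strip_accents(text: str) -> str:
    """Drop breathing marks and accents so accented and plain forms match."""
    decomposed = unicodedata.normalize("NFD", text)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", plain)
```

Matching then compares `strip_accents(latin_to_greek(query))` against `strip_accents(indexed_word)`, which also covers Greek queries typed without accents.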
|Index of password protected sites
||Also indexes .htaccess-protected sites (basic authorization).
Up to 3 different zones can be registered and will be indexed.
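For context, HTTP basic authorization means sending a base64-encoded `user:password` pair in the `Authorization` header. The sketch below shows that header plus a hypothetical registry of protected zones keyed by URL prefix; the registry layout is an assumption, not the product's configuration format.

```python
import base64


def basic_auth_header(user: str, password: str) -> dict:
    """Build the Authorization header for HTTP basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": "Basic " + token}


def credentials_for(url: str, zones: dict) -> dict:
    """Return the header for the first registered zone that covers the URL.

    `zones` maps a URL prefix to a (user, password) pair (hypothetical layout).
    """
    for prefix, (user, password) in zones.items():
        if url.startswith(prefix):
            return basic_auth_header(user, password)
    return {}  # URL is outside all protected zones
```

A crawler would merge the returned dictionary into its request headers before fetching a page inside a protected zone.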
|If enabled, both options will index html and image frames.
|Follow HTTP redirections
|Follow header redirections, refresh tags and canonical links
||Automatic forwarding for the indexer.
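Refresh tags and canonical links both live in the HTML head, so an indexer has to parse the page to find them. The following sketch uses Python's standard `html.parser`; the class name and attributes are hypothetical, and header redirections (HTTP 3xx) are assumed to be handled by the HTTP client.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class RedirectHints(HTMLParser):
    """Collect the meta refresh target and the canonical link of a page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.refresh = None
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attrs.get("http-equiv", "").lower() == "refresh":
            # Typical content value: "5; url=/new/page.html"
            match = re.search(r"url\s*=\s*['\"]?([^'\";]+)", attrs.get("content", ""), re.I)
            if match:
                self.refresh = urljoin(self.base_url, match.group(1).strip())
        elif tag == "link" and "canonical" in attrs.get("rel", "").lower().split():
            if attrs.get("href"):
                self.canonical = urljoin(self.base_url, attrs["href"])
```

After `feed()` has been called with the page source, `refresh` and `canonical` hold absolute URLs, or `None` if the page has no such hint.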
document.write(' <a href="new12.pdf">All news 2012</a> ');
and index the content of:
document.write(' this content ');
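The two `document.write` lines above can be handled without executing any JavaScript, by scanning the script text for string literals. This is a minimal regex-based sketch under that assumption; it only covers simple single-literal calls, and the function name is hypothetical.

```python
import re

# Matches document.write('...') or document.write("...") with one string literal.
WRITE_CALL = re.compile(r"""document\.write\(\s*(['"])(.*?)\1\s*\)""", re.DOTALL)
HREF = re.compile(r"""href\s*=\s*(['"])(.*?)\1""", re.IGNORECASE)
TAG = re.compile(r"<[^>]+>")


def scan_document_write(script: str):
    """Return (links, texts) found inside document.write() string literals."""
    links, texts = [], []
    for call in WRITE_CALL.finditer(script):
        fragment = call.group(2)
        links.extend(m.group(2) for m in HREF.finditer(fragment))
        text = TAG.sub(" ", fragment).strip()  # drop markup, keep visible text
        if text:
            texts.append(text)
    return links, texts
```

For the two example lines, this yields the link `new12.pdf` and the texts `All news 2012` and `this content`.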
|Accept gzip formatted transmission
||In order to reduce the transfer time, gzip formatted content is requested by the crawler.
|Index of RAR and ZIP
compressed files and archives
|Supports compressed (X)HTML, XML and also PDFs, all kinds of feeds, frames and iframes in archives.
Links found in the compressed files are followed.
|Converter included for
PDF, DOCX, XLSX, ODT, ODS, CSV and XLS files
|Also converts non-Latin text like:
Arabic, Cyrillic, Chinese, Greek and Hebrew.
Links found in the converted files will be followed.
Separate PDF converters for 32-bit and 64-bit operating systems.
||Offering detailed information during index/re-index:
New links, keywords, frames and media found per link.
To be activated separately for Admin backend and User interface.
|Common word lists
holding stop words.
Included for 25 languages
|Admin selectable for:
Arabic, Bengali, Bulgarian, Catalan, Chinese, Cyrillic, Czech, Danish, Dutch, English, Farsi, Finnish, French, Greek, German, Hindi, Hungarian, Italian, Norwegian, Polish, Portuguese, Romanian, Spanish, Swedish and Turkish.
implemented for 15 languages
|Admin selectable for: Bulgarian, Chinese, Cyrillic, Czech, Dutch, English, Finnish, French, German, Greek, Hungarian, Italian, Portuguese, Spanish and Swedish.
|Use of whitelist and blacklist
||A page must contain any / all of the words in the whitelist to be indexed. A blacklist is also Admin selectable; pages containing any of its words are not indexed.
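The whitelist and blacklist rules can be sketched as a single predicate. The precedence chosen here (blacklist checked first) and the parameter names are assumptions made for illustration.

```python
import re


def should_index(text: str, whitelist=(), blacklist=(), require_all: bool = False) -> bool:
    """Decide whether a page may be indexed, based on the words it contains."""
    words = set(re.findall(r"\w+", text.lower()))
    # Assumption: a single blacklisted word is enough to reject the page.
    if any(word.lower() in words for word in blacklist):
        return False
    if not whitelist:
        return True
    hits = [word.lower() in words for word in whitelist]
    # "all" mode needs every whitelist word, "any" mode needs at least one.
    return all(hits) if require_all else any(hits)
```

The `require_all` flag corresponds to the any / all choice mentioned above.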
|Ignore parts of a site.
<div> id/class value driven
|A common list of div id values is used to ignore parts of a page.
Content between <div id='this_value'> and </div>
as well as <div class='this_value'> and </div> will be ignored.
However, links inside the tags are followed. Multiple and nested divs are handled.
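The ignore rule, including nested divs and the fact that links are still followed, can be sketched with Python's standard `html.parser`. The class below is a hypothetical illustration that assumes well-formed, properly closed divs.

```python
from html.parser import HTMLParser


class DivFilter(HTMLParser):
    """Collect page text outside ignored divs, but keep every link."""

    def __init__(self, ignored_values):
        super().__init__()
        self.ignored = set(ignored_values)
        self.div_stack = []   # one flag per open <div>: does it start an ignored region?
        self.skip_depth = 0   # number of currently open ignored divs
        self.text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            values = {attrs.get("id")} | set((attrs.get("class") or "").split())
            ignored = bool(values & self.ignored)
            self.div_stack.append(ignored)
            self.skip_depth += int(ignored)
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])  # links are followed even when ignored

    def handle_endtag(self, tag):
        if tag == "div" and self.div_stack:
            self.skip_depth -= int(self.div_stack.pop())

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.text.append(data.strip())
```

Tracking a stack of flags rather than a single switch is what lets an unrelated inner `</div>` close without ending the ignored region too early.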
|Index only parts of a site.
<div> id/class value driven
|A common list of div id values is used to select parts of a page.
Only the content between <div id='this_value'> and </div>
as well as <div class='this_value'> and </div> will be indexed; however, links outside are followed. Multiple and nested divs are handled.
|Indexing only parts of a page defined by
<element> . . . </element>
|Intended to work with the new HTML5 elements like:
section, nav, aside, hgroup, article, header, footer
The inverse function is also included, to ignore the parts of a page
between <element> . . . </element>
|User may suggest URLs
||Users may suggest new sites to become part of the database.
Includes Admin approval and rejection, and a banned domains manager.
Optional: the suggested site needs a meta tag with an authentication code.
|Intrusion Detection System
||Admin selectable. The IDS will block further user input, create a log file, present a warning message, or even block all traffic from IPs known to be malicious.
||Option to delete all keyword relationships that exceed a definable number of query results. This will significantly speed up the search procedure for huge databases.
||Besides the standard HTML output, the results can optionally be presented as XML files.
|MySQLi (MySQL Improved) extension implemented
||MySQLi connector implemented between PHP and a MySQL database, using the object-oriented interface. PHP v.5.5 is also supported.