Sphider-plus is a search engine based on the scripts of the original Sphider.

[ About Sphider-plus ]

More than 400 new features (additional mods, functions, template designs and debugging)
have been added to the original Sphider. For details about all the improvements
and changes, please read the Documentation section.

[ Main features ]

Item	Description
UTF-8 and UTF-16 support	Indexing and searching for Chinese, Cyrillic, Georgian, Hebrew and other character sets. Unicode support including astral symbols.
Support for non-ASCII domains	'Internationalized Domain Names' (IDN) like 'http://президент.рф/' and 'http://müller.de/' are accepted and processed.
Responsive design	Automatically adapts the size of the search form, result listing and addurl form to the display size of computers, tablets, smartphones, etc.
Media support	Index and search for images (incl. Open Graph images), audio and video (incl. YouTube videos). EXIF and ID3 information is also indexed and thereby becomes searchable. Thumbnails for all indexed media. Indexed media can be opened with the appropriate player software.
Multiple database support	Individual configuration and activation of databases for 'Admin', 'Search User' and 'Suggest URL'. Support of multiple table sets in each db, MySQL query cache, individual index for each db, individual or bulk search in predefined databases. Individual Admin settings for each db and each set of tables.
Result cache	Extremely reduced response time for queries already cached. Controller to keep the 'Most Popular Queries' always in cache. Separate caches for text and media results. Admin configurable.
Follow sitemap files	If available, sitemap.xml as well as gzip compressed files will be used to follow the links of a site. If <sitemapindex . . . > is detected, multiple sitemap files are processed as well.
Periodical Re-index	Re-indexing can be performed automatically and repeated at a selected time interval. Admin selectable intervals of 3 hours, 12 hours, 1 day, 1 week, or 1 month.
Multithreaded indexing	In order to reduce the time for indexing, 1-10 parallel running threads can be activated in Admin settings.
Preferred re-index	When invoking this option, the admin may select a suitable level for the next index procedure. Thus, only those URLs at the selected level will be re-indexed.
Erase & Re-index and Continue suspended index procedures	Individual (site specific) or bulk update of database.
Support of XML product feeds	Index and search of feed content, including formatting of the search results.
RDF, RSD, RSS and Atom feed support	Index and search of feed content, including RDF 'Dublin Core' tags. Obey / ignore 'preferred' tags in RSD feeds. Follow / ignore CDATA directives.
Various search modes	Search with wildcards, Tolerant search, Search strict, Search only in one domain, Search all links of a site, Search for media (link-specific).
Add thumbnails to each page presented in text results	Admin selectable, this feature will present a web shot as part of the text result listing. Created during index procedure for all URLs, which had been indexed on the Internet.
Prevent indexing of known malware and phishing pages	This feature relies on a Google web service to prevent indexing of pages suspected of containing malware or phishing content.
11 different modes of sorting the text results	Admin selectable: - By relevance (weight %) - By hit counts in full text - Most popular links on top - By index date - By URL names - By file suffix - Main URL (domain) on top - Like Google (Top 2 per URL) - Promoted domain on top - Links holding promoted catchwords on top.
5 different modes of sorting the media results	Admin selectable: - By title (alphabetic) - By file suffix - By image size - By 'Last queried' - By 'Most popular'.
Same results for queries typed with pure vowels, or with accents	Will deliver the same results for queries like: cafe and café. To be activated in Admin backend.
Same results for queries with and without quotes	Will deliver the same results for queries like: d'information <-> information, dei'largi <-> largi. Also Admin selectable: Equalize the different quotes like: ' ` ´
Same text results for queries with and without ligatures	Admin selectable; will deliver the same results for queries like: cœur and coeur. Worked out for Latin ligatures in Unicode (Latin-derived alphabets) and also ligatures used only in phonetic transcription, but not taking into consideration medieval ligatures.
Present all results for singular and plural of Russian nouns	Will deliver all search results for автокреслО and/or автокреслA, independent of whether the query string is singular or plural.
Extensive user statistics	Search log, Most popular text links, Most popular media links, User IP, Country code, Host name, Last queried, Top keywords, etc.
GDPR support	By default, Sphider-plus collects and processes user data in a General Data Protection Regulation (GDPR) compliant manner.
Segmentation of Chinese and Korean words	Will divide phrases like 帽子和服装 into the base words 帽子 and 和 and 服装 , so that all will become searchable. Dictionaries with 106.800 radicals.
Segmentation of Japanese words	Segmentation of 5.724 kanji (new, old and half width), hiragana, katakana and jinmeiyo Japanese character writing systems.
Transliterate Latin characters into their Greek equivalents	Transforms query input alla to find ἀλλὰ and baptismatos to find βαπτίσματος. Also accepts queries containing Greek vowels without accents.
Index of password protected sites	Also indexes .htaccess protected sites (basic authorization). Up to 3 different zones can be registered and will be indexed.
Index framesets and iframes	If enabled, both options will index html and image frames. Not available for dynamically reloaded frames (e.g. JavaScript).
Follow HTTP redirections	Follow URL redirections caused by HTTP 301, 302, 303 and 307 status codes. Also obeying JavaScript, sent as HTML content like: <SCRIPT language="javascript"> win.loc="mp.php?mcv=59";</SCRIPT>
Follow header redirections, refresh tags and canonical links	Automatic forwarding for the indexer.
Follow links found in JavaScript and index also the content of document.write	Will index JavaScript commands. Detect and follow links like: document.write(' <a href="new12.pdf">All news 2012</a> '); and index the content of: document.write(' this content '); Not indexing content created in real-time by JavaScript.
Accept gzip formatted transmission	In order to reduce the transfer time, gzip formatted content is requested by the crawler.
Index of RAR and ZIP compressed files and archives	Supports compressed (X)HTML, XML and also PDFs, all kinds of feeds, frames and iframes in archives. Links found in the compressed files are followed.
Converter included for PDF, DOCX, XLSX, ODT, ODS, CSV, PPTX and XLS files	Also converts non-Latin text like: Arabic, Cyrillic, Chinese, Greek and Hebrew. Links found in the converted files will be followed.
Debug mode	Offering detailed information during index/re-index: New links, keywords, frames and media found per link. To be activated separately for Admin backend and User interface.
Automatic detection of the user's preferred dialog language.	Admin selectable for self-detection of the preferred language of the search engine user. Included for 33 languages.
Common word lists holding stop words. Included for 25 languages	Admin selectable for: Arabic, Bengali, Bulgarian, Catalan, Chinese, Cyrillic, Czech, Danish, Dutch, English, Farsi, Finnish, French, Greek, German, Hindi, Hungarian, Italian, Norwegian, Polish, Portuguese, Romanian, Spanish, Swedish and Turkish.
Stemming algorithms implemented for 15 languages	Admin selectable for: Bulgarian, Chinese, Cyrillic, Czech, Dutch, English, Finnish, French, German, Greek, Hungarian, Italian, Portuguese, Spanish and Swedish.
Use of whitelist and blacklist	Page must contain any / all of the words in the whitelist. A blacklist is also Admin selectable; pages containing any of its forbidden words are not indexed.
Ignore parts of a site. <div> id/class value driven, <ul> class value driven, <pre> class value driven.	A common list of div id values is used to ignore parts of a page. Content between <div id='this_value'> and </div> as well as <div class='this_value'> and </div> will be ignored. However, links inside the tags are followed. Multiple and nested divs are handled. The same feature is available for classes in ul and pre tags.
Index only parts of a site. <div> id/class value driven	A common list of div id values is used to select parts of a page. Only the content between <div id='this_value'> and </div> as well as <div class='this_value'> and </div> will be indexed; however, links outside are followed. Multiple and nested divs are handled.
Do not index parts of a page defined by HTML5 elements <tag> . . . </tag>	Intended to work with HTML5 elements like: section, nav, aside, hgroup, article, header, footer. The reverse function is also included, in order to index only the parts of a page between <tag> . . . </tag>.
Accept or ignore Emoji characters during index procedure.	Emoji characters like: 😀 😜 🥵 👀 🥂 ⛔️ 🆗 ⏏️ 🕷 🕸 🧑 ‍🎄 and UNICODE symbols ‍ 🎄 👆🏼 ★ ⦿ ♜ ♞ ➨ ➤ ➽ 🂡 ☀︎ ✈︎ 🞧 ❗️ (Foreseen for next release of Sphider-plus)
User may suggest URLs	Users may suggest new sites to become part of the database. Includes a manager for Admin approval, rejection and banned domains. Optional: the suggested site needs a meta tag with an authentication code.
Intrusion Detection System	Admin selectable, the IDS will block further user input, create a log file, present a warning message, or even block any traffic from IPs known to be malicious.
Admin backend protected against XSRF and SFA attacks	Admin selectable and independent of the IDS, 'Cross-Site Request Forgery' and 'Session Fixation' attacks are prevented.
Block all queries sent by known bots, harvesters and spammers	Admin selectable and independent of the IDS, about 190.000 IPs are blocked. The IP list is automatically updated every 24 hours by a web service. Queries from about 47.000 malicious bots can also be blocked.
Search form protected against flood attempts	Admin selectable and independent of the IDS, attempts to flood the search form with too many queries per unit of time are prevented. Includes a log file and further access protection.
Bounded database	Option to delete all keyword relationships, exceeding a definable amount of query results. Will significantly speed up the search procedure for huge databases.
HTML, JSON, XML and RSS result output	Besides standard HTML output, optionally the results are presented as JSON, XML and RSS files. Separated for text and media results.
MySQLi (MySQL Improved) extension implemented	MySQLi connector implemented between PHP and a MySQL database. Implemented in OOP style; PHP v.5.5+ is also supported.
Compatible with MySQL and MariaDB	Proven up to: - MySQL version 8.0.32 - MariaDB version 10.4.28
Ready to run in PHP 8 environment	The latest Sphider-plus version, 4.2024a, is proven up to PHP version 8.3.2.
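
The three 'Same results' rows in the table above (accents, quotes and ligatures) all describe reducing a query to a common form before matching. The following is only a minimal sketch of that idea in Python; it is not taken from the Sphider-plus sources, and the names fold_query, LIGATURES and QUOTES as well as the short ligature list are assumptions made for illustration.

```python
import unicodedata

# Hypothetical, deliberately short ligature table (not the Sphider-plus list).
LIGATURES = {"œ": "oe", "Œ": "OE", "æ": "ae", "Æ": "AE", "ﬁ": "fi", "ﬂ": "fl"}

# Hypothetical table mapping different quote characters to a plain apostrophe.
QUOTES = str.maketrans({"`": "'", "´": "'", "’": "'"})


def fold_query(text: str) -> str:
    """Return a folded form of text for accent/ligature/quote tolerant matching."""
    # Equalize the different quote characters.
    text = text.translate(QUOTES)
    # Expand ligatures into their plain letter sequences.
    for ligature, plain in LIGATURES.items():
        text = text.replace(ligature, plain)
    # Decompose accented letters, then drop the combining accent marks.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


if __name__ == "__main__":
    for query in ("café", "cafe", "cœur", "coeur", "d`information"):
        print(query, "->", fold_query(query))
```

If both the indexed keywords and the incoming query are folded the same way, 'café' and 'cafe' (or 'cœur' and 'coeur') compare as equal. How Sphider-plus itself implements these options is described in its Documentation section.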

[ Proven ]

Successfully implemented as search engine on a customer site with a database capacity such as:

25.206 sites + 324.595 page links + 1.260.698 keywords + 169.251 media links.

Sphider-plus version 4.2024a - The PHP Search Engine