Release date: July 28, 2013
Built on Sphider: v.1.3.5
Honor:
This version of Sphider-plus is dedicated to Anton Cygankov. Over many months, he followed the development with ongoing testing, especially verifying the index procedure with thousands of URLs and reviewing all the bugs I introduced along the way. Thank you very much for this great effort. I would not have been able to reach the current state of development without your enormous support.
It took several months to complete the v.3 release, but together with the new SQLi support, it seems to be a reasonable step into the future of this search engine.
Compared to version 2.9, the following modifications have been added:
New feature:
Index DOCX files. To be activated in Admin settings.
Implemented as a PHP script, the converter needs no adaptation to the operating system.
New feature:
Index XLSX files. To be activated in Admin settings.
Implemented as a PHP script, the converter needs no adaptation to the operating system.
New feature:
Index only prioritized sites: a level-dependent re-index of only those URLs that carry the corresponding priority level.
For details, please see the chapter: Prioritized indexing
New option:
Admin's 'Sites' table sorted by index priority.
New feature:
Create a thumbnail of each Internet URL during the index procedure. The thumbnail is presented as part of the text result listing for each link. To be activated in the Admin backend.
For details, please see the chapter: Create thumbnails during index procedure
New feature:
Prevent indexing of known malware and phishing pages. This feature uses a Google web service to identify pages that contain malware or phishing content, so that they are not indexed.
For details, please see the chapter: Prevent indexing of known malware and phishing pages
New feature:
If blacklist entries are matched too often, the indexation of the affected site is automatically aborted. The limit is set to a count of 20.
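The abort rule can be sketched as follows. This is a minimal Python sketch, not the PHP code of Sphider-plus; it assumes the blacklist hits are counted per site and that the site is aborted on the 20th hit. `index_links`, `is_blacklisted` and `SiteAborted` are hypothetical names.

```python
MAX_BLACKLIST_HITS = 20  # limit stated in the release notes


class SiteAborted(Exception):
    """Hypothetical signal that indexing of the current site was stopped."""


def index_links(links, is_blacklisted, max_hits=MAX_BLACKLIST_HITS):
    """Return the links to index; abort the whole site once max_hits blacklisted links were met."""
    hits = 0
    accepted = []
    for url in links:
        if is_blacklisted(url):
            hits += 1
            if hits >= max_hits:
                raise SiteAborted(f"{hits} blacklisted links, site aborted")
            continue  # skip this link, but keep indexing the site
        accepted.append(url)
    return accepted
```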
New option:
Check that content is correctly converted into UTF-8.
Detects invalid charset definitions in the meta tags of the HTML header, as well as invalid charset definitions supplied by the server via HTTP. If an invalid charset is detected, the index procedure is aborted for the affected link.
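A minimal Python sketch of such a check, assuming the charset is taken from the HTTP Content-Type header first and from the meta tag otherwise, and that a charset counts as invalid when no decoder exists for it. `decode_page` and the regular expressions are hypothetical and simplified.

```python
import codecs
import re

# Simplified patterns: charset=... in the Content-Type header, and in either
# <meta charset="..."> or <meta http-equiv="Content-Type" content="...; charset=...">.
HTTP_CHARSET = re.compile(r"""charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""", re.IGNORECASE)
META_CHARSET = re.compile(rb"""<meta[^>]+charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""", re.IGNORECASE)


def charset_is_valid(name):
    """True if Python has a codec registered under this charset name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


def decode_page(body, content_type=""):
    """Decode raw page bytes to str, or return None to abort indexing of this link."""
    match = HTTP_CHARSET.search(content_type or "")
    if match:
        charset = match.group(1)
    else:
        match = META_CHARSET.search(body)
        charset = match.group(1).decode("ascii") if match else "utf-8"
    if not charset_is_valid(charset):
        return None  # invalid charset definition
    try:
        return body.decode(charset)
    except (UnicodeDecodeError, LookupError):
        return None  # bytes do not match the declared charset, or not a text codec
```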
New feature:
The addurl form now stores only the domain name and TLD, for example 'sphider-plus.eu'.
Thus, 'www.' and any subfolder of the suggested URL are ignored.
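A sketch of this reduction in Python, assuming only a leading 'www.' is removed and any other subdomain is kept; `normalize_suggested_url` is a hypothetical name.

```python
from urllib.parse import urlsplit


def normalize_suggested_url(url):
    """Reduce a suggested URL to 'domain.tld' (drops scheme, 'www.', path and query)."""
    if "://" not in url:
        url = "http://" + url  # urlsplit needs a scheme to recognise the host
    host = urlsplit(url).hostname or ""  # hostname is already lower-cased
    if host.startswith("www."):
        host = host[len("www."):]
    return host
```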
New feature:
Ignore the content of div elements styled with style="display:none", for example:
<div style="display:none">ignore_this_content</div>
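One way to skip such content, sketched with Python's standard `html.parser`. It assumes only div elements whose inline style contains display:none are hidden, and it drops everything nested inside them; `visible_text` is a hypothetical helper.

```python
import re
from html.parser import HTMLParser

HIDDEN = re.compile(r"display\s*:\s*none", re.IGNORECASE)


class VisibleTextExtractor(HTMLParser):
    """Collects text, skipping everything inside <div style="display:none">."""

    def __init__(self):
        super().__init__()
        self.hidden_depth = 0  # > 0 while inside a hidden div (counts nested divs)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.hidden_depth:
            self.hidden_depth += 1  # nested div inside a hidden one
        elif HIDDEN.search(dict(attrs).get("style") or ""):
            self.hidden_depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if not self.hidden_depth:
            self.parts.append(data)


def visible_text(html):
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(part.strip() for part in parser.parts if part.strip())
```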
New feature:
In order to enable immediate query input, auto focus is set to the search form.
New suggest framework.
The auto-complete feature of Sphider-plus is now based on the JavaScript library jQuery.
For details, please see the chapter: Suggest framework
New feature:
Separate search fields for text and media queries. Consequently, separate suggestions are also offered. To be activated in Admin 'Settings'.
New feature:
Restrict the search results by means of up to 5 categories simultaneously.
Import and export of URLs with multiple category definitions assigned to each site.
For details, please see the chapter: Parallel structure of search in categories.
New feature:
Site URLs using the https scheme are now indexed as well.
Improved index procedure:
- Link URLs with and without 'www' are now treated as equal, so duplicate pages are excluded.
- Links pointing to themselves through HTTP 301/302/307 redirections are intercepted. This prevents infinite indexation.
- Repeated attempts of a page to redirect to itself force Sphider-plus to abort the index procedure for the affected site.
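The www/non-www rule can be sketched as a normalized lookup key. This Python sketch assumes that only a leading 'www.' is stripped and that the fragment is ignored; `dedup_key` and `unique_links` are hypothetical names.

```python
from urllib.parse import urlsplit, urlunsplit


def dedup_key(url):
    """Key under which the 'www.' and non-'www.' variants of a link collapse."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host.startswith("www."):
        host = host[4:]
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit((parts.scheme.lower(), netloc, parts.path or "/", parts.query, ""))


def unique_links(links):
    """Keep the first occurrence of each link, treating www/non-www as equal."""
    seen = set()
    result = []
    for link in links:
        key = dedup_key(link)
        if key not in seen:
            seen.add(key)
            result.append(link)
    return result
```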
New option in Admin 'Settings' menu:
Define the number of redirections (1-9) followed for each link while indexing.
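A sketch of a bounded redirect resolver that also catches pages redirecting to themselves. It assumes a hypothetical callback `get_location(url)` that returns the redirect target (for example from the Location header of a 301/302/307 response) or None; `resolve_redirects` and `RedirectError` are illustrative names.

```python
class RedirectError(Exception):
    """Raised when indexing of a link should be aborted."""


def resolve_redirects(url, get_location, max_redirects=5):
    """Follow at most max_redirects redirections (1-9) and return the final URL."""
    if not 1 <= max_redirects <= 9:
        raise ValueError("max_redirects must be between 1 and 9")
    visited = {url}
    followed = 0
    while True:
        target = get_location(url)
        if target is None:
            return url  # no further redirection: this is the page to index
        if target in visited:
            raise RedirectError(f"redirect loop detected at {target}")
        if followed == max_redirects:
            raise RedirectError(f"more than {max_redirects} redirections")
        visited.add(target)
        url = target
        followed += 1
```

Keeping every visited URL in a set catches both direct self-redirects and longer cycles such as A to B to A.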
New options in Admin 'Settings' menu:
Follow URL redirections invoked by JavaScript, such as:
<script . . . 'window.location.replace . . . . '
<script . . . var cURL = . . . .
<script . . . window.location = . . . . AND " + location.host + "
and several other script directives.
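Detecting such redirections mostly comes down to pattern matching in the script text. The Python sketch below covers only the `location = "..."` and `location.replace("...")` forms; the patterns are assumptions, not the rules Sphider-plus actually uses, and they do not handle the `var cURL` or `location.host` cases.

```python
import re

# Simplified, hypothetical patterns for two common JavaScript redirect forms:
#   [window.][document.]location[.href] = "..."
#   location.replace("...")
JS_REDIRECT = re.compile(
    r"""(?:window\.)?(?:document\.)?location(?:\.href)?\s*=\s*["']([^"']+)["']"""
    r"""|location\.replace\(\s*["']([^"']+)["']\s*\)""",
    re.IGNORECASE,
)


def js_redirect_target(script):
    """Return the first redirect target found in a script, or None."""
    match = JS_REDIRECT.search(script)
    if match is None:
        return None
    return match.group(1) or match.group(2)
```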
New options in Admin 'Settings' menu:
Follow URL redirections invoked by HTML tags, such as:
<BODY onLoad = "parent.location = 'home.asp'">
'HTTP-EQUIV= . . refresh . . content= . . .'
and several other tags.
New option in Admin 'Settings' menu:
Obey refresh delay directives placed in meta tags, such as:
<meta http-equiv="refresh" content="180;url=http://www.moodys.com.ar">
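Parsing such a directive into its delay and target URL might look like this. It is a simplified Python sketch that assumes the http-equiv attribute appears before content; `parse_meta_refresh` is a hypothetical name, and how the delay is then obeyed depends on the Admin setting.

```python
import re

# Simplified pattern; assumes http-equiv comes before content, as in
# <meta http-equiv="refresh" content="180;url=http://www.moodys.com.ar">
META_REFRESH = re.compile(
    r"""<meta[^>]+http-equiv\s*=\s*["']?refresh["']?[^>]*?"""
    r"""content\s*=\s*["']\s*(\d+)\s*(?:[;,]\s*url\s*=\s*([^"'>]+))?""",
    re.IGNORECASE,
)


def parse_meta_refresh(html):
    """Return (delay_in_seconds, target_url_or_None), or None if there is no refresh tag."""
    match = META_REFRESH.search(html)
    if match is None:
        return None
    url = match.group(2).strip() if match.group(2) else None
    return int(match.group(1)), url
```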
New option in Admin 'Settings' menu:
Do not index comment parts and scripts outside the HTML tags.
New option in Admin 'Settings' menu:
If not already present, add a trailing slash to the path of all detected links.
If the path contains a file name, this option is bypassed.
Also, if the HTTP request for the main URL is only accepted without a trailing slash, this option is not applied.
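The rule can be sketched in Python as below. It assumes a path "contains a file name" when its last segment has a dot (such as index.php), and models the main-URL exception as a hypothetical flag `main_url_rejects_slash`; `add_trailing_slash` is also a hypothetical name.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def add_trailing_slash(url, main_url_rejects_slash=False):
    """Append '/' to the path unless it already ends in one or names a file."""
    if main_url_rejects_slash:
        return url  # server only accepts the URL without slash: leave it alone
    parts = urlsplit(url)
    path = parts.path
    if path.endswith("/") or "." in posixpath.basename(path):
        return url  # already has a slash, or the last segment looks like a file name
    return urlunsplit((parts.scheme, parts.netloc, path + "/", parts.query, parts.fragment))
```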
New option in Admin 'Settings' menu:
Convert all link URLs to lower case characters.
New option in Admin 'Settings' menu:
Convert all link URLs found during indexation into UTF-8.
This converts URLs like:
/3v/catalog/%C1%E0%E2%E0%F0%E8%FF+%E8%E7%F0%E0%E7%E5%F6/
into: /3v/catalog/Бавария+изразец/
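The example URL is percent-encoded Windows-1251 (CP1251). A Python sketch of the conversion, assuming the decoder first tries UTF-8 and falls back to CP1251; the fallback charset and `decode_url_path` are assumptions, since the release notes do not say how the source charset is determined.

```python
from urllib.parse import unquote


def decode_url_path(path, fallback_charset="cp1251"):
    """Percent-decode a URL path, trying UTF-8 first and the fallback charset second."""
    try:
        return unquote(path, encoding="utf-8", errors="strict")
    except UnicodeDecodeError:
        return unquote(path, encoding=fallback_charset, errors="replace")


print(decode_url_path("/3v/catalog/%C1%E0%E2%E0%F0%E8%FF+%E8%E7%F0%E0%E7%E5%F6/"))
# /3v/catalog/Бавария+изразец/
```

Trying UTF-8 first keeps paths that are already UTF-8 encoded intact; a byte such as %C1 can never occur in valid UTF-8, so the CP1251 fallback is only used when needed.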
Improved link detection:
- Invalid URLs containing duplicate slashes in their path are ignored.
- The following links are now followed:
<script>window.document.location ="/this.path";</script>
<script>window.document.location.href="/this.path";</script>
<script>window.location.replace("/this.path")</script>
<script>"https|http this URL"</script>
<body onload "/this.path">
and several others.
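Two of the items above, sketched together in Python: collecting quoted absolute http/https URLs from script blocks, and discarding URLs with duplicate slashes in their path. The regular expressions are simplified assumptions and `script_links` is a hypothetical helper.

```python
import re
from urllib.parse import urlsplit

# Simplified patterns: script blocks, and quoted absolute http/https URLs inside them.
SCRIPT_BLOCK = re.compile(r"<script\b[^>]*>(.*?)</script>", re.IGNORECASE | re.DOTALL)
QUOTED_URL = re.compile(r"""["'](https?://[^"'\s]+)["']""", re.IGNORECASE)


def script_links(html):
    """Return absolute URLs quoted in <script> blocks, skipping paths with '//'."""
    links = []
    for script in SCRIPT_BLOCK.findall(html):
        for url in QUOTED_URL.findall(script):
            if "//" not in urlsplit(url).path:  # duplicate slashes: treat as invalid
                links.append(url)
    return links
```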
New option in Admin backend 'Clean' menu:
Truncate all tables in database.
Improved 'NOHOST' detection during index procedure:
Sphider-plus now tries 5 times to contact the server.
Each attempt consists of 2 different HTTP requests.
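A sketch of the retry loop, assuming each attempt tries the 2 requests in turn (for example a HEAD and a GET request, which is a guess), that the host counts as reachable as soon as one request succeeds, and that a short pause separates attempts. The requests are passed in as callables; `host_reachable` and `probes` are hypothetical names.

```python
import time


def host_reachable(url, probes, attempts=5, pause=1.0, sleep=time.sleep):
    """Return True as soon as one probe succeeds; False after all attempts fail (NOHOST).

    probes: callables taking the URL and returning True on a usable response,
    e.g. one sending a HEAD request and one sending a GET request (hypothetical).
    """
    for attempt in range(attempts):
        for probe in probes:
            try:
                if probe(url):
                    return True
            except OSError:
                pass  # refused connection, timeout, DNS failure, ...
        if attempt < attempts - 1:
            sleep(pause)  # short pause before the next attempt (assumed)
    return False
```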
Improved 'Add site' function in the Admin backend:
URLs with the schemes 'http' and 'https' are now treated as equal, so duplicate sites are excluded.
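A sketch of the duplicate check in Python, assuming sites are compared by host and path with the scheme ignored and a trailing slash treated as insignificant; `site_key` and `is_duplicate_site` are hypothetical names.

```python
from urllib.parse import urlsplit


def site_key(url):
    """Host and path of a site URL, ignoring the scheme and a trailing slash."""
    parts = urlsplit(url if "://" in url else "http://" + url)
    return parts.netloc.lower(), parts.path.rstrip("/")


def is_duplicate_site(new_url, existing_urls):
    """True if new_url matches an existing site when the scheme is ignored."""
    key = site_key(new_url)
    return any(site_key(url) == key for url in existing_urls)
```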
Support added for the Windows-31J (CP932) charset as an extension of Shift JIS. (In CP932, standard 7-bit ASCII codes are kept as single bytes, and Japanese characters are indicated by a first byte with the high bit set to 1.)
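Python ships a `cp932` codec. This sketch maps the charset labels a page may declare onto it; the label set and `decode_japanese` are assumptions, and treating a plain 'Shift_JIS' declaration as CP932 is a design choice, not something stated in the release notes.

```python
# Hypothetical label set: declared charsets that are decoded with Python's
# "cp932" codec (Windows-31J, the Microsoft extension of Shift JIS).
CP932_LABELS = {"windows-31j", "cp932", "ms932", "shift_jis", "shift-jis", "sjis", "x-sjis"}


def decode_japanese(body, declared_charset):
    """Decode page bytes, using CP932 for all Shift JIS style labels."""
    label = declared_charset.strip().lower()
    codec = "cp932" if label in CP932_LABELS else label
    return body.decode(codec)
```

Characters such as ① belong to the NEC extensions in CP932 and are not part of JIS X 0208-based Shift JIS, which is why the wider codec is used.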
UTF-8 support implemented for media titles, file names and ID3 tags.
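For ID3 tags, the text encoding is given by the first byte of each ID3v2 text frame. A minimal Python sketch of decoding such a frame to a Unicode string, which can then be stored as UTF-8; frame parsing and ID3v1 tags are left out, and `id3_text` is a hypothetical name.

```python
# Text-encoding byte at the start of every ID3v2.3/2.4 text frame.
# Values 2 and 3 exist only in ID3v2.4.
ID3_TEXT_ENCODINGS = {0: "latin-1", 1: "utf-16", 2: "utf-16-be", 3: "utf-8"}


def id3_text(frame_data):
    """Decode the payload of a text frame (e.g. TIT2, the title) to a str."""
    encoding = ID3_TEXT_ENCODINGS[frame_data[0]]
    return frame_data[1:].decode(encoding).rstrip("\x00")  # drop terminating nulls
```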
SQLi connector implemented between PHP and the MySQL database, using object-oriented programming (OOP).
Bug fixed in option: Do not index the full text.
Bug fixed for URLs containing CP1252 coded paths.
Bug fixed in detection of www/non-www links. Now preventing double indexing.
Bug fixed in 'Strip session ids'.
Bug fixed in Korean word segmentation.
Several small bugs fixed.
Involved files that have been modified / added for this release:
Because the SQLi connector has been implemented between PHP and the MySQL database, nearly all scripts have been renewed.
It is strongly recommended to perform a fresh installation for this version.