Shouldn't the spider follow 301 HTTP redirects?
Yes, Sphider-plus follows 301 and 302 redirects. But it might be necessary to enable
'Allow to index other hosts in same domain'
Details about this option are explained in documentation chapter 2.2 Allow to index other hosts in same domain
If foreign domains should also be indexed because links are redirected to them, it is necessary to enable:
'Spider can leave domain'
in Sites view / Options / Edit / Advanced Options
Why do I get the message 'The search string was not found as part of the text'?
This is only a warning message. It is presented in the result listing if the found keywords are not part of the full text, but were found only in the URL or meta tags.
You may disable this warning message in Admin / Settings / Search Settings by unchecking the checkbox:
Show warning message if query was not found in full text; but only in 'Title' of page, 'Keywords' 'Meta tags' or 'URL'
How to bypass the Admin log in.
For Intranet applications and during debugging it might be more convenient to bypass the Admin authorization. There are two possibilities:
1. This version still shows the Log In page as a warning that you are about to enter the Admin section, but you just have to click on the Login button.
In .../admin/auth.php set
$admin = "";
$admin_pw = "";
2. This version removes the Log In page totally:
Rename the file .../admin/auth.php to auth_backup.php
Rename the file .../admin/auth_bypass.php to auth.php
Unable to log in as Admin. Always re-directed to the log in form. Why?
Verify that you really use the access authorization as defined in the script:
.../admin/auth.php
If the empty log in form is still presented after entering 'Name' and 'Password', there might be a problem with session control. This option must be enabled for PHP scripts on the server. If you are running Suhosin on the server, note that it encrypts sessions differently. Adding the following to the php.ini in the Sphider-plus root directory restores normal access to the Admin backend:
suhosin.session.encrypt = Off;
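To see whether Suhosin is active at all and which value the directive currently has, a small diagnostic script might help. This is only a sketch and not part of Sphider-plus; the variable names are chosen for illustration.

```php
<?php
// Diagnostic sketch, not part of Sphider-plus.
// extension_loaded() reports whether the Suhosin extension is active.
$suhosin_loaded = extension_loaded('suhosin');

// ini_get() returns false if the directive is unknown,
// which is the case when Suhosin is not installed.
$encrypt = ini_get('suhosin.session.encrypt');

echo 'Suhosin loaded: ' . ($suhosin_loaded ? 'yes' : 'no') . "\n";
echo 'suhosin.session.encrypt: ' . ($encrypt === false ? 'not set' : $encrypt) . "\n";
echo 'session.save_path: ' . ini_get('session.save_path') . "\n";
```

If Suhosin is not loaded, the session problem has a different cause and the php.ini entry above will not change anything.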
Links are not followed during Re-index, only main URL is indexed (option 1).
It is not a bug, it is a feature. If 'Follow sitemap.xml' is activated in Admin settings, links will only be followed if:
- The 'last modified' date in sitemap.xml is newer than Sphider's 'last indexed' date.
- The link is new and not yet known in Sphider's link table.
The main URL will always be indexed, because the status and content of the sitemap file are required to decide what has to be indexed. Because only relevant pages are indexed, this approach significantly reduces the time required for index and re-index.
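The two conditions can be illustrated with a short sketch. The function name and the array layout are hypothetical and do not reflect the actual Sphider-plus code; the sketch only shows the decision logic described above.

```php
<?php
// Hypothetical helper, not the Sphider-plus implementation.
// $known_links maps a URL to its 'last indexed' Unix timestamp.
function should_follow_link($url, $sitemap_lastmod, $known_links)
{
    // New link that is not yet known in the link table: always follow.
    if (!isset($known_links[$url])) {
        return true;
    }
    // Known link: follow only if the sitemap reports a newer modification.
    return strtotime($sitemap_lastmod) > $known_links[$url];
}

$known = array('http://example.com/a.html' => strtotime('2012-03-01'));

var_dump(should_follow_link('http://example.com/a.html', '2012-02-01', $known)); // bool(false)
var_dump(should_follow_link('http://example.com/a.html', '2012-04-01', $known)); // bool(true)
var_dump(should_follow_link('http://example.com/b.html', '2012-01-01', $known)); // bool(true)
```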
Links are not followed during Re-index, only main URL is indexed (option 2).
If a .htaccess file is used in order to redirect requests, or to 'produce' SEO-friendly link names, it might be helpful to enable the checkbox
'Allow other hosts in same domain'
in Admin settings, section 'Spider settings'.
Additionally it might become necessary also to activate:
Otherwise Sphider will not follow the redirect directive of your .htaccess file.
How to integrate Sphider's search field into existing pages.
Add the following code at the appropriate position in the HTML code of your page and adapt the path_to_sphider-plus address relative to the HTML code:
<form action="/path_to_sphider-plus/search.php" method="get">
<table border="2" width="150" cellpadding="0" cellspacing="2">
<tr>
<td align="center"><input type="text" name="query" size="30" value="" /></td>
<td align="center"><input type="submit" value="Search" />
<input type="hidden" name="search" value="1" /></td>
</tr>
</table>
</form>
This simple example does not support all facilities of Sphider-plus. It is intended only as a first step towards your own adaptation. For example, if you add
<input type="hidden" name="mark" value="markyellow">
the found keywords will be marked yellow.
For more details about embedded operation of Sphider-plus, please see the chapter:
Integration of Sphider-plus into an existing site
Error message: "Warning: set_time_limit() . . . "
Sphider does not work if the server is in 'safe' mode. That server setting must be disabled in the PHP initialisation file (e.g.: .../apache/php/php.ini).
safe_mode = Off
The current state is shown in Admin / Statistics / Server Info / php.ini file key: safe_mode
Before modifying this value, stop your server, and restart it afterwards.
Error message: "Unable to flush table 'addurl' "
Sphider does not have sufficient privileges to close the tables of your database. Sphider needs the privilege 'Reload' to perform the flush instruction (MySQL-Manual chapter 13.5.5.2). Please check your database installation, grant sufficient privileges to Sphider and shut down other scripts that could use the Sphider database. If you don't succeed with these fundamentals because you use a shared hosting server, open the file
.../admin/db_common.php
and delete the row
mysql_query("FLUSH TABLE $row[0]") or die("Unable to flush table $row[0].");
Also open the file
.../admin/spiderfuncs.php
and delete the row
mysql_query("FLUSH QUERY CACHE");
Please keep in mind that by deleting these rows you will lose parts of the 'Optimize database' and 'Clean resources during index/re-index' functions.
Error message: " Access denied; you need the RELOAD privilege. . . "
This is the same problem as the error message "Unable to flush table 'addurl'", but this time your server sends the error message. Sphider does not have sufficient privileges to flush the tables of your database. Sphider needs the privilege 'Reload' to perform the MySQL flush instruction. For more details, see the FAQ above.
Error message: " Access-Denied: You need the SUPER privilege for this operation "
This is another server limitation, this time a restriction of the MySQL server. To solve it, uncheck the setting:
"Enable 32 MByte MySQL query cache"
Fatal error: "Allowed memory size of xxx bytes exhausted (tried to allocate yyy bytes) "
This is a limitation of your server, which does not allow PHP to allocate enough memory. To prevent this error message, increase the memory size in the PHP initialisation file (e.g.: .../apache/bin/php.ini):
memory_limit = 64M
The currently allocated memory size is shown in Admin / Statistics / Server Info / php.ini file
key: memory_limit
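The error message reports plain byte counts, while php.ini uses shorthand values such as 64M. The following sketch converts the shorthand into bytes so that both can be compared. The function name ini_size_to_bytes is hypothetical and not part of Sphider-plus or PHP.

```php
<?php
// Sketch: convert a php.ini shorthand value such as "64M" into bytes.
// The function name is hypothetical.
function ini_size_to_bytes($value)
{
    $value = trim($value);
    if ($value === '-1') {
        return -1; // -1 means "no limit"
    }
    $number = (int) $value;                  // leading digits, e.g. 64
    $unit = strtoupper(substr($value, -1));  // trailing unit letter, if any
    switch ($unit) {
        case 'G':
            return $number * 1024 * 1024 * 1024;
        case 'M':
            return $number * 1024 * 1024;
        case 'K':
            return $number * 1024;
        default:
            return $number;                  // plain byte value
    }
}

$limit = ini_get('memory_limit');
echo 'memory_limit: ' . $limit . ' = ' . ini_size_to_bytes($limit) . " bytes\n";
echo 'peak usage so far: ' . memory_get_peak_usage() . " bytes\n";
```

With memory_limit = 64M the script reports 67108864 bytes, which is the value that would appear as xxx in the error message.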
PDF documents are not indexed
If you are sure that the physical path to the converter is correct (see: Admin / Statistics / Server-Info / PDF-converter), but your PDF documents are not converted, there might be another (final?) approach. The technical support of your hosting service may tell you that you can run scripts from any directory, but apparently that is not true for all providers. Several users have reported this.
Move the 2 scripts
pdftotext
and
pdftotext.script
to a directory called 'cgi-local' or something similar that your provider offers for cgi, set the proper permissions, change the $pdftotext_path in all involved scripts to the new destination and then run the index / re-index procedure.
PHP security info is not presented in Admin Statistics
Unfortunately, not all servers support this feature. They treat their security settings as a secret, and a 'blank' Admin is the typical response. Consequently, this feature is disabled by default. To get the security info, perform the following steps:
In .../admin/admin_header.php search for the row:
// require_once('PhpSecInfo/PhpSecInfo.php');
Uncomment that row by deleting the //
Also in .../admin/admin.php search for the following row and uncomment it in the same way:
// phpsecinfo();
What kind of input validation is performed (vulnerability)?
The following protections are implemented:
- Prevent SQL-injections.
- Prevent XSS-attacks.
- Prevent Shell-executes.
- Suppress JavaScript executions.
- Suppress Tag inclusions.
- Prevent Directory Traversal attacks.
- Delete input if query contains any word of (editable) blacklist.
- Prevent buffer overflow errors.
- Suppress JavaScript execution and tag inclusions masked as XSS attacks.
- Prevent C-function 'format-string' vulnerability.
Additionally, an 'Intrusion Detection System' can be enabled as part of the Admin settings. If activated, all attempts to hack Sphider-plus are logged, a warning message is presented and further Internet traffic is blocked for the IP causing the attack. The IDS will additionally protect against:
- Cross-site request forgery
- Denial of service
- Information disclosure
- Local file inclusion
- Remote file execution
- Lightweight directory access
How to protect Database management against Admin access?
By default, the submenu 'Configuration' is already protected by a separate username and password. This protection can be extended to the complete Database management by uncommenting the row:
//include "auth_db.php";
in the following scripts:
.../admin/db_activate.php
.../admin/db_common.php
.../admin/db_copy.php
.../admin/db_main.php
On top of result listing messages like: "Results found in cache. Results from database 1" are displayed.
If in Admin settings the 'Debug' mode is enabled, several warnings and messages are displayed.
To suppress these messages, the checkbox 'Enable Debug mode' in Admin settings needs to be unchecked.
Please keep in mind that separate settings are available for 'Admin' and 'Search User'.
Unable to search for several words like clock, file and system. Why?
In order to prevent vulnerabilities like XSS attacks, SQL injection etc., Sphider-plus checks all user input as well as all client data sent to the server. Input containing 'bad' words is rejected.
All input has to pass the function cleaninput($input) in the script .../include/commonfuns.php
By means of several preg_match(...) functions the bad words are detected and filtered. In order to avoid conflicts with common user queries, the corresponding filter words can be deleted. Always delete the word together with the OR selector that follows it (for example clock| ).
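The effect of deleting a filter word can be shown with a simplified pattern. The pattern below is hypothetical; the real patterns in cleaninput() differ and have to be looked up in .../include/commonfuns.php.

```php
<?php
// Hypothetical filter in the style described above,
// not the actual pattern used by cleaninput().
$before = '/\b(clock|file|system|exec)\b/i';

// The same filter after deleting the word "clock"
// together with its following OR selector "|".
$after = '/\b(file|system|exec)\b/i';

$query = 'wall clock';

var_dump(preg_match($before, $query)); // int(1): the query is rejected
var_dump(preg_match($after, $query));  // int(0): the query is accepted
```

If only the word is deleted and the | is left behind, the pattern contains an empty alternative, which in this sketch would match every query that contains a word.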
Indexing stopped after 20 links, but my site contains more than 650 pages.
Indexing with a search engine like Sphider-plus may become problematic on a 'Shared Hosting' server. Indexing a huge number of links might be interrupted, because the granted time slice may expire before the index procedure is finished. Sphider-plus tries 3 times to reconnect to the database. But if the script was canceled, it becomes necessary to invoke the index procedure again manually. Sphider-plus will remember the last indexed link and continue the suspended process.
Don't see the new links, keywords and thumbnails on my screen during indexing, why?
There are some Admin settings that need attention:
- Enable Debug mode => must be activated
- Suppress browser output of logging data during index/re-index => must not be activated
If 'Multithreaded indexing' is activated, Sphider-plus takes control over these options because, in order to speed up the index procedure, all options that are not required are switched off. If you return to single-thread indexing, Sphider-plus does not remember the old settings. So it is up to the admin to reactivate the options according to personal preferences.
In the search results I'm seeing the full text information repeated. Why?
There is an Admin setting (in section Search Settings):
"Define maximum count of result hits per page, displayed in search results (if multiple occurrence is available on a page)"
If you enter any value > 1 into this field, Sphider-plus may present several text extracts of one page. If, for example, the keyword was found 2 times in the full text of a page, the result listing will present the text 'around' the found keyword position two times.
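How several extracts of one page come about can be sketched as follows. This is a simplified illustration under assumed names (extracts_around, $max_hits, $radius), not the result-listing code of Sphider-plus.

```php
<?php
// Hypothetical sketch: collect up to $max_hits text extracts
// around each occurrence of $keyword in $text.
function extracts_around($text, $keyword, $max_hits, $radius = 10)
{
    $extracts = array();
    $offset = 0;
    while (count($extracts) < $max_hits
        && ($pos = stripos($text, $keyword, $offset)) !== false) {
        // Start a few characters before the hit, but not before the text begins.
        $start = max(0, $pos - $radius);
        $length = ($pos - $start) + strlen($keyword) + $radius;
        $extracts[] = substr($text, $start, $length);
        // Continue searching behind the current hit.
        $offset = $pos + strlen($keyword);
    }
    return $extracts;
}

$text = 'The clock tower has a large clock face on each side.';

print_r(extracts_around($text, 'clock', 1)); // one extract
print_r(extracts_around($text, 'clock', 2)); // two extracts from the same page
```

With a maximum of 1 only the first occurrence is shown; any higher value allows the overlapping or repeated text that prompted this question.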
Receiving 'server error 500' on a fresh installed Sphider-plus (option 1)
Using the Apache suEXEC module, which allows users to run CGI and SSI applications as a different user, may cause this error message. Using the suEXEC may result in a conflict with the chmod 777 performed by the Sphider-plus Admin backend, which tries to get full write access to several subfolders of the Sphider-plus installation. It may become necessary to disable all chmod 777 commands in .../admin/admin.php
Receiving 'server error 500' on a fresh installed Sphider-plus (option 2)
Sphider-plus is delivered with .htaccess files in several folders. These files contain directives that try to overwrite the server settings. Unfortunately, some servers do not accept these .htaccess directives. If you receive the above error message and the server settings do not allow 'overwriting', all .htaccess files of the Sphider-plus distribution need to be deleted. This issue was reported e.g. for servers running Mageia 1, Mandriva 2010.2 and CentOS 6.0.
Receiving 'server error 500' on a fresh installed Sphider-plus (option 3)
If the Sphider-plus scripts are installed on a server hosted by e.g. 'Hosteurope', a server conflict was reported for the script .../admin/geoip.php
It might become necessary to disable this script and the GEOIP functions in Sphider-plus.
In the addurl form, is there a way to remove "none" as a category option?
Open the script .../addurl.php and delete the row:
print "<option ".$selected." value=\"0\"> none ";
For the addurl form, how to make the captcha text input not case sensitive?
Open the script .../addurl.php and find the row:
if ($_SESSION['CAPTCHAString'] != $_POST['captchastring']){
Delete that row and replace it with
if (strtolower($_SESSION['CAPTCHAString']) != strtolower($_POST['captchastring'])){
Unable to rename the default search script. I am always redirected to search.php
If .../search.php is no longer the default script, you will have to modify the .htaccess file in the root folder of your Sphider-plus installation for your personal requirements.
In .htaccess you will find:
# 2. Redirect client enquiries to search.php
RewriteEngine on
RewriteRule ^search\.html$ ./search.php
...
...
# 4. Always start with this file
DirectoryIndex search.php
Parse error: syntax error, unexpected ';' in ..\sphider\settings\db1\conf_search1_.php on line 33
This error message is presented, if someone manually edited the configuration file. It is not foreseen to edit any configuration file. All modifications need to be done in the Admin backend in menu 'Settings'.
The above error message is a total knockout for Sphider-plus. Delete the corresponding configuration file in the subfolder as defined in your error message (e.g. .../sphider/settings/db1/conf_search1_.php).
Additionally restore the script
.../admin/configset.php
with the original script from your Sphider-plus download.
Afterwards open the Admin backend, where you will find that Sphider-plus has written the default settings into your configuration file. Now replace the default settings with your individual settings in the 'Settings' menu. Finally, press any of the 'Save' buttons. If you stored a valid configuration backup file before the manual editing that caused the above error message, you may also restore this backup.
Only the first part of a page gets indexed. The rest of the text got lost. Why?
This might be caused by incorrectly defined HTML tags. If a tag is not closed correctly, indexing of that page ends at the incorrect tag. Only the text of a page should be indexed, and words inside of tags are not part of the full text. The indexer uses the PHP function strip_tags() to delete the tags from the page content.
Quote from the PHP manual:
"Because strip_tags() does not actually validate the HTML, partial or broken tags can result in the removal of more text/data than expected."
In order to validate the HTML code, the following link might be helpful:
http://validator.w3.org
This problem is solved since Sphider-plus version 2.7, because the PHP function strip_tags() is no longer used. A new function was created, now accepting also unclosed and invalid HTML and PHP tags.
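For versions before 2.7, the behaviour quoted from the PHP manual can be reproduced with a few lines. The sample strings are made up for illustration; in the first one the div tag is never closed with ">", so strip_tags() is expected to treat the rest of the string as part of the tag.

```php
<?php
// Illustration of the strip_tags() behaviour quoted above.
// The sample content is invented for this sketch.
$broken = 'First paragraph. <div class="content" Second paragraph.';
$valid  = 'First paragraph. <div class="content">Second paragraph.</div>';

// Broken tag: the text behind the unclosed tag is removed as well.
var_dump(strip_tags($broken));

// Valid HTML: only the tags are removed.
var_dump(strip_tags($valid)); // string(34) "First paragraph. Second paragraph."
```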
Indexing from command line shows "Fatal error: Call to undefined function getHttpVars()"
The indexing script is located at .../admin/sphider.php. If the index procedure is invoked from the command line, the error message appears because sphider.php was called from a folder other than .../admin/. Change into the .../admin/ folder before invoking the script.