Sphider-plus version 2.4 - The PHP Search Engine




FAQs



[ Answers ]

Shouldn't the spider follow 301 HTTP redirects?

Yes, Sphider-plus follows 301 and 302 redirects. However, it might be necessary to enable the option

'Allow to index other hosts in same domain'

This option is explained in detail in documentation chapter 2.2, 'Allow to index other hosts in same domain'.

If links are redirected to foreign domains and these domains should be indexed as well, you also need to enable:

'Spider can leave domain'

in Sites view / Options / Edit / Advanced Options
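The decision described above can be pictured as a small Python sketch. Sphider-plus itself is written in PHP, and this is not its code. The function names, the two boolean flags standing for the two options, and the naive "last two labels" domain rule are all assumptions for illustration. The sketch also assumes that 'Spider can leave domain' covers other hosts of the same domain.

```python
from urllib.parse import urlsplit


def base_domain(host):
    """Naive registrable domain: keep the last two labels (www.example.com -> example.com).

    Assumption for illustration only: this is wrong for suffixes such as co.uk.
    """
    return ".".join(host.lower().split(".")[-2:])


def may_follow_redirect(origin_url, target_url, allow_other_hosts, can_leave_domain):
    """Decide whether the target of a 301/302 redirect may be indexed.

    allow_other_hosts stands for 'Allow to index other hosts in same domain',
    can_leave_domain for 'Spider can leave domain' (hypothetical parameter names).
    """
    origin = urlsplit(origin_url).hostname or ""
    target = urlsplit(target_url).hostname or ""
    if target == origin:
        return True  # same host: always followed
    if base_domain(target) == base_domain(origin):
        # other host in the same domain; assumed to be allowed by either option
        return allow_other_hosts or can_leave_domain
    return can_leave_domain  # foreign domain
```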



Why do I get the message 'The search string was not found as part of the text'?

This is only a warning. It is shown in the result listing if the keywords were not found in the full text of a page, but only in its URL, title or meta tags.

You can disable this warning in Admin / Settings / Search Settings by unchecking the checkbox:

Show warning message if query was not found in full text; but only in 'Title' of page, 'Keywords' 'Meta tags' or 'URL'
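The rule behind the warning can be sketched in Python as follows. The function name is hypothetical. The plain case-insensitive substring match is a simplification, and the real matching in Sphider-plus (word boundaries, stemming, several keywords) may differ.

```python
def needs_fulltext_warning(query, full_text, title="", keywords="", url=""):
    """Return True if the query occurs in title, keywords or URL but not in the full text.

    Simplified: one case-insensitive substring match per field.
    """
    q = query.lower()
    in_full_text = q in full_text.lower()
    elsewhere = any(q in field.lower() for field in (title, keywords, url))
    return elsewhere and not in_full_text
```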




How to bypass the Admin login.

For intranet applications and during debugging, it may be more convenient to bypass the Admin authorization. There are two options:

1. This variant still shows the Log In page as a reminder that you are entering the Admin section, but you only have to click the Login button.

In .../admin/auth.php set

$admin = "";

$admin_pw = "";

2. This variant removes the Log In page entirely:

Rename the file .../admin/auth.php to auth_backup.php

Rename the file .../admin/auth_bypass.php to auth.php
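If you switch between the two variants often, you can script the two renames of option 2 and their reversal. The Python sketch below is a hypothetical helper and not part of Sphider-plus. `admin_dir` stands for your .../admin directory.

```python
from pathlib import Path


def enable_auth_bypass(admin_dir):
    """Option 2: auth.php -> auth_backup.php, then auth_bypass.php -> auth.php."""
    admin = Path(admin_dir)
    auth = admin / "auth.php"
    backup = admin / "auth_backup.php"
    bypass = admin / "auth_bypass.php"
    if not bypass.exists():
        raise FileNotFoundError(bypass)
    if backup.exists():
        raise FileExistsError(backup)  # never overwrite an existing backup
    auth.rename(backup)
    bypass.rename(auth)


def disable_auth_bypass(admin_dir):
    """Undo option 2 and restore the original login."""
    admin = Path(admin_dir)
    (admin / "auth.php").rename(admin / "auth_bypass.php")
    (admin / "auth_backup.php").rename(admin / "auth.php")
```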



Links are not followed during Re-index, only main URL is indexed (option 1).

This is not a bug, it is a feature. If 'Follow sitemap.xml' is activated in Admin settings, links are only followed if:

- the 'last modified' date in sitemap.xml is newer than Sphider's 'last indexed' date, or

- the link is new, i.e. not yet known in Sphider's link table.

The main URL is always indexed, because the status and content of the sitemap file are needed to decide what else has to be indexed. Because only relevant pages are indexed, this approach significantly reduces the time required for index and re-index.
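Here is a rough Python sketch of this selection rule, using the standard sitemaps.org namespace. The function name and the `known_links` mapping (URL to 'last indexed' date) are hypothetical stand-ins for Sphider's link table. Only the date part of `<lastmod>` is compared. Known links without a `<lastmod>` entry are not followed, because neither condition applies to them.

```python
import xml.etree.ElementTree as ET
from datetime import date

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def links_to_follow(sitemap_xml, known_links):
    """Return sitemap URLs that are new or modified since they were last indexed.

    known_links: hypothetical dict {url: date last indexed}, standing in for the link table.
    """
    root = ET.fromstring(sitemap_xml)
    selected = []
    for entry in root.findall("sm:url", NS):
        loc = entry.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = entry.findtext("sm:lastmod", default="", namespaces=NS).strip()
        if not loc:
            continue
        if loc not in known_links:
            selected.append(loc)  # new link, not yet in the link table
        elif lastmod and date.fromisoformat(lastmod[:10]) > known_links[loc]:
            selected.append(loc)  # modified after the last index run
    return selected
```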




Links are not followed during Re-index, only main URL is indexed (option 2).

If a .htaccess file is used to redirect requests or to produce SEO-friendly link names, it may help to enable the checkbox

'Allow other hosts in same domain'

in Admin settings, section 'Spider settings'.

Additionally, it may be necessary to activate:

'Spider can leave domain'

in Sites view / Options / Edit / Advanced Options

Otherwise Sphider will not follow the redirect directives in your .htaccess file.



How to integrate Sphider's search field into existing pages.

Add the following code at the appropriate position in the HTML code of your page, and adjust the /path_to_sphider-plus/ part of the address so that it points to your Sphider-plus installation:

<form action="/path_to_sphider-plus/search.php" method="get">
  <table border="2" width="150" cellpadding="0" cellspacing="2">
    <tr>
      <td align="center"><input type="text" name="query" size="30" value="" /></td>
      <td align="center"><input type="submit" value="Search" />
        <input type="hidden" name="search" value="1" /></td>
    </tr>
  </table>
</form>

This simple example does not support all features of Sphider-plus. It is intended only as a first step towards your own adaptation. For example, if you add

<input type="hidden" name="mark" value="markyellow" />

inside the form, the found keywords will be highlighted in yellow.
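Because the form uses method="get", submitting it requests search.php with the form fields as URL parameters. The Python sketch below builds the same URL, which can be handy for testing or for plain search links. The `/path_to_sphider-plus/` prefix is the placeholder from the example and must be adapted. The helper name is hypothetical.

```python
from urllib.parse import urlencode


def search_url(query, mark=None, base="/path_to_sphider-plus/search.php"):
    """Build the GET request that the search form above submits."""
    params = {"query": query, "search": "1"}  # same field names as the form
    if mark:
        params["mark"] = mark  # e.g. "markyellow" to highlight found keywords
    return base + "?" + urlencode(params)


print(search_url("php search", mark="markyellow"))
# prints /path_to_sphider-plus/search.php?query=php+search&search=1&mark=markyellow
```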

More details and examples of how to integrate Sphider-plus into existing pages can be found on the original Sphider forum, for example:

http://www.sphider.eu/forum/read.php?2,4505




Error message: "Warning: set_time_limit() . . . "

Sphider does not work if PHP runs in safe mode. This setting must be disabled in the PHP initialization file (e.g. .../apache/php/php.ini):

safe_mode = Off

The current value is shown in Admin / Statistics / Server Info / php.ini file, key: safe_mode

Stop your server before modifying this value, and restart it afterwards.
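To check the setting outside the Admin panel, you can scan a php.ini file with a few lines of Python. This is a hypothetical helper with a simplified parser: the last assignment wins and `;` starts a comment. PHP's own ini parser handles more cases, so the Admin / Statistics view remains authoritative.

```python
def ini_value(ini_text, key):
    """Return the last value assigned to key in php.ini text, or None."""
    value = None
    for raw in ini_text.splitlines():
        line = raw.strip()
        if not line or line.startswith((";", "[")):
            continue  # skip blank lines, comments and section headers
        name, sep, rest = line.partition("=")
        if sep and name.strip() == key:
            value = rest.split(";", 1)[0].strip().strip('"')  # drop inline comment
    return value


def safe_mode_enabled(ini_text):
    """True if safe_mode is switched on; a missing key counts as Off."""
    value = (ini_value(ini_text, "safe_mode") or "off").lower()
    return value in ("on", "1", "true", "yes")
```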



Error message: "Unable to flush table 'addurl' "

Sphider does not have enough privileges to close (flush) the tables of your database. Sphider needs the RELOAD privilege to perform the FLUSH statement (MySQL Manual, chapter 13.5.5.2). Please check your database installation, grant sufficient privileges to Sphider, and shut down other scripts that might be using the Sphider database.

If you cannot do this because you use a shared hosting server, open the file

.../admin/db_common.php

and delete the line

mysql_query("FLUSH TABLE $row[0]") or die("Unable to flush table $row[0].");

Also open the file

.../admin/spiderfuncs.php

and delete the line

mysql_query("FLUSH QUERY CACHE");

Please keep in mind that by deleting these lines you will lose parts of the 'Optimize database' and 'Clean resources during index/re-index' functions.




Error message: "Access denied; you need the RELOAD privilege. . ."

This is the same problem as with the error message "Unable to flush table 'addurl'", but this time the message is sent by your database server. Sphider does not have enough privileges to flush the tables of your database. It needs the RELOAD privilege to perform the MySQL FLUSH statement. For more details, see the FAQ above.



Fatal error: "Allowed memory size of xxx bytes exhausted (tried to allocate yyy bytes)"

Your server's configuration does not allow PHP to allocate enough memory. To prevent this error, increase the memory limit in the PHP initialization file (e.g. .../apache/bin/php.ini):

memory_limit = 64M

The current limit is shown in Admin / Statistics / Server Info / php.ini file, key: memory_limit

Stop your server before modifying this value, and restart it afterwards.
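The xxx in the error message is the current limit in bytes, while php.ini usually states it in shorthand such as 64M. The Python sketch below converts between the two, so you can see which setting produced the error and how far a new value raises it. The helpers are hypothetical and simpler than PHP's own parsing.

```python
import re

UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def memory_limit_bytes(value):
    """Convert a memory_limit value such as '64M' to bytes; '-1' (unlimited) gives None."""
    value = value.strip()
    if value == "-1":
        return None
    suffix = value[-1].lower()
    if suffix in UNITS:
        return int(value[:-1]) * UNITS[suffix]
    return int(value)  # plain number of bytes


def limit_from_error(message):
    """Extract the byte limit from an 'Allowed memory size ... exhausted' message."""
    match = re.search(r"Allowed memory size of (\d+) bytes exhausted", message)
    return int(match.group(1)) if match else None
```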

 



PDF documents are not indexed

If you are sure that the physical path to the converter is correct (see Admin / Statistics / Server-Info / PDF-converter) but your PDF documents are still not converted, there is one more approach you can try. The technical support of your hosting service may tell you that scripts can be run from any directory, but this does not seem to be true for all providers. Several users have reported such cases.

Move the two scripts

pdftotext

and

pdftotext.script

to a directory called 'cgi-local' (or a similar directory that your provider offers for CGI scripts), set the proper permissions, change $pdftotext_path in all involved scripts to the new location, and then run the index / re-index procedure.
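You can set the permissions with chmod on the shell or, as sketched below, from Python. The helper name is hypothetical. Mode 0755 (owner may write, everyone may read and execute) is an assumption about what "proper permissions" means on a typical Unix host. Check your provider's requirements for CGI directories.

```python
import os


def prepare_converter_dir(cgi_dir, names=("pdftotext", "pdftotext.script")):
    """Set mode 0755 on the converter files and report their state."""
    status = {}
    for name in names:
        path = os.path.join(cgi_dir, name)
        if not os.path.isfile(path):
            status[name] = "missing"
            continue
        os.chmod(path, 0o755)  # rwxr-xr-x: assumed to be the "proper permissions"
        status[name] = "ok" if os.access(path, os.X_OK) else "not executable"
    return status
```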



PHP security info is not presented in Admin Statistics

Unfortunately, not all servers support this feature, because some treat their security settings as a secret. A blank Admin page is the typical result. Therefore this feature is disabled by default. To get the security info, perform the following steps:

In .../admin/admin_header.php search for the line:

// require_once('PhpSecInfo/PhpSecInfo.php');

Uncomment that line by deleting the //

Then, in .../admin/admin.php, search for the line:

// phpsecinfo();

Uncomment that line by deleting the //
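Both edits, and the other "uncomment this line" instructions in this FAQ, come down to a small text transformation. The Python function below is a hypothetical helper. It removes a leading // only from lines whose remaining code matches the given statement exactly. It returns the number of lines changed, so you can confirm that the edit took effect.

```python
import re


def uncomment_php_line(source, statement):
    """Remove a leading // from every line whose code is exactly `statement`.

    Returns (new_source, number_of_lines_changed); indentation is preserved.
    """
    pattern = re.compile(
        r"^([ \t]*)//[ \t]*(" + re.escape(statement) + r")[ \t]*$",
        re.MULTILINE,
    )
    return pattern.subn(r"\1\2", source)
```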




What kind of input validation is performed (vulnerability)?

The following protections are implemented:

- Prevent SQL-injections.

- Prevent XSS-attacks.

- Prevent Shell-executes.

- Suppress JavaScript executions.

- Suppress Tag inclusions.

- Prevent Directory Traversal attacks.

- Discard the input if the query contains any word from the (editable) blacklist.

- Prevent buffer overflow errors.

- Suppress JavaScript execution and tag inclusions masked as XSS attacks.

- Prevent C-function 'format-string' vulnerability.



How to protect Database management against Admin access?

By default, the submenu 'Configuration' is already protected by a separate username and password. This protection can be extended to the complete Database management by uncommenting the line:

//include "auth_db.php";

in the following scripts:

.../admin/db_activate.php

.../admin/db_common.php

.../admin/db_copy.php

.../admin/db_main.php
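To verify that all four scripts are protected after editing, a short Python check can report the state of the include in each file. The helper and its status labels are hypothetical. `admin_dir` stands for your .../admin directory.

```python
import os
import re

SCRIPTS = ("db_activate.php", "db_common.php", "db_copy.php", "db_main.php")
ACTIVE = re.compile(r'^[ \t]*include[ \t]+"auth_db\.php";', re.MULTILINE)
COMMENTED = re.compile(r'^[ \t]*//[ \t]*include[ \t]+"auth_db\.php";', re.MULTILINE)


def auth_db_status(admin_dir):
    """Report for each database script whether auth_db.php is included."""
    status = {}
    for name in SCRIPTS:
        path = os.path.join(admin_dir, name)
        try:
            with open(path, encoding="utf-8", errors="replace") as fh:
                text = fh.read()
        except FileNotFoundError:
            status[name] = "missing"
            continue
        if ACTIVE.search(text):
            status[name] = "protected"
        elif COMMENTED.search(text):
            status[name] = "commented out"
        else:
            status[name] = "no include found"
    return status
```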




Error message: "Access-Denied: You need the SUPER privilege for this operation."

If you run 'Re-Index' or 'Erase and Re-Index', you might get the above message.

This message typically occurs on shared hosting. Your hosting provider probably does not allow you to reset the MySQL query cache, because the cache is also used by other customers on that server.

To solve this:

In .../admin/admin.php search for the following lines and delete them:

$mysql_cachereset = @mysql_query("RESET QUERY CACHE");

echo "<div>&nbsp;</div>

<p class='cntr em sml'>MySQL query cache cleared.</p>

";

You will find this 'reset query cache' block three times in admin.php. Delete all three occurrences.
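Because the block appears three times, a scripted removal makes it easy to confirm that none was missed. The Python sketch below assumes the block is laid out as shown above, although the whitespace between and around the statements may vary. The function name is hypothetical. The returned count should be 3 for an unmodified admin.php.

```python
import re

# Matches the $mysql_cachereset assignment plus the following echo block
CACHE_RESET_BLOCK = re.compile(
    r'^[ \t]*\$mysql_cachereset\s*=\s*@mysql_query\("RESET QUERY CACHE"\);'
    r'\s*echo\s*"<div>&nbsp;</div>.*?MySQL query cache cleared\.</p>\s*";[ \t]*\n?',
    re.MULTILINE | re.DOTALL,
)


def remove_cache_resets(source):
    """Delete every cache-reset block; returns (new_source, number_removed)."""
    return CACHE_RESET_BLOCK.subn("", source)
```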



Messages like "Results found in cache. Results from database 1" are displayed on top of the result listing.

If 'Debug' mode is enabled in Admin settings, several warnings and messages are displayed.

To suppress these messages, the checkbox 'Enable Debug mode' in Admin settings needs to be unchecked.




Unable to search for certain words like clock, file and system. Why?

To prevent vulnerabilities like XSS attacks, SQL injection etc., Sphider-plus checks all user input as well as all client data sent to the server. Input containing 'bad' words is rejected.

All input has to pass the function cleaninput($input) in the script .../include/commonfuns.php.

The bad words are detected and filtered by several preg_match(...) calls. To avoid conflicts with common user queries, you can delete the corresponding filter words, always together with the following OR separator (for example clock| ).
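The effect of deleting a word from such an OR list can be sketched in Python. The blacklist string below is a hypothetical example, not the actual list in cleaninput(). Whole-word, case-insensitive matching is an assumption of this sketch, and the real preg_match(...) patterns may also match parts of words.

```python
import re

# Hypothetical OR list; the real lists are in cleaninput() in .../include/commonfuns.php
BAD_WORDS = "clock|file|system|script|eval"


def is_rejected(query, alternation):
    """True if any blacklisted word occurs as a whole word (case-insensitive)."""
    return re.search(r"\b(?:" + alternation + r")\b", query, re.IGNORECASE) is not None


def remove_filter_word(alternation, word):
    """Delete one word together with its | separator from the OR list."""
    return "|".join(w for w in alternation.split("|") if w != word)


relaxed = remove_filter_word(BAD_WORDS, "clock")
print(relaxed)  # file|system|script|eval
print(is_rejected("alarm clock", BAD_WORDS), is_rejected("alarm clock", relaxed))  # True False
```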



Indexing stopped after 20 links, but my site contains more than 650 pages.

Indexing with a search engine like Sphider-plus can be problematic on a shared hosting server. Indexing a large number of links may be interrupted, because the granted time slice can run out before the index procedure has finished. Sphider-plus tries three times to reconnect to the database. If the script itself was cancelled, however, you have to start the index procedure again manually. Sphider-plus remembers the last indexed link and continues the suspended process.
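The resume behavior can be pictured as a checkpoint that is updated after every indexed link, plus a bounded number of reconnect attempts. The Python sketch below is hypothetical throughout. The JSON checkpoint file is an assumption of the sketch and not how Sphider-plus stores its progress. `max_links_per_run` stands in for the time slice granted by the host, and `ConnectionError` for a lost database connection.

```python
import json
import os


def with_retries(action, attempts=3):
    """Call action(); on ConnectionError retry, giving up after `attempts` tries."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except ConnectionError:
            if attempt == attempts:
                raise


def index_with_checkpoint(links, index_page, checkpoint_path, max_links_per_run):
    """Index links in order, resuming after the last link saved in the checkpoint.

    max_links_per_run simulates the time slice after which the host stops the script.
    """
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            start = json.load(fh)["next"]
    done = []
    for position in range(start, min(len(links), start + max_links_per_run)):
        link = links[position]
        with_retries(lambda: index_page(link))
        done.append(link)
        with open(checkpoint_path, "w") as fh:
            json.dump({"next": position + 1}, fh)  # remember progress after every link
    return done
```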




Why don't I see the new links, keywords and thumbnails on my screen during indexing?

Check the following Admin settings:

- Enable Debug mode => must be activated

- Suppress browser output of logging data during index/re-index => must not be activated

If 'Multithreaded indexing' is activated, Sphider-plus takes control of these options. To speed up the index procedure, it switches off all options that are not required. However, when you return to single-threaded indexing, Sphider-plus does not restore the previous settings. It is up to the admin to reactivate the options according to their personal preferences.




Internet Explorer uses the complete width of the browser window, while Firefox, Opera and Chrome are limited to 950 pixels

Yes, our dear IE needs a special solution, which also works for the other browsers. To fix it, open the script search.php and find

include "".$template_dir."/html/010_html_header.html" ;

Copy this line to the clipboard, then delete it from the script. In the same file, search for

$stem_dir = "$include_dir/stemming";

You will find it around line 41. Insert the copied line directly below it.
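You can also make the same move programmatically. The Python helper below is hypothetical. It compares lines after trimming whitespace and moves the first line matching the header include directly below the first line matching the $stem_dir assignment. It raises ValueError if either line is not found.

```python
def move_line_after(source, line_to_move, anchor):
    """Move the first line equal to line_to_move (ignoring surrounding whitespace)
    directly below the first line equal to anchor."""
    lines = source.splitlines(keepends=True)
    moved = lines.pop([line.strip() for line in lines].index(line_to_move))
    anchor_index = [line.strip() for line in lines].index(anchor)
    lines.insert(anchor_index + 1, moved)
    return "".join(lines)
```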

