Searchlight Stats Explained

Searchlight is a very feature-rich product. The platform itself is designed to improve overall server performance, while the search product completely replaces the search functionality in vBulletin and introduces a myriad of “hidden” features that users often overlook. The Searchlight Stats section was designed to keep you aware of everything Searchlight does and to give you total control over the data it manipulates.

Let’s take a look at a real-life example of Server Status:

All settings are self-explanatory. I want to focus first on the number of connections.

Searchlight uses both types of Sphinx index declarations (local and remote) in order to take advantage of all processors. Local indexes are searched sequentially, utilizing only 1 core. Agents (remotely connected indexes) are linked to the very same single instance of searchd in order to utilize the rest of the available cores, producing much faster results. When a query is performed locally, agents automatically kick in to “help” scan for additional results until the maximum number of results is reached.

There is a logical correlation between the number of queries and the number of connections. Each query will try to use as many processors as possible until the maximum number of results is reached. Since every query is unique, the number of results varies based on how popular the keywords are. One keyword may reach the maximum number of results using only 3 processors, while another query requires the power of all 16.
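To make that relationship concrete, here is a toy model of the behavior described above. It is only a sketch: the function name, the per-part hit counts and the limit of 1,000 results are all made up for illustration, and the real searchd works on the index parts in parallel rather than in a simple loop.

```python
def parts_needed(hits_per_part: list[int], max_results: int) -> int:
    """Count how many index parts contribute before max_results is reached.

    hits_per_part is a hypothetical list of match counts, one entry per
    index part (the local index first, then each agent).
    """
    found = 0
    for used, hits in enumerate(hits_per_part, start=1):
        found += hits
        if found >= max_results:
            return used
    # The limit was never reached, so every part was consulted.
    return len(hits_per_part)


# Hypothetical numbers: a popular keyword fills the limit quickly,
# a rare keyword needs every one of the 16 parts.
popular_keyword = [400, 350, 300] + [200] * 13
rare_keyword = [10] * 16

print(parts_needed(popular_keyword, 1000))  # 3
print(parts_needed(rare_keyword, 1000))     # 16
```

Each part that contributes corresponds to one connection in the stats, which is why popular keywords tend to lower the connections-per-query average and rare ones raise it.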

To calculate a quick average of how many processors were used per query, simply divide the total number of connections by the number of queries:
4,486,801 connections / 430,999 queries = 10.4 processors used per query
In our example, the average time to execute a query is 0.014 seconds.
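If you prefer to script this check, the same division looks like the sketch below. The function name is my own; the two counters are the values from the example above.

```python
def average_connections_per_query(connections: int, queries: int) -> float:
    """Return the average number of connections (processors) used per query."""
    if queries <= 0:
        raise ValueError("queries must be a positive number")
    return connections / queries


# Values taken from the Server Status example above.
average = average_connections_per_query(4_486_801, 430_999)
print(f"{average:.1f} processors used per query")  # 10.4 processors used per query
```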

Let’s now take a look at the Server Indexes:

You have no idea how many secrets I can uncover just by looking at those stats. :)
Let me share them with you… but first I want to explain the logic behind them. Each index is linked to a specific table; in our example we are scanning 3 tables (thread, post and keyword). Each index is also separated into 3 categories:

  • alpha – static index, for data that does not change often
  • omega – dynamic index, updated often
  • scan – scanning index, containing modified data already present in other indexes

Just by looking at the thread indexes, I can tell you that 409 new threads were created since the last index update and that 55 thread titles (or first posts) were modified, out of all the existing threads stored in the index. How do I know this? Simply by looking at the Records column.

I can also see right away how long it took to update each index. In our example, the thread index was refreshed in 33 seconds, the post index in 23 seconds and the keyword index in 11 seconds. Not bad at all for an 8.8 million record scan, eh?

Every night at 4:15am server time, Searchlight runs a scheduled index update. If you look at the Last Update column, you will notice that alphathread finished its index update at 22:15:33 (4:15:33 server time). That means it took 33 seconds to complete the process. Once the thread index was updated, the thetapost index update started at 22:15:33 and ended at 22:15:56, taking only 23 seconds to complete. Following the same logic, I will let you verify whether the keyword index actually updated in 11 seconds. :)
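The subtraction is easy to do by hand, but a small helper makes it repeatable. This is a minimal sketch with a function name of my own choosing. It assumes both timestamps fall on the same day and that the thread update started exactly at 22:15:00, which is what the scheduled time and the 33 seconds above imply.

```python
from datetime import datetime


def elapsed_seconds(start: str, end: str) -> int:
    """Return the number of seconds between two HH:MM:SS timestamps.

    Assumes both timestamps belong to the same day (no midnight rollover).
    """
    time_format = "%H:%M:%S"
    delta = datetime.strptime(end, time_format) - datetime.strptime(start, time_format)
    return int(delta.total_seconds())


# Timestamps as shown in the Last Update column of the example.
thread_seconds = elapsed_seconds("22:15:00", "22:15:33")
post_seconds = elapsed_seconds("22:15:33", "22:15:56")

print(thread_seconds)  # 33
print(post_seconds)    # 23
```

To check the keyword index, pass the end time of the post update as the start and the keyword index’s own Last Update value as the end.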

Why are Searchlight indexes so fast to rebuild? Because only the needed data is added to or replaced in the static indexes. It is illogical to recreate a new index from scratch if you only need to insert or update a few thousand rows. Also, the index rows scheduled for update are updated transparently, which explains why the timestamp for indexes with modified records does not change.

All omega and scan index data is merged into the alpha index, which refreshes all new or modified data from the previous day. The scan indexes allow you to search for updated data at any time. In other words, if you search for keywords present in the alpha index, Searchlight checks whether the record is also present in a scan index and serves the proper data to the client. If the record is found in any scan index, the alpha results are discarded and the fresh scan data is served, so your users get accurate results. If you want, you can see exactly what data was modified, provided you have the Detailed Logs option enabled:

To take advantage of all processors and to speed up query results, it is recommended to split large indexes into chunks. That explains why the post index is split into multiple sub-indexes. Following the logic explained above, the alpha index becomes theta. In other words, all new data is stored in thetapost instead of an alphapost index. The scanpost index still follows the same rules: it replaces modified data in all related indexes, not just thetapost. It is entirely possible for a moderator to delete or modify an old post stored in the deltapost index, for example.
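The rule that scan data wins over the static copy can be sketched in a few lines. This is only an illustration of the idea, not Searchlight’s actual code: the indexes are modeled as plain dictionaries of post id to text, matching is a naive substring test, and the sample posts are invented. Only the index names come from the example above.

```python
def search(keyword: str,
           static_indexes: dict[str, dict[int, str]],
           scan_index: dict[int, str]) -> dict[int, str]:
    """Return matching posts, preferring the scan copy of modified posts."""
    keyword = keyword.lower()
    results: dict[int, str] = {}

    for index in static_indexes.values():
        for post_id, text in index.items():
            if post_id in scan_index:
                continue  # stale copy: the scan index holds the fresh version
            if keyword in text.lower():
                results[post_id] = text

    # Fresh versions of modified posts are matched on their own content.
    for post_id, text in scan_index.items():
        if keyword in text.lower():
            results[post_id] = text

    return results


# Invented sample data: post 1 is an old post that a moderator edited.
static_indexes = {
    "deltapost": {1: "old sphinx tips"},
    "thetapost": {2: "sphinx setup"},
}
scanpost = {1: "edited: vbulletin tips"}

print(search("sphinx", static_indexes, scanpost))     # {2: 'sphinx setup'}
print(search("vbulletin", static_indexes, scanpost))  # {1: 'edited: vbulletin tips'}
```

Note that post 1 no longer shows up for “sphinx” even though its old copy in deltapost still contains the word, which is the whole point of the scan index.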

There is a disadvantage to having multiple sub-indexes. Since new data is always stored in thetapost, this index will eventually grow very large, creating a performance issue. That explains why the number of records in thetapost is kept very small compared to the other sub-indexes. Still, this is not a permanent fix: I have clients who reach 100,000 posts/day. At that rate, the optimal number of records per index will be reached within 1-3 months.
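You can estimate how long your own forum has before thetapost fills up. The sketch below is a back-of-the-envelope calculation. The function name and the optimal record counts (3 and 9 million) are hypothetical values picked only because they match the 1-3 month range at 100,000 posts/day; use the optimal number shown for your own installation.

```python
import math


def days_until_optimal(current_records: int, optimal_records: int, new_per_day: int) -> int:
    """Estimate the days left before an index reaches its optimal record count."""
    if new_per_day <= 0:
        raise ValueError("new_per_day must be a positive number")
    remaining = max(optimal_records - current_records, 0)
    return math.ceil(remaining / new_per_day)


# Hypothetical optimal sizes, starting from a freshly realigned (empty) thetapost.
print(days_until_optimal(0, 3_000_000, 100_000))  # 30
print(days_until_optimal(0, 9_000_000, 100_000))  # 90
```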

Searchlight is smart enough to detect this problem automatically. Once the number of thetapost records is higher than the optimal number of records, a warning message is displayed on your screen, letting you know it is time to reindex your data:

All you have to do is run the Searchlight Task Manager from your server console to “realign” the index data. The manager redistributes all records evenly, based on the new total number of records, leaving only a few records in the thetapost index.
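I won’t go into how the Task Manager does this internally, but the arithmetic of an even redistribution is simple. The sketch below is one plausible reading, not the manager’s actual logic: the function name, the number of static sub-indexes and the record total are all assumptions made for illustration.

```python
def realign(total_records: int, static_sub_indexes: int) -> tuple[list[int], int]:
    """Split records evenly across the static sub-indexes.

    Returns the size of each static sub-index and the small remainder
    that stays behind in thetapost.
    """
    if static_sub_indexes <= 0:
        raise ValueError("static_sub_indexes must be a positive number")
    per_index, leftover = divmod(total_records, static_sub_indexes)
    return [per_index] * static_sub_indexes, leftover


# Hypothetical forum with 10,000,003 posts and 4 static sub-indexes.
sizes, thetapost_records = realign(10_000_003, 4)
print(sizes)              # [2500000, 2500000, 2500000, 2500000]
print(thetapost_records)  # 3
```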

You will also receive an email about the index warning if you have the Warning Messages option enabled.

That’s all. I hope everything is clearer now when you look at your server stats.
