Search engines – MoodleDocs

Introduction

Search engines index large quantities of data in a structured way that allows users to query them and extract relevant information. There are many search engines with good APIs for storing and retrieving data. Moodle's global search is pluggable so that different backends can be used, from a simple database table (fine for small sites but unusable for large ones) to open source systems such as Solr or Elasticsearch (both built on top of Apache Lucene) or proprietary cloud-based systems.

Terms

Index: You know what it means, but on this page we use index for the data container in your search engine. It may be an instance in your search engine server, or a database table name if you are writing a search engine for MongoDB (just an example).

Document: A "searchable" unit of data such as a database entry or a forum post. You can think of it as one of the search results you would expect to get back from a search engine.

Writing your own search engine plugin

To write your own search engine you need to code methods to set, retrieve and delete data from your search engine. You will need to add a \search_yourplugin\engine class in search/engine/yourplugin/classes/engine.php extending \core_search\engine.

Search engine setup

Your search engine needs to be prepared to index data. You can provide a script so that your plugin's users can easily create the required structure in the search engine; otherwise, add instructions on how to set it up.

You can get the list of fields Moodle needs, along with other information you may need such as the field types, by calling \core_search\document::get_default_fields_definition().

Add contents

This method is executed when Moodle contents are being indexed in the search engine. Moodle iterates through all search areas, extracting which contents should be indexed, and assigns them a unique id based on the search area.

public function add_document($document, $fileindexing = false) {
    // Use curl or any other method or extension to push the document to your search engine.
}

$document will contain the document data with all required fields (plus perhaps some optional fields). Its contents will already be validated, so an integer field will come with an integer value.

$fileindexing will be true if the search area that generated the document supports attached files. It will be false if your plugin does not support file indexing.

File indexing

If the engine supports file indexing, and $fileindexing is passed as true to add_document (indicating the area supports indexing), then the document sent will have all files related to it attached as stored_file instances. They are retrieved from the document with get_files(), and it is up to the engine to decide how these are to be indexed, including content extraction.

The files attached to a document for indexing represent the authoritative set of files for that document. This means the engine must make sure that after re-indexing a document, any files no longer attached to it are not in the index.
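The authoritative-set rule can be pictured as a set difference between what the index currently holds for a document and what is attached to it now. The following is only a minimal sketch, not how any particular engine does it: the function name is hypothetical, and identifying indexed files by plain integer ids is an assumption made for the example.

```php
<?php
/**
 * Hypothetical helper: work out which indexed files are stale.
 *
 * @param int[] $indexedfileids  Ids of files currently in the index for one document.
 * @param int[] $attachedfileids Ids of files attached to the document being re-indexed.
 * @return int[] Ids that should be removed from the index.
 */
function find_stale_file_ids(array $indexedfileids, array $attachedfileids): array {
    // Anything that is indexed but no longer attached must be removed.
    return array_values(array_diff($indexedfileids, $attachedfileids));
}

// File 10 was detached and file 13 was added since the last indexing run.
$stale = find_stale_file_ids([10, 11, 12], [11, 12, 13]);
```

An engine would then delete the entries listed in $stale and push the currently attached files, in whichever order suits its API.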

Retrieve contents

This is the key method, as search engine plugins have a lot of flexibility here.

You will get the search filters the user specified and the list of contexts the user can access, and this function must return an array of \core_search\document objects.

public function execute_query($filters, $usercontexts, $limit = 0) {
    // Prepare a query applying all filters.
    // Include $usercontexts as a filter to contextid field.
    // Send a request to the server.
    // Iterate through results.
    // Check user access, read https://docs.moodle.org/dev/Search_engines#Security for more info.
    // Convert results to \core_search\document type objects using \core_search\document::set_data_from_engine.
    // Return an array of \core_search\document objects, limiting to $limit or \core_search\manager::MAX_RESULTS if empty.
}

File indexing

When retrieving results based on a file hit, you can attach stored_file instances to the document to indicate which file(s) produced the match. This information is rendered as part of the results page. Because of rendering concerns, this should be limited to a reasonable number of "matching files" for a given document. search_solr limits this to a maximum of three matching files.

Security

It is critical that this function checks the result of \core_search\document::check_access and does not return results the user does not have access to. Moodle already performs part of the required security checks, but search areas always have the last word and it must be respected.

Getting enough results

Because in some cases many documents may fail check_access(), engines need to make provisions to ensure that a full set of documents is returned, even if that means checking many more documents. See MDL-53758 for a fuller discussion of this.
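One way to make that provision is to keep requesting further batches from the engine until enough accessible results have been collected. The sketch below is a standalone illustration under stated assumptions, not the approach taken by any bundled engine: both callables are hypothetical, with $fetchbatch standing in for a request to the search server and $canaccess standing in for the check_access() decision.

```php
<?php
/**
 * Hypothetical sketch: fetch batches until $limit accessible results are collected.
 *
 * @param callable $fetchbatch function(int $offset, int $size): array of raw hits (assumed).
 * @param callable $canaccess  function($hit): bool, stands in for check_access() (assumed).
 * @param int      $limit      Number of accessible results wanted.
 * @param int      $batchsize  How many raw hits to request per round trip.
 * @return array Up to $limit accessible hits, in engine order.
 */
function collect_accessible_results(callable $fetchbatch, callable $canaccess, int $limit, int $batchsize = 50): array {
    $results = [];
    $offset = 0;
    while (count($results) < $limit) {
        $batch = $fetchbatch($offset, $batchsize);
        if (empty($batch)) {
            break; // The engine has no more candidates.
        }
        foreach ($batch as $hit) {
            if ($canaccess($hit)) {
                $results[] = $hit;
                if (count($results) >= $limit) {
                    break;
                }
            }
        }
        // Skip past everything in this batch, whether accepted or rejected.
        $offset += count($batch);
    }
    return $results;
}

// Usage with an in-memory list standing in for the engine.
$all = range(1, 20);
$fetch = function (int $offset, int $size) use ($all): array {
    return array_slice($all, $offset, $size);
};
$iseven = function ($hit): bool {
    return $hit % 2 === 0;
};
$page = collect_accessible_results($fetch, $iseven, 5, 4);
```

The batch size is a trade-off: larger batches mean fewer round trips but more wasted work when most documents pass the access check.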

Record counts

get_query_total_count() must be implemented to return the number of results available for the most recent call to execute_query(). This is used to determine how many pages will be displayed in the paging bar. For more discussion see MDL-53758.

public function get_query_total_count() {
    // Return an approximate count of total records for the most recently completed execute_query().
}

The value can be an estimate. The search manager will ensure that if a page is requested that is beyond the last page of actual results, the user will seamlessly see the last available page.

There are a number of ways to decide what value to return from get_query_total_count(). Note that if the method you choose requires you to process more than $limit valid documents, you must still only return $limit records from execute_query(). Some of the ways to do this are:

Return how many results are possible

This would mean the number of results we have processed and passed (using check_access()), plus any candidate results that are left. Alternatively, it is the total count of records for the query, minus the ones we have rejected so far. search_solr uses this method.

User experience: The user sees a full complement of pages. If the pass-to-fail ratio on check_access() is very high (and we generally expect it to be), then the number of pages should usually be correct. This method will always err on the high side. It is possible that after clicking on a higher page there will be no results available, in which case the search manager will seamlessly show the last page that has real results.

Pros: Free or very cheap with some engines, since there is no need to check access or process records beyond the current page. Reasonable user experience.

Cons: The count of future pages is not exact, so it can lead to a page not being available when clicked (though the manager mitigates this).

Return the current count plus 1

In this case, you would calculate all the records up to $limit plus one. If the plus one exists, you would return that count; otherwise you would return the actual count.

User experience: The user would only see up to the current page, plus one more, except when on the last page of results. Gmail search works similarly, in that you can only navigate to the next page of results, not to an arbitrary page.

Pros: Relatively cheap. Reasonable user experience.

Cons: The user can't jump to an arbitrary page even if they know which page a particular result is on.

Calculate all results up to MAX_RESULTS

This would mean calculating the full set of results up to MAX_RESULTS, and returning the actual count of results.

User experience: The user will see the exact number of pages they should.

Pros: Cleanest user experience.

Cons: Very expensive, as you are calculating up to MAX_RESULTS results on every page, even if you are only displaying the first page.

Just return MAX_RESULTS

User experience: The user will always see 10 pages, except when they are on the last page of actual results.

Pros: Free

Cons: Worst user experience.
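The four counting strategies can be compared side by side as small pure functions. This is a rough sketch rather than engine code: every function name is hypothetical, and the local MAX_RESULTS constant only stands in for \core_search\manager::MAX_RESULTS, with a value of 100 assumed for the example.

```php
<?php
// Stand-in for \core_search\manager::MAX_RESULTS; the value is an assumption.
const MAX_RESULTS = 100;

// Strategy 1: the engine's total for the query, minus the documents rejected so far.
function count_possible(int $enginetotal, int $rejected): int {
    return max(0, $enginetotal - $rejected);
}

// Strategy 2: valid documents were searched for up to $limit + 1, so the
// extra one is reported only when it actually exists.
function count_plus_one(int $validfound, int $limit): int {
    return min($validfound, $limit + 1);
}

// Strategy 3: every valid document was counted, capped at MAX_RESULTS.
function count_all_up_to_max(int $validfound): int {
    return min($validfound, MAX_RESULTS);
}

// Strategy 4: no calculation at all.
function count_max(): int {
    return MAX_RESULTS;
}

// The engine reports 40 hits and 3 failed the access check so far.
$estimate = count_possible(40, 3);
```

Whichever function is used, the value would be stored during execute_query() and handed back from get_query_total_count().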

Delete contents

public function delete($areaid = false) {
    if ($areaid === false) {
        // Delete all your search engine index contents.
    } else {
        // Delete all your search engine contents where areaid = $areaid.
    }
}

\core_search\document::check_access will return \core_search\manager::ACCESS_DELETED if a document returned from the search engine is no longer available in Moodle. You can use this to clean up the search engine contents with some kind of \search_yourplugin\engine::delete_by_id method. See the execute_query method in search/engine/solr/classes/engine.php for an example of this.

Other abstract methods you need to overwrite

public function file_indexing_enabled() {
    // Defaults to false, overwrite it if your search engine supports file indexing.
    return false;
}

public function is_server_ready() {
    // Check if your search engine is ready.
}

Other methods you might be interested in overwriting

public function is_installed() {
    // Check if the required PHP extensions you need to make the search engine work are installed.
}

public function optimize() {
    // Optimize or defragment the index contents.
}


These methods are called while the indexing process is running and allow the search engine to hook into the indexing process.

public function index_starting($fullindex = false) {
    // Nothing by default.
}

public function index_complete($numdocs = 0, $fullindex = false) {
    // Nothing by default.
}

public function area_index_starting($searcharea, $fullindex = false) {
    // Nothing by default.
}

public function area_index_complete($searcharea, $numdocs = 0, $fullindex = false) {
    return true;
}

Adapting document formats to your search engine format

\core_search\document is the class that represents a document. Depending on your search engine backend's limitations or on how it stores time values, you might be interested in overwriting this class in \search_yourplugin\document. The main functions you might be interested in overwriting are:

Format date/time fields

public static function format_time_for_engine($timestamp) {
    // Convert $timestamp to a string using the format used by your search engine.
}

By default, \core_search\document::format_time_for_engine returns a timestamp (integer).

Import date/time contents from the search engine

public static function import_time_from_engine($time) {
    // Convert the string returned from the search engine as a date/time format to a timestamp (integer).
}

By default, \core_search\document::import_time_from_engine returns a timestamp (integer).
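As an illustration of the two time conversions, suppose the engine stores dates as ISO 8601 strings in UTC, which is the form Solr date fields use. The standalone functions below are a sketch under that assumption; in a real plugin they would be static methods on \search_yourplugin\document, and the format string would have to match whatever your engine expects.

```php
<?php
// Assumed engine format: ISO 8601 in UTC, for example 1970-01-01T00:00:00Z.
function format_time_for_engine(int $timestamp): string {
    // gmdate() formats in UTC; the backslashes keep T and Z as literal characters.
    return gmdate('Y-m-d\TH:i:s\Z', $timestamp);
}

function import_time_from_engine(string $time): int {
    $timestamp = strtotime($time);
    if ($timestamp === false) {
        throw new InvalidArgumentException('Unparseable date from engine: ' . $time);
    }
    return $timestamp;
}

// Round trip: Moodle timestamp to engine string and back again.
$enginevalue = format_time_for_engine(0);
$restored = import_time_from_engine($enginevalue);
```

Keeping both directions in UTC avoids results shifting when the web server and the search server run in different time zones.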

Format string fields

public static function format_string_for_engine($string) {
    // Limit the string length, convert with iconv if your search engine only supports a specific charset...
}

By default, \core_search\document::format_string_for_engine returns the string as it is.
