Extension:CirrusSearch - MediaWiki
From mediawiki.org
MediaWiki extensions manual
CirrusSearch
Release status: stable
Implementation: Search, API, Hook
Description: Implements searching for MediaWiki using Elasticsearch
Author(s): Nik Everett, Chad Horohoe, Erik Bernhardson
Latest version: continuous updates
Compatibility policy: Snapshots releases along with MediaWiki. Master is not backward compatible.
Composer: mediawiki/cirrussearch
Parameters
$wgCirrusSearchDeduplicateInQuery
$wgCirrusSearchLanguageWeight
$wgCirrusSearchAutomationCIDRs
$wgCirrusSearchUseIcuFolding
$wgCirrusSearchStemmedWeight
$wgCirrusSearchQueryStringMaxDeterminizedStates
$wgCirrusSearchCrossClusterSearch
$wgCirrusSearchExtraIndexSettings
$wgCirrusSearchAutomationUserAgentRegex
$wgCirrusSearchTalkNamespaceWeight
$wgCirrusSearchPrefixWeights
$wgCirrusSearchMustTrackTotalHits
$wgCirrusSearchPrefixSearchRescoreProfile
$wgCirrusSearchLanguageToWikiMap
$wgCirrusSearchCompletionSuggesterUseDefaultSort
$wgCirrusSearchExtraFieldsInSearchResults
$wgCirrusSearchMoreLikeThisMaxQueryTermsLimit
$wgCirrusSearchUseIcuTokenizer
$wgCirrusSearchCompletionBannedPageIds
$wgCirrusSearchOptimizeIndexForExperimentalHighlighter
$wgCirrusSearchRescoreProfiles
$wgCirrusSearchPhraseRescoreBoost
$wgCirrusSearchInterwikiProv
$wgCirrusSearchMoreLikeThisAllowedFields
$wgCirrusSearchQueryStringMaxWildcards
$wgCirrusSearchElasticQuirks
$wgCirrusSearchMaxFileTextLength
$wgCirrusSearchFallbackProfiles
$wgCirrusSearchMoreLikeThisTTL
$wgCirrusSearchAllowLeadingWildcard
$wgCirrusSearchInterwikiPrefixOverrides
$wgCirrusSearchMaintenanceTimeout
$wgCirrusSearchReplicas
$wgCirrusSearchPhraseSlop
$wgCirrusSearchBoostOpening
$wgCirrusSearchWriteBackoffExponent
$wgCirrusSearchUserTesting
$wgCirrusSearchDefaultNamespaceWeight
$wgCirrusSearchUseCompletionSuggester
$wgCirrusSearchPhraseSuggestReverseField
$wgCirrusSearchFallbackProfile
$wgCirrusSearchFragmentSize
$wgCirrusSearchUnlinkedArticlesToUpdate
$wgCirrusSearchCustomPageFields
$wgCirrusSearchClientSideUpdateTimeout
$wgCirrusSearchIgnoreOnWikiBoostTemplates
$wgCirrusSearchRegexMaxDeterminizedStates
$wgCirrusSearchInterwikiHTTPConnectTimeout
$wgCirrusSearchExtraIndexes
$wgCirrusSearchCategoryDepth
$wgCirrusSearchMergeSettings
$wgCirrusSearchClusters
$wgCirrusSearchCrossProjectShowMultimedia
$wgCirrusSearchBannedPlugins
$wgCirrusSearchMoreLikeThisConfig
$wgCirrusSearchClusterOverrides
$wgCirrusSearchAlternateIndices
$wgCirrusSearchCrossProjectBlockScorerProfiles
$wgCirrusSearchEnableIncomingLinkCounting
$wgCirrusSearchNearMatchWeight
$wgCirrusSearchReplicaGroup
$wgCirrusSearchFeedbackLink
$wgCirrusSearchTextcatConfig
$wgCirrusSearchNumCrossProjectSearchResults
$wgCirrusSearchLanguageDetectors
$wgCirrusSearchUpdateShardTimeout
$wgCirrusSearchEnableCrossProjectSearch
$wgCirrusSearchSecondTryProfiles
$wgCirrusSearchFullTextQueryBuilderProfiles
$wgCirrusSearchCompletionDefaultScore
$wgCirrusSearchWriteClusters
$wgCirrusSearchCompletionSuggesterHardLimit
$wgCirrusSearchRecycleCompletionSuggesterIndex
$wgCirrusSearchFullTextQueryBuilderProfile
$wgCirrusSearchTextcatModel
$wgCirrusSearchCompletionUseSecondTryProfile
$wgCirrusSearchStreamingUpdaterUsername
$wgCirrusSearchLogElasticRequests
$wgCirrusSearchConnectionAttempts
$wgCirrusSearchCompletionSuggesterUseAltIndexId
$wgCirrusSearchWikiToNameMap
$wgCirrusSearchMaxFullTextQueryLength
$wgCirrusSearchLogElasticRequestsSecret
$wgCirrusSearchManagedClusters
$wgCirrusLanguageLanguageKeywordExtraFields
$wgCirrusSearchCompletionSettings
$wgCirrusSearchNaturalTitleSort
$wgCirrusSearchDeduplicateInMemory
$wgCirrusSearchUseEventBusBridge
$wgCirrusSearchEnableRegex
$wgCirrusSearchClientSideSearchTimeout
$wgCirrusSearchIndexFieldsToCleanup
$wgCirrusSearchDeduplicateAnalysis
$wgCirrusSearchCategoryMax
$wgCirrusSearchExtraBackendLatency
$wgCirrusSearchNamespaceMappings
$wgCirrusSearchNamespaceResolutionMethod
$wgCirrusSearchPreferRecentUnspecifiedDecayPortion
$wgCirrusSearchIndexWeightedTagsPrefixMap
$wgCirrusSearchDocumentSizeLimiterProfiles
$wgCirrusSearchSearchShardTimeout
$wgCirrusSearchWeightedTags
$wgCirrusSearchRefreshInterval
$wgCirrusSearchSimilarityProfiles
$wgCirrusSearchCategoryEndpoint
$wgCirrusSearchMasterTimeout
$wgCirrusSearchPoolCounterKey
$wgCirrusSearchCompletionProfiles
$wgCirrusSearchMaxShardsPerNode
$wgCirrusSearchPrivateClusters
$wgCirrusSearchEnableArchive
$wgCirrusSearchUpdateDelay
$wgCirrusSearchInterwikiThreshold
$wgCirrusSearchIndexDeletes
$wgCirrusSearchDocumentSizeLimiterProfile
$wgCirrusSearchFiletypeAliases
$wgCirrusSearchDevelOptions
$wgCirrusSearchPrefixSearchStartsWithAnyWord
$wgCirrusSearchUpdateConflictRetryCount
$wgCirrusSearchInterwikiHTTPTimeout
$wgCirrusSearchFetchConfigFromApi
$wgCirrusSearchBoostTemplates
$wgCirrusSearchExtraIndexBoostTemplates
$wgCirrusSearchCompletionSuggesterSubphrases
$wgCirrusSearchPrefixIds
$wgCirrusSearchIndexedRedirects
$wgCirrusSearchMoreLikeThisFields
$wgCirrusSearchIndexAllocation
$wgCirrusSearchDefaultSemanticProfile
$wgCirrusSearchSanityCheck
$wgCirrusSearchStripQuestionMarks
$wgCirrusSearchNamespaceWeights
$wgCirrusSearchCrossProjectOrder
$wgCirrusSearchPhraseSuggestBuildVariant
$wgCirrusSearchIndexBaseName
$wgCirrusSearchMoreAccurateScoringMode
$wgCirrusSearchMaxPhraseTokens
$wgCirrusSearchCrossProjectSearchBlockList
$wgCirrusSearchPhraseSuggestUseOpeningText
$wgCirrusSearchCategoriesClientCacheTTL
$wgCirrusSearchMaxIncategoryOptions
$wgCirrusSearchEnableEventBusWeightedTags
$wgCirrusSearchWikimediaExtraPlugin
$wgCirrusSearchRescoreFunctionChains
$wgCirrusSearchLinkedArticlesToUpdate
$wgCirrusSearchRescoreProfile
$wgCirrusSearchPreferRecentDefaultHalfLife
$wgCirrusSearchDisableUpdate
$wgCirrusSearchFunctionRescoreWindowSize
$wgCirrusSearchActiveTest
$wgCirrusSearchPreferRecentDefaultDecayPortion
$wgCirrusSearchUseExperimentalHighlighter
$wgCirrusSearchCrossProjectProfiles
$wgCirrusSearchDefaultCluster
$wgCirrusSearchEnableAltLanguage
$wgCirrusSearchInterleaveConfig
$wgCirrusSearchPhraseRescoreWindowSize
$wgCirrusSearchSlowSearch
$wgCirrusSearchEnablePhraseSuggest
$wgCirrusSearchClientSideConnectTimeout
$wgCirrusSearchPhraseSuggestUseText
$wgCirrusSearchPhraseSuggestProfiles
$wgCirrusSearchSimilarityProfile
$wgCirrusSearchInterwikiSources
$wgCirrusSearchWeights
$wgCirrusSearchICUNormalizationUnicodeSetFilter
$wgCirrusSearchICUFoldingUnicodeSetFilter
$wgCirrusSearchShardCount
Hooks used
APIAfterExecute
APIQuerySiteInfoGeneralInfo
APIQuerySiteInfoStatisticsInfo
ApiBeforeMain
ArticleRevisionVisibilitySet
BeforeInitialize
CirrusSearchAddQueryFeatures
CirrusSearchAnalysisConfig
CirrusSearchSimilarityConfig
GetPreferences
LinksUpdateComplete
PageDelete
PageDeleteComplete
PageMoveComplete
PageUndeleteComplete
PrefixSearchExtractNamespace
ResourceLoaderGetConfigVars
SearchGetNearMatch
SearchIndexFields
ShowSearchHitTitle
SoftwareInfo
SpecialSearchResults
SpecialSearchResultsAppend
SpecialStatsAddExtra
TitleMove
UploadComplete
UserGetDefaultOptions
Hooks provided
CirrusSearchAddQueryFeatures
CirrusSearchAnalysisConfig
CirrusSearchBuildDocumentFinishBatch
CirrusSearchBuildDocumentLinks
CirrusSearchBuildDocumentParse
CirrusSearchFulltextQueryBuilder
CirrusSearchFulltextQueryBuilderComplete
CirrusSearchMappingConfig
CirrusSearchProfileService
CirrusSearchRegisterFullTextQueryClassifiers
CirrusSearchScoreBuilder
Licence: GNU General Public License 2.0 or later
Download extension: Git (browse repository, GitHub, Gerrit code review, Git commit log), source tarball, README
Help: Help:Extension:CirrusSearch
Translate the CirrusSearch extension if it is available at translatewiki.net
Vagrant role: cirrussearch
Issues: open tasks, report a bug
The CirrusSearch extension implements searching for MediaWiki using Elasticsearch.
CirrusSearch will be migrated to use OpenSearch as its backend. Please see Wikimedia Search Platform/Decision Records/Search backend replacement technology for more information.
Elasticsearch is standalone third-party software that you must install as a requirement for this extension.
It is a database system that provides search and indexing functionality: the current text of your wiki pages is indexed for faster and better search results.
MediaWiki communicates with Elasticsearch through its web service (HTTP) API.
See also the help page on using this extension.
Goals
No native dependencies that would make this difficult to install. The only dependencies are pure-PHP, MediaWiki extensions, and Elasticsearch itself.
Provide a near-real-time search index for wiki pages that's extendable by other MediaWiki extensions.
Provide all of the query options MWSearch has given users, and more.
Dependencies
PHP and cURL
In addition to the standard MediaWiki requirements for PHP, CirrusSearch requires PHP to be compiled with cURL support.
Elasticsearch or OpenSearch
You must install Elasticsearch or OpenSearch.
Elasticsearch versions frequently change how the web service API works, which causes compatibility problems.
You must install the version of Elasticsearch compatible with the version of MediaWiki you are currently using:
MediaWiki 1.39+ requires Elasticsearch 7.10.2 (6.8.23+ is possible using a compatibility layer). See this revision for compatibility information with earlier versions of MediaWiki.
MediaWiki 1.44+ is compatible with OpenSearch 1.3.
Elasticsearch versions before 6.8 are incompatible with PHP 8+.
Note that a Java installation such as OpenJDK is also needed.
It's best to use the official Elasticsearch Docker image or a self-hosted version.
A managed product like Amazon OpenSearch (formerly Amazon Elasticsearch) can work but may require additional configuration depending on its specifics.
For example, Amazon OpenSearch only listens for Elasticsearch API requests over HTTPS on port 443 (i.e., it does not expose the default Elasticsearch port 9200), so a TLS-enabled proxy (e.g., Nginx) can enable CirrusSearch to communicate with an Amazon OpenSearch cluster.
Elastica
Elastica is a PHP library that CirrusSearch uses to talk to Elasticsearch. Install Elastica following the instructions below.
Other
Because of the way the CirrusSearch extension handles jobs, it is advisable to set up the job queue in Redis to prevent messages like Notice: unserialize(): Error at offset 64870 of 65535 bytes in JobQueueDB.php and subsequent errors like Unsupported operand types. See T157759.
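A LocalSettings.php fragment for moving the job queue to Redis might look like the sketch below. It follows the Redis example given in the MediaWiki manual for $wgJobTypeConf; the server address is an assumption, and JobQueueRedis in daemonized mode expects a separate job-runner service to be running, so check Manual:$wgJobTypeConf for your MediaWiki version before relying on it.

```php
// Sketch: store all job types in Redis instead of the database job table.
// Assumption: Redis listens on 127.0.0.1:6379 without authentication.
$wgJobTypeConf['default'] = [
	'class' => 'JobQueueRedis',
	'redisServer' => '127.0.0.1:6379',
	'redisConfig' => [],
	// Seconds after which a claimed but unfinished job may be retried.
	'claimTTL' => 3600,
	// Daemonized mode relies on an external job-runner service.
	'daemonized' => true,
];
```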
Installation
Elastica
Even though the instructions below tell you to run Composer only when installing from Git, it may be necessary to run it anyway to install all PHP dependencies.
Download and move the extracted Elastica folder to your extensions/ directory.
Developers and code contributors should install the extension from Git instead, using:
cd extensions/
git clone
Only when installing from Git, run Composer to install PHP dependencies by issuing composer install --no-dev in the extension directory. (See T173141 for potential complications.)
Add the following code at the bottom of your LocalSettings.php file:
wfLoadExtension( 'Elastica' );
Done: navigate to Special:Version on your wiki to verify that the extension is successfully installed.
CirrusSearch
Download and move the extracted CirrusSearch folder to your extensions/ directory.
Developers and code contributors should install the extension from Git instead, using:
cd extensions/
git clone
Only when installing from Git, run Composer to install PHP dependencies by issuing composer install --no-dev in the extension directory. (See T173141 for potential complications.)
Add the following code at the bottom of your LocalSettings.php file:
wfLoadExtension( 'CirrusSearch' );
Now follow the setup instructions in the CirrusSearch README shipped with your extension, i.e. $IP/extensions/CirrusSearch/README. Note that not all of the information in it may apply to your version of the extension, especially the supported version of Elasticsearch.
Configure as required.
Done: navigate to Special:Version on your wiki to verify that the extension is successfully installed.
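For orientation, the end state of LocalSettings.php after a typical first-time setup might look like the sketch below. The comments summarize the initial-indexing sequence the README has usually described ($wgDisableSearchUpdate while the index is created, then ForceSearchIndex.php to populate it); treat that summary as an assumption and follow the README shipped with your version. $wgDisableSearchUpdate and $wgSearchType are core MediaWiki settings.

```php
wfLoadExtension( 'Elastica' );
wfLoadExtension( 'CirrusSearch' );

// During first-time setup (assumed sequence; check your README):
//   1. temporarily add  $wgDisableSearchUpdate = true;
//   2. create the index: php extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php
//   3. remove the $wgDisableSearchUpdate line again
//   4. populate the index with extensions/CirrusSearch/maintenance/ForceSearchIndex.php
// Only then switch the wiki's search backend over to CirrusSearch:
$wgSearchType = 'CirrusSearch';
```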
Enable regex queries
This is an optional step. It requires the search-extra plugin, which you can install by following these steps:
Execute the following command (Elasticsearch only loads plugins built for its exact version, so the plugin version must match your Elasticsearch version):
/usr/share/elasticsearch/bin/elasticsearch-plugin install org.wikimedia.search:extra:7.10.2-wmf12
Add the following line to your LocalSettings.php file:
$wgCirrusSearchWikimediaExtraPlugin[ 'regex' ] = [ 'build', 'use', 'max_inspect' => 10000 ];
Restart Elasticsearch with the following command:
systemctl restart elasticsearch
Recreate the search index by executing the following commands:
php path/to/extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --startOver
php path/to/extensions/CirrusSearch/maintenance/ForceSearchIndex.php
Upgrading
Please follow the upgrade instructions in the CirrusSearch UPGRADE file.
Configuration
The configuration parameters of CirrusSearch are documented in the "settings.txt" file.
See also the documentation on CirrusSearch configuration profiles.
Elasticsearch will fail to create indexes for CirrusSearch if the MySQL database name contains an uppercase letter, e.g. "MyWikiDatabaseName", because Elasticsearch index names must be lowercase. To work around this, CirrusSearch provides the $wgCirrusSearchIndexBaseName configuration parameter, which you then need to set to a lowercase name, e.g.:
$wgCirrusSearchIndexBaseName = 'mywikidatabasename';
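Rather than typing the lowercase name by hand, the base name can be derived from the database name. A minimal sketch, assuming $wgDBname (the core MediaWiki database name setting) is already defined earlier in LocalSettings.php. Changing the base name on an existing installation points CirrusSearch at different index names, so the index would then have to be created and populated again.

```php
// Elasticsearch index names must be lowercase, so derive a lowercase
// base name from the MySQL database name (e.g. "MyWikiDatabaseName").
// Place this after $wgDBname is set in LocalSettings.php.
$wgCirrusSearchIndexBaseName = strtolower( $wgDBname );
```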
Hooks
The CirrusSearch extension defines a number of hooks that other extensions can use to extend the core schema and modify documents.
The following hooks are available:
CirrusSearchAnalysisConfig: allows hooking into the analysis configuration
CirrusSearchMappingConfig: allows configuration of the field mapping
CirrusSearchBuildDocumentParse: allows extensions to modify the Elasticsearch document produced from a page
CirrusSearchBuildDocumentLinks: allows extensions to process incoming and outgoing links for the document
CirrusSearchBuildDocumentFinishBatch: called when a batch of pages has been indexed
CirrusSearchAddQueryFeatures: allows extensions to add query parser features
CirrusSearchScoreBuilder: allows extensions to define rescore builder functions
CirrusSearchProfileService: allows extensions to declare various search components and configuration
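As an illustration of how a handler for one of these hooks might be registered, the sketch below uses the legacy $wgHooks array in LocalSettings.php; extensions would normally declare handlers in extension.json. It assumes the CirrusSearchAnalysisConfig handler receives the analysis configuration array by reference plus a builder object, and the filter name my_extra_stop is hypothetical. Analysis changes only take effect once the index configuration is rebuilt, e.g. with UpdateSearchIndexConfig.php.

```php
// Sketch: add a custom token filter definition to the analysis config.
// Assumption: handler signature is ( array &$config, $builder ).
$wgHooks['CirrusSearchAnalysisConfig'][] = static function ( array &$config, $builder ) {
	// Hypothetical Elasticsearch "stop" token filter. Defining it does not
	// apply it; an analyzer in $config would still have to reference it.
	$config['filter']['my_extra_stop'] = [
		'type' => 'stop',
		'stopwords' => [ 'foo', 'bar' ],
	];
};
```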
API
CirrusSearch features can be used in API queries.
Searching happens via the normal search API (action=query&list=search); you can use CirrusSearch-specific features, such as the morelike: special prefix to find pages related to Marie Curie and radium:
api.php?action=query&list=search&srsearch=morelike:Marie_Curie%7Cradium&srlimit=10&srprop=size&formatversion=2
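The request above can be assembled programmatically. This sketch only builds the URL; the api.php address is a hypothetical placeholder, and format=json is an added assumption for machine-readable output. http_build_query() percent-encodes the colon as %3A, which the API decodes the same way as a literal colon.

```php
<?php
// Sketch: assemble the "morelike:" search request shown above.
// The api.php URL passed in is a hypothetical placeholder for your wiki.
function buildMoreLikeUrl(string $apiBase, array $titles, int $limit = 10): string
{
    $params = [
        'action' => 'query',
        'list' => 'search',
        // morelike: accepts several page titles separated by "|".
        'srsearch' => 'morelike:' . implode('|', $titles),
        'srlimit' => $limit,
        'srprop' => 'size',
        'formatversion' => 2,
        'format' => 'json',
    ];
    // http_build_query() percent-encodes values (":" -> %3A, "|" -> %7C).
    return $apiBase . '?' . http_build_query($params);
}

echo buildMoreLikeUrl('https://wiki.example.org/w/api.php', ['Marie_Curie', 'radium']), PHP_EOL;
```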
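Custom APIs and parameters are provided for querying CirrusSearch configuration and debug information:
The action=cirrusdump module shows the document CirrusSearch has indexed for a page, e.g.: 2014?action=cirrusdump
The cirrusDumpQuery parameter, added to Special:Search or search API queries, shows the Elasticsearch query instead of running the search.
The cirrusDumpResult parameter, added to Special:Search or search API queries, shows the raw Elasticsearch results.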
An additional parameter, cirrusExplain, can be passed with cirrusDumpResult to include the Lucene explanation of the score in the result dump.
Given one of the values verbose, pretty or hot, it returns the explanation in a human-readable format.
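A small helper for building such debug URLs against Special:Search is sketched below. The index.php address is a hypothetical placeholder, and sending the flags as "=1" assumes CirrusSearch only checks that the parameter is present; the explain value is passed through unchanged.

```php
<?php
// Sketch: build Special:Search URLs carrying the debug parameters described above.
// The index.php URL passed in is a hypothetical placeholder for your wiki.
function cirrusDebugSearchUrl(string $indexPhp, string $query, string $dump, ?string $explain = null): string
{
    if (!in_array($dump, ['cirrusDumpQuery', 'cirrusDumpResult'], true)) {
        throw new InvalidArgumentException("Unsupported dump parameter: $dump");
    }
    $params = ['title' => 'Special:Search', 'search' => $query, $dump => 1];
    if ($explain !== null) {
        // cirrusExplain is passed together with cirrusDumpResult.
        if ($dump !== 'cirrusDumpResult') {
            throw new InvalidArgumentException('cirrusExplain requires cirrusDumpResult');
        }
        // For example 'verbose', 'pretty' or 'hot'.
        $params['cirrusExplain'] = $explain;
    }
    return $indexPhp . '?' . http_build_query($params);
}

echo cirrusDebugSearchUrl('https://wiki.example.org/w/index.php', 'radium', 'cirrusDumpResult', 'pretty'), PHP_EOL;
```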
The cirrus-config-dump, cirrus-settings-dump, cirrus-mapping-dump and cirrus-profiles-dump modules return dumps of the CirrusSearch setup, e.g.:
api.php?action=cirrus-config-dump&formatversion=2
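To script these dumps, for example to compare configuration between two wikis, a client could build the request URL and decode the JSON reply as sketched below. The api.php address is a hypothetical placeholder, fetching with file_get_contents() assumes allow_url_fopen is enabled (a real client might use cURL instead), and format=json is an added assumption.

```php
<?php
// Sketch: fetch one of the CirrusSearch dump modules and decode its JSON reply.
const CIRRUS_DUMP_MODULES = [
    'cirrus-config-dump', 'cirrus-settings-dump', 'cirrus-mapping-dump', 'cirrus-profiles-dump',
];

function cirrusDumpUrl(string $apiBase, string $module): string
{
    if (!in_array($module, CIRRUS_DUMP_MODULES, true)) {
        throw new InvalidArgumentException("Unknown dump module: $module");
    }
    return $apiBase . '?' . http_build_query(['action' => $module, 'format' => 'json', 'formatversion' => 2]);
}

function fetchCirrusDump(string $apiBase, string $module): array
{
    // Network access happens here; failures surface as exceptions.
    $body = file_get_contents(cirrusDumpUrl($apiBase, $module));
    if ($body === false) {
        throw new RuntimeException("Request failed for $module");
    }
    return json_decode($body, true, 512, JSON_THROW_ON_ERROR);
}
```

Usage: `fetchCirrusDump('https://wiki.example.org/w/api.php', 'cirrus-mapping-dump')` would return the decoded mapping dump as a PHP array.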
See also
General links
Usage help page: CirrusSearch usage documentation (needed after installation)
Project page: information about the Wikimedia Cirrus/Elastic setup
Configuration help page: sets of tunable parameters that influence various aspects of the indexing
Extension:WikiSearch: provides a faceted search API for Semantic MediaWiki using Elasticsearch.
Extension:AdvancedSearch: enhances Special:Search by providing advanced parameters.
Extension:SearchParserFunction: provides a search parser function that supports CirrusSearch.
Debugging
How to determine that Cirrus is actually used as the search backend
Local development
The Elasticsearch service can be run with MediaWiki-Vagrant using the cirrussearch role.
For Docker, you can use a command like the following, choosing an image version that matches the Elasticsearch requirements above:
docker run -d --name elasticsearch -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.10.2
Then follow the installation and configuration directions.
If your web server runs in a container, make sure it is on the same Docker network as the Elasticsearch container, and use elasticsearch as the hostname in your LocalSettings.php file.
This will not have the WMF plugins but can be sufficient for basic testing.
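To connect a containerized MediaWiki to that Elasticsearch container, a LocalSettings.php entry along these lines is a plausible starting point. It assumes the container name elasticsearch from the docker run command above, a shared Docker network, and that a bare hostname string is accepted as a server entry in $wgCirrusSearchClusters (with the default port 9200); verify the exact format in the extension's configuration documentation.

```php
// Sketch for the Docker setup above: "elasticsearch" is the container name
// from the docker run command and resolves only on a shared Docker network.
// Assumption: a plain hostname string is a valid server entry and the
// default port 9200 is used.
$wgCirrusSearchClusters = [
	'default' => [ 'elasticsearch' ],
];
```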
This extension is being used on one or more Wikimedia projects. This probably means that the extension is stable and works well enough to be used by such high-traffic websites. Look for this extension's name in Wikimedia's CommonSettings.php and InitialiseSettings.php configuration files to see where it's installed. A full list of the extensions installed on a particular wiki can be seen on the wiki's Special:Version page.
This extension is included in the following wiki farms/hosts and/or packages:
Canasta
Miraheze
MyWikis
semantic::core
wiki.gg