Keyword - OpenSearch Documentation

Keyword field type
Introduced 1.0
A keyword field type contains a string that is not analyzed. It allows only exact, case-sensitive matches.
By default, keyword fields are both indexed (because `index` is enabled) and stored on disk (because `doc_values` is enabled). To reduce disk space, you can specify not to index keyword fields by setting `index` to `false`.

If you need to use a field for full-text search, map it as `text` instead.
Example
The following request creates a mapping with a keyword field. Setting `index` to `false` means that the `genre` field is not indexed for search; it is stored on disk and retrieved using `doc_values`:
```json
PUT movies
{
  "mappings": {
    "properties": {
      "genre": {
        "type": "keyword",
        "index": false
      }
    }
  }
}
```
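As a rough illustration, the same request body can be assembled in a client script before it is sent to the cluster. The sketch below uses only the Python standard library; the helper name `keyword_mapping` is hypothetical and not part of any OpenSearch client, and sending the request is left out.

```python
import json


def keyword_mapping(field: str, index: bool = True) -> dict:
    """Build a create-index body that maps `field` as a keyword (hypothetical helper)."""
    field_mapping = {"type": "keyword"}
    if not index:
        # Only emit "index" when it differs from the default (true).
        field_mapping["index"] = False
    return {"mappings": {"properties": {field: field_mapping}}}


# Body for the `PUT movies` request in the example above.
body = keyword_mapping("genre", index=False)
print(json.dumps(body, indent=2))
```

Leaving `index` out when it is `true` keeps the generated mapping minimal, because `true` is already the default.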
Parameters
The following table lists the parameters accepted by keyword field types. All parameters are optional.
| Parameter | Description | Default value | Dynamically updatable |
| :--- | :--- | :--- | :--- |
| `boost` | A floating-point value that specifies the weight of this field toward the relevance score. Values above `1.0` increase the field’s relevance. Values between `0.0` and `1.0` decrease the field’s relevance. | `1.0` | Yes |
| `doc_values` | A Boolean value that specifies whether the field should be stored on disk so that it can be used for aggregations, sorting, or scripting. | `true` | No |
| `eager_global_ordinals` | Specifies whether global ordinals should be loaded eagerly on refresh. If the field is often used for aggregations, this parameter should be set to `true`. | `false` | Yes |
| `fields` | To index the same string in several ways (for example, as a keyword and text), provide the `fields` parameter. You can specify one version of the field to be used for search and another to be used for sorting and aggregations. | None | No |
| `ignore_above` | Any string longer than this integer value should not be indexed. Default dynamic mapping creates a keyword subfield for which `ignore_above` is set to `256`. | `2147483647` | Yes |
| `index` | A Boolean value that specifies whether the field should be searchable. To reduce disk space, set `index` to `false`. | `true` | No |
| `index_options` | Information to be stored in the index that will be considered when calculating relevance scores. Can be set to `freqs` for term frequency. | `docs` | No |
| `meta` | Accepts metadata for this field. | None | Yes |
| `normalizer` | Specifies how to preprocess this field before indexing (for example, make it lowercase). | `null` (no preprocessing) | No |
| `norms` | A Boolean value that specifies whether the field length should be used when calculating relevance scores. | `false` | Yes |
| `null_value` | A value to be used in place of `null`. Must be of the same type as the field. If this parameter is not specified, the field is treated as missing when its value is `null`. | `null` | No |
| `similarity` | The ranking algorithm for calculating relevance scores. | The index’s `similarity` setting (by default, `BM25`) | No |
| `use_similarity` | Determines whether to calculate relevance scores. Default is `false`, which uses `constant_score` for faster queries. Setting this parameter to `true` enables scoring but may increase search latency. See The use_similarity parameter. | `false` | Yes |
| `split_queries_on_whitespace` | A Boolean value that specifies whether full-text queries should be split on white space. | `false` | Yes |
| `store` | A Boolean value that specifies whether the field value should be stored and can be retrieved separately from the `_source` field. | `false` | No |
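To make the `ignore_above` and `normalizer` descriptions more concrete, here is a simplified Python model of what happens to one incoming string. It is an illustration only, not OpenSearch code: the function name `indexed_term` is hypothetical, length is assumed to be counted in characters, and a lowercase normalizer stands in for whatever normalizer a mapping might define.

```python
from typing import Optional

# Default `ignore_above` value listed in the parameter table.
DEFAULT_IGNORE_ABOVE = 2147483647


def indexed_term(value: str,
                 ignore_above: int = DEFAULT_IGNORE_ABOVE,
                 lowercase: bool = False) -> Optional[str]:
    """Return the term that would be indexed, or None if the value is skipped."""
    # Assumption: strings longer than `ignore_above` characters are not indexed.
    if len(value) > ignore_above:
        return None
    # Assumption: a lowercase normalizer is modeled with str.lower().
    return value.lower() if lowercase else value


print(indexed_term("Drama"))                      # kept as-is, case preserved
print(indexed_term("Drama", lowercase=True))      # normalized before indexing
print(indexed_term("x" * 300, ignore_above=256))  # too long, so not indexed
```

Without a normalizer, `Drama` and `drama` remain distinct terms, which is why keyword matches are case sensitive.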
The use_similarity parameter
The `use_similarity` parameter controls whether OpenSearch calculates relevance scores when querying a `keyword` field. By default, it is set to `false`, which improves performance by using `constant_score`. Setting it to `true` enables scoring based on the configured similarity algorithm (typically, BM25) but may increase query latency.

Run a term query on the index for which `use_similarity` is disabled (default):
```json
GET /big5/_search
{
  "size": 3,
  "explain": false,
  "query": {
    "term": {
      "process.name": "kernel"
    }
  },
  "_source": false
}
```
The query returns results quickly (10 ms), and all documents receive a constant relevance score of 1.0:
```json
{
  "took": 10,
  "timed_out": false,
  "_shards": {
    "total": ...,
    "successful": ...,
    "skipped": ...,
    "failed": ...
  },
  "hits": {
    "total": {
      "value": 10000,
      "relation": "gte"
    },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "big5",
        "_id": "xDoCtJQBE3c7bAfikzbk",
        "_score": 1.0
      },
      {
        "_index": "big5",
        "_id": "xzoCtJQBE3c7bAfikzbk",
        "_score": 1.0
      },
      {
        "_index": "big5",
        "_id": "yDoCtJQBE3c7bAfikzbk",
        "_score": 1.0
      }
    ]
  }
}
```
To enable scoring using the default BM25 algorithm for the `process.name` field, provide the `use_similarity` parameter in the index mappings:
```json
PUT /big5/_mapping
{
  "properties": {
    "process.name": {
      "type": "keyword",
      "use_similarity": true
    }
  }
}
```
When you run the same term query on the configured index, the query takes longer to run (200 ms), and the returned documents have varying relevance scores based on term frequency and other BM25 factors:
```json
{
  "took": 200,
  "timed_out": false,
  "_shards": {
    "total": ...,
    "successful": ...,
    "skipped": ...,
    "failed": ...
  },
  "hits": {
    "total": {
      "value": 10000,
      "relation": "gte"
    },
    "max_score": 0.8844931,
    "hits": [
      {
        "_index": "big5",
        "_id": "xDoCtJQBE3c7bAfikzbk",
        "_score": 0.8844931
      },
      {
        "_index": "big5",
        "_id": "xzoCtJQBE3c7bAfikzbk",
        "_score": 0.8844931
      },
      {
        "_index": "big5",
        "_id": "yDoCtJQBE3c7bAfikzbk",
        "_score": 0.8844931
      }
    ]
  }
}
```
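The difference between the two responses can be sketched numerically. The Python snippet below contrasts a constant score with one common formulation of the BM25 term score. It is a simplified model rather than OpenSearch's actual scoring code: the function names are hypothetical, the `k1` and `b` values are commonly used BM25 defaults assumed here, and the snippet does not attempt to reproduce the `0.8844931` score shown above.

```python
import math


def constant_score() -> float:
    """Model of use_similarity=false: every matching document gets the same score."""
    return 1.0


def bm25_term_score(freq: float, doc_freq: int, doc_count: int,
                    doc_len: float = 1.0, avg_doc_len: float = 1.0,
                    k1: float = 1.2, b: float = 0.75) -> float:
    """One common BM25 formulation for a single term (illustrative assumption)."""
    # Rarer terms (small doc_freq relative to doc_count) get a larger idf.
    idf = math.log(1.0 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    # Term-frequency saturation with document-length normalization.
    tf = freq / (freq + k1 * (1.0 - b + b * doc_len / avg_doc_len))
    return idf * tf


rare = bm25_term_score(freq=1, doc_freq=10, doc_count=1000)
common = bm25_term_score(freq=1, doc_freq=500, doc_count=1000)
print(constant_score(), rare, common)
```

The constant score needs no index statistics, whereas the BM25 model depends on document counts and frequencies, which is consistent with the extra latency described above.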
Derived source
When an index uses derived source, OpenSearch may sort keyword values and remove duplicates in multi-value keyword fields during source reconstruction.

Create an index that enables derived source and configures a `name` field:
```json
PUT sample-index
{
  "settings": {
    "index": {
      "derived_source": {
        "enabled": true
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword"
      }
    }
  }
}
```
Index a document with multiple keyword values, including duplicates, into the index:
```json
PUT sample-index/_doc/
{
  "name": ["ba", "ab", "ac", "ba"]
}
```
After OpenSearch reconstructs `_source`, the derived `_source` removes duplicates and sorts the values alphabetically:
```json
{
  "name": ["ab", "ac", "ba"]
}
```
If the field mapping defines a `null_value`, any ingested null values are replaced with that value during reconstruction. The following example demonstrates how `null_value` affects derived source output.

Create an index that enables derived source and configures a `null_value` for the `name` field:
```json
PUT sample-index
{
  "settings": {
    "index": {
      "derived_source": {
        "enabled": true
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword",
        "null_value": "foo"
      }
    }
  }
}
```
Index a document with null values into the index:
```json
PUT sample-index/_doc/
{
  "name": [null, "ba", "ab"]
}
```
After OpenSearch reconstructs `_source`, the derived `_source` replaces null values and sorts the values alphabetically:
```json
{
  "name": ["ab", "ba", "foo"]
}
```
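The reconstruction behavior described for derived source can be modeled in a few lines of Python. This is only a sketch of the observable result (null replacement, duplicate removal, sorting), not of how OpenSearch implements derived source; the function name `reconstruct_keyword_values` is hypothetical.

```python
from typing import List, Optional


def reconstruct_keyword_values(values: List[Optional[str]],
                               null_value: Optional[str] = None) -> List[str]:
    """Model the derived-source output for a multi-value keyword field."""
    # Replace ingested nulls with the mapping's null_value, if one is configured.
    replaced = [null_value if v is None else v for v in values]
    # Without a null_value, nulls are treated as missing and dropped.
    present = [v for v in replaced if v is not None]
    # Remove duplicates, then sort the remaining values.
    return sorted(set(present))


print(reconstruct_keyword_values(["ba", "ab", "ac", "ba"]))
# ['ab', 'ac', 'ba']
print(reconstruct_keyword_values([None, "ba", "ab"], null_value="foo"))
# ['ab', 'ba', 'foo']
```

Because the original order and duplicates are not preserved in this model, applications that depend on the exact array submitted at index time should not rely on the reconstructed `_source` for that information.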