Enum xapian::constants::TermGeneratorFlag
#[repr(i32)]
pub enum TermGeneratorFlag {
    FLAG_DEFAULT,
    FLAG_SPELLING,
    FLAG_NGRAMS,
    FLAG_WORD_BREAKS,
}
Flags to OR together and pass to TermGenerator::set_flags().
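Because the enum is `#[repr(i32)]`, OR-ing flags together presumably means casting each variant to `i32` first. The sketch below illustrates that pattern with a hypothetical stand-in enum, `DemoFlag`; its discriminant values are placeholders chosen for illustration and are not the real Xapian values, and `combine` and `has_flag` are helper names invented here rather than part of this crate.

```rust
// Hypothetical stand-in with the same shape as TermGeneratorFlag.
// ASSUMPTION: the discriminants below are illustrative placeholders,
// not the values used by Xapian or by this crate.
#[allow(non_camel_case_types, dead_code)]
#[repr(i32)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum DemoFlag {
    FLAG_DEFAULT = 0,
    FLAG_SPELLING = 1 << 0,
    FLAG_NGRAMS = 1 << 1,
    FLAG_WORD_BREAKS = 1 << 2,
}

/// OR together any number of flags into a single i32 bitmask.
fn combine(flags: &[DemoFlag]) -> i32 {
    flags.iter().fold(0, |acc, f| acc | *f as i32)
}

/// Check whether a non-zero flag is set in a bitmask.
/// A zero-valued flag (FLAG_DEFAULT here) is never reported as set.
fn has_flag(mask: i32, flag: DemoFlag) -> bool {
    let bit = flag as i32;
    bit != 0 && mask & bit == bit
}
```

With the real enum, the same cast-and-OR expression (for example `TermGeneratorFlag::FLAG_SPELLING as i32 | TermGeneratorFlag::FLAG_NGRAMS as i32`) would produce the value to hand to `set_flags()`, assuming that method accepts an `i32`.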
Variants
FLAG_DEFAULT
FLAG_SPELLING
Index data required for spelling correction.
FLAG_NGRAMS
Generate n-grams for scripts without explicit word breaks.
Spans of characters in such scripts are split into unigrams and bigrams, with the unigrams carrying positional information. Text in other scripts is split into words as normal.
The QueryParser::FLAG_NGRAMS flag needs to be passed to QueryParser.
This mode can also be enabled in 1.2.8 and later by setting the environment variable XAPIAN_CJK_NGRAM to a non-empty value (although doing so was deprecated in 1.4.11).
In 1.4.x this feature was specific to CJK (Chinese, Japanese and Korean), but in 1.5.0 it’s been extended to other languages. To reflect this change the new and preferred name is FLAG_NGRAMS, which was added as an alias for forward compatibility in Xapian 1.4.23. Use FLAG_CJK_NGRAM instead if you aim to support Xapian < 1.4.23.
Added in Xapian 1.4.23.
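The naming rule above can be expressed as a simple version comparison. This is only a sketch: `ngram_flag_name` is a hypothetical helper, the version is assumed to be available as a `(major, minor, revision)` tuple, and whether this crate exposes `FLAG_CJK_NGRAM` at all is not stated on this page.

```rust
/// Return the n-gram flag name to prefer for a given Xapian version,
/// given as (major, minor, revision).
/// ASSUMPTION: how the version tuple is obtained is left to the caller.
fn ngram_flag_name(version: (u32, u32, u32)) -> &'static str {
    // Tuples compare lexicographically, so this is a plain version check:
    // FLAG_NGRAMS exists from 1.4.23 onwards, FLAG_CJK_NGRAM before that.
    if version >= (1, 4, 23) {
        "FLAG_NGRAMS"
    } else {
        "FLAG_CJK_NGRAM"
    }
}
```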
FLAG_WORD_BREAKS
Find word breaks for text in scripts without explicit word breaks.
With this option enabled, spans of text written in such scripts are split into words using ICU (which uses heuristics and/or dictionaries to do so). Text in other scripts is split into words as normal.
The QueryParser::FLAG_WORD_BREAKS flag needs to be passed to QueryParser.
Added in Xapian 1.5.0.