Class SimilarityBase

java.lang.Object
  org.apache.lucene.search.similarities.Similarity
    org.apache.lucene.search.similarities.SimilarityBase
- Direct Known Subclasses:
  Axiomatic, DFISimilarity, DFRSimilarity, IBSimilarity, LMSimilarity
A subclass of Similarity that provides a simplified API for its descendants. Subclasses are only required to implement the score(BasicStats, double, double) and toString() methods. Implementing explain(List, BasicStats, double, double) is optional, since SimilarityBase already provides a basic explanation of the score and the term frequency. However, implementers of a subclass are encouraged to include as much detail about the scoring method as possible.

Note: multi-word queries such as phrase queries are scored differently than in Lucene's default ranking algorithm. Whereas the default algorithm "fakes" an IDF value for the phrase as a whole (since it does not know it), this class scores a phrase as the sum of the individual term scores.
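The contract above is a template-method design: the base class owns the plumbing, and a subclass supplies only a scoring formula and a name. The standalone sketch below mirrors that contract without depending on Lucene. `MiniStats`, `MiniSimilarityBase`, and `LogTfIdfSimilarity` are hypothetical types, and the log-tf times IDF formula is made up for illustration; it is not taken from any of the subclasses listed above.

```java
import java.util.Locale;

// Hypothetical stand-in for BasicStats: only the fields this sketch needs.
final class MiniStats {
    final long numberOfDocuments;
    final long docFreq;
    final double avgFieldLength;

    MiniStats(long numberOfDocuments, long docFreq, double avgFieldLength) {
        this.numberOfDocuments = numberOfDocuments;
        this.docFreq = docFreq;
        this.avgFieldLength = avgFieldLength;
    }
}

// Hypothetical mirror of the SimilarityBase contract: subclasses supply
// score(...) and toString(); the base class provides the rest.
abstract class MiniSimilarityBase {
    protected abstract double score(MiniStats stats, double freq, double docLen);

    // Redeclared abstract so every concrete subclass must return a name.
    @Override
    public abstract String toString();

    // A default explanation built only from the name, the frequency, and the score.
    public String explain(MiniStats stats, double freq, double docLen) {
        return String.format(Locale.ROOT, "score(%s, freq=%.1f) = %.4f",
                this, freq, score(stats, freq, docLen));
    }
}

// Hypothetical subclass: implements only the two required methods.
final class LogTfIdfSimilarity extends MiniSimilarityBase {
    @Override
    protected double score(MiniStats stats, double freq, double docLen) {
        double tf = 1 + Math.log(freq);                       // dampened term frequency
        double idf = Math.log(1 + (double) stats.numberOfDocuments / stats.docFreq);
        double lengthNorm = stats.avgFieldLength / docLen;    // shorter docs score higher
        return tf * idf * lengthNorm;
    }

    @Override
    public String toString() {
        return "LogTfIdf";
    }
}
```

With 100 documents, a document frequency of 10, and a document of average length, a single occurrence scores ln(11), and the inherited explain renders it without any extra code in the subclass.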
- WARNING: This API is experimental and might change in incompatible ways in the next release.
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.search.similarities.Similarity
Similarity.SimScorer
Constructor Summary
- SimilarityBase()
  Default constructor: parameter-free.
- SimilarityBase(boolean discountOverlaps)
  Primary constructor.
Method Summary
- protected void explain(List<Explanation> subExpls, BasicStats stats, double freq, double docLen)
  Subclasses should implement this method to explain the score.
- protected Explanation explain(BasicStats stats, Explanation freq, double docLen)
  Explains the score.
- protected void fillBasicStats(BasicStats stats, CollectionStatistics collectionStats, TermStatistics termStats)
  Fills all member fields defined in BasicStats in stats.
- static double log2(double x)
  Returns the base two logarithm of x.
- protected BasicStats newStats
  Factory method to return a custom stats object.
- protected abstract double score(BasicStats stats, double freq, double docLen)
  Scores a document, given the term frequency and the document length.
- final Similarity.SimScorer scorer(float boost, CollectionStatistics collectionStats, TermStatistics... termStats)
  Compute any collection-level weight (e.g. IDF, average document length, etc.) needed for scoring a query.
- abstract String toString()
  Subclasses must override this method to return the name of the Similarity and preferably the values of parameters (if any) as well.

Methods inherited from class org.apache.lucene.search.similarities.Similarity: computeNorm, getDiscountOverlaps
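Constructor Details
- SimilarityBase
  public SimilarityBase()
  Default constructor: parameter-free.
- SimilarityBase
  public SimilarityBase(boolean discountOverlaps)
  Primary constructor.
  - Parameters:
    discountOverlaps - true if overlap tokens (tokens with a position increment of zero) are discounted from the document's length.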
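The discountOverlaps flag concerns document length: in Lucene, overlap tokens (tokens with a position increment of zero, such as synonyms injected at the same position as the original word) can be left out of the length used for norms. The sketch below models only that counting rule under simplifying assumptions. `Token` and `FieldLength` are hypothetical types; Lucene itself derives the length from the field's inversion state during indexing rather than from a token list like this.

```java
import java.util.List;

// Hypothetical token: only the position increment matters for this sketch.
final class Token {
    final String text;
    final int positionIncrement;

    Token(String text, int positionIncrement) {
        this.text = text;
        this.positionIncrement = positionIncrement;
    }
}

final class FieldLength {
    private FieldLength() {}

    // Counts the tokens that contribute to a field's length. When
    // discountOverlaps is true, tokens stacked on the previous position
    // (position increment 0) are skipped.
    static int length(List<Token> tokens, boolean discountOverlaps) {
        int length = 0;
        for (Token t : tokens) {
            if (discountOverlaps && t.positionIncrement == 0) {
                continue;
            }
            length++;
        }
        return length;
    }
}
```

For the stream "fast" / "quick" (a synonym at the same position) / "car", the length is 2 with overlaps discounted and 3 without, so synonym expansion does not make a document look longer.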
Method Details
- scorer
  public final Similarity.SimScorer scorer(float boost, CollectionStatistics collectionStats, TermStatistics... termStats)
  Description copied from class: Similarity
  Compute any collection-level weight (e.g. IDF, average document length, etc.) needed for scoring a query.
  - Specified by:
    scorer in class Similarity
  - Parameters:
    boost - a multiplicative factor to apply to the produced scores.
    collectionStats - collection-level statistics, such as the number of tokens in the collection.
    termStats - term-level statistics, such as the document frequency of a term across the collection.
  - Returns:
    a SimScorer object with the information this Similarity needs to score a query.
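The scorer method separates per-query work from per-document work. The standalone sketch below illustrates that split under stated assumptions: `TermStat`, `DocScorer`, and `SummingScorerFactory` are hypothetical stand-ins for TermStatistics, Similarity.SimScorer, and this method, and both the IDF-like weight and the saturating tf are toy formulas. It also shows how several term statistics (for example, the terms of a phrase) can be combined by summing per-term scores that all use the same frequency, as the class description states. Multiplying the final sum by boost is a simplification of this sketch.

```java
// Hypothetical per-term statistic, standing in for TermStatistics.
final class TermStat {
    final long docFreq;

    TermStat(long docFreq) {
        this.docFreq = docFreq;
    }
}

// Hypothetical per-document scorer, standing in for Similarity.SimScorer.
interface DocScorer {
    double score(double freq, double docLen);
}

final class SummingScorerFactory {
    private SummingScorerFactory() {}

    // Collection-level work happens once per query: one IDF-like weight per
    // term. The returned scorer is then reused for every matching document.
    static DocScorer scorer(float boost, long numberOfDocuments, TermStat... termStats) {
        double[] idfs = new double[termStats.length];
        for (int i = 0; i < termStats.length; i++) {
            idfs[i] = Math.log(1 + (double) numberOfDocuments / termStats[i].docFreq);
        }
        return (freq, docLen) -> {
            // Toy saturating tf with a crude length normalization.
            double tf = freq / (freq + docLen / 10.0);
            // Multi-term case: sum the per-term scores, all computed from the
            // same (e.g. phrase) frequency.
            double sum = 0;
            for (double idf : idfs) {
                sum += tf * idf;
            }
            return boost * sum;
        };
    }
}
```

A single term with docFreq 10 in 100 documents, matched 10 times in a 100-token document, scores 0.5 · ln(11); two such terms with boost 2 score 2 · ln(11).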
- newStats
  Factory method to return a custom stats object.
- fillBasicStats
  protected void fillBasicStats(BasicStats stats, CollectionStatistics collectionStats, TermStatistics termStats)
  Fills all member fields defined in BasicStats in stats. Subclasses can override this method to fill additional stats.
- score
  protected abstract double score(BasicStats stats, double freq, double docLen)
  Scores a document, given the term frequency and the document length. Subclasses must apply their scoring formula in this method.
  - Parameters:
    stats - the corpus-level statistics.
    freq - the term frequency.
    docLen - the document length.
  - Returns:
    the score.
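Filling stats amounts to copying raw counts from the collection and term statistics and deriving a few values, such as the average field length. The exact fields and formulas vary across Lucene versions, so the sketch below uses hypothetical stand-in types (`CollStats`, `TermStats`, `Stats`, `BaseFiller`) and focuses on the override pattern the documentation describes: call the base filler, then compute an additional statistic from the already-filled fields. The `collectionProbability` field is an illustrative assumption.

```java
// Hypothetical stand-in for CollectionStatistics.
final class CollStats {
    final long docCount;         // documents with a value for the field
    final long sumTotalTermFreq; // tokens in the field across all documents

    CollStats(long docCount, long sumTotalTermFreq) {
        this.docCount = docCount;
        this.sumTotalTermFreq = sumTotalTermFreq;
    }
}

// Hypothetical stand-in for TermStatistics.
final class TermStats {
    final long docFreq;       // documents containing the term
    final long totalTermFreq; // occurrences of the term across the field

    TermStats(long docFreq, long totalTermFreq) {
        this.docFreq = docFreq;
        this.totalTermFreq = totalTermFreq;
    }
}

// Hypothetical stand-in for BasicStats.
class Stats {
    long numberOfDocuments;
    long numberOfFieldTokens;
    double avgFieldLength;
    long docFreq;
    long totalTermFreq;
}

class BaseFiller {
    // Copies the raw counts and derives the average field length.
    void fill(Stats stats, CollStats coll, TermStats term) {
        stats.numberOfDocuments = coll.docCount;
        stats.numberOfFieldTokens = coll.sumTotalTermFreq;
        stats.avgFieldLength = (double) coll.sumTotalTermFreq / coll.docCount;
        stats.docFreq = term.docFreq;
        stats.totalTermFreq = term.totalTermFreq;
    }
}

// A subclass-specific stats object with one extra field.
class ExtendedStats extends Stats {
    double collectionProbability;
}

class ExtendedFiller extends BaseFiller {
    // Fills the common fields first, then derives the extra statistic.
    @Override
    void fill(Stats stats, CollStats coll, TermStats term) {
        super.fill(stats, coll, term);
        ExtendedStats ext = (ExtendedStats) stats;
        ext.collectionProbability = (double) ext.totalTermFreq / ext.numberOfFieldTokens;
    }
}
```

Pairing a custom stats class with an overriding filler mirrors how a custom stats object from the factory method and an overridden fillBasicStats would work together.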
- explain
  protected void explain(List<Explanation> subExpls, BasicStats stats, double freq, double docLen)
  Subclasses should implement this method to explain the score. The explanation being built already contains the score, the name of the class and the doc id, as well as the term frequency and its explanation; subclasses can add additional clauses to explain details of their scoring formulae. The default implementation does nothing.
  - Parameters:
    subExpls - the list of details of the explanation to extend.
    stats - the corpus-level statistics.
    freq - the term frequency.
    docLen - the document length.
- explain
  protected Explanation explain(BasicStats stats, Explanation freq, double docLen)
  Explains the score. The implementation here provides a basic explanation in the format "score(name-of-similarity, doc=doc-id, freq=term-frequency), computed from:", and attaches the score (computed via the score(BasicStats, double, double) method) and the explanation for the term frequency. Subclasses content with this format may add additional details in explain(List, BasicStats, double, double).
  - Parameters:
    stats - the corpus-level statistics.
    freq - the term frequency and its explanation.
    docLen - the document length.
  - Returns:
    the explanation.
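The two explain methods form a hook: the base-level method builds the top-level node and lets subclasses append clauses through the List-based overload. The sketch below mirrors that split with hypothetical types (`Expl`, `ExplainingSimilarity`, `ToySimilarity`) instead of Lucene's Explanation class. It omits the doc id from the description, and placing the frequency explanation before the subclass clauses is a choice of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical explanation node, standing in for Lucene's Explanation.
final class Expl {
    final double value;
    final String description;
    final List<Expl> details;

    Expl(double value, String description, List<Expl> details) {
        this.value = value;
        this.description = description;
        this.details = details;
    }

    // Renders the tree with two spaces of indentation per level.
    String render(int depth) {
        StringBuilder sb = new StringBuilder();
        sb.append("  ".repeat(depth))
          .append(String.format(Locale.ROOT, "%.3f = %s%n", value, description));
        for (Expl d : details) {
            sb.append(d.render(depth + 1));
        }
        return sb.toString();
    }
}

abstract class ExplainingSimilarity {
    protected abstract double score(double freq, double docLen);

    // Hook for subclasses; the default adds nothing.
    protected void explain(List<Expl> subExpls, double freq, double docLen) {}

    // Builds the top-level node: score, frequency explanation, then hook clauses.
    Expl explain(Expl freqExpl, double docLen) {
        List<Expl> subs = new ArrayList<>();
        subs.add(freqExpl);
        explain(subs, freqExpl.value, docLen);
        String desc = String.format(Locale.ROOT, "score(%s, freq=%.1f), computed from:",
                getClass().getSimpleName(), freqExpl.value);
        return new Expl(score(freqExpl.value, docLen), desc, subs);
    }
}

// Hypothetical subclass that adds one clause describing its formula.
final class ToySimilarity extends ExplainingSimilarity {
    @Override
    protected double score(double freq, double docLen) {
        return freq / docLen; // toy formula: relative term frequency
    }

    @Override
    protected void explain(List<Expl> subExpls, double freq, double docLen) {
        subExpls.add(new Expl(docLen, "docLen, length of the field", List.of()));
    }
}
```

Calling `render(0)` on the result prints the score line first, followed by the indented frequency and docLen clauses.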
- toString
  public abstract String toString()
  Subclasses must override this method to return the name of the Similarity and preferably the values of parameters (if any) as well.
- log2
  public static double log2(double x)
  Returns the base two logarithm of x.
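Java's standard library provides Math.log and Math.log10 but no base-two logarithm, so a helper like log2 is usually written with the change-of-base identity log2(x) = ln(x) / ln(2). The sketch below shows one such implementation in a hypothetical `Log2` class; it is not necessarily how Lucene implements the method. Edge cases follow Math.log: 0 yields negative infinity and negative inputs yield NaN.

```java
final class Log2 {
    private Log2() {}

    // Natural log of 2, computed once.
    private static final double LN_2 = Math.log(2);

    // Change of base: log2(x) = ln(x) / ln(2).
    static double log2(double x) {
        return Math.log(x) / LN_2;
    }
}
```

Results are subject to floating-point rounding, so log2(8) may differ from 3.0 in the last bit; compare with a tolerance.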