Class IntersectBlockReader
- All Implemented Interfaces:
Accountable, BytesRefIterator
- Direct Known Subclasses:
STIntersectBlockReader
TermsEnum response to UniformSplitTerms.intersect(CompiledAutomaton, BytesRef), intersecting the terms with an
automaton.
By design of the UniformSplit block keys, it is less efficient than
org.apache.lucene.backward_codecs.lucene40.blocktree.IntersectTermsEnum for FuzzyQuery (-37%). It is slightly slower for WildcardQuery (-5%) and slightly faster for PrefixQuery (+5%).
- WARNING: This API is experimental and might change in incompatible ways in the next release.
-
Nested Class Summary
Nested Classes
- protected class IntersectBlockReader.AutomatonNextTermCalculator: This is mostly a copy of AutomatonTermsEnum.
- protected static enum IntersectBlockReader.BlockIteration: Block iteration order.
Nested classes/interfaces inherited from class org.apache.lucene.index.TermsEnum
TermsEnum.SeekStatus
-
Field Summary
Fields
- protected IntersectBlockReader.BlockIteration blockIteration: Block iteration order determined when scanning the terms in the current block.
- protected final ByteRunnable byteRunnable
- protected final BytesRef commonSuffix
- protected final boolean finite
- protected final int minTermLength
- protected final IntersectBlockReader.AutomatonNextTermCalculator nextStringCalculator
- protected final int NUM_CONSECUTIVELY_REJECTED_TERMS_THRESHOLD: Threshold that controls when to attempt to jump to a block away.
- protected int numConsecutivelyRejectedTerms: Counter of the number of consecutively rejected terms.
- protected int numMatchedBytes: Number of bytes accepted by the automaton when validating the current term.
- protected BytesRef seekTerm: Set this when our current mode is seeking to this term.
- protected int[] states: Automaton states reached when validating the current term, from 0 to numMatchedBytes - 1.
- protected final TransitionAccessor transitionAccessor
Fields inherited from class org.apache.lucene.codecs.uniformsplit.BlockReader
blockDecoder, blockFirstLineStart, blockHeader, blockHeaderReader, blockInput, blockLine, blockLineReader, blockReadBuffer, blockStartFP, dictionaryBrowser, dictionaryBrowserSupplier, fieldMetadata, forcedTerm, lineIndexInBlock, postingsReader, scratchBlockBytes, scratchBlockLine, scratchTermState, termState, termStateForced, termStateSerializer, termStatesReadBuffer
Fields inherited from interface org.apache.lucene.util.Accountable
NULL_ACCOUNTABLE
-
Constructor Summary
Constructors
- protected IntersectBlockReader(CompiledAutomaton compiled, BytesRef startTerm, IndexDictionary.BrowserSupplier dictionaryBrowserSupplier, IndexInput blockInput, PostingsReaderBase postingsReader, FieldMetadata fieldMetadata, BlockDecoder blockDecoder)
-
Method Summary
- protected boolean endsWithCommonSuffix(byte[] termBytes, int termLength): Indicates whether the given term ends with the automaton common suffix.
- protected int getMinTermLength(): Computes the minimal length of the terms accepted by the automaton.
- BytesRef next()
- protected boolean nextBlock(): Opens the next block.
- protected BytesRef nextTermInBlockMatching(): Finds the next block line that matches (accepted by the automaton), or null when at end of block.
- TermsEnum.SeekStatus seekCeil(BytesRef)
- void seekExact(long ord): Not supported.
- boolean seekExact(BytesRef)
- void seekExact(BytesRef, TermState): Positions this BlockReader without re-seeking the term dictionary.
- protected boolean seekFirstBlock()
Methods inherited from class org.apache.lucene.codecs.uniformsplit.BlockReader
clearTermState, compareToMiddleAndJump, createBlockHeaderSerializer, createBlockLineSerializer, createDeltaBaseTermStateSerializer, decodeBlockBytesIfNeeded, docFreq, getOrCreateDictionaryBrowser, impacts, initializeBlockReadLazily, initializeHeader, isBeyondLastTerm, isCurrentTerm, newCorruptIndexException, nextTerm, ord, postings, ramBytesUsed, readHeader, readLineInBlock, readTermState, readTermStateIfNotRead, seekInBlock, seekInBlock, term, termState, totalTermFreq
Methods inherited from class org.apache.lucene.index.BaseTermsEnum
attributes, prepareSeekExact
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.lucene.util.Accountable
getChildResources
-
Field Details
-
NUM_CONSECUTIVELY_REJECTED_TERMS_THRESHOLD
protected final int NUM_CONSECUTIVELY_REJECTED_TERMS_THRESHOLD
Threshold that controls when to attempt to jump to a block away.
The numConsecutivelyRejectedTerms counter is 0 when entering a block. It is incremented each time a term is rejected by the automaton. When the counter is greater than or equal to this threshold, we compute the next term accepted by the automaton with IntersectBlockReader.AutomatonNextTermCalculator, and we jump to a block away if that next accepted term is greater than the immediate next term in the block.
A low value, for example 1, improves the performance of automatons requiring many jumps, for example FuzzyQuery and most WildcardQuery. A higher value improves the performance of automatons with few or no jumps, for example PrefixQuery. A threshold of 4 seems to be a good balance.
-
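The decision described for NUM_CONSECUTIVELY_REJECTED_TERMS_THRESHOLD can be sketched in plain Java, without any Lucene dependency. Everything here is hypothetical: the class name, the prefix "automaton" (it accepts terms starting with "m"), and the use of String comparison instead of byte order. Resetting the counter when a term is accepted is an assumption drawn from the word "consecutively".

```java
import java.util.List;

public class RejectedTermsThresholdSketch {

  // Hypothetical stand-in for the automaton: accepts terms starting with this prefix.
  static final String PREFIX = "m";

  public static boolean accepts(String term) {
    return term.startsWith(PREFIX);
  }

  // Smallest accepted term greater than a rejected term, or null if there is none.
  // For a prefix automaton, this is the prefix itself when the rejected term sorts before it.
  public static String nextAcceptedAfter(String rejectedTerm) {
    return rejectedTerm.compareTo(PREFIX) < 0 ? PREFIX : null;
  }

  // Scans one sorted block. Returns the term to seek to when a jump is decided,
  // or null when the block was scanned to its end.
  public static String scanBlock(List<String> block, int threshold, List<String> accepted) {
    int numConsecutivelyRejectedTerms = 0; // 0 when entering a block
    for (int i = 0; i < block.size(); i++) {
      String term = block.get(i);
      if (accepts(term)) {
        accepted.add(term);
        numConsecutivelyRejectedTerms = 0; // assumption: reset on an accepted term
        continue;
      }
      numConsecutivelyRejectedTerms++;
      if (numConsecutivelyRejectedTerms >= threshold && i + 1 < block.size()) {
        String next = nextAcceptedAfter(term);
        // Jump only if the next accepted term lies beyond the immediate next term in the block.
        if (next != null && next.compareTo(block.get(i + 1)) > 0) {
          return next;
        }
      }
    }
    return null;
  }
}
```

With the block k, l, m, ma, a threshold of 4 scans straight through and collects m and ma, while a threshold of 1 decides to jump right after rejecting k, even though the accepted terms sit in the same block. That is the trade-off the threshold balances.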
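transitionAccessor
protected final TransitionAccessor transitionAccessor
-
byteRunnable
protected final ByteRunnable byteRunnable
-
finite
protected final boolean finite
-
commonSuffix
protected final BytesRef commonSuffix
-
minTermLength
protected final int minTermLength
-
nextStringCalculator
protected final IntersectBlockReader.AutomatonNextTermCalculator nextStringCalculator
-
seekTerm
protected BytesRef seekTerm
Set this when our current mode is seeking to this term. Set to null after.
-
numMatchedBytes
protected int numMatchedBytes
Number of bytes accepted by the automaton when validating the current term.
-
states
protected int[] states
Automaton states reached when validating the current term, from 0 to numMatchedBytes - 1.
-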
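To illustrate how numMatchedBytes and states relate, here is a minimal self-contained sketch. It does not use Lucene's ByteRunnable. The step function is a hypothetical hand-written DFA accepting "ab" followed by any number of "c", and the indexing (states[i] is the state reached after accepting byte i) is an assumed reading of the field description.

```java
import java.util.Arrays;

public class MatchedStatesSketch {

  static final int REJECT = -1;

  // Hypothetical DFA: 0 -a-> 1 -b-> 2 -c-> 2, with 2 as the only accept state.
  static int step(int state, int b) {
    if (state == 0 && b == 'a') return 1;
    if (state == 1 && b == 'b') return 2;
    if (state == 2 && b == 'c') return 2;
    return REJECT;
  }

  static boolean isAccept(int state) {
    return state == 2;
  }

  // Assumed layout: states[i] is the state reached after accepting byte i,
  // valid for i from 0 to numMatchedBytes - 1.
  public int[] states = new int[8];
  public int numMatchedBytes;

  // Runs the term through the DFA, recording the state after each accepted byte.
  public boolean validate(byte[] term) {
    if (states.length < term.length) {
      states = Arrays.copyOf(states, term.length);
    }
    numMatchedBytes = 0;
    int state = 0; // initial state
    for (int i = 0; i < term.length; i++) {
      state = step(state, term[i] & 0xFF);
      if (state == REJECT) {
        return false; // numMatchedBytes tells how far the term got
      }
      states[i] = state;
      numMatchedBytes = i + 1;
    }
    return isAccept(state);
  }
}
```

After validating "abx" in this sketch, numMatchedBytes is 2 and states holds 1 and 2 in its first two slots, so the point of rejection is known along with every state visited before it.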
blockIteration
protected IntersectBlockReader.BlockIteration blockIteration
Block iteration order determined when scanning the terms in the current block.
-
numConsecutivelyRejectedTerms
protected int numConsecutivelyRejectedTerms
Counter of the number of consecutively rejected terms. Depending on NUM_CONSECUTIVELY_REJECTED_TERMS_THRESHOLD, this may trigger a jump to a block away.
-
-
Constructor Details
-
IntersectBlockReader
protected IntersectBlockReader(CompiledAutomaton compiled, BytesRef startTerm, IndexDictionary.BrowserSupplier dictionaryBrowserSupplier, IndexInput blockInput, PostingsReaderBase postingsReader, FieldMetadata fieldMetadata, BlockDecoder blockDecoder) throws IOException
- Throws:
IOException
-
-
Method Details
-
getMinTermLength
protected int getMinTermLength()
Computes the minimal length of the terms accepted by the automaton. This speeds up the term scanning for automatons accepting a finite language.
-
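How getMinTermLength() computes its result is not stated here. One common way to find the minimal accepted length is a breadth-first search from the initial state to the nearest accept state. The sketch below assumes a hypothetical automaton encoding (next[s] lists the successor states of s, accept[s] flags accept states) and is not Lucene's API.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

public class MinTermLengthSketch {

  // Returns the length of the shortest accepted string, or -1 if nothing is accepted.
  // next[s] holds the successor states of s; transition labels do not matter for length.
  public static int minAcceptedLength(int[][] next, boolean[] accept, int initial) {
    int[] dist = new int[next.length];
    Arrays.fill(dist, -1); // -1 marks states not reached yet
    ArrayDeque<Integer> queue = new ArrayDeque<>();
    dist[initial] = 0;
    queue.add(initial);
    while (!queue.isEmpty()) {
      int s = queue.poll();
      if (accept[s]) {
        // BFS dequeues states in order of distance, so the first accept state is the nearest.
        return dist[s];
      }
      for (int t : next[s]) {
        if (dist[t] == -1) {
          dist[t] = dist[s] + 1;
          queue.add(t);
        }
      }
    }
    return -1;
  }
}
```

With such a bound, any term shorter than the minimal length can be rejected without running it through the automaton.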
next
- Specified by:
next in interface BytesRefIterator
- Overrides:
next in class BlockReader
- Throws:
IOException
-
seekFirstBlock
- Throws:
IOException
-
nextTermInBlockMatching
Finds the next block line that matches (accepted by the automaton), or null when at end of block.
- Returns:
- The next term in the current block that is accepted by the automaton; or null if none.
- Throws:
IOException
-
endsWithCommonSuffix
protected boolean endsWithCommonSuffix(byte[] termBytes, int termLength)
Indicates whether the given term ends with the automaton common suffix. This makes it possible to quickly skip terms that the automaton would eventually reject.
-
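A suffix check of the kind endsWithCommonSuffix describes can be sketched as follows. This is a standalone illustration, not the class's code: the suffix is passed as an explicit hypothetical parameter instead of being read from the commonSuffix field, and termLength is assumed to be the number of valid bytes in a possibly larger buffer.

```java
public class CommonSuffixSketch {

  // Returns true if the first termLength bytes of termBytes end with suffix.
  public static boolean endsWithSuffix(byte[] termBytes, int termLength, byte[] suffix) {
    if (termLength < suffix.length) {
      return false; // the term is too short to contain the suffix
    }
    int offset = termLength - suffix.length;
    for (int i = 0; i < suffix.length; i++) {
      if (termBytes[offset + i] != suffix[i]) {
        return false;
      }
    }
    return true;
  }
}
```

The check costs at most suffix.length byte comparisons, which is cheaper than stepping the automaton over the whole term.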
nextBlock
Opens the next block. Depending on the blockIteration order, it may be the very next block, or a block away that may contain seekTerm.
- Returns:
- true if the next block is opened; false if there are no more blocks and the iteration is over.
- Throws:
IOException
-
seekExact
- Overrides:
seekExact in class BlockReader
-
seekExact
public void seekExact(long ord)
Description copied from class: BlockReader
Not supported.
- Overrides:
seekExact in class BlockReader
-
seekExact
Description copied from class: BlockReader
Positions this BlockReader without re-seeking the term dictionary.
The block containing the term is not read by this method. It will be read lazily only if needed, for example if BlockReader.next() is called. Calling BlockReader.postings(org.apache.lucene.index.PostingsEnum, int) after this method does require the block to be read.
- Overrides:
seekExact in class BlockReader
-
seekCeil
- Overrides:
seekCeil in class BlockReader
-