Class ConcatenateGraphFilter
java.lang.Object
org.apache.lucene.util.AttributeSource
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.miscellaneous.ConcatenateGraphFilter
- All Implemented Interfaces:
Closeable, AutoCloseable
Concatenates/Joins every incoming token with a separator into one output token for every path
through the token stream (which is a graph). In simple cases this yields one token, but in the
presence of any tokens with a zero positionIncrement (e.g. synonyms) it yields more. This filter
uses the token bytes, position increment, and position length of the incoming stream. Other
attributes are not used or manipulated.
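The path-per-token behaviour described above can be illustrated without Lucene. The following is a minimal sketch in plain Java, not the filter's implementation: `Tok`, `concatPaths`, and the `'_'` separator are hypothetical stand-ins chosen for readability, and the order of the returned strings is not meant to match what the real filter emits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ConcatPathsSketch {
    /** Hypothetical stand-in for a token: its text plus the two position attributes the filter reads. */
    static final class Tok {
        final String term;
        final int posInc;
        final int posLen;

        Tok(String term, int posInc, int posLen) {
            this.term = term;
            this.posInc = posInc;
            this.posLen = posLen;
        }
    }

    /** Returns one joined string per path through the token graph. */
    static List<String> concatPaths(List<Tok> tokens, char sep) {
        // Each token becomes an edge from its start position to start + posLen.
        int[] from = new int[tokens.size()];
        int[] to = new int[tokens.size()];
        int pos = -1;
        int end = 0;
        for (int i = 0; i < tokens.size(); i++) {
            pos += tokens.get(i).posInc;
            from[i] = pos;
            to[i] = pos + tokens.get(i).posLen;
            end = Math.max(end, to[i]);
        }
        List<String> out = new ArrayList<>();
        walk(tokens, from, to, 0, end, null, sep, out);
        return out;
    }

    /** Depth-first walk: follow every edge that starts at the current position. */
    private static void walk(List<Tok> tokens, int[] from, int[] to, int at, int end,
                             String prefix, char sep, List<String> out) {
        if (at == end) {
            out.add(prefix == null ? "" : prefix);
            return;
        }
        for (int i = 0; i < tokens.size(); i++) {
            if (from[i] == at) {
                String term = tokens.get(i).term;
                walk(tokens, from, to, to[i], end,
                     prefix == null ? term : prefix + sep + term, sep, out);
            }
        }
    }

    public static void main(String[] args) {
        // "wireless" has positionIncrement 0, so it is stacked on "wifi" like a synonym.
        List<Tok> tokens = Arrays.asList(
            new Tok("wifi", 1, 1), new Tok("wireless", 0, 1), new Tok("router", 1, 1));
        System.out.println(concatPaths(tokens, '_')); // [wifi_router, wireless_router]
    }
}
```

Without the stacked synonym the same input yields the single string `wifi_router`, which is the "simple case" mentioned above.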
- WARNING: This API is experimental and might change in incompatible ways in the next release.
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static interface | ConcatenateGraphFilter.BytesRefBuilderTermAttribute | Attribute providing access to the term builder and UTF-16 conversion
static final class | ConcatenateGraphFilter.BytesRefBuilderTermAttributeImpl | Implementation of ConcatenateGraphFilter.BytesRefBuilderTermAttribute
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.State -
Field Summary
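Fields
Modifier and Type | Field | Description
static final int | DEFAULT_MAX_GRAPH_EXPANSIONS |
static final boolean | DEFAULT_PRESERVE_POSITION_INCREMENTS |
static final boolean | DEFAULT_PRESERVE_SEP |
static final Character | DEFAULT_TOKEN_SEPARATOR |
static final int | SEP_LABEL | Represents the default separator between tokens.
Fields inherited from class org.apache.lucene.analysis.TokenStream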
DEFAULT_TOKEN_ATTRIBUTE_FACTORY -
Constructor Summary
Constructors
Constructor | Description
ConcatenateGraphFilter(TokenStream inputTokenStream) | Creates a token stream to convert input to a token stream of accepted strings by its token stream graph.
ConcatenateGraphFilter(TokenStream inputTokenStream, boolean preserveSep, boolean preservePositionIncrements, int maxGraphExpansions) |
ConcatenateGraphFilter(TokenStream inputTokenStream, Character tokenSeparator, boolean preservePositionIncrements, int maxGraphExpansions) | Creates a token stream to convert input to a token stream of accepted strings by its token stream graph.
-
Method Summary
Modifier and Type | Method | Description
void | close() |
void | end() |
boolean | incrementToken() |
void | reset() |
Automaton | toAutomaton() | Converts the tokenStream to an automaton, treating the transition labels as UTF-8.
Automaton | toAutomaton(boolean unicodeAware) | Converts the tokenStream to an automaton.
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
-
Field Details
-
SEP_LABEL
public static final int SEP_LABEL
Represents the default separator between tokens.
-
DEFAULT_MAX_GRAPH_EXPANSIONS
public static final int DEFAULT_MAX_GRAPH_EXPANSIONS
-
DEFAULT_TOKEN_SEPARATOR
public static final Character DEFAULT_TOKEN_SEPARATOR
-
DEFAULT_PRESERVE_SEP
public static final boolean DEFAULT_PRESERVE_SEP
-
DEFAULT_PRESERVE_POSITION_INCREMENTS
public static final boolean DEFAULT_PRESERVE_POSITION_INCREMENTS
-
-
Constructor Details
-
ConcatenateGraphFilter
public ConcatenateGraphFilter(TokenStream inputTokenStream)
Creates a token stream to convert input to a token stream of accepted strings by its token stream graph.
This constructor uses the default settings of the constants in this class.
-
ConcatenateGraphFilter
public ConcatenateGraphFilter(TokenStream inputTokenStream, Character tokenSeparator, boolean preservePositionIncrements, int maxGraphExpansions)
Creates a token stream to convert input to a token stream of accepted strings by its token stream graph.
- Parameters:
inputTokenStream - The input/incoming TokenStream
tokenSeparator - Separator to use for concatenation. Can be null, in which case tokens are concatenated without any separator.
preservePositionIncrements - Whether to add an empty token for missing positions. The effect is a consecutive SEP_LABEL. When false, it's as if there were no missing positions (we pretend the surrounding tokens were adjacent).
maxGraphExpansions - If the tokenStream graph has more than this many possible paths through it, a TooComplexToDeterminizeException is thrown to preserve the stability and memory of the machine.
- Throws:
TooComplexToDeterminizeException - if the tokenStream graph has more than maxGraphExpansions expansions
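The effect of tokenSeparator and preservePositionIncrements on a stream with a missing position can be sketched in plain Java. This is an illustration of the parameter descriptions above under simplifying assumptions, not the filter's code: `PositionGapSketch` and `concat` are hypothetical names, the stream is assumed to be linear (every position increment is at least 1), and `'_'` stands in for the real separator.

```java
class PositionGapSketch {
    /**
     * Joins terms of a linear token stream.
     * A position increment greater than 1 means positions were skipped
     * (for example, a removed stop word).
     */
    static String concat(String[] terms, int[] posIncs, Character sep,
                         boolean preservePositionIncrements) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < terms.length; i++) {
            // Separator between the previous token and whatever comes next.
            if (i > 0 && sep != null) {
                sb.append(sep.charValue());
            }
            // Each missing position becomes an empty token followed by a separator,
            // which shows up as consecutive separators in the output.
            int holes = preservePositionIncrements ? posIncs[i] - 1 : 0;
            for (int h = 0; h < holes && sep != null; h++) {
                sb.append(sep.charValue());
            }
            sb.append(terms[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] terms = {"fox", "forest"};
        int[] posIncs = {1, 2}; // one position is missing between the two tokens
        System.out.println(concat(terms, posIncs, '_', true));   // fox__forest
        System.out.println(concat(terms, posIncs, '_', false));  // fox_forest
        System.out.println(concat(terms, posIncs, null, true));  // foxforest
    }
}
```

With a null separator the gap leaves no trace in the output, since there is nothing to repeat.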
-
ConcatenateGraphFilter
public ConcatenateGraphFilter(TokenStream inputTokenStream, boolean preserveSep, boolean preservePositionIncrements, int maxGraphExpansions)
Calls ConcatenateGraphFilter(org.apache.lucene.analysis.TokenStream, java.lang.Character, boolean, int)
- Parameters:
preserveSep - Whether SEP_LABEL should separate the input tokens in the concatenated token
-
-
Method Details
-
reset
- Overrides:
reset in class TokenStream
- Throws:
IOException
-
incrementToken
- Specified by:
incrementToken in class TokenStream
- Throws:
IOException
-
end
- Overrides:
end in class TokenStream
- Throws:
IOException
-
close
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Overrides:
close in class TokenStream
- Throws:
IOException
-
toAutomaton
Converts the tokenStream to an automaton, treating the transition labels as UTF-8. Does *not* close it.
- Throws:
IOException
-
toAutomaton
Converts the tokenStream to an automaton. Does *not* close it.
- Throws:
IOException
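To show what "treating the transition labels as UTF-8" means in practice, the sketch below compares the label sequence a term would contribute when labels are UTF-8 bytes with the sequence it would contribute when labels are Unicode code points. It uses only the JDK. `LabelSketch` and its methods are hypothetical, and the reading that `unicodeAware = true` selects code point labels is an assumption this page does not state.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class LabelSketch {
    /** One label per UTF-8 byte, as unsigned values in 0..255. */
    static int[] utf8Labels(String term) {
        byte[] bytes = term.getBytes(StandardCharsets.UTF_8);
        int[] labels = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            labels[i] = bytes[i] & 0xFF;
        }
        return labels;
    }

    /** One label per Unicode code point (assumed meaning of unicodeAware = true). */
    static int[] codePointLabels(String term) {
        return term.codePoints().toArray();
    }

    public static void main(String[] args) {
        String term = "caf\u00e9"; // "café": the last character needs two UTF-8 bytes
        System.out.println(Arrays.toString(utf8Labels(term)));      // [99, 97, 102, 195, 169]
        System.out.println(Arrays.toString(codePointLabels(term))); // [99, 97, 102, 233]
    }
}
```

For pure ASCII terms the two label sequences are identical, so the distinction only matters once non-ASCII characters appear in the tokens.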
-