Class Tokenizer
- All Implemented Interfaces:
Closeable, AutoCloseable
- Direct Known Subclasses:
StandardTokenizer
This is an abstract class; subclasses must override TokenStream.incrementToken().
NOTE: Subclasses overriding TokenStream.incrementToken() must call AttributeSource.clearAttributes() before setting attributes.
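The reason for this requirement is easiest to see in a toy model. The sketch below does not use Lucene: `AttributeModel`, `term`, and `type` are hypothetical stand-ins for an attribute source with two attributes, and the sketch assumes that attribute values persist from one token to the next unless they are cleared. It shows a value set for one token leaking into the next when the clearing step is skipped.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical names, not the Lucene classes.
class ClearAttributesSketch {
    // Stand-in for an attribute source whose attributes are shared by every token.
    static class AttributeModel {
        String term = "";
        String type = "word"; // default value

        void clearAttributes() {
            term = "";
            type = "word";
        }
    }

    // Emits one "term/type" string per whitespace-separated piece of text.
    // Only all-digit pieces set the type explicitly; the rest rely on the default.
    static List<String> tokenize(String text, boolean clearFirst) {
        AttributeModel atts = new AttributeModel();
        List<String> out = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (clearFirst) {
                atts.clearAttributes(); // the step the NOTE asks subclasses to perform
            }
            atts.term = piece;
            if (piece.chars().allMatch(Character::isDigit)) {
                atts.type = "num";
            }
            out.add(atts.term + "/" + atts.type);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("42 apples", true));  // prints [42/num, apples/word]
        System.out.println(tokenize("42 apples", false)); // prints [42/num, apples/num]
    }
}
```

Without the clearing step, the second token inherits the `num` type from the first. In a real subclass the same ordering applies inside incrementToken(): clear first, then set attributes.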
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.State
-
Field Summary
Fields inherited from class org.apache.lucene.analysis.TokenStream
DEFAULT_TOKEN_ATTRIBUTE_FACTORY
-
Constructor Summary
Constructors
Modifier | Constructor | Description
protected | Tokenizer() | Construct a tokenizer with no input, awaiting a call to setReader(java.io.Reader) to provide input.
protected | Tokenizer(AttributeFactory factory) | Construct a tokenizer with no input, awaiting a call to setReader(java.io.Reader) to provide input.
-
Method Summary
Modifier and Type | Method | Description
void | close() | Releases resources associated with this stream.
protected final int | correctOffset(int currentOff) | Return the corrected offset.
void | reset() | This method is called by a consumer before it begins consumption using TokenStream.incrementToken().
final void | setReader(java.io.Reader) | Expert: Set a new reader on the Tokenizer.
protected void | setReaderTestPoint() |
Methods inherited from class org.apache.lucene.analysis.TokenStream
end, incrementToken
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
-
Field Details
-
input
The text source for this Tokenizer.
-
-
Constructor Details
-
Tokenizer
protected Tokenizer()
Construct a tokenizer with no input, awaiting a call to setReader(java.io.Reader) to provide input.
-
Tokenizer
protected Tokenizer(AttributeFactory factory)
Construct a tokenizer with no input, awaiting a call to setReader(java.io.Reader) to provide input.
- Parameters:
factory - attribute factory.
-
-
Method Details
-
close
Releases resources associated with this stream.
If you override this method, always call super.close(), otherwise some internal state will not be correctly reset (e.g., Tokenizer will throw IllegalStateException on reuse).
NOTE: The default implementation closes the input Reader, so be sure to call super.close() when overriding this method.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Overrides:
close in class TokenStream
- Throws:
IOException
-
correctOffset
protected final int correctOffset(int currentOff)
Return the corrected offset. If input is a CharFilter subclass this method calls CharFilter.correctOffset(int), else returns currentOff.
- Parameters:
currentOff - offset as seen in the output
- Returns:
corrected offset based on the input
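The dispatch rule described above can be sketched without Lucene on the classpath. In this toy model, `PrefixStrippingReader` is a hypothetical stand-in for a CharFilter that removes a fixed-length prefix from its input, and the static `correctOffset` helper mirrors the documented behaviour: delegate when the input can correct offsets, otherwise return the offset unchanged. None of these names come from Lucene.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Toy model only: hypothetical names, not the Lucene classes.
class CorrectOffsetSketch {
    // Stand-in for a char filter: hides a prefix of the original text
    // and knows how to map offsets in the filtered text back to the original.
    static class PrefixStrippingReader extends Reader {
        private final Reader in;
        private final int stripped;

        PrefixStrippingReader(String text, int prefixLength) {
            this.stripped = Math.min(prefixLength, text.length());
            this.in = new StringReader(text.substring(stripped));
        }

        int correctOffset(int currentOff) {
            return currentOff + stripped;
        }

        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            return in.read(buf, off, len);
        }

        @Override
        public void close() throws IOException {
            in.close();
        }
    }

    // Delegates to the filter when there is one, else returns the offset as-is.
    static int correctOffset(Reader input, int currentOff) {
        if (input instanceof PrefixStrippingReader) {
            return ((PrefixStrippingReader) input).correctOffset(currentOff);
        }
        return currentOff;
    }

    public static void main(String[] args) {
        // Plain reader: offsets already refer to the original text.
        System.out.println(correctOffset(new StringReader("hello"), 3)); // prints 3
        // Filtered reader: offset 0 in "hello" is offset 3 in "<b>hello".
        System.out.println(correctOffset(new PrefixStrippingReader("<b>hello", 3), 0)); // prints 3
    }
}
```

A tokenizer would typically pass its raw start and end positions through such a correction before storing them as token offsets, so that offsets refer to the original, unfiltered text.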
-
setReader
Expert: Set a new reader on the Tokenizer. Typically, an analyzer (in its tokenStream method) will use this to re-use a previously created tokenizer.
-
reset
Description copied from class: TokenStream
This method is called by a consumer before it begins consumption using TokenStream.incrementToken().
Resets this stream to a clean state. Stateful implementations must implement this method so that they can be reused, just as if they had been created fresh.
If you override this method, always call super.reset(), otherwise some internal state will not be correctly reset (e.g., Tokenizer will throw IllegalStateException on further usage).
- Overrides:
reset in class TokenStream
- Throws:
IOException
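The reuse contract (setReader, then reset, then consumption, then close) can be illustrated with a small model. The sketch below is not Lucene code: `TokenizerModel`, `ForgetfulModel`, and `NOT_READY` are hypothetical names, and the guard shown here is only one plausible way such a check could work, so the actual internals may differ. It demonstrates why a subclass that skips super.reset() fails with IllegalStateException on use.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Toy model only: hypothetical names, not the Lucene classes.
class ReuseContractSketch {
    // Placeholder input: any read before reset() installs the real reader fails.
    static final Reader NOT_READY = new Reader() {
        @Override
        public int read(char[] buf, int off, int len) {
            throw new IllegalStateException("reset() missing, or stream already closed");
        }

        @Override
        public void close() {}
    };

    static class TokenizerModel {
        protected Reader input = NOT_READY;
        private Reader pending = NOT_READY;

        final void setReader(Reader reader) {
            if (input != NOT_READY) {
                throw new IllegalStateException("close() missing before setReader()");
            }
            pending = reader; // held back until reset() is called
        }

        void reset() throws IOException {
            input = pending; // the internal state that super.reset() takes care of
            pending = NOT_READY;
        }

        void close() throws IOException {
            input.close();
            input = NOT_READY; // makes the instance ready for the next setReader()
            pending = NOT_READY;
        }

        int readChar() throws IOException {
            return input.read();
        }
    }

    // Subclass with the bug the documentation warns about.
    static class ForgetfulModel extends TokenizerModel {
        @Override
        void reset() {
            // bug: super.reset() is never called, so input stays NOT_READY
        }
    }

    // Runs setReader -> reset -> read one char -> close.
    // Returns the char as a string, or "IllegalStateException" if reading failed.
    static String firstChar(TokenizerModel t, String text) {
        try {
            t.setReader(new StringReader(text));
            t.reset();
            try {
                return String.valueOf((char) t.readChar());
            } catch (IllegalStateException e) {
                return "IllegalStateException";
            } finally {
                t.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        TokenizerModel reused = new TokenizerModel();
        System.out.println(firstChar(reused, "ab")); // prints a
        System.out.println(firstChar(reused, "cd")); // prints c (same instance, reused)
        System.out.println(firstChar(new ForgetfulModel(), "ab")); // prints IllegalStateException
    }
}
```

The same reasoning applies to close(): in this model, skipping the base-class close() would leave `input` pointing at the old reader, and the next setReader() call would be rejected.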
-
setReaderTestPoint
protected void setReaderTestPoint()
- NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.
-