Class SimplePatternSplitTokenizer
java.lang.Object
org.apache.lucene.util.AttributeSource
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.Tokenizer
org.apache.lucene.analysis.pattern.SimplePatternSplitTokenizer
- All Implemented Interfaces:
Closeable,AutoCloseable
This tokenizer uses a Lucene RegExp or (expert usage) a pre-built determinized Automaton to locate tokens. The regexp syntax is more limited than that of PatternTokenizer, but tokenization is considerably faster. This is just like SimplePatternTokenizer, except that the pattern should match the token separator characters, as with String.split. Empty string tokens are never produced.
- WARNING: This API is experimental and might change in incompatible ways in the next release.
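The String.split comparison and the no-empty-token rule can be illustrated with a small standard-library sketch. It does not use Lucene at all: the class and method names (SplitSemanticsSketch, plainSplit, splitDroppingEmpty) are hypothetical, and it relies on java.util.regex, whose syntax differs from Lucene's RegExp. It only mimics the documented behaviour of treating the pattern as a separator and discarding empty pieces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class SplitSemanticsSketch {

  /** Plain String.split: leading and interior empty strings are kept. */
  static List<String> plainSplit(String text, String separatorRegex) {
    return Arrays.asList(text.split(separatorRegex));
  }

  /** Splits on the separator, then drops empty pieces (the documented tokenizer rule). */
  static List<String> splitDroppingEmpty(String text, String separatorRegex) {
    List<String> tokens = new ArrayList<>();
    for (String piece : Pattern.compile(separatorRegex).split(text)) {
      if (!piece.isEmpty()) {
        tokens.add(piece);
      }
    }
    return tokens;
  }

  public static void main(String[] args) {
    System.out.println(plainSplit(",a,,b", ","));         // prints [, a, , b]
    System.out.println(splitDroppingEmpty(",a,,b", ",")); // prints [a, b]
  }
}
```

The second method reflects what the class description promises for this tokenizer: consecutive or leading separators do not yield empty tokens.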
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.State -
Field Summary
Fields inherited from class org.apache.lucene.analysis.TokenStream
DEFAULT_TOKEN_ATTRIBUTE_FACTORY -
Constructor Summary
Constructors
SimplePatternSplitTokenizer(String regexp)
See RegExp for the accepted syntax.
SimplePatternSplitTokenizer(AttributeFactory factory, String regexp, int determinizeWorkLimit)
See RegExp for the accepted syntax.
SimplePatternSplitTokenizer(AttributeFactory factory, Automaton dfa)
Runs a pre-built automaton.
SimplePatternSplitTokenizer(Automaton dfa)
Runs a pre-built automaton. -
Method Summary
Methods inherited from class org.apache.lucene.analysis.Tokenizer
close, correctOffset, setReader, setReaderTestPoint
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
-
Constructor Details
-
SimplePatternSplitTokenizer
See RegExp for the accepted syntax. -
SimplePatternSplitTokenizer
Runs a pre-built automaton. -
SimplePatternSplitTokenizer
public SimplePatternSplitTokenizer(AttributeFactory factory, String regexp, int determinizeWorkLimit)
See RegExp for the accepted syntax. -
SimplePatternSplitTokenizer
Runs a pre-built automaton.
-
-
Method Details
-
incrementToken
- Specified by:
incrementToken in class TokenStream
- Throws:
IOException
-
end
- Overrides:
end in class TokenStream
- Throws:
IOException
-
reset
- Overrides:
reset in class Tokenizer
- Throws:
IOException
-
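The methods above are normally called in the standard TokenStream consumer order: setReader, reset, repeated incrementToken, end, then close. A minimal sketch of that sequence follows, assuming Lucene's core and analysis-common modules are on the classpath. The class name SplitTokenizerDemo and the helper tokenize are hypothetical and exist only for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.pattern.SimplePatternSplitTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class SplitTokenizerDemo {

  /** Hypothetical helper: splits text on the given Lucene RegExp and collects the tokens. */
  static List<String> tokenize(String separatorRegexp, String text) {
    List<String> tokens = new ArrayList<>();
    // Tokenizer is Closeable, so try-with-resources calls close() for us.
    try (SimplePatternSplitTokenizer tokenizer =
        new SimplePatternSplitTokenizer(separatorRegexp)) {
      // Returns the tokenizer's term attribute, which holds the current token text.
      CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
      tokenizer.setReader(new StringReader(text));
      tokenizer.reset();                    // must be called before the first incrementToken()
      while (tokenizer.incrementToken()) {  // returns false once the input is exhausted
        tokens.add(term.toString());
      }
      tokenizer.end();                      // records end-of-stream state such as the final offset
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return tokens;
  }

  public static void main(String[] args) {
    // The pattern describes the separator, so "," splits on commas.
    System.out.println(tokenize(",", "a,,b"));
  }
}
```

Because empty string tokens are never produced, the doubled comma in the sample input is expected to yield only the tokens a and b.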