Package org.apache.lucene.tests.analysis
Class Token
- All Implemented Interfaces:
Appendable, CharSequence, Cloneable, CharTermAttribute, FlagsAttribute, OffsetAttribute, PayloadAttribute, PositionIncrementAttribute, PositionLengthAttribute, TermFrequencyAttribute, TermToBytesRefAttribute, TypeAttribute, Attribute
A Token is an occurrence of a term from the text of a field. It consists of a term's text, the start and end offsets of the term in the text of the field, and a type string.
The start and end offsets permit applications to re-associate a token with its source text, e.g., to display highlighted query terms in a document browser or to show matching text fragments in a KWIC (keyword-in-context) display.
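A minimal, standalone sketch of why offsets matter for highlighting: the indexed term may differ from the surface text (here "quick" versus "Quick"), so the highlighter copies characters out of the original source between each start offset (inclusive) and end offset (exclusive). `TokenSpan` and `OffsetHighlighter` are hypothetical names, not Lucene classes; on a real Token the same values come from `startOffset()` and `endOffset()`.

```java
import java.util.List;

// Hypothetical stand-in for a token's term and offsets (not a Lucene type).
record TokenSpan(String term, int startOffset, int endOffset) {}

final class OffsetHighlighter {
  private OffsetHighlighter() {}

  // Wraps each span in [brackets], copying text from the source via its offsets.
  // Assumes spans are sorted by startOffset and do not overlap.
  static String highlight(String source, List<TokenSpan> spans) {
    StringBuilder out = new StringBuilder();
    int cursor = 0;
    for (TokenSpan span : spans) {
      out.append(source, cursor, span.startOffset()); // text before the match
      out.append('[')
          .append(source, span.startOffset(), span.endOffset()) // original surface form
          .append(']');
      cursor = span.endOffset();
    }
    out.append(source, cursor, source.length()); // trailing text
    return out.toString();
  }
}
```

For the source "The Quick brown fox" with spans ("quick", 4, 9) and ("fox", 16, 19), this yields "The [Quick] brown [fox]", preserving the original capitalization.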
The type is a string, assigned by a lexical analyzer (a.k.a. tokenizer), naming the lexical or syntactic class that the token belongs to. For example, an end-of-sentence marker token might be implemented with type "eos". The default token type is "word".
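The following sketch shows a toy tokenizer assigning types the way this paragraph describes: ordinary terms get the default type "word" and a sentence-ending punctuation mark becomes its own "eos" token. It is a self-contained illustration under simple assumptions (whitespace splitting, only `.`, `!` and `?` end sentences); `TypedToken` and `TypingTokenizer` are hypothetical. In Lucene the default value is exposed as `TypeAttribute.DEFAULT_TYPE`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical (text, type) pair; not a Lucene class.
record TypedToken(String text, String type) {}

final class TypingTokenizer {
  static final String DEFAULT_TYPE = "word"; // the documented default type
  static final String EOS_TYPE = "eos";      // the example type from the text

  private TypingTokenizer() {}

  // Splits on whitespace; a trailing '.', '!' or '?' is emitted as a separate "eos" token.
  static List<TypedToken> tokenize(String text) {
    List<TypedToken> tokens = new ArrayList<>();
    for (String chunk : text.trim().split("\\s+")) {
      if (chunk.isEmpty()) continue; // happens only for blank input
      char last = chunk.charAt(chunk.length() - 1);
      boolean endsSentence = last == '.' || last == '!' || last == '?';
      String word = endsSentence ? chunk.substring(0, chunk.length() - 1) : chunk;
      if (!word.isEmpty()) tokens.add(new TypedToken(word, DEFAULT_TYPE));
      if (endsSentence) tokens.add(new TypedToken(String.valueOf(last), EOS_TYPE));
    }
    return tokens;
  }
}
```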
A Token can optionally have metadata (a.k.a. payload) in the form of a variable-length byte array. Use PostingsEnum.getPayload() to retrieve the payloads from the index.
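Because a payload is just a byte array, the application chooses its encoding. A minimal sketch, assuming a hypothetical per-token float weight stored as 4 big-endian bytes; `WeightPayload` is an illustrative name, not a Lucene helper. The `offset` parameter mirrors the fact that a `BytesRef` (the type used by `setPayload` and `getPayload`) views a slice of a larger array; wrapping the encoded bytes with `new BytesRef(bytes)` before calling `setPayload` is one plausible way to attach them.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: packs a per-token weight into a 4-byte payload and back.
final class WeightPayload {
  private WeightPayload() {}

  // ByteBuffer defaults to big-endian byte order.
  static byte[] encode(float weight) {
    return ByteBuffer.allocate(Float.BYTES).putFloat(weight).array();
  }

  // Reads 4 bytes starting at offset, e.g. BytesRef.bytes at BytesRef.offset.
  static float decode(byte[] bytes, int offset) {
    return ByteBuffer.wrap(bytes, offset, Float.BYTES).getFloat();
  }
}
```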
A few things to note:
- clear() initializes all of the fields to default values. This differs from Lucene 2.4, but should affect no one.
- Because TokenStreams can be chained, one cannot assume that the Token's current type is correct.
- The startOffset and endOffset represent the start and end offsets in the source text, so be careful in adjusting them.
- When caching a reusable token, clone it. When injecting a cached token into a stream that can be reset, clone it again.
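The cloning advice can be illustrated with a standalone model: a producer overwrites one shared token instance for every term, and a downstream consumer mutates whatever it receives. `ReusableToken` and `TokenCacheDemo` are hypothetical classes, not Lucene's; the sketch only shows the failure modes the two rules guard against.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical mutable token that a producer reuses for every term.
final class ReusableToken implements Cloneable {
  String text = "";
  int startOffset;

  void set(String text, int startOffset) {
    this.text = text;
    this.startOffset = startOffset;
  }

  @Override
  public ReusableToken clone() {
    try {
      return (ReusableToken) super.clone(); // shallow copy; String is immutable
    } catch (CloneNotSupportedException e) {
      throw new AssertionError(e); // unreachable: this class implements Cloneable
    }
  }
}

final class TokenCacheDemo {
  private TokenCacheDemo() {}

  // Caches each state of one reused token; without cloning, every entry is the same object.
  static List<ReusableToken> fill(String[] words, boolean cloneOnCache) {
    ReusableToken shared = new ReusableToken();
    List<ReusableToken> cache = new ArrayList<>();
    int offset = 0;
    for (String w : words) {
      shared.set(w, offset);
      cache.add(cloneOnCache ? shared.clone() : shared);
      offset += w.length() + 1; // assumes single-space separators
    }
    return cache;
  }

  // Replays the cache to a consumer that uppercases what it receives.
  // Without cloning on replay, that mutation leaks back into the cache.
  static List<String> replay(List<ReusableToken> cache, boolean cloneOnReplay) {
    List<String> seen = new ArrayList<>();
    for (ReusableToken cached : cache) {
      ReusableToken t = cloneOnReplay ? cached.clone() : cached;
      seen.add(t.text);
      t.text = t.text.toUpperCase(Locale.ROOT); // downstream mutation
    }
    return seen;
  }
}
```

Caching without cloning replays "c", "c", "c" for the input a, b, c; cloning on both cache and replay keeps every replay at "a", "b", "c".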
Field Summary
- static final AttributeFactory TOKEN_ATTRIBUTE_FACTORY: Convenience factory that returns Token as the implementation for the basic attributes and returns the default impl (with "Impl" appended) for all other attributes.
Fields inherited from class org.apache.lucene.analysis.tokenattributes.CharTermAttributeImpl
builder
Fields inherited from interface org.apache.lucene.analysis.tokenattributes.TypeAttribute
DEFAULT_TYPE
Constructor Summary
- Token(): Constructs a Token with null text.
- Token(CharSequence text, int start, int end): Constructs a Token with the given term text and start and end offsets.
- Token(CharSequence text, int posInc, int start, int end): Constructs a Token with the given term text, position increment, and start and end offsets.
- Token(CharSequence text, int posInc, int start, int end, int posLength)
Method Summary
- void clear(): Resets the term text, payload, flags, positionIncrement, positionLength, startOffset, endOffset and token type to their defaults.
- Token clone()
- void copyTo(AttributeImpl target)
- boolean equals(Object obj)
- int getFlags()
- BytesRef getPayload()
- int hashCode()
- void reflectWith(AttributeReflector reflector)
- void reinit(Token prototype): Copy the prototype token's fields into this one.
- void setFlags(int flags)
- void setPayload(BytesRef payload)
Methods inherited from class org.apache.lucene.analysis.tokenattributes.PackedTokenAttributeImpl
end, endOffset, getPositionIncrement, getPositionLength, getTermFrequency, setOffset, setPositionIncrement, setPositionLength, setTermFrequency, setType, startOffset, type
Methods inherited from class org.apache.lucene.analysis.tokenattributes.CharTermAttributeImpl
append, append, append, append, append, append, buffer, charAt, copyBuffer, getBytesRef, length, resizeBuffer, setEmpty, setLength, subSequence, toString
Methods inherited from class org.apache.lucene.util.AttributeImpl
reflectAsString
Methods inherited from class java.lang.Object
finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.lang.CharSequence
chars, codePoints, isEmpty
Field Details
TOKEN_ATTRIBUTE_FACTORY
public static final AttributeFactory TOKEN_ATTRIBUTE_FACTORY
Convenience factory that returns Token as the implementation for the basic attributes and returns the default impl (with "Impl" appended) for all other attributes.
- Since: 3.0
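The dispatch rule described above can be modeled with plain reflection: if the static implementation class implements the requested attribute interface, use it; otherwise fall back to the interface name with "Impl" appended. `StaticImplResolver` is a hypothetical name and this is not Lucene's factory code. In Lucene terms, `staticImpl` would be Token; here JDK types stand in so the sketch runs on its own.

```java
// Hypothetical model of the factory's rule; returns the chosen implementation class name.
final class StaticImplResolver {
  private StaticImplResolver() {}

  static String resolve(Class<?> attributeInterface, Class<?> staticImpl) {
    if (!attributeInterface.isInterface()) {
      throw new IllegalArgumentException(attributeInterface + " is not an interface");
    }
    // Use the static implementation only if it actually implements the interface.
    return attributeInterface.isAssignableFrom(staticImpl)
        ? staticImpl.getName()
        : attributeInterface.getName() + "Impl"; // default naming convention
  }
}
```

For example, `resolve(CharSequence.class, String.class)` picks `java.lang.String`, while `resolve(Runnable.class, String.class)` falls back to `java.lang.RunnableImpl`.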
Constructor Details
Token
public Token()
Constructs a Token with null text.
Token
public Token(CharSequence text, int start, int end)
Constructs a Token with the given term text and start and end offsets. The type defaults to "word". NOTE: for better indexing speed you should instead use the char[] buffer methods (for example, copyBuffer) to set the term text.
- Parameters:
text - term text
start - start offset in the source text
end - end offset in the source text
Token
public Token(CharSequence text, int posInc, int start, int end)
Constructs a Token with the given term text, position increment, and start and end offsets.
Token
public Token(CharSequence text, int posInc, int start, int end, int posLength)
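The `posInc` argument is relative to the previous token: an increment of 0 stacks a token on the same position (the usual way synonyms are injected), and an increment greater than 1 leaves a gap, for example where a stop word was removed. A standalone sketch of turning increments into absolute positions; `PosToken` and `Positions` are hypothetical, not Lucene classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical (text, position increment) pair; not the Lucene Token class.
record PosToken(String text, int posInc) {}

final class Positions {
  private Positions() {}

  // Accumulates increments starting just before position 0.
  static List<Integer> absolute(List<PosToken> tokens) {
    List<Integer> positions = new ArrayList<>();
    int pos = -1;
    for (PosToken t : tokens) {
      pos += t.posInc(); // 0 = same position as previous, >1 = gap
      positions.add(pos);
    }
    return positions;
  }
}
```

For ("fast", 1), ("quick", 0), ("fox", 1), ("jumps", 2) the positions are 0, 0, 1, 3: "quick" shares a position with "fast", and a gap is left before "jumps".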
Method Details
getFlags
public int getFlags()
- Specified by:
getFlags in interface FlagsAttribute
setFlags
public void setFlags(int flags)
- Specified by:
setFlags in interface FlagsAttribute
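The flags value is a plain `int` that filters in a chain commonly treat as a bitset for passing hints to one another. A minimal sketch of that bit manipulation, assuming made-up flag meanings; `TokenFlags` and its constants are hypothetical and are not defined by Lucene.

```java
// Hypothetical flag bits; Lucene does not define these constants.
final class TokenFlags {
  static final int PROPER_NOUN = 1;      // bit 0
  static final int ACRONYM = 1 << 1;     // bit 1
  static final int STEMMED = 1 << 2;     // bit 2

  private TokenFlags() {}

  static int set(int flags, int bit) { return flags | bit; }      // turn a bit on
  static int unset(int flags, int bit) { return flags & ~bit; }   // turn a bit off
  static boolean has(int flags, int bit) { return (flags & bit) != 0; }
}
```

A producer might call `token.setFlags(TokenFlags.set(token.getFlags(), TokenFlags.ACRONYM))`, and a later filter would test the bit with `TokenFlags.has(token.getFlags(), TokenFlags.ACRONYM)`.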
getPayload
public BytesRef getPayload()
- Specified by:
getPayload in interface PayloadAttribute
setPayload
public void setPayload(BytesRef payload)
- Specified by:
setPayload in interface PayloadAttribute
clear
public void clear()
Resets the term text, payload, flags, positionIncrement, positionLength, startOffset, endOffset and token type to their defaults.
- Overrides:
clear in class PackedTokenAttributeImpl
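A standalone model of the fields that `clear()` resets. The defaults used here are an assumption based on the usual attribute defaults: empty term, null payload, flags 0, position increment and position length 1, offsets 0, and type "word". `TokenState` is a hypothetical class, not Lucene's implementation.

```java
// Hypothetical mutable model of the state clear() resets; not Lucene code.
final class TokenState {
  final StringBuilder term = new StringBuilder();
  byte[] payload;
  int flags;
  int positionIncrement = 1;
  int positionLength = 1;
  int startOffset;
  int endOffset;
  String type = "word";

  // Restores every field to its assumed default so the instance can be reused.
  void clear() {
    term.setLength(0);
    payload = null;
    flags = 0;
    positionIncrement = 1;
    positionLength = 1;
    startOffset = 0;
    endOffset = 0;
    type = "word";
  }
}
```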
equals
public boolean equals(Object obj)
- Overrides:
equals in class PackedTokenAttributeImpl
hashCode
public int hashCode()
- Overrides:
hashCode in class PackedTokenAttributeImpl
clone
public Token clone()
- Overrides:
clone in class PackedTokenAttributeImpl
reinit
public void reinit(Token prototype)
Copy the prototype token's fields into this one. Note: Payloads are shared.
- Parameters:
prototype - source Token to copy fields from
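"Payloads are shared" means both tokens end up referencing the same bytes, so a later in-place edit through one is visible through the other. A standalone sketch contrasting that shallow, reinit-style copy with one that duplicates the payload bytes; `PayloadToken` and both methods are hypothetical and only model the behavior described above.

```java
import java.util.Arrays;

// Hypothetical token with a byte[] payload; not the Lucene Token class.
final class PayloadToken {
  String text;
  byte[] payload;

  PayloadToken(String text, byte[] payload) {
    this.text = text;
    this.payload = payload;
  }

  // Models the documented reinit behavior: fields copied, payload reference shared.
  void reinitShared(PayloadToken prototype) {
    this.text = prototype.text;
    this.payload = prototype.payload;
  }

  // Alternative: copy the payload bytes so the two tokens stay independent.
  void reinitDeep(PayloadToken prototype) {
    this.text = prototype.text;
    this.payload = prototype.payload == null
        ? null
        : Arrays.copyOf(prototype.payload, prototype.payload.length);
  }
}
```

If the source token's payload buffer may be modified afterwards, copying the bytes (or cloning) is the safer choice.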
copyTo
public void copyTo(AttributeImpl target)
- Overrides:
copyTo in class PackedTokenAttributeImpl
reflectWith
public void reflectWith(AttributeReflector reflector)
- Overrides:
reflectWith in class PackedTokenAttributeImpl