Package org.apache.lucene.util.fst
package org.apache.lucene.util.fst
Finite state transducers
This package implements Finite State Transducers with the following characteristics:
- Fast and low memory overhead construction of the minimal FST (but inputs must be provided in sorted order)
- Low object overhead and quick deserialization (byte[] representation)
- Pluggable
Outputsrepresentation N-shortest-pathssearch by weight- Enumerators (
IntsRefandBytesRef) that behave likeSortedMapiterators
FST Construction example:
// Input values (keys). These must be provided to Builder in Unicode code point (UTF8 or UTF32) sorted order.
// Note that sorting by Java's String.compareTo, which is UTF16 sorted order, is not correct and can lead to
// exceptions while building the FST:
String inputValues[] = {"cat", "dog", "dogs"};
long outputValues[] = {5, 7, 12};
PositiveIntOutputs outputs = PositiveIntOutputs.getSingleton();
FSTCompiler<Long> fstCompiler = new FSTCompiler<Long>(INPUT_TYPE.BYTE1, outputs);
BytesRefBuilder scratchBytes = new BytesRefBuilder();
IntsRefBuilder scratchInts = new IntsRefBuilder();
for (int i = 0; i < inputValues.length; i++) {
scratchBytes.copyChars(inputValues[i]);
fstCompiler.add(Util.toIntsRef(scratchBytes.toBytesRef(), scratchInts), outputValues[i]);
}
FST<Long> fst = FST.fromFSTReader(fstCompiler.compile(), fstCompiler.getFSTReader());
Retrieval by key:
Long value = Util.get(fst, new BytesRef("dog"));
System.out.println(value); // 7
Retrieval by value:
// Only works because outputs are also in sorted order
IntsRef key = Util.getByOutput(fst, 12);
System.out.println(Util.toBytesRef(key, scratchBytes).utf8ToString()); // dogs
Iterate over key-value pairs in sorted order:
// Like TermsEnum, this also supports seeking (advance)
BytesRefFSTEnum<Long> iterator = new BytesRefFSTEnum<Long>(fst);
while (iterator.next() != null) {
InputOutput<Long> mapEntry = iterator.current();
System.out.println(mapEntry.input.utf8ToString());
System.out.println(mapEntry.output);
}
N-shortest paths by weight:
Comparator<Long> comparator = new Comparator<Long>() {
public int compare(Long left, Long right) {
return left.compareTo(right);
}
};
Arc<Long> firstArc = fst.getFirstArc(new Arc<Long>());
Util.TopResults<Long> paths = Util.shortestPaths(fst, firstArc, fst.outputs.getNoOutput(), comparator, 3, true);
System.out.println(Util.toBytesRef(paths.topN.get(0).input, scratchBytes).utf8ToString()); // cat
System.out.println(paths.topN.get(0).output); // 5
System.out.println(Util.toBytesRef(paths.topN.get(1).input, scratchBytes).utf8ToString()); // dog
System.out.println(paths.topN.get(1).output); // 7
-
ClassDescriptionAn FST
Outputsimplementation where each output is a sequence of bytes.Enumerates all input (BytesRef) + output pairs in an FST.Holds a single input (BytesRef) + output pair.An FSTOutputsimplementation where each output is a sequence of characters.FST<T>Represents an finite state machine (FST), using a compact byte[] format.FST.Arc<T>Represents a single arc.Reads bytes stored in an FST.Represents the FST metadata.Specifies allowed range of each int input label for this FST.FSTCompiler<T>Builds a minimal FST (maps an IntsRef term to an arbitrary output) from pre-sorted terms with outputs.Fluent-style constructor for FSTFSTCompiler.Abstraction for reading bytes necessary for FST.An FSTOutputsimplementation where each output is a sequence of ints.Enumerates all input (IntsRef) + output pairs in an FST.Holds a single input (IntsRef) + output pair.A null FSTOutputsimplementation; use this if you just want to build an FSA.Provides off heap storage of finite state machine (FST), using underlying index input instead of byte store on heapProvides storage of finite state machine (FST), using byte array or byte store allocated on heap.Outputs<T>Represents the outputs for an FST, providing the basic algebra required for building and traversing the FST.PairOutputs<A,B> An FSTOutputsimplementation, holding two other outputs.PairOutputs.Pair<A,B> Holds a single pair of two outputs.An FSTOutputsimplementation where each output is a non-negative long value.Static helper methods.Util.FSTPath<T>Represents a path in TopNSearcher.Util.Result<T>Holds a single input (IntsRef) + output, returned byshortestPaths().Utility class to find top N shortest paths from start point(s).Holds the results for a top N search usingUtil.TopNSearcher