Class BloomFilter
java.lang.Object
org.apache.drill.exec.work.filter.BloomFilter
According to Putze et al.'s "Cache-, Hash- and Space-Efficient BloomFilter
Filters", see this paper
for details, the main theory is to construct tiny bucket bloom filters which benefit to
the cpu cache and SIMD opcode.
-
Constructor Summary
ConstructorDescriptionBloomFilter
(int ndv, double fpp, BufferAllocator bufferAllocator) BloomFilter
(int numBytes, BufferAllocator bufferAllocator) BloomFilter
(DrillBuf byteBuf) -
Method Summary
Modifier and TypeMethodDescriptionstatic int
adjustByteSize
(int numBytes) boolean
find
(long hash) Determine whether an element is set or not.void
insert
(long hash) Add an element's hash value to this bloom filter.static int
optimalNumOfBytes
(long ndv, double fpp) Calculate optimal size according to the number of distinct values and false positive probability.void
or
(BloomFilter other) Merge this bloom filter with other one
-
Constructor Details
-
BloomFilter
-
BloomFilter
-
BloomFilter
-
-
Method Details
-
adjustByteSize
public static int adjustByteSize(int numBytes) -
insert
public void insert(long hash) Add an element's hash value to this bloom filter.- Parameters:
hash
- hash result of element.
-
find
public boolean find(long hash) Determine whether an element is set or not.- Parameters:
hash
- the hash value of element.- Returns:
- false if the element is not set, true if the element is probably set.
-
or
Merge this bloom filter with other one- Parameters:
other
- other bloom filter
-
optimalNumOfBytes
public static int optimalNumOfBytes(long ndv, double fpp) Calculate optimal size according to the number of distinct values and false positive probability. See http://en.wikipedia.org/wiki/Bloom_filter#Probability_of_false_positives for the formula.- Parameters:
ndv
- The number of distinct values.fpp
- The false positive probability.- Returns:
- optimal number of bytes of given ndv and fpp.
-
getContent
-