Class BloomFilter

java.lang.Object
org.apache.drill.exec.work.filter.BloomFilter

public class BloomFilter extends Object
Based on Putze et al.'s "Cache-, Hash- and Space-Efficient Bloom Filters"; see that paper for details. The main idea is to build the filter out of many tiny per-bucket bloom filters, which makes good use of the CPU cache and of SIMD instructions.
  • Constructor Details

    • BloomFilter

      public BloomFilter(int numBytes, BufferAllocator bufferAllocator)
    • BloomFilter

      public BloomFilter(int ndv, double fpp, BufferAllocator bufferAllocator)
    • BloomFilter

      public BloomFilter(DrillBuf byteBuf)
  • Method Details

    • adjustByteSize

      public static int adjustByteSize(int numBytes)
    • insert

      public void insert(long hash)
      Add an element's hash value to this bloom filter.
      Parameters:
      hash - the hash value of the element.
    • find

      public boolean find(long hash)
      Determine whether an element's hash value may have been added to this bloom filter.
      Parameters:
      hash - the hash value of the element.
      Returns:
      false if the element was definitely not added, true if the element was probably added (false positives are possible).
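The insert/find contract described above can be illustrated with a minimal blocked bloom filter sketch. This is not Drill's implementation: the class name `BlockedBloomSketch`, the 64-byte block size, the four bits set per key, and the multiplicative mixing constant are all assumptions chosen for illustration. It shows only the general technique of confining every probe for a key to a single cache-line-sized block.

```java
// Hypothetical sketch of a blocked bloom filter; not the Drill class.
final class BlockedBloomSketch {
    private static final int WORDS_PER_BLOCK = 8;   // 8 * 64 bits = one 64-byte block (assumed)
    private static final int BITS_PER_KEY = 4;      // bits set per inserted hash (assumed)
    private static final long MIX = 0x9E3779B97F4A7C15L; // odd constant used to spread hash bits

    private final long[] words;
    private final int numBlocks;

    BlockedBloomSketch(int numBlocks) {
        if (numBlocks <= 0) {
            throw new IllegalArgumentException("numBlocks must be positive");
        }
        this.numBlocks = numBlocks;
        this.words = new long[numBlocks * WORDS_PER_BLOCK];
    }

    /** Sets BITS_PER_KEY bits, all inside the single block selected by the hash. */
    void insert(long hash) {
        int base = blockBase(hash);
        long h = hash * MIX;
        for (int i = 0; i < BITS_PER_KEY; i++) {
            int bit = (int) (h >>> 55);              // top 9 bits: position 0..511 in the block
            words[base + (bit >>> 6)] |= 1L << (bit & 63);
            h <<= 9;                                 // expose the next 9 bits
        }
    }

    /** Returns false if the hash was definitely never inserted, true if it probably was. */
    boolean find(long hash) {
        int base = blockBase(hash);
        long h = hash * MIX;
        for (int i = 0; i < BITS_PER_KEY; i++) {
            int bit = (int) (h >>> 55);
            if ((words[base + (bit >>> 6)] & (1L << (bit & 63))) == 0) {
                return false;                        // a required bit is missing
            }
            h <<= 9;
        }
        return true;
    }

    /** Index of the first word of the block that this hash maps to. */
    private int blockBase(long hash) {
        return (int) Math.floorMod(hash, (long) numBlocks) * WORDS_PER_BLOCK;
    }
}
```

Because every bit that `insert` sets is checked by `find` with the same derivation, an inserted hash is never reported as absent; only false positives are possible.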
    • or

      public void or(BloomFilter other)
      Merge the other bloom filter into this one.
      Parameters:
      other - the other bloom filter
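As the method name suggests, merging bloom filters amounts to a bitwise OR of their bit arrays, so the result answers "probably present" for anything inserted into either input. The sketch below shows that idea on plain `long[]` arrays; the class and method names are hypothetical, and the equal-length requirement is an assumption of this sketch rather than a documented rule of the Drill class.

```java
// Hypothetical sketch of merging two bloom filter bit arrays; not the Drill class.
final class BloomMergeSketch {
    /** ORs every word of {@code other} into {@code target}, modifying target in place. */
    static void orInto(long[] target, long[] other) {
        if (target.length != other.length) {
            // Assumption: both filters were built with the same size and hashing scheme.
            throw new IllegalArgumentException("filters must have the same length");
        }
        for (int i = 0; i < target.length; i++) {
            target[i] |= other[i];
        }
    }
}
```

The merge is only meaningful when both filters use the same size and the same mapping from hash to bit positions; otherwise the combined bits no longer correspond to the inserted elements.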
    • optimalNumOfBytes

      public static int optimalNumOfBytes(long ndv, double fpp)
      Calculate the optimal size in bytes from the number of distinct values and the false positive probability. See http://en.wikipedia.org/wiki/Bloom_filter#Probability_of_false_positives for the formula.
      Parameters:
      ndv - The number of distinct values.
      fpp - The false positive probability.
      Returns:
      the optimal number of bytes for the given ndv and fpp.
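The textbook formula for a standard bloom filter is m = -n * ln(p) / (ln 2)^2 bits, where n is the number of distinct values and p the target false positive probability. The sketch below applies that formula directly and rounds up to whole bytes. It is an illustration under stated assumptions: the class name is hypothetical, it returns `long` rather than `int`, and the actual Drill method may round, clamp, or adjust the result differently.

```java
// Hypothetical sketch of the classic bloom filter sizing formula; not the Drill method.
final class BloomSizeSketch {
    /** Optimal number of bits: m = -n * ln(p) / (ln 2)^2, rounded up. */
    static long optimalNumOfBits(long ndv, double fpp) {
        if (ndv <= 0 || fpp <= 0.0 || fpp >= 1.0) {
            throw new IllegalArgumentException("require ndv > 0 and 0 < fpp < 1");
        }
        double ln2 = Math.log(2);
        return (long) Math.ceil(-ndv * Math.log(fpp) / (ln2 * ln2));
    }

    /** The same size expressed in bytes, rounded up to a whole byte. */
    static long optimalNumOfBytes(long ndv, double fpp) {
        return (optimalNumOfBits(ndv, fpp) + 7) / 8;
    }
}
```

For example, 1000 distinct values at a 1% false positive probability come to 9586 bits, or 1199 bytes, under this formula. Lowering fpp or raising ndv increases the required size.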
    • getContent

      public DrillBuf getContent()