SpringBoot 中使用布隆過濾器 Guava、Redission實(shí)現(xiàn)



前言



今天說到的實(shí)現(xiàn)方式有以下幾種:

引入 Guava 實(shí)現(xiàn)
引入 hutool 實(shí)現(xiàn)
引入 Redission 實(shí)現(xiàn)
Guava 布隆過濾器結(jié)合 Redis (重點(diǎn))
項(xiàng)目工程的搭建,就在這里先寫明啦~

boot項(xiàng)目就是四步走~ 導(dǎo)包->寫配置->編寫配置類->使用

補(bǔ)充說明:我使用的 redis 是用docker下載的一個(gè)集成redis和布隆過濾器的鏡像。安裝方式:Docker安裝Redis布隆過濾器

如果你是在windows上安裝的redis 是3.0版本的,是無法集成布隆過濾器。

如果是在liunx版本上的redis,需要再額外下載一個(gè)布隆過濾器的模塊。需要自行百度啦~

我將要用到的所有jar都放在這里啦~

 <parent>     <artifactId>spring-boot-dependencies</artifactId>     <groupId>org.springframework.boot</groupId>     <version>2.5.2</version> </parent> <dependencies>     <dependency>         <groupId>org.springframework.boot</groupId>         <artifactId>spring-boot-starter</artifactId>     </dependency>     <dependency>         <groupId>org.springframework.boot</groupId>         <artifactId>spring-boot-starter-web</artifactId>     </dependency>     <dependency>         <groupId>org.springframework.boot</groupId>         <artifactId>spring-boot-starter-data-redis</artifactId>     </dependency>     <!-- https://mvnrepository.com/artifact/org.redisson/redisson-spring-boot-starter -->     <dependency>         <groupId>org.redisson</groupId>         <artifactId>redisson-spring-boot-starter</artifactId>         <version>3.17.6</version>     </dependency>
     <dependency>         <groupId>com.google.guava</groupId>         <artifactId>guava</artifactId>         <version>30.0-jre</version>     </dependency>     <dependency>         <groupId>org.springframework.boot</groupId>         <artifactId>spring-boot-starter-test</artifactId>     </dependency>     <dependency>         <groupId>junit</groupId>         <artifactId>junit</artifactId>         <scope>test</scope>     </dependency>     <dependency>         <groupId>org.projectlombok</groupId>         <artifactId>lombok</artifactId>     </dependency>     <dependency>         <groupId>cn.hutool</groupId>         <artifactId>hutool-all</artifactId>         <version>5.7.22</version>     </dependency> </dependencies> 復(fù)制代碼
yml 配置文件:

 server:   port: 8081 spring:   redis:     port: 6379     host: 192.xxx復(fù)制代碼


一、Guava 實(shí)現(xiàn)布隆過濾器



這個(gè)方式非??旖荩?br>
直接用一個(gè)Demo來說明吧

     @Test
     public void test2() {
         // 預(yù)期插入數(shù)量
         long capacity = 10000L;         // 錯(cuò)誤比率
         double errorRate = 0.01;         //創(chuàng)建BloomFilter對(duì)象,需要傳入Funnel對(duì)象,預(yù)估的元素個(gè)數(shù),錯(cuò)誤率
         BloomFilter<Long> filter = BloomFilter.create(Funnels.longFunnel(), capacity, errorRate); //        BloomFilter<String> filter = BloomFilter.create(Funnels.stringFunnel(Charset.forName("utf-8")), 10000, 0.0001);         //put值進(jìn)去
         for (long i = 0; i < capacity; i++) {             filter.put(i);         }
         // 統(tǒng)計(jì)誤判次數(shù)
         int count = 0;         // 我在數(shù)據(jù)范圍之外的數(shù)據(jù),測(cè)試相同量的數(shù)據(jù),判斷錯(cuò)誤率是不是符合我們當(dāng)時(shí)設(shè)定的錯(cuò)誤率
         for (long i = capacity; i < capacity * 2; i++) {             if (filter.mightContain(i)) {
                 count++;             }
         }
         System.out.println(count);     }
 復(fù)制代碼
當(dāng)容量為1k,誤判率為 0.01時(shí)

 2022-08-26 23:50:01.028  INFO 14748 --- [           main] com.nzc.test.RedisBloomFilterTest        : 存入元素為==1000
 誤判個(gè)數(shù)為==>10復(fù)制代碼
當(dāng)容量為1w,誤判率為 0.01時(shí)

 2022-08-26 23:49:23.618  INFO 21796 --- [           main] com.nzc.test.RedisBloomFilterTest        : 存入元素為==10000
 誤判個(gè)數(shù)為==>87
 復(fù)制代碼
當(dāng)容量為100w,誤判率為 0.01時(shí)

 2022-08-26 23:50:45.167  INFO 8964 --- [           main] com.nzc.test.RedisBloomFilterTest        : 存入元素為==1000000 誤判個(gè)數(shù)為==>9946復(fù)制代碼
BloomFilter<Long> filter = BloomFilter.create(Funnels.longFunnel(), capacity, errorRate);

create方法實(shí)際上調(diào)用的方法是:

 public static <T> BloomFilter<T> create(
     Funnel<? super T> funnel, int expectedInsertions, double fpp) {
   return create(funnel, (long) expectedInsertions, fpp);
 }復(fù)制代碼
funnel 用來對(duì)參數(shù)做轉(zhuǎn)化,方便生成hash值
expectedInsertions 預(yù)期插入的數(shù)據(jù)量大小
fpp 誤判率
里面具體的實(shí)現(xiàn),相對(duì)我來說,數(shù)學(xué)能力有限,沒法說清楚。希望大家多多包含。






二、Hutool 布隆過濾器





Hutool 工具中的布隆過濾器,內(nèi)存占用太高了,并且功能相比于guava也弱了很多,個(gè)人不建議使用。

 @Test
 public void test4(){
     int capacity = 100;     // 錯(cuò)誤比率
     double errorRate = 0.01;     // 初始化
     BitMapBloomFilter filter = new BitMapBloomFilter(capacity);     for (int i = 0; i < capacity; i++) {         filter.add(String.valueOf(i));     }
 
     log.info("存入元素為=={}",capacity);     // 統(tǒng)計(jì)誤判次數(shù)
     int count = 0;     // 我在數(shù)據(jù)范圍之外的數(shù)據(jù),測(cè)試相同量的數(shù)據(jù),判斷錯(cuò)誤率是不是符合我們當(dāng)時(shí)設(shè)定的錯(cuò)誤率
     for (int i = capacity; i < capacity * 2; i++) {         if (filter.contains(String.valueOf(i))) {
             count++;         }
     }
     log.info("誤判元素為==={}",count); }


三、Redission 布隆過濾器



redission的使用其實(shí)也很簡(jiǎn)單,官方也有非常好的教程。

引入jar,然后編寫一個(gè)config類即可

 <dependency>
      <groupId>org.springframework.boot</groupId>
           <artifactId>spring-boot-starter-data-redis</artifactId>
            </dependency>
             <!-- https://mvnrepository.com/artifact/org.redisson/redisson-spring-boot-starter -->
<dependency>     
        <groupId>org.redisson</groupId>
              <artifactId>redisson-spring-boot-starter</artifactId>
                 <version>3.17.6</version>
</dependency>
出了注入 redissionclient,還注入了一些redis相關(guān)的東西,都是歷史包裹~

 /**
  * @description:
  * @date: 2022年08月26日 22:06
  */ @Configuration @EnableCaching public class RedisConfig {
 
     @Bean     public RedissonClient redissonClient(){
         Config config = new Config();
         config.useSingleServer().setAddress("redis://47.113.227.254:6379");
         RedissonClient redissonClient = Redisson.create(config);
         return  redissonClient;
     }
 
     @Bean     public CacheManager cacheManager(RedisConnectionFactory connectionFactory) {
         RedisCacheManager rcm=RedisCacheManager.create(connectionFactory);
         return rcm;
     }
     @Bean     public RedisTemplate<String, Object> redisTemplate(RedisConnectionFactory factory) {
         RedisTemplate<String, Object> redisTemplate = new RedisTemplate<String, Object>();
         redisTemplate.setConnectionFactory(factory);
 
         Jackson2JsonRedisSerializer jackson2JsonRedisSerializer = new                 Jackson2JsonRedisSerializer(Object.class);
         ObjectMapper om = new ObjectMapper();
         om.setVisibility(PropertyAccessor.ALL, JsonAutoDetect.Visibility.ANY);
         om.enableDefaultTyping(ObjectMapper.DefaultTyping.NON_FINAL);
         jackson2JsonRedisSerializer.setObjectMapper(om);
         //序列化設(shè)置 ,這樣計(jì)算是正常顯示的數(shù)據(jù),也能正常存儲(chǔ)和獲取         redisTemplate.setKeySerializer(jackson2JsonRedisSerializer);
         redisTemplate.setValueSerializer(jackson2JsonRedisSerializer);
         redisTemplate.setHashKeySerializer(jackson2JsonRedisSerializer);
         redisTemplate.setHashValueSerializer(jackson2JsonRedisSerializer);
 
         return redisTemplate;
     }
     @Bean     public StringRedisTemplate stringRedisTemplate(RedisConnectionFactory factory) {
         StringRedisTemplate stringRedisTemplate = new StringRedisTemplate();
         stringRedisTemplate.setConnectionFactory(factory);
         return stringRedisTemplate;
     }
 }
我們?cè)谥虚g再編寫一個(gè)Service,






 @Service public class BloomFilterService {
 
     @Autowired     private RedissonClient redissonClient;
 
     /**
      * 創(chuàng)建布隆過濾器
      * @param filterName 布隆過濾器名稱
      * @param capacity 預(yù)測(cè)插入數(shù)量
      * @param errorRate 誤判率
      * @param <T>
      * @return      */     public <T> RBloomFilter<T> create(String filterName, long capacity, double errorRate) {
         RBloomFilter<T> bloomFilter = redissonClient.getBloomFilter(filterName);
         bloomFilter.tryInit(capacity, errorRate);
         return bloomFilter;
     }
 }
測(cè)試:

 package com.nzc.test;
 import com.nzc.WebApplication;
 import com.nzc.service.BloomFilterService;
 import lombok.extern.slf4j.Slf4j;
 import org.junit.Test;
 import org.junit.runner.RunWith;
 import org.redisson.api.RBloomFilter;
 import org.springframework.beans.factory.annotation.Autowired;
 import org.springframework.boot.test.context.SpringBootTest;
 import org.springframework.test.context.junit4.SpringRunner;
 @Slf4j
 @RunWith(SpringRunner.class)
 @SpringBootTest(classes = WebApplication.class)
 public class BloomFilterTest {
 
     @Autowired
     private BloomFilterService bloomFilterService;
     @Test
     public void testBloomFilter() {
         // 預(yù)期插入數(shù)量
         long expectedInsertions = 1000L;         // 錯(cuò)誤比率
         double falseProbability = 0.01;         RBloomFilter<Long> bloomFilter = bloomFilterService.create("NZC:BOOM-FILTER", expectedInsertions, falseProbability);         // 布隆過濾器增加元素
         for (long i = 0; i < expectedInsertions; i++) {             bloomFilter.add(i);         }
         long elementCount = bloomFilter.count();         log.info("布隆過濾器中含有元素個(gè)數(shù) = {}.", elementCount);
         // 統(tǒng)計(jì)誤判次數(shù)
         int count = 0;         // 我在數(shù)據(jù)范圍之外的數(shù)據(jù),測(cè)試相同量的數(shù)據(jù),判斷錯(cuò)誤率是不是符合我們當(dāng)時(shí)設(shè)定的錯(cuò)誤率
         for (long i = expectedInsertions; i < expectedInsertions * 2; i++) {             if (bloomFilter.contains(i)) {
                 count++;             }
         }
         log.info("誤判次數(shù) = {}.", count);
         // 清空布隆過濾器 內(nèi)部實(shí)現(xiàn)是個(gè)異步線程在執(zhí)行  我只是為了方便測(cè)試
         bloomFilter.delete();     }
 }
當(dāng)容量為1k,誤判率為0.01時(shí)的輸出情況

 2022-08-26 23:37:04.903  INFO 1472 --- [           main] com.nzc.test.BloomFilterTest             : 布隆過濾器中含有元素個(gè)數(shù) = 993. 2022-08-26 23:37:38.549  INFO 1472 --- [           main] com.nzc.test.BloomFilterTest             : 誤判次數(shù) = 36.
當(dāng)容量為1w,誤判率為0.01時(shí)的輸出情況

 2022-08-26 23:50:54.478  INFO 17088 --- [           main] com.nzc.test.BloomFilterTest             : 布隆過濾器中含有元素個(gè)數(shù) = 9895. 2022-08-26 23:56:56.171  INFO 17088 --- [           main] com.nzc.test.BloomFilterTest             : 誤判次數(shù) = 259.


四、小結(jié)



我實(shí)際測(cè)試的時(shí)候,Guava 的效果應(yīng)該是最好的,Redission 雖然是直接集成了Redis,但實(shí)際效果比起Guava較差一些,我這里沒有貼上時(shí)間,Redission所創(chuàng)建出來的布隆過濾器,速度較慢。

當(dāng)然我的測(cè)試范圍是有限的,并且只是循環(huán)測(cè)試,另外服務(wù)器也并非在本地,這都有影響。

但是僅目前看來是這樣的。

還有就是將 Guava 結(jié)合 Redis 一起使用。




五、Guava 布隆過濾器結(jié)合 Redis 使用



僅限于測(cè)試,一切效果還是需看實(shí)測(cè)。

我是以 Guava 中創(chuàng)建 布隆過濾器為基礎(chǔ),利用它構(gòu)造的方法,來進(jìn)行修改,功能相比于 guava 還是少了很多的。

 package com.nzc.boom;  
 import com.google.common.annotations.VisibleForTesting;
 import com.google.common.base.Preconditions;
 import com.google.common.hash.Funnel;
 import com.google.common.hash.Hashing;
 import com.google.common.primitives.Longs;
 public class BloomFilterHelper<T> {
 
     private int numHashFunctions;  
     private int bitSize;  
     private Funnel<T> funnel;  
     public BloomFilterHelper(Funnel<T> funnel, int expectedInsertions, double fpp) {
         Preconditions.checkArgument(funnel != null, "funnel不能為空");         this.funnel = funnel;         // 計(jì)算bit數(shù)組長(zhǎng)度
         bitSize = optimalNumOfBits(expectedInsertions, fpp);         // 計(jì)算hash方法執(zhí)行次數(shù)
         numHashFunctions = optimalNumOfHashFunctions(expectedInsertions, bitSize);     }
 
 
     /** 源碼
      *public <T> boolean mightContain(
      *         T object, Funnel<? super T> funnel, int numHashFunctions, LockFreeBitArray bits) {
      *       long bitSize = bits.bitSize();      *       byte[] bytes = Hashing.murmur3_128().hashObject(object, funnel).getBytesInternal();      *       long hash1 = lowerEight(bytes);      *       long hash2 = upperEight(bytes);      *
      *       long combinedHash = hash1;      *       for (int i = 0; i < numHashFunctions; i++) {      *         // Make the combined hash positive and indexable
      *         if (!bits.get((combinedHash & Long.MAX_VALUE) % bitSize)) {
      *           return false;      *         }
      *         combinedHash += hash2;      *       }
      *       return true;      *     }
      * @param value
      * @return
      */
     public long[] murmurHashOffset(T value) {
         long[] offset = new long[numHashFunctions];         byte[] bytes = Hashing.murmur3_128().hashObject(value, funnel).asBytes();         long hash1 = lowerEight(bytes);         long hash2 = upperEight(bytes);         long combinedHash = hash1;         for (int i = 1; i <= numHashFunctions; i++) {             long nextHash = hash1 + i * hash2;             if (nextHash < 0) {
                 nextHash = ~nextHash;             }
             offset[i - 1] = nextHash % bitSize;         }
         return offset;
 
     }
     private /* static */ long lowerEight(byte[] bytes) {
         return Longs.fromBytes(
                 bytes[7], bytes[6], bytes[5], bytes[4], bytes[3], bytes[2], bytes[1], bytes[0]);     }
 
     private /* static */ long upperEight(byte[] bytes) {
         return Longs.fromBytes(
                 bytes[15], bytes[14], bytes[13], bytes[12], bytes[11], bytes[10], bytes[9], bytes[8]);     }
     /**
      * 計(jì)算bit數(shù)組長(zhǎng)度
      * 同樣是guava創(chuàng)建布隆過濾器中的計(jì)算bit數(shù)組長(zhǎng)度方法
      */
     private int optimalNumOfBits(long n, double p) {
         if (p == 0) {
             // 設(shè)定最小期望長(zhǎng)度
             p = Double.MIN_VALUE;         }
         return (int) (-n * Math.log(p) / (Math.log(2) * Math.log(2)));     }
 
     /**
      * 這里是從guava 中 copy 出來的
      * 就是guava 創(chuàng)建一個(gè) 布隆過濾器時(shí),
      * 計(jì)算hash方法執(zhí)行次數(shù)的方法
      */
     private int optimalNumOfHashFunctions(long n, long m) {
         int countOfHash = Math.max(1, (int) Math.round((double) m / n * Math.log(2)));         return countOfHash;     }
 
 }
以上的這些代碼,在guava包都可以找到的。

在redisConfig中注入布隆過濾器


 /**
  * @description:
  * @date: 2022年08月26日 22:06
  */ @Configuration @EnableCaching public class RedisConfig {
 
     @Bean     public CacheManager cacheManager(RedisConnectionFactory connectionFactory) {
         RedisCacheManager rcm=RedisCacheManager.create(connectionFactory);
         return rcm;
     }
     @Bean     public RedisTemplate<String, Object> redisTemplate(RedisConnectionFactory factory) {
         RedisTemplate<String, Object> redisTemplate = new RedisTemplate<String, Object>();
         redisTemplate.setConnectionFactory(factory);
 
         Jackson2JsonRedisSerializer jackson2JsonRedisSerializer = new                 Jackson2JsonRedisSerializer(Object.class);
         ObjectMapper om = new ObjectMapper();
         om.setVisibility(PropertyAccessor.ALL, JsonAutoDetect.Visibility.ANY);
         om.enableDefaultTyping(ObjectMapper.DefaultTyping.NON_FINAL);
         jackson2JsonRedisSerializer.setObjectMapper(om);
         //序列化設(shè)置 ,這樣計(jì)算是正常顯示的數(shù)據(jù),也能正常存儲(chǔ)和獲取         redisTemplate.setKeySerializer(jackson2JsonRedisSerializer);
         redisTemplate.setValueSerializer(jackson2JsonRedisSerializer);
         redisTemplate.setHashKeySerializer(jackson2JsonRedisSerializer);
         redisTemplate.setHashValueSerializer(jackson2JsonRedisSerializer);
 
         return redisTemplate;
     }
     @Bean     public StringRedisTemplate stringRedisTemplate(RedisConnectionFactory factory) {
         StringRedisTemplate stringRedisTemplate = new StringRedisTemplate();
         stringRedisTemplate.setConnectionFactory(factory);
         return stringRedisTemplate;
     }
 
     
     //初始化布隆過濾器,放入到spring容器里面     @Bean     public BloomFilterHelper<String> initBloomFilterHelper() {
         return new BloomFilterHelper<String>((Funnel<String>) (from, into) -> into.putString(from, Charsets.UTF_8).putString(from, Charsets.UTF_8), 1000, 0.01);
     }
 
     @Bean     public BloomFilterHelper<Long> initLongBloomFilterHelper() {
         return new BloomFilterHelper<Long>((Funnel<Long>) (from, into) -> into.putLong(from),1000, 0.01);
     }
 
 
 }
也就是注入我們剛剛編寫的那個(gè)布隆過濾器。

然后再編寫一個(gè)Service 層

 
 /**
  * @description:
  */ @Slf4j @Service public class RedisBloomFilter {
 
     @Autowired     private RedisTemplate redisTemplate;
 
     /**
      * 根據(jù)給定的布隆過濾器添加值
      */     public <T> void addByBloomFilter(BloomFilterHelper<T> bloomFilterHelper, String key, T value) {
         Preconditions.checkArgument(bloomFilterHelper != null, "bloomFilterHelper不能為空");
         long[] offset = bloomFilterHelper.murmurHashOffset(value);
         for (long i : offset) {
             log.info("key :{} ,value : {}", key,  i);
             redisTemplate.opsForValue().setBit(key, i, true);
         }
     }
 
     /**
      * 根據(jù)給定的布隆過濾器判斷值是否存在
      */     public <T> boolean includeByBloomFilter(BloomFilterHelper<T> bloomFilterHelper, String key, T value) {
         Preconditions.checkArgument(bloomFilterHelper != null, "bloomFilterHelper不能為空");
         long[] offset = bloomFilterHelper.murmurHashOffset(value);
         for (long i : offset) {
             log.info("key :{} ,value : {}", key,  i);
             if (!redisTemplate.opsForValue().getBit(key, i)) {
                 return false;
             }
         }
         return true;
     }
 }
測(cè)試:

  @Test
     public void test1() {
         // 預(yù)期插入數(shù)量
         long capacity = 1000L;         // 錯(cuò)誤比率
         double errorRate = 0.01;         for (long i = 0; i < capacity; i++) {             redisBloomFilter.addByBloomFilter(bloomFilterHelper, "nzc:bloomFilter1", i);         }
         log.info("存入元素為=={}", capacity);         // 統(tǒng)計(jì)誤判次數(shù)
         int count = 0;         // 我在數(shù)據(jù)范圍之外的數(shù)據(jù),測(cè)試相同量的數(shù)據(jù),判斷錯(cuò)誤率是不是符合我們當(dāng)時(shí)設(shè)定的錯(cuò)誤率
         for (long i = capacity; i < capacity * 2; i++) {             if (redisBloomFilter.includeByBloomFilter(bloomFilterHelper, "nzc:bloomFilter1", i)) {
                 count++;             }
         }
         System.out.println("誤判個(gè)數(shù)為==>" + count);     }
 
輸出:

 存入元素為==1000 誤判個(gè)數(shù)為==>12



作者:碼出宇宙


歡迎關(guān)注微信公眾號(hào) :碼出宇宙

掃描添加好友邀你進(jìn)技術(shù)交流群,加我時(shí)注明【姓名+公司(學(xué)校)+職位】