
Code Notes: SpringBoot + Elasticsearch

Deploy Elasticsearch and Kibana with Docker Compose (the services section of docker-compose.yml):

services:
  es:
    image: elasticsearch:7.12.1
    container_name: es
    environment:
      - ES_JAVA_OPTS=-Xms512m -Xmx512m
      - discovery.type=single-node
    volumes:
      - /docker/es/data:/usr/share/elasticsearch/data
      - /docker/es/plugins:/usr/share/elasticsearch/plugins
    ports:
      - "9200:9200"
      - "9300:9300"
    restart: unless-stopped

  kibana:
    image: kibana:7.12.1
    container_name: kibana
    environment:
      - ELASTICSEARCH_HOSTS=http://es:9200
    ports:
      - "5601:5601"
    depends_on:
      - es
    restart: unless-stopped

A forward index is the traditional approach: records are indexed by their id.

An inverted index instead builds a table keyed by terms (keywords): the inverted index table. Elasticsearch is document-oriented: a document can be, say, a product row or an order record from a database. Documents are serialized to JSON before being stored in Elasticsearch:

Forward index (documents indexed by id):

id (index) | title          | price
1          | 小米手机        | 3499
2          | 华为手机        | 4999
3          | 华为小米充电器   | 49
4          | 小米手环        | 49

Inverted index (term → document ids):

term (index) | document ids
小米          | 1, 3, 4
手机          | 1, 2
华为          | 2, 3
充电器         | 3
手环          | 4

For example, a search for 华为手机 is tokenized into the terms 华为 and 手机; the document ids listed under each term are collected, and the matching product records are then fetched by id and returned.
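
To make the idea concrete, here is a minimal, self-contained Java sketch (plain Java, not Elasticsearch code) that builds an inverted index over the example documents above; the tokenize method is a toy stand-in for a real analyzer:

import java.util.*;

public class InvertedIndexDemo {
    public static void main(String[] args) {
        // forward view: document id -> title (the example documents above)
        Map<Integer, String> docs = Map.of(
                1, "小米手机",
                2, "华为手机",
                3, "华为小米充电器",
                4, "小米手环");

        // inverted view: term -> sorted set of document ids
        Map<String, Set<Integer>> inverted = new TreeMap<>();
        docs.forEach((id, title) ->
                tokenize(title).forEach(term ->
                        inverted.computeIfAbsent(term, t -> new TreeSet<>()).add(id)));

        // {充电器=[3], 华为=[2, 3], 小米=[1, 3, 4], 手机=[1, 2], 手环=[4]}
        System.out.println(inverted);

        // "searching 华为手机": tokenize the query, look up ids per term, then fetch docs by id
        for (String term : tokenize("华为手机")) {
            System.out.println(term + " -> " + inverted.getOrDefault(term, Set.of()));
        }
    }

    // toy tokenizer standing in for a real analyzer such as IK
    private static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : List.of("小米", "华为", "手机", "充电器", "手环")) {
            if (text.contains(t)) terms.add(t);
        }
        return terms;
    }
}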

Forward index

  • Pros:
    • Indexes can be created on multiple fields
    • Searching and sorting by an indexed field is very fast
  • Cons:
    • Searching by a non-indexed field, or by individual terms within an indexed field, falls back to a full table scan

Inverted index

  • Pros:
    • Searching by term, including partial/fuzzy term matches, is very fast
  • Cons:
    • Indexes are built on terms, not on fields
    • Cannot sort by field
Install the IK analyzer plugin inside the container:

docker exec -it es ./bin/elasticsearch-plugin install https://release.infinilabs.com/analysis-ik/stable/elasticsearch-analysis-ik-7.12.1.zip

  • By default, the standard analyzer can only emit one token per Chinese character, so it cannot segment Chinese text correctly:
POST /_analyze
{
  "analyzer": "standard",
  "text": "学习java太棒了"
}
{
  "tokens" : [
    {
      "token" : "学",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<IDEOGRAPHIC>",
      "position" : 0
    },
    {
      "token" : "习",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "<IDEOGRAPHIC>",
      "position" : 1
    },
    {
      "token" : "java",
      "start_offset" : 2,
      "end_offset" : 6,
      "type" : "<ALPHANUM>",
      "position" : 2
    },
    {
      "token" : "太",
      "start_offset" : 6,
      "end_offset" : 7,
      "type" : "<IDEOGRAPHIC>",
      "position" : 3
    },
    {
      "token" : "棒",
      "start_offset" : 7,
      "end_offset" : 8,
      "type" : "<IDEOGRAPHIC>",
      "position" : 4
    },
    {
      "token" : "了",
      "start_offset" : 8,
      "end_offset" : 9,
      "type" : "<IDEOGRAPHIC>",
      "position" : 5
    }
  ]
}
  • IK analyzer
    • ik_smart: intelligent, coarser-grained segmentation
    • ik_max_word: finest-grained segmentation
POST /_analyze
{
  "analyzer": "ik_smart",
  "text": "学习java太棒了"
}
{
  "tokens" : [
    {
      "token" : "学习",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "java",
      "start_offset" : 2,
      "end_offset" : 6,
      "type" : "ENGLISH",
      "position" : 1
    },
    {
      "token" : "太棒了",
      "start_offset" : 6,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 2
    }
  ]
}
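
The same analysis can also be called from Java once a RestHighLevelClient is available (client setup is covered later in these notes); a minimal sketch using the 7.x high-level client's analyze API, assuming ES is reachable at 127.0.0.1:9200:

import org.apache.http.HttpHost;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.AnalyzeRequest;
import org.elasticsearch.client.indices.AnalyzeResponse;

public class AnalyzeDemo {
    public static void main(String[] args) throws Exception {
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(HttpHost.create("http://127.0.0.1:9200")))) {
            // equivalent of POST /_analyze with the ik_smart analyzer
            AnalyzeRequest request = AnalyzeRequest.withGlobalAnalyzer("ik_smart", "学习java太棒了");
            AnalyzeResponse response = client.indices().analyze(request, RequestOptions.DEFAULT);
            // prints: 学习, java, 太棒了
            response.getTokens().forEach(token -> System.out.println(token.getTerm()));
        }
    }
}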

Some terms are not in IK's default dictionary and need to be added manually.

es\plugins\analysis-ik\config\IKAnalyzer.cfg.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
	<comment>IK Analyzer 扩展配置</comment>
	<!--用户可以在这里配置自己的扩展字典 -->
	<entry key="ext_dict">ext.dic</entry>
	 <!--用户可以在这里配置自己的扩展停止词字典-->
	<entry key="ext_stopwords"></entry>
	<!--用户可以在这里配置远程扩展字典 -->
	<!-- <entry key="remote_ext_dict">words_location</entry> -->
	<!--用户可以在这里配置远程扩展停止词字典-->
	<!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>

Then create the ext.dic file next to this config and add your terms there, one term per line.

A pitfall

Creating a config directory directly under plugins/analysis-ik does not take effect. Instead:

First copy the existing config out of the container with docker cp es:/usr/share/elasticsearch/config/analysis-ik ./es/ik-config, then remap it as a volume in docker-compose (- ./es/ik-config:/usr/share/elasticsearch/config/analysis-ik) and it takes effect.

  • Index
    • A classification of data, comparable to a database for users or a database for products.
  • Document
    • A single piece of data, e.g. one user record or one product record.

Mapping properties include:

  • type: the field's data type; common simple types:

    • String: text (analyzed text), keyword (exact values, e.g. brand, country, IP address)
    • Numeric: long, integer, short, byte, double, float
    • Boolean: boolean
    • Date: date
    • Object: object
  • index: whether to build an index for the field; defaults to true

  • analyzer: which analyzer to use

  • properties: sub-fields of this field

  • Create an index

    • info: analyzed text, using the IK analyzer
    • email: an exact value, so keyword
    • name: composed of lastName and firstName, so it uses properties (sub-fields)
PUT /heima
{
  "mappings": {
    "properties": {
      "info":{
        "type": "text",
        "analyzer": "ik_smart"
      },
      "email":{
        "type": "keyword",
        "index": "false"
      },
      "name":{
        "type": "object", 
        "properties": {
          "firstName": {
            "type": "keyword"
          },
          "lastName": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
  • Query the index
GET /heima
  • Delete the index
DELETE /heima
  • Modify
    • The inverted index structure is not complicated, but once the data structure changes (for example the analyzer changes), the whole inverted index has to be rebuilt, which would be a disaster. So once an index has been created, its existing mapping cannot be modified.
    • New fields can, however, still be added:
PUT /heima/_mapping
{
  "properties": {
    "age":{
      "type": "byte"
    }
  }
}
POST /index_name/_doc/document_id
{
    "field1": "value1",
    "field2": "value2",
    "field3": {
        "sub_field1": "value3",
        "sub_field2": "value4"
    }
}
  • demo
POST /heima/_doc/1
{
    "info": "程序员ES学习",
    "email": "[email protected]",
    "name": {
        "firstName": "云",
        "lastName": "赵"
    }
}
  • Query a document
GET /heima/_doc/1
  • Delete a document
DELETE /heima/_doc/1
  • Full update: deletes the original document and creates a new one
PUT /heima/_doc/1
The rest of the request body is omitted here; it is the same as the create example above.
  • Partial (incremental) update
    • POST /index_name/_update/document_id: the key point is that the middle path segment becomes _update
POST /heima/_update/1
{
  "doc": {
    "email": "[email protected]"
  }
}
  • Bulk operations
  • index: create/index a document
    • _index: the target index name
    • _id: the id of the document to operate on
    • { "field1" : "value1" }: the document content to index
  • delete: delete a document
    • _index: the target index name
    • _id: the id of the document to operate on
  • update: update a document
    • _index: the target index name
    • _id: the id of the document to operate on
    • { "doc" : {"field2" : "value2"} }: the fields to update
POST _bulk
{ "index" : { "_index" : "index_name", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "index_name", "_id" : "2" } }
{ "create" : { "_index" : "index_name", "_id" : "3" } }
{ "field1" : "value1" }
{ "update" : { "_index" : "index_name", "_id" : "1" } }
{ "doc" : { "field2" : "value2" } }

1) Add the Elasticsearch RestHighLevelClient dependency to the item-service module:

<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
</dependency>

2) Spring Boot's dependency management pins a newer Elasticsearch client version (7.17.10 here), so we override it to match the server version:

  <properties>
      <maven.compiler.source>11</maven.compiler.source>
      <maven.compiler.target>11</maven.compiler.target>
      <elasticsearch.version>7.12.1</elasticsearch.version>
  </properties>

3) Initialize the RestHighLevelClient; the initialization code is as follows:

RestHighLevelClient client = new RestHighLevelClient(RestClient.builder(
        HttpHost.create("http://192.168.150.101:9200")
));
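
In a Spring Boot project it is also common to register the client as a bean rather than building it by hand in every class; a minimal sketch, with the ES address hard-coded here for brevity (in practice it would come from configuration):

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ElasticsearchConfig {

    // RestHighLevelClient implements Closeable, so Spring's inferred destroy
    // method closes it automatically when the context shuts down
    @Bean
    public RestHighLevelClient restHighLevelClient() {
        return new RestHighLevelClient(RestClient.builder(
                HttpHost.create("http://192.168.150.101:9200")));
    }
}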

Based on the database table structure, the mapping properties for these fields are as follows:

Field        | Type    | Notes                         | Searchable | Analyzed | Analyzer
id           | long    | long integer                  | yes        | no       | ——
name         | text    | string, analyzed for search   | yes        | yes      | IK
price        | integer | price in cents, so an integer | yes        | no       | ——
stock        | integer | stock quantity, an integer    | yes        | no       | ——
image        | keyword | string, not analyzed          | no         | no       | ——
category     | keyword | string, not analyzed          | yes        | no       | ——
brand        | keyword | string, not analyzed          | yes        | no       | ——
sold         | integer | sales count, an integer       | yes        | no       | ——
commentCount | integer | review count, an integer      | no         | no       | ——
isAD         | boolean | boolean                       | yes        | no       | ——
updateTime   | date    | last update time              | yes        | no       | ——

So the final mapping for the index looks like this:

PUT /items
{
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "name":{
        "type": "text",
        "analyzer": "ik_max_word"
      },
      "price":{
        "type": "integer"
      },
      "stock":{
        "type": "integer"
      },
      "image":{
        "type": "keyword",
        "index": false
      },
      "category":{
        "type": "keyword"
      },
      "brand":{
        "type": "keyword"
      },
      "sold":{
        "type": "integer"
      },
      "commentCount":{
        "type": "integer",
        "index": false
      },
      "isAD":{
        "type": "boolean"
      },
      "updateTime":{
        "type": "date"
      }
    }
  }
}

Code

package es;  
  
import org.apache.http.HttpHost;  
import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;  
import org.elasticsearch.client.RequestOptions;  
import org.elasticsearch.client.RestClient;  
import org.elasticsearch.client.RestHighLevelClient;  
import org.elasticsearch.client.indices.CreateIndexRequest;  
import org.elasticsearch.client.indices.GetIndexRequest;  
import org.elasticsearch.common.xcontent.XContentType;  
import org.junit.jupiter.api.AfterEach;  
import org.junit.jupiter.api.BeforeEach;  
import org.junit.jupiter.api.Test;  
  
public class ElasticTest {  
  
    private RestHighLevelClient client;  
  
    @Test  
    public void test() throws Exception {  
        // TODO: write test cases  
        System.out.println("Hello World!");  
    }  
  
    @Test  
    void testCreateIndex() throws Exception {  
        // prepare the request object
        CreateIndexRequest request = new CreateIndexRequest("items");
        // set the mapping JSON as the request body
        request.source(MAPPING_TEMPLATE, XContentType.JSON);
        // send the request
        client.indices().create(request, RequestOptions.DEFAULT);  
    }  
  
    @Test  
    void testGetIndex() throws Exception {  
        GetIndexRequest request = new GetIndexRequest("items");  
        boolean exists = client.indices().exists(request, RequestOptions.DEFAULT);  
        System.out.println(exists);  
    }  
  
    @Test  
    void testDeleteIndex() throws Exception {  
        DeleteIndexRequest request = new DeleteIndexRequest("items");  
        client.indices().delete(request, RequestOptions.DEFAULT);  
    }  
  
    @BeforeEach  
    public void setUp() {  
        client = new RestHighLevelClient(RestClient.builder(HttpHost.create("http://127.0.0.1:9200")));  
    }  
  
    @AfterEach  
    public void tearDown() throws Exception {  
        if (client != null) {  
            client.close();  
        }  
    }  
  
    private static final String MAPPING_TEMPLATE = "{\n" +  
            "  \"mappings\": {\n" +  
            "    \"properties\": {\n" +  
            "      \"id\": {\n" +  
            "        \"type\": \"keyword\"\n" +  
            "      },\n" +  
            "      \"name\":{\n" +  
            "        \"type\": \"text\",\n" +  
            "        \"analyzer\": \"ik_max_word\"\n" +  
            "      },\n" +  
            "      \"price\":{\n" +  
            "        \"type\": \"integer\"\n" +  
            "      },\n" +  
            "      \"stock\":{\n" +  
            "        \"type\": \"integer\"\n" +  
            "      },\n" +  
            "      \"image\":{\n" +  
            "        \"type\": \"keyword\",\n" +  
            "        \"index\": false\n" +  
            "      },\n" +  
            "      \"category\":{\n" +  
            "        \"type\": \"keyword\"\n" +  
            "      },\n" +  
            "      \"brand\":{\n" +  
            "        \"type\": \"keyword\"\n" +  
            "      },\n" +  
            "      \"sold\":{\n" +  
            "        \"type\": \"integer\"\n" +  
            "      },\n" +  
            "      \"commentCount\":{\n" +  
            "        \"type\": \"integer\",\n" +  
            "        \"index\": false\n" +  
            "      },\n" +  
            "      \"isAD\":{\n" +  
            "        \"type\": \"boolean\"\n" +  
            "      },\n" +  
            "      \"updateTime\":{\n" +  
            "        \"type\": \"date\"\n" +  
            "      }\n" +  
            "    }\n" +  
            "  }\n" +  
            "}";  
}
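
The earlier PUT /heima/_mapping request added an age field through the REST API; the Java client has a matching PutMappingRequest. A hedged sketch, written as another test method in the style of ElasticTest above and reusing its client field:

import org.elasticsearch.client.indices.PutMappingRequest;

@Test
void testAddField() throws Exception {
    // equivalent of PUT /heima/_mapping adding a new "age" field
    PutMappingRequest request = new PutMappingRequest("heima");
    request.source("{\"properties\": {\"age\": {\"type\": \"byte\"}}}", XContentType.JSON);
    client.indices().putMapping(request, RequestOptions.DEFAULT);
}
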
  • Bulk operations (a BulkRequest sketch follows the test class below)
    • IndexRequest, i.e. create/index
    • UpdateRequest, i.e. update
    • DeleteRequest, i.e. delete
package es;  
  
import cn.hutool.core.bean.BeanUtil;  
import cn.hutool.json.JSONUtil;  
import com.hmall.item.ItemApplication;  
import com.hmall.item.domain.po.Item;  
import com.hmall.item.domain.po.ItemDoc;  
import com.hmall.item.service.IItemService;  
import org.apache.http.HttpHost;  
import org.elasticsearch.action.index.IndexRequest;  
import org.elasticsearch.client.RequestOptions;  
import org.elasticsearch.client.RestClient;  
import org.elasticsearch.client.RestHighLevelClient;  
import org.elasticsearch.common.xcontent.XContentType;  
import org.junit.jupiter.api.AfterEach;  
import org.junit.jupiter.api.BeforeEach;  
import org.junit.jupiter.api.Test;  
import org.springframework.beans.factory.annotation.Autowired;  
import org.springframework.boot.test.context.SpringBootTest;  
  
@SpringBootTest(properties = "spring.profiles.active=local", classes = ItemApplication.class)  
public class ElasticDocmentTest {  
    private RestHighLevelClient client;  
  
    @Autowired  
    private IItemService itemService;  
  
    @Test  
    public void testIndexDocument() throws Exception {  
        // 1. query the product from the database by id
        Item item = itemService.getById(100002644680L);
        // 2. convert the PO to the document type
        ItemDoc itemDoc = BeanUtil.copyProperties(item, ItemDoc.class);
        // 3. serialize the document to JSON
        String doc = JSONUtil.toJsonStr(itemDoc);

        // 4. prepare the IndexRequest with the document id
        IndexRequest request = new IndexRequest("items").id(itemDoc.getId());
        // 5. set the JSON document as the request source
        request.source(doc, XContentType.JSON);
        // 6. send the request
        client.index(request, RequestOptions.DEFAULT);  
    }  
  
    @BeforeEach  
    public void setUp() {  
        client = new RestHighLevelClient(RestClient.builder(HttpHost.create("http://127.0.0.1:9200")));  
    }  
  
    @AfterEach  
    public void tearDown() throws Exception {  
        if (client != null) {  
            client.close();  
        }  
    }  
}
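
The bullets above mention IndexRequest, UpdateRequest and DeleteRequest, but the test only indexes a single document. A sketch of a bulk write in the same test class (BulkRequest is the Java counterpart of POST _bulk; the document ids and values here are made up for illustration):

import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.update.UpdateRequest;

@Test
void testBulk() throws Exception {
    BulkRequest bulk = new BulkRequest();
    // index (create or overwrite) one document
    bulk.add(new IndexRequest("items").id("1")
            .source("{\"name\": \"demo item\", \"price\": 100}", XContentType.JSON));
    // partial update of another document (equivalent to POST /items/_update/2)
    bulk.add(new UpdateRequest("items", "2").doc("price", 200));
    // delete a third document
    bulk.add(new DeleteRequest("items").id("3"));
    // all three operations go to ES in a single request
    client.bulk(bulk, RequestOptions.DEFAULT);
}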