
Code Notes: SpringBoot + Elasticsearch

Deploy Elasticsearch and Kibana with Docker Compose (the services section of docker-compose.yml):

services:
  es:
    image: elasticsearch:7.12.1
    container_name: es
    environment:
      - ES_JAVA_OPTS=-Xms512m -Xmx512m
      - discovery.type=single-node
    volumes:
      - /docker/es/data:/usr/share/elasticsearch/data
      - /docker/es/plugins:/usr/share/elasticsearch/plugins
    ports:
      - "9200:9200"
      - "9300:9300"
    restart: unless-stopped

  kibana:
    image: kibana:7.12.1
    container_name: kibana
    environment:
      - ELASTICSEARCH_HOSTS=http://es:9200
    ports:
      - "5601:5601"
    depends_on:
      - es
    restart: unless-stopped

A forward index is the traditional approach: records are indexed by their id.

An inverted index instead builds a table keyed by terms (keywords): the inverted index table. Elasticsearch is document-oriented: a document can be, say, a product row or an order record from a database. Documents are serialized to JSON before being stored in Elasticsearch:

Forward index (documents indexed by id):

id (index) | title          | price
1          | 小米手机        | 3499
2          | 华为手机        | 4999
3          | 华为小米充电器   | 49
4          | 小米手环        | 49

Inverted index (term → document ids):

term (index) | document ids
小米          | 1, 3, 4
手机          | 1, 2
华为          | 2, 3
充电器         | 3
手环          | 4

For example, a search for 华为手机 is tokenized into the terms 华为 and 手机; the document ids listed under each term are collected, and the matching product records are then fetched by id and returned.
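
To make the idea concrete, here is a minimal, self-contained Java sketch (plain Java, not Elasticsearch code) that builds an inverted index over the example documents above; the tokenize method is a toy stand-in for a real analyzer:

import java.util.*;

public class InvertedIndexDemo {
    public static void main(String[] args) {
        // forward view: document id -> title (the example documents above)
        Map<Integer, String> docs = Map.of(
                1, "小米手机",
                2, "华为手机",
                3, "华为小米充电器",
                4, "小米手环");

        // inverted view: term -> sorted set of document ids
        Map<String, Set<Integer>> inverted = new TreeMap<>();
        docs.forEach((id, title) ->
                tokenize(title).forEach(term ->
                        inverted.computeIfAbsent(term, t -> new TreeSet<>()).add(id)));

        // {充电器=[3], 华为=[2, 3], 小米=[1, 3, 4], 手机=[1, 2], 手环=[4]}
        System.out.println(inverted);

        // "searching 华为手机": tokenize the query, look up ids per term, then fetch docs by id
        for (String term : tokenize("华为手机")) {
            System.out.println(term + " -> " + inverted.getOrDefault(term, Set.of()));
        }
    }

    // toy tokenizer standing in for a real analyzer such as IK
    private static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : List.of("小米", "华为", "手机", "充电器", "手环")) {
            if (text.contains(t)) terms.add(t);
        }
        return terms;
    }
}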

Forward index

  • Pros:
    • Indexes can be created on multiple fields
    • Searching and sorting by an indexed field is very fast
  • Cons:
    • Searching by a non-indexed field, or by individual terms within an indexed field, falls back to a full table scan

Inverted index

  • Pros:
    • Searching by term, including partial/fuzzy term matches, is very fast
  • Cons:
    • Indexes are built on terms, not on fields
    • Cannot sort by field
Install the IK analyzer plugin inside the container:

docker exec -it es ./bin/elasticsearch-plugin install https://release.infinilabs.com/analysis-ik/stable/elasticsearch-analysis-ik-7.12.1.zip

  • By default, the standard analyzer can only emit one token per Chinese character, so it cannot segment Chinese text correctly:
POST /_analyze
{
  "analyzer": "standard",
  "text": "学习java太棒了"
}
{
  "tokens" : [
    {
      "token" : "学",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<IDEOGRAPHIC>",
      "position" : 0
    },
    {
      "token" : "习",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "<IDEOGRAPHIC>",
      "position" : 1
    },
    {
      "token" : "java",
      "start_offset" : 2,
      "end_offset" : 6,
      "type" : "<ALPHANUM>",
      "position" : 2
    },
    {
      "token" : "太",
      "start_offset" : 6,
      "end_offset" : 7,
      "type" : "<IDEOGRAPHIC>",
      "position" : 3
    },
    {
      "token" : "棒",
      "start_offset" : 7,
      "end_offset" : 8,
      "type" : "<IDEOGRAPHIC>",
      "position" : 4
    },
    {
      "token" : "了",
      "start_offset" : 8,
      "end_offset" : 9,
      "type" : "<IDEOGRAPHIC>",
      "position" : 5
    }
  ]
}
  • IK analyzer
    • ik_smart: intelligent, coarser-grained segmentation
    • ik_max_word: finest-grained segmentation
POST /_analyze
{
  "analyzer": "ik_smart",
  "text": "学习java太棒了"
}
{
  "tokens" : [
    {
      "token" : "学习",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "java",
      "start_offset" : 2,
      "end_offset" : 6,
      "type" : "ENGLISH",
      "position" : 1
    },
    {
      "token" : "太棒了",
      "start_offset" : 6,
      "end_offset" : 9,
      "type" : "CN_WORD",
      "position" : 2
    }
  ]
}
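
The same analysis can also be called from Java once a RestHighLevelClient is available (client setup is covered later in these notes); a minimal sketch using the 7.x high-level client's analyze API, assuming ES is reachable at 127.0.0.1:9200:

import org.apache.http.HttpHost;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.AnalyzeRequest;
import org.elasticsearch.client.indices.AnalyzeResponse;

public class AnalyzeDemo {
    public static void main(String[] args) throws Exception {
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(HttpHost.create("http://127.0.0.1:9200")))) {
            // equivalent of POST /_analyze with the ik_smart analyzer
            AnalyzeRequest request = AnalyzeRequest.withGlobalAnalyzer("ik_smart", "学习java太棒了");
            AnalyzeResponse response = client.indices().analyze(request, RequestOptions.DEFAULT);
            // prints: 学习, java, 太棒了
            response.getTokens().forEach(token -> System.out.println(token.getTerm()));
        }
    }
}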

Some terms are not in IK's default dictionary and need to be added manually.

es\plugins\analysis-ik\config\IKAnalyzer.cfg.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
	<comment>IK Analyzer 扩展配置</comment>
	<!--用户可以在这里配置自己的扩展字典 -->
	<entry key="ext_dict">ext.dic</entry>
	 <!--用户可以在这里配置自己的扩展停止词字典-->
	<entry key="ext_stopwords"></entry>
	<!--用户可以在这里配置远程扩展字典 -->
	<!-- <entry key="remote_ext_dict">words_location</entry> -->
	<!--用户可以在这里配置远程扩展停止词字典-->
	<!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>

Then create the ext.dic file next to this config and add your terms there, one term per line.

A pitfall

Creating a config directory directly under plugins/analysis-ik does not take effect. Instead:

First copy the existing config out of the container with docker cp es:/usr/share/elasticsearch/config/analysis-ik ./es/ik-config, then remap it as a volume in docker-compose (- ./es/ik-config:/usr/share/elasticsearch/config/analysis-ik) and it takes effect.

  • Index
    • A classification of data, comparable to a database for users or a database for products.
  • Document
    • A single piece of data, e.g. one user record or one product record.

Mapping properties include:

  • type: the field's data type; common simple types:

    • String: text (analyzed text), keyword (exact values, e.g. brand, country, IP address)
    • Numeric: long, integer, short, byte, double, float
    • Boolean: boolean
    • Date: date
    • Object: object
  • index: whether to build an index for the field; defaults to true

  • analyzer: which analyzer to use

  • properties: sub-fields of this field

  • Create an index

    • info: analyzed text, using the IK analyzer
    • email: an exact value, so keyword
    • name: composed of lastName and firstName, so it uses properties (sub-fields)
PUT /heima
{
  "mappings": {
    "properties": {
      "info":{
        "type": "text",
        "analyzer": "ik_smart"
      },
      "email":{
        "type": "keyword",
        "index": "false"
      },
      "name":{
        "type": "object", 
        "properties": {
          "firstName": {
            "type": "keyword"
          },
          "lastName": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
  • Query the index
GET /heima
  • Delete the index
DELETE /heima
  • Modify
    • The inverted index structure is not complicated, but once the data structure changes (for example the analyzer changes), the whole inverted index has to be rebuilt, which would be a disaster. So once an index has been created, its existing mapping cannot be modified.
    • New fields can, however, still be added:
PUT /heima/_mapping
{
  "properties": {
    "age":{
      "type": "byte"
    }
  }
}
POST /index_name/_doc/document_id
{
    "field1": "value1",
    "field2": "value2",
    "field3": {
        "sub_field1": "value3",
        "sub_field2": "value4"
    }
}
  • demo
POST /heima/_doc/1
{
    "info": "程序员ES学习",
    "email": "[email protected]",
    "name": {
        "firstName": "云",
        "lastName": "赵"
    }
}
  • Query a document
GET /heima/_doc/1
  • Delete a document
DELETE /heima/_doc/1
  • Full update: deletes the original document and creates a new one
PUT /heima/_doc/1
The rest of the request body is omitted here; it is the same as the create example above.
  • Partial (incremental) update
    • POST /index_name/_update/document_id: the key point is that the middle path segment becomes _update
POST /heima/_update/1
{
  "doc": {
    "email": "[email protected]"
  }
}
  • Bulk operations
  • index: create/index a document
    • _index: the target index name
    • _id: the id of the document to operate on
    • { "field1" : "value1" }: the document content to index
  • delete: delete a document
    • _index: the target index name
    • _id: the id of the document to operate on
  • update: update a document
    • _index: the target index name
    • _id: the id of the document to operate on
    • { "doc" : {"field2" : "value2"} }: the fields to update
POST _bulk
{ "index" : { "_index" : "index_name", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "index_name", "_id" : "2" } }
{ "create" : { "_index" : "index_name", "_id" : "3" } }
{ "field1" : "value1" }
{ "update" : { "_index" : "index_name", "_id" : "1" } }
{ "doc" : { "field2" : "value2" } }

1) Add the Elasticsearch RestHighLevelClient dependency to the item-service module:

<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
</dependency>

2) Spring Boot's dependency management pins a newer Elasticsearch client version (7.17.10 here), so we override it to match the server version:

  <properties>
      <maven.compiler.source>11</maven.compiler.source>
      <maven.compiler.target>11</maven.compiler.target>
      <elasticsearch.version>7.12.1</elasticsearch.version>
  </properties>

3) Initialize the RestHighLevelClient; the initialization code is as follows:

RestHighLevelClient client = new RestHighLevelClient(RestClient.builder(
        HttpHost.create("http://192.168.150.101:9200")
));
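
In a Spring Boot project it is also common to register the client as a bean rather than building it by hand in every class; a minimal sketch, with the ES address hard-coded here for brevity (in practice it would come from configuration):

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ElasticsearchConfig {

    // RestHighLevelClient implements Closeable, so Spring's inferred destroy
    // method closes it automatically when the context shuts down
    @Bean
    public RestHighLevelClient restHighLevelClient() {
        return new RestHighLevelClient(RestClient.builder(
                HttpHost.create("http://192.168.150.101:9200")));
    }
}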

Based on the database table structure, the mapping properties for these fields are as follows:

Field        | Type    | Notes                         | Searchable | Analyzed | Analyzer
id           | long    | long integer                  | yes        | no       | ——
name         | text    | string, analyzed for search   | yes        | yes      | IK
price        | integer | price in cents, so an integer | yes        | no       | ——
stock        | integer | stock quantity, an integer    | yes        | no       | ——
image        | keyword | string, not analyzed          | no         | no       | ——
category     | keyword | string, not analyzed          | yes        | no       | ——
brand        | keyword | string, not analyzed          | yes        | no       | ——
sold         | integer | sales count, an integer       | yes        | no       | ——
commentCount | integer | review count, an integer      | no         | no       | ——
isAD         | boolean | boolean                       | yes        | no       | ——
updateTime   | date    | last update time              | yes        | no       | ——

So the final mapping for the index looks like this:

PUT /items
{
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "name":{
        "type": "text",
        "analyzer": "ik_max_word"
      },
      "price":{
        "type": "integer"
      },
      "stock":{
        "type": "integer"
      },
      "image":{
        "type": "keyword",
        "index": false
      },
      "category":{
        "type": "keyword"
      },
      "brand":{
        "type": "keyword"
      },
      "sold":{
        "type": "integer"
      },
      "commentCount":{
        "type": "integer",
        "index": false
      },
      "isAD":{
        "type": "boolean"
      },
      "updateTime":{
        "type": "date"
      }
    }
  }
}

Code

package es;  
  
import org.apache.http.HttpHost;  
import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;  
import org.elasticsearch.client.RequestOptions;  
import org.elasticsearch.client.RestClient;  
import org.elasticsearch.client.RestHighLevelClient;  
import org.elasticsearch.client.indices.CreateIndexRequest;  
import org.elasticsearch.client.indices.GetIndexRequest;  
import org.elasticsearch.common.xcontent.XContentType;  
import org.junit.jupiter.api.AfterEach;  
import org.junit.jupiter.api.BeforeEach;  
import org.junit.jupiter.api.Test;  
  
public class ElasticTest {  
  
    private RestHighLevelClient client;  
  
    @Test  
    public void test() throws Exception {  
        // TODO: write test cases  
        System.out.println("Hello World!");  
    }  
  
    @Test  
    void testCreateIndex() throws Exception {  
        // prepare the request object
        CreateIndexRequest request = new CreateIndexRequest("items");
        // set the mapping JSON as the request body
        request.source(MAPPING_TEMPLATE, XContentType.JSON);
        // send the request
        client.indices().create(request, RequestOptions.DEFAULT);  
    }  
  
    @Test  
    void testGetIndex() throws Exception {  
        GetIndexRequest request = new GetIndexRequest("items");  
        boolean exists = client.indices().exists(request, RequestOptions.DEFAULT);  
        System.out.println(exists);  
    }  
  
    @Test  
    void testDeleteIndex() throws Exception {  
        DeleteIndexRequest request = new DeleteIndexRequest("items");  
        client.indices().delete(request, RequestOptions.DEFAULT);  
    }  
  
    @BeforeEach  
    public void setUp() {  
        client = new RestHighLevelClient(RestClient.builder(HttpHost.create("http://127.0.0.1:9200")));  
    }  
  
    @AfterEach  
    public void tearDown() throws Exception {  
        if (client != null) {  
            client.close();  
        }  
    }  
  
    private static final String MAPPING_TEMPLATE = "{\n" +  
            "  \"mappings\": {\n" +  
            "    \"properties\": {\n" +  
            "      \"id\": {\n" +  
            "        \"type\": \"keyword\"\n" +  
            "      },\n" +  
            "      \"name\":{\n" +  
            "        \"type\": \"text\",\n" +  
            "        \"analyzer\": \"ik_max_word\"\n" +  
            "      },\n" +  
            "      \"price\":{\n" +  
            "        \"type\": \"integer\"\n" +  
            "      },\n" +  
            "      \"stock\":{\n" +  
            "        \"type\": \"integer\"\n" +  
            "      },\n" +  
            "      \"image\":{\n" +  
            "        \"type\": \"keyword\",\n" +  
            "        \"index\": false\n" +  
            "      },\n" +  
            "      \"category\":{\n" +  
            "        \"type\": \"keyword\"\n" +  
            "      },\n" +  
            "      \"brand\":{\n" +  
            "        \"type\": \"keyword\"\n" +  
            "      },\n" +  
            "      \"sold\":{\n" +  
            "        \"type\": \"integer\"\n" +  
            "      },\n" +  
            "      \"commentCount\":{\n" +  
            "        \"type\": \"integer\",\n" +  
            "        \"index\": false\n" +  
            "      },\n" +  
            "      \"isAD\":{\n" +  
            "        \"type\": \"boolean\"\n" +  
            "      },\n" +  
            "      \"updateTime\":{\n" +  
            "        \"type\": \"date\"\n" +  
            "      }\n" +  
            "    }\n" +  
            "  }\n" +  
            "}";  
}
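
The earlier PUT /heima/_mapping request added an age field through the REST API; the Java client has a matching PutMappingRequest. A hedged sketch, written as another test method in the style of ElasticTest above and reusing its client field:

import org.elasticsearch.client.indices.PutMappingRequest;

@Test
void testAddField() throws Exception {
    // equivalent of PUT /heima/_mapping adding a new "age" field
    PutMappingRequest request = new PutMappingRequest("heima");
    request.source("{\"properties\": {\"age\": {\"type\": \"byte\"}}}", XContentType.JSON);
    client.indices().putMapping(request, RequestOptions.DEFAULT);
}
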
  • Bulk operations (a BulkRequest sketch follows the test class below)
    • IndexRequest, i.e. create/index
    • UpdateRequest, i.e. update
    • DeleteRequest, i.e. delete
package es;  
  
import cn.hutool.core.bean.BeanUtil;  
import cn.hutool.json.JSONUtil;  
import com.hmall.item.ItemApplication;  
import com.hmall.item.domain.po.Item;  
import com.hmall.item.domain.po.ItemDoc;  
import com.hmall.item.service.IItemService;  
import org.apache.http.HttpHost;  
import org.elasticsearch.action.index.IndexRequest;  
import org.elasticsearch.client.RequestOptions;  
import org.elasticsearch.client.RestClient;  
import org.elasticsearch.client.RestHighLevelClient;  
import org.elasticsearch.common.xcontent.XContentType;  
import org.junit.jupiter.api.AfterEach;  
import org.junit.jupiter.api.BeforeEach;  
import org.junit.jupiter.api.Test;  
import org.springframework.beans.factory.annotation.Autowired;  
import org.springframework.boot.test.context.SpringBootTest;  
  
@SpringBootTest(properties = "spring.profiles.active=local", classes = ItemApplication.class)  
public class ElasticDocmentTest {  
    private RestHighLevelClient client;  
  
    @Autowired  
    private IItemService itemService;  
  
    @Test  
    public void testIndexDocument() throws Exception {  
        // 1. query the product from the database by id
        Item item = itemService.getById(100002644680L);
        // 2. convert the PO to the document type
        ItemDoc itemDoc = BeanUtil.copyProperties(item, ItemDoc.class);
        // 3. serialize the document to JSON
        String doc = JSONUtil.toJsonStr(itemDoc);

        // 4. prepare the IndexRequest with the document id
        IndexRequest request = new IndexRequest("items").id(itemDoc.getId());
        // 5. set the JSON document as the request source
        request.source(doc, XContentType.JSON);
        // 6. send the request
        client.index(request, RequestOptions.DEFAULT);  
    }  
  
    @BeforeEach  
    public void setUp() {  
        client = new RestHighLevelClient(RestClient.builder(HttpHost.create("http://127.0.0.1:9200")));  
    }  
  
    @AfterEach  
    public void tearDown() throws Exception {  
        if (client != null) {  
            client.close();  
        }  
    }  
}
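
The bullets above mention IndexRequest, UpdateRequest and DeleteRequest, but the test only indexes a single document. A sketch of a bulk write in the same test class (BulkRequest is the Java counterpart of POST _bulk; the document ids and values here are made up for illustration):

import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.update.UpdateRequest;

@Test
void testBulk() throws Exception {
    BulkRequest bulk = new BulkRequest();
    // index (create or overwrite) one document
    bulk.add(new IndexRequest("items").id("1")
            .source("{\"name\": \"demo item\", \"price\": 100}", XContentType.JSON));
    // partial update of another document (equivalent to POST /items/_update/2)
    bulk.add(new UpdateRequest("items", "2").doc("price", 200));
    // delete a third document
    bulk.add(new DeleteRequest("items").id("3"));
    // all three operations go to ES in a single request
    client.bulk(bulk, RequestOptions.DEFAULT);
}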