Code Notes: SpringBoot Elasticsearch
1 Installation

```yaml
services:
  es:
    image: elasticsearch:7.12.1
    container_name: es
    environment:
      - ES_JAVA_OPTS=-Xms512m -Xmx512m
      - discovery.type=single-node
    volumes:
      - /docker/es/data:/usr/share/elasticsearch/data
      - /docker/es/plugins:/usr/share/elasticsearch/plugins
    ports:
      - "9200:9200"
      - "9300:9300"
    restart: unless-stopped
  kibana:
    image: kibana:7.12.1
    container_name: kibana
    environment:
      - ELASTICSEARCH_HOSTS=http://es:9200
    ports:
      - "5601:5601"
    depends_on:
      - es
    restart: unless-stopped
```

2 Concepts
A forward index is the traditional approach: records are looked up by id. An inverted index instead builds a table keyed by terms: each term (keyword) maps to the ids of the documents that contain it.
Elasticsearch is document-oriented. A document can be one product row or one order record from a database; documents are serialized to JSON before being stored in Elasticsearch:
| id (index) | title | price |
|---|---|---|
| 1 | 小米手机 | 3499 |
| 2 | 华为手机 | 4999 |
| 3 | 华为小米充电器 | 49 |
| 4 | 小米手环 | 49 |
| … | … | … |
Tokenizing each title and mapping every term to the ids of the documents that contain it gives the inverted index table:

| term (index) | document ids |
|---|---|
| 小米 | 1, 3, 4 |
| 手机 | 1, 2 |
| 华为 | 2, 3 |
| 充电器 | 3 |
| 手环 | 4 |
For example, a search for 华为手机 is tokenized into 华为 and 手机; the term table yields the matching document ids (1, 2 and 3), and those ids are then used to fetch and return the product records.
Forward index:
- Pros:
  - Indexes can be created on multiple fields
  - Searching and sorting by an indexed field is very fast
- Cons:
  - Searching by a non-indexed field, or by partial terms within an indexed field, requires a full table scan

Inverted index:
- Pros:
  - Term-based and fuzzy searches are very fast
- Cons:
  - Indexes are built on terms, not whole fields
  - Cannot sort by field
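The inverted lookup above can be sketched in a few lines of plain Java (a toy model for intuition only; Elasticsearch's actual data structures are far more sophisticated):

```java
import java.util.*;

// A toy inverted index: each term maps to the sorted set of ids of the
// documents containing it. A sketch of the concept only, not how
// Elasticsearch implements its index internally.
public class InvertedIndexDemo {

    static final Map<String, Set<Integer>> INDEX = new HashMap<>();

    // Index a document: map every term of its (pre-tokenized) title to its id.
    static void add(int id, List<String> terms) {
        for (String term : terms) {
            INDEX.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
    }

    // Search: union the document-id sets of every query term.
    static Set<Integer> search(List<String> terms) {
        Set<Integer> hits = new TreeSet<>();
        for (String term : terms) {
            hits.addAll(INDEX.getOrDefault(term, Collections.emptySet()));
        }
        return hits;
    }

    static {
        // The product table from above, pre-tokenized for simplicity
        add(1, Arrays.asList("小米", "手机"));
        add(2, Arrays.asList("华为", "手机"));
        add(3, Arrays.asList("华为", "小米", "充电器"));
        add(4, Arrays.asList("小米", "手环"));
    }

    public static void main(String[] args) {
        // "华为手机" tokenizes to 华为 + 手机, matching documents 1, 2 and 3
        System.out.println(search(Arrays.asList("华为", "手机"))); // [1, 2, 3]
    }
}
```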
3 Getting Started
3.1 Tokenization

Install the IK analyzer plugin into the container:

```shell
docker exec -it es ./bin/elasticsearch-plugin install https://release.infinilabs.com/analysis-ik/stable/elasticsearch-analysis-ik-7.12.1.zip
```

- By default, the standard analyzer can only emit one token per Chinese character, so it cannot tokenize Chinese correctly:
```json
POST /_analyze
{
  "analyzer": "standard",
  "text": "学习java太棒了"
}
```

Response:

```json
{
  "tokens" : [
    { "token" : "学",    "start_offset" : 0, "end_offset" : 1, "type" : "<IDEOGRAPHIC>", "position" : 0 },
    { "token" : "习",    "start_offset" : 1, "end_offset" : 2, "type" : "<IDEOGRAPHIC>", "position" : 1 },
    { "token" : "java", "start_offset" : 2, "end_offset" : 6, "type" : "<ALPHANUM>",    "position" : 2 },
    { "token" : "太",    "start_offset" : 6, "end_offset" : 7, "type" : "<IDEOGRAPHIC>", "position" : 3 },
    { "token" : "棒",    "start_offset" : 7, "end_offset" : 8, "type" : "<IDEOGRAPHIC>", "position" : 4 },
    { "token" : "了",    "start_offset" : 8, "end_offset" : 9, "type" : "<IDEOGRAPHIC>", "position" : 5 }
  ]
}
```

- The IK analyzer offers two modes:
  - ik_smart: smart, coarser-grained semantic segmentation
  - ik_max_word: finest-grained segmentation
```json
POST /_analyze
{
  "analyzer": "ik_smart",
  "text": "学习java太棒了"
}
```

Response:

```json
{
  "tokens" : [
    { "token" : "学习",   "start_offset" : 0, "end_offset" : 2, "type" : "CN_WORD", "position" : 0 },
    { "token" : "java",  "start_offset" : 2, "end_offset" : 6, "type" : "ENGLISH", "position" : 1 },
    { "token" : "太棒了", "start_offset" : 6, "end_offset" : 9, "type" : "CN_WORD", "position" : 2 }
  ]
}
```

3.2 Dictionaries
Some words are missing from the default dictionary and must be added manually via the IK config file:

`es\plugins\analysis-ik\config\IKAnalyzer.cfg.xml`
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- Configure your own extension dictionary here -->
    <entry key="ext_dict">ext.dic</entry>
    <!-- Configure your own extension stopword dictionary here -->
    <entry key="ext_stopwords"></entry>
    <!-- Configure a remote extension dictionary here -->
    <!-- <entry key="remote_ext_dict">words_location</entry> -->
    <!-- Configure a remote extension stopword dictionary here -->
    <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>
```

Create the file ext.dic next to this config and add your words there.
A gotcha: creating a config directory directly inside plugins/analysis-ik does not take effect. Instead, first copy the config out of the container with `docker cp es:/usr/share/elasticsearch/config/analysis-ik ./es/ik-config`, then remount it in docker-compose with `- ./es/ik-config:/usr/share/elasticsearch/config/analysis-ik`.
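For illustration, ext.dic is a plain text file with one word per line (the words below are made-up examples; add whatever terms your data needs):

```
奥力给
白嫖
```

After restarting the container (`docker restart es`), running POST /_analyze with ik_smart on text containing these words should keep each of them as a single token.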
4 Operations
- Index: a logical grouping of documents of the same type, analogous to a database for users or a database for products.
- Document: one piece of data, e.g. one user record or one product record.
4.1 Index Operations
Mapping properties include:
- type: the field data type. Common simple types:
  - String: text (tokenized text) or keyword (an exact value, e.g. brand, country, IP address)
  - Numeric: long, integer, short, byte, double, float
  - Boolean: boolean
  - Date: date
  - Object: object
- index: whether to build an index for the field, default true
- analyzer: which analyzer to use
- properties: the sub-fields of the field

Create an index:
- info: tokenized with IK
- email: an exact value, so keyword
- name: composed of lastName and firstName, so it uses properties
```json
PUT /heima
{
  "mappings": {
    "properties": {
      "info": {
        "type": "text",
        "analyzer": "ik_smart"
      },
      "email": {
        "type": "keyword",
        "index": false
      },
      "name": {
        "type": "object",
        "properties": {
          "firstName": {
            "type": "keyword"
          },
          "lastName": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
```

- Query:

```json
GET /heima
```

- Delete:

```json
DELETE /heima
```

- Modify:
- The inverted index structure itself is not complex, but any change to it (e.g. switching analyzers) would force the whole inverted index to be rebuilt, which would be a disaster. So once an index is created, its existing mapping cannot be modified.
- New fields can, however, be added:

```json
PUT /heima/_mapping
{
  "properties": {
    "age": {
      "type": "byte"
    }
  }
}
```

4.2 Document Operations
```json
POST /<index-name>/_doc/<doc-id>
{
  "field1": "value1",
  "field2": "value2",
  "field3": {
    "subfield1": "value3",
    "subfield2": "value4"
  }
}
```

- Demo:
```json
POST /heima/_doc/1
{
  "info": "程序员ES学习",
  "email": "[email protected]",
  "name": {
    "firstName": "云",
    "lastName": "赵"
  }
}
```

- Query:

```json
GET /heima/_doc/1
```

- Delete:

```json
DELETE /heima/_doc/1
```

- Full update (deletes the original document and creates a new one):

```json
PUT /heima/_doc/1
```

The request body is omitted here; it is the same as in the create example above.

- Partial update: the path becomes POST /<index-name>/_update/<doc-id>; the key difference is the _update segment:
```json
POST /heima/_update/1
{
  "doc": {
    "email": "[email protected]"
  }
}
```

- Bulk operations:
  - index: an index (create) action. _index names the target index and _id the document id; the following line { "field1" : "value1" } is the document to index.
  - delete: a delete action. _index names the target index and _id the document id.
  - update: an update action. _index names the target index and _id the document id; the following line { "doc" : { "field2" : "value2" } } holds the fields to update.
```json
POST /_bulk
{ "index" : { "_index" : "index-name", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "index-name", "_id" : "2" } }
{ "create" : { "_index" : "index-name", "_id" : "3" } }
{ "field1" : "value1" }
{ "update" : { "_index" : "index-name", "_id" : "1" } }
{ "doc" : { "field1" : "value1" } }
```

5 RestAPI
1) In the item-service module, add the Elasticsearch RestHighLevelClient dependency:
```xml
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
</dependency>
```

2) Spring Boot manages a default Elasticsearch version (7.17.10 here), so we need to override it to match the server:
```xml
<properties>
    <maven.compiler.source>11</maven.compiler.source>
    <maven.compiler.target>11</maven.compiler.target>
    <elasticsearch.version>7.12.1</elasticsearch.version>
</properties>
```

3) Initialize the RestHighLevelClient:
```java
RestHighLevelClient client = new RestHighLevelClient(RestClient.builder(
        HttpHost.create("http://192.168.150.101:9200")
));
```

5.1 Indexes
Based on the database table structure, the mapping for each field is:

| Field | Type | Notes | Analyzer |
|---|---|---|---|
| id | keyword | document id, treated as an exact string | —— |
| name | text | string, tokenized and searchable | IK |
| price | integer | price in cents, so an integer | —— |
| stock | integer | stock count, an integer | —— |
| image | keyword | exact string, not tokenized, not searched | —— |
| category | keyword | exact string, not tokenized | —— |
| brand | keyword | exact string, not tokenized | —— |
| sold | integer | sales count, an integer | —— |
| commentCount | integer | comment count, not searched | —— |
| isAD | boolean | boolean flag | —— |
| updateTime | date | update timestamp | —— |
Therefore the final index mapping is:

```json
PUT /items
{
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "name": {
        "type": "text",
        "analyzer": "ik_max_word"
      },
      "price": {
        "type": "integer"
      },
      "stock": {
        "type": "integer"
      },
      "image": {
        "type": "keyword",
        "index": false
      },
      "category": {
        "type": "keyword"
      },
      "brand": {
        "type": "keyword"
      },
      "sold": {
        "type": "integer"
      },
      "commentCount": {
        "type": "integer",
        "index": false
      },
      "isAD": {
        "type": "boolean"
      },
      "updateTime": {
        "type": "date"
      }
    }
  }
}
```

Code:
```java
package es;

import org.apache.http.HttpHost;
import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.GetIndexRequest;
import org.elasticsearch.common.xcontent.XContentType;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;

public class ElasticTest {

    private RestHighLevelClient client;

    @Test
    void testCreateIndex() throws Exception {
        // Prepare the request object
        CreateIndexRequest request = new CreateIndexRequest("items");
        // Attach the mapping as the request body
        request.source(MAPPING_TEMPLATE, XContentType.JSON);
        // Send the request
        client.indices().create(request, RequestOptions.DEFAULT);
    }

    @Test
    void testGetIndex() throws Exception {
        GetIndexRequest request = new GetIndexRequest("items");
        boolean exists = client.indices().exists(request, RequestOptions.DEFAULT);
        System.out.println(exists);
    }

    @Test
    void testDeleteIndex() throws Exception {
        DeleteIndexRequest request = new DeleteIndexRequest("items");
        client.indices().delete(request, RequestOptions.DEFAULT);
    }

    @BeforeEach
    public void setUp() {
        client = new RestHighLevelClient(RestClient.builder(HttpHost.create("http://127.0.0.1:9200")));
    }

    @AfterEach
    public void tearDown() throws Exception {
        if (client != null) {
            client.close();
        }
    }

    private static final String MAPPING_TEMPLATE = "{\n" +
            "  \"mappings\": {\n" +
            "    \"properties\": {\n" +
            "      \"id\": {\n" +
            "        \"type\": \"keyword\"\n" +
            "      },\n" +
            "      \"name\":{\n" +
            "        \"type\": \"text\",\n" +
            "        \"analyzer\": \"ik_max_word\"\n" +
            "      },\n" +
            "      \"price\":{\n" +
            "        \"type\": \"integer\"\n" +
            "      },\n" +
            "      \"stock\":{\n" +
            "        \"type\": \"integer\"\n" +
            "      },\n" +
            "      \"image\":{\n" +
            "        \"type\": \"keyword\",\n" +
            "        \"index\": false\n" +
            "      },\n" +
            "      \"category\":{\n" +
            "        \"type\": \"keyword\"\n" +
            "      },\n" +
            "      \"brand\":{\n" +
            "        \"type\": \"keyword\"\n" +
            "      },\n" +
            "      \"sold\":{\n" +
            "        \"type\": \"integer\"\n" +
            "      },\n" +
            "      \"commentCount\":{\n" +
            "        \"type\": \"integer\",\n" +
            "        \"index\": false\n" +
            "      },\n" +
            "      \"isAD\":{\n" +
            "        \"type\": \"boolean\"\n" +
            "      },\n" +
            "      \"updateTime\":{\n" +
            "        \"type\": \"date\"\n" +
            "      }\n" +
            "    }\n" +
            "  }\n" +
            "}";
}
```

5.2 Documents
- Bulk operations combine multiple requests into one call:
  - IndexRequest: create/index a document
  - UpdateRequest: update a document
  - DeleteRequest: delete a document
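The three request types can be combined into one BulkRequest, mirroring the _bulk endpoint shown earlier. A minimal sketch, written as an extra method for a test class with an initialized `client`; the ids and field values are made up for illustration:

```java
// Additional imports needed by this method:
// import org.elasticsearch.action.bulk.BulkRequest;
// import org.elasticsearch.action.bulk.BulkResponse;
// import org.elasticsearch.action.delete.DeleteRequest;
// import org.elasticsearch.action.update.UpdateRequest;

@Test
void testBulk() throws Exception {
    BulkRequest bulk = new BulkRequest();
    // index (create) one document; the JSON body is a made-up example
    bulk.add(new IndexRequest("items").id("1")
            .source("{\"name\": \"测试商品\", \"price\": 3499}", XContentType.JSON));
    // partially update a second document
    bulk.add(new UpdateRequest("items", "2").doc("price", 999));
    // delete a third document
    bulk.add(new DeleteRequest("items", "3"));
    // all three actions go to Elasticsearch in a single round trip
    BulkResponse response = bulkClientCall(bulk);
    System.out.println(response.hasFailures());
}

// Thin wrapper so the round trip is visible in one place
private BulkResponse bulkClientCall(BulkRequest bulk) throws Exception {
    return client.bulk(bulk, RequestOptions.DEFAULT);
}
```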
```java
package es;

import cn.hutool.core.bean.BeanUtil;
import cn.hutool.json.JSONUtil;
import com.hmall.item.ItemApplication;
import com.hmall.item.domain.po.Item;
import com.hmall.item.domain.po.ItemDoc;
import com.hmall.item.service.IItemService;
import org.apache.http.HttpHost;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

@SpringBootTest(properties = "spring.profiles.active=local", classes = ItemApplication.class)
public class ElasticDocmentTest {

    private RestHighLevelClient client;

    @Autowired
    private IItemService itemService;

    @Test
    public void testIndexDocument() throws Exception {
        // 1. Query the product by id from the database
        Item item = itemService.getById(100002644680L);
        // 2. Convert to the document type
        ItemDoc itemDoc = BeanUtil.copyProperties(item, ItemDoc.class);
        // 3. Serialize the document to JSON
        String doc = JSONUtil.toJsonStr(itemDoc);
        // 4. Prepare the request object
        IndexRequest request = new IndexRequest("items").id(itemDoc.getId());
        // 5. Attach the JSON document as the request body
        request.source(doc, XContentType.JSON);
        // 6. Send the request
        client.index(request, RequestOptions.DEFAULT);
    }

    @BeforeEach
    public void setUp() {
        client = new RestHighLevelClient(RestClient.builder(HttpHost.create("http://127.0.0.1:9200")));
    }

    @AfterEach
    public void tearDown() throws Exception {
        if (client != null) {
            client.close();
        }
    }
}
```
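Querying and deleting a single document follow the same request-object pattern. A sketch using the same client and the document id indexed above (extra imports shown as comments):

```java
// Additional imports needed by these methods:
// import org.elasticsearch.action.delete.DeleteRequest;
// import org.elasticsearch.action.get.GetRequest;
// import org.elasticsearch.action.get.GetResponse;

@Test
void testGetDocument() throws Exception {
    // Equivalent to GET /items/_doc/100002644680
    GetRequest request = new GetRequest("items", "100002644680");
    GetResponse response = client.get(request, RequestOptions.DEFAULT);
    // _source comes back as the original JSON string
    System.out.println(response.getSourceAsString());
}

@Test
void testDeleteDocument() throws Exception {
    // Equivalent to DELETE /items/_doc/100002644680
    client.delete(new DeleteRequest("items", "100002644680"), RequestOptions.DEFAULT);
}
```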