Elasticsearch Notes
Brief notes taken while following the introductory Elasticsearch course on Udemy.
Basics, download, configuration (1-3)
```shell
brew cask install java
# download elasticsearch and kibana
# config/kibana      - configuration file
bin/elasticsearch    # listens on port 9200
bin/kibana           # starts on port 5601
```

Elasticsearch is Document Oriented.
Data storage engine, distributed engine cluster, shards (4-7)
```
PUT /{index}/{type}/{id}
{
  "field1": "value1",
  "field2": "value2"
  ...
}
```
Inserting a document with an existing id generates a new version number.
Updates work by reading the existing document, merging the changes in, and putting the whole thing back. Only document-level (row-level) writes are supported, not field-level ones.
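The read-merge-put behaviour can be sketched in plain Python (a toy in-memory store, not the Elasticsearch API; the `put`/`update` helpers and version bookkeeping are illustrative assumptions):

```python
# Toy in-memory store illustrating the notes above (not the Elasticsearch
# API): re-putting the same id bumps the version, and an "update" is really
# read + merge + put of the whole document, never a single field in place.

store = {}  # doc id -> {"_version": ..., "_source": ...}

def put(doc_id, source):
    """Index a complete document; re-putting an id increments _version."""
    version = store.get(doc_id, {}).get("_version", 0) + 1
    store[doc_id] = {"_version": version, "_source": source}
    return version

def update(doc_id, partial):
    """Read the current document, merge the new fields, put it back whole."""
    merged = {**store[doc_id]["_source"], **partial}
    return put(doc_id, merged)

put("123", {"make": "Honda", "color": "blue"})   # _version becomes 1
update("123", {"color": "red"})                  # _version becomes 2
```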
```
DELETE /vehicles/car/123
{
  "_index": "vehicles",
  "_type": "car",
  "_id": "123",
  "_version": 13,
  "result": "deleted",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 12,
  "_primary_term": 1
}
```
Even a delete does not physically remove the data; the document is only marked as deleted, so disk space is not freed immediately.

mappings and settings
mappings are similar to a schema in a database.
```
"settings": {
  "index": {
    "creation_date": "1512370059209",
    "number_of_shards": "5",
    "number_of_replicas": "1",
    "uuid": "Obppoj9yT4eQg5oq6Zv5IA",
    "version": {
      "created": "6000099"
    },
    "provided_name": "vehicles"
  }
}
```

```
GET business/_search
```
```
GET business/building/_search
{
  "query": {
    "term": { "address": "pen" }
  }
}
```
After a PUT, the document is analyzed (tokenized) and written to an in-memory buffer. Once enough documents accumulate, the buffer is persisted to disk as a segment. Segments are immutable.
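The buffer-then-segment flow can be sketched like this (purely illustrative, not ES internals; the buffer limit and document shape are made up):

```python
# Sketch of the flow above: indexed documents collect in an in-memory
# buffer; when the buffer is full it is flushed to disk as a segment,
# and a flushed segment is immutable (modelled as a tuple here).

BUFFER_LIMIT = 3   # made-up threshold for the sketch
buffer, segments = [], []

def index_doc(doc):
    buffer.append(doc)
    if len(buffer) >= BUFFER_LIMIT:
        segments.append(tuple(buffer))  # immutable once flushed
        buffer.clear()

for i in range(7):
    index_doc({"id": i})
# 7 docs -> two full segments flushed, one doc still waiting in the buffer
```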
tokenization -> filter
1. Remove Stop Words
2. Lowercasing
3. Stemming
4. Synonyms

Inverted index (8-9): each token maps to the documents it exists in.
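A toy version of this analysis chain and the resulting inverted index (the stop-word list and the one-character "stemmer" are crude stand-ins for real analyzers, and synonyms are skipped):

```python
# Toy analyzer + inverted index for the steps listed above:
# tokenization, stop-word removal, lowercasing, naive stemming.
# The inverted index maps each token to the set of doc ids containing it.

STOP_WORDS = {"the", "a", "over"}  # invented tiny stop-word list

def analyze(text):
    tokens = text.lower().replace("-", " ").split()        # tokenize + lowercase
    tokens = [t for t in tokens if t not in STOP_WORDS]    # remove stop words
    return [t[:-1] if t.endswith("s") else t for t in tokens]  # naive stemming

inverted_index = {}

def index_doc(doc_id, text):
    for token in analyze(text):
        inverted_index.setdefault(token, set()).add(doc_id)

index_doc(1, "The quick brown foxes")
index_doc(2, "A lazy brown dog")
```

Searching a term is then just a dictionary lookup, which is why term queries match exact analyzed tokens.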
Data Types for Document Fields
String Fields: text,keyword
Numeric Fields: long,integer,short,byte,double,float
Date Fields: date
True/False Fields: boolean
Binary Fields: binary

- Note: this no longer matches the tutorial. The tutorial uses version 5, and version 6 made breaking changes: of the old index/type/document concepts, type is being phased out. In 6, a single index can no longer hold multiple types, so it also cannot have multiple mappings.
Automatic mapping generation on write can be turned off (dynamic: false / strict).
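A sketch of what the dynamic setting means, in plain Python rather than the real mapping API (the function and error string are invented for illustration):

```python
# Toy model of dynamic mapping behaviour: False silently skips unmapped
# fields, "strict" rejects the document, anything else grows the mapping.

def index_doc(mapping, dynamic, doc):
    unknown = [f for f in doc if f not in mapping]
    if dynamic == "strict" and unknown:
        raise ValueError("strict_dynamic_mapping_exception: %s" % unknown)
    if dynamic is False:
        # unmapped fields are not indexed (dropped entirely in this toy)
        return {f: v for f, v in doc.items() if f in mapping}
    mapping.update({f: "text" for f in unknown})  # dynamic mapping growth
    return doc

mapping = {"name": "text"}
stored = index_doc(mapping, False, {"name": "es101", "room": "c8"})
try:
    index_doc(mapping, "strict", {"room": "c8"})
    rejected = False
except ValueError:
    rejected = True
```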
```
POST _analyze
{
  "analyzer": "simple",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
```
search DSL (10-14)
Domain Specific Language
Query Context & Filter Context
Query
```
match_all
"match": { "name": "computer" }
```

This query syntax is genuinely painful to write.

```
GET /courses/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "room": "c8" } },
        { "match": { "name": "computer" } }
      ]
    }
  }
}
```
must / must_not / should / minimum_should_match
multi_match / match_phrase / match_phrase_prefix
range
Filter
- Cannot be used on its own
- Does not compute relevance scores
- Elasticsearch caches filter results
- You can run a regular scored query first, then use a filter to narrow the results
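The last bullet (score with a query, then narrow with a filter) can be mimicked in a few lines of Python (toy documents and a crude scoring function, nothing from the real DSL):

```python
# Toy query-then-filter: the query part scores documents by matching
# terms; the filter part is a plain yes/no test that never changes scores.

docs = [
    {"id": 1, "name": "computer science", "price": 80},
    {"id": 2, "name": "computer networks", "price": 200},
    {"id": 3, "name": "history of art", "price": 90},
]

def score(doc, terms):
    """Crude relevance score: how many query terms appear in the name."""
    return sum(t in doc["name"].split() for t in terms)

def search(terms, price_lte):
    hits = [(score(d, terms), d) for d in docs]
    hits = [(s, d) for s, d in hits if s > 0]                    # query: scored
    hits = [(s, d) for s, d in hits if d["price"] <= price_lte]  # filter: no scoring
    return sorted(hits, key=lambda h: -h[0])

results = search(["computer"], price_lte=100)
```

Because a filter is a pure predicate, its results are easy to cache, which is exactly why Elasticsearch caches filter context.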
Aggregation
- endpoint->_search/_count
- bulk indexing
- terms: field works like GROUP BY; aggs performs the aggregation with functions such as avg, max, min (what if you need to sort by the aggregated result?)
- The query runs first; aggregation is then applied to its results
- stats (min, max, avg, sum)
- the concepts of bucket and metric; range
- GROUP BY results can be split again using range (from, to) buckets; agg aggregations can then be nested inside each bucket
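The range-bucket-plus-nested-metric idea can be mimicked in plain Python (the sample prices and ranges are invented; this mirrors a "range" aggregation wrapping an "avg" sub-aggregation):

```python
# Toy bucket + metric aggregation: range (from, to) buckets over the
# documents, with a nested avg metric computed inside each bucket.

docs = [{"price": 5}, {"price": 15}, {"price": 25}, {"price": 35}]
ranges = [(0, 20), (20, 40)]  # each bucket covers [from, to)

buckets = {}
for lo, hi in ranges:
    in_bucket = [d for d in docs if lo <= d["price"] < hi]  # bucket step
    buckets["%d-%d" % (lo, hi)] = {
        "doc_count": len(in_bucket),
        "avg_price": sum(d["price"] for d in in_bucket) / len(in_bucket),
    }
```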
logstash and kibana (15-17)
logstash
- input, filter, output (Ruby syntax)
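The three stages can be modelled as a tiny pipeline (a toy in Python rather than a real Logstash config; the event format is invented):

```python
# Toy model of a Logstash pipeline's three stages: an input produces raw
# events, filters transform them into structured events, an output ships
# them (here the "output" is just collecting into a list).

def input_stage():
    yield from ["INFO boot ok", "ERROR disk full"]   # pretend log source

def filter_stage(events):
    for line in events:
        level, _, message = line.partition(" ")       # parse the level field
        yield {"level": level, "message": message}

output = list(filter_stage(input_stage()))            # output stage
```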
Kibana
- Kibana requires a default index to be set first
- Visualize creates individual views; Dashboard combines them