Below are brief notes I took while working through an introductory Elasticsearch course on Udemy.

  1. Basics, download, configuration (1-3)

    brew cask install java
    Download elasticsearch and kibana
    config/kibana configuration file
    bin/elasticsearch (port 9200)
    bin/kibana to start Kibana (port 5601)

    Elasticsearch is Document Oriented

    • Insert Documents
    • Delete Documents
    • Retrieve Documents
    • Analyze Documents
    • Search Documents

      inverted index

      JSON objects (the table model of a classic database becomes a JSON document model)

      Elasticsearch | Relational DB
      — | —
      Field | Column
      Document | Row
      Type | Table
      Index | Database

  2. Data storage engine, distributed engine and clusters, shards (4-7)

    PUT /{index}/{type}/{id}
    {
    "field1": "value1",
    "field2": "value2"
    ...
    }
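    Inserting again with the same id produces a new version number each time.
    An update works by reading the document, merging in the changes, and PUTting it back. Updates work at the row (document) level only, not at the field level.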
    DELETE /vehicles/car/123
    {
    "_index": "vehicles",
    "_type": "car",
    "_id": "123",
    "_version": 13,
    "result": "deleted",
    "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
    },
    "_seq_no": 12,
    "_primary_term": 1
    }
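    A delete does not physically remove the data either; the document is only marked as deleted, so disk space is not freed immediately.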
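To make the versioning behavior above concrete, here is a toy in-memory model in Python. It is only a sketch of the idea (same id bumps the version, update is read-merge-put, delete only marks the entry), not how Elasticsearch is implemented. The `ToyIndex` class and the `make`/`color` fields are made up for illustration.

```python
class ToyIndex:
    """Toy in-memory model of versioning and soft deletes (illustrative only)."""

    def __init__(self):
        # doc id -> {"_version": int, "_source": dict, "deleted": bool}
        self._entries = {}

    def put(self, doc_id, source):
        # Writing to an existing id replaces the whole document and bumps the version.
        previous = self._entries.get(doc_id)
        version = previous["_version"] + 1 if previous else 1
        existed = previous is not None and not previous["deleted"]
        self._entries[doc_id] = {"_version": version, "_source": dict(source), "deleted": False}
        return {"_id": doc_id, "_version": version, "result": "updated" if existed else "created"}

    def get(self, doc_id):
        entry = self._entries.get(doc_id)
        if entry is None or entry["deleted"]:
            return None
        return dict(entry["_source"])

    def update(self, doc_id, partial):
        # Read the current document, merge the changes, then put the whole thing back.
        current = self.get(doc_id)
        if current is None:
            raise KeyError(doc_id)
        return self.put(doc_id, {**current, **partial})

    def delete(self, doc_id):
        # Only mark the entry as deleted; the stored entry stays until some later cleanup.
        entry = self._entries.get(doc_id)
        if entry is None or entry["deleted"]:
            return {"_id": doc_id, "result": "not_found"}
        entry["deleted"] = True
        entry["_version"] += 1
        return {"_id": doc_id, "_version": entry["_version"], "result": "deleted"}

    def stored_entries(self):
        return len(self._entries)


index = ToyIndex()
first = index.put("123", {"make": "bmw"})                    # hypothetical fields
second = index.put("123", {"make": "bmw", "color": "red"})
third = index.update("123", {"color": "blue"})
merged = index.get("123")
removed = index.delete("123")
```

After the delete, `index.get("123")` returns nothing, yet `stored_entries()` still counts the entry, which mirrors the "marked, not freed" behavior noted above.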

    mappings and settings. Mappings are similar to a schema in a database.

    "settings": {
    "index": {
    "creation_date": "1512370059209",
    "number_of_shards": "5",
    "number_of_replicas": "1",
    "uuid": "Obppoj9yT4eQg5oq6Zv5IA",
    "version": {
    "created": "6000099"
    },
    "provided_name": "vehicles"
    }
    }

    GET business/_search

    GET business/building/_search
    {
    "query":{
    "term":{"address":"pen"}
    }
    }
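    After a PUT, the data is analyzed, tokenized, and written to memory. Once enough has accumulated, it is persisted to disk as a segment, and segments are immutable.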
    tokenization -> filter
    1. Remove Stop Words
    2. Lowercasing
    3. Stemming
    4. Synonyms
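The tokenization -> filter steps above can be sketched in plain Python. This is a rough illustration, not Elasticsearch's actual analyzers: the stop-word list, the synonym table, and the suffix-stripping "stemmer" are all assumptions made up for the example.

```python
import re

STOP_WORDS = {"the", "a", "an", "of", "over"}   # assumed list, for illustration
SYNONYMS = {"quick": "fast"}                    # assumed mapping, for illustration


def tokenize(text):
    # Split into runs of letters and digits.
    return re.findall(r"[A-Za-z0-9]+", text)


def naive_stem(token):
    # Crude suffix stripping; real stemmers (e.g. Porter) are far more careful.
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    tokens = tokenize(text)
    # 1. Remove stop words (compared case-insensitively so "The" is caught).
    tokens = [t for t in tokens if t.lower() not in STOP_WORDS]
    # 2. Lowercasing.
    tokens = [t.lower() for t in tokens]
    # 3. Stemming.
    tokens = [naive_stem(t) for t in tokens]
    # 4. Synonyms.
    tokens = [SYNONYMS.get(t, t) for t in tokens]
    return tokens


result = analyze("The quick foxes jumped")
print(result)
```

The tokens that come out of the last step are what would end up in the inverted index.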

    Token | Exists in
    — | —
    token | 1
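The token table above is the inverted index: each token maps to the documents it appears in. A minimal sketch in Python, assuming whitespace tokenization plus lowercasing and three made-up documents:

```python
from collections import defaultdict


def build_inverted_index(docs):
    # token -> set of ids of the documents containing that token
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search(index, term):
    # A term lookup is a single dictionary access, with no scan over the documents.
    return sorted(index.get(term.lower(), set()))


docs = {1: "pen and paper", 2: "paper plane", 3: "Pen pal"}   # made-up documents
index = build_inverted_index(docs)
pen_hits = search(index, "pen")
paper_hits = search(index, "paper")
```

This is why a lookup such as the `term` query on `pen` earlier can be answered without reading every document.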

    elasticsearch_analyzer

  3. Inverted index (8-9)

    Data Types for Document Fields

    String Fields: text,keyword
    Numeric Fields: long,integer,short,byte,double,float
    Date Fields: date
    True/False Fields: boolean
    Binary Fields: binary

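    - Note: this is where my setup diverges from the course. The course uses version 5, and version 6 introduced breaking changes. The type in the old index/type/document model is being phased out: in 6, a single index can no longer hold more than one type, so it cannot have more than one mapping either.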
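For reference, here is roughly what an index body with one type and one mapping looks like under the 6.x layout, built as a Python dict. It is a sketch under assumptions: the type name `doc` and the fields `students_enrolled`, `start_date`, and `teacher` are made up (only `name` and `room` appear in these notes), and `undeclared_fields` is a hypothetical helper that only illustrates which fields `dynamic: strict` would refuse.

```python
import json


def build_index_body(type_name, properties, dynamic="strict"):
    # 6.x layout: mappings are keyed by the (single) type name.
    return {"mappings": {type_name: {"dynamic": dynamic, "properties": properties}}}


def undeclared_fields(body, type_name, document):
    # Hypothetical helper: top-level fields of a document that the mapping does not declare.
    declared = body["mappings"][type_name]["properties"]
    return sorted(field for field in document if field not in declared)


body = build_index_body(
    "doc",  # assumed type name
    {
        "name": {"type": "text"},
        "room": {"type": "keyword"},
        "students_enrolled": {"type": "integer"},  # hypothetical field
        "start_date": {"type": "date"},            # hypothetical field
    },
)
print(json.dumps(body, indent=2))

extra = undeclared_fields(body, "doc", {"name": "computer science", "teacher": "bill"})
```

With `dynamic` set to `strict`, indexing a document that carries an undeclared field such as `teacher` is rejected; with `false`, the extra field is accepted but not indexed.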

    • Turn off automatic mapping generation when documents are written (dynamic: false/strict)
    • Testing the various analyzers

      POST _analyze
      {
      "analyzer": "simple",
      "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
      }
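As a rough mental model of the `simple` analyzer used in the request above: it breaks the text wherever a character is not a letter and lowercases every piece. The Python function below is only an approximation of that behavior, not the real implementation.

```python
import re


def simple_analyze(text):
    # Keep runs of letters only (digits and punctuation act as separators), then lowercase.
    return [token.lower() for token in re.findall(r"[^\W\d_]+", text)]


tokens = simple_analyze("The 2 QUICK Brown-Foxes jumped over the lazy dog's bone.")
print(tokens)
```

Note that the `2` disappears and `dog's` splits into `dog` and `s`, since neither digits nor apostrophes count as letters.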
  4. Search DSL (10-14)

    • Domain Specific Language
    • Query Context & Filter Context

      1. Query

        match_all
        "match": {"name": "computer"}
        This query syntax is honestly painful to write.
        GET /courses/_search
        {
        "query": {
        "bool": {
        "must": [{"match": {"room": "c8"}},
        {"match": {"name": "computer"}}]
        }
        }
        }
        must/must_not/should/minimum_should_match
        multi_match/match_phrase/match_phrase_prefix
        range
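Since the nested JSON gets hard to read quickly, one option is to assemble the `bool` body from Python lists and serialize it. The helper below is a sketch, not part of any client library; `bool_query` is a made-up name, and `students_enrolled` is a hypothetical field used only to show a `range` clause.

```python
import json


def bool_query(must=None, must_not=None, should=None, minimum_should_match=None):
    # Only include the clauses that were actually supplied.
    clauses = {}
    if must:
        clauses["must"] = must
    if must_not:
        clauses["must_not"] = must_not
    if should:
        clauses["should"] = should
    if minimum_should_match is not None:
        clauses["minimum_should_match"] = minimum_should_match
    return {"query": {"bool": clauses}}


body = bool_query(
    must=[{"match": {"room": "c8"}}, {"match": {"name": "computer"}}],
    should=[{"range": {"students_enrolled": {"gte": 10, "lte": 20}}}],  # hypothetical field
    minimum_should_match=1,
)
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted after `GET /courses/_search` in the Kibana console.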
      2. Filter

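        • Cannot be used on its own
        • Does not compute relevance scores
        • Elasticsearch caches the results
        • You can run a regular query first to get scores, then use a filter to narrow down the results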
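A sketch of that last point as a request body: the scored part goes under `must` and the unscored part under `filter`, both inside one `bool` query. The helper name `filtered_query` and the `students_enrolled` field are assumptions for illustration.

```python
import json


def filtered_query(scored_clauses, filter_clauses):
    # "must" clauses contribute to the score; "filter" clauses only include or exclude documents.
    return {"query": {"bool": {"must": scored_clauses, "filter": filter_clauses}}}


body = filtered_query(
    scored_clauses=[{"match": {"name": "computer"}}],
    filter_clauses=[{"range": {"students_enrolled": {"gte": 10}}}],  # hypothetical field
)
print(json.dumps(body, indent=2))
```

Documents that fail the `range` clause are dropped, while the score of the remaining ones still comes only from the `match` clause.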
      3. Aggregation

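        • Endpoints: _search / _count
        • Bulk indexing
        • terms on a field is similar to GROUP BY; aggs does the aggregating, with functions such as avg, max, and min (open question: what if I need to sort by the aggregated result?)
        • The query runs first; the aggregation is then computed over its results
        • stats (min, max, avg, sum)
        • The bucket and metric concepts; range
          • The results of a group-by can be grouped a second time with range (from, to). After that grouping, aggs can be nested inside each bucket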
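Putting the aggregation notes together, here is a sketch of one request body built in Python: a `terms` bucket per group, an `avg` metric inside it, and a nested `range` bucket with its own `stats`. On the open question of sorting by an aggregated result, the `terms` aggregation accepts an `order` that points at a metric sub-aggregation, which is what the sketch uses. The field names `make.keyword` and `price` and the range boundaries are assumptions, not taken from the course data.

```python
import json


def grouped_price_aggregation(group_field, price_field, boundary):
    return {
        "size": 0,  # return aggregation results only, no hits
        "query": {"match_all": {}},  # the query runs first, aggregations see its results
        "aggs": {
            "by_group": {
                # Bucket: one bucket per distinct value, sorted by the avg metric below.
                "terms": {"field": group_field, "order": {"avg_price": "desc"}},
                "aggs": {
                    # Metric computed inside each terms bucket.
                    "avg_price": {"avg": {"field": price_field}},
                    # Second-level buckets: split each group again by price range.
                    "price_ranges": {
                        "range": {
                            "field": price_field,
                            "ranges": [{"to": boundary}, {"from": boundary}],
                        },
                        # Metric nested inside each range bucket.
                        "aggs": {"price_stats": {"stats": {"field": price_field}}},
                    },
                },
            }
        },
    }


body = grouped_price_aggregation("make.keyword", "price", 20000)  # hypothetical fields
print(json.dumps(body, indent=2))
```

The body would be sent to the `_search` endpoint of the index that holds the documents, for example `GET /vehicles/_search`.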
  5. Logstash and Kibana (15-17)

    1. Logstash

      • input, filter, output (Ruby syntax)
    2. Kibana

      • Kibana needs a default index to be set first
      • Visualize creates individual views; Dashboard combines them