The IK Analysis plugin integrates the Lucene IK analyzer into Elasticsearch and supports custom dictionaries.

Overview

IK Analysis for Elasticsearch

The IK Analysis plugin integrates the Lucene IK analyzer (http://code.google.com/p/ik-analyzer/) into Elasticsearch and supports custom dictionaries.

Analyzers: ik_smart, ik_max_word; Tokenizers: ik_smart, ik_max_word

Versions

IK version    ES version
master        7.x -> master
6.x           6.x
5.x           5.x
1.10.6        2.4.6
1.9.5         2.3.5
1.8.1         2.2.1
1.7.0         2.1.1
1.5.0         2.0.0
1.2.6         1.0.0
1.2.5         0.90.x
1.1.3         0.20.x
1.0.0         0.16.2 -> 0.19.0

Install

1. Download or compile

  • option 1 - download the pre-built package from here: https://github.com/medcl/elasticsearch-analysis-ik/releases

    create the plugin folder: cd your-es-root/plugins/ && mkdir ik

    unzip the plugin into the folder your-es-root/plugins/ik

  • option 2 - use elasticsearch-plugin to install (supported from version v5.5.1):

    ./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.3.0/elasticsearch-analysis-ik-6.3.0.zip
    

    NOTE: replace 6.3.0 with your own Elasticsearch version

2. Restart Elasticsearch
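
To verify the install, you can list the installed plugins (a quick sanity check; the name shown is typically analysis-ik, or ik if you created the folder manually):

    ./bin/elasticsearch-plugin list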

Quick Example

1. Create an index

curl -XPUT http://localhost:9200/index

2. Create a mapping

curl -XPOST http://localhost:9200/index/_mapping -H 'Content-Type:application/json' -d'
{
        "properties": {
            "content": {
                "type": "text",
                "analyzer": "ik_max_word",
                "search_analyzer": "ik_smart"
            }
        }

}'

3. Index some documents

curl -XPOST http://localhost:9200/index/_create/1 -H 'Content-Type:application/json' -d'
{"content":"美国留给伊拉克的是个烂摊子吗"}
'
curl -XPOST http://localhost:9200/index/_create/2 -H 'Content-Type:application/json' -d'
{"content":"公安部:各地校车将享最高路权"}
'
curl -XPOST http://localhost:9200/index/_create/3 -H 'Content-Type:application/json' -d'
{"content":"中韩渔警冲突调查:韩警平均每天扣1艘中国渔船"}
'
curl -XPOST http://localhost:9200/index/_create/4 -H 'Content-Type:application/json' -d'
{"content":"中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"}
'

4. Query with highlighting

curl -XPOST http://localhost:9200/index/_search  -H 'Content-Type:application/json' -d'
{
    "query" : { "match" : { "content" : "中国" }},
    "highlight" : {
        "pre_tags" : ["<tag1>", "<tag2>"],
        "post_tags" : ["</tag1>", "</tag2>"],
        "fields" : {
            "content" : {}
        }
    }
}
'

Result

{
    "took": 14,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 2,
        "max_score": 2,
        "hits": [
            {
                "_index": "index",
                "_type": "fulltext",
                "_id": "4",
                "_score": 2,
                "_source": {
                    "content": "中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首"
                },
                "highlight": {
                    "content": [
                        "<tag1>中国</tag1>驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首 "
                    ]
                }
            },
            {
                "_index": "index",
                "_type": "fulltext",
                "_id": "3",
                "_score": 2,
                "_source": {
                    "content": "中韩渔警冲突调查:韩警平均每天扣1艘中国渔船"
                },
                "highlight": {
                    "content": [
                        "均每天扣1艘<tag1>中国</tag1>渔船 "
                    ]
                }
            }
        ]
    }
}

Dictionary Configuration

IKAnalyzer.cfg.xml can be located at {conf}/analysis-ik/config/IKAnalyzer.cfg.xml or {plugins}/elasticsearch-analysis-ik-*/config/IKAnalyzer.cfg.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
	<comment>IK Analyzer extension configuration</comment>
	<!-- Configure your own extension dictionaries here -->
	<entry key="ext_dict">custom/mydict.dic;custom/single_word_low_freq.dic</entry>
	<!-- Configure your own extension stopword dictionaries here -->
	<entry key="ext_stopwords">custom/ext_stopword.dic</entry>
	<!-- Configure a remote extension dictionary here -->
	<entry key="remote_ext_dict">location</entry>
	<!-- Configure a remote extension stopword dictionary here -->
	<entry key="remote_ext_stopwords">http://xxx.com/xxx.dic</entry>
</properties>
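
For reference, each local extension dictionary (for example custom/mydict.dic from the sample configuration above) is a plain UTF-8 text file with one word per line; the entries below are purely illustrative:

    蓝瘦香菇
    洪荒之力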

Hot-Updating the IK Dictionary

The plugin currently supports hot updates of the IK dictionary, via the following entries in the IK configuration file shown above:

 	<!-- Configure a remote extension dictionary here -->
	<entry key="remote_ext_dict">location</entry>
 	<!-- Configure a remote extension stopword dictionary here -->
	<entry key="remote_ext_stopwords">location</entry>

Here location is a URL, for example http://yoursite.com/getCustomDict. A request to this URL only needs to satisfy the following two requirements to enable hot dictionary updates:

  1. The HTTP response must return two headers: Last-Modified and ETag. Both are strings; whenever either one changes, the plugin fetches the dictionary again and updates its word list.

  2. The response body must contain one word per line, using \n as the line separator.

If these two requirements are met, the dictionary is hot-updated without restarting the Elasticsearch instance.

You can put the hot words to be auto-updated in a UTF-8 encoded .txt file served by nginx or another simple HTTP server. When the .txt file changes, the HTTP server automatically returns the corresponding Last-Modified and ETag as clients request the file. You can also build a separate tool that extracts the relevant words from your business system and updates this .txt file.
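
To check that your endpoint behaves as the plugin expects, you can inspect the headers with curl. A minimal sketch, assuming a hypothetical dictionary URL (replace with your own); the plugin reloads whenever Last-Modified or ETag changes between polls:

    curl -i http://yoursite.com/getCustomDict
    HTTP/1.1 200 OK
    Last-Modified: Wed, 07 Jul 2021 10:00:00 GMT
    ETag: "5d8c72a5edc8"

    词语一
    词语二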

have fun.

FAQ

1. Why doesn't my custom dictionary take effect?

Make sure the text of your extension dictionary is encoded as UTF-8.
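
A quick way to check and fix the encoding from a shell (a sketch using standard Unix tools; the GBK source encoding is just an example):

    file custom/mydict.dic
    iconv -f GBK -t UTF-8 custom/mydict.dic > custom/mydict.utf8.dic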

2. How do I install manually?

git clone https://github.com/medcl/elasticsearch-analysis-ik
cd elasticsearch-analysis-ik
git checkout tags/{version}
mvn clean
mvn compile
mvn package

Copy and unzip the release file #{project_path}/elasticsearch-analysis-ik/target/releases/elasticsearch-analysis-ik-*.zip into your Elasticsearch plugins directory, e.g. plugins/ik, then restart Elasticsearch.

3. The analysis test fails. Call the _analyze API under a specific index rather than calling _analyze directly, e.g.:

curl -XGET "http://localhost:9200/your_index/_analyze" -H 'Content-Type: application/json' -d'
{
   "text":"中华人民共和国MN","tokenizer": "my_ik"
}'

4. What is the difference between ik_max_word and ik_smart?

ik_max_word: performs the finest-grained segmentation. For example, it splits "中华人民共和国国歌" into "中华人民共和国, 中华人民, 中华, 华人, 人民共和国, 人民, 人, 民, 共和国, 共和, 和, 国国, 国歌", exhausting every possible combination. Suitable for term queries.

ik_smart: performs the coarsest-grained segmentation. For example, it splits "中华人民共和国国歌" into "中华人民共和国, 国歌". Suitable for phrase queries.
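
To see the difference yourself, run both analyzers over the same text with the _analyze API (a sketch, reusing the your_index placeholder from the FAQ above; swap ik_smart for ik_max_word to get the fine-grained token list):

curl -XGET "http://localhost:9200/your_index/_analyze" -H 'Content-Type: application/json' -d'
{
   "analyzer": "ik_smart",
   "text": "中华人民共和国国歌"
}'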

Changes

Since v5.0.0

  • The analyzer and tokenizer named ik have been removed; use ik_smart and ik_max_word instead.

Thanks

YourKit supports the IK Analysis for Elasticsearch project with its full-featured Java Profiler. YourKit, LLC is the creator of innovative and intelligent tools for profiling Java and .NET applications. Take a look at YourKit's leading software products: YourKit Java Profiler and YourKit .NET Profiler.

Comments
  • Cannot set ik as the default analyzer

    IndexCreationException[failed to create index]; nested: IllegalArgumentException[Unknown Analyzer type [ik_smart] for [default]]
    

    Changing ik_smart to ik doesn't work either. The version is Elasticsearch 2.1.1; I never ran into this with 1.7.1. Is there some change I missed? elasticsearch.yml contains only one line:

    index.analysis.analyzer.default.type: ik_smart
    
    root@xxx:/opt/elasticsearch-jdbc-2.1.1.2/bin# /usr/share/elasticsearch/bin/plugin list
    Installed plugins in /usr/share/elasticsearch/plugins:
        - elasticsearch-analysis-mmseg-1.7.0
        - elasticsearch-analysis-pinyin-1.5.2
        - elasticsearch-analysis-stconvert-1.6.1
        - elasticsearch-analysis-ik-1.7.0
    
    opened by denghongcai 31
  • A match_phrase problem similar to #195

    medcl, I think the positions produced by IK segmentation may be buggy.

    To describe the problem: the source text contains "前次募集资金" and the index uses ik_max_word. A match_phrase search for "前次募集资金" works fine, but a match_phrase search for "前次募集" finds nothing.

    Testing the ik_max_word analyzer with _analyze?analyzer=ik_max_word&text=前次募集资金 returns: {"tokens":[{"token":"前次","start_offset":0,"end_offset":2,"type":"CN_WORD","position":0},{"token":"募集","start_offset":2,"end_offset":4,"type":"CN_WORD","position":1},{"token":"募","start_offset":2,"end_offset":3,"type":"CN_WORD","position":2},{"token":"集","start_offset":3,"end_offset":4,"type":"CN_CHAR","position":3},{"token":"基金","start_offset":4,"end_offset":6,"type":"CN_WORD","position":4}]}

    The relevant mapping: [ElasticProperty(IncludeInAll = false, IndexAnalyzer = "ik_max_word", SearchAnalyzer = "ik_max_word")] public string Title { get; set; }

    The first search, for "前次募集资金":

    {
      "_source": "false",
      "highlight": {
        "fields": {
          "title": {}
        }
      },
      "query": {
        "match_phrase": {
          "title": {
            "query": "前次募集资金"
          }
        }
      }
    }
    

    This returns results: {"took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":21,"max_score":3.6136353,"hits":[{"_index":"disclosure.main.alpha","_type":"esdisclosurecomp","_id":"73419","_score":3.6136353,"highlight":{"title":["国金证券:<em>前次募集资金</em>使用情况报告"]}}]}}

    Then a second search with "资金" removed:

    {
      "_source": "false",
      "highlight": {
        "fields": {
          "title": {}
        }
      },
      "size": 1,
      "query": {
        "match_phrase": {
          "title": {
            "query": "前次募集"
          }
        }
      }
    }
    

    This time nothing matches: {"took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}

    Puzzled, I added term vectors and looked through the 国金证券 document, finding:

    "前次" : {
              "term_freq" : 1,
              "tokens" : [ {
                "position" : 5,
                "start_offset" : 5,
                "end_offset" : 7
              } ]
            },
    
    "募集" : {
              "term_freq" : 1,
              "tokens" : [ {
                "position" : 7,
                "start_offset" : 7,
                "end_offset" : 9
              } ]
            },
    
    "募集资金" : {
              "term_freq" : 1,
              "tokens" : [ {
                "position" : 6,
                "start_offset" : 7,
                "end_offset" : 11
              } ]
            },
    

    I think the problem is here: under ik_max_word, 募集资金 is itself a complete word and can also be split into 募集 and 资金, so the position of 募集 is already one word away from 前次, and match_phrase therefore doesn't consider 前次募集 a phrase. Could you check whether this is what's happening, and is there a workaround? Thanks!
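
    A possible workaround sketch (not from the original thread): match_phrase accepts a slop parameter that tolerates small position gaps like the one described above, e.g.:

    POST disclosure.main.alpha/_search
    {
      "query": {
        "match_phrase": {
          "title": { "query": "前次募集", "slop": 1 }
        }
      }
    }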

    opened by smilesfc 21
  • Hot-word update returns bad code 405


    Error message below. Elasticsearch version 1.6.0, with the corresponding ik version.

    [2015-08-06 14:31:11,567][INFO ][ik-analyzer ] remote_ext_dict http://localhost:8000/api/Dict/getCustomDict/ return bad code 405

    Output of the first curl run:

    C:\Windows\system32>curl -i http://localhost:8000/api/Dict/getCustomDict/
    HTTP/1.1 200 OK
    Cache-Control: public
    Content-Length: 12
    Content-Type: text/html
    Last-Modified: Thu, 06 Aug 2015 03:29:17 GMT
    ETag: "2e7541aa8c6e5a2681adf19c007bc3d9"
    Server: Microsoft-IIS/8.0
    X-AspNet-Version: 4.0.30319
    X-Powered-By: ASP.NET
    Date: Thu, 06 Aug 2015 06:22:37 GMT

    medcl

    浣?

    Output of the second curl run:

    C:\Windows\system32>curl -i http://localhost:8000/api/Dict/getCustomDict/
    HTTP/1.1 200 OK
    Cache-Control: public
    Content-Length: 71
    Content-Type: text/html
    Last-Modified: Thu, 06 Aug 2015 06:25:45 GMT
    ETag: "91fb16762ac5ce1dee59a127f23daaf8"
    Server: Microsoft-IIS/8.0
    X-AspNet-Version: 4.0.30319
    X-Powered-By: ASP.NET
    Date: Thu, 06 Aug 2015 06:26:06 GMT

    medcl 浣? medcl 浣? 鍗撳垱 鍗撳垱璧勮 鏀剁洏 鏀剁洏浠?

    纾烽吀

    Where is the problem? Thank you! Also, why does the Chinese display as mojibake? The file being read is in UTF-8.

    opened by starckgates 19
  • analysis-ik loads successfully in elasticsearch 1.2.1, but Analyzer [ik] is not found?

    ./bin/elasticsearch

    [2014-07-08 15:09:05,884][INFO ][node ] [test-01] version[1.2.1], pid[16965], build[6c95b75/2014-06-03T15:02:52Z]
    [2014-07-08 15:09:05,884][INFO ][node ] [test-01] initializing ...
    [2014-07-08 15:09:05,898][INFO ][plugins ] [test-01] loaded [marvel, analysis-smartcn, analysis-ik, analysis-mmseg], sites [marvel, kopf]
    [2014-07-08 15:09:07,656][INFO ][node ] [test-01] initialized
    [2014-07-08 15:09:07,656][INFO ][node ] [test-01] starting ...
    [2014-07-08 15:09:07,730][INFO ][transport ] [test-01] bound_address {inet[/0:0:0:0:0:0:0:0:9301]}, publish_address {inet[/192.168.16.128:9301]}
    [2014-07-08 15:09:10,761][INFO ][cluster.service ] [test-01] new_master [test-01][qBp7LwCZSnew5ElUaz4u4Q][testdeMacBook-Pro.local][inet[/192.168.16.128:9301]], reason: zen-disco-join (elected_as_master)
    [2014-07-08 15:09:10,782][INFO ][discovery ] [test-01] elasticsearch/qBp7LwCZSnew5ElUaz4u4Q
    [2014-07-08 15:09:10,817][INFO ][http ] [test-01] bound_address {inet[/0:0:0:0:0:0:0:0:9201]}, publish_address {inet[/192.168.16.128:9201]}
    [2014-07-08 15:09:11,366][INFO ][ik-analyzer ] [Dict Loading]ik/custom/mydict.dic
    [2014-07-08 15:09:11,367][INFO ][ik-analyzer ] [Dict Loading]ik/custom/single_word_low_freq.dic
    [2014-07-08 15:09:11,373][INFO ][ik-analyzer ] [Dict Loading]ik/custom/ext_stopword.dic
    [2014-07-08 15:09:11,393][INFO ][mmseg-analyzer ] chars loaded time=18ms, line=14861, on file=chars.dic
    [2014-07-08 15:09:11,394][INFO ][mmseg-analyzer ] words loaded time=1ms, line=3, on file=words-my.dic
    [2014-07-08 15:09:11,654][INFO ][mmseg-analyzer ] words loaded time=260ms, line=263638, on file=words.dic
    [2014-07-08 15:09:11,654][INFO ][mmseg-analyzer ] load all dic use time=279ms
    [2014-07-08 15:09:11,654][INFO ][mmseg-analyzer ] unit loaded time=0ms, line=35, on file=units.dic
    [2014-07-08 15:09:11,978][INFO ][gateway ] [test-01] recovered [8] indices into cluster_state
    [2014-07-08 15:09:11,980][INFO ][node ] [test-01] started

    curl -XPOST http://localhost:9200/index/fulltext/_mapping -d'
    {
        "fulltext": {
            "_all": {
                "indexAnalyzer": "ik",
                "searchAnalyzer": "ik",
                "term_vector": "no",
                "store": "false"
            },
            "properties": {
                "content": {
                    "type": "string",
                    "store": "no",
                    "term_vector": "with_positions_offsets",
                    "indexAnalyzer": "ik",
                    "searchAnalyzer": "ik",
                    "include_in_all": "true",
                    "boost": 8
                }
            }
        }
    }'

    {"error":"MapperParsingException[Analyzer [ik] not found for field [content]]","status":400}

    opened by liujianping 16
  • elasticsearch 5.0 with analysis-ik 5.0: the DEMO reports an error

    The error message:

    {
        "error": {
            "root_cause": [
                {
                    "type": "mapper_parsing_exception",
                    "reason": "analyzer [ik_max_word] not found for field [content]"
                }
            ],
            "type": "mapper_parsing_exception",
            "reason": "analyzer [ik_max_word] not found for field [content]"
        },
        "status": 400
    }

    elasticsearch.yml has no custom settings, and elasticsearch/plugins/ik exists.

    opened by zuozhehao 15
  • Can't test with index/_analyze?

    Searching with _search works fine, but testing with _analyze does not:

    Calling GET index/_analyze?analyzer=ik_max_word&text=中华人民共和国国歌

    returns

    {
       "tokens": [
          {
             "token": "20013",
             "start_offset": 2,
             "end_offset": 7,
             "type": "ARABIC",
             "position": 0
          },
          {
             "token": "21326",
             "start_offset": 10,
             "end_offset": 15,
             "type": "ARABIC",
             "position": 1
          },
          {
             "token": "20154",
             "start_offset": 18,
             "end_offset": 23,
             "type": "ARABIC",
             "position": 2
          },
          {
             "token": "27665",
             "start_offset": 26,
             "end_offset": 31,
             "type": "ARABIC",
             "position": 3
          },
          {
             "token": "20849",
             "start_offset": 34,
             "end_offset": 39,
             "type": "ARABIC",
             "position": 4
          },
          {
             "token": "21644",
             "start_offset": 42,
             "end_offset": 47,
             "type": "ARABIC",
             "position": 5
          },
          {
             "token": "22269",
             "start_offset": 50,
             "end_offset": 55,
             "type": "ARABIC",
             "position": 6
          },
          {
             "token": "22269",
             "start_offset": 58,
             "end_offset": 63,
             "type": "ARABIC",
             "position": 7
          },
          {
             "token": "27468",
             "start_offset": 66,
             "end_offset": 71,
             "type": "ARABIC",
             "position": 8
          }
       ]
    }
    
    opened by childe 14
  • null pointer exception

    [2015-06-10 10:20:47,209][DEBUG][action.search.type ] [ubuntu6] [guba][6], node[JydePkllR2WQdbgrRJDrCQ], [R], s[STARTED]:
    org.elasticsearch.transport.RemoteTransportException: [ubuntu9][inet[/219.224.135.94:9301]][indices:data/read/search[phase/query
    Caused by: org.elasticsearch.search.SearchParseException: [guba][6]: from[-1],size[-1]: Parse Failure [Failed to parse source [{
        at org.elasticsearch.search.SearchService.parseSource(SearchService.java:681)
        at org.elasticsearch.search.SearchService.createContext(SearchService.java:537)
        at org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:509)
        at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:264)
        at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTra
        at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTra
        at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:275)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
    Caused by: java.lang.NullPointerException
        at org.wltea.analyzer.dic.Dictionary.isStopWord(Dictionary.java:212)
        at org.wltea.analyzer.core.AnalyzeContext.getNextLexeme(AnalyzeContext.java:316)
        at org.wltea.analyzer.core.IKSegmenter.next(IKSegmenter.java:118)
        at org.wltea.analyzer.lucene.IKTokenizer.incrementToken(IKTokenizer.java:88)
        at org.apache.lucene.analysis.CachingTokenFilter.fillCache(CachingTokenFilter.java:90)
        at org.apache.lucene.analysis.CachingTokenFilter.incrementToken(CachingTokenFilter.java:55)
        at org.apache.lucene.util.QueryBuilder.createFieldQuery(QueryBuilder.java:217)
        at org.apache.lucene.util.QueryBuilder.createBooleanQuery(QueryBuilder.java:87)
        at org.elasticsearch.index.search.MatchQuery.parse(MatchQuery.java:210)
        at org.elasticsearch.index.query.MatchQueryParser.parse(MatchQueryParser.java:165)
        at org.elasticsearch.index.query.QueryParseContext.parseInnerQuery(QueryParseContext.java:277)
        at org.elasticsearch.index.query.BoolQueryParser.parse(BoolQueryParser.java:93)
        at org.elasticsearch.index.query.QueryParseContext.parseInnerQuery(QueryParseContext.java:277)
        at org.elasticsearch.index.query.IndexQueryParserService.innerParse(IndexQueryParserService.java:382)
        at org.elasticsearch.index.query.IndexQueryParserService.parse(IndexQueryParserService.java:281)
        at org.elasticsearch.index.query.IndexQueryParserService.parse(IndexQueryParserService.java:276)

    opened by linhaobuaa 13
  • Synonyms don't work in queries

    Please help me look into this: my queries never return synonym matches. The ES version is 2.2.1.

    1. The analyzer is defined in the yml like this:

       index:
         analysis:
           analyzer:
             ik_syno:
               type: custom
               tokenizer: ik_max_word
               filter: [my_synonym_filter]
             ik_syno_smart:
               type: custom
               tokenizer: ik_smart
               filter: [my_synonym_filter]
           filter:
             my_synonym_filter:
               type: synonym
               synonyms_path: analysis/synonym.txt

    2. The synonyms are defined like this, in a UTF-8 encoded file:

       番茄,西红柿
       白薯,地瓜,红薯

    3. Then the index is created:

       PUT items2
       {
         "itemNames": {
           "_all": { "enabled": "false" },
           "properties": {
             "name": {
               "type": "string",
               "index": "analyzed",
               "analyzer": "ik_syno_smart",
               "search_analyzer": "ik_syno_smart",
               "index_analyzer": "ik_syno_smart"
             }
           }
         }
       }

    4. The synonym test passes:

       POST items2/_analyze?analyzer=ik_syno_smart
       { "text": "番茄" }

    { "tokens": [ { "token": "番茄", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 0 }, { "token": "西红柿", "start_offset": 0, "end_offset": 2, "type": "SYNONYM", "position": 0 } ] }

    But neither query_string nor match returns any results:

    POST items2/itemName/_search
    {
      "query": {
        "query_string": {
          "default_field": "name",
          "query": "番茄",
          "analyzer": "ik_syno_smart"
        }
      }
    }

    POST items2/itemName/_search
    {
      "query": {
        "match": {
          "name": {
            "query": "番茄",
            "analyzer": "ik_syno_smart"
          }
        }
      }
    }

    { "took": 5, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 0, "max_score": null, "hits": [] } }

    This has been bothering me for days; please take a look. Thanks!

    opened by mdifferent 12
  • A question about adding synonyms to ik

    My configuration is as follows, but it still doesn't take effect. The index side uses max, the search side uses smart, and the synonym dictionary itself loads fine. Is there a problem with this configuration? Thanks.

    ik_max_word:
      type: ik
      use_smart: false
    ik_smart:
      type: custom
      tokenizer: ik
      filter: [sysfilter]
      use_smart: true

    opened by hanleijun 12
  • elasticsearch 1.4.0 doesn't get along with elasticsearch-analysis-ik

    I built elasticsearch-analysis-ik from source myself. After configuring it, Elasticsearch reports the errors below on startup. Please update.

    Exception in thread "Thread-3" java.lang.NullPointerException
            at org.wltea.analyzer.dic.Monitor.run(Monitor.java:87)
            at java.lang.Thread.run(Thread.java:745)
    org.apache.http.client.ClientProtocolException
            at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:186)
            at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
            at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
            at org.wltea.analyzer.dic.Monitor.run(Monitor.java:64)
            at java.lang.Thread.run(Thread.java:745)
    Caused by: org.apache.http.ProtocolException: Target host is not specified
            at org.apache.http.impl.conn.DefaultRoutePlanner.determineRoute(DefaultRoutePlanner.java:69)
            at org.apache.http.impl.client.InternalHttpClient.determineRoute(InternalHttpClient.java:124)
            at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:183)
            ... 4 more
    Exception in thread "Thread-4" java.lang.NullPointerException
            at org.wltea.analyzer.dic.Monitor.run(Monitor.java:87)
            at java.lang.Thread.run(Thread.java:745)
    [2014-11-18 16:48:27,134][INFO ][mmseg-analyzer           ] chars loaded time=508ms, line=12638, on file=chars.dic
    [2014-11-18 16:48:27,149][INFO ][mmseg-analyzer           ] words loaded time=2ms, line=4, on file=words-my.dic
    [2014-11-18 16:48:27,379][INFO ][mmseg-analyzer           ] words loaded time=227ms, line=157202, on file=words.dic
    [2014-11-18 16:48:27,380][INFO ][mmseg-analyzer           ] load all dic use time=754ms
    [2014-11-18 16:48:27,382][INFO ][mmseg-analyzer           ] unit loaded time=1ms, line=22, on file=units.dic
    [2014-11-18 16:48:28,196][INFO ][gateway                  ] [Son of Satan] recovered [2] indices into cluster_state
    [2014-11-18 16:52:12,078][INFO ][node                     ] [Son of Satan] stopping ...
    [2014-11-18 16:52:13,141][ERROR][marvel.agent.exporter    ] [Son of Satan] error sending data
    java.net.ConnectException: Connection refused: connect
            at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
            at java.net.DualStackPlainSocketImpl.socketConnect(DualStackPlainSocketImpl.java:85)
            at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
            at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
            at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
            at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:172)

    After uninstalling elasticsearch-analysis-ik, startup is fine.

    opened by ameizi 12
  • ik_max_word analyzer not found after joining a cluster

    http://localhost:9200/standard_auction_names/_analyze?analyzer=ik_max_word&pretty=true&text=%E4%B8%AD%E5%8D%8E%E4%BA%BA%E6%B0%91%E5%85%B1%E5%92%8C%E5%9B%BD

    org.elasticsearch.index.mapper.MapperParsingException: Analyzer [ik_max_word] not found for field [description]

    opened by sxhyll 12
  • How can a fixed term be further segmented?

    I want certain fixed (English) compounds to be segmented, e.g.:

    POST /_analyze
    {
      "analyzer": "ik_max_word",
      "text": "helloWorld"
    }
    

    The analysis result is:

    {
      "tokens": [
        {
          "token": "helloWorld",
          "start_offset": 0,
          "end_offset": 10,
          "type": "ENGLISH",
          "position": 0
        }
      ]
    }
    

    Is there some setting or customization that would segment it, e.g. into hello and world?

    opened by command-z-z 0
  • ik_smart: short words vs. long words

    I added "极速版2.0" and "极速版" to my custom dictionary, but searching for "极速版2.0" splits it into 极速版 and 2.0, as follows:

    {
      "tokens": [
        { "token": "极速版", "start_offset": 0, "end_offset": 3, "type": "CN_WORD", "position": 0 },
        { "token": "2.0", "start_offset": 3, "end_offset": 6, "type": "ARABIC", "position": 1 }
      ]
    }

    With both the long and the short word added, is it possible to keep the long word unsplit when searching for the long word?

    opened by heikehuan 1
  • fix(sec): upgrade org.apache.httpcomponents:httpclient to 4.5.13

    What happened?

    There are 2 security vulnerabilities found in org.apache.httpcomponents:httpclient 4.5.2

    What did I do?

    Upgraded org.apache.httpcomponents:httpclient from 4.5.2 to 4.5.13 to fix the vulnerabilities.

    What did you expect to happen?

    Ideally, no insecure libs should be used.

    The specification of the pull request

    PR Specification from OSCS

    opened by eurrio 0