Data set download: http://www.rendoumi.com/soft/testdata.tgz
Three data files are prepared, collections, products and users, to be imported into elasticsearch. The archive is 247 MB compressed; first unpack it and check the sizes:
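Assuming the archive keeps the name from the download link, the unpack and size check go roughly like this:
tar xzf testdata.tgz
ls -lh *-anon.txt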
-rw-r--r-- 1 root root 47M May 29 10:30 collections-anon.txt
-rw-r--r-- 1 root root 522M May 29 10:33 products-anon.txt
-rw-r--r-- 1 root root 857M May 29 10:36 users-anon.txt
users is the user table, products the product table, and collections the table of users' favourites.
Importing collections and products into elasticsearch went fine, both finished in an instant, but importing users failed straight away with an error!
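For the record, the two imports that did work were issued the same way as the failing call below; this is a reconstruction, since only the users call is shown, and it assumes the index and type names are carried in the action lines inside each bulk file:
curl -s -XPOST http://localhost:9200/_bulk --data-binary @collections-anon.txt > /dev/null
curl -s -XPOST http://localhost:9200/_bulk --data-binary @products-anon.txt > /dev/null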
# curl -s -XPOST http://localhost:9200/_bulk --data-binary @users-anon.txt
org.jboss.netty.handler.codec.frame.TooLongFrameException: HTTP content length exceeded 104857600 bytes.
It looks like the payload is too big and exceeds the HTTP POST limit, so change the config:
vi config/elasticsearch.yml
...
network.host: 172.16.11.2,127.0.0.1
http.port: 9200
http.max_content_length: 1024mb
...
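elasticsearch.yml is only read at startup, so the node has to be restarted for the new limit to apply. One way of doing that, assuming a tarball install with a single node on the box:
# stop the running node (a plain SIGTERM is handled cleanly)
kill $(pgrep -f org.elasticsearch.bootstrap.Elasticsearch)
# start it again as a daemon
bin/elasticsearch -d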
With the content limit raised to 1 GB, the resubmission still fails, this time with a Java out-of-memory error:
[WARN ][http.netty ] [Ogress] Caught exception while handling client http traffic, closing connection ...(omitted)
java.lang.OutOfMemoryError: Java heap space
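An alternative worth noting before touching the heap: split the bulk file into smaller chunks and post them one at a time, so no single request has to fit in memory. A rough sketch, assuming the standard bulk layout of one action line followed by one source line (hence an even line count per chunk); the 200000-line chunk size and the users-part- prefix are just illustrative choices:
# cut users-anon.txt into smaller pieces, keeping action/source pairs intact
split -l 200000 users-anon.txt users-part-
for f in users-part-*; do
  curl -s -XPOST http://localhost:9200/_bulk --data-binary @"$f" > /dev/null
done
That said, the route taken here is to keep raising the limits.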
Ugh. Keep going: edit the startup script and bump the heap to 4 GB
vi bin/elasticsearch
...
# Maven will replace the project.name with elasticsearch below. If that
# hasn't been done, we assume that this is not a packaged version and the
# user has forgotten to run Maven to create a package.
ES_HEAP_SIZE=4g
...
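Editing the script is the blunt way; the same heap size can also be passed through the environment at launch, since bin/elasticsearch of this era picks up ES_HEAP_SIZE:
# equivalent to the edit above, without touching the script
ES_HEAP_SIZE=4g ./bin/elasticsearch -d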
Restarted and resubmitted. Ha, this time so much data flew past that it was Xshell's turn to crash.
The elasticsearch log showed no errors, though, only that Java GC had been triggered,
and process monitoring showed Java memory usage was fairly high.
No way around it, keep tweaking: enable HTTP compression, and store the index compressed as well
vi config/elasticsearch.yml
...
network.host: 172.16.11.2,127.0.0.1
http.port: 9200
http.max_content_length: 1024mb
http.compression: true
index.store.compress.stored: true
index.store.compress.tv: true
...
Compress the data before submitting as well, post the gzipped file, and discard the response body:
gzip users-anon.txt
curl --compressed -H "Content-encoding: gzip" -XPOST localhost:9200/_bulk --data-binary @users-anon.txt.gz > /dev/null
This time everything went through fine. Checking with the head plugin, collections and products each hold 86772 records, and users holds 970446.
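The counts can also be pulled without the head plugin, through the count API; the index names here are an assumption based on the file names:
curl -s 'http://localhost:9200/users/_count?pretty'
curl -s 'http://localhost:9200/collections/_count?pretty'
curl -s 'http://localhost:9200/products/_count?pretty'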
Open this URL in a browser to run a match-all (*) query:
http://172.16.11.2:9200/_search?q=*
The result comes back as one tangled blob, hard on the eyes. Make it prettier:
http://172.16.11.2:9200/_search?q=*&pretty=on
Now it is at least readable:
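From the shell, the same match-all query can be capped to a few hits so the output stays manageable:
curl -s 'http://172.16.11.2:9200/_search?q=*&pretty=on&size=3'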