Elasticsearch Introduction

What is Elasticsearch?

Elasticsearch is a distributed, real-time search and analytics platform.

Yeah, but what IS Elasticsearch?

The definition above is packed with buzzwords (distributed, real-time, analytics), so let's unpack them.

ES is distributed: it organizes data in a cluster of nodes, so it can run across multiple servers when we need it to.

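ES is real-time (strictly speaking, near real-time): because data is indexed as it comes in, queries return results very quickly, and a newly indexed document typically becomes searchable within about a second.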
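Much of that speed comes from the inverted index Lucene builds at indexing time: instead of scanning every document, a query looks up each term and gets back the list of documents that contain it. The toy sketch below shows the idea; the whitespace tokenizer is a stand-in for a real analyzer, and the sample documents are made up.

```python
from collections import defaultdict


def tokenize(text):
    # Rough stand-in for an analyzer: lowercase and split on whitespace.
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return IDs of documents containing every query term (AND semantics)."""
    postings = [set(index.get(term, ())) for term in tokenize(query)]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


docs = {
    1: "Venkat Prasad resolved the login issue",
    2: "Login page redesign",
    3: "Prasad created a new issue",
}
index = build_inverted_index(docs)
print(index["login"])                 # [1, 2]
print(search(index, "prasad issue"))  # [1, 3]
```

Answering a query now costs a few dictionary lookups and a set intersection, no matter how long the documents are.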

Last but not least, it does search and analytics: the main problem this tool solves is exploring our data.

A platform like ES can serve as the foundation of a full-featured search engine.

How does it work?

Through its RESTful API, Elasticsearch stores JSON documents and indexes them automatically. It assigns a type to each field (either from a mapping you define or by inferring it from the data), and that is what lets a search combine filters and different kinds of queries quickly and smartly.
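When no mapping exists for a field, Elasticsearch's dynamic mapping guesses one from the first JSON value it sees. The sketch below approximates the default rules (booleans, integers as long, floats as float, objects recursed into, strings as date when date detection matches, otherwise text with a keyword sub-field). It is a simplification: the date check only recognizes two ISO-8601 shapes, arrays are skipped, and `infer_mapping` is a hypothetical helper, not an Elasticsearch API.

```python
from datetime import datetime

# Simplified subset of the default date detection: only these ISO-8601 shapes.
_DATE_FORMATS = ("%Y-%m-%d", "%Y-%m-%dT%H:%M:%S")


def looks_like_date(value):
    for fmt in _DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            pass
    return False


def infer_mapping(doc):
    """Guess a mapping for one JSON document, roughly like dynamic mapping does."""
    properties = {}
    for field, value in doc.items():
        if value is None:
            continue  # null values do not add a field
        if isinstance(value, bool):  # test bool before int: bool subclasses int
            properties[field] = {"type": "boolean"}
        elif isinstance(value, int):
            properties[field] = {"type": "long"}
        elif isinstance(value, float):
            properties[field] = {"type": "float"}
        elif isinstance(value, dict):
            properties[field] = infer_mapping(value)  # object -> sub-properties
        elif isinstance(value, str) and looks_like_date(value):
            properties[field] = {"type": "date"}
        elif isinstance(value, str):
            properties[field] = {
                "type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
            }
        # arrays and other values are left out of this sketch
    return {"properties": properties}


mapping = infer_mapping({
    "displayName": "Venkat Prasad",
    "issuesCreated": 100,
    "startDate": "2015-01-01",
    "active": True,
})
print(mapping["properties"]["startDate"])      # {'type': 'date'}
print(mapping["properties"]["issuesCreated"])  # {'type': 'long'}
```

Note that a value like "2015-01-01 01:00:00" (a space instead of "T") does not pass the default date detection and would end up as text, which is one good reason to declare an explicit date format in a mapping, as the demo below does.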


It runs on the JVM (Java Virtual Machine). Each index is split into “shards” of data, and copies of those shards (replicas) are kept on different nodes, so the cluster keeps working even if some nodes are down. Adding nodes is easy, since the cluster rebalances shards onto them automatically, and that is what makes it so scalable.
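Elasticsearch decides which primary shard holds a document by hashing its routing value (the document `_id` by default), classically `shard = hash(_routing) % number_of_primary_shards`. Because of that modulo, the number of primary shards is fixed when the index is created, and changing it requires the split/shrink APIs or a reindex. The sketch below assumes `zlib.crc32` as a stand-in for the Murmur3 hash Elasticsearch actually uses, and its round-robin `allocate` is a toy model of "never put a replica on the same node as its primary", not the real shard allocator.

```python
import zlib


def route(doc_id, num_primary_shards):
    # Elasticsearch uses Murmur3 here; crc32 is only a dependency-free stand-in.
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


def allocate(num_primary_shards, num_replicas, nodes):
    """Toy round-robin placement: a replica never shares a node with its primary."""
    if num_replicas >= len(nodes):
        # Elasticsearch would leave the extra replicas unassigned instead.
        raise ValueError("need more nodes than replicas per shard")
    layout = {}
    for shard in range(num_primary_shards):
        copies = [nodes[(shard + k) % len(nodes)] for k in range(num_replicas + 1)]
        layout[shard] = {"primary": copies[0], "replicas": copies[1:]}
    return layout


nodes = ["node-a", "node-b", "node-c"]
layout = allocate(num_primary_shards=3, num_replicas=1, nodes=nodes)
shard = route("N9OD3HABKUnfGCRJZ5pa", 3)
print(shard, layout[shard])
```

With one replica per shard, losing any single node still leaves at least one copy of every shard on the remaining nodes.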


ES uses Apache Lucene under the hood to run searches. This is a real advantage compared with, for example, Django ORM queries. Through the RESTful API we describe a search as a JSON object (the Query DSL), which is much more flexible and lets us give each search clause within the object its own weight (boost), importance or priority.
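As an illustration of per-clause weights, the sketch below builds a Query DSL body with a `bool` query: two `should` clauses with different `boost` values, plus an optional `filter` clause. The field names come from the demo mapping later in this article; `build_search` is a hypothetical helper, and the resulting JSON would be sent to the `/engopsteam/_search` endpoint.

```python
import json


def build_search(text, min_resolved=None):
    """Build a Query DSL body that weights matches differently per field."""
    body = {
        "query": {
            "bool": {
                "should": [
                    # A match on displayName counts three times as much as one on userName.
                    {"match": {"displayName": {"query": text, "boost": 3}}},
                    {"match": {"userName": {"query": text, "boost": 1}}},
                ],
                "minimum_should_match": 1,
            }
        }
    }
    if min_resolved is not None:
        # Filter clauses include or exclude documents without affecting the score.
        body["query"]["bool"]["filter"] = [
            {"range": {"issuesResolved": {"gte": min_resolved}}}
        ]
    return body


print(json.dumps(build_search("venkat", min_resolved=50), indent=2))
```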


The result is a list of matching documents ranked by relevance. You can also use synonyms, autocomplete, spelling suggestions and typo-tolerant (fuzzy) matching. Whereas a typical database query returns exactly the rows that satisfy some boolean logic, an ES query returns a ranked list: documents may match different clauses, and their order depends on how well they match the scoring clauses (filter clauses only include or exclude documents, they do not change the score).
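Since Elasticsearch 5.0 the default relevance score is BM25 (Lucene's BM25 similarity, with defaults k1 = 1.2 and b = 0.75). The sketch below implements the textbook BM25 formula over whitespace-tokenized text. Lucene's implementation differs in constant factors that do not change the ranking, and the sample documents are made up.

```python
import math
from collections import Counter


def bm25_rank(docs, query, k1=1.2, b=0.75):
    """Rank documents by the textbook BM25 score for a whitespace-tokenized query."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avgdl = sum(len(terms) for terms in tokenized.values()) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for terms in tokenized.values() for term in set(terms))
    scores = {}
    for doc_id, terms in tokenized.items():
        tf = Counter(terms)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: long documents are penalized relative to avgdl.
            norm = k1 * (1 - b + b * len(terms) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        if score > 0:
            scores[doc_id] = score
    # Highest score first; ties broken by document ID for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


docs = {
    1: "login issue on the login page",
    2: "login page redesign",
    3: "payment issue",
}
print([doc_id for doc_id, _ in bm25_rank(docs, "login issue")])  # [1, 3, 2]
```

Document 1 wins because it matches both terms. Documents 2 and 3 each match one equally rare term, so the shorter document 3 ranks higher.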


ES can also answer data-analysis questions, such as averages, the number of unique terms or other statistics, using aggregations. To dig deeper into this feature, check the aggregations section of the Elasticsearch documentation.
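As an example, the request body below asks for the average of `issuesResolved`, summary statistics of `issuesCreated` and the number of distinct `id` values. These field names assume the demo mapping shown later. The `avg`, `stats` and `cardinality` aggregations exist in Elasticsearch, while the aggregation names (`avg_resolved` and so on) are arbitrary labels. The helper `local_aggregations` is a hypothetical plain-Python equivalent for checking results on small data. Keep in mind that Elasticsearch's `cardinality` is an approximate count.

```python
import json
from statistics import mean

# Request body for POST /engopsteam/_search; aggregation names are arbitrary labels.
AGGS_BODY = {
    "size": 0,  # return only aggregation results, no hits
    "aggs": {
        "avg_resolved": {"avg": {"field": "issuesResolved"}},
        "created_stats": {"stats": {"field": "issuesCreated"}},
        "unique_ids": {"cardinality": {"field": "id"}},
    },
}


def local_aggregations(docs):
    """Compute the same numbers in plain Python, for checking small data sets."""
    created = [d["issuesCreated"] for d in docs]
    return {
        "avg_resolved": mean(d["issuesResolved"] for d in docs),
        "created_stats": {
            "count": len(created),
            "min": min(created),
            "max": max(created),
            "avg": mean(created),
            "sum": sum(created),
        },
        # Exact distinct count; Elasticsearch's cardinality is an approximation.
        "unique_ids": len({d["id"] for d in docs}),
    }


sample = [  # made-up records shaped like the demo documents; id 10001 appears twice
    {"id": 10001, "issuesCreated": 100, "issuesResolved": 60},
    {"id": 10002, "issuesCreated": 40, "issuesResolved": 30},
    {"id": 10001, "issuesCreated": 100, "issuesResolved": 60},
]
print(json.dumps(AGGS_BODY))
print(local_aggregations(sample))
```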

Should I use ES?


The main selling points are scalability and getting results and insights very fast. That said, in many cases using Lucene directly, the library ES is built on, could be enough to cover all you need.

Sometimes it seems these tools are designed only for projects with huge amounts of data, distributed so they can handle huge numbers of users. Startups dream of growing to that scale, but they may prefer to start small: build a prototype first, and think about scaling problems once the data is there.

Does it make sense, and does it pay off, to be prepared to grow A LOT? Why not? Elasticsearch is easy to start using, so choosing it early is mostly a decision to be prepared for the future, weighed against the cost of running and maintaining one more service.

Here is a quick example of a dead-simple project that uses Elasticsearch to search some sample data. It is quick to set up, talks to Elasticsearch through plain REST calls (curl below, though any Python HTTP client can make the same calls) and is ready to scale in case we need it to: the best of both worlds.

Sample Demo


Create Index
$ curl -u elasticuser -H "Content-Type: application/json" -XPUT http://localhost:9200/engopsteam -d '{ 
	"mappings" : {
		"properties" : { 
			"id" : { "type": "long" },
			"displayName" : { "type": "text" },
			"userName" : { "type": "text" },
			"emailAddress" : {"type": "text" },
			"startDate" : {"type": "date", "format" : "yyyy-MM-dd HH:mm:ss" },
			"issuesCreated" : {"type": "long"},
			"issuesResolved" : {"type": "long"}
		}
	} 
}'
Enter host password for user 'elasticuser': 
{"acknowledged":true,"shards_acknowledged":true,"index":"engopsteam"}
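The same call can be made from Python with nothing but the standard library. The sketch below assumes the demo's local cluster at `http://localhost:9200` and the `elasticuser` account. The password `changeme` is a placeholder, and `es_request` is a hypothetical helper that only builds the request, so nothing is sent until you pass it to `urllib.request.urlopen`.

```python
import base64
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: the demo's local cluster

MAPPING = {
    "mappings": {
        "properties": {
            "id": {"type": "long"},
            "displayName": {"type": "text"},
            "userName": {"type": "text"},
            "emailAddress": {"type": "text"},
            "startDate": {"type": "date", "format": "yyyy-MM-dd HH:mm:ss"},
            "issuesCreated": {"type": "long"},
            "issuesResolved": {"type": "long"},
        }
    }
}


def es_request(method, path, body, user, password):
    """Build (but do not send) an authenticated JSON request, like curl -u ... -H ..."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        ES_URL + path,
        data=json.dumps(body).encode("utf-8") if body is not None else None,
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method=method,
    )


req = es_request("PUT", "/engopsteam", MAPPING, "elasticuser", "changeme")  # placeholder password
# To actually create the index: urllib.request.urlopen(req)
```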


Create Documents
$ curl -u elasticuser -H "Content-Type: application/json" -XPOST http://localhost:9200/engopsteam/_doc -d '{
	"id" : 10001,
	"displayName" : "Venkat Prasad",
	"userName" : "prasadve",
	"emailAddress" : "venkat.prasad@noreply.com",
	"startDate" : "2015-01-01 01:00:00",
	"issuesCreated" : 100,
	"issuesResolved" : 60
}'
Enter host password for user 'elasticuser': 
{"_index":"engopsteam","_type":"_doc","_id":"N9OD3HABKUnfGCRJZ5pa","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":1,"_primary_term":1}
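Indexing documents one call at a time is fine for a demo. For more data, the `_bulk` endpoint accepts newline-delimited JSON: an action line followed by a source line for each document, with a trailing newline at the end. The sketch below builds such a body. The second record is made up, and `bulk_body` is a hypothetical helper.

```python
import json


def bulk_body(index, docs):
    """Serialize documents as newline-delimited JSON for the _bulk endpoint."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line; _id auto-generated
        lines.append(json.dumps(doc))                          # source line
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


docs = [
    {"id": 10001, "displayName": "Venkat Prasad", "userName": "prasadve",
     "issuesCreated": 100, "issuesResolved": 60},
    {"id": 10002, "displayName": "Example User", "userName": "exampleu",  # made-up record
     "issuesCreated": 40, "issuesResolved": 30},
]
payload = bulk_body("engopsteam", docs)
print(payload)
```

Saved to a file, the payload can be sent with `curl -u elasticuser -H "Content-Type: application/x-ndjson" -XPOST http://localhost:9200/_bulk --data-binary @bulk.ndjson`. `--data-binary` matters here because plain `-d` strips the newlines.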


Update Document (partial update)
$ curl -u elasticuser -H "Content-Type: application/json" -XPOST http://localhost:9200/engopsteam/_update/N9OD3HABKUnfGCRJZ5pa -d '{
	"doc" : {
		"issuesResolved": 70
	}
}'
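A partial update merges the fields under `"doc"` into the stored `_source`: objects are merged key by key, while any other value, including an array, replaces the old one. By default Elasticsearch also detects updates that change nothing and reports them with `"result": "noop"`. The sketch below is a simplified model of that behavior, not Elasticsearch code, and `apply_partial_update` is a hypothetical name.

```python
import copy


def apply_partial_update(source, partial):
    """Model a partial "doc" update: return the merged source and whether it changed."""
    merged = copy.deepcopy(source)

    def merge(target, changes):
        for key, value in changes.items():
            if isinstance(value, dict) and isinstance(target.get(key), dict):
                merge(target[key], value)  # objects are merged recursively
            else:
                target[key] = copy.deepcopy(value)  # everything else is replaced

    merge(merged, partial)
    return merged, merged != source


stored = {"id": 10001, "displayName": "Venkat Prasad", "issuesResolved": 60}
updated, changed = apply_partial_update(stored, {"issuesResolved": 70})
print(updated, changed)  # issuesResolved is now 70, other fields untouched, changed is True
```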


Delete Document
$ curl -u elasticuser -XDELETE http://localhost:9200/engopsteam/_doc/N9OD3HABKUnfGCRJZ5pa
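The document is addressed by the `_id` Elasticsearch generated at indexing time, not by our own `id` field. Deleting by a field value would need the `_delete_by_query` API instead. The sketch below builds the same DELETE request in Python and interprets the `result` field of the response body, which is `"deleted"` on success and `"not_found"` (with HTTP 404) when the `_id` does not exist. Both helpers are hypothetical names, and authentication is left out for brevity. Add an `Authorization` header as in the earlier curl calls.

```python
import json
import urllib.parse
import urllib.request


def delete_request(base_url, index, doc_id):
    """Build (but do not send) a DELETE request for one document, addressed by _id."""
    url = f"{base_url}/{urllib.parse.quote(index)}/_doc/{urllib.parse.quote(doc_id, safe='')}"
    return urllib.request.Request(url, method="DELETE")


def was_deleted(response_body):
    """Interpret the JSON body of a delete response."""
    result = json.loads(response_body)["result"]
    if result == "deleted":
        return True
    if result == "not_found":  # comes back with HTTP 404
        return False
    raise ValueError(f"unexpected result: {result}")


req = delete_request("http://localhost:9200", "engopsteam", "N9OD3HABKUnfGCRJZ5pa")
print(req.get_method(), req.full_url)
```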