Elasticsearch

Elasticsearch is an open-source, highly scalable, distributed full-text search engine. It was first released in 2010, and today it is one of the most popular platforms for log analytics. It plays a central role in the ELK stack, alongside Logstash and Kibana. Elasticsearch is written in Java and runs as a standalone server on the Java Virtual Machine (JVM); it does not need a separate Java application server. Because it exposes a RESTful HTTP/JSON API, it can be used from applications written in almost any language, not only Java. Elasticsearch goes beyond free-text search and provides structured search, hit highlighting, aggregations (which replaced the older facets feature), and more. These capabilities let users and developers extract valuable information from their data regardless of its form.
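
To make these features concrete, the following Python sketch builds the JSON body of a single search request that combines a full-text match, hit highlighting, and a terms aggregation. It only constructs the request body and does not talk to a cluster. The field names message and host are hypothetical, and host is assumed to be mapped as a keyword field so that it can be aggregated on.

```python
import json


def build_search_body(text: str) -> dict:
    """Return a Query DSL request body that combines a full-text match,
    hit highlighting and an aggregation. Field names are hypothetical."""
    return {
        # Full-text search on the analyzed "message" field
        "query": {"match": {"message": text}},
        # Ask for highlighted fragments of "message" in each hit
        "highlight": {"fields": {"message": {}}},
        # Count matching documents per value of the "host" field (top 5 values)
        "aggs": {"top_hosts": {"terms": {"field": "host", "size": 5}}},
    }


if __name__ == "__main__":
    # This JSON would be sent as the body of GET or POST /<index>/_search
    print(json.dumps(build_search_body("connection timeout"), indent=2))
```

The same body could be sent with any HTTP client or one of the official Elasticsearch clients; the response would then contain the matching hits, a highlight section per hit, and per-host counts under aggregations.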

While Elasticsearch can perform many types of searches and aggregations, it is not designed to provide advanced analytics and data-mining features. It is primarily a search engine, but because it can also retrieve documents by their IDs, it can be used as a non-relational, document-oriented data store rather than as a full database.

Although Elasticsearch can act as a data storage solution, its limited concurrency control, lack of multi-document transactions, and weaker durability and backup guarantees than a traditional database are generally seen as its weaknesses as a storage solution. This is why Elasticsearch is still, first and foremost, a search engine and not a database. Within the ELK stack, Elasticsearch is the main component and provides the storage and search engine capabilities.
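
Elasticsearch's concurrency control is optimistic and works on one document at a time: every write receives a sequence number, and a write can be made conditional with the if_seq_no and if_primary_term request parameters. The following Python sketch is a hypothetical in-memory model of that idea for a single shard (so the primary term is left out); it is not a client for the real API. It shows that a stale write is rejected, and also why there are no transactions: nothing ties writes to several documents together into one atomic unit.

```python
from dataclasses import dataclass, field


class VersionConflict(Exception):
    """Raised when a conditional write sees a stale sequence number
    (Elasticsearch answers such a request with HTTP 409)."""


@dataclass
class ToyShard:
    """Hypothetical in-memory model of a single shard's per-document
    optimistic concurrency control. Not a real client or API."""
    docs: dict = field(default_factory=dict)  # doc id -> (seq_no, source)
    next_seq_no: int = 0

    def get(self, doc_id):
        """Return (seq_no, source) for a document, like a get-by-ID."""
        return self.docs[doc_id]

    def index(self, doc_id, source, if_seq_no=None):
        """Write a document; if if_seq_no is given, only succeed when the
        stored document still carries that sequence number."""
        if if_seq_no is not None:
            current = self.docs.get(doc_id)
            if current is None or current[0] != if_seq_no:
                raise VersionConflict(f"{doc_id!r} changed since seq_no {if_seq_no}")
        seq_no = self.next_seq_no
        self.next_seq_no += 1
        self.docs[doc_id] = (seq_no, source)
        return seq_no


if __name__ == "__main__":
    shard = ToyShard()
    shard.index("order-1", {"status": "new"})
    seq_no, _ = shard.get("order-1")  # two clients read the same version
    shard.index("order-1", {"status": "paid"}, if_seq_no=seq_no)  # first writer wins
    try:
        shard.index("order-1", {"status": "cancelled"}, if_seq_no=seq_no)  # stale write
    except VersionConflict as err:
        print("conflict:", err)
```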

Conceptualization of Elasticsearch

Its main feature is providing search capabilities. Sophisticated searches are built from two constructs: queries and filters. Of the many query types available, the most commonly used are full-text, term, match, and prefix queries. Geo filters, which restrict results by location, are another important feature of Elasticsearch. Because Elasticsearch is built on top of Apache Lucene, it makes use of all the features Lucene provides and extends them with additional ones.
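
The difference between queries and filters is easiest to see in a bool query: clauses under must run in query context and affect the relevance score, while clauses under filter run in filter context and only include or exclude documents. The Python sketch below builds such a request body with the match, term, prefix, and geo_distance clauses. It is an illustration under assumptions: the field names are hypothetical, and location is assumed to be mapped as a geo_point.

```python
import json


def build_log_query(text, level, host_prefix, lat, lon, radius="10km"):
    """Return a bool query mixing query context and filter context.
    Field names (message, level, host, location) are hypothetical."""
    return {
        "query": {
            "bool": {
                # Query context: matching clauses contribute to the relevance score
                "must": [{"match": {"message": text}}],
                # Filter context: yes/no conditions, not scored, cacheable
                "filter": [
                    {"term": {"level": level}},          # exact value
                    {"prefix": {"host": host_prefix}},   # e.g. all "web-" hosts
                    {"geo_distance": {                   # within radius of a point
                        "distance": radius,
                        "location": {"lat": lat, "lon": lon},
                    }},
                ],
            }
        }
    }


if __name__ == "__main__":
    body = build_log_query("disk full", "error", "web-", 52.52, 13.40)
    print(json.dumps(body, indent=2))
```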

Each Elasticsearch shard holds part of the data and is itself a Lucene index. There are two types of shards: primary and replica. Shards are built from multiple segments, which are optimized over time by segment merging. An Elasticsearch node can contain multiple shards. There are four types of nodes: tribe, master, data, and routing. When a node starts, it automatically looks for other existing nodes by using the Zen discovery feature; in other words, Zen discovery enables nodes to form a cluster. The cluster state holds the current information about the cluster, such as its nodes, indices, mappings, and shard allocation.
The figure below illustrates the conceptualization of Elasticsearch.

[Figure: Conceptualization of Elasticsearch]

Key features of Elasticsearch

There are a few key features, described below.

1. Node and Cluster

A single running instance of an Elasticsearch server is called a node. A node stores data and takes part in the cluster's indexing and search capabilities. Elasticsearch can run as a standalone, single-node search server, and a single node can be sufficient for many simple use cases. For fault tolerance and to provide sufficient storage, however, Elasticsearch can run on many cooperating servers. These servers form a cluster, which consists of one or more nodes that share the same cluster name and work together to share data and workload. When nodes are added to or removed from the cluster, the cluster reorganizes itself to distribute the data evenly. In Elasticsearch, nodes can play four types of roles:

  1. master node – The master node acts as the supervisor of the cluster and is responsible for managing it.
  2. data node – The data node stores and indexes documents and performs searches on the indexed documents. Data nodes can be added to the cluster to improve performance or to scale out.
  3. routing/load balancer node – The routing/load balancer node balances the load by routing search and indexing requests to the appropriate nodes. In current versions, this role is called a coordinating-only node.
  4. tribe node – The tribe node can join multiple clusters and thus acts as a bridge between them. Tribe nodes were deprecated in version 5.x and removed in 7.0 in favor of cross-cluster search.
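
The following Python sketch is a deliberately simplified, hypothetical model of how a cluster spreads shards over its nodes. It is not Elasticsearch's real allocation algorithm (in a real cluster the master node decides shard placement, takes many more factors into account, and rebalances incrementally rather than from scratch). It only illustrates two ideas: work is spread over the available nodes, and copies of the same shard are kept on different nodes, which is what makes replicas useful for fault tolerance.

```python
def allocate(nodes, num_primaries, num_replicas):
    """Toy allocator (not Elasticsearch's real algorithm): place every copy of
    every shard round-robin so that no node holds two copies of one shard."""
    assignment = {node: [] for node in nodes}  # node -> [(shard, role), ...]
    unassigned = []                            # copies that have nowhere to go
    for shard in range(num_primaries):
        for copy in range(num_replicas + 1):
            role = "primary" if copy == 0 else "replica"
            if copy >= len(nodes):
                # Every node already holds a copy of this shard, so this copy
                # stays unassigned (a real cluster would report "yellow" health
                # when only replicas are missing).
                unassigned.append((shard, role))
                continue
            # Offsetting by copy puts each copy of a shard on a different node
            node = nodes[(shard + copy) % len(nodes)]
            assignment[node].append((shard, role))
    return assignment, unassigned


if __name__ == "__main__":
    # One node: primaries are placed, replicas cannot be
    print(allocate(["node-1"], 2, 1))
    # More nodes: recomputing spreads the same shards over more machines
    print(allocate(["node-1", "node-2", "node-3"], 2, 1))
```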

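By default, when an Elasticsearch node starts, it searches for other nodes in the same network that have the same cluster name. This mechanism is called Zen discovery. It lets Elasticsearch take care of discovering nodes on the network and binding them into a cluster. (Since version 7.0, Zen discovery has been replaced by a new cluster coordination layer, and nodes find each other through a configured list of seed hosts, set with discovery.seed_hosts.)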

2. Shard

A shard is a physical entity that stores part of an index's data. Each Elasticsearch index consists of one or more shards, made up of primary and replica shards. Because an index is split into shards, its data can be spread across all the nodes in the cluster, and shards can be moved from one node to another when a node fails or a new node joins the cluster. There is no fixed upper limit on the size of a shard in bytes (although a single shard, being a Lucene index, can hold at most about two billion documents); in practice, how large a shard can usefully be depends on the underlying hardware.
There are two types of shards:

  1. Primary shards
  2. Replica shards

Every document is stored in exactly one primary shard. The number of primary shards is configurable when an index is created, but it cannot be changed afterwards without reindexing (or using the split or shrink APIs). Before version 7.0, every index had five primary shards by default; since 7.0, the default is one. A replica shard is a full copy of a primary shard, and the number of replicas can be changed dynamically at any time, even after the index has been created. If a primary shard fails, one of its replicas is automatically promoted to primary. By default, every primary shard has one replica shard.
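
The reason the number of primary shards is fixed follows from how documents are routed. Elasticsearch picks a document's shard with a rule of the form shard = hash(_routing) % number_of_primary_shards, where _routing defaults to the document's _id (newer versions add a routing factor to support the split API, but the principle is the same). The Python sketch below uses zlib.crc32 as a stand-in for the Murmur3 hash Elasticsearch actually uses, so its shard numbers will not match a real cluster. It only shows that changing the primary count would send most existing documents to a different shard, while replicas merely add copies.

```python
import zlib


def shard_for(routing: str, num_primary_shards: int) -> int:
    """Simplified routing rule: shard = hash(_routing) % number_of_primary_shards.
    zlib.crc32 is a stand-in for the Murmur3 hash Elasticsearch really uses."""
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


def moved_documents(doc_ids, old_primaries, new_primaries):
    """How many documents would map to a different shard if the primary count changed."""
    return sum(
        1 for d in doc_ids
        if shard_for(d, old_primaries) != shard_for(d, new_primaries)
    )


def total_shard_copies(num_primaries: int, num_replicas: int) -> int:
    """Every primary plus num_replicas full copies of it."""
    return num_primaries * (1 + num_replicas)


if __name__ == "__main__":
    ids = [f"doc-{i}" for i in range(1000)]
    print("shard of doc-42:", shard_for("doc-42", 5))
    print("documents that move going from 5 to 6 primaries:", moved_documents(ids, 5, 6))
    print("copies for 5 primaries with 1 replica:", total_shard_copies(5, 1))
```

Under a uniform hash, going from 5 to 6 primaries keeps a document on the same shard only when its hash modulo 30 is below 5, so roughly five out of six documents would have to move.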

3. Index

An index in Elasticsearch is a logical namespace for a collection of documents that share common characteristics. An index is divided into smaller pieces, namely shards, and its structure is optimized for fast and efficient full-text search. A cluster can contain any number of indices. Indices are conceptually similar to databases in traditional relational database systems. Whereas a Lucene index is a single unit, an Elasticsearch index consists of many shards, and each shard is itself a Lucene index. In older versions, an index could contain multiple document (mapping) types, each holding many JSON documents; mapping types were deprecated in 7.0 and removed in 8.0, so an index now holds JSON documents directly. Each document contains multiple fields, and each field has an associated data type.
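
As an illustration, the Python sketch below builds the body of a create-index request (PUT /<index>) that sets the shard and replica counts and defines a typeless mapping in which each field has a data type, plus the body for later changing the replica count through the index settings endpoint (PUT /<index>/_settings). The index layout and field names are hypothetical.

```python
import json


def create_index_body(primaries: int = 1, replicas: int = 1) -> dict:
    """Body for PUT /<index>; the layout below is a hypothetical log index."""
    return {
        "settings": {
            "number_of_shards": primaries,   # fixed once the index exists
            "number_of_replicas": replicas,  # can be changed later
        },
        "mappings": {
            "properties": {
                "@timestamp": {"type": "date"},
                "message": {"type": "text"},         # analyzed for full-text search
                "status": {"type": "keyword"},       # exact values for filters/aggregations
                "location": {"type": "geo_point"},   # enables geo filters
            }
        },
    }


def replica_update_body(replicas: int) -> dict:
    """Body for PUT /<index>/_settings to change the replica count dynamically."""
    return {"index": {"number_of_replicas": replicas}}


if __name__ == "__main__":
    print(json.dumps(create_index_body(primaries=3, replicas=1), indent=2))
    print(json.dumps(replica_update_body(2)))
```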

4. Search

Elasticsearch provides search and analysis capabilities, and its search is near real-time. This means that although documents are indexed as soon as they are successfully added to an index, they do not appear in search results until the index is refreshed. Elasticsearch does not refresh indices after every update; instead, it refreshes them at a fixed interval called the refresh interval, which is one second by default. Because refreshing is costly in terms of disk I/O, it can slow down indexing, so increasing the refresh interval before updating a large number of documents is useful. Elasticsearch provides a Search API that supports both GET and POST requests and can search across multiple indices. More complex searches can be expressed with the Query DSL, which combines queries and filters.
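
One common way to apply the refresh-interval advice is to disable periodic refresh during a bulk load, send the documents through the _bulk API, and then restore the interval. The Python sketch below only builds the request bodies and an ordered list of (method, path, body) requests to send with any HTTP client; the index name and documents are hypothetical. Setting refresh_interval to "-1" disables periodic refresh, and "1s" restores the default interval.

```python
import json


def refresh_settings(interval: str) -> dict:
    """Index settings body; "-1" disables periodic refresh, "1s" is the default."""
    return {"index": {"refresh_interval": interval}}


def bulk_body(index: str, docs: dict) -> str:
    """Newline-delimited JSON (NDJSON) body for the _bulk API: each document
    takes an action line followed by its source line."""
    lines = []
    for doc_id, source in docs.items():
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline


def bulk_load_plan(index: str, docs: dict) -> list:
    """Ordered (method, path, body) requests for a bulk load."""
    return [
        ("PUT", f"/{index}/_settings", refresh_settings("-1")),  # pause periodic refresh
        ("POST", "/_bulk", bulk_body(index, docs)),             # send as application/x-ndjson
        ("PUT", f"/{index}/_settings", refresh_settings("1s")),  # restore the default
        ("POST", f"/{index}/_refresh", None),                   # make new docs searchable now
    ]


if __name__ == "__main__":
    for method, path, body in bulk_load_plan("logs", {"1": {"message": "started"}}):
        print(method, path, body)
```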
