In Elasticsearch, a document belongs to a type, and those types live inside an index.
You can draw some (rough) parallels to a traditional relational database:
Relational DB ⇒ Databases ⇒ Tables ⇒ Rows ⇒ Columns
Elasticsearch ⇒ Indices ⇒ Types ⇒ Documents ⇒ Fields
Note: In Elasticsearch 6.0 and later, an index can contain only one type, so each document type needs its own index.
cluster – An Elasticsearch cluster consists of one or more nodes and is identifiable by its cluster name.
node – A single Elasticsearch instance. In most environments, each node runs on a separate box or virtual machine.
index – In Elasticsearch, an index is a collection of documents.
shard – Because Elasticsearch is a distributed search engine, an index is usually split into elements known as shards that are distributed across multiple nodes. Elasticsearch automatically manages the arrangement of these shards. It also rebalances the shards as necessary, so users need not worry about the details.
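replica – A copy of a primary shard, kept on a different node. In versions before 7.0, Elasticsearch by default creates five primary shards and one replica for each index, so each index consists of five primary shards and each of those shards has one copy. From 7.0 onward the default is one primary shard and one replica.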
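To make the shard and replica arithmetic concrete, here is a minimal Python sketch. `number_of_shards` and `number_of_replicas` are the index settings Elasticsearch accepts when an index is created; the helper names `index_settings` and `total_shard_copies` are made up for this illustration, and nothing is sent to a cluster.

```python
import json


def index_settings(primary_shards: int, replicas: int) -> dict:
    """Build a settings body of the kind sent when creating an index."""
    return {
        "settings": {
            "number_of_shards": primary_shards,
            "number_of_replicas": replicas,
        }
    }


def total_shard_copies(primary_shards: int, replicas: int) -> int:
    """Every primary shard has `replicas` copies of itself."""
    return primary_shards * (1 + replicas)


# Five primaries with one replica each, as in the pre-7.0 defaults.
body = index_settings(primary_shards=5, replicas=1)
print(json.dumps(body, indent=2))
print(total_shard_copies(5, 1))  # 5 primaries + 5 replicas = 10
```

The number of primary shards is fixed when the index is created, while the number of replicas can be changed later, which is why the two values are kept as separate settings.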
An index is divided into shards so that its size is not limited by the storage capacity of a single machine.
Sharding is important for two primary reasons:
- It allows you to horizontally split/scale your content volume
- It allows you to distribute and parallelize operations across shards (potentially on multiple nodes) thus increasing performance/throughput
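The horizontal split described above can be sketched in a few lines of Python. Roughly speaking, Elasticsearch picks a shard by hashing a routing value (the document ID by default) and taking the result modulo the number of primary shards. The sketch below assumes `zlib.crc32` as a stand-in hash (Elasticsearch uses its own hash function), and `pick_shard` and `distribute` are hypothetical names, not part of any Elasticsearch API.

```python
import zlib


def pick_shard(routing_key: str, number_of_primary_shards: int) -> int:
    """Map a routing key to a shard number.

    crc32 is only a stand-in for the hash Elasticsearch really uses.
    """
    return zlib.crc32(routing_key.encode("utf-8")) % number_of_primary_shards


def distribute(doc_ids, number_of_primary_shards):
    """Group document IDs by the shard they would land on."""
    shards = {n: [] for n in range(number_of_primary_shards)}
    for doc_id in doc_ids:
        shards[pick_shard(doc_id, number_of_primary_shards)].append(doc_id)
    return shards


# Spread 20 documents over 5 primary shards.
shards = distribute([f"doc-{i}" for i in range(20)], 5)
for number, ids in shards.items():
    print(number, ids)
```

Because the same key always hashes to the same shard, a lookup by ID only needs to visit one shard, while a search can be run on all shards in parallel.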
Elasticsearch also creates replicas of each shard, so that if a node goes offline, a replica is available to take over.
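A toy Python simulation can illustrate why replicas help when a node goes offline. This is only a sketch under simplifying assumptions: the placement rule (primary on one node, replica on the next) and the names `place_shards` and `fail_node` are invented for the example, and the real allocation logic in Elasticsearch is considerably more involved. The one rule it does mirror is that a replica is never placed on the same node as its primary.

```python
def place_shards(nodes, primaries):
    """Put each primary on one node and its replica on the next node.

    Assumes at least two nodes, so primary and replica never share a node.
    """
    layout = {}
    for shard in range(primaries):
        layout[shard] = {
            "primary": nodes[shard % len(nodes)],
            "replica": nodes[(shard + 1) % len(nodes)],
        }
    return layout


def fail_node(layout, dead_node):
    """Return the layout after losing a node, promoting replicas as needed."""
    after = {}
    for shard, copies in layout.items():
        if copies["primary"] == dead_node:
            # The surviving replica becomes the new primary.
            after[shard] = {"primary": copies["replica"], "replica": None}
        elif copies["replica"] == dead_node:
            # The primary survives but has no copy for now.
            after[shard] = {"primary": copies["primary"], "replica": None}
        else:
            after[shard] = dict(copies)
    return after


layout = place_shards(["node-a", "node-b", "node-c"], primaries=5)
after = fail_node(layout, "node-a")
for shard, copies in after.items():
    print(shard, copies)
```

After the failure every shard still has a primary on a live node, so no data is lost. The sketch stops there; it does not model the creation of new replicas on the remaining nodes.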
To add data to Elasticsearch, we need an index—a place to store related data. In reality, an index is just a logical namespace that points to one or more physical shards. A shard is a low-level worker unit that holds just a slice of all the data in the index.
In other words, Elasticsearch stores and indexes our data in shards, while an index is just a logical namespace that groups together one or more shards. However, this is an internal detail; our application shouldn’t care about shards at all. As far as our application is concerned, our documents live in an index. Elasticsearch takes care of the details.
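The idea of an index as a logical namespace over shards can be sketched as a small in-memory class. This is not how Elasticsearch is implemented; `ToyIndex` and its methods are hypothetical, and `zlib.crc32` again stands in for the real routing hash. The point is only that the caller talks to the index and never names a shard.

```python
import zlib


class ToyIndex:
    """An index that hides its shards behind put/get."""

    def __init__(self, name, number_of_shards=5):
        self.name = name
        # Each shard holds only a slice of the documents.
        self._shards = [{} for _ in range(number_of_shards)]

    def _shard_for(self, doc_id):
        # Stand-in routing: hash the ID, then take it modulo the shard count.
        position = zlib.crc32(doc_id.encode("utf-8")) % len(self._shards)
        return self._shards[position]

    def put(self, doc_id, document):
        self._shard_for(doc_id)[doc_id] = document

    def get(self, doc_id):
        return self._shard_for(doc_id).get(doc_id)

    def count(self):
        return sum(len(shard) for shard in self._shards)


blogs = ToyIndex("blogs")
blogs.put("1", {"title": "Hello shards"})
blogs.put("2", {"title": "Replicas explained"})
print(blogs.get("1"))
print(blogs.count())
```

The application code at the bottom only uses `put`, `get` and `count`; which shard holds which document stays an internal detail of the class.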