Lambda Architecture Overview:
Lambda Architecture (LA) is a scalable and fault-tolerant data processing architecture.
A few years back, Big Data analysis was done only through batch processing using Hadoop. The evolution of Big Data technologies has since made real-time analysis possible. One approach to getting real-time data for analytics is the Lambda Architecture.
The underlying motivations for building systems with the Lambda Architecture are:
- The need for a robust system that is fault-tolerant, both against hardware failures and human mistakes.
- To serve a wide range of workloads and use cases in which low-latency reads and updates are required. Related to this point, the system should support ad hoc queries.
- The system should be linearly scalable, and it should scale out rather than up, meaning that adding more machines should be enough to handle more load.
- The system should be extensible so that features can be added easily, and it should be easy to debug and require minimal maintenance.
Essentially, the Lambda Architecture comprises the following components, processes, and responsibilities:
- New data: All data entering the system is dispatched to both the batch layer and the speed layer for processing.
- Batch layer: This layer has two functions: (i) managing the master dataset, an immutable, append-only set of raw data, and (ii) pre-computing the results of arbitrary query functions, called batch views. Hadoop is typically used here: HDFS stores the master dataset and MapReduce computes the batch views.
- Serving layer: This layer indexes the batch views so that they can be queried ad hoc with low latency. Technologies such as Apache HBase or ElephantDB are usually used to implement the serving layer. The Apache Drill project provides the capability to execute full ANSI SQL 2003 queries against batch views.
- Speed layer: This layer compensates for the high latency of updates to the serving layer caused by the batch layer. Using fast, incremental algorithms, the speed layer deals with recent data only. Storm is often used to implement this layer.
- Queries: Last but not least, any incoming query can be answered by merging results from the batch views and the real-time views.
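To make the flow between the layers concrete, here is a minimal in-memory sketch in Python. It is only an illustration under simplifying assumptions: the class and method names (LambdaSketch, ingest, run_batch, query) are hypothetical, plain Python containers stand in for HDFS, MapReduce, HBase and Storm, and the query function is a simple count per key.

```python
from collections import Counter


class LambdaSketch:
    """Toy model of the Lambda Architecture; all names here are hypothetical."""

    def __init__(self):
        # Batch layer: append-only master dataset of raw events.
        self.master_dataset = []
        # Serving layer: precomputed batch view, indexed by key.
        self.batch_view = Counter()
        # Speed layer: incrementally updated view over recent data only.
        self.realtime_view = Counter()

    def ingest(self, event):
        """New data is dispatched to both the batch layer and the speed layer."""
        self.master_dataset.append(event)
        self.realtime_view[event] += 1

    def run_batch(self):
        """Recompute the batch view from scratch over the whole master dataset.

        Simplification: the batch run is treated as instantaneous, so the
        real-time view can be cleared entirely once the run finishes. A real
        system would keep events that arrived while the batch job was running.
        """
        self.batch_view = Counter(self.master_dataset)
        self.realtime_view.clear()

    def query(self, key):
        """Answer a query by merging the batch view and the real-time view."""
        return self.batch_view[key] + self.realtime_view[key]


system = LambdaSketch()
for page in ["home", "home", "about"]:
    system.ingest(page)

before_batch = system.query("home")  # served by the speed layer only
system.run_batch()                   # batch view now covers all three events
system.ingest("home")                # new event, visible via the speed layer
after_batch = system.query("home")   # batch count plus real-time count
```

The point of the sketch is that the master dataset is never modified in place, so a bug in the view logic can be fixed by rerunning the batch computation, while the speed layer only has to cover the data that the latest batch run has not seen yet.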
Keywords: Lambda, Hadoop, Big Data
