Friday, 19 August 2016

Lambda Architecture Overview:

Lambda Architecture (LA) is a scalable and fault-tolerant data processing architecture.

A few years back, Big Data analysis was done only through batch processing with Hadoop. The evolution of Big Data technologies has since made real-time Big Data analysis possible. One approach to getting real-time data for analytics is the Lambda Architecture.

The underlying motivations for building systems with the Lambda Architecture are:

  1. The need for a robust system that is fault-tolerant, both against hardware failures and human mistakes.
  2. The need to serve a wide range of workloads and use cases in which low-latency reads and updates are required. Related to this point, the system should support ad hoc queries.
  3. The system should be linearly scalable, and it should scale out rather than up, meaning that adding more machines will do the job.
  4. The system should be extensible so that features can be added easily, and it should be easy to debug and require minimal maintenance.

Essentially, the Lambda Architecture comprises the following components, processes, and responsibilities:

· New Data: All data entering the system is dispatched to both the batch layer and the speed layer for processing.

· Batch layer: This layer has two functions: (i) managing the master dataset, an immutable, append-only set of raw data, and (ii) pre-computing arbitrary query functions, called batch views. Hadoop is typically used here: HDFS stores the master dataset, and MapReduce computes the batch views.

· Serving layer: This layer indexes the batch views so that they can be queried ad hoc with low latency. The serving layer is usually implemented with technologies such as Apache HBase or ElephantDB. The Apache Drill project provides the capability to execute full ANSI SQL 2003 queries against batch views.

· Speed layer: This layer compensates for the high latency of updates to the serving layer, which are slow because they come from the batch layer. Using fast, incremental algorithms, the speed layer deals with recent data only. Apache Storm is often used to implement this layer.
· Queries: Last but not least, any incoming query can be answered by merging results from the batch views and the real-time views.
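To make the data flow above concrete, here is a minimal, single-process sketch in Python. It is only an illustration under stated assumptions, not how Hadoop, HBase or Storm work internally: the page-view counting use case, the Event schema and the LambdaSketch class are all hypothetical. It shows new data going to both layers, the batch view being recomputed from the immutable master dataset, and a query merging the batch and real-time views.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    """One immutable fact, e.g. a page view (hypothetical schema)."""
    url: str
    timestamp: int


@dataclass
class LambdaSketch:
    # Batch layer: the master dataset is append-only raw data.
    master_dataset: list[Event] = field(default_factory=list)
    # Serving layer: the batch view, indexed by URL for fast lookups.
    batch_view: Counter = field(default_factory=Counter)
    # Speed layer: incremental counts for data not yet in the batch view.
    realtime_view: Counter = field(default_factory=Counter)

    def ingest(self, event: Event) -> None:
        """New data is dispatched to both the batch layer and the speed layer."""
        self.master_dataset.append(event)   # never updated in place
        self.realtime_view[event.url] += 1  # fast incremental update

    def run_batch(self) -> None:
        """Recompute the batch view from scratch over the whole master dataset."""
        self.batch_view = Counter(e.url for e in self.master_dataset)
        # Assumption: in this single-threaded sketch no events arrive during
        # the batch run, so everything the speed layer held is now covered by
        # the batch view and its state can simply be discarded.
        self.realtime_view = Counter()

    def query(self, url: str) -> int:
        """Answer a query by merging the batch view and the real-time view."""
        return self.batch_view[url] + self.realtime_view[url]


la = LambdaSketch()
for t, url in enumerate(["/home", "/about", "/home"]):
    la.ingest(Event(url, t))
la.run_batch()                 # batch view now holds /home=2, /about=1
la.ingest(Event("/home", 3))   # so far only the speed layer reflects this event
print(la.query("/home"))       # prints 3 (2 from the batch view + 1 real-time)
```

Recomputing the batch view from the raw master dataset, instead of updating it in place, is what gives the architecture its tolerance to human mistakes: a buggy view can be fixed by correcting the code and re-running the batch. The speed layer only has to cover the gap until the next batch run finishes.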

Keywords: Lambda, Hadoop, Big Data
