Monday, 15 August 2016

Hadoop Operational Security Architecture & Implementation

Hadoop Operational Security Architecture:
By default, Hadoop runs in non-secure mode: there is no authentication at all, so we need to configure Hadoop's security model ourselves. Considering a typical Hadoop cluster running the usual ecosystem tools with day-to-day operations, how can we secure our clusters and data? Let's discuss this in detail.




[Figure: Hadoop Cluster Operational Security Architecture]


Cluster Security:
Hadoop cluster security, like any cluster security, is achieved through authentication, authorization, encryption, key management, and audit logging. When a Hadoop cluster is configured in secure mode, every user and service must be authenticated by Kerberos.

Authentication:
Kerberos: Kerberos is an authentication service that verifies a client's identity. Rather than sending the client's password across the network, it authenticates using encrypted tickets, so credentials are never exposed in transit. We can integrate existing LDAP/AD authentication systems with Kerberos. Kerberos is an essential step for user authentication, but it is not sufficient on its own, as it cannot hide cluster entry points or block access at the perimeter. Apache Knox is built to bridge these gaps with perimeter-level security.

Knox: The Apache Knox Gateway is a REST API Gateway for interacting with Apache Hadoop clusters. Knox provides authorization, authentication, SSL, and SSO capabilities to enable a single access point for Hadoop.
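For example, once Knox is in place, clients reach WebHDFS through the gateway rather than talking to the NameNode directly. A minimal sketch, assuming the gateway runs on a host called knox-host with the sample 'default' topology and the guest account from Knox's demo LDAP (all of these are placeholder values; substitute your own):

curl -ik -u guest:guest-password "https://knox-host:8443/gateway/default/webhdfs/v1/?op=LISTSTATUS"

Only the gateway's port 8443 is exposed to the outside world; the internal NameNode and DataNode addresses stay hidden behind it.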

Authorization & Audits:
Ranger: Apache Ranger provides a framework for central administration of security policies and monitoring of user access, covering audit logging, key management, and fine-grained data access policies across HDFS, Hive, YARN, Solr, Kafka, and other components in the Hadoop cluster.


Data Protection:
HDFS Encryption: Hadoop supports encrypting data both in transit and at rest. Network traffic flowing into and through the cluster over RPC, HTTP, Data Transfer Protocol (DTP), and JDBC can be encrypted to provide privacy for data movement. Data at rest is encrypted transparently by HDFS, with keys managed by the Hadoop Key Management Server (KMS) or an integrated third-party key management system.
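As a quick illustration of encryption at rest, here is a minimal sketch of creating an HDFS encryption zone. It assumes a KMS is already configured, and the key name and path are made up for the example:

hadoop key create kpi_key
hdfs dfs -mkdir /secure_zone
hdfs crypto -createZone -keyName kpi_key -path /secure_zone
hdfs crypto -listZones

Files written under /secure_zone are then encrypted and decrypted transparently for authorized clients; users never handle the keys directly.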

Here are the high-level steps to install Kerberos and integrate it into a Hadoop cluster using Apache Ambari:
1. Install the KDC server: yum install krb5-server krb5-libs krb5-workstation

2. Edit /etc/krb5.conf, change the realm, and copy the file over to /var/lib/ambari-server/resources/scripts/krb5.conf (a sample realm configuration is shown below).
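For reference, the realm-related sections of /etc/krb5.conf look roughly like this; KPI.COM and kdc.kpi.com are example values (the realm matches the ACL example later in this post) and should be replaced with your own realm and KDC host:

[libdefaults]
  default_realm = KPI.COM

[realms]
  KPI.COM = {
    kdc = kdc.kpi.com
    admin_server = kdc.kpi.com
  }

[domain_realm]
  .kpi.com = KPI.COM
  kpi.com = KPI.COM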


3. Create the KDC database (the -s flag stores the master key in a stash file, so the KDC can start without prompting for it):
kdb5_util create -s

4. Start the KDC services:
/etc/rc.d/init.d/krb5kdc start
/etc/rc.d/init.d/kadmin start

5. Create the admin principal:
kadmin.local -q "addprinc admin/admin"

6. Edit the KDC ACL:
vi /var/kerberos/krb5kdc/kadm5.acl and set the realm to match krb5.conf, for example: */admin@KPI.COM *

7. Restart the KDC admin service:
/etc/rc.d/init.d/kadmin restart


8. Now log in to the Ambari web UI as an administrator and enable Kerberos:



9. Select 'Existing MIT KDC', since we have already configured Kerberos ourselves.


10. Enter your KDC details (KDC host, realm name, and the admin principal and password created above):




11. Download the CSV to review the principals and keytabs that Kerberos was enabled for.
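Once the wizard completes, it is worth verifying that authentication actually works. A minimal check from any cluster node, using the admin/admin principal created in step 5:

kinit admin/admin
klist
hdfs dfs -ls /

kinit obtains a ticket, klist displays it, and the HDFS command should now succeed only while a valid ticket is held (after running kdestroy it should fail with an authentication error).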

HDFS Directory/File Security through Access Control Lists (ACLs):

To set up ACLs, we need to enable them in hdfs-site.xml by setting the dfs.namenode.acls.enabled property to 'true', then restart the NameNode.
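The property in hdfs-site.xml looks like this:

<property>
  <name>dfs.namenode.acls.enabled</name>
  <value>true</value>
</property>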



1. Granting access to another user:

hdfs dfs -setfacl -m user:kpi:rwx /kpi_dw

Check the result by running: hdfs dfs -getfacl /kpi_dw
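The output should look roughly like the following; the owner, group, and base permissions shown here are assumptions and will differ on your cluster:

# file: /kpi_dw
# owner: hdfs
# group: hdfs
user::rwx
user:kpi:rwx
group::r-x
mask::rwx
other::---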



2. Granting access to another group:

hdfs dfs -setfacl -m group:kpi_group:r-- /kpi_dw

Check the result by running: hdfs dfs -getfacl /kpi_dw


3. Default ACL with automatic propagation to children:
hdfs dfs -setfacl -m default:group:kpi_group:r-x /kpi_dw

A 'default' ACL entry on a directory is applied automatically to any new files and subdirectories created inside it (existing children are not changed). Check the behaviour by running:
a. hdfs dfs -mkdir /kpi_dw/abc_project
b. hdfs dfs -getfacl /kpi_dw/abc_project
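The new child directory should show the inherited entries, roughly like this (owner, base permissions, and the effect of your umask are assumptions):

# file: /kpi_dw/abc_project
# owner: hdfs
# group: hdfs
user::rwx
group::r-x
group:kpi_group:r-x
mask::r-x
other::---
default:user::rwx
default:group::r-x
default:group:kpi_group:r-x
default:mask::r-x
default:other::---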


4. Removing/blocking access:

hdfs dfs -setfacl -m user:kpi:--- /kpi_dw
Check the result by running: hdfs dfs -getfacl /kpi_dw
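Note that the command above keeps the ACL entry in place but denies all permissions through it. To delete the named entry itself, or to strip every ACL entry from a path, setfacl also provides the -x and -b flags:

hdfs dfs -setfacl -x user:kpi /kpi_dw
hdfs dfs -setfacl -b /kpi_dw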


 The rest of the security implementation will be available soon. 

