Securing Hadoop with Kerberos

Recently I worked on securing our 10-node Hadoop cluster with Kerberos. The cluster consists of CentOS 6.4 servers running the CDH 5.0 distribution. Instead of using our company-wide Active Directory running on Windows (I didn't want to risk disrupting data for 32,000 users), I decided to run a Kerberos Key Distribution Center (KDC) and realm local to our Hadoop cluster. The plan was to configure CDH to work with this local KDC and initially create all Hadoop service principals in this realm. After that, I would create a unidirectional cross-realm trust between the local KDC and the company-wide Active Directory.

Cloudera provides excellent documentation on securing a Hadoop cluster. What was missing was detailed information on setting up and configuring a KDC on CentOS, which I cover in this post.

For this post, assume that our organization's Active Directory realm is COMPANYNYC.COM and the realm of the local, dedicated Kerberos KDC used by the Hadoop cluster is HADOOP-DEV.COMPANYNYC.COM. Also assume that the organization-level AD runs on adserver.companynyc.com and that the Kerberos KDC is installed on nyc1lpsitest001.companynyc.com. Finally, the ten CentOS 6.2 machines that make up the Hadoop cluster have the FQDNs nyc1lpsitest001.companynyc.com, nyc1lpsitest002.companynyc.com, nyc1lpsitest003.companynyc.com, and so on through nyc1lpsitest010.companynyc.com.
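For the one-way trust mentioned earlier, users authenticated by COMPANYNYC.COM need tickets for Hadoop services in HADOOP-DEV.COMPANYNYC.COM. To support this, both KDCs must hold the same key for the cross-realm principal krbtgt/HADOOP-DEV.COMPANYNYC.COM@COMPANYNYC.COM. That means the trust password and encryption types must agree on both sides. The short Python sketch below only shows how that principal name is formed from the two realms assumed above and how it would be created with kadmin.local on the local KDC. The helper names are hypothetical, and the AD-side half of the trust (configured on the Windows server) is not covered.

```python
# Realm names taken from the assumptions above.
LOCAL_REALM = "HADOOP-DEV.COMPANYNYC.COM"  # realm of the dedicated Hadoop KDC
AD_REALM = "COMPANYNYC.COM"                # organization-wide Active Directory realm


def trust_principal(service_realm, client_realm):
    """Cross-realm TGT principal that lets principals from client_realm
    obtain service tickets in service_realm (one direction only)."""
    return f"krbtgt/{service_realm}@{client_realm}"


def kadmin_addprinc(principal):
    """kadmin.local command that creates the principal on the local KDC.
    Without -randkey or -pw, addprinc prompts for a password; it has to
    match the trust password configured on the AD side."""
    return f'kadmin.local -q "addprinc {principal}"'


if __name__ == "__main__":
    tgt = trust_principal(LOCAL_REALM, AD_REALM)
    print(tgt)  # krbtgt/HADOOP-DEV.COMPANYNYC.COM@COMPANYNYC.COM
    print(kadmin_addprinc(tgt))
```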

To install the KDC on CentOS, I followed the guide at http://major.io/2012/02/05/the-kerberos-haters-guide-to-installing-kerberos/. Although that guide uses two CentOS 5.5 machines, the instructions worked well on CentOS 6.2. A few helpful notes:
– The command to manually set the NIS domain is ‘nisdomainname’, not ‘nisdomain’.
– When setting up the new KDC, use HADOOP-DEV.COMPANYNYC.COM as the realm.
– You might see errors from the ‘make -C /var/yp’ command, but I don't think they matter for this install.
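The referenced guide walks through the full KDC install. The Python below is a rough sketch of the core MIT Kerberos steps for this realm on CentOS 6, and it simply prints the commands in order along with the kadm5.acl entry. It assumes that the krb5-server package is installed and that /var/kerberos/krb5kdc/kdc.conf already lists HADOOP-DEV.COMPANYNYC.COM under [realms]. It is not taken from the guide itself. The admin/admin principal name is an assumption, so use whatever admin naming your site prefers.

```python
REALM = "HADOOP-DEV.COMPANYNYC.COM"  # local realm used throughout this post


def kdc_bootstrap_commands(realm):
    """Shell commands, in order, to initialize and start an MIT KDC on CentOS 6."""
    return [
        # Create the KDC database; -s writes a stash file so krb5kdc can start unattended.
        f"kdb5_util create -s -r {realm}",
        # Create an admin principal (kadmin.local prompts for its password).
        f'kadmin.local -q "addprinc admin/admin@{realm}"',
        # Start the KDC and the kadmin service, then enable both at boot (SysV init).
        "service krb5kdc start",
        "service kadmin start",
        "chkconfig krb5kdc on",
        "chkconfig kadmin on",
    ]


def kadm5_acl_entry(realm):
    """Line for /var/kerberos/krb5kdc/kadm5.acl granting full rights to */admin principals."""
    return f"*/admin@{realm} *"


if __name__ == "__main__":
    for cmd in kdc_bootstrap_commands(REALM):
        print(cmd)
    print(kadm5_acl_entry(REALM))
```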

Once the dedicated Kerberos KDC is running, configure /etc/krb5.conf on every machine in the cluster so that it sets HADOOP-DEV.COMPANYNYC.COM as the default realm and names the dedicated KDC host. For our 10-node cluster, krb5.conf looked roughly like this:

—————————————————Contents of krb5.conf—————————————————-
[logging]
default = FILE:/var/log/krb5libs.log
kdc = FILE:/var/log/krb5kdc.log
admin_server = FILE:/var/log/kadmind.log

[libdefaults]
default_realm = HADOOP-DEV.COMPANYNYC.COM
dns_lookup_realm = true
dns_lookup_kdc = true
ticket_lifetime = 24h
renew_lifetime = 7d
forwardable = true

[realms]
# Realm for the local dedicated KDC used by the Hadoop cluster
HADOOP-DEV.COMPANYNYC.COM = {
kdc = nyc1lpsitest001.companynyc.com
admin_server = nyc1lpsitest001.companynyc.com
default_domain = hadoop-dev.companynyc.com
}

# Realm for the organization-wide Active Directory
COMPANYNYC.COM = {
kdc = adserver.companynyc.com:88
admin_server = adserver.companynyc.com
default_domain = companynyc.com
}

[domain_realm]
nyc1lpsitest001.companynyc.com = HADOOP-DEV.COMPANYNYC.COM
nyc1lpsitest002.companynyc.com = HADOOP-DEV.COMPANYNYC.COM
nyc1lpsitest003.companynyc.com = HADOOP-DEV.COMPANYNYC.COM
nyc1lpsitest004.companynyc.com = HADOOP-DEV.COMPANYNYC.COM
nyc1lpsitest005.companynyc.com = HADOOP-DEV.COMPANYNYC.COM
nyc1lpsitest006.companynyc.com = HADOOP-DEV.COMPANYNYC.COM
nyc1lpsitest007.companynyc.com = HADOOP-DEV.COMPANYNYC.COM
nyc1lpsitest008.companynyc.com = HADOOP-DEV.COMPANYNYC.COM
nyc1lpsitest009.companynyc.com = HADOOP-DEV.COMPANYNYC.COM
nyc1lpsitest010.companynyc.com = HADOOP-DEV.COMPANYNYC.COM
hadoop-dev.companynyc.com = HADOOP-DEV.COMPANYNYC.COM
adserver.companynyc.com = COMPANYNYC.COM

[kdc]
profile = /var/kerberos/krb5kdc/kdc.conf

[appdefaults]
pam = {
validate = true
debug = false
ticket_lifetime = 36000
renew_lifetime = 36000
forwardable = true
krb4_convert = false
}
————————————————————————————————————————————
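Typing ten nearly identical [domain_realm] lines by hand invites typos. A small Python sketch like the one below can generate the per-host entries. It assumes the nyc1lpsitest001 through nyc1lpsitest010 naming used in the file above, and the function names are hypothetical. The hadoop-dev.companynyc.com and Active Directory mappings would still be appended by hand.

```python
REALM = "HADOOP-DEV.COMPANYNYC.COM"
DOMAIN = "companynyc.com"


def cluster_hosts(count=10, prefix="nyc1lpsitest", domain=DOMAIN):
    """FQDNs nyc1lpsitest001 ... nyc1lpsitest010 (host number zero-padded to three digits)."""
    return [f"{prefix}{i:03d}.{domain}" for i in range(1, count + 1)]


def domain_realm_section(hosts, realm):
    """Render a [domain_realm] section mapping each host to the given realm."""
    lines = ["[domain_realm]"]
    lines.extend(f"{host} = {realm}" for host in hosts)
    return "\n".join(lines)


if __name__ == "__main__":
    print(domain_realm_section(cluster_hosts(), REALM))
```

Listing each host explicitly mirrors the file above. A single domain-wide entry for companynyc.com would be shorter, but it would also map every other unlisted host in that domain to the Hadoop realm.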
At this point, with the local KDC set up and krb5.conf configured on every machine in the cluster, you are ready to secure your CDH Hadoop cluster by following the detailed instructions Cloudera provides at https://www.cloudera.com/content/cloudera/en/documentation/cdh4/v4-3-1/CDH4-Security-Guide/cdh4sg_topic_3.html.
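Cloudera's guide has you create service principals of the form service/fully.qualified.domain.name@REALM for every host and then export them into keytabs. With ten hosts, that means a lot of repetitive kadmin typing. A Python sketch like the following can print the commands so you can review them before running them on the KDC. The service list (hdfs, mapred, yarn, HTTP) and the per-host keytab file names are assumptions. Check the guide for the exact principals and keytab layout each role needs in your CDH version.

```python
REALM = "HADOOP-DEV.COMPANYNYC.COM"
HOSTS = [f"nyc1lpsitest{i:03d}.companynyc.com" for i in range(1, 11)]
# Assumed service set; keep only the services whose roles actually run on the cluster.
SERVICES = ("hdfs", "mapred", "yarn", "HTTP")


def principal(service, host, realm=REALM):
    """Hadoop service principal of the form service/fqdn@REALM."""
    return f"{service}/{host}@{realm}"


def addprinc_commands(hosts=HOSTS, services=SERVICES, realm=REALM):
    """One kadmin.local command per service per host; -randkey assigns a random key."""
    return [
        f'kadmin.local -q "addprinc -randkey {principal(s, h, realm)}"'
        for h in hosts
        for s in services
    ]


def hdfs_keytab_command(host, realm=REALM):
    """Export the hdfs and HTTP keys for one host into a keytab.
    -norandkey keeps the existing keys and is only accepted by kadmin.local.
    The hdfs-<shortname>.keytab file name is a hypothetical convention."""
    short_name = host.split(".")[0]
    hdfs_p = principal("hdfs", host, realm)
    http_p = principal("HTTP", host, realm)
    return f'kadmin.local -q "xst -norandkey -k hdfs-{short_name}.keytab {hdfs_p} {http_p}"'


if __name__ == "__main__":
    for cmd in addprinc_commands():
        print(cmd)
    for host in HOSTS:
        print(hdfs_keytab_command(host))
```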
