My last post was about a set of Ansible playbooks that can be used to deploy a TLS-secured Docker Swarm. Our goal with our first production Swarm was to deploy a NiFi cluster for a few of our big data ingestion projects. Fortunately, one of the Hortonworks engineers had already developed a great base repository for deploying NiFi or Hortonworks Data Flow (HDF) on a Swarm, and it can easily be deployed on top of our Ansible-deployed Swarm. To get up and running quickly, deploy the Swarm, download the docker-compose.yml file, and run the following commands, replacing SWARM-MANAGER-ADDR with the address of your Swarm manager:
- cd nifi-cluster
- docker-compose -H SWARM-MANAGER-ADDR:3376 --tls --tlscacert /var/docker/certs/ca.pem --tlscert /var/docker/certs/servercert.pem --tlskey /var/docker/certs/serverkey.pem pull
- docker-compose -H SWARM-MANAGER-ADDR:3376 --tls --tlscacert /var/docker/certs/ca.pem --tlscert /var/docker/certs/servercert.pem --tlskey /var/docker/certs/serverkey.pem up -d
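Because the same TLS options repeat on every command, it can help to wrap them in a small shell function. This is a minimal sketch, not something from the original repository: the function name swarm_compose and the default host name swarm-manager.example.com are hypothetical, while the port and certificate paths are the ones used in the commands above.

```shell
#!/bin/sh
# Sketch: wrap the repeated TLS options in one function.
# ASSUMPTION: swarm-manager.example.com is a hypothetical default; override SWARM_MANAGER_ADDR.
SWARM_MANAGER_ADDR="${SWARM_MANAGER_ADDR:-swarm-manager.example.com}"
# Directory holding ca.pem, servercert.pem and serverkey.pem
CERT_DIR="${CERT_DIR:-/var/docker/certs}"

swarm_compose() {
  # Pass every argument (pull, up -d, ps, ...) through to docker-compose,
  # talking to the Swarm manager on port 3376 over TLS.
  docker-compose -H "${SWARM_MANAGER_ADDR}:3376" --tls \
    --tlscacert "${CERT_DIR}/ca.pem" \
    --tlscert "${CERT_DIR}/servercert.pem" \
    --tlskey "${CERT_DIR}/serverkey.pem" \
    "$@"
}
```

After sourcing the file and setting SWARM_MANAGER_ADDR, the two commands become swarm_compose pull and swarm_compose up -d.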
The pull command downloads the Docker image, and the up command deploys a node manager, a worker node, and an acquisition node. Additional information about the compose file can be found in the original repository's readme. Depending on your use case, the acquisition node may not be necessary. This library worked very well, except that nodes could not be restarted because of a config file setting (we have submitted a pull request to patch this). The other issue was that, while the setup is great for development, we required a Kerberos-based login for the nodes.
We used the docker-nifi base library and extended it to include the Kerberos client libraries and user login modules. This fork can be found at: https://github.com/wadeschulz/docker-nifi-kerberos
To use docker-nifi-kerberos, users need to build their own Docker image so that the krb5.conf file can be included in the image (this could be updated in the future to allow a volume mount). First, clone the repository and copy your krb5.conf file to the docker subdirectory. Build and push your image to a Docker repository, then configure the docker-compose.yml file.
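The copy, build, and push steps can be sketched as a shell function. This is a minimal sketch, not part of the docker-nifi-kerberos repository: it assumes you run it from the root of your clone, that the Dockerfile sits in the docker/ subdirectory next to krb5.conf, and that the image name you pass (for example the hypothetical registry.example.com/nifi-kerberos:latest) points at a registry you can push to.

```shell
#!/bin/sh
# Sketch: bake krb5.conf into a docker-nifi-kerberos image, then build and push it.
# Run from the root of a clone of https://github.com/wadeschulz/docker-nifi-kerberos
# ASSUMPTION: the Dockerfile lives in the docker/ subdirectory.
# Usage (hypothetical helper): build_and_push_nifi_image IMAGE [KRB5_CONF]
build_and_push_nifi_image() {
  image="$1"                        # full image name, including registry and tag
  krb5_conf="${2:-/etc/krb5.conf}"  # Kerberos client config to include in the image

  if [ -z "$image" ]; then
    echo "usage: build_and_push_nifi_image IMAGE [KRB5_CONF]" >&2
    return 1
  fi
  if [ ! -f "$krb5_conf" ]; then
    echo "krb5.conf not found: $krb5_conf" >&2
    return 1
  fi

  # Copy the config into the build context, build the image, and push it.
  cp "$krb5_conf" docker/krb5.conf &&
    docker build -t "$image" docker/ &&
    docker push "$image"
}
```

For example, build_and_push_nifi_image registry.example.com/nifi-kerberos:latest /etc/krb5.conf; the name you pass here is the one the compose file will later need to reference.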
- Since we were targeting a high-throughput environment, we use environment variables to set the minimum and maximum Java heap sizes, which helps prevent NiFi out-of-memory errors and garbage collection issues. For moderate-size systems, we recommend 4-8 GB or more.
- Several volumes were added to each node to map host directories containing Kerberos keytab files, HDP configuration files, and any scripts that NiFi needs to access. These files must be available on every Swarm node that the NiFi nodes could be deployed to; we use an Ansible-based deployment to keep these folders in sync.
- KRB_REALM sets the default Kerberos realm, and NIFI_ADMIN must be set to the first user of the NiFi interface.
- NIFI_KEY_PASS and certificates are needed to enable SSL. If you try to launch the containers without certificates, your browser will report that the server “unexpectedly closed the connection” or show a similar message. To generate certificates, go to https://www.tinycert.org, create a certificate authority, and then create a certificate request. Download the PKCS12 archive and place it in the volume mapped to /etc/security/nifi/certs. When you use tinycert.org, the PKCS12 archive is automatically password-protected with your tinycert.org account password. Either set NIFI_KEY_PASS to that password or re-export the PKCS12 archive with a new password.
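Re-exporting the PKCS12 archive under a new password can be done with the openssl command-line tool. The function below is a minimal sketch, assuming openssl is installed; the function name and argument order are hypothetical. It decrypts the archive to a temporary PEM file, re-exports it under the new password (the value you would then set as NIFI_KEY_PASS), and deletes the unencrypted copy. If OpenSSL 3.x refuses to read an archive that uses older encryption algorithms, adding -legacy to the first openssl command may help.

```shell
#!/bin/sh
# Sketch: re-encrypt a PKCS12 archive under a new password (for NIFI_KEY_PASS).
# Usage (hypothetical helper): repassword_pkcs12 IN.p12 OUT.p12 OLD_PASS NEW_PASS
repassword_pkcs12() {
  in_p12="$1"; out_p12="$2"; old_pass="$3"; new_pass="$4"

  # Temporary file for the decrypted key and certificates (mktemp creates it mode 0600).
  tmp_pem=$(mktemp) || return 1

  # First command: decrypt the archive to PEM (-nodes leaves the private key unencrypted).
  # Second command: re-export key and certificates as PKCS12 under the new password.
  # Note: pass:... arguments are visible in the process list; openssl also accepts env:VAR.
  openssl pkcs12 -in "$in_p12" -passin "pass:$old_pass" -nodes -out "$tmp_pem" &&
    openssl pkcs12 -export -in "$tmp_pem" -out "$out_p12" -passout "pass:$new_pass"
  status=$?

  # Always remove the unencrypted copy of the private key.
  rm -f "$tmp_pem"
  return "$status"
}
```

For example, repassword_pkcs12 tinycert.p12 nifi.p12 "$OLD_PASS" "$NIFI_KEY_PASS" writes nifi.p12 protected by the same password the container expects.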
After completing these steps, update the docker-compose file to point to your new Docker repository/image (currently lines 11, 34, and 53 of docker-compose.yml; replace aperepel/nifi with your image name). Then repeat the pull and up commands to launch your new Kerberos-aware cluster.
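The image-name replacement can be scripted with sed. This is a minimal sketch with a hypothetical function name, assuming the compose file refers to the image as an unquoted aperepel/nifi, optionally followed by a tag; the whole reference, tag included, is replaced, so the new name should carry the tag you pushed. It keeps a .bak copy of the original file and prints the image lines so you can confirm that all three services were updated.

```shell
#!/bin/sh
# Sketch: point the compose file at your own image instead of aperepel/nifi.
# Usage (hypothetical helper):
#   use_custom_image docker-compose.yml registry.example.com/nifi-kerberos:latest
use_custom_image() {
  compose_file="$1"; new_image="$2"

  # Replace aperepel/nifi plus any tag (e.g. :latest) with the new image name.
  # -i.bak edits in place and keeps the original as <file>.bak.
  # The | delimiter avoids clashing with the / in image names.
  sed -i.bak "s|aperepel/nifi[^[:space:]]*|${new_image}|g" "$compose_file" || return 1

  # Print the image lines with line numbers so you can check every service changed.
  grep -n 'image:' "$compose_file"
}
```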