Friday, January 31, 2020

Hands-on MKE (MapR Kubernetes Ecosystem) 1.0 release


【Note: MKE is no longer available, so the MapR Doc links below are invalid.
HPE Ezmeral Container Platform is where the Kubernetes operators are made available.】

MKE (MapR Kubernetes Ecosystem) 1.0 has been released.
This release essentially brings Spark and Drill into the Kubernetes environment.
Below is the architecture from the documentation Operators and Compute Spaces

This article shares the step-by-step commands used to install and configure an MKE 1.0 environment.


Environment:
MKE 1.0
MapR 6.1 (secured)
macOS with kubectl installed as the client


We already have a secured MapR 6.1 cluster running in GCE (Google Compute Engine).
We want to create a CSpace (Compute Space) in a Kubernetes cluster that can access this existing MapR 6.1 secured cluster.
So the high-level steps are:
  1. Create a Kubernetes Cluster in GKE(Google Kubernetes Engine).
  2. Bootstrap the Kubernetes Cluster
  3. Create and Deploy External Info for CSpace
  4. Create a CSpace
  5. Run a Drill Cluster in CSpace
  6. Run a Spark Application in CSpace

1. Create a Kubernetes Cluster in GKE(Google Kubernetes Engine)

1.1 Create a Kubernetes cluster named "hao-cluster" in GKE

You can use the GUI or gcloud commands.
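If you prefer the CLI, cluster creation can be sketched as below; the node count and machine type are assumptions, not values from the original setup.

```shell
# Sketch of creating the GKE cluster with gcloud.
# --num-nodes and --machine-type are assumptions -- size them for your workload.
set -euo pipefail
CLUSTER_NAME="hao-cluster"
ZONE="us-central1-a"
CREATE_CMD="gcloud container clusters create ${CLUSTER_NAME} --zone ${ZONE} --num-nodes 3 --machine-type n1-standard-4"
echo "${CREATE_CMD}"
# Run it only when the gcloud CLI is actually installed:
if command -v gcloud >/dev/null 2>&1; then
  ${CREATE_CMD}
fi
```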

1.2 Fetch the credentials for the Kubernetes cluster

gcloud container clusters get-credentials hao-cluster --zone us-central1-a
After that, make sure "kubectl cluster-info" returns correct cluster information.
This step is to make kubectl work and connect to the correct Kubernetes cluster.

1.3 Bind cluster-admin role to Google Cloud user

kubectl create clusterrolebinding user-cluster-admin-binding --clusterrole=cluster-admin --user=""

Note: "" is my Google Cloud user.
Here we grant the cluster-admin role to this user to avoid permission errors in the next step, when we create the MapR CSI ClusterRole and ClusterRoleBinding.
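For reference, the same grant can be written declaratively; this is a standard Kubernetes RBAC manifest, with a placeholder user name since the real account is not shown above:

```yaml
# Declarative equivalent of the clusterrolebinding (user name is a placeholder):
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: user-cluster-admin-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: your-google-account@example.com  # placeholder
```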

2. Bootstrap the Kubernetes Cluster

2.1 Clone the MKE GitHub repository

git clone
cd ./mapr-operators
git checkout mke-

2.2 Run the bootstrapinstall Utility

>>> Installing to an Openshift environment? (yes/no) [no]:
>>> Install MapR CSI driver? (yes/no) [yes]:
This Kubernetes environment has been successfully bootstrapped for MapR
MapR components can now be created via the newly installed operators

2.3 Verify the PODs/DaemonSets/StatefulSets are running under the namespaces "mapr-csi", "mapr-system", "spark-operator", and "drill-operator"

kubectl get pods --all-namespaces
Make sure all of the PODs are ready and in "Running" status.
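To avoid eyeballing a long POD list, a tiny helper (my own sketch, not part of MKE) can flag any POD that is not fully ready or not Running:

```shell
# Print every Pod that is not fully ready or not in "Running" status.
# Reads "kubectl get pods --all-namespaces" output from stdin.
not_ready() {
  awk 'NR > 1 {
    split($3, r, "/");                      # READY column, e.g. "2/3"
    if (r[1] != r[2] || $4 != "Running")    # not all containers ready, or bad status
      print $1 "/" $2, $3, $4;
  }'
}
```

Usage: `kubectl get pods --all-namespaces | not_ready` — no output means everything is up.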

3. Create and Deploy External Info for CSpace

Follow documentation: Automatically Generating and Deploying External Info for a CSpace

3.1 Copy tools/ to one node of the MapR cluster

gcloud compute scp --recurse tools/ scott-mapr-core-pvp1:/tmp/
chown -R mapr:mapr /tmp/tools

3.2 As the admin user (typically mapr), generate a user ticket

maprlogin password

3.3 As the admin user (typically mapr), run the external-secrets generation tool

The external information generated for this cluster is available at: mapr-external-secrets-hao.yaml
Copy it to a machine where you can run the following command:
  kubectl apply -f mapr-external-secrets-hao.yaml

3.4 Copy the generated mapr-external-secrets-hao.yaml to the kubectl client node

gcloud compute scp scott-mapr-core-pvp1:/home/mapr/mapr-external-secrets-hao.yaml /tmp/

3.5 Apply external secrets

kubectl apply -f /tmp/mapr-external-secrets-hao.yaml

4. Create a CSpace

Follow documentation: Creating a Compute Space

4.1 Copy the sample CSpace CR

cp examples/cspaces/cr-cspace-full-gce.yaml /tmp/my_cr-cspace-full-gce.yaml

4.2 Modify the sample CSpace CR

At least, we need to modify the cluster name.
vim /tmp/my_cr-cspace-full-gce.yaml
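If you script this instead of editing by hand, a sed helper like the following works; note that "clustername" is an assumed field name here -- check the sample CR for the real key:

```shell
# Hypothetical helper: rewrite an assumed "clustername:" field in a CR file.
# The field name is an assumption -- verify it against the sample CR first.
set_cluster_name() {
  local file="$1" name="$2"
  sed -i.bak "s/^\([[:space:]]*clustername:[[:space:]]*\).*/\1${name}/" "$file"
}
```

Usage: `set_cluster_name /tmp/my_cr-cspace-full-gce.yaml <your-mapr-cluster-name>`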

4.3 Apply CSpace CR

kubectl apply -f /tmp/my_cr-cspace-full-gce.yaml

4.4 Verify the PODs are ready and running in namespace "mycspace"

kubectl get pods -n mycspace -o wide
There are three PODs running:
the CSpace terminal, the Hive Metastore, and the Spark HistoryServer.

4.5 Log on to one of the PODs to verify CSI is working and MapRFS is accessible

kubectl exec -ti hivemeta-f6d746f-n27h6 -n mycspace -- bash
su - mapr
maprlogin password
hadoop fs -ls /

5. Run a Drill Cluster in CSpace

Follow documentation: Running Drillbits in Compute Spaces

5.1 Copy the sample Drill CR

cp examples/drill/drill-cr-full.yaml /tmp/my_drill-cr-full.yaml

5.2 Modify the sample Drill CR

At least, we need to modify the CSpace name.
vim /tmp/my_drill-cr-full.yaml

5.3 Apply Drill CR

kubectl apply -f /tmp/my_drill-cr-full.yaml

5.4 Verify the Drillbit PODs are ready and running inside CSpace

kubectl get pods -n mycspace

5.5 Log on to a Drillbit POD to check the health of the Drill cluster

kubectl exec -ti drillcluster1-drillbit-0 -n mycspace -- bash
su - mapr
maprlogin password

/opt/mapr/drill/drill-1.16.0/bin/sqlline -u "jdbc:drill:zk=xxx:5181,yyy:5181,zzz:5181;auth=maprsasl"
apache drill> select * from sys.drillbits;
|                               hostname                                | user_port | control_port | data_port | http_port | current |    version     | state  |
| drillcluster1-drillbit-0.drillcluster1-svc.mycspace.svc.cluster.local | 21010     | 21011        | 21012     | 8047      | false   | | ONLINE |
| drillcluster1-drillbit-1.drillcluster1-svc.mycspace.svc.cluster.local | 21010     | 21011        | 21012     | 8047      | true    | | ONLINE |
2 rows selected (2.228 seconds)

5.6 Access Drillbit UI

The first option is to do port forwarding:
kubectl port-forward --namespace mycspace $(kubectl get pod --namespace mycspace --selector="controller-revision-hash=drillcluster1-drillbit-57876df7bf,drill-cluster=drillcluster1" --output jsonpath='{.items[0].metadata.name}') 8080:8047
Then open the UI at http://localhost:8080.

The second option is to use the service, which is already exposed as a LoadBalancer type:
$ kubectl get service -n mycspace | grep drillcluster1-web-svc
drillcluster1-web-svc               LoadBalancer   8047:31642/TCP,21010:30945/TCP   25h
Then open the UI at http://<EXTERNAL-IP>:8047.

6. Run a Spark Application in CSpace

Follow documentation: Running Spark Applications in Compute Spaces

6.1 Log on to the CSpace terminal POD

kubectl port-forward -n mycspace cspaceterminal-bcdcf7bbb-p6227 7777:7777
Note: the second "7777" is the port you configured earlier in the CSpace CR file, e.g., /tmp/my_cr-cspace-full-gce.yaml:
$ grep sshPort /tmp/my_cr-cspace-full-gce.yaml
      sshPort: 7777
Then ssh to the cspace terminal POD:
ssh mapr@localhost -p 7777

6.2 Create the user ticket for the Spark Application submitter

Follow documentation: Using the Ticketcreator Utility to Generate Secrets
[mapr@cspaceterminal-bcdcf7bbb-p6227 ~]$
Create a ticket for tenant user: [mapr]:
Please provide 'mapr's password: [mapr]:
uid=1002(mapr) gid=1003(mapr) groups=1003(mapr),0(root)
Creating user ticket for mapr...
MapR credentials of user 'mapr' for cluster '' are written to '/tmp/maprticket_1002'

Please provide a name for your user secret: [mapr-user-secret-4030076998]:
secret/mapr-user-secret-4030076998 created
Please note secret name: mapr-user-secret-4030076998 for later use.

Do you want to create a dynamic MapR Volume via CSI for storage of Spark secondary dependencies?
This will create both a PVC and a PV. (y/n) [n]: y
Provide the CSI PersistentVolumeClaim Name: [mapr-csi-pvc-2696334965]:
persistentvolumeclaim/mapr-csi-pvc-2696334965 created
Please note PVC name: mapr-csi-pvc-2696334965 for later use.

Provide the CSI PersistentVolume Name: [mapr-csi-pv-2354307494]:
persistentvolume/mapr-csi-pv-2354307494 created
Please note PV name: mapr-csi-pv-2354307494 for later use.

6.3 Copy the sample Spark pi job CR

cp examples/spark/mapr-spark-pi.yaml /tmp/my_mapr-spark-pi.yaml

6.4 Modify the sample Spark pi job CR

vim /tmp/my_mapr-spark-pi.yaml
At least, modify the CSpace name, spark.mapr.user.secret, and serviceAccount.
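For orientation, the fields in question sit roughly as sketched below; this excerpt is based on the open-source Spark Operator's SparkApplication CRD, and the exact layout and names in MapR's sample may differ -- treat the values as placeholders:

```yaml
# Illustrative excerpt only -- verify against examples/spark/mapr-spark-pi.yaml.
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: mycspace                                    # your CSpace name
spec:
  sparkConf:
    spark.mapr.user.secret: mapr-user-secret-4030076998  # secret from ticketcreator
  driver:
    serviceAccount: mycspace-cspace-sa                   # placeholder name
```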

6.5 Submit the spark pi job

kubectl apply -f /tmp/my_mapr-spark-pi.yaml

6.6 Verify the spark pi job is running

[mapr@cspaceterminal-bcdcf7bbb-p6227 ~]$ sparkctl list -n mycspace
| spark-pi | RUNNING | 36s            | N.A.            |

6.7 View the log for the spark pi job

On the CSpace terminal POD using sparkctl:
sparkctl log spark-pi  -n mycspace
On the kubectl client node using kubectl:
kubectl logs spark-pi-driver -n mycspace

6.8 Access Spark HistoryServer UI

Use the service which is already exposed as LoadBalancer type:
$ kubectl get service -n mycspace | grep sparkhs-svc
sparkhs-svc                         LoadBalancer   yyy.yyy.yyy.230   18480:31507/TCP                  26h
Then open the UI at http://<EXTERNAL-IP>:18480.

