Goal:
Note: MKE is no longer available, so the MapR Doc links below are invalid.
HPE Ezmeral Container Platform is where the Kubernetes operators will be made available.
MKE (MapR Kubernetes Ecosystem) 1.0 has been released.
Basically, this release puts Spark and Drill into the Kubernetes environment.
Below is the architecture from the documentation topic "Operators and Compute Spaces".
Env:
MKE 1.0
MapR 6.1 secured
MacOS with kubectl installed as the client
Solution:
Currently we already have one MapR 6.1 secured cluster running in GCE (Google Compute Engine). We just want to create a CSpace (Compute Space) in a Kubernetes cluster which can access the existing MapR 6.1 secured cluster.
So the high-level steps are:
- Create a Kubernetes Cluster in GKE(Google Kubernetes Engine).
- Bootstrap the Kubernetes Cluster
- Create and Deploy External Info for CSpace
- Create a CSpace
- Run a Drill Cluster in CSpace
- Run a Spark Application in CSpace
1. Create a Kubernetes Cluster in GKE (Google Kubernetes Engine)
1.1 Create a Kubernetes cluster named "hao-cluster" in GKE
You can use the GUI or gcloud commands.
1.2 Fetch the credentials for the Kubernetes cluster
gcloud container clusters get-credentials hao-cluster --zone us-central1-a
After that, make sure "kubectl cluster-info" returns the correct cluster information.
This step configures kubectl to connect to the correct Kubernetes cluster.
1.3 Bind cluster-admin role to Google Cloud user
kubectl create clusterrolebinding user-cluster-admin-binding --clusterrole=cluster-admin --user=xxx@yyy.com
Note: "xxx@yyy.com" is the my Google Cloud user.
Here we grant cluster admin role to the user to avoid any permission error in the next step when we create MapR CSI ClusterRole and ClusterRoleBinding.
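To confirm the binding took effect, a quick sanity check with a standard kubectl subcommand (it should print "yes"):
# Check that the current user may perform any action in any namespace
kubectl auth can-i '*' '*' --all-namespaces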
2. Bootstrap the Kubernetes Cluster
2.1 Clone the MKE GitHub repository
git clone https://github.com/mapr/mapr-operators
cd ./mapr-operators
git checkout mke-1.0.0.0
2.2 Run the bootstrapinstall Utility
./bootstrap/bootstrapinstall.sh
>>> Installing to an Openshift environment? (yes/no) [no]:
>>> Install MapR CSI driver? (yes/no) [yes]:
...
This Kubernetes environment has been successfully bootstrapped for MapR
MapR components can now be created via the newly installed operators
2.3 Verify the PODs/DaemonSets/StatefulSets are running under the namespaces "mapr-csi", "mapr-system", "spark-operator", and "drill-operator"
kubectl get pods --all-namespaces
Make sure all of the PODs are ready and in "Running" status.
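Instead of eyeballing the list, you can also block until the pods become ready (standard kubectl; adjust the timeout and repeat for the other namespaces as needed):
# Wait until all pods in the MapR namespaces report Ready
kubectl wait --for=condition=Ready pods --all -n mapr-csi --timeout=300s
kubectl wait --for=condition=Ready pods --all -n mapr-system --timeout=300s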
3. Create and Deploy External Info for CSpace
Follow documentation: Automatically Generating and Deploying External Info for a CSpace
3.1 Copy tools/gen-external-secrets.sh to one node of the MapR Cluster
gcloud compute scp tools/gen-external-secrets.sh scott-mapr-core-pvp1:/tmp/
chown mapr:mapr gen-external-secrets.sh
3.2 As the admin user (typically mapr), generate a user ticket
maprlogin password
3.3 Run gen-external-secrets.sh as the admin user (typically mapr)
/tmp/gen-external-secrets.sh
...
The external information generated for this cluster are available at: mapr-external-secrets-hao.yaml
Please copy them to a machine where you can run the following command:
kubectl apply -f mapr-external-secrets-hao.yaml
3.4 Copy the generated mapr-external-secrets-hao.yaml to the kubectl client node
gcloud compute scp scott-mapr-core-pvp1:/home/mapr/mapr-external-secrets-hao.yaml /tmp/
3.5 Apply external secrets
kubectl apply -f /tmp/mapr-external-secrets-hao.yaml
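To confirm the secrets landed, list them; the exact names and namespace depend on what gen-external-secrets.sh generated for your cluster:
# List MapR-related secrets across all namespaces
kubectl get secrets --all-namespaces | grep -i mapr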
4. Create a CSpace
Follow documentation: Creating a Compute Space
4.1 Copy the sample CSpace CR
cp examples/cspaces/cr-cspace-full-gce.yaml /tmp/my_cr-cspace-full-gce.yaml
4.2 Modify the sample CSpace CR
At a minimum, we need to modify the cluster name.
vim /tmp/my_cr-cspace-full-gce.yaml
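To locate the lines to edit, a quick grep works (the exact key names depend on the sample CR shipped with your MKE version):
# Show lines in the sample CSpace CR that mention the cluster name
grep -n -i "cluster" /tmp/my_cr-cspace-full-gce.yaml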
4.3 Apply CSpace CR
kubectl apply -f /tmp/my_cr-cspace-full-gce.yaml
4.4 Verify the PODs are ready and running in namespace "mycspace"
kubectl get pods -n mycspace -o wide
Here are the 3 PODs running:
CSpace terminal, Hive Metastore and Spark HistoryServer.
4.5 Log on to one of the PODs to verify CSI is working fine and MapRFS is accessible
kubectl exec -ti hivemeta-f6d746f-n27h6 -n mycspace -- bash
su - mapr
maprlogin password
hadoop fs -ls /
5. Run a Drill Cluster in CSpace
Follow documentation: Running Drillbits in Compute Spaces
5.1 Copy the sample Drill CR
cp examples/drill/drill-cr-full.yaml /tmp/my_drill-cr-full.yaml
5.2 Modify the sample Drill CR
At a minimum, we need to modify the name of the CSpace.
vim /tmp/my_drill-cr-full.yaml
5.3 Apply Drill CR
kubectl apply -f /tmp/my_drill-cr-full.yaml
5.4 Verify the Drillbit PODs are ready and running inside CSpace
kubectl get pods -n mycspace
5.5 Log on to a drillbit POD to check the health of the Drill cluster
kubectl exec -ti drillcluster1-drillbit-0 -n mycspace -- bash
su - mapr
maprlogin password
/opt/mapr/drill/drill-1.16.0/bin/sqlline -u "jdbc:drill:zk=xxx:5181,yyy:5181,zzz:5181;auth=maprsasl"
apache drill> select * from sys.drillbits;
+-----------------------------------------------------------------------+-----------+--------------+-----------+-----------+---------+----------------+--------+
|                                hostname                               | user_port | control_port | data_port | http_port | current |    version     | state  |
+-----------------------------------------------------------------------+-----------+--------------+-----------+-----------+---------+----------------+--------+
| drillcluster1-drillbit-0.drillcluster1-svc.mycspace.svc.cluster.local | 21010     | 21011        | 21012     | 8047      | false   | 1.16.0.10-mapr | ONLINE |
| drillcluster1-drillbit-1.drillcluster1-svc.mycspace.svc.cluster.local | 21010     | 21011        | 21012     | 8047      | true    | 1.16.0.10-mapr | ONLINE |
+-----------------------------------------------------------------------+-----------+--------------+-----------+-----------+---------+----------------+--------+
2 rows selected (2.228 seconds)
5.6 Access Drillbit UI
The first option is to do port forwarding:
kubectl port-forward --namespace mycspace $(kubectl get pod --namespace mycspace --selector="controller-revision-hash=drillcluster1-drillbit-57876df7bf,drill-cluster=drillcluster1,statefulset.kubernetes.io/pod-name=drillcluster1-drillbit-1" --output jsonpath='{.items[0].metadata.name}') 8080:8047
And then open the UI:
https://localhost:8080/
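Since the selector above already pins a single pod, an equivalent and simpler sketch is to port-forward that pod directly (assuming the pod name shown in step 5.4):
# Forward local port 8080 to the drillbit web UI port 8047
kubectl port-forward -n mycspace drillcluster1-drillbit-1 8080:8047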
Second option is to use the service which is already exposed as LoadBalancer type:
$ kubectl get service -n mycspace | grep drillcluster1-web-svc
drillcluster1-web-svc LoadBalancer 10.0.0.111 xxx.xxx.xxx.123 8047:31642/TCP,21010:30945/TCP 25h
And then open the UI:
https://xxx.xxx.xxx.123:8047
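If you want to script the external IP lookup instead of grepping, a standard jsonpath query does it:
# Print only the LoadBalancer's external IP
kubectl get service drillcluster1-web-svc -n mycspace -o jsonpath='{.status.loadBalancer.ingress[0].ip}'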
6. Run a Spark Application in CSpace
Follow documentation: Running Spark Applications in Compute Spaces
6.1 Log on to the CSpace terminal POD
kubectl port-forward -n mycspace cspaceterminal-bcdcf7bbb-p6227 7777:7777
Note: the second "7777" port is what you configured earlier in the CSpace CR file, e.g. /tmp/my_cr-cspace-full-gce.yaml:
$ grep sshPort /tmp/my_cr-cspace-full-gce.yaml
    sshPort: 7777
Then ssh to the cspace terminal POD:
ssh mapr@localhost -p 7777
6.2 Create the user ticket for the Spark Application submitter
Follow documentation: Using the Ticketcreator Utility to Generate Secrets
[mapr@cspaceterminal-bcdcf7bbb-p6227 ~]$ ticketcreator.sh
Create a ticket for tenant user: [mapr]:
Please provide 'mapr's password: [mapr]:
uid=1002(mapr) gid=1003(mapr) groups=1003(mapr),0(root)
Creating user ticket for mapr...
MapR credentials of user 'mapr' for cluster 'gce1.cluster.com' are written to '/tmp/maprticket_1002'
Please provide a name for your user secret: [mapr-user-secret-4030076998]:
secret/mapr-user-secret-4030076998 created
Please note secret name: mapr-user-secret-4030076998 for later use.
Do you want to create a dynamic MapR Volume via CSI for storage of Spark secondary dependencies? This will create both a PVC and a PV. (y/n) [n]: y
Provide the CSI PersistentVolumeClaim Name: [mapr-csi-pvc-2696334965]:
persistentvolumeclaim/mapr-csi-pvc-2696334965 created
Please note PVC name: mapr-csi-pvc-2696334965 for later use.
Provide the CSI PersistentVolume Name: [mapr-csi-pv-2354307494]:
persistentvolume/mapr-csi-pv-2354307494 created
Please note PV name: mapr-csi-pv-2354307494 for later use.
6.3 Copy the sample Spark pi job CR
cp examples/spark/mapr-spark-pi.yaml /tmp/my_mapr-spark-pi.yaml
6.4 Modify the sample Spark pi job CR
vim /tmp/my_mapr-spark-pi.yaml
At a minimum, modify the CSpace name, spark.mapr.user.secret, and serviceAccount.
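To find those fields quickly, a grep sketch (assuming the CSpace name appears as the CR's namespace; exact layout depends on the sample CR):
# Locate the fields that need editing in the Spark pi CR
grep -nE "namespace|spark.mapr.user.secret|serviceAccount" /tmp/my_mapr-spark-pi.yaml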
6.5 Submit the spark pi job
kubectl apply -f /tmp/my_mapr-spark-pi.yaml
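From the kubectl client node, you can also watch the driver and executor pods come up (standard kubectl; the pod names are assigned by the operator):
# Watch Spark pods appear in the CSpace namespace
kubectl get pods -n mycspace --watch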
6.6 Verify the spark pi job is running
[mapr@cspaceterminal-bcdcf7bbb-p6227 ~]$ sparkctl list -n mycspace
+----------+---------+----------------+-----------------+
|   NAME   |  STATE  | SUBMISSION AGE | TERMINATION AGE |
+----------+---------+----------------+-----------------+
| spark-pi | RUNNING | 36s            | N.A.            |
+----------+---------+----------------+-----------------+
6.7 View the log for the spark pi job
On the CSpace terminal POD using sparkctl:
sparkctl log spark-pi -n mycspace
OR
On the kubectl client node using kubectl:
kubectl logs spark-pi-driver -n mycspace
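To stream the driver log while the job is still running, add the follow flag (standard kubectl behavior):
# Follow the driver log until the job finishes
kubectl logs -f spark-pi-driver -n mycspace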
6.8 Access Spark HistoryServer UI
Use the service which is already exposed as LoadBalancer type:
$ kubectl get service -n mycspace | grep sparkhs-svc
sparkhs-svc LoadBalancer 10.0.0.222 yyy.yyy.yyy.230 18480:31507/TCP 26h
And then open the UI:
https://yyy.yyy.yyy.230:18480