How to check whether a Spark job runs out of quota in a CSpace

Goal:

How to check whether a Spark job runs out of quota in a CSpace.

Env:

MKE 1.0

Solution:

The example CSpace configuration file for MKE 1.0 defines the following 3 default PODs:

terminal
hivemetastore
sparkhs

Each of them requests 2 CPUs and 8Gi of memory.
These values are defined in the example custom resource file:

git clone https://github.com/mapr/mapr-operators
cd ./mapr-operators
git checkout mke-1.0.0.0
cat examples/cspaces/cr-cspace-full-gce.yaml

  cspaceservices:
    terminal:
      count: 1
      image: cspaceterminal-6.1.0:201912180140
      sshPort: 7777
      requestcpu: "2000m"
      requestmemory: 8Gi
      logLevel: INFO
    hivemetastore:
      count: 1
      image: hivemeta-2.3:201912180140
      requestcpu: "2000m"
      requestmemory: 8Gi
      logLevel: INFO
    sparkhs:
      count: 1
      image: spark-hs-2.4.4:201912180140
      requestcpu: "2000m"
      requestmemory: 8Gi
      logLevel: INFO

So when calculating how many resources are available for other ecosystem components such as Spark and Drill, we need to take the resources of these 3 PODs (6 CPUs and 24Gi of memory in total) into consideration.
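The reservation arithmetic can be sketched in Python. This is a minimal sketch, assuming only the two quantity formats used in the example YAML (millicore CPU such as "2000m" and memory in "Gi"); the helper names are illustrative and are not part of MKE or the Kubernetes tooling.

```python
# Sketch: resources requested by the 3 default CSpace PODs (values copied
# from cr-cspace-full-gce.yaml) and what is left of an example CPU quota.

def parse_cpu(value: str) -> float:
    """Convert a Kubernetes CPU quantity such as "2000m" or "2" to cores."""
    if value.endswith("m"):
        return int(value[:-1]) / 1000
    return float(value)


def parse_gi(value: str) -> float:
    """Convert a memory quantity in Gi (e.g. "8Gi") to GiB; only Gi is handled."""
    if not value.endswith("Gi"):
        raise ValueError(f"unsupported memory unit in {value!r}")
    return float(value[:-2])


# (requestcpu, requestmemory) per default POD, as in the example YAML
DEFAULT_PODS = {
    "terminal": ("2000m", "8Gi"),
    "hivemetastore": ("2000m", "8Gi"),
    "sparkhs": ("2000m", "8Gi"),
}


def reserved_by_default_pods() -> tuple[float, float]:
    """Return (CPU cores, GiB of memory) requested by the default PODs."""
    cpu = sum(parse_cpu(c) for c, _ in DEFAULT_PODS.values())
    mem = sum(parse_gi(m) for _, m in DEFAULT_PODS.values())
    return cpu, mem


cpu, mem = reserved_by_default_pods()
quota_cpu = 50  # example CSpace CPU quota, as in scenario 1 below
print(f"default PODs: {cpu:g} CPUs, {mem:g}Gi memory")
print(f"CPUs left for Spark, Drill, ...: {quota_cpu - cpu:g}")
```

With a 50-CPU quota this prints 6 CPUs and 24Gi for the default PODs and 44 CPUs left, matching the numbers used in the scenarios.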

How do we check whether a Spark job is running out of quota in the CSpace?
First, get the Spark driver log with either of the following commands, taking the Spark pi job as an example:

kubectl logs spark-pi-driver -n mycspace
or
sparkctl log spark-pi -n mycspace
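Once the driver log is saved to a file (for example by redirecting the kubectl command above), a quick scan for the messages quoted in this article can point to the right scenario. This is a minimal sketch: the SYMPTOMS table and its hints are hypothetical, and it only matches the substrings shown in this article. In scenario 3 no driver log exists at all, so the quota error has to be read from the SparkApplication instead.

```python
import sys

# Substrings copied from the driver-log messages quoted in this article;
# the hints are illustrative, not produced by Spark or Kubernetes.
SYMPTOMS = {
    "Initial job has not accepted any resources":
        "executors not running: check for Pending executor PODs (scenario 1)",
    "exceeded quota":
        "POD creation rejected by the CSpace ResourceQuota (scenario 2)",
}


def find_symptoms(log_text: str) -> list[str]:
    """Return a hint for every known symptom found in the log text."""
    return [hint for marker, hint in SYMPTOMS.items() if marker in log_text]


# Usage: python scan_driver_log.py /tmp/sparkjob.txt  (hypothetical script name)
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as f:
        hints = find_symptoms(f.read())
    for hint in hints or ["no known quota symptom found"]:
        print(hint)
```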

There are at least 3 scenarios:

1. No node in the Kubernetes cluster has sufficient resources

For example, suppose the CSpace quota is 50 CPUs and no PODs other than the 3 default PODs are running.
That leaves 50-6=44 CPUs available for a Spark job.
If the Spark driver needs only 1 CPU, 43 CPUs remain available for Spark executors.
Consider the following definition in the Spark job YAML file:

  executor:
    cores: 20
    instances: 2
    memory: "1024m"
    labels:
      version: 2.4.4

This requests 2 Spark executors with 20 CPUs each.
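The quota check for this scenario can be written out as a small sketch. It assumes only CPU matters (memory is ignored) and that the default PODs use 6 CPUs; the function names are hypothetical.

```python
# Scenario 1 numbers from the text; CPU only, memory is not checked here.
DEFAULT_PODS_CPU = 3 * 2  # terminal + hivemetastore + sparkhs, 2 CPUs each


def cpu_left_for_executors(quota_cpu: float, driver_cores: float) -> float:
    """CPUs of the CSpace quota left after the default PODs and the driver."""
    return quota_cpu - DEFAULT_PODS_CPU - driver_cores


def executors_cpu(cores: float, instances: int) -> float:
    """Total CPUs requested by all executors of the job."""
    return cores * instances


left = cpu_left_for_executors(quota_cpu=50, driver_cores=1)
needed = executors_cpu(cores=20, instances=2)
print(f"needed {needed} CPUs, quota leaves {left} -> fits quota: {needed <= left}")
```

Passing this quota check is necessary but not sufficient: each executor POD must also fit on a single node, which is exactly what goes wrong in this scenario.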

Symptom:
The requirement (40 CPUs) is below the available quota (43 CPUs), yet the Spark driver log may show the following warning:

WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

Troubleshooting:
The 2 Spark executor PODs stay in Pending state indefinitely:

$ kubectl get pods -n mycspace
NAME                             READY   STATUS    RESTARTS   AGE
spark-pi-1581449230742-exec-1    0/1     Pending   0          17m
spark-pi-1581449230742-exec-2    0/1     Pending   0          16m
spark-pi-driver                  1/1     Running   0          17m
...

Running "kubectl describe pod" on an executor POD shows why it is pending:

$ kubectl describe pod spark-pi-1581449230742-exec-1 -n mycspace
Events:
  Type     Reason            Age                   From               Message
  ----     ------            ----                  ----               -------
  Warning  FailedScheduling  2m24s (x29 over 17m)  default-scheduler  0/3 nodes are available: 3 Insufficient cpu.

This means that no node has enough allocatable CPU for the executor POD.
This can be confirmed with the following command:

$ kubectl describe node
...
Allocatable:
 attachable-volumes-csi-com.mapr.csi-kdf:  20
 attachable-volumes-gce-pd:                127
 cpu:                                      15890m
 ephemeral-storage:                        47093746742
 hugepages-2Mi:                            0
 memory:                                   56288592Ki
 pods:                                     110
...

Root Cause:
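In this Kubernetes cluster, there are 3 nodes.
Even an empty node can allocate at most 15.89 CPUs (the largest Allocatable cpu value, 15890m), which is less than the 20 CPUs requested by each executor. The executor PODs therefore cannot be scheduled, even though the CSpace quota would allow them.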
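The node-level check can be sketched as follows. This is a minimal sketch under stated assumptions: the node names and the two smaller Allocatable values are hypothetical, only 15890m comes from the output above, and it checks only the upper bound (the real scheduler also subtracts the requests of PODs already running on each node).

```python
def parse_cpu(value: str) -> float:
    """Convert a Kubernetes CPU quantity such as "15890m" or "2" to cores."""
    return int(value[:-1]) / 1000 if value.endswith("m") else float(value)


def fits_on_some_node(request_cpu: float, allocatable: dict[str, str]) -> bool:
    """True if at least one node's Allocatable cpu is >= the POD's CPU request.

    Even an empty node can offer no more than its Allocatable cpu, so a request
    above every node's value can never be scheduled.
    """
    return any(parse_cpu(cpu) >= request_cpu for cpu in allocatable.values())


# Hypothetical node names; 15890m is the largest value from "kubectl describe node".
nodes = {"node-a": "15890m", "node-b": "7910m", "node-c": "7910m"}
print(fits_on_some_node(20, nodes))  # False: the 20-CPU executors stay Pending
print(fits_on_some_node(15, nodes))  # True
```

In practice this means the executor `cores` value must stay below the largest node's Allocatable cpu, independently of how large the CSpace quota is.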

2. Spark executors run out of CSpace quota

For example, suppose the CSpace quota is 10 CPUs and no PODs other than the 3 default PODs are running.
That leaves 10-6=4 CPUs available for a Spark job.
If the Spark driver needs only 1 CPU, 3 CPUs remain available for Spark executors.
Consider the following definition in the Spark job YAML file:

  driver:
    cores: 1
    coreLimit: "1000m"
    memory: "1024m"
    labels:
      version: 2.4.4
    serviceAccount: mapr-mycspace-cspace-sa
  executor:
    cores: 2
    instances: 2
    memory: "1024m"
    labels:
      version: 2.4.4

This requests 2 Spark executors with 2 CPUs each.
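How many of the requested executors the quota can admit follows from the same arithmetic. A minimal sketch, assuming CPU is the only constrained resource; `executors_that_fit` is a hypothetical helper, not a Spark or Kubernetes API.

```python
def executors_that_fit(quota_cpu, default_cpu, driver_cores, executor_cores, instances):
    """Number of executor PODs the CSpace quota admits (CPU only)."""
    left = quota_cpu - default_cpu - driver_cores
    return max(0, min(instances, int(left // executor_cores)))


# Scenario 2: 10-CPU quota, 6 CPUs for the default PODs, 1-CPU driver.
print(executors_that_fit(10, 6, 1, executor_cores=2, instances=2))  # 1
print(executors_that_fit(10, 6, 1, executor_cores=1, instances=2))  # 2
```

With 2-CPU executors only one executor fits (6 + 1 + 2 = 9 CPUs used), which is consistent with the "used: cpu=9" value in the error below.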

Symptom:
The requirement (4 CPUs) is above the available quota (3 CPUs), so the Spark driver log may show the following error:

ERROR util.Utils: Uncaught exception in thread kubernetes-executor-snapshots-subscribers-1
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://kubernetes.default.svc/api/v1/namespaces/mycspace/pods. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. pods "spark-pi-1581464839789-exec-3" is forbidden: exceeded quota: mycspacequota, requested: cpu=2, used: cpu=9, limited: cpu=10.

However, the job can still complete, because Spark runs both tasks in the one executor that did start.
The "Executors" tab of the SparkHistoryServer shows the executors that actually ran.

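As a comparison, if we reduce the CPU requirement of each Spark executor from 2 to 1, both executors fit within the quota (2 x 1 = 2 CPUs, within the 3 available), and the "Executors" tab of the SparkHistoryServer shows both of them.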

3. Spark driver runs out of CSpace quota

For example, suppose the CSpace quota is 10 CPUs and no PODs other than the 3 default PODs are running.
That leaves 10-6=4 CPUs available for a Spark job.
If the Spark driver needs 5 CPUs, it already exceeds the available quota.
Consider the following definition in the Spark job YAML file:

  driver:
    cores: 5
    coreLimit: "5000m"
    memory: "1024m"
    labels:
      version: 2.4.4
    serviceAccount: mapr-mycspace-cspace-sa

This requests 1 Spark driver with 5 CPUs.

Symptom:
Checking the job status with sparkctl shows that the Spark job has failed:

$ sparkctl list -n mycspace
+----------+--------+----------------+-----------------+
|   NAME   | STATE  | SUBMISSION AGE | TERMINATION AGE |
+----------+--------+----------------+-----------------+
| spark-pi | FAILED | 1m             | N.A.            |
+----------+--------+----------------+-----------------+

Troubleshooting:
No driver log is generated yet:

$ kubectl logs spark-pi-driver -n mycspace -f |tee /tmp/sparkjob.txt
Error from server (NotFound): pods "spark-pi-driver" not found

This is because the Spark driver POD has not even been started:

$ kubectl get pods -n mycspace
NAME                             READY   STATUS    RESTARTS   AGE
cspaceterminal-bcdcf7bbb-f68r9   1/1     Running   0          5h18m
hivemeta-f6d746f-jq5rj           1/1     Running   0          5h18m
sparkhs-667f46dcfd-24k86         1/1     Running   0          5h18m

Running "kubectl describe sparkapplication" shows the reason:

$ kubectl describe sparkapplication spark-pi -n mycspace
...
Application State:
    Error Message:  failed to run spark-submit for SparkApplication mycspace/spark-pi: 20/02/11 23:59:59 ERROR deploy.SparkSubmit$$anon$2: Failure executing: POST at: https://10.0.32.1/api/v1/namespaces/mycspace/pods. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. pods "spark-pi-driver" is forbidden: exceeded quota: mycspacequota, requested: cpu=5, used: cpu=6, limited: cpu=10.
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://10.0.32.1/api/v1/namespaces/mycspace/pods. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. pods "spark-pi-driver" is forbidden: exceeded quota: mycspacequota, requested: cpu=5, used: cpu=6, limited: cpu=10.
...

Root Cause:
The Spark driver POD could not be created because its request alone exceeds the remaining CSpace quota (6 used + 5 requested = 11 CPUs, above the 10-CPU limit).
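The three scenarios can be folded into one pre-submission check. This is a rough sketch, not a tool shipped with MKE: it considers CPU only, assumes the same cluster (largest node Allocatable cpu of 15.89) for all three scenarios, and the `Check` and `diagnose` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Check:
    quota_cpu: float          # CSpace CPU quota
    used_cpu: float           # CPUs already requested in the CSpace (6 for the default PODs)
    max_node_cpu: float       # largest Allocatable cpu of any node
    driver_cores: float
    executor_cores: float
    executor_instances: int


def diagnose(c: Check) -> str:
    """Map a job's CPU requests to the scenarios described in this article."""
    left = c.quota_cpu - c.used_cpu
    if c.driver_cores > left:
        return "scenario 3: driver exceeds quota, SparkApplication fails"
    if c.executor_cores > c.max_node_cpu:
        return "scenario 1: executor larger than any node, executor PODs stay Pending"
    if c.executor_cores * c.executor_instances > left - c.driver_cores:
        return "scenario 2: executor PODs beyond the quota are rejected"
    return "fits"


print(diagnose(Check(50, 6, 15.89, 1, 20, 2)))  # scenario 1
print(diagnose(Check(10, 6, 15.89, 1, 2, 2)))   # scenario 2
print(diagnose(Check(10, 6, 15.89, 5, 1, 2)))   # scenario 3
print(diagnose(Check(10, 6, 15.89, 1, 1, 2)))   # fits
```

The driver is checked first because, as in scenario 3, nothing else is created when the driver POD is rejected.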

Tuesday, February 11, 2020