Thursday, September 15, 2016

Using Spark job to upload files to AWS S3 with Server Side Encryption enabled


This article shows example Java code for using a Spark job to upload files to AWS S3 with Server Side Encryption (SSE) enabled.


Environment:
MapR 5.1 with Hadoop 2.7.0 (which ships with aws-java-sdk-1.7.4.jar)
Spark 1.5.2


1. Download my source code from GitHub

git clone
Please note that in AWS SDK 1.7.4, to enable the SSE feature, you should use the "setServerSideEncryption" method of the Java class "ObjectMetadata".
In later versions of the AWS SDK, say 1.7.15, this method was replaced by "setSSEAlgorithm".

So please make sure you are using the method that matches your AWS SDK version; otherwise you may trigger a "NoSuchMethodError".
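To illustrate the difference, here is a self-contained sketch. The class below is a hypothetical stand-in for the SDK's ObjectMetadata (so it can run without the AWS jars); both SDK methods ultimately set the "x-amz-server-side-encryption" request header, which is what tells S3 to encrypt the object with AES-256:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for com.amazonaws.services.s3.model.ObjectMetadata,
// for illustration only. Both SDK variants set the same S3 request header.
class ObjectMetadataSketch {
    private final Map<String, Object> headers = new HashMap<>();

    // AWS SDK 1.7.4 style
    public void setServerSideEncryption(String algorithm) {
        headers.put("x-amz-server-side-encryption", algorithm);
    }

    // Later SDK style (e.g. 1.7.15), which replaced the method above
    public void setSSEAlgorithm(String algorithm) {
        headers.put("x-amz-server-side-encryption", algorithm);
    }

    public Object getHeaderValue(String key) {
        return headers.get(key);
    }
}

public class SseSketch {
    public static void main(String[] args) {
        ObjectMetadataSketch meta = new ObjectMetadataSketch();
        meta.setServerSideEncryption("AES256"); // 1.7.4-style call
        System.out.println(meta.getHeaderValue("x-amz-server-side-encryption"));
    }
}
```

With the real SDK you would call the matching method on ObjectMetadata and pass that metadata into your PutObjectRequest.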

2. Compile using maven

mvn clean package
Please note that in pom.xml, I am using AWS Java SDK 1.7.4 as the dependency, because Hadoop 2.7.0 ships with the same version (aws-java-sdk-1.7.4.jar).
This keeps the libraries used by the Spark application in sync with the Hadoop cluster.
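As a sketch, the corresponding dependency section in pom.xml would look like the following (these are the standard Maven coordinates for the monolithic 1.7.x AWS SDK jar; adjust if your build differs):

```xml
<!-- Pin the AWS SDK to the same version Hadoop 2.7.0 ships with -->
<dependency>
  <groupId>com.amazonaws</groupId>
  <artifactId>aws-java-sdk</artifactId>
  <version>1.7.4</version>
</dependency>
```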

3. Run the spark job

/opt/mapr/spark/spark-1.5.2/bin/spark-submit \
  --class example.uploads3.UploadS3 \
  --master yarn \
  /mapr/ \
This sample job uploads data.txt to the S3 bucket named "haos3" with the key name "test/byspark.txt".

4. Confirm that the file is SSE-encrypted.

On the AWS S3 web console, click "Properties" for this file; you should see SSE enabled with the "AES-256" algorithm.
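Alternatively, if you have the AWS CLI configured, a head-object call on the uploaded key (bucket and key names taken from the example above) reports the SSE setting:

```shell
# Query the object's metadata; the response includes the SSE algorithm used
aws s3api head-object --bucket haos3 --key test/byspark.txt
# The JSON response should contain: "ServerSideEncryption": "AES256"
```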

