Using AWS Lambda for cheap S3 content processing

TL;DR: If you use Amazon S3 to store user-generated content, Amazon's new service, AWS Lambda, can easily be set up to intelligently post-process objects by calling out to third-party services, and can even handle callbacks (using API Gateway) - all at a price point that's really hard to beat.

In this example we're going to:

  1. Create an IAM role for our lambda function to run under
  2. Create a lambda function that submits newly created S3 objects to scanii for malware detection
  3. Wire up an API Gateway endpoint so scanii can call us back with results
  4. Extend the role so the function can delete objects with malware findings

*Although this example uses scanii's content processing service, you could easily swap that service out for anything with a sane API - for example, you could use AWS Lambda to automatically OCR your S3 objects using Google's Cloud Vision API.

In a nutshell, here's how things will look:

  1. A user uploads an object to your S3 bucket
  2. S3 fires an event that invokes our lambda function
  3. The function submits the object to scanii for processing
  4. When processing finishes, scanii calls us back through API Gateway and, if malware is found, the function deletes the object

Protip: If you are familiar with AWS Lambda and S3, feel free to skip this section.

If you know me, you know that I believe S3 is probably one of the most impactful pieces of technology infrastructure of the last decade - and no, it's not because of its price point or novelty factor (NetApp built an entire business around storing blobs of data in the 90s), but because of how ubiquitous it is. Having something effectively omnipresent that you can use to store your files is just too convenient; that's why just about every company uses it, and I would venture to say that, if AWS were to publish these numbers, we would see that S3 is their biggest product in terms of penetration.

Inevitably, once all your files are stored in S3, you are going to want to do something with them - and, until recently, you had to write, deploy and manage your own application to do so. Not anymore: the clever folks at Amazon have a better solution for us, lambda functions.

Lambda functions are snippets of code that run in response to event triggers - an API call flowing through AWS API Gateway, a change to an object in S3, and many others. AWS Lambda can also be extremely cost effective: instead of paying per CPU/hour like a regular virtual server, you pay for the number of times your function is called and for the compute time it consumes, rounded up to the nearest 100ms.
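
To make that concrete, here's a minimal sketch of an S3-triggered handler in Node.js - this is not the sample code we'll deploy later, just an illustration of the event shape AWS hands you:

  // minimal S3-triggered handler sketch (Node.js 4.3 style)
  exports.handler = function (event, context, callback) {
    // a single S3 notification can batch multiple records
    event.Records.forEach(function (record) {
      var bucket = record.s3.bucket.name;
      // object keys arrive URL encoded, with spaces turned into '+'
      var key = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));
      console.log('object created: s3://' + bucket + '/' + key);
    });
    callback(null, 'done');
  };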

More importantly, as of this writing, AWS Lambda comes with a very generous free tier covering 1M requests and 400k GB-seconds of compute time per month, which should be more than enough for most users (details here).
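
To put that in perspective: a function configured with the minimum 128MB of memory that runs for 100ms per invocation consumes 0.125GB x 0.1s = 0.0125 GB-seconds per call, so even a full million invocations only add up to 12,500 GB-seconds - roughly 3% of the free compute tier.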

Setting things up

For the lambda function code we will use scanii's lambda sample code, which automatically deletes content from S3 if malware is found and can be easily extended to perform other operations. For this example you will also need a free scanii API key; if you don't have an API key yet, you can quickly create one here.

IAM Role

First we need to create an IAM role that grants our lambda function access to S3 (we'll start with read-only access for now) and basic lambda execution rights (so we can save logs to CloudWatch Logs). If you'd rather script this than click through the console, see the sketch after these steps.

  1. Log in to the AWS console's Identity and Access Management page
  2. Click Roles
  3. Select "Create New Role"
  4. Give your role a name - for this example we'll use "scanii-lambda-role" - and click "next step"
  5. Under Select Role Type choose "AWS Lambda" and click "next step"
  6. Under Attach Policy choose "AmazonS3ReadOnlyAccess" and "AWSLambdaBasicExecutionRole" then click "next step"
  7. Finally select "Create Role"
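
Behind the scenes, the "AWS Lambda" role type boils down to a trust policy that lets the Lambda service assume the role - so if you script this with the AWS CLI instead, you'd create the role with a trust document along these lines and then attach the same two managed policies (AmazonS3ReadOnlyAccess and AWSLambdaBasicExecutionRole) to it:

  {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Principal": { "Service": "lambda.amazonaws.com" },
        "Action": "sts:AssumeRole"
      }
    ]
  }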

Lambda function

Now we need to actually set up the lambda function and its event sources.

  1. Log in to the AWS console's Lambda page and click "Get Started Now" (or "Create Lambda function" if you already have existing functions)
  2. Under Select Blueprint select "Blank function"
  3. Under Configure Triggers
    1. Just click next for now, we'll come back to this later
  4. Under Configure Function
    1. Fill in the name and description, and select Node.js 4.3 as the runtime - for this example, we're calling our function scanii-process-content
    2. Copy and paste the contents of the sample function into the code window
    3. Update the credentials in the sample code with your own API key and secret created above, under SCANII_CREDS, in the format KEY:SECRET - leave CALLBACK_URL as is for now, we'll come back to it later (see the sketch after these steps)
  5. Under Lambda function handler and role
    1. Leave index.handler for the handler function
    2. Under Role select "scanii-lambda-role" (the role we created earlier)
  6. Under Advanced settings bump up the timeout to 30 seconds
  7. Click Next, review that everything checks out and hit "Create function"
  8. Now hang tight on that page - all we have left to do is configure event sources.
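
For reference, the constants from the credentials step live at the top of the sample function and look something along these lines (the names come from the sample; the values here are obviously placeholders):

  // your scanii API key and secret, in KEY:SECRET format
  var SCANII_CREDS = 'your-api-key:your-api-secret';
  // leave as is for now - we'll point this at our API Gateway URL shortly
  var CALLBACK_URL = '';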

Configuring event sources (aka triggers)

Now that we have a lambda function with an IAM role ready to go, we need to configure when and how that function should run.

  1. Under the Triggers tab
    1. Click "Add trigger"
    2. Click on the silly looking dashed square and select S3
    3. Under Bucket select the bucket whose objects you would like processed
    4. Under Event type select "Object Created (All)"
    5. Leave prefix and suffix empty unless you have a specific need to restrict processing
    6. Click "Submit"

Now we basically do the same thing to configure the lambda function to fire when our callback is called:

  1. Under the Triggers tab
    1. Click "Add trigger"
    2. Click on the silly looking dashed square and select API Gateway
    3. Under API name enter "ScaniiLambda" so we can easily reference it later
    4. Under Deployment stage select prod
    5. Under Security select "Open" (but our code will enforce authorization)
    6. Click "Submit" and take note of the API endpoint URL
  2. Under the Code tab
    1. Edit the CALLBACK_URL to point to the API endpoint URL from the previous step
  3. Click "Save" and your are done!

Processing content

Now that you have your lambda function and API Gateway callback set up, you can start adding content to the bucket you chose as the event source, and you should see objects being submitted for processing.
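
If you'd like to push a test object from your machine, here's one way to do it with the AWS SDK for JavaScript (assuming your AWS credentials are configured; the bucket and file names below are placeholders - swap in your own):

  var AWS = require('aws-sdk');
  var fs = require('fs');
  var s3 = new AWS.S3();

  s3.putObject({
    Bucket: 'scanii-test',                  // the bucket you wired up above
    Key: 'test-file.pdf',
    Body: fs.readFileSync('test-file.pdf')
  }, function (err) {
    if (err) return console.error(err);
    console.log('uploaded - watch your lambda logs in CloudWatch');
  });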

Enabling S3 object deletion

You might remember that when we set up the IAM role for our lambda function we only gave it read access to S3. Although that's a good (and safe) place to start, once you are comfortable with your lambda setup you can modify the role to grant it delete rights on the bucket you've set up for processing; that way, our sample code will automatically delete objects with malware findings (details here).

  1. Log in to the AWS console's Identity and Access Management page
  2. Click Roles
  3. Click on the role you created above
  4. Under Permissions click on "Inline Policies" and create a new one
  5. Under Set Permissions select "Custom Policy" and "Select"
  6. Under Review Policy paste the sample policy below, adjusting the bucket name accordingly, and click "Validate Policy"
  7. Once the policy is validated click on "Apply Policy"

Sample policy granting object delete rights on bucket "scanii-test"

  "Version": "2012-10-17",
  "Statement": [
      "Effect": "Allow",
      "Action": [
      "Resource": [

Now, for a final and glorious test, copy a known malicious file to your S3 bucket and watch it automatically disappear after a few seconds. Don't have a known malicious file handy? You can download our sample EICAR file here.

That's all, folks - I hope you enjoyed everything you've seen here, and if you have any questions/comments please reach out to us at



Last updated on 02/20/2016.