How do I analyze content stored on Amazon S3?

Uva Software's Scanii integrates easily with Amazon Web Services' Simple Storage Service (S3) via a one-click integration using our SAM-packaged scanii-lambda application.

How it works

This is, essentially, a series of lambda functions packaged in a one-click deployable application that configures everything needed so your S3 objects are submitted automatically to scanii's content analysis API. That includes a lambda function that submits files in S3 for processing by our API service and another function that receives callback events and takes appropriate action depending on whether findings are present.
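The split between the two functions can be sketched roughly as follows. This is a simplified illustration, not the actual scanii-lambda source; the function name, event field names, and finding string are all assumptions for the sake of the example:

```python
# Simplified sketch of the callback function's decision logic.
# The real scanii-lambda application receives a callback from the
# scanii API after analysis completes; the "findings" field name and
# the finding string below are illustrative assumptions.

def handle_callback(result, delete_on_findings=False):
    """Decide what to do with an S3 object based on analysis results."""
    findings = result.get("findings", [])
    if findings and delete_on_findings:
        # Deletion action: remove objects with findings (off by default).
        return "delete"
    # Tagging action: record the result as S3 object tags (on by default).
    return "tag"

# A clean file is tagged; a file with findings is deleted only when
# the deletion action is enabled.
print(handle_callback({"findings": []}))                       # tag
print(handle_callback({"findings": ["content.malicious"]},
                      delete_on_findings=True))                # delete
```

In other words, tagging is always the fallback action, and deletion only happens when the object has findings and the deletion action has been explicitly enabled.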

Currently, you can choose from a couple of different actions:

1) Tag the S3 object - this defaults to on and adds the following AWS tags to processed objects:

Tag name           Tag purpose
ScaniiId           the resource id of the processed content
ScaniiFindings     list of identified findings (content engine dependent)
ScaniiContentType  the identified content type of the processed file

2) Delete the S3 object with findings - this defaults to off and will delete S3 objects with findings (such as malware or NSFW content) - for a full list of available content identification see
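Once objects are tagged, you can read the tags back through S3's GetObjectTagging API, which returns tags as a list of key/value pairs. The snippet below shows how to fold that list into a plain dictionary; the sample response is a hypothetical, hand-written example of what a tagged object might look like:

```python
# S3's GetObjectTagging API returns tags as a list of
# {"Key": ..., "Value": ...} pairs. This helper folds that list into
# a plain dict. The sample TagSet below is hypothetical.

def tag_set_to_dict(tag_set):
    return {tag["Key"]: tag["Value"] for tag in tag_set}

sample_response = {
    "TagSet": [
        {"Key": "ScaniiId", "Value": "abc123"},
        {"Key": "ScaniiFindings", "Value": ""},
        {"Key": "ScaniiContentType", "Value": "image/png"},
    ]
}

tags = tag_set_to_dict(sample_response["TagSet"])
# An empty ScaniiFindings tag would mean no findings were identified.
print(tags["ScaniiContentType"])  # image/png
```

This pattern is handy if downstream code needs to branch on whether an object came back clean.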


Step 1 - Deploying our S3 integration to your AWS account

👉🏽 Deploy our application to your account by clicking here

For that you will need 3 things: 

  1. An active Amazon Web Services account 
  2. Your scanii API credentials; if you don't already have them, create an account and API key
  3. The name of the bucket you would like to monitor for events

Step 2 - Enable events from your S3 bucket to our integration 

Due to a quirk in the way SAM applications work, our application cannot automatically wire itself into object-creation events for your S3 bucket, so the last step in setting up our integration is configuring those events yourself. To do that:

1) Log into the AWS Lambda Console and click on the function called serverlessrepo-UvaSoftware-Scanii-Lambda-Submit (this is the function that submits content for processing). Keep in mind that your exact function name could be different; in essence it's composed of: "serverlessrepo-${SAM APPLICATION NAME}-Submit".

2) Under "Add triggers" select S3 and fill in the bucket information such as the bucket name and optional prefix path, the important thing is to leave under "Event type" Object Created (All) which will ensure that this lambda function is notified every time a new object is created. Lastly click on Add to add the event and Save to save your changes to the lambda function. 

That's it - from this point on, all files/objects added to that bucket will be automatically processed and tagged or deleted 🎉

Advanced topics

Turning on deletion on findings

You can turn on the deletion action, which will automatically delete objects from S3 that have findings such as malware, by setting actionDeleteObjectOnFindings to true under application settings. If needed, this can be done via a redeployment of the SAM-packaged application.
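As a sketch, the setting you are changing looks like this (the exact parameter format depends on how you deploy the application):

```
actionDeleteObjectOnFindings: true
```

Remember that this defaults to off, so objects with findings are only tagged unless you explicitly enable it.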

Upgrading the SAM application

To upgrade our SAM application, all you need to do is trigger a redeploy and AWS will handle everything automatically. Please note that, as of this writing, environment variables changed via the Lambda UI will not be honored after a redeploy, so you need to make all configuration changes during a deploy.

Why you see the "Creates custom IAM roles or resource policies" warning

You see that warning during deployment because SAM applications cannot update object tags using SAM's default permission rules, forcing us to provide a custom IAM policy - this is a shortcoming of the SAM specification that we're working with AWS to improve.

Troubleshooting the "Configuration is ambiguously defined" error while setting up S3 event sources

For some reason known only to AWS, deleting a lambda function that has S3 events configured does not delete the associated S3 event, so you need to go into the S3 console and clean those up under bucket/properties/events. Details here
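If you'd rather clean this up from the command line, the AWS CLI can overwrite the bucket's notification configuration with an empty one (replace my-bucket with your bucket name; note that this removes all notification configurations on the bucket, not just the stale one):

```shell
aws s3api put-bucket-notification-configuration \
  --bucket my-bucket \
  --notification-configuration '{}'
```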

Multi version deploys

As of version 1.3.0, scanii-lambda supports parallel deploys, allowing you to deploy multiple independent copies of our SAM application. For that, all you need to do is give the applications different names.

Available configuration items

You can see a list of all available configuration items here:


Still need help? Contact Us