How do I analyze content stored on Amazon S3?
Uva Software's Scanii integrates with Amazon Web Services' Simple Storage Service (S3) through a one-click deployment of our SAM-packaged scanii-lambda application.
How it works
In essence, this is a set of Lambda functions packaged as a one-click deployable application that configures everything needed for your S3 objects to be submitted automatically to scanii’s content analysis API. It includes one Lambda function that submits files in S3 to our API service for processing, and another that receives callback events and takes the appropriate action depending on whether findings are present.
Currently, you can choose from two actions:
1) Tag the S3 object. This is on by default and adds the following AWS tags to processed objects:
Tag name | Tag purpose |
ScaniiId | the resource id of the processed content |
ScaniiFindings | list of identified findings (content engine dependent) |
ScaniiContentType | the identified content type of the file processed |
2) Delete S3 objects with findings. This is off by default; when turned on, it deletes S3 objects with findings (such as malware or NSFW content). For a full list of available content identification, see https://docs.scanii.com/article/149-how-do-the-different-detection-engines-work.
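Once an object has been processed with the tagging action on, the tags listed above can be read back from your own code. The following is a minimal Python sketch, assuming you pass in a boto3 S3 client (for example `boto3.client("s3")`); the helper name `scanii_tags` is made up for illustration and is not part of scanii-lambda.

```python
from typing import Any, Dict

# Tag names taken from the tag table in this article.
SCANII_TAGS = ("ScaniiId", "ScaniiFindings", "ScaniiContentType")


def scanii_tags(s3_client: Any, bucket: str, key: str) -> Dict[str, str]:
    """Return only the Scanii tags attached to an S3 object.

    `s3_client` is assumed to be a boto3 S3 client; this helper is
    hypothetical and not shipped with scanii-lambda.
    """
    response = s3_client.get_object_tagging(Bucket=bucket, Key=key)
    return {
        tag["Key"]: tag["Value"]
        for tag in response.get("TagSet", [])
        if tag["Key"] in SCANII_TAGS
    }
```

An object that has not been processed yet simply yields an empty dictionary, which is a convenient way to tell "not processed" apart from "processed".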
Deploying
Step 1 - Deploying our S3 integration to your AWS account
👉🏽 Deploy our application to your account by clicking here.
For that you will need 3 things:
- An active Amazon Web Services account
- Your scanii API credentials. If you don't already have them, create a scanii.com account and API key
- The name of the bucket you would like to monitor for events
Step 2 - Enable events from your S3 bucket to our integration
Due to a quirk in the way SAM applications work, our application cannot automatically wire itself into object creation events for your S3 bucket, so the last step in setting up our integration is configuring those events yourself. To do that:
1) Log into the AWS Lambda Console and click on the function called serverlessrepo-UvaSoftware-Scanii-Lambda-Submit (this is the function that submits content to scanii.com for processing). Keep in mind that your exact function name could be different; in essence it is composed of "serverlessrepo-${SAM APPLICATION NAME}-Submit".
2) Under "Add triggers", select S3 and fill in the bucket information, such as the bucket name and an optional prefix path. The important thing is to leave "Event type" set to Object Created (All), which ensures that this Lambda function is notified every time a new object is created. Lastly, click Add to add the event and Save to save your changes to the Lambda function.
That's it. From this point on, all files/objects added to that bucket will be automatically processed and tagged or deleted 🎉
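If you would rather script the trigger than click through the console, the same event wiring can be expressed as an S3 notification configuration. The Python sketch below only builds that configuration; the helper name `build_notification_config` and the example ARN and prefix are assumptions for illustration, not part of scanii-lambda.

```python
from typing import Any, Dict, Optional


def build_notification_config(function_arn: str,
                              prefix: Optional[str] = None) -> Dict[str, Any]:
    """Build an S3 notification configuration for the Submit function.

    Hypothetical helper. `function_arn` is the ARN of your deployed
    Submit function; `prefix` mirrors the optional prefix path in the console.
    """
    lambda_config: Dict[str, Any] = {
        "LambdaFunctionArn": function_arn,
        # Equivalent to "Object Created (All)" in the console.
        "Events": ["s3:ObjectCreated:*"],
    }
    if prefix:
        lambda_config["Filter"] = {
            "Key": {"FilterRules": [{"Name": "prefix", "Value": prefix}]}
        }
    return {"LambdaFunctionConfigurations": [lambda_config]}
```

The result can be passed to a boto3 S3 client as `put_bucket_notification_configuration(Bucket=..., NotificationConfiguration=...)`. Two caveats apply: that call replaces the bucket's entire existing notification configuration, and S3 must also be granted permission to invoke the function, which the console sets up for you when you add the trigger there.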
Advanced topics
Turning on deletion on findings
You can turn on the deletion action, which automatically deletes objects from S3 that have findings such as malware, by setting actionDeleteObjectOnFindings to true under application settings. If needed, this can be done by redeploying the SAM-packaged application.
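To make the interaction between the two actions concrete, here is a small Python sketch of how a callback handler might choose between tagging and deleting. It is an illustration only: the function, its parameter names, and the assumption that deletion takes precedence over tagging are not taken from the scanii-lambda source.

```python
from typing import List


def choose_actions(findings: List[str],
                   tag_object: bool = True,
                   delete_on_findings: bool = False) -> List[str]:
    """Decide what to do with a processed object.

    Hypothetical helper: `delete_on_findings` stands in for the
    actionDeleteObjectOnFindings setting; `tag_object` and the returned
    action labels are made up for this example.
    """
    if delete_on_findings and findings:
        # Assumption: a deleted object is not tagged first.
        return ["delete"]
    if tag_object:
        return ["tag"]
    return []
```

With the defaults (tagging on, deletion off), an object with findings is only tagged; it is deleted only when deletion is turned on and at least one finding is present.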
Upgrading the SAM application
To upgrade our SAM application, all you need to do is trigger a redeploy and AWS will handle everything automatically. Please note that, as of this writing, environment variables changed via the Lambda UI are not preserved after a redeploy, so you need to make all configuration changes during the deploy.
Why you see the "Creates custom IAM roles or resource policies" warning
You see that warning during deployment because SAM applications cannot update object tags using SAM's default permission rules, which forces us to provide a custom IAM policy. This is a shortcoming of the SAM specification that we're working with AWS to improve.
Troubleshooting the "Configuration is ambiguously defined" error while setting up S3 event sources
For some reason known only to AWS, deleting a Lambda function that has S3 events configured does not delete the associated S3 event, so you need to go into the S3 console and clean those up under bucket/properties/events. Details here.
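Before cleaning up in the console, it can help to list which Lambda event configurations are still attached to the bucket. The Python sketch below assumes you pass in a boto3 S3 client; the helper name `lambda_event_configs` is hypothetical.

```python
from typing import Any, Dict, List


def lambda_event_configs(s3_client: Any, bucket: str) -> List[Dict[str, Any]]:
    """List the Lambda event configurations currently set on a bucket.

    `s3_client` is assumed to be a boto3 S3 client; this helper is
    illustrative and not part of scanii-lambda.
    """
    response = s3_client.get_bucket_notification_configuration(Bucket=bucket)
    return [
        {
            "id": config.get("Id"),
            "function_arn": config["LambdaFunctionArn"],
            "events": config["Events"],
        }
        for config in response.get("LambdaFunctionConfigurations", [])
    ]
```

Any entry whose `function_arn` points at a function you have already deleted is a leftover event that needs to be removed.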
Multi-version deploys
As of version 1.3.0, scanii-lambda supports parallel deploys, allowing you to deploy multiple independent copies of our SAM application. All you need to do is give each application a different name.
Available configuration items
You can see a list of all available configuration items here: https://github.com/uvasoftware/scanii-lambda/wiki/Configuration