Virus scanning your S3 content

Hi there! You can now integrate Scanii with S3 without writing any code by using AWS Lambda. That integration is covered in a separate article.

A question we get all the time is how to use Scanii to virus scan user content stored in Amazon S3. In this article we cover one of the many ways to do this: combining Scanii's fetch mode with S3's query string authentication (sometimes referred to as "pre-signed" object URLs) to securely and asynchronously virus scan content stored in S3.

☛ This article requires a basic working knowledge of S3 and minimal experience with Scanii’s APIs. All examples use cURL, a command-line HTTP client, but they should translate easily to any programming language.

Background

For the sake of this example, let’s assume that the company Acme.com stores its user-generated content in Amazon S3 via its internal content management application, AcmeCMS.

[Figure: Original workflow]

Now let’s say that Acme is concerned that malicious user-generated content could spread to other users of the system, since it is, in essence, a content-sharing platform. To solve this problem, Acme engineers decide to introduce an arbitration mechanism in which user-generated content is hidden from other users until it has been virus scanned and sanitized.

[Figure: Workflow with Scanii]

To avoid making your S3 content public, we rely on Amazon S3’s query string authentication to grant Scanii.com temporary access to your resources without sharing your Amazon credentials (an in-depth explanation of how query string authentication works is outside the scope of this article). Query string authentication URLs look like this:

https://scanii.s3.amazonaws.com/eicarcom2.zip?Signature=FlWxZH1buUtEeDoZga%2FfytvxKIw%3D&Expires=1330411319&AWSAccessKeyId=AKIAZZZZZZZZZZZ
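For reference, the URL above uses S3's legacy Signature Version 2 query string format (AWSAccessKeyId, Expires, Signature). The Python sketch below shows how such a URL could be built for a plain GET using only the standard library. The function name presign_s3_get_v2, the bucket, the key and the credentials are hypothetical placeholders. Treat it as an illustration of the format rather than a recommendation: AWS has deprecated Signature Version 2 in favor of Signature Version 4, so in practice an SDK helper such as boto3's generate_presigned_url is the safer choice.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import quote


def presign_s3_get_v2(bucket: str, key: str, access_key_id: str,
                      secret_access_key: str, expires_in: int = 3600,
                      now: Optional[int] = None) -> str:
    """Build a legacy SigV2 query-string-authenticated GET URL (sketch)."""
    # Expires is an absolute Unix timestamp, not a duration.
    issued_at = int(time.time()) if now is None else now
    expires = issued_at + expires_in
    # Percent-encode the object key, keeping "/" so nested keys stay readable.
    encoded_key = quote(key, safe="/")
    # SigV2 StringToSign for a plain GET: verb, empty Content-MD5, empty
    # Content-Type, Expires, then the canonicalized resource /bucket/key.
    string_to_sign = f"GET\n\n\n{expires}\n/{bucket}/{encoded_key}"
    digest = hmac.new(secret_access_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"), hashlib.sha1).digest()
    signature = base64.b64encode(digest).decode("ascii")
    # The base64 signature may contain "+", "/" and "=", so it must be encoded.
    return (f"https://{bucket}.s3.amazonaws.com/{encoded_key}"
            f"?AWSAccessKeyId={quote(access_key_id, safe='')}"
            f"&Expires={expires}"
            f"&Signature={quote(signature, safe='')}")


# Example (hypothetical credentials):
# presign_s3_get_v2("scanii", "eicarcom2.zip", "AKIDEXAMPLE", "example-secret", expires_in=900)
```

Since Expires is an absolute timestamp, pick a window long enough for Scanii to fetch the object, but no longer than you need.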

Notifying Scanii of content to be scanned

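In this new workflow, AcmeCMS marks all user-generated content as pending arbitration before storing it in S3. Once that is done, it uses the “fetch” API to notify Scanii that new user content must be scanned, passing the object's pre-signed URL in the location parameter and the URL Scanii should notify when it is done in the callback parameter.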

For example:

	$ curl -i -u 8eb05c68f386421db2dd4929fc4f77ad:12345678 -d location="https%3A%2F%2Fscanii.s3.amazonaws.com%2Feicarcom2.zip%3FAWSAccessKeyId%3DAKIAJNN3CBMBGCMQDU4A%26Expires%3D1432966418%26Signature%3DQjxrlqDq587fSDkhqfI5Kt2LVN8%253D" -d callback=http://acme-cms.com/callback/12345 https://api.scanii.com/v2.0/files/fetch
	HTTP/1.1 202 Accepted
	Access-Control-Allow-Headers: Authorization
	Access-Control-Allow-Origin: *
	Content-Type: application/json
	Date: Fri, 29 May 2015 06:19:32 GMT
	Location: https://api.scanii.com/v2.0/files/425700a659d88a3e4fc8551f2da1eed1
	X-Runtime: 0ms
	X-Scanii-Host-Id: 613a7f69
	X-Scanii-Request-Id: 538547c8-37be-47b3-9333-ca489eb68bd5
	Content-Length: 41
	Connection: keep-alive

	{"id":"425700a659d88a3e4fc8551f2da1eed1"}
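The same call can be made from application code. The sketch below uses Python's standard library (urllib) and a hypothetical build_fetch_request/submit_fetch pair. It mirrors the cURL flags above (-u for HTTP Basic authentication, -d for the form-encoded location and callback fields) and reads the id from the 202 Accepted body. Keeping request construction separate from sending lets the first part be tested without network access.

```python
import base64
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

FETCH_ENDPOINT = "https://api.scanii.com/v2.0/files/fetch"


def build_fetch_request(api_key: str, api_secret: str, location: str,
                        callback: str) -> Request:
    """Prepare the same POST that the cURL example sends to the fetch endpoint."""
    # urlencode percent-encodes the pre-signed URL, matching the already
    # encoded location= value handed to curl -d.
    body = urlencode({"location": location, "callback": callback}).encode("ascii")
    # HTTP Basic auth with key:secret, equivalent to curl -u key:secret.
    token = base64.b64encode(f"{api_key}:{api_secret}".encode("utf-8")).decode("ascii")
    return Request(FETCH_ENDPOINT, data=body, method="POST", headers={
        "Authorization": f"Basic {token}",
        "Content-Type": "application/x-www-form-urlencoded",
    })


def submit_fetch(request: Request) -> str:
    """Send the request and return the file id from the 202 Accepted body."""
    # urlopen raises HTTPError for 4xx/5xx responses.
    with urlopen(request) as response:
        if response.status != 202:
            raise RuntimeError(f"unexpected status {response.status}")
        return json.loads(response.read())["id"]
```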

What happens next?

Once Scanii has fetched and processed the requested content, it notifies the Acme application with a simple HTTP POST request to the “callback” URL containing the scan result. The body of the POST request looks like this:

    {
      "id" : "decad1d51b7981911113eb735739e73f",
      "checksum" : "edbb54821bc3f5666be48184a822c3df59392c31",
      "content_length" : 1579562,
      "findings" : [ "av.crdf.malware-generic.2462546599.unofficial" ],
      "creation_date" : "2015-05-29T06:00:37.772Z",
      "content_type" : "application/x-msdownload"
    }
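A callback handler in AcmeCMS first has to turn that body into something it can act on. This minimal sketch parses the fields shown above into a hypothetical ScanResult type. It assumes that an empty findings list means the content is clean, which is worth confirming against Scanii's API documentation.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class ScanResult:
    id: str
    checksum: str
    content_length: int
    content_type: str
    findings: List[str]

    @property
    def is_clean(self) -> bool:
        # Assumption: no findings means nothing malicious was detected.
        return not self.findings


def parse_callback(body: bytes) -> ScanResult:
    """Parse the JSON body that Scanii POSTs to the callback URL."""
    data = json.loads(body)
    return ScanResult(
        id=data["id"],
        checksum=data["checksum"],
        content_length=data["content_length"],
        content_type=data["content_type"],
        findings=list(data.get("findings", [])),
    )
```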

Finally, once AcmeCMS has received the scan results from Scanii, it should remove the pending arbitration flag from content that came back clean, making it visible to other users. Content with findings should remain hidden.
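One way to model the pending arbitration flag is as a small state machine. The sketch below uses a hypothetical in-memory ArbitrationStore standing in for AcmeCMS's real database, and assumes the content id is the one carried in the callback URL (12345 in the fetch example). As a design choice of this sketch, content with findings moves to a quarantined state instead of being released, so it stays hidden from other users.

```python
from enum import Enum
from typing import Dict, Iterable


class Status(Enum):
    PENDING = "pending"          # stored in S3, hidden until scanned
    RELEASED = "released"        # scanned clean, visible to other users
    QUARANTINED = "quarantined"  # scan reported findings, stays hidden


class ArbitrationStore:
    """Hypothetical in-memory stand-in for AcmeCMS's content metadata."""

    def __init__(self) -> None:
        self._status: Dict[str, Status] = {}

    def mark_pending(self, content_id: str) -> None:
        # Called before the object is written to S3 and the fetch call is made.
        self._status[content_id] = Status.PENDING

    def resolve(self, content_id: str, findings: Iterable[str]) -> Status:
        # Called from the callback handler once the scan result arrives.
        if self._status.get(content_id) is not Status.PENDING:
            raise KeyError(f"{content_id} is not awaiting arbitration")
        status = Status.QUARANTINED if list(findings) else Status.RELEASED
        self._status[content_id] = status
        return status

    def is_visible(self, content_id: str) -> bool:
        # Only content that has cleared arbitration is shown to other users.
        return self._status.get(content_id) is Status.RELEASED
```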


Last updated on 07/07/2015.