I spent several days searching for ANY information on background uploads to Amazon S3 with Paperclip and I just couldn’t find anything concrete. There were a few posts on the Paperclip Google Code board that talked about it but with no clear examples of how to do it.
Anyway, we know it’s very easy to add S3 support to Paperclip, and there’s tons of information on setting that up. It works flawlessly and it’s simple, yet your upload now takes twice as long compared to using your server’s filesystem for storage. The problem is pretty obvious, and I’ll explain why with an example in case you’re wondering.
Example A (filesystem)

1) User uploads a file
2) Application resizes file into thumbnails
Example B (Amazon S3)
1) User uploads a file
2) Application resizes file into thumbnails
3) Application then connects to Amazon S3 and uploads a thumbnail
4) Repeat step 3 until each thumbnail is uploaded
As you can see in Example B, the more thumbnails you need, the longer your upload is going to take, because Paperclip needs to send each file to Amazon S3 one at a time. All the while, the user is still waiting for the website to respond. If you use New Relic RPM you’ll notice that your Apdex gets destroyed by this, which basically tells you that a user is waiting longer than a second (more like 3-5 seconds) for the upload process.
Although there’s nothing that can be done about the initial upload, why does the user need to wait around while the application uploads to Amazon S3 (which could take 2-5 seconds longer)?
I worked on several ways to get Paperclip to process its Amazon S3 storage in the background, but they led to more complications than solutions. I realized I wanted to avoid modifying the original Paperclip code as much as possible so there wouldn’t be any problems updating Paperclip in the future.
The next idea was to have some sort of ‘syncing’ to Amazon S3 system. Let Paperclip upload to the filesystem, then have a background process move the files to Amazon S3, then delete the files off the filesystem.
Let’s assume you already have your Rails app, Starling/Workling installed and running, and Paperclip installed and working on the model of your choice. Let’s also assume we’re applying this to a User model and the attachment is an avatar. You’ll also need the aws-s3 gem (http://amazon.rubyforge.org), added to your environment.rb.
First let’s modify the Paperclip settings in our User model.
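Something along these lines should work (a sketch only; the styles, bucket name, and path layout here are illustrative placeholders, not the exact values from my app):

```ruby
# app/models/user.rb
class User < ActiveRecord::Base
  has_attached_file :avatar,
    :styles => { :thumb => "50x50#", :medium => "200x200>" },
    # Generate Amazon S3 URLs even though the file is written locally first.
    :url  => "http://s3.amazonaws.com/YOUR_AMAZONS3_BUCKET_NAME/avatars/:id/:style/:basename.:extension",
    # Write uploads and thumbnails to tmp; a background worker moves them to S3 later.
    :path => ":rails_root/tmp/avatars/:id/:style/:basename.:extension"
end
```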
The :url is set to the Amazon S3 link so that when we generate URLs with object.avatar.url(:medium), it outputs the Amazon S3 URL instead of the filesystem path. I also set the :path to the location where I want the file to be uploaded and resized. I chose the tmp folder since it seemed appropriate. (Note: I’m considering moving the location to system because it’s a shared folder across Capistrano deploys.)
A new column is needed for this model to let us know when the background uploading is being processed.
It’s your choice if you want to set the default to true, assuming upon creation of a User record that an upload will be included. In my case, a user gets created first and if the user decides to upload an avatar, it’s done through the user settings later, so I set the default to false.
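A migration along these lines adds the flag (the migration name is hypothetical; I set the default to false as described above):

```ruby
# db/migrate/xxx_add_processing_upload_to_users.rb
class AddProcessingUploadToUsers < ActiveRecord::Migration
  def self.up
    # Flags whether a background upload to S3 is still in progress.
    add_column :users, :processing_upload, :boolean, :default => false
  end

  def self.down
    remove_column :users, :processing_upload
  end
end
```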
Next we’ll setup the worker which will handle the upload to Amazon S3.
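Here’s a sketch of what that worker looks like. The worker and method names (AmazonS3Worker, upload_avatar), the list of styles, and the credentials are placeholders; adjust them to your own setup:

```ruby
# app/workers/amazon_s3_worker.rb
require 'aws/s3'

class AmazonS3Worker < Workling::Base
  def upload_avatar(options)
    user = User.find(options[:id])

    AWS::S3::Base.establish_connection!(
      :access_key_id     => 'YOUR_ACCESS_KEY_ID',
      :secret_access_key => 'YOUR_SECRET_ACCESS_KEY'
    )

    # Push each style (original plus thumbnails) to the bucket,
    # then delete the local copy from the filesystem.
    [:original, :thumb, :medium].each do |style|
      local_path = user.avatar.path(style)
      AWS::S3::S3Object.store(
        user.avatar.path_s3(style),   # key inside the bucket (see path_s3 below)
        File.open(local_path),
        'YOUR_AMAZONS3_BUCKET_NAME',
        :access => :public_read
      )
      File.delete(local_path)
    end

    # Let the app know the background upload has finished.
    user.update_attribute(:processing_upload, false)
  end
end
```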
This is pretty simple. The worker uses the aws-s3 gem to upload the files, and once it finishes they get removed from the filesystem. You’ll need to include your Amazon S3 Access Key ID and Secret Access Key, and also fill in the Amazon S3 bucket name that the files are getting uploaded to where it says ‘YOUR_AMAZONS3_BUCKET_NAME’. You may also notice a new method named ‘path_s3’, which I had to add to Paperclip’s attachment file.
Open up the ‘attachment.rb’ file in the ‘vendor/plugins/paperclip/lib/paperclip/’ folder. Under ‘def initialize’, add the two ‘@path_s3 =’ lines of code (around line 47):
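The two lines look something like this (a sketch; the Proc handling mirrors how Paperclip’s constructor already treats :url and :path):

```ruby
# vendor/plugins/paperclip/lib/paperclip/attachment.rb, inside def initialize
@path_s3 = options[:path_s3]
@path_s3 = @path_s3.call(self) if @path_s3.is_a?(Proc)
```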
Then scroll down, in the same file, to about line 113, where there’s a method named ‘path’ (def path style = nil #:nodoc:). Under that method we are going to make a new one.
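The new method can be modeled directly on Paperclip’s own ‘path’, just interpolating our new template instead; something like:

```ruby
# vendor/plugins/paperclip/lib/paperclip/attachment.rb, just below def path
def path_s3 style = nil #:nodoc:
  # Runs Paperclip's interpolations (:id, :style, :basename, etc.)
  # against the :path_s3 template from has_attached_file.
  interpolate(@path_s3, style)
end
```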
This lets us define a new key (:path_s3) in the has_attached_file hash in the User model. If you’re wondering what ‘path_s3’ does: it determines the path within the Amazon S3 bucket that the file gets saved to, and by having it in our has_attached_file options, we can change it later if we need to, which I’ll show you next.
Back to the User model, we add a new line in the has_attached_file hash:
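The added option looks like this (the key layout is illustrative; it just needs to line up with the :url template you chose earlier):

```ruby
# app/models/user.rb
has_attached_file :avatar,
  # ...existing :styles, :url, and :path options from before...
  :path_s3 => "avatars/:id/:style/:basename.:extension"
```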
Don’t forget to restart the server since the Paperclip plugin was modified and also restart workling to generate the new workling we created earlier.
The last part is to call the worker through Starling. In this example, the file upload is being handled normally through the User controller and the Update action.
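A sketch of that update action (the asynch_ prefix is how Workling dispatches to the worker defined earlier; the controller flow itself is illustrative):

```ruby
# app/controllers/users_controller.rb
def update
  @user = User.find(params[:id])
  if @user.update_attributes(params[:user])
    if params[:user][:avatar]
      # Flag the record so the UI can show a "processing" state,
      # then queue the S3 upload through Starling/Workling.
      @user.update_attribute(:processing_upload, true)
      AmazonS3Worker.asynch_upload_avatar(:id => @user.id)
    end
    redirect_to @user
  else
    render :action => 'edit'
  end
end
```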
That’s it! The next step is to create a way to let your users know that the file is still being processed. All you have to do is check whether processing_upload is true.
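For example, in the view (a minimal sketch):

```erb
<%# app/views/users/show.html.erb %>
<% if @user.processing_upload? %>
  <p>Your avatar is still being processed...</p>
<% else %>
  <%= image_tag @user.avatar.url(:medium) %>
<% end %>
```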
This is my first tutorial type of post so bear with me if I missed anything and please let me know asap! Also if you have any experience doing something similar to this, let me know!