archive of pdf uploads safe from bots

6 posts by 4 authors in: Forums > CMS Builder
Last Post: March 5, 2012

By markr - February 28, 2012

What's the best way to secure an archive of PDF uploads from unwanted web viewing using the CMS?

If you use a custom upload directory to hide them, aren't they still just waiting for a clever bot or patient crawler to see them?

Re: [markr] archive of pdf uploads safe from bots

By Damon - February 28, 2012

Hi,

You can prevent the directory from being browsable with an .htaccess edit, or simply by dropping an index.php file into it.

If anyone (including bots) goes to the directory, they will just see the index.php file instead of a listing of its contents.

The only other way bots would pick up the links to PDF files is from links within your site content.

If your content is protected by username/password, then the bots wouldn't be able to find any links to PDFs.

Hope that helps!
Cheers,
Damon Edis - interactivetools.com

Re: [Damon] archive of pdf uploads safe from bots

By markr - February 28, 2012

I was hoping to do it without an htaccess login, e.g. perhaps using the membership plugin.

And even if I drop a blank index.php file in the directory, a browser can still load the PDF if it requests the file's URL directly. Let's say the client names the PDF something simple like a.pdf; a bot would randomly generate that name pretty quickly. A longer name would only delay the beast, no?

When you say "protected by username/password", are you referring to the htaccess edit?

I was wondering if maybe the PDF could be stored in a non-public directory and displayed on a secure (members only) HTML page using an embed. In that scenario, can cmsb upload to a non-public area of the server?

Re: [markr] archive of pdf uploads safe from bots

By Dave - March 4, 2012

Hi Markr,

>can cmsb upload to a non-public area of the server?

Yes, you can set custom upload dirs in the fields editor for upload fields.

We've dealt with document security a number of times and there are a few common issues that come up.

- Bot Security: Generally, bots won't find your upload directory unless it's linked from somewhere, and if they do, it's not usually a problem as long as the directory doesn't list all the files (a blank index.html/php will hide directory listings). It's true they could guess at filenames, but that makes a filename about as secure as a password, which can also be guessed. Assuming a-z is 26 chars, and 0-9 is another 10, each filename char has 36 possibilities, so a 3 char filename could take over 46 thousand guesses (36*36*36 = 46,656). It's usually not a problem unless your filenames follow a pattern (e.g. 1001.pdf, 1002.pdf) or match something else on your site (product SKUs, etc).

- User Security: The next concern is limiting download links to logged-in users, since once someone has a direct link they could just share it and anyone could access the file. The easiest way to do this is to create a custom wrapper script that requires users to be logged in and then outputs the PDF. A link such as memberPdfDownload.php?table=products&num=123 could let them download the PDF, but only if they were logged in, so sending that link to others wouldn't help.

- Home PC Security: Of course, nothing prevents a user from saving the file to their computer and emailing it around as an attachment. And even complicated systems that don't let a user download a file are still susceptible to someone taking a picture of their screen with a camera. Basically, there's no way to prevent a user from copying the data once they have it, just lots of ways of making it more difficult.
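
To make the filename arithmetic from the Bot Security point concrete, here is a rough sketch. It is written in Python purely for illustration (CMS Builder itself is PHP), and the helper names filename_keyspace and random_filename are made up for this example, not part of any CMS Builder API. It assumes filenames use only a-z and 0-9.

```python
import secrets
import string

# a-z (26 chars) plus 0-9 (10 chars) = 36 possibilities per character
ALPHABET = string.ascii_lowercase + string.digits


def filename_keyspace(length: int, alphabet_size: int = len(ALPHABET)) -> int:
    """Number of distinct filenames of exactly `length` characters."""
    return alphabet_size ** length


def random_filename(length: int = 16, ext: str = ".pdf") -> str:
    """Hypothetical helper: an unpredictable name instead of a pattern like 1001.pdf."""
    name = "".join(secrets.choice(ALPHABET) for _ in range(length))
    return name + ext


if __name__ == "__main__":
    for n in (1, 3, 8, 16):
        print(f"{n} chars: {filename_keyspace(n):,} possible names")
    print("example name:", random_filename())
```

The count grows by a factor of 36 with every extra character, which is why a long random name is much harder to stumble on than a short or patterned one. It still only slows guessing down; it does nothing once a link has been shared.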

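The wrapper script idea from the User Security point boils down to three checks: is the user logged in, does the requested record map to a file, and is that file really inside the upload directory. Below is a minimal sketch of that logic, again in Python for illustration only. The session dict, the index mapping (standing in for the CMS database lookup), and the function names are all assumptions for this example, not CMS Builder code. It also assumes the uploads live outside the web root.

```python
from pathlib import Path
from typing import Mapping, Optional, Tuple

# Hypothetical stand-in for a database lookup: (table, record number) -> stored filename
PdfIndex = Mapping[Tuple[str, int], str]


def resolve_pdf(upload_dir: Path, index: PdfIndex, table: str, num: int) -> Optional[Path]:
    """Return the file for a record, or None if unknown or outside upload_dir."""
    filename = index.get((table, num))
    if filename is None:
        return None
    root = upload_dir.resolve()
    path = (root / filename).resolve()
    # Refuse anything that escapes the upload directory (e.g. "../secret.pdf")
    if root not in path.parents:
        return None
    return path if path.is_file() else None


def member_pdf_download(session: Mapping[str, object], upload_dir: Path,
                        index: PdfIndex, table: str, num: int) -> Tuple[int, bytes]:
    """Return (HTTP status, body). Only logged-in users ever get file bytes."""
    if not session.get("user_id"):
        return 403, b"Login required"
    path = resolve_pdf(upload_dir, index, table, num)
    if path is None:
        return 404, b"Not found"
    return 200, path.read_bytes()
```

Because the link only carries a table name and record number, forwarding it to someone who isn't logged in gets them the 403 response rather than the file.
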
Hope that helps. Let me know if you have any questions. Thanks!
Dave Edis - Senior Developer
interactivetools.com

Re: [Dave] archive of pdf uploads safe from bots

By markr - March 5, 2012

Great info. Thanks all.