archive of pdf uploads safe from bots

6 posts by 4 authors in: Forums > CMS Builder
Last Post: March 5, 2012

By markr - February 28, 2012

What's the best way to secure an archive of PDF uploads from unwanted web viewing using the CMS?

If you use a custom upload directory to hide them, aren't they still just waiting for a clever bot or patient crawler to see them?

Re: [markr] archive of pdf uploads safe from bots

By Damon - February 28, 2012

Hi,

You can prevent the directory from being browsable with an .htaccess edit, or simply by dropping an index.php file into it.

If anyone (including bots) goes to the directory, they will just see the index.php file.
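As a minimal sketch of the .htaccess edit mentioned above, assuming an Apache server that lets .htaccess files override `Options`, the following turns off automatic directory listings for the folder it is placed in. It only hides the listing; a direct request for a known file name is still served.

```apache
# .htaccess in the upload directory (assumes Apache with AllowOverride permitting Options)
# Disable the auto-generated file listing for this folder and its subfolders.
Options -Indexes
```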

The only other way bots would pick up the links to PDF files is from links within your site content.

If your content is protected by username/password, then the bots wouldn't be able to find any links to PDFs.

Hope that helps!
Cheers,
Damon Edis - interactivetools.com

Re: [Damon] archive of pdf uploads safe from bots

By markr - February 28, 2012

I was hoping to do it without an htaccess login, e.g. perhaps using the membership plugin.

And even if I drop a blank index.php file in the directory, a browser can still load the PDF if it requests the file directly. Say the client names the PDF something simple like a.pdf; a bot would randomly generate that name pretty quickly. A longer name would only delay the beast, no?
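To put rough numbers on the "longer name only delays it" point, here is a small Python sketch (Python purely for illustration). It counts how many names a guesser would have to try for a given length, and generates a random name with the standard `secrets` module. The 36-character alphabet and the function names are assumptions made for the example; an unguessable name raises the cost of guessing but is not access control.

```python
import secrets
import string

# Assumed naming scheme for the example: lowercase letters and digits (36 symbols).
ALPHABET = string.ascii_lowercase + string.digits


def name_space(length: int, alphabet_size: int = len(ALPHABET)) -> int:
    """Number of distinct base names of exactly `length` characters."""
    return alphabet_size ** length


def unguessable_name(extension: str = ".pdf", nbytes: int = 16) -> str:
    """Random file name built from `nbytes` bytes of OS-level randomness."""
    return secrets.token_urlsafe(nbytes) + extension


if __name__ == "__main__":
    for length in (1, 4, 8):
        print(f"{length} chars: {name_space(length)} possible names")
    print("example random name:", unguessable_name())
```

A one-character name like a.pdf is one of only 36 candidates under this scheme, while a 16-byte random token is not practical to enumerate. Anyone who learns the link can still open the file, though.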

When you say "protected by username/password", are you referring to the htaccess edit?

I was wondering if maybe the pdf could be stored in a non-public directory and displayed on a secure (members only) html page using an embed thing. In that hypo, can cmsb upload to a non-public area of the server?
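The idea in the previous paragraph, files kept outside the web root and handed out only by a members-only page, can be sketched as follows. This is an illustration in Python, not CMS Builder or membership plugin code: `is_logged_in` stands in for whatever login check the site really uses, and the storage path is hypothetical. Whether CMSB itself can upload to a non-public directory is not answered by this sketch.

```python
from pathlib import Path


class AccessDenied(Exception):
    """Raised when a request must not be served."""


def resolve_protected_pdf(storage_root: Path, requested_name: str,
                          is_logged_in: bool) -> Path:
    """Return the path of a PDF inside storage_root, or raise AccessDenied."""
    if not is_logged_in:
        raise AccessDenied("login required")

    root = storage_root.resolve()
    candidate = (root / requested_name).resolve()

    # Reject names that escape the storage directory, e.g. "../secret.pdf".
    if root not in candidate.parents:
        raise AccessDenied("outside storage directory")

    # Only hand out existing .pdf files.
    if candidate.suffix.lower() != ".pdf" or not candidate.is_file():
        raise AccessDenied("no such document")

    return candidate


def read_protected_pdf(storage_root: Path, requested_name: str,
                       is_logged_in: bool) -> bytes:
    """Bytes a members-only page would send with Content-Type: application/pdf."""
    return resolve_protected_pdf(storage_root, requested_name, is_logged_in).read_bytes()
```

Because the storage directory has no URL of its own, a bot cannot request the file directly; every download has to pass through the login check first.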

Re: [markr] archive of pdf uploads safe from bots

By sublmnl - March 4, 2012 - edited: March 4, 2012

Either move the directory somewhere only FTP can get to, outside the public or www folder.

Or, if you can't move it above/outside the www folder, put a robots.txt file in your root and disallow the PDF directory you are talking about. Also put an index.php file in the root of the PDF folder that makes the listing die or redirect, and a .htaccess file in the PDF folder that turns off directory listing for that folder.
Or do like he said and put a login on the PDF folder.
You could also apply document security to the PDFs so that a password is required to open them. You can do all of the above as good practice if needed.
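For the robots.txt suggestion above, here is a short Python sketch that checks what a disallow rule would tell a well-behaved crawler, using the standard `urllib.robotparser` module. The directory name `/pdf-archive/` and the bot name are made up for the example. Keep in mind that robots.txt is only a request: compliant crawlers honor it, but it does not block anyone from fetching the files, and it publicly names the directory.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the site root; "/pdf-archive/" is an assumed folder name.
ROBOTS_TXT = """\
User-agent: *
Disallow: /pdf-archive/
"""


def is_allowed(robots_txt: str, url_path: str, user_agent: str = "ExampleBot") -> bool:
    """True if a crawler that obeys robots.txt may fetch url_path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url_path)


if __name__ == "__main__":
    print(is_allowed(ROBOTS_TXT, "/pdf-archive/a.pdf"))
    print(is_allowed(ROBOTS_TXT, "/index.php"))
```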

We have done this with a client and moved an entire section of their site 'behind the wall' - it was all learning module content and run from the CMS. So was the content on the outside of the login wall. Win win for us. It may be a win win for your client as well.

Re: [Dave] archive of pdf uploads safe from bots

By markr - March 5, 2012

Great info. Thanks all.