Is this relatively simple to create or other? Image location scrambler thingy.

3 posts by 2 authors in: Forums > CMS Builder
Last Post: May 8, 2023

By Codee - May 8, 2023

Dave and I.T. Team,

Been doing some thinking on security procedures and the recent influx of site scrapers collecting images for A.I. Some time back your team created the Spambot Email Protector, which was a GREAT idea with only one glitch (order of operations: if someone used dynamic entry from CMSB it also got scrambled, so legit emails didn't work in that case). Now I am wondering if there is a simple way (like a plugin or standard coding) to either prevent, or screw with, AI image scraping? I was thinking it could work by adjusting the URL after the page is loaded, so the page displays the correct image or thumbnail, but if someone tries to right-click-download, right-click-open-in-new-window, or just scrape the URL from the source code, they get the wrong URL.

Does that make sense?

By Dave - May 8, 2023

Hi Codee, 

There is no perfect solution to this issue, as browsers need to download images. A bot that perfectly replicates browser behaviour could still download them. However, there are ways to make it challenging for many scrapers.

One method involves requiring an HTTP_REFERER value, which is the URL of the page that linked to or referred to the current page or image. Simple bots often request a file directly without sending this value. You can filter based on this field (with PHP or .htaccess) to display a different image or an error if there is no referrer.
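As a rough illustration of the referrer check, here is a minimal sketch of the decision logic. It is written in Python only to show the idea; the same test could be written in PHP or as an .htaccess rule. The host names in ALLOWED_HOSTS and the function name referer_allowed are placeholders, not anything from CMSB. Keep in mind that the Referer header is easy to forge and that some privacy tools strip it, so this only stops simple bots.

```python
from urllib.parse import urlsplit

# Hypothetical host names allowed to embed your images; replace with your own.
ALLOWED_HOSTS = {"example.com", "www.example.com"}


def referer_allowed(referer, allowed_hosts=ALLOWED_HOSTS):
    """Return True only if the Referer header points at one of our own hosts."""
    if not referer:
        # Simple bots often request the file directly and send no Referer.
        return False
    try:
        host = urlsplit(referer).hostname  # None if the value has no host part
    except ValueError:
        return False  # malformed header value
    return host is not None and host in allowed_hosts
```

A request handler would serve the real image when referer_allowed(...) is true and a placeholder image or an error otherwise.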

Another approach is to filter based on the HTTP_USER_AGENT value. All browsers send their name and version when making a web request, and well-behaved bots should include something that identifies their source. You can find more information about this here: https://datadome.co/threat-research/how-chatgpt-openai-might-use-your-content-now-in-the-future/
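A user-agent filter can be sketched the same way. The tokens below ("gptbot", "ccbot") are only examples of crawler names that identify themselves; treat the list as an assumption and check each crawler's own documentation for the string it actually sends. The function name is made up for this example.

```python
# Example crawler tokens (lowercase); an assumption, adjust to the bots you care about.
BLOCKED_AGENT_TOKENS = ("gptbot", "ccbot")


def user_agent_blocked(user_agent, tokens=BLOCKED_AGENT_TOKENS):
    """Return True if the User-Agent is missing or contains a blocked token."""
    if not user_agent:
        # Real browsers always send a User-Agent, so an empty one is suspicious.
        return True
    ua = user_agent.lower()
    return any(token in ua for token in tokens)
```

This only catches bots that are honest about who they are; a scraper can send any user-agent string it likes.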

Additionally, you can ban access from blocks of IP addresses. If you know a certain service is scraping your site (or might) and you know their IP range, you can block it. However, be cautious not to exclude valid search engine indexing bots.
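For IP blocking, the check is just "is this address inside any banned range". A minimal sketch using Python's standard ipaddress module follows. The ranges listed are reserved documentation ranges used as stand-ins, not real scraper addresses, and ip_blocked is a hypothetical helper name.

```python
import ipaddress

# Placeholder ranges (reserved for documentation); substitute the ranges you want to ban.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def ip_blocked(remote_addr, networks=BLOCKED_NETWORKS):
    """Return True if remote_addr falls inside any blocked network."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        return False  # not a valid IP address, so nothing to match against
    # An IPv4 address is never "in" an IPv6 network and vice versa.
    return any(addr in network for network in networks)
```

Before adding a range, it is worth confirming it does not belong to a search engine you want indexing the site.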

Other ideas to consider:

  • Disallowing bots with robots.txt
  • Loading images with JavaScript to prevent access by bots that can't use JavaScript
  • Watermarking your images to make them traceable and less useful
  • Limiting the rate of requests so scrapers and bots can't download everything all at once
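
The rate-limiting idea from the list above could look something like this: a small in-memory sliding-window counter per client IP. The class name, the default limits (30 requests per 60 seconds), and the in-memory storage are all assumptions for illustration; a real site would more likely use the web server's or a CDN's built-in rate limiting.

```python
import time
from collections import defaultdict, deque


class RateLimiter:
    """Allow at most max_requests per client within a sliding time window."""

    def __init__(self, max_requests=30, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits = defaultdict(deque)  # client IP -> timestamps of recent requests

    def allow(self, client_ip):
        now = self.clock()
        hits = self._hits[client_ip]
        # Forget requests that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit: respond with an error or a delay
        hits.append(now)
        return True
```

Each image request would call allow(ip) first and be refused when it returns False.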

Hope that helps. The solution might vary based on what you're trying to protect and why, but there are a few options.

Dave Edis - Senior Developer
interactivetools.com