
Website Scrapers?

Discussion in Website and Development, started by stub, Oct 23, 2018.

Replies:
42
Views:
1,814

  1. stub

    stub Top Member PRO VIP ★★★★★★★★★★

    Posts:
    22,645
    Likes Received:
    8,093
    This question is for those people with well-established domain-for-sale websites. It's a serious question.

    How do you stop the scrapers? According to my host's logs, one domain on my website had 139K visits from a single URL in the space of 2 1/2 weeks. It's obvious in this case, but how do you tell the scrapers from the real people, so you can stop the scrapers and let the real people through to your website? There must be some solution other than letting the scrapers run wild. Are there any free or paid solutions worth considering?
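A first step toward telling scrapers from people is counting hits per client IP in the access log, as in the 139K-visits case above. The sketch below assumes Apache/Nginx "combined" log format (check what your host actually writes); the function names `countHitsPerIp` and `heavyHitters` and the threshold you pass in are hypothetical choices, not a standard tool.

```php
<?php
// Minimal sketch: tally requests per client IP from access-log lines.
// ASSUMPTION: lines are in Apache/Nginx "combined" format, e.g.
// 203.0.113.5 - - [23/Oct/2018:10:00:00 +0000] "GET /page.html HTTP/1.1" 200 512 "-" "UA"

function countHitsPerIp(iterable $lines, ?string $onlyPath = null): array
{
    // Group 1 = client IP, group 2 = requested path from the request line.
    $pattern = '/^(\S+) \S+ \S+ \[[^\]]+\] "[^"\s]+ ([^"\s]+)[^"]*"/';
    $counts = [];
    foreach ($lines as $line) {
        if (!preg_match($pattern, $line, $m)) {
            continue; // skip lines that don't match the assumed format
        }
        [, $ip, $path] = $m;
        if ($onlyPath !== null && $path !== $onlyPath) {
            continue; // only count hits on the page we care about
        }
        $counts[$ip] = ($counts[$ip] ?? 0) + 1;
    }
    arsort($counts); // busiest IPs first, keys preserved
    return $counts;
}

// Hypothetical helper: IPs whose hit count is above a threshold you pick.
function heavyHitters(array $counts, int $threshold): array
{
    return array_keys(array_filter($counts, fn(int $n) => $n > $threshold));
}
```

Usage might look like `print_r(heavyHitters(countHitsPerIp(file('/path/to/access.log', FILE_IGNORE_NEW_LINES), '/some-page.html'), 1000));`, where the path, page and threshold are placeholders. Because the function takes any iterable, a large log can be streamed with `new SplFileObject('/path/to/access.log')` instead of loading it all with `file()`.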
     
  2. carob

    carob Active Member VIP

    Posts:
    3,102
    Likes Received:
    3,745
  3. stub

    stub Top Member PRO VIP ★★★★★★★★★★

    Posts:
    22,645
    Likes Received:
    8,093
    Would you mind explaining further? How does that help?
     
  4. xynames

    xynames XYNames.com PRO VIP

    Posts:
    2,595
    Likes Received:
    4,238
    It depends on what the website is. My highest traffic websites are forums. I have filters in place to stop spammers from registering to the point of being able to post anything.

    I also have blog websites, some with medium traffic and some with low. On these too I have filters in place to keep the spammers from posting anything.

    Otherwise, I don’t care who stops by, spammer or not.
     
    Last edited: Oct 23, 2018
  5. carob

    carob Active Member VIP

    Posts:
    3,102
    Likes Received:
    3,745
    Hi, best to read what they say for themselves. By default they filter out a lot of bot traffic, and they can do more:
    https://www.cloudflare.com/dns/dns-firewall/

    Not sure what you mean by scrapers: collectors/copiers of data? Or just repeat bot visitors consuming bandwidth and maybe slowing you down or costing you money?

    If, for example, you're using WordPress with Wordfence, that blocks a lot of known bad IPs. Your hosting control panel usually lets you create a list of blocked IPs as well. Others have created honeypots that only bots would enter; when a bot does, its IP is banned.
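The honeypot idea can be sketched in a few lines of PHP. Everything here is a hypothetical illustration: the trap URL, the ban-list file and the function names are assumptions, and a flat file is used only to keep the example self-contained. Disallowing the trap URL in robots.txt keeps well-behaved crawlers such as Googlebot from getting banned. Note that if the site sits behind Cloudflare, `REMOTE_ADDR` will be a Cloudflare address, not the visitor's.

```php
<?php
// Minimal honeypot sketch; file names and paths are hypothetical.
// Idea: link to a trap URL (e.g. /trap.php) where humans won't click it, and
// Disallow it in robots.txt so polite crawlers skip it. Anything that still
// requests it gets its IP appended to a ban list, which normal pages check first.

function isBanned(string $ip, string $banFile): bool
{
    clearstatcache(true, $banFile); // avoid stale stat-cache results after appends
    if (!is_file($banFile)) {
        return false;
    }
    $banned = file($banFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) ?: [];
    return in_array($ip, $banned, true);
}

// Called from the trap page only.
function recordTrapHit(string $ip, string $banFile): void
{
    if (filter_var($ip, FILTER_VALIDATE_IP) === false) {
        return; // ignore malformed addresses
    }
    if (!isBanned($ip, $banFile)) {
        // One IP per line; LOCK_EX stops concurrent requests interleaving writes.
        file_put_contents($banFile, $ip . PHP_EOL, FILE_APPEND | LOCK_EX);
    }
}

// Example guard at the top of a normal page:
// if (isBanned($_SERVER['REMOTE_ADDR'], __DIR__ . '/banned-ips.txt')) {
//     http_response_code(403);
//     exit;
// }
```

A flat file is fine for a handful of entries; with thousands of banned IPs you would want a database or to push the list to the firewall instead, which is the slowdown problem raised later in the thread.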
     
  6. frank-germany

    frank-germany F1lter.com xpired domain search engine Gold Account VIP

    Posts:
    5,847
    Likes Received:
    6,666
    cloudflare
    don't waste your time
    I tried a lot of ways
    : cloudflare
     
  7. stub

    stub Top Member PRO VIP ★★★★★★★★★★

    Posts:
    22,645
    Likes Received:
    8,093
    I get A LOT of "scraper-like" activity: IP addresses going to the same pages over and over again. A single IP might hit one page anywhere from 10 times up to 3/4M times in 2 weeks, with maybe 500+ different IP addresses in total (or more). They don't appear to be actually doing any scraping, but they sure eat up a lot of bandwidth; 50GB is not unusual. All of them are basically hitting this one page (sometimes from a different domain).

    What would be the best method of attack? Rename the old page and put a 404 on the old URL? Employ Cloudflare and ban all their IP addresses? I'm getting a lot of grumbles from my host that the sheer volume of blacklisted IPs is slowing access to my other websites to a crawl. I'm kind of overwhelmed by the problem, actually. I've fixed a lot of stuff related to this, but it just keeps getting worse. These pages have very little content; there would be no need to visit a page more than once. They are bare static pages. It just doesn't make any sense.
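Since a growing blacklist is what is slowing the host down, one alternative is to throttle any IP that requests pages faster than a person plausibly would. Below is a minimal fixed-window rate limiter sketch; the class name and limits are hypothetical, it needs PHP 8 (constructor property promotion), and it keeps state in memory only, so on a real site the counters would have to live in something shared between requests (a cache or database).

```php
<?php
// Minimal fixed-window rate limiter sketch (hypothetical; limits are assumptions).
// State: ip => [windowStart, count]. In-memory only, for illustration.

final class FixedWindowLimiter
{
    /** @var array<string, array{int, int}> */
    private array $hits = [];

    public function __construct(
        private int $maxRequests,   // allowed requests per window
        private int $windowSeconds  // window length in seconds
    ) {}

    // Returns true if this request is allowed, false if the IP is over its limit.
    public function allow(string $ip, int $now): bool
    {
        [$start, $count] = $this->hits[$ip] ?? [$now, 0];
        if ($now - $start >= $this->windowSeconds) {
            // The previous window has expired: start a fresh one.
            $start = $now;
            $count = 0;
        }
        $count++;
        $this->hits[$ip] = [$start, $count];
        return $count <= $this->maxRequests;
    }
}
```

A page would then do something like `if (!$limiter->allow($ip, time())) { http_response_code(429); exit; }`. A fixed window is the simplest design; it allows short bursts at window boundaries, which hardly matters when the offenders send thousands of requests to one static page.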

    @frank-germany, I suppose your recommendation is a bit like @carob's: it needs some specifics.
     
  8. carob

    carob Active Member VIP

    Posts:
    3,102
    Likes Received:
    3,745
    Details available from Cloudflare website.

    Another thing you can do is use robots.txt to set a crawl delay, so that, if they obey it, they space out their requests.
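A robots.txt with a crawl delay is just a few lines of text. The sketch below builds one in PHP; the function name and the 10-second value are assumptions. Keep in mind that `Crawl-delay` is a non-standard directive: Bing honours it, Googlebot ignores it, and abusive scrapers usually ignore robots.txt altogether.

```php
<?php
// Sketch: build a robots.txt body with a Crawl-delay directive.
// Crawl-delay is non-standard; only crawlers that choose to obey it will slow down.

function robotsTxt(int $crawlDelaySeconds, array $disallow = []): string
{
    $lines = ['User-agent: *', 'Crawl-delay: ' . $crawlDelaySeconds];
    foreach ($disallow as $path) {
        $lines[] = 'Disallow: ' . $path; // paths polite crawlers should skip
    }
    return implode("\n", $lines) . "\n";
}

// Either save the output as a static robots.txt in the web root, or serve it
// from a script that /robots.txt is rewritten to (hypothetical setup):
// header('Content-Type: text/plain; charset=utf-8');
// echo robotsTxt(10, ['/private/']);
```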
     
  9. DnameAgame

    DnameAgame Check out the new BrandPlease.com Gold Account

    Posts:
    501
    Likes Received:
    944
    If it makes it to Archive.org, there is just about no way to stop it, I don't think. From the Archive I can pull just about everything, from any time (if recorded). I'm not exactly sure how it's all stored, but it seems you can even pull what you wouldn't find by searching it. I'm not an expert on this; just passing on something I found very recently.
     
  10. stub

    stub Top Member PRO VIP ★★★★★★★★★★

    Posts:
    22,645
    Likes Received:
    8,093
    @carob - The only thing I found out on the Cloudflare website was that they would serve all these requests, probably through their server cache, or from my server if the page isn't in the cache. The problem, as I see it, is not resolved. The scraping goes on unabated, either through their cache or directly via my website. There is no penalty to make them stop what they are doing. This really is the crux of my beef with these people: without punishment they will never learn good manners. Hence my feelings are so strong.
     
  11. carob

    carob Active Member VIP

    Posts:
    3,102
    Likes Received:
    3,745
    Precisely: that means far fewer requests to your website.

    And they do block what they regard as malicious traffic.
     
  12. stub

    stub Top Member PRO VIP ★★★★★★★★★★

    Posts:
    22,645
    Likes Received:
    8,093
    Can you show me how to implement this blocking of IPs at CloudFlare?
     
  13. techpr

    techpr Established Member

    Posts:
    115
    Likes Received:
    60
    Go to Firewall; under Access Rules you can block or challenge a visitor by country or IP.
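The same access rules can also be created from a script, which helps when there are hundreds of IPs to add. This sketch assumes Cloudflare's v4 API endpoint for zone-level IP Access Rules and its `mode` / `configuration` / `notes` body fields; verify both against Cloudflare's current API documentation before relying on it. The function name, zone ID and token are placeholders, and only the `block` and `challenge` modes are allowed here to keep the example small.

```php
<?php
// Sketch: build the API request behind Firewall > Access Rules.
// ASSUMPTION: Cloudflare v4 "IP Access Rules" endpoint for a zone; check the docs.

function accessRuleRequest(string $zoneId, string $mode, string $target, string $value, string $note = ''): array
{
    if (!in_array($mode, ['block', 'challenge'], true)) {
        throw new InvalidArgumentException("unsupported mode: $mode");
    }
    if (!in_array($target, ['ip', 'country'], true)) {
        throw new InvalidArgumentException("unsupported target: $target");
    }
    $url = "https://api.cloudflare.com/client/v4/zones/{$zoneId}/firewall/access_rules/rules";
    $body = json_encode([
        'mode' => $mode,
        'configuration' => ['target' => $target, 'value' => $value],
        'notes' => $note,
    ], JSON_THROW_ON_ERROR);
    return ['url' => $url, 'body' => $body];
}

// Sending it with the curl extension (placeholders for zone ID and API token):
// $req = accessRuleRequest('YOUR_ZONE_ID', 'block', 'ip', '198.51.100.4', 'scraper');
// $ch = curl_init($req['url']);
// curl_setopt_array($ch, [
//     CURLOPT_POST => true,
//     CURLOPT_POSTFIELDS => $req['body'],
//     CURLOPT_HTTPHEADER => ['Authorization: Bearer YOUR_API_TOKEN', 'Content-Type: application/json'],
//     CURLOPT_RETURNTRANSFER => true,
// ]);
// $response = curl_exec($ch);
```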

     
  14. xynames

    xynames XYNames.com PRO VIP

    Posts:
    2,595
    Likes Received:
    4,238
    I think what frank-germany is saying is: don't waste your time with anything other than Cloudflare. Anyway, doesn't Cloudflare just slow things down?
     
  15. frank-germany

    frank-germany F1lter.com xpired domain search engine Gold Account VIP

    Posts:
    5,847
    Likes Received:
    6,666
    just do it
    if it's recommended

    you can always rewind

    why ask otherwise?
     
  16. frank-germany

    frank-germany F1lter.com xpired domain search engine Gold Account VIP

    Posts:
    5,847
    Likes Received:
    6,666
    I said
    use cloudflare : no hassle
     
  17. frank-germany

    frank-germany F1lter.com xpired domain search engine Gold Account VIP

    Posts:
    5,847
    Likes Received:
    6,666
    they know the bad IPs already
     
  18. frank-germany

    frank-germany F1lter.com xpired domain search engine Gold Account VIP

    Posts:
    5,847
    Likes Received:
    6,666
  19. creataweb

    creataweb Some Guy with Awesome Senior High School Photo VIP ★★★★★★★★★★

    Posts:
    5,647
    Likes Received:
    6,693
  20. stub

    stub Top Member PRO VIP ★★★★★★★★★★

    Posts:
    22,645
    Likes Received:
    8,093
    Thank you @techpr. This is what I was trying to get at: where to ban this stuff in Cloudflare. I haven't tried it yet, but it looks like a much better idea/implementation than blocking these in the host's firewall, which my host seems to be saying is not a good idea because it slows things down a lot.
     
  21. stub

    stub Top Member PRO VIP ★★★★★★★★★★

    Posts:
    22,645
    Likes Received:
    8,093
    OK. I've implemented my firewall in CloudFlare (which I've only been using for a couple of days, since you guys have been solidly recommending it), so I'm a noob. I've still got some tinkering to do related to my last report from my host, but I'm set to go. I've told my host to delete all the firewall rules I had given them. We'll see how things operate from here.

    PS: I'm liking CloudFlare very much indeed.
     
    Last edited: Oct 24, 2018
  22. stub

    stub Top Member PRO VIP ★★★★★★★★★★

    Posts:
    22,645
    Likes Received:
    8,093
    It's principally a caching (CDN) system. It speeds things up for the visitor.
     
  23. stub

    stub Top Member PRO VIP ★★★★★★★★★★

    Posts:
    22,645
    Likes Received:
    8,093
    I'm still analyzing my host's records. I'm not finished yet, but I'm finding that a lot of these connections are coming from CloudFlare IPs, and this was before I started with CloudFlare. How can that be? I thought they were supposed to be protecting inbound traffic. How does someone use a CloudFlare IP for outbound scraping of a website?
     
  24. frank-germany

    frank-germany F1lter.com xpired domain search engine Gold Account VIP

    Posts:
    5,847
    Likes Received:
    6,666
    the visitors will come from cloudflare
    if you need to see the origin country they supply it
    $country = $_SERVER["HTTP_CF_IPCOUNTRY"];

    there may be a way to get the original ip as well if you need it

    if they don't
    you need to redirect the original traffic
    -as I do it -
    and store the ip
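Cloudflare does pass the original address: it adds a `CF-Connecting-IP` request header (and `CF-IPCountry` when IP Geolocation is enabled), which PHP exposes as `HTTP_CF_CONNECTING_IP` and `HTTP_CF_IPCOUNTRY`. The sketch below only trusts those headers when the request really came from a Cloudflare address, since anyone can send a fake header straight to the origin server. The range list is a partial, illustrative sample and should be replaced with the current list from https://www.cloudflare.com/ips-v4; the function names are hypothetical, IPv6 is not handled, and a 64-bit PHP build is assumed.

```php
<?php
// Sketch: real visitor IP and country for a site behind Cloudflare.
// Partial, illustrative list; use the current one from https://www.cloudflare.com/ips-v4
const CLOUDFLARE_V4_RANGES = ['173.245.48.0/20', '103.21.244.0/22', '104.16.0.0/13', '162.158.0.0/15'];

// IPv4-only CIDR membership check (assumes 64-bit PHP integers).
function ipv4InCidr(string $ip, string $cidr): bool
{
    [$subnet, $bits] = explode('/', $cidr);
    $ipLong = ip2long($ip);
    $subnetLong = ip2long($subnet);
    if ($ipLong === false || $subnetLong === false) {
        return false;
    }
    $mask = (-1 << (32 - (int)$bits)) & 0xFFFFFFFF;
    return ($ipLong & $mask) === ($subnetLong & $mask);
}

// Returns ['ip' => ..., 'country' => ...|null], trusting CF headers only via Cloudflare.
function cloudflareVisitor(array $server): array
{
    $remote = $server['REMOTE_ADDR'] ?? '';
    foreach (CLOUDFLARE_V4_RANGES as $range) {
        if (ipv4InCidr($remote, $range) && isset($server['HTTP_CF_CONNECTING_IP'])) {
            return [
                'ip' => $server['HTTP_CF_CONNECTING_IP'],
                'country' => $server['HTTP_CF_IPCOUNTRY'] ?? null,
            ];
        }
    }
    return ['ip' => $remote, 'country' => null]; // direct hit, not via Cloudflare
}
```

Calling `cloudflareVisitor($_SERVER)` at the top of a page gives the address worth logging or banning; without this step every visitor looks like a Cloudflare IP in the logs.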
     
  25. stub

    stub Top Member PRO VIP ★★★★★★★★★★

    Posts:
    22,645
    Likes Received:
    8,093
    @frank-germany - Thanks. I'm a tad tired. It's been a long day. I'll re-read what you said in the morning.
     
