• MysticKetchup@lemmy.world · ↑ 10 · 14 hours ago

    At this point, we need to treat AI web scrapers as DDoS attacks and prosecute the companies and people involved the same way we would prosecute DDoS attackers.

  • pcouy@lemmy.pierre-couy.fr · ↑ 16 · 22 hours ago

    I’ve been in a similar situation. I’m also blocking large ranges of IP addresses, in addition to running Anubis in front of my most-scraped services (Git/Forgejo and Lemmy).

    I came up with a hacky Python script that watches my fail2ban logs and counts bans for IP ranges from /28 to /8. It then applies some heuristics I came up with (based on the range size n and on how the offending IPs are split between the two /(n+1) subranges) to detect ranges that should be blocked, and finally issues a log line that fail2ban picks up to manage bans of increasing length for repeat offenders (recidive).
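    A minimal sketch of the counting and split heuristic described above, in Python since the original script is Python. The comment doesn't give the actual heuristics, so `threshold()`, `is_spread()`, the `min_share=0.25` cutoff, and the `range-ban` log format are hypothetical placeholders; only the /28 to /8 span and the idea of comparing a range's two /(n+1) halves come from the description. IPv4 only.

```python
import ipaddress
from collections import Counter

# Range span from the comment: /28 (narrowest) up to /8 (widest).
MIN_PREFIX, MAX_PREFIX = 8, 28


def count_offenders(banned_ips):
    """Count distinct banned IPv4 addresses inside every containing range.

    Goes one step past MAX_PREFIX so each /28 also has counts for
    its two /29 halves, which the split check below needs."""
    counts = Counter()
    for ip in set(banned_ips):
        for n in range(MIN_PREFIX, MAX_PREFIX + 2):
            counts[ipaddress.ip_network(f"{ip}/{n}", strict=False)] += 1
    return counts


def threshold(n):
    """Hypothetical: offenders a /n range needs; narrower ranges need fewer."""
    return 3 + (MAX_PREFIX - n)


def is_spread(net, counts, min_share=0.25):
    """Hypothetical split heuristic: reject a /n whose offenders sit almost
    entirely in one /(n+1) half, since that half is the better target."""
    low, high = net.subnets(prefixlen_diff=1)
    a, b = counts[low], counts[high]  # Counter returns 0 for unseen halves
    return min(a, b) >= min_share * (a + b)


def ranges_to_block(banned_ips):
    """Return every /8../28 range that meets its threshold and is spread out."""
    counts = count_offenders(banned_ips)
    return sorted(
        net
        for net, c in counts.items()
        if MIN_PREFIX <= net.prefixlen <= MAX_PREFIX
        and c >= threshold(net.prefixlen)
        and is_spread(net, counts)
    )


if __name__ == "__main__":
    offenders = ["203.0.113.1", "203.0.113.5", "203.0.113.9", "203.0.113.13"]
    for net in ranges_to_block(offenders):
        # Hypothetical log format for a custom fail2ban filter to match.
        print(f"range-ban {net}")
```

    With the four sample offenders, only 203.0.113.0/28 is flagged: the enclosing /27 also reaches its threshold, but all of its offenders sit in one /28 half, so the split check rejects it. A fail2ban filter matching the emitted line would still have to be written for your fail2ban version and ban action.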

    It’s quite contrived, and I often fear it will be too aggressive and block something I rely on, but it has been working really well in my experience.

    It initially blocks a lot of small ranges, but over time the blocked ranges grow larger. Giving smaller ranges a lower threshold helps it block only the narrowest ranges needed, which gives the larger ranges containing them time to drop out of fail2ban’s watchlist.
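    The size-dependent threshold and the "narrowest range only" behavior can be sketched on their own. This assumes per-range offender counts are already available as a dict of CIDR strings to counts, and uses a made-up linear threshold (3 offenders for a /28, one more for each step wider); the real script's thresholds are not stated. A range that qualifies but contains a narrower qualifying range is dropped.

```python
import ipaddress


def threshold(prefixlen, base=3, step=1, narrowest=28):
    """Hypothetical threshold: a /28 needs `base` offenders, and each
    step wider (/27, /26, ...) needs `step` more."""
    return base + step * (narrowest - prefixlen)


def narrowest_only(counts):
    """Given {cidr: offender_count}, keep ranges that meet their threshold
    and do not contain a narrower range that also meets its own."""
    nets = {ipaddress.ip_network(r): c for r, c in counts.items()}
    hits = [net for net, c in nets.items() if c >= threshold(net.prefixlen)]
    return sorted(
        net
        for net in hits
        if not any(other != net and other.subnet_of(net) for other in hits)
    )


if __name__ == "__main__":
    counts = {
        "198.51.100.0/28": 3,   # meets its threshold of 3
        "198.51.100.0/24": 12,  # meets 7, but contains the qualifying /28
        "198.51.0.0/16": 40,    # meets 15, but contains the qualifying /28
        "10.0.0.0/24": 8,       # meets 7, no qualifying subrange
    }
    print(narrowest_only(counts))
```

    Here only 198.51.100.0/28 and 10.0.0.0/24 are returned. Once the /28's bans age out, the wider ranges would be evaluated on their own, which matches the described drift toward larger blocked ranges over time.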

    I should clean up this mess and make it a Git repo, and maybe even try to get it merged into fail2ban.

  • thericofactor@sh.itjust.works · ↑ 15 · 22 hours ago

    So we’re at the point where AI is not only stealing intellectual property but also driving up costs for people while doing it.