add tarpit responder #48

Open

circa10a wants to merge 1 commit into main from tarpit-responder

Conversation

circa10a (Contributor) commented Feb 1, 2025

Addresses #38, which asks for a tarpit responder, ultimately designed to waste crawlers' time. The responder lets a user configure headers, how long the request should take, and the response code, and it reuses the existing message field to optionally return a customized message. If the tarpit responder is enabled but no configuration options are provided, the defaults are:

  • Headers: {}
  • Delay: 10s
  • Message: "Access Denied"
  • Response/Status Code: 403
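
For example, a minimal all-defaults tarpit (a sketch using the same directive syntax as the fuller config below):

:80 {
	defender tarpit {
		ranges private
	}
}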

Validation:

Caddyfile:

{
	auto_https off
	order defender after header
	debug
}

:80 {
	bind 127.0.0.1 ::1

	defender tarpit {
		ranges private
        message "Got eem"
        tarpit_config {
            headers {
                X-You-Got Played
            }
            delay 10s
            response_code 429
        }
	}
}

Request/response:

❯ time curl http://localhost:8080 -v
* Host localhost:8080 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:8080...
* Connected to localhost (::1) port 8080
> GET / HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/8.7.1
> Accept: */*
>
* Request completely sent off
< HTTP/1.1 429 Too Many Requests
< Server: Caddy
< X-You-Got: Played
< Date: Sun, 02 Feb 2025 01:00:41 GMT
< Content-Length: 7
< Content-Type: text/plain; charset=utf-8
<
* Connection #0 to host localhost left intact
Got eemcurl http://localhost:8080 -v  0.01s user 0.01s system 0% cpu 10.024 total
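
Note the 10.024s total wall time, which matches the configured 10s delay; the 429 status, the X-You-Got: Played header, and the "Got eem" body likewise match the tarpit_config above.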

circa10a (Contributor, Author) commented Feb 2, 2025

[screenshot]

circa10a force-pushed the tarpit-responder branch 5 times, most recently from 5bb9f26 to d9bea6c on February 2, 2025 00:05
Signed-off-by: circa10a <[email protected]>
JasonLovesDoggo linked an issue on Feb 2, 2025 that may be closed by this pull request: Add a tarpit responder (#38)
JasonLovesDoggo (Owner) commented Feb 2, 2025

I love the idea of it lol, but I don't think just time.Sleep(t.Delay)'ing globally and then dumping out a short response is the right implementation.

I think something more akin to instantly responding with headers and then very slowly writing out text (like the Bee Movie script I linked in #39) would be better. Right now I would assume most scrapers would simply time out and move on instead of waiting for headers + response.

Perhaps something like the following:

import (
    "errors"
    "net/http"
    "time"

    "github.com/caddyserver/caddy/v2/modules/caddyhttp"
)

func (t *TarpitResponder) ServeHTTP(w http.ResponseWriter, r *http.Request, _ caddyhttp.Handler) error {
    // Send the headers immediately so the client commits to the connection.
    flusher, ok := w.(http.Flusher)
    if !ok {
        return errors.New("streaming not supported") // or just... return something else
    }
    w.WriteHeader(http.StatusOK)
    flusher.Flush()

    // Stream the body synchronously; spawning a goroutine and returning
    // early would let Caddy close the response while we are still writing.
    ctx := r.Context()
    buf := []byte(t.Response) // t.Response being the movie script or whatever the user chooses (could be loaded from a file)
    for i := 0; i < len(buf); i++ {
        select {
        case <-ctx.Done():
            return nil // request canceled
        default:
            w.Write([]byte{buf[i]})
            flusher.Flush()
            time.Sleep(100 * time.Millisecond) // Configurable bytes/sec
        }
    }
    return nil
}

Do note my POC does not handle client disconnects.

Also, do note my impl does byte-by-byte chunking, which would NOT be efficient; a better chunking method would be needed, something like the sketch below.
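
For instance, writing a small chunk per tick instead of a single byte (a sketch; chunkSize and the rate are illustrative, and ctx, w, flusher, and t.Response are the same as in the POC above):

buf := []byte(t.Response)
chunkSize := 64 // bytes per write; tune together with the sleep for a target bytes/sec
for start := 0; start < len(buf); start += chunkSize {
    end := start + chunkSize
    if end > len(buf) {
        end = len(buf)
    }
    select {
    case <-ctx.Done():
        return nil // client gave up
    default:
        w.Write(buf[start:end])
        flusher.Flush()
        time.Sleep(time.Second) // e.g. 64 bytes/sec
    }
}
return nil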

JasonLovesDoggo (Owner) left a review comment

see comment

circa10a (Contributor, Author) commented Feb 2, 2025

Ah yeah, I see what you mean. I wasn't quite sure of the meaning of #39 at first.

I much prefer the idea of being able to flood attackers/crawlers with more nonsense. I'll work on getting that implemented.

JasonLovesDoggo (Owner) replied:

> Ah yeah, I see what you mean. I wasn't quite sure of the meaning of #39 at first.
>
> I much prefer the idea of being able to flood attackers/crawlers with more nonsense. I'll work on getting that implemented.

In that case, #1 may be of particular interest to you.

I linked the repository where I already have my work for that done, but essentially it's Markov chain text generation.

I experimented with a couple of different options for generating realistic-looking text, but I couldn't think of anything besides Markov chains that would have a low enough memory and compute footprint to be usable.
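
A minimal word-level Markov chain in Go looks something like this (a generic sketch of the technique, not the linked repository's code):

package main

import (
    "fmt"
    "math/rand"
    "strings"
)

// buildChain maps each word in the corpus to the words observed after it.
func buildChain(corpus string) map[string][]string {
    words := strings.Fields(corpus)
    chain := make(map[string][]string)
    for i := 0; i+1 < len(words); i++ {
        chain[words[i]] = append(chain[words[i]], words[i+1])
    }
    return chain
}

// generate walks the chain from start, emitting up to n words.
func generate(chain map[string][]string, start string, n int) string {
    out := []string{start}
    word := start
    for len(out) < n {
        next, ok := chain[word]
        if !ok {
            break
        }
        word = next[rand.Intn(len(next))]
        out = append(out, word)
    }
    return strings.Join(out, " ")
}

func main() {
    chain := buildChain("the bees fly and the bees sting and the flowers bloom and the bees return")
    fmt.Println(generate(chain, "the", 20))
}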

If you have any other ideas just put them in there!

JasonLovesDoggo (Owner) commented:

I would expand on this to include a few anchor tags that link back to the tarpit itself, so the scraper gets stuck inside an endless scraping loop. A rough sketch of the idea:
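
// A hypothetical helper (not part of this PR): append anchors pointing back
// into the tarpit so a link-following crawler keeps queueing more tarpit pages.
// Assumes "fmt", "math/rand", and "strings" are imported.
func withLoopLinks(text string, n int) string {
    var b strings.Builder
    b.WriteString(text)
    for i := 0; i < n; i++ {
        // Random paths so crawlers that deduplicate URLs still see "new" links.
        fmt.Fprintf(&b, ` <a href="/%d">read more</a>`, rand.Int())
    }
    return b.String()
}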
