add tarpit responder #48

Open

circa10a wants to merge 1 commit into main from tarpit-responder

Conversation

circa10a (Contributor) commented Feb 1, 2025

Addresses #38, which asks for a tarpit responder, ultimately designed to waste crawlers' time. The responder lets a user configure headers, how long the request should take, and the response code, and it reuses the existing message field to optionally return a customized message. If the tarpit responder is enabled but no configuration options are provided, the defaults are:

  • Headers: {}
  • Delay: 10s
  • Message: "Access Denied"
  • Response/Status Code: 403
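
For example, a minimal all-defaults tarpit (a sketch using the same directive syntax as the fuller config below):

:80 {
	defender tarpit {
		ranges private
	}
}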

Validation:

Caddyfile:

{
	auto_https off
	order defender after header
	debug
}

:80 {
	bind 127.0.0.1 ::1

	defender tarpit {
		ranges private
        message "Got eem"
        tarpit_config {
            headers {
                X-You-Got Played
            }
            delay 10s
            response_code 429
        }
	}
}

Request/response:

❯ time curl http://localhost:8080 -v
* Host localhost:8080 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:8080...
* Connected to localhost (::1) port 8080
> GET / HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/8.7.1
> Accept: */*
>
* Request completely sent off
< HTTP/1.1 429 Too Many Requests
< Server: Caddy
< X-You-Got: Played
< Date: Sun, 02 Feb 2025 01:00:41 GMT
< Content-Length: 7
< Content-Type: text/plain; charset=utf-8
<
* Connection #0 to host localhost left intact
Got eemcurl http://localhost:8080 -v  0.01s user 0.01s system 0% cpu 10.024 total
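
Note the 10.024s total wall time, which matches the configured 10s delay; the 429 status, the X-You-Got: Played header, and the "Got eem" body likewise match the tarpit_config above.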

circa10a (Contributor, Author) commented Feb 2, 2025

[screenshot]

circa10a force-pushed the tarpit-responder branch 5 times, most recently from 5bb9f26 to d9bea6c on February 2, 2025 00:05
Signed-off-by: circa10a <[email protected]>
JasonLovesDoggo linked an issue on Feb 2, 2025 that may be closed by this pull request: Add a tarpit responder (#38)
JasonLovesDoggo (Owner) commented Feb 2, 2025

I love the idea of it lol, but I don't think just time.Sleep(t.Delay)'ing globally and then dumping out a short response is the right implementation.

I think something more akin to instantly responding with headers and then very slowly writing out text (like the Bee Movie script I linked in #39) would be better. Right now I would assume most scrapers would simply time out and move on instead of waiting for headers + response.

Perhaps something like the following:

import (
    "errors"
    "net/http"
    "time"

    "github.com/caddyserver/caddy/v2/modules/caddyhttp"
)

func (t *TarpitResponder) ServeHTTP(w http.ResponseWriter, r *http.Request, _ caddyhttp.Handler) error {
    // Send the headers immediately so the client commits to the connection.
    flusher, ok := w.(http.Flusher)
    if !ok {
        return errors.New("streaming not supported") // or just... return something else
    }
    w.WriteHeader(http.StatusOK)
    flusher.Flush()

    // Stream the body synchronously; spawning a goroutine and returning
    // early would let Caddy close the response while we are still writing.
    ctx := r.Context()
    buf := []byte(t.Response) // t.Response being the movie script or whatever the user chooses (could be loaded from a file)
    for i := 0; i < len(buf); i++ {
        select {
        case <-ctx.Done():
            return nil // request canceled
        default:
            w.Write([]byte{buf[i]})
            flusher.Flush()
            time.Sleep(100 * time.Millisecond) // Configurable bytes/sec
        }
    }
    return nil
}

Do note my POC does not handle client disconnects.

Also, do note my impl does byte-by-byte chunking, which would NOT be efficient; a better chunking method would be needed, something like the sketch below.
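
For instance, writing a small chunk per tick instead of a single byte (a sketch; chunkSize and the rate are illustrative, and ctx, w, flusher, and t.Response are the same as in the POC above):

buf := []byte(t.Response)
chunkSize := 64 // bytes per write; tune together with the sleep for a target bytes/sec
for start := 0; start < len(buf); start += chunkSize {
    end := start + chunkSize
    if end > len(buf) {
        end = len(buf)
    }
    select {
    case <-ctx.Done():
        return nil // client gave up
    default:
        w.Write(buf[start:end])
        flusher.Flush()
        time.Sleep(time.Second) // e.g. 64 bytes/sec
    }
}
return nil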

JasonLovesDoggo (Owner) left a review comment

see comment

circa10a (Contributor, Author) commented Feb 2, 2025

Ah yeah, I see what you mean. I wasn't quite sure of the meaning of #39 at first.

I much prefer the idea of being able to flood attackers/crawlers with more nonsense. I'll work on getting that implemented.

JasonLovesDoggo (Owner) replied:

> Ah yeah, I see what you mean. I wasn't quite sure of the meaning of #39 at first.
>
> I much prefer the idea of being able to flood attackers/crawlers with more nonsense. I'll work on getting that implemented.

In that case, #1 may be of particular interest to you.

I linked the repository where I already have my work for that done, but essentially it's Markov chain text generation.

I experimented with a couple of different options for generating realistic-looking text, but I couldn't think of anything besides Markov chains that would have a low enough memory and compute footprint to be usable.
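
A minimal word-level Markov chain in Go looks something like this (a generic sketch of the technique, not the linked repository's code):

package main

import (
    "fmt"
    "math/rand"
    "strings"
)

// buildChain maps each word in the corpus to the words observed after it.
func buildChain(corpus string) map[string][]string {
    words := strings.Fields(corpus)
    chain := make(map[string][]string)
    for i := 0; i+1 < len(words); i++ {
        chain[words[i]] = append(chain[words[i]], words[i+1])
    }
    return chain
}

// generate walks the chain from start, emitting up to n words.
func generate(chain map[string][]string, start string, n int) string {
    out := []string{start}
    word := start
    for len(out) < n {
        next, ok := chain[word]
        if !ok {
            break
        }
        word = next[rand.Intn(len(next))]
        out = append(out, word)
    }
    return strings.Join(out, " ")
}

func main() {
    chain := buildChain("the bees fly and the bees sting and the flowers bloom and the bees return")
    fmt.Println(generate(chain, "the", 20))
}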

If you have any other ideas just put them in there!

JasonLovesDoggo (Owner) commented:

I would expand on this to include a few anchor tags that link back to the tarpit itself, so the scraper gets stuck inside an endless scraping loop. A rough sketch of the idea:
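
// A hypothetical helper (not part of this PR): append anchors pointing back
// into the tarpit so a link-following crawler keeps queueing more tarpit pages.
// Assumes "fmt", "math/rand", and "strings" are imported.
func withLoopLinks(text string, n int) string {
    var b strings.Builder
    b.WriteString(text)
    for i := 0; i < n; i++ {
        // Random paths so crawlers that deduplicate URLs still see "new" links.
        fmt.Fprintf(&b, ` <a href="/%d">read more</a>`, rand.Int())
    }
    return b.String()
}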
