Added probe to identify copyright year #1955
base: dev
Thanks for your contribution @nyxgeek! We also have a Discord server, which you’re more than welcome to join. It's a great place to connect with fellow contributors and stay updated with the latest developments!
merge conflict + lint fail
httpx v1.6.9 release prep
Updated and tested, should be good.
Thanks for the PR! I left some comments.
"strings"
)

var crreYear = regexp.MustCompile(`(?:copyright|Copyright|COPYRIGHT|\(C\)|\(c\)|©|©|©)?\s*(?:[a-zA-Z0-9 ,-]+\s*)?[\s,]*(199[0-9]|20[0-1][0-9]|202[0-4])[\s,<-]+(?:copyright|Copyright|COPYRIGHT|\(C\)|\(c\)|©|©|©|199[0-9]|20[0-1][0-9]|202[0-4])?`)
The upper bound on the year is hard-coded to 2024. This must be dynamic. As written, it will not detect "© 2025 Dummy Media Group. All Rights Reserved.", for example.
That makes sense. I'll extend it through 2029 if that is acceptable. I am trying to avoid false positives, so I want to keep it to a realistic range.
}
}

green := "\033[32m"
We're using aurora for colored output. We can do the same here.
// Apply regex to extract the years and check for indicators
matches := crreYear.FindAllStringSubmatch(textContent, -1)
Do we expect multiple copyright notices in a page? If not, we should rethink the post-processing.
The regex will match strings like Copyright 2024, as well as Copyright 1995-2001, in which case it will display both dates.
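To make the multi-match behaviour concrete, here is a hedged sketch of post-processing that collects every year from every match, drops duplicates, and joins the result with spaces. `copyrightRe` is a simplified stand-in for the PR's pattern, and `copyrightYears` is a hypothetical helper, not the PR's `ExtractCopyright`.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// copyrightRe is a simplified stand-in for the PR's pattern: a copyright
// marker, a start year, and an optional "-<end year>" range.
var copyrightRe = regexp.MustCompile(`(?i)(?:copyright|\(c\)|©)\s*((?:19|20)\d{2})(?:\s*-\s*((?:19|20)\d{2}))?`)

// copyrightYears (hypothetical helper) returns the distinct years found in
// text as a space-delimited string, in order of first appearance.
func copyrightYears(text string) string {
	seen := make(map[string]bool)
	var years []string
	for _, m := range copyrightRe.FindAllStringSubmatch(text, -1) {
		// m[1] is the start year; m[2] is the range end, empty when absent.
		for _, y := range m[1:] {
			if y == "" || seen[y] {
				continue
			}
			seen[y] = true
			years = append(years, y)
		}
	}
	return strings.Join(years, " ")
}

func main() {
	fmt.Println(copyrightYears("Copyright 2024"))
	fmt.Println(copyrightYears("Copyright 1995-2001 Foo. © 2001 Bar"))
}
```

Deduplicating keeps the output short when a page repeats the same notice in both header and footer.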
@@ -1800,6 +1801,21 @@ retry:
	builder.WriteRune(']')
}

var copyright string
if httpx.CanHaveTitleTag(resp.GetHeaderPart("Content-Type", ";")) {
	copyright = httpx.ExtractCopyright(resp) // This will return a space-delimited string of years
Why do we extract the copyright text here and not inside the scanopts.OutputCopyright if block?
I basically just copied the same functions that exist for Title extraction, but changed them to copyright instead. If there is a better way, or if I mis-copied that format from Title extraction, happy to mod.
Wouldn't a nuclei template be a better fit for this?
I don't use nuclei, but I do use my fork of httpx all the time on giant internal pentests because it's super easy to find the old software with this feature.
I think a nuclei template like the following one should do the job as internally nuclei already uses httpx for pre-flight:

id: copyright-year-detector

info:
  name: Copyright Year Detector
  author: AI
  severity: info
  description: Detects copyright years in web responses to identify potentially outdated software
  tags: tech,copyright

requests:
  - method: GET
    path:
      - "{{BaseURL}}"

    matchers-condition: and
    matchers:
      - type: status
        status:
          - 200
      - type: word
        words:
          - "copyright"
          - "©"
          - "(c)"
          - "(C)"
        condition: or

    extractors:
      - type: regex
        name: copyright-years
        group: 1
        regex:
          - '(?i)(?:copyright|©|\(c\)|\(C\)|©|©)\s*(?:[a-zA-Z0-9 ,-]+\s*)?[\s,]*(\d{4}(?:\s*-\s*\d{4})?)'
      - type: regex
        name: possible-years
        group: 1
        regex:
          - '[^0-9]((?:199[0-9]|20[0-2][0-9])(?:\s*-\s*(?:199[0-9]|20[0-2][0-9]))?)[^0-9]'

$ nuclei -t copyright-year.yaml -u https://projectdiscovery.io
__ _
____ __ _______/ /__ (_)
/ __ \/ / / / ___/ / _ \/ /
/ / / / /_/ / /__/ / __/ /
/_/ /_/\__,_/\___/_/\___/_/ v3.3.7
projectdiscovery.io
[WRN] Found 1 templates loaded with deprecated protocol syntax, update before v3 for continued support.
[INF] Current nuclei version: v3.3.7 (outdated)
[INF] Current nuclei-templates version: v10.1.0 (latest)
[WRN] Scan results upload to cloud is disabled.
[INF] New templates added in latest release: 114
[INF] Templates loaded for current scan: 1
[WRN] Loading 1 unsigned templates for scan. Use with caution.
[INF] Targets loaded for current scan: 1
[copyright-year-detector:copyright-years] [http] [info] https://projectdiscovery.io ["2024"]
[copyright-year-detector:possible-years] [http] [info] https://projectdiscovery.io ["2021","2015","2014","2019","2018","2002","2029","1996","2003","2022","2007","2006","2000","1997","2008","2025","1994","1995","2024","2027","2023"]
Added copyright probe, useful for identifying old software
Closes #1965