Limit the number of allowed entities/mentions in a message #222

Closed
BoberMod opened this issue Jan 10, 2025 · 4 comments · Fixed by #262
BoberMod commented Jan 10, 2025

I ran into a new type of spam: the first message mass-mentions chat members and is then immediately edited into an ad. Telegram keeps the mention notifications even after the message is edited, so users open the chat from the notification and see the ad.


The classifier doesn't detect such messages as spam because they contain a lot of random text (usernames), even when the rest of the message is spam.

Note: the same message without the mentions has been added to the spam samples and is detected as spam at 99%.


Each mention in the message is counted as a separate entity of type mention. My feature request is to allow limiting the number of entities by type, or at least to restrict mention entities specifically.

I think it would be useful to be able to block or limit any entity type, because spam also contains Telegram cashtags ($USD) and hashtags.

Message JSON from Telegram API
  {
   "update_id": 936949643,
   "message": {
    "message_id": 1881484,
    "from": {
     "id": 155807040,
     "is_bot": false,
     "first_name": "BoberMod",
     "username": "BoberMod",
     "language_code": "ru",
     "is_premium": true
    },
    "chat": {
     "id": 155807040,
     "first_name": "BoberMod",
     "username": "BoberMod",
     "type": "private"
    },
    "date": 1736530089,
    "text": "Ищу энергичных партнёрοв для запуска прибыльных криптοпρоектов. У вас есть желание заρабатывать пассивнο? Ηапишите мне!\n\nИщу энергичных партнёрοв для запуска прибыльных криптοпρоектов. У вас есть желание заρабатывать пассивнο? Ηапишите мне!  @vladimiir49 @AliLit6062 @j_evgenyyyy @kravtsov_dya @Charger69 @Militant_Hamster @Bearded_alex @deshik80 @vovazlv @melkosofter @Vyacheslav_Voznyy0 @andyvers @Leonid_ur5yar @Andrey_911_psg @S_bobo",
    "entities": [
     {
      "offset": 242,
      "length": 12,
      "type": "mention"
     },
     {
      "offset": 255,
      "length": 11,
      "type": "mention"
     },
     {
      "offset": 267,
      "length": 12,
      "type": "mention"
     },
     {
      "offset": 280,
      "length": 13,
      "type": "mention"
     },
     {
      "offset": 294,
      "length": 10,
      "type": "mention"
     },
     {
      "offset": 305,
      "length": 17,
      "type": "mention"
     },
     {
      "offset": 323,
      "length": 13,
      "type": "mention"
     },
     {
      "offset": 337,
      "length": 9,
      "type": "mention"
     },
     {
      "offset": 347,
      "length": 8,
      "type": "mention"
     },
     {
      "offset": 356,
      "length": 12,
      "type": "mention"
     },
     {
      "offset": 369,
      "length": 19,
      "type": "mention"
     },
     {
      "offset": 389,
      "length": 9,
      "type": "mention"
     },
     {
      "offset": 399,
      "length": 14,
      "type": "mention"
     },
     {
      "offset": 414,
      "length": 15,
      "type": "mention"
     },
     {
      "offset": 430,
      "length": 7,
      "type": "mention"
     }
    ]
   }
  }
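
To illustrate how the entities in a payload like the one above could be tallied by type, here is a minimal Go sketch. The names Entity, update and CountEntities are hypothetical and not part of the project; only the JSON field names (message, text, entities, type, offset, length) come from the payload itself.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Entity mirrors the MessageEntity fields visible in the payload.
// The Go type name is an assumption made for this sketch.
type Entity struct {
	Type   string `json:"type"`
	Offset int    `json:"offset"`
	Length int    `json:"length"`
}

// update holds only the parts of the update needed for counting.
type update struct {
	Message struct {
		Text     string   `json:"text"`
		Entities []Entity `json:"entities"`
	} `json:"message"`
}

// CountEntities parses a raw update and returns the number of
// entities per entity type (hypothetical helper).
func CountEntities(raw []byte) (map[string]int, error) {
	var u update
	if err := json.Unmarshal(raw, &u); err != nil {
		return nil, fmt.Errorf("parse update: %w", err)
	}
	counts := make(map[string]int)
	for _, e := range u.Message.Entities {
		counts[e.Type]++
	}
	return counts, nil
}

func main() {
	raw := []byte(`{"message":{"text":"hi @a @b $USD","entities":[
		{"offset":3,"length":2,"type":"mention"},
		{"offset":6,"length":2,"type":"mention"},
		{"offset":9,"length":4,"type":"cashtag"}]}}`)
	counts, err := CountEntities(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println("mentions:", counts["mention"], "cashtags:", counts["cashtag"])
}
```

A per-type map like this would let a single limit check cover mentions, cashtags, hashtags or any other entity type.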

Checking entities would also make it possible to update or unify the LinksCheck function, because each URL in the message is also an entity of type url.

func LinksCheck(limit int) MetaCheck {
	return func(req spamcheck.Request) spamcheck.Response {
		links := req.Meta.Links
		if links == 0 {
			links = strings.Count(req.Msg, "http://") + strings.Count(req.Msg, "https://")
		}
		if links > limit {
			return spamcheck.Response{
				Name:    "links",
				Spam:    true,
				Details: fmt.Sprintf("too many links %d/%d", links, limit),
			}
		}
		return spamcheck.Response{Spam: false, Name: "links", Details: fmt.Sprintf("links %d/%d", links, limit)}
	}
}
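
As a rough sketch of what that suggestion might look like, the snippet below counts links from entities first and falls back to the same text scan as LinksCheck when no link entities are present. Entity and CountLinks are hypothetical names, not part of the library; counting text_link alongside url is also an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// Entity carries only the entity type; hypothetical type for this sketch.
type Entity struct {
	Type string
}

// CountLinks counts "url" and "text_link" entities. If there are none,
// it falls back to scanning the text for http:// and https:// prefixes,
// mirroring the fallback used in LinksCheck.
func CountLinks(msg string, entities []Entity) int {
	links := 0
	for _, e := range entities {
		if e.Type == "url" || e.Type == "text_link" {
			links++
		}
	}
	if links == 0 {
		links = strings.Count(msg, "http://") + strings.Count(msg, "https://")
	}
	return links
}

func main() {
	entities := []Entity{{Type: "url"}, {Type: "mention"}, {Type: "text_link"}}
	fmt.Println(CountLinks("some text", entities))
	fmt.Println(CountLinks("see https://a.example and http://b.example", nil))
}
```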

Bot API documentation: https://core.telegram.org/bots/api#messageentity

umputun (Owner) commented Jan 10, 2025

I like the idea of this new checker

BoberMod (Author) commented

I'll try to implement it myself and submit the PR.

umputun (Owner) commented Jan 11, 2025

> I'll try to implement it myself and submit the PR.

Cool. I don't think we want to reimplement LinksCheck, because it is currently part of a library that works on any text, not just on Telegram meta info. It seems to work and feels like a more universal approach to me.

umputun (Owner) commented Mar 6, 2025

This issue has been addressed in PR #262 (now merged).

Added the ability to limit the number of mentions (@username) in a message with a new '--meta.mentions-limit' parameter. When the limit is exceeded, the message will be treated as spam.

  • Default value is -1 (check disabled)
  • Set to 0 to allow no mentions
  • Set to a positive number to allow that many mentions
  • It counts both 'mention' and 'text_mention' entity types from Telegram

The feature follows the same pattern as the existing links limit.
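
The limit semantics listed above can be sketched roughly as follows. This is not the code from PR #262: Entity, Response and MentionsCheck are hypothetical, self-contained stand-ins, and the detail strings are only modeled on the LinksCheck output.

```go
package main

import "fmt"

// Entity carries only the entity type; hypothetical type for this sketch.
type Entity struct {
	Type string
}

// Response is a simplified stand-in for a check result.
type Response struct {
	Name    string
	Spam    bool
	Details string
}

// MentionsCheck flags a message as spam when it contains more mentions
// than limit. A negative limit disables the check, and 0 allows none.
func MentionsCheck(limit int, entities []Entity) Response {
	if limit < 0 {
		return Response{Name: "mentions", Details: "check disabled"}
	}
	mentions := 0
	for _, e := range entities {
		// both entity types named in the comment above are counted
		if e.Type == "mention" || e.Type == "text_mention" {
			mentions++
		}
	}
	if mentions > limit {
		return Response{
			Name:    "mentions",
			Spam:    true,
			Details: fmt.Sprintf("too many mentions %d/%d", mentions, limit),
		}
	}
	return Response{Name: "mentions", Details: fmt.Sprintf("mentions %d/%d", mentions, limit)}
}

func main() {
	entities := []Entity{{Type: "mention"}, {Type: "text_mention"}, {Type: "url"}}
	for _, limit := range []int{-1, 0, 2} {
		r := MentionsCheck(limit, entities)
		fmt.Printf("limit=%d spam=%v details=%q\n", limit, r.Spam, r.Details)
	}
}
```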
