Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error processing forwarded messages #107

Open
coperius opened this issue Aug 5, 2024 · 4 comments
Open

Error processing forwarded messages #107

coperius opened this issue Aug 5, 2024 · 4 comments

Comments

@coperius
Copy link
Contributor

coperius commented Aug 5, 2024

  1. Messages are stored using a hash of their content.
  2. This causes overwriting when different messages have the same content hash.
  3. When forwarding a message that has content duplicates:
    • The system processes the most recent message with that content.
    • This leads to deletion and banning of the sender of the most recent message.
    • The original forwarded message is not processed as intended.
  4. This creates issues with subsequent forwarding of messages.

This seems to be a significant problem in the message handling system. It could lead to incorrect message attribution, unintended bans, and loss of original message context.

Version: master-68dcc02-20240805T10:42:41
paranoid mode on

How to reproduce the problem

  1. Send message from first user
Screenshot 2024-08-05 at 18 33 03 2. Send message from second user Screenshot 2024-08-05 at 18 33 30 the record in the database is overwritten Messages in chat Screenshot 2024-08-05 at 18 34 00 3. Forward first message to admin chat Screenshot 2024-08-05 at 18 37 36 But in the main chat, the second message was deleted and the user Coperius was banned Screenshot 2024-08-05 at 18 37 17
  1. Try to forward the first message again
Screenshot 2024-08-05 at 18 38 27

And it is not processed

Is this functionality needed at all if it's fully implemented through reply message with a spam tag?

@umputun
Copy link
Owner

umputun commented Aug 5, 2024

I understand your point, but I'm not entirely certain about the significance of the problem. If two identical spam messages are not detected and are forwarded as spam by the admin, we won't be able to process the first one. However, in practice, I have not encountered a situation like this yet.

As you may already know, hash matching is a component of the locator. It involves records added by TelegramListener for both spam and non-spam messages, which are used by the admin to locate matched messages as part of MsgHandler (for handling admin forwards) in order to find the original message. To the best of my recollection, this is the only purpose hash serves, unless I am overlooking something.

If this scenario occurs, the issue primarily impacts an uncommon and somewhat unnecessary feature related to forwarding support for admin bans. In the past, this method was the sole means to manually ban undetected spam. However, nowadays, responding with /spam accomplishes this task more efficiently and directly. Therefore, the hash lookup doesn't concern me greatly, as the potential problem is relatively minor. Nevertheless, another more significant consequence of using the hash as a primary key is the possibility of generating incorrect results for UserNameByID and UserIDByName. Although I haven't investigated the implications of this mismatch, it doesn't instill confidence in me.

One potential solution could involve changing the primary key to UUID, which would help prevent unintended overwrites. The Message method still poses challenges, as it may restrict us to processing only one message. Yet, as mentioned earlier, this aspect doesn't trouble me too much, and I can accept that admin forwards will ban the most recent matched spam instead of the initial one. However, if you have a better suggestion, please share it. Additionally, if you believe the fix I outlined would be effective, feel free to submit a pull request with the proposed solution.

@coperius
Copy link
Contributor Author

coperius commented Aug 5, 2024

The pair chatID/MessageID is a unique key within a chat. We can use it. And if we're not concerned about deleting the wrong spammer, then by using this pair we can delete all spam messages with the text from the forwarded message and ban all spammers who sent them

@umputun
Copy link
Owner

umputun commented Aug 5, 2024

The reason why we have a hash of the message in the first place is because for forwarded messages, I have not found any way to retrieve the MessageID of the original message. I think this is intentional in Telegram for privacy reasons.

@coperius
Copy link
Contributor Author

coperius commented Aug 5, 2024

We will use the chatID/MessageID pair only for message uniqueness, to avoid introducing a UUID. It's already present in the table and will suit us just fine. We'll continue to search for messages to delete by hash. But now instead of one, we'll get a list and process them all (delete and ban the senders).

coperius added a commit to coperius/tg-spam that referenced this issue Aug 6, 2024
…mputun#107 )

- change primary key for message table
- add "deleted" field for message table to avoid retrying the processing
- add migration for message table
- add DeleteMessage function to mark processed and deleted messages
- change Messages function to return all messages by msg
fix and add unit tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants