[Bug]: Hardcoded textContentItem.trackingSpaceMin causes incorrect fake spaces #18768

Open
jukkaleh-atoz opened this issue Sep 20, 2024 · 0 comments

Attach (recommended) or Link to PDF file

thames_bug_example_2.pdf
thames_bug_example_1.pdf

Web browser and its version

Chrome 128

Operating system and its version

Windows 11

PDF.js version

v4.6.82

Is the bug present in the latest PDF.js version?

Yes

Is a browser extension

No

Steps to reproduce the problem

  1. From thames_bug_example_1.pdf highlight and copy "Robert Thorogood"
  2. Paste text to text editor. Result is "Robe rt Thorogood"

  1. From thames_bug_example_2.pdf highlight and copy "Thamesjoen murhat" from upper footer.
  2. Paste text to text editor. Result is "THame sjoe n murhat"

What is the expected behavior?

There should be no extra spaces inside the words. This doesn't happen when copying the same text using Chrome's built-in PDF reader.

What went wrong?

evaluator.js#addFakeSpaces is responsible for inserting the extra spaces. In both cases the common denominator seems to be the letter that follows 'e'. Comparing the advance against a hardcoded threshold might not be the right approach when the PDF contains Unicode mappings.

From example 1

if (advanceX <= textOrientation * textContentItem.trackingSpaceMin) {
  if (shouldAddWhitepsace()) {
    // The space is very thin, hence it deserves to have its own span in
    // order to avoid too much shift between the canvas and the text
    // layer.
    resetLastChars();
    flushTextContentItem();
    pushWhitespace({ width: Math.abs(advanceX) });
  } else {
    textContentItem.width += advanceX;
  }
} else if (
  !addFakeSpaces(advanceX, textContentItem.prevTransform, textOrientation)
) {
  .....
}

When textContentItem.str === ['R', 'o', 'b'], the first if branch is taken, as expected, and textContentItem.width is increased:

advanceX = (136.31300000000002 - 134.513) / 18
advanceX: 0.10000000000000063
textOrientation: 1
textContentItem.trackingSpaceMin: 0.102

However, when textContentItem.str === ['R', 'o', 'b', 'e'], execution falls through to the else if branch because advanceX is slightly larger than the threshold, so addFakeSpaces is called:
advanceX = (145.979 - 143.81900000000002) / 18
advanceX: 0.11999999999999982
textOrientation: 1
textContentItem.trackingSpaceMin: 0.102

textContentItem.trackingSpaceMin = fontSize * TRACKING_SPACE_FACTOR;
fontSize: 1
TRACKING_SPACE_FACTOR: 0.102
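
The two comparisons above can be reproduced in isolation. This is only a minimal sketch, not PDF.js code: exceedsTrackingSpaceMin is a hypothetical helper, and the coordinates, the divisor 18, and fontSize = 1 are simply the values logged above.

```javascript
// Hypothetical helper that mirrors the threshold check quoted above.
// All numbers come from the logged values in this report.
const TRACKING_SPACE_FACTOR = 0.102;

function exceedsTrackingSpaceMin(prevX, currX, scale, fontSize, textOrientation = 1) {
  const advanceX = (currX - prevX) / scale;
  const trackingSpaceMin = fontSize * TRACKING_SPACE_FACTOR;
  // true means the first `if` branch is skipped and addFakeSpaces would be called.
  return advanceX > textOrientation * trackingSpaceMin;
}

// Gap measured when str === ['R', 'o', 'b']: advanceX is about 0.100
const afterB = exceedsTrackingSpaceMin(134.513, 136.31300000000002, 18, 1);
// Gap measured when str === ['R', 'o', 'b', 'e']: advanceX is about 0.120
const afterE = exceedsTrackingSpaceMin(143.81900000000002, 145.979, 18, 1);

console.log(afterB, afterE); // prints: false true
```

The two advances differ by only about 0.02 text-space units, yet they land on opposite sides of the fixed 0.102 threshold, which is why only the second gap reaches addFakeSpaces.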

Link to a viewer

No response

Additional context

No response
