From thames_bug_example_1.pdf, highlight and copy "Robert Thorogood".
Paste the text into a text editor. The result is "Robe rt Thorogood".
From thames_bug_example_2.pdf, highlight and copy "Thamesjoen murhat" from the upper footer.
Paste the text into a text editor. The result is "THame sjoe n murhat".
What is the expected behavior?
There should be no extra spaces in the pasted text. This doesn't happen when copying the same text using Chrome's built-in PDF reader.
What went wrong?
evaluator.js#addFakeSpaces is responsible for adding the extra spaces. In these two cases the common denominator seems to be the letter that follows an 'e'. Using a hardcoded value to decide whether a gap is a space might not be the right approach when the PDF contains Unicode mappings.
From example 1
if (advanceX <= textOrientation * textContentItem.trackingSpaceMin) {
  if (shouldAddWhitepsace()) {
    // The space is very thin, hence it deserves to have its own span in
    // order to avoid too much shift between the canvas and the text
    // layer.
    resetLastChars();
    flushTextContentItem();
    pushWhitespace({ width: Math.abs(advanceX) });
  } else {
    textContentItem.width += advanceX;
  }
} else if (
  !addFakeSpaces(advanceX, textContentItem.prevTransform, textOrientation)
) {
  .....
}
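To make the branching easier to follow, here is a minimal sketch that only names which of the three paths the quoted code would take. `classifyAdvance` and its return labels are hypothetical and not part of PDF.js; the result of `shouldAddWhitepsace()` is passed in as a plain boolean, and the sample values are purely illustrative.

```javascript
// Hypothetical helper (not a PDF.js function): returns a label for the
// branch that the quoted snippet would take for a given advance.
function classifyAdvance(
  advanceX,
  textOrientation,
  trackingSpaceMin,
  shouldAddWhitespace
) {
  if (advanceX <= textOrientation * trackingSpaceMin) {
    // Small advance: either emit a thin whitespace span or simply
    // widen the current text item.
    return shouldAddWhitespace ? "thin-whitespace-span" : "extend-width";
  }
  // Larger advance: the snippet defers to addFakeSpaces(...).
  return "fake-space-check";
}

// Illustrative values only (assumed, not taken from the PDFs).
console.log(classifyAdvance(0.05, 1, 0.102, false));
console.log(classifyAdvance(0.05, 1, 0.102, true));
console.log(classifyAdvance(0.2, 1, 0.102, false));
```

Only the last path can end up inserting a fake space, so any advance that lands just above the threshold is handed to addFakeSpaces.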
When textContentItem.str === ['R', 'o', 'b'], the if branch is taken as expected and the textContentItem width is increased.
However, when textContentItem.str === ['R', 'o', 'b', 'e'], execution falls through to the else if branch because advanceX is slightly too large, which results in addFakeSpaces being called.
advanceX = (145.979 - 143.81900000000002) / 18
advanceX: 0.11999999999999982
textOrientation: 1
textContentItem.trackingSpaceMin: 0.102
Attach (recommended) or Link to PDF file
thames_bug_example_2.pdf
thames_bug_example_1.pdf
Web browser and its version
Chrome 128
Operating system and its version
Windows 11
PDF.js version
v4.6.82
Is the bug present in the latest PDF.js version?
Yes
Is a browser extension
No
Steps to reproduce the problem
What went wrong?
From example 1
When textContentItem.str === ['R', 'o', 'b'], the if branch is taken as expected and the textContentItem width is increased.
advanceX = (136.31300000000002 - 134.513) / 18
advanceX: 0.10000000000000063
textOrientation: 1
textContentItem.trackingSpaceMin: 0.102
However, when textContentItem.str === ['R', 'o', 'b', 'e'], execution falls through to the else if branch because advanceX is slightly too large, which results in addFakeSpaces being called.
advanceX = (145.979 - 143.81900000000002) / 18
advanceX: 0.11999999999999982
textOrientation: 1
textContentItem.trackingSpaceMin: 0.102
textContentItem.trackingSpaceMin = fontSize * TRACKING_SPACE_FACTOR;
fontSize: 1
TRACKING_SPACE_FACTOR: 0.102
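The two cases can be checked with a few lines of JavaScript. This is a minimal sketch that only reuses the numbers reported above. `advance`, `takesTrackingBranch`, and the variable names are hypothetical helpers for illustration, not PDF.js functions, and the divisor 18 is simply the value used in the calculations above.

```javascript
// Values as reported in this issue.
const TRACKING_SPACE_FACTOR = 0.102;
const fontSize = 1;
const textOrientation = 1;
const trackingSpaceMin = fontSize * TRACKING_SPACE_FACTOR;

// Hypothetical helper: horizontal advance between two glyph positions,
// divided by the scale used in the report (18).
function advance(prevX, currX, scale) {
  return (currX - prevX) / scale;
}

// Hypothetical helper: true when the first `if` in the quoted snippet
// would be taken, i.e. addFakeSpaces is not reached.
function takesTrackingBranch(advanceX) {
  return advanceX <= textOrientation * trackingSpaceMin;
}

// Advance measured when str is ['R', 'o', 'b'].
const advanceAfterRob = advance(134.513, 136.31300000000002, 18);
// Advance measured when str is ['R', 'o', 'b', 'e'].
const advanceAfterRobe = advance(143.81900000000002, 145.979, 18);

console.log(advanceAfterRob, takesTrackingBranch(advanceAfterRob));
console.log(advanceAfterRobe, takesTrackingBranch(advanceAfterRobe));
```

The first advance is roughly 0.10 and stays under the 0.102 threshold, while the second is roughly 0.12 and exceeds it by about 0.018, which is enough to send the glyph after 'e' through addFakeSpaces.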
Link to a viewer
No response
Additional context
No response