Allow to, optionally, keep Unicode escape sequences in `stringToPDFString` (PR 17331 follow-up) #19884
Whenever we cannot find a destination, we fall back to checking all destinations, to account for e.g. out-of-order NameTrees. In those cases any subsequent destination lookups can be made a tiny bit more efficient by immediately checking the already cached destinations.
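To make the caching idea above concrete, here is a minimal sketch. It is hypothetical and not the pdf.js Catalog code: `DestinationLookup`, `fastLookup` and `readAllDestinations` are made-up names. It assumes that the fallback scan returns a Map holding every destination, so once the scan has run, the cache alone can answer later lookups.

```javascript
// Hypothetical sketch, not the pdf.js Catalog code: a destination lookup
// that falls back to reading *all* destinations (e.g. because a NameTree
// is out of order) and caches them. Later lookups then check the cache
// immediately instead of repeating the failing fast path.
class DestinationLookup {
  /**
   * @param {(name: string) => any} fastLookup - returns a destination or null
   * @param {() => Map<string, any>} readAllDestinations - full (slow) scan
   */
  constructor(fastLookup, readAllDestinations) {
    this.fastLookup = fastLookup;
    this.readAllDestinations = readAllDestinations;
    this.cachedDests = null; // filled in on the first fallback
  }

  get(name) {
    // Assumption: after a fallback the cache holds every destination,
    // so it can answer directly without trying the fast path again.
    if (this.cachedDests) {
      return this.cachedDests.get(name) ?? null;
    }
    const dest = this.fastLookup(name);
    if (dest !== null) {
      return dest;
    }
    // Fallback: scan all destinations once and cache the result.
    this.cachedDests = this.readAllDestinations();
    return this.cachedDests.get(name) ?? null;
  }
}

// Usage: the fast path only finds "a", e.g. because the tree is unsorted.
let fullScans = 0;
const allDests = new Map([["a", [1]], ["b", [2]], ["c", [3]]]);
const lookup = new DestinationLookup(
  (name) => (name === "a" ? allDests.get("a") : null),
  () => {
    fullScans++;
    return allDests;
  }
);
lookup.get("a"); // fast path, no scan
lookup.get("b"); // fast path fails, so one full scan whose result is cached
lookup.get("c"); // answered straight from the cache
console.log(fullScans); // 1
```

The design choice here is simply to pay for the full scan once. Every lookup after that is a single Map access.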
/botio test
From: Bot.io (Linux m4): Received. Command cmd_test from @Snuffleupagus received. Current queue size: 0. Live output at: http://54.241.84.105:8877/c3901ba98dd851b/output.txt
From: Bot.io (Windows): Received. Command cmd_test from @Snuffleupagus received. Current queue size: 0. Live output at: http://54.193.163.58:8877/835a3cbad009c87/output.txt
From: Bot.io (Linux m4): Failed. Full output at http://54.241.84.105:8877/c3901ba98dd851b/output.txt (total script time: 29.68 mins)
From: Bot.io (Windows): Failed. Full output at http://54.193.163.58:8877/835a3cbad009c87/output.txt (total script time: 60.27 mins)
…ring` (PR 17331 follow-up)

Currently *some* of the links[1] on page three of the `issue19835.pdf` test-case aren't clickable, since the destination (of the LinkAnnotation) becomes empty.

The reason is that these destinations include the character `\x1b`, which is interpreted as the start of a Unicode escape sequence specifying the language of the string; please refer to section [7.9.2.2 Text String Type](https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf#G6.1957385) in the PDF specification.

Hence it seems that we need a way to optionally disable that behaviour, to prevent a "badly" formatted string from becoming empty (or truncated), at least for cases where we are:
- Parsing named destinations[2] and URLs.
- Handling "strings" that are actually /Name-instances.
- Building a lookup Object/Map based on some PDF data-structure.

*NOTE:* The issue that prompted this patch is obviously related to destinations; however, I've gone through the `src/core/` folder and updated various other `stringToPDFString` call-sites that (directly or indirectly) fit the categories listed above.

---

[1] Try clicking on anything on the line containing "Item 7A. Quantitative and Qualitative Disclosures About Market Risk 27".
[2] Unfortunately, just skipping `stringToPDFString` in this case would cause other issues, such as the named destination becoming "unusable" in the viewer; see e.g. issues #14847 and #14864.
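The stripping behaviour described above, and why it matters when building a lookup Map, can be sketched roughly as follows. This is a simplified, hypothetical helper rather than the actual `stringToPDFString`. `stripLanguageEscapes`, `buildDestMap` and the sample names are invented for illustration. The sketch assumes that an escape sequence runs from one `\x1b` to the next, or to the end of the string when no closing `\x1b` exists.

```javascript
// Hypothetical, simplified sketch: NOT the pdf.js implementation.
// It only models the language-escape handling of an already decoded text
// string; BOM detection, UTF-16 decoding and PDFDocEncoding are omitted.
// Assumption: an escape sequence starts at ESC (0x1B) and runs to the
// next ESC, or to the end of the string if there is no closing ESC.
function stripLanguageEscapes(str, keepEscapeSequence = false) {
  if (keepEscapeSequence || !str.includes("\x1b")) {
    return str; // the caller opted out, or there is nothing to strip
  }
  return str.replace(/\x1b[^\x1b]*(?:\x1b|$)/g, "");
}

// Build a lookup Map keyed by (possibly stripped) destination names.
function buildDestMap(entries, keepEscapeSequence) {
  const map = new Map();
  for (const [rawName, dest] of entries) {
    map.set(stripLanguageEscapes(rawName, keepEscapeSequence), dest);
  }
  return map;
}

// Two made-up destination names that both contain an unterminated ESC.
const entries = [
  ["\x1bItem7A", "page 27"],
  ["\x1bItem7B", "page 28"],
];

const stripped = buildDestMap(entries, false);
const kept = buildDestMap(entries, true);
console.log(stripped.size); // 1: both names collapse to "", second one wins
console.log(kept.size); // 2: both names survive intact
```

With stripping enabled, both names become the empty string. The first destination is therefore silently lost, and a lookup by the original name fails. Keeping the escape sequence preserves distinct keys.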
d763b3d to b629baf
/botio unittest
From: Bot.io (Linux m4): Received. Command cmd_unittest from @Snuffleupagus received. Current queue size: 0. Live output at: http://54.241.84.105:8877/468d84bcae5e873/output.txt
From: Bot.io (Windows): Received. Command cmd_unittest from @Snuffleupagus received. Current queue size: 0. Live output at: http://54.193.163.58:8877/17c380f3774666e/output.txt
From: Bot.io (Linux m4): Success. Full output at http://54.241.84.105:8877/468d84bcae5e873/output.txt (total script time: 2.41 mins)
From: Bot.io (Windows): Success. Full output at http://54.193.163.58:8877/17c380f3774666e/output.txt (total script time: 8.19 mins)
LGTM. Thank you.