Skip to content

[api-minor] Simplify clean-up of page resources after rendering #19368

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 26, 2025

Conversation

Snuffleupagus
Copy link
Collaborator

After PR #2317, which landed in 2012, we'd immediately clean-up after rendering for pages with large image resources. This had the effect that re-rendering, e.g. after zooming, would force us to re-parse the entire page which could easily lead to bad performance.
In PR #16108, which landed in 2023, we tried to lessen the impact of that by slightly delaying clean-up however that's obviously not a perfect solution (and it increased the complexity of the relevant code).

Furthermore, the condition for this "immediate" clean-up seems a bit arbitrary to me since a page could easily contain a large number of smaller images whose total size vastly exceeds the threshold.

Hence this patch, which suggests that we remove the conditional and delayed clean-up after rendering. Compared to the situation back in 2012, a number of things have improved since:

  • We have multiple caches for repeated image-resources on the worker-thread[1], which helps reduce overall memory usage and improves performance.
  • We downsize huge images on the worker-thread, which means that the images we're using on the main-thread cannot be arbitrarily large.
  • The amount of available RAM on devices should be a lot higher, since more than a decade has passed.

A future improvement here, for more resource constrained environments, could be to instead clean-up when actually needed using e.g. WeakRefs (see issue 18148).


[1] More specifically:

  • LocalImageCache, which caches image-data by /Name and /Ref on the PartialEvaluator.prototype.getOperatorList level.
  • RegionalImageCache, which caches image-data by /Ref on the PartialEvaluator-instance (i.e. at the page) level.
  • GlobalImageCache, which caches image-data by /Ref globally at the document level.

After PR 2317, which landed in 2012, we'd immediately clean-up after rendering for pages with large image resources. This had the effect that re-rendering, e.g. after zooming, would force us to re-parse the entire page which could easily lead to bad performance.
In PR 16108, which landed in 2023, we tried to lessen the impact of that by slightly delaying clean-up however that's obviously not a perfect solution (and it increased the complexity of the relevant code).

Furthermore, the condition for this "immediate" clean-up seems a bit arbitrary to me since a page could easily contain a large number of smaller images whose total size vastly exceeds the threshold.

Hence this patch, which suggests that we remove the conditional and delayed clean-up after rendering. Compared to the situation back in 2012, a number of things have improved since:
 - We have *multiple* caches for repeated image-resources on the worker-thread[1], which helps reduce overall memory usage and improves performance.
 - We downsize huge images on the worker-thread, which means that the images we're using on the main-thread cannot be arbitrarily large.
 - The amount of available RAM on devices should be a lot higher, since more than a decade has passed.

A future improvement here, for more resource constrained environments, could be to instead clean-up when actually needed using e.g. `WeakRef`s (see issue 18148).

---
[1] More specifically:
 - `LocalImageCache`, which caches image-data by /Name and /Ref on the `PartialEvaluator.prototype.getOperatorList` level.
 - `RegionalImageCache`, which caches image-data by /Ref on the `PartialEvaluator`-instance (i.e. at the page) level.
 - `GlobalImageCache`, which caches image-data by /Ref globally at the document level.
@Snuffleupagus
Copy link
Collaborator Author

/botio test

@moz-tools-bot
Copy link
Collaborator

From: Bot.io (Linux m4)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.241.84.105:8877/bd2ee5a99045088/output.txt

@moz-tools-bot
Copy link
Collaborator

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.193.163.58:8877/f219206796a7d21/output.txt

@moz-tools-bot
Copy link
Collaborator

From: Bot.io (Linux m4)


Success

Full output at http://54.241.84.105:8877/bd2ee5a99045088/output.txt

Total script time: 28.80 mins

  • Unit tests: Passed
  • Integration Tests: Passed
  • Regression tests: Passed

@moz-tools-bot
Copy link
Collaborator

From: Bot.io (Windows)


Failed

Full output at http://54.193.163.58:8877/f219206796a7d21/output.txt

Total script time: 59.28 mins

  • Unit tests: Passed
  • Integration Tests: FAILED
  • Regression tests: Passed

Copy link
Contributor

@timvandermeij timvandermeij left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. It's always hard to predict the exact effects of a patch like this given the variety of environments PDF.js runs in, but I agree with the analysis in the PR description and I don't immediately foresee any issues myself either, so let's do this. Thanks!

@timvandermeij timvandermeij merged commit 4f1078d into mozilla:master Jan 26, 2025
9 checks passed
@Snuffleupagus Snuffleupagus deleted the rm-delayed-cleanup branch January 26, 2025 20:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants