Reduce the memory footprint of wide tile commands#1325
Conversation
|
Hey @LaurenzV, awesome work! I’m adding myself as a reviewer to indicate that I plan to review the PR, but I’ll likely be able to get to it on Monday. 🙏 |
| pub strips: Box<[Strip]>, | ||
| #[cfg(feature = "multithreading")] | ||
| /// The index of the thread that owns the alpha buffer. | ||
| /// Always 0 in single-threaded mode. |
There was a problem hiding this comment.
If I’m calculating the struct size correctly, it’s now 32 bytes instead of 24 bytes in single-threaded mode, so we’re paying an additional 8 bytes even though this field isn’t used there. While we probably won’t have thousands of clips simultaneously, would it still make sense to keep this behind a feature gate?
There was a problem hiding this comment.
Is this really a problem though? That struct is only used to keep track of the clip stack, and realistically you aren't going to have more than a few nested clip path, so the impact on memory is very negligible I think (unlike the wide tile commands, where you could end up with thousands or even tens of thousands of commands depending on what you are drawing)? I could add it back, but the main reason I removed it is that it keeps the code simpler because we don't have to slap on the cfg attribute everywhere we use it.
There was a problem hiding this comment.
No, it’s not a problem, I just think paying for something you don’t use isn’t ideal, even if the cost is negligible and the code is slightly simpler.
| tiles, | ||
| width, | ||
| height, | ||
| props: Props::default(), |
There was a problem hiding this comment.
Would it make sense to assign reasonable starting capacities for vectors in Props, specifically for FillProps? I think we can assume it will need to grow during the first render since it’s the primary data-holding structure. It might also be worth checking whether this shows any impact in benchmarks.
There was a problem hiding this comment.
Any suggestions for that? I think the impact will be negligible since, as before, we aren't likely pushing a lot of elements into it, but if you have a preference for the starting capacity I can add that. I don't think it would have a performance impact because the benchmarks (or at least the Blend2D benchmarks) runs with a "warm" render context, i.e. a render context that has been previously rested.
There was a problem hiding this comment.
No worries then, choosing the right capacity size isn’t easy. Regarding the warm render context, are you saying the render context isn’t recreated from scratch, but instead reset after each render?
| pub strips: Box<[Strip]>, | ||
| #[cfg(feature = "multithreading")] | ||
| /// The index of the thread that owns the alpha buffer. | ||
| /// Always 0 in single-threaded mode. |
There was a problem hiding this comment.
No, it’s not a problem, I just think paying for something you don’t use isn’t ideal, even if the cost is negligible and the code is slightly simpler.
| tiles, | ||
| width, | ||
| height, | ||
| props: Props::default(), |
There was a problem hiding this comment.
No worries then, choosing the right capacity size isn’t easy. Regarding the warm render context, are you saying the render context isn’t recreated from scratch, but instead reset after each render?
| /// The index of the thread that owns the alpha buffer | ||
| /// containing the mask values at `alpha_idx`. | ||
| /// Always 0 in single-threaded mode. | ||
| pub thread_idx: u8, |
There was a problem hiding this comment.
I guess we can consider this negligible in ClipProps as well? And for FillProps, not having thread_idx wouldn’t actually change the struct size, right?
There was a problem hiding this comment.
I think so, yes. As mentioned though, I can change it back in case you consider it important. 😄
That's how it's currently done in the Blend2D benchmark suite, yep! |
|
@grebmeg Thanks a lot!! I'll try to take a look at your other PRs soon. |
This PR reduces the memory footprint of wide tile commands from 48 bytes to 16 bytes. This is achieved by introducing a number of changes:
This leads to some nice speedups in many cases:

