Expander performance fix

Hello, is there any chance this issue will be fixed in the foreseeable future? At 8k and higher resolutions the Expander goes crazy and takes more time than 100 other devices combined. Thanks a lot!

I think this is not something that can be simply fixed, as with a higher resolution the load increases exponentially, iirc. I do agree the Expander is a somewhat simple and not very “intelligent” device; I suppose there is no real smart algorithm, and it just brute-force expands all the pixels and then takes the min/max. If that is the case, there is room for optimisation, but I don’t think we will see this turn from an exponential problem into a linear one.

To the best of my knowledge, there are no known algorithms for speeding up the underlying operation (morphological kernel). The problem is that as resolution increases, you have both more work to do (pixels) and you also need to make the kernel larger to produce the same effect. This gives a very lousy scaling behavior at high res.
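To illustrate why the scaling is so poor, here is a minimal Python sketch of a brute-force morphological dilation. It is not the Expander's actual implementation; the square window, the function names `dilate_brute_force` and `brute_force_cost`, and the example radii are all assumptions made for illustration.

```python
def dilate_brute_force(height_map, radius):
    """Naive grayscale dilation (illustrative only).

    Each output pixel is the maximum over a (2*radius+1) x (2*radius+1)
    square window, clamped at the borders of the map.
    """
    rows, cols = len(height_map), len(height_map[0])
    out = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            best = height_map[y][x]
            # Visit every neighbour inside the window: this inner work
            # grows with the square of the radius.
            for ny in range(max(0, y - radius), min(rows, y + radius + 1)):
                for nx in range(max(0, x - radius), min(cols, x + radius + 1)):
                    if height_map[ny][nx] > best:
                        best = height_map[ny][nx]
            out[y][x] = best
    return out


def brute_force_cost(resolution, radius):
    """Rough comparison count: number of pixels times window area."""
    return resolution * resolution * (2 * radius + 1) ** 2


# Fixed radius in pixels: doubling the resolution costs 4x.
fixed = brute_force_cost(2048, 16) / brute_force_cost(1024, 16)
# Radius doubled as well, to keep the same visual effect: far worse.
scaled = brute_force_cost(2048, 32) / brute_force_cost(1024, 16)
print(f"fixed radius: {fixed:.1f}x, scaled radius: {scaled:.1f}x")
```

Under this cost model, going from 1024 to 2048 at a fixed 16-pixel radius costs 4x as much, while also doubling the radius to 32 pixels costs roughly 15.5x as much.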

There likely are ways to approximate the result in a much faster manner, but it’s not likely to be pixel-identical. Doing R&D on this is definitely on my list.
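As one hedged example of such an approximation: if a square window is an acceptable stand-in for a round one (an assumption, since the Expander's kernel shape isn't stated here), the maximum over the square can be computed as a 1D maximum along rows followed by a 1D maximum along columns. The names `max_filter_1d` and `dilate_separable` are hypothetical, and this is a sketch of the general technique, not anything World Machine is known to do.

```python
def max_filter_1d(values, radius):
    """Sliding-window maximum over a 1D sequence, clamped at the ends."""
    n = len(values)
    return [
        max(values[max(0, i - radius):min(n, i + radius + 1)])
        for i in range(n)
    ]


def dilate_separable(height_map, radius):
    """Square-window dilation done as two 1D passes (illustrative only).

    Work per pixel is about 2 * (2*radius + 1) comparisons instead of
    (2*radius + 1) ** 2 for the direct 2D window.
    """
    # Pass 1: maximum along each row.
    row_pass = [max_filter_1d(row, radius) for row in height_map]
    # Pass 2: maximum along each column of the row-filtered result.
    col_pass = [max_filter_1d(list(col), radius) for col in zip(*row_pass)]
    # col_pass is column-major; transpose back to rows.
    return [list(row) for row in zip(*col_pass)]


spike = [[0] * 5 for _ in range(5)]
spike[2][2] = 7
for row in dilate_separable(spike, 1):
    print(row)
```

For a square window the two-pass result matches the direct 2D maximum exactly; compared with a circular kernel it differs near the corners of the window, which is the kind of not-pixel-identical trade-off described above.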

Thanks Stephen! In the vast majority of cases you don’t need something super precise. Btw, I tried a similar operation in Photoshop and it took one second at 8k, so there must be some algorithmic solution.
I understand that the scaling is exponential, but this is too much. All other devices scale as expected; AFAIK only the Expander has this tremendous jump from 4k to 8k. Another weird thing is that 95 percent of that time is preparation. So the current situation is that in a project of about 100 devices (many erosions, water, etc.), 3 Expanders take 50-60 percent of the build time. In comparison, Blur is MANY times faster, almost instantaneous.
In my case I spend 90 percent of project time waiting for builds at higher resolutions while tuning small details. Usually I have most of the work done in 1-2 weeks, and then I spend several months tuning very small details in high-res builds (I work in the game industry, where the current standard is, let’s say, 0.5 m per pixel, and for an artist speed of iteration is everything). So generally, speeding up the build by just 50 percent would be something incredible…

I just was running some experiments, and there does seem to be an inexplicable performance cliff around 2k resolution in the Expander, where it doesn’t scale as expected, but much worse.

With the radius set to a fixed pixel size, it should scale linearly with the number of pixels; this means that you’d expect each resolution bump to make it 4x slower (2 × 2 more pixels).

Res  : Seconds : Scaling
128  : 0.47    : ~4
256  : 1.91    : ~4
512  : 7.7     : ~4
1024 : 30.61   : ~4
2048 : 299     : 9.7 (!)

The scaling proceeds as expected up until 2k where it’s less than half as fast as expected. I didn’t check but I expect similar results going higher. My current thought is perhaps cache effects kick in at that point, but I haven’t investigated.
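For anyone who wants to check the scaling column themselves, here is a small Python sketch that recomputes the step-to-step factors from the timings listed above. The dictionary name `timings_3028` and the helper `scaling_factors` are just illustrative names.

```python
# Single-thread, fixed-radius timings in seconds, as posted above.
timings_3028 = {128: 0.47, 256: 1.91, 512: 7.7, 1024: 30.61, 2048: 299.0}


def scaling_factors(timings):
    """Return {resolution: time / time at the previous resolution}."""
    resolutions = sorted(timings)
    return {
        larger: timings[larger] / timings[smaller]
        for smaller, larger in zip(resolutions, resolutions[1:])
    }


for res, factor in scaling_factors(timings_3028).items():
    # With a fixed pixel radius, each doubling should cost about 4x.
    flag = "" if factor < 5 else "  <-- worse than the expected ~4x"
    print(f"{res:>5}: {factor:.2f}x{flag}")
```

Each step up to 1024 comes out very close to 4x, while the step to 2048 is just under 10x.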

I will try to bump this up the priority list for one of the first follow-on releases to Mt Rainier.

Incidentally, in 4011 the Expander at least reports progress accurately and can be cancelled, so you can get a sense of how it is progressing.

Thanks Stephen! Going above 2k, my results are as follows: 2k->4k about 8.9x, 4k->8k about 24x.

So… although this is expressly against my current dev priorities… I got a bee in my bonnet so to speak about the performance cliff in the Expander. It clearly was mechanical in nature rather than algorithmic. I got hooked on seeing if there were any simple cache-friendly transformations that could improve things.

I think you’ll be happy with the performance improvement for 4012. Performance has improved very significantly at all resolutions, and the abnormal scaling issue has been corrected.

Using the same testing conditions as in the tests above, which were run in 3028 (1 thread, fixed radius), here are a few new results for build 4012:

Res  : Seconds
512  : 0.22
1024 : 1.58
2048 : 8.78
4096 : 39
8192 : 167

Note that those figures are for a single core. The improvements do also scale well with core count; on my 16 core Threadripper machine, the all-thread 8192 resolution build time is 12s, which is 14x faster than a single core (nearly linear scaling).

There may still be algorithmic improvements to be had, but in terms of results per effort I’m very satisfied with the device as it is now. The absolute performance is as much as 35x faster. Not a bad optimization :) Sorry it took so long to correct this!
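The speedup and multi-core figures can be recomputed from the numbers posted in this thread with a short Python sketch. The variable and function names are illustrative, and only the resolutions measured in both builds are compared.

```python
# Single-thread, fixed-radius timings in seconds from the two posts above.
before_3028 = {512: 7.7, 1024: 30.61, 2048: 299.0}
after_4012 = {512: 0.22, 1024: 1.58, 2048: 8.78, 4096: 39.0, 8192: 167.0}


def speedups(before, after):
    """Speedup factor at every resolution measured in both builds."""
    return {
        res: before[res] / after[res]
        for res in sorted(before)
        if res in after
    }


def parallel_efficiency(single_core_s, all_core_s, cores):
    """Return (speedup over one core, fraction of ideal linear scaling)."""
    speedup = single_core_s / all_core_s
    return speedup, speedup / cores


for res, factor in speedups(before_3028, after_4012).items():
    print(f"{res:>5}: {factor:.1f}x faster in 4012")

# 8192 build: 167 s on one core vs 12 s on a 16-core machine.
speedup, efficiency = parallel_efficiency(167.0, 12.0, 16)
print(f"multi-core speedup: {speedup:.1f}x ({efficiency:.0%} of linear)")
```

The largest single-thread gain in these measurements is at 512, which is where the 35x figure comes from.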

WOW, thanks a lot Stephen. You can’t imagine how big an increase in productivity this means for me.

I’ll be honest - I expected some improvement, but was blown away by how much the poor cache behavior was holding back performance! The even better news is that some of the techniques applied can also be applied to other devices. There won’t be anything like the same level of improvement, but improvements of 25-30% or so might be possible, and that is still very worth it.

This is indeed very interesting. In my case I spend most of my time in 8k/16k builds, so even 25-30 percent is a huge increase in productivity. This Expander fix would probably shorten each “project” by maybe 30 percent, which means 1 month… add another 20-30 percent and we have another month. It could easily triple my productivity. If you ever managed to utilize the GPU somehow, you could simply wipe out any competition. In terms of features, WM is by far the best (and the only one able to handle more complex projects, for example because of tiled output); its only problem is speed.
