World builder not intelligently ordering threads?

I pointed out in an earlier thread that the expander can be extremely slow because it doesn't use multiple threads, but it also doesn't really seem as if the builder orders the work in an optimized fashion either.

Shouldn't the build process logically try to start first on the slow devices that take relatively the longest to build? I tried optimizing my project somewhat, but it still builds all the Perlin noises etc. before anything else, while the expander just sits there waiting, even though nothing blocks it from being built a LOT sooner in the build process… Seems like a waste, to be honest, and some performance could be gained here IMHO.

The sequence is simple: Generators -> Filters -> Outputs. The sequence depends on how data is created and how operations are performed on that data. For example, if Perlin generators were built after erosion, what exactly would erosion "erode"?

But if it's awfully clear that it could do the work more efficiently with the current paths in my build, and yet it still prioritizes the other (almost independent) Perlins and whatever is grouped with them, and literally waits for all of that to complete first, that seems inefficient.

Like I said earlier, it currently makes the expanders wait (and they take by far the most time) when it could have started them much earlier. The result is that when all the Perlin stuff is finished, everything else is finished too, and the only thing remaining is the expander, which takes 2 minutes to complete; only after that can I start doing other stuff again.

Could you elaborate using a screenshot of your network?

Sure.

Mind you, I use a lot of Perlin noises at the moment, but that will be optimized at the end of the project; despite that, the issue will remain either way.

So here you can see I put some red arrows to indicate the flow and the separate expander locations. I can see 2 or 3 expanders that could have been triggered a lot sooner if the system had some decent prioritization. Also note the red-circled part: as indicated, it doesn't even pass through any expander on the way to the final "file output", yet that whole part is prioritized too.

The two expanders take about 2 minutes each to complete, and it really seems clear that they could have been started a lot sooner.

Just my 2 cents. Perhaps I'm not getting it?

World Machine has no way to know which devices are going to be fast and which are going to be slow; even the devices themselves can't necessarily know that (small differences in parameters such as radius can make massive changes to runtime). Because of this, WM has no way to know how to sequence your build 'optimally'. For example, if your expanders had small radii, they might not be bottlenecks at all, and some other priority would be better. Thus, WM sorts the graph into topological order and activates devices by level sets: all devices with no inputs are built first, then the devices that depend on them, and so on. In the absence of additional time-to-build hinting information, this is optimal.
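A minimal Python sketch of level-set ordering, assuming the network is given as a plain list of (source, destination) edges; the device names are made up and this is not World Machine's actual code. Each pass collects every device whose inputs have all been built, which is Kahn's topological sort done one layer at a time. Note that build times play no part in it:

```python
from collections import defaultdict

def level_sets(nodes, edges):
    """Group devices into level sets: level 0 has no inputs, and each later
    level holds devices whose inputs all sit in earlier levels."""
    indegree = {n: 0 for n in nodes}
    children = defaultdict(list)
    for src, dst in edges:
        children[src].append(dst)
        indegree[dst] += 1

    current = sorted(n for n in nodes if indegree[n] == 0)
    levels = []
    while current:
        levels.append(current)
        ready = []
        for n in current:
            for child in children[n]:
                indegree[child] -= 1
                if indegree[child] == 0:  # its last input just became available
                    ready.append(child)
        current = sorted(ready)

    if sum(len(level) for level in levels) != len(nodes):
        raise ValueError("network contains a cycle")
    return levels

# Hypothetical network: two generators feed a combiner, then erosion, then output.
nodes = ["perlin", "voronoi", "combiner", "erosion", "output"]
edges = [("perlin", "combiner"), ("voronoi", "combiner"),
         ("combiner", "erosion"), ("erosion", "output")]
print(level_sets(nodes, edges))
# [['perlin', 'voronoi'], ['combiner'], ['erosion'], ['output']]
```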

Is it possible to get this hinting information? Yes, although due to different devices having different scaling behaviors it’s not as easy as you might think (a device that is faster than another at preview res might be much slower at high res). I’m not convinced it is worth the implementation time to do this versus adding additional features, however.
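To illustrate the scaling problem, here is a small Python sketch that assumes a device's runtime follows a power law t = c * n**k in pixel count n, and fits c and k from two timed builds. The two "devices" and their timings are synthetic, chosen only to show a crossover; they are not World Machine measurements:

```python
import math

def fit_power_law(n1, t1, n2, t2):
    """Fit t = c * n**k through two (pixel_count, seconds) samples."""
    k = math.log(t2 / t1) / math.log(n2 / n1)
    c = t1 / n1 ** k
    return c, k

def predict(model, n):
    c, k = model
    return c * n ** k

# Synthetic timings: device A scales linearly with pixel count,
# device B scales as n**1.5 (e.g. a hypothetical radius-dependent filter).
preview, preview2, full = 256 ** 2, 512 ** 2, 4096 ** 2
time_a = lambda n: 1e-6 * n
time_b = lambda n: 1e-9 * n ** 1.5

model_a = fit_power_law(preview, time_a(preview), preview2, time_a(preview2))
model_b = fit_power_law(preview, time_b(preview), preview2, time_b(preview2))

print(f"preview: A={predict(model_a, preview):.3f}s  B={predict(model_b, preview):.3f}s")
print(f"full:    A={predict(model_a, full):.1f}s  B={predict(model_b, full):.1f}s")
```

Here device B is roughly four times faster than A at preview resolution but roughly four times slower at 4096x4096, so a priority learned at preview resolution would be wrong at full resolution. Real devices also depend on parameters such as radius, which a fit like this ignores.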

Lastly, there is a somewhat related issue that you might be running into here: all devices in a level set are processed before the next level set starts, so if there is one slow device left in the current set, the next set (of potentially many devices) doesn't start until it's finished. This could certainly be optimized. However, most of the time the slow device is one that has been multithreaded, so the payoff here is much lower than you might expect: it makes sense to use those threads to finish the device rather than to start new ones. As a result, there's been no need to invest time in this area so far, when it could be spent elsewhere. The expander is one of the few slow devices that isn't multithreaded; if it were, there'd be very little difference between any ordering.

Thanks for the elaboration Remnant.

Of course, WM initially has no way of knowing which devices will take more or less time.

What about the indicators that show how long a device has been running? After a (re)build it shows how many seconds each device took; can't the builder learn from these numbers and adjust the build order accordingly? Just thinking here.

The expander is currently such a time consumer, especially when trying to tweak an environment; I've been wasting hours getting things right with the expander, etc. It's way worse than Erosion (because erosion, IMHO, isn't a necessary device for landscape creation, more of an after-effect).

Or, as a first-order approximation, each node can just remember how long it took the last time the world was built. Then it is trivial to identify the critical path.
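A minimal Python sketch of that idea, assuming each device's last build time is available as a dictionary (the device names and timings here are hypothetical): compute every device's earliest finish time in topological order, then walk back from the device that finishes last.

```python
from collections import defaultdict, deque

def critical_path(durations, edges):
    """Slowest dependency chain, using each device's remembered build time."""
    parents = defaultdict(list)
    children = defaultdict(list)
    indegree = {n: 0 for n in durations}
    for src, dst in edges:
        parents[dst].append(src)
        children[src].append(dst)
        indegree[dst] += 1

    # Kahn's algorithm gives a topological order.
    queue = deque(n for n in durations if indegree[n] == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for c in children[n]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)

    # Earliest finish if each device starts as soon as its inputs are ready;
    # `via` remembers which input held it up the longest.
    finish, via = {}, {}
    for n in order:
        start, via[n] = 0.0, None
        for p in parents[n]:
            if finish[p] > start:
                start, via[n] = finish[p], p
        finish[n] = start + durations[n]

    last = max(finish, key=finish.get)
    total = finish[last]
    path, node = [], last
    while node is not None:
        path.append(node)
        node = via[node]
    path.reverse()
    return path, total

last_build_seconds = {"perlin_a": 5, "perlin_b": 4, "combiner": 2,
                      "expander": 120, "output": 1}
edges = [("perlin_a", "combiner"), ("combiner", "output"),
         ("perlin_b", "expander"), ("expander", "output")]
print(critical_path(last_build_seconds, edges))
# (['perlin_b', 'expander', 'output'], 125.0)
```

A scheduler could then give devices on that path first claim on free cores; in this made-up network, the expander branch, not the Perlin/combiner branch, decides the total build time.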

This seems like it would work well in practice, even if it was not optimal.

Alright then. Explain to me this:

I just had to rebuild that small piece, and the bloody thing goes and builds the expanders one by one? That can't be intended?

I can't see any reason at all why it wouldn't build them simultaneously.

Please elaborate, Remnant! I really don't get it; it seems buggy. This could be sped up by 100 percent. Imagine this happening throughout the whole project; this needs to be looked at, thanks.

I'm no expert on this subject, but wouldn't devices with inputs need to know what the input data is before they can process it? For example, a combiner has to know exactly what it needs to combine. Same with an expander.

I guess there could be an optimizer phase where a build does a render and then optimizes the order for the next time a build is initiated, but that would require the data not to change, because otherwise it would have to be re-optimized. So you wouldn't gain much unless you were doing the exact same build over and over, and I don't see the point in that ((optimize phase + optimized rebuild) > optimize phase).

That said, I’m pretty sure Stephen looks at optimizing a lot.

But isn't this program based entirely on separate outputs? Isn't that why it's so easy to change something in a project and have only the corresponding parts update, because everything is 'layered'?

To me, the outputs after the curves and inverter are totally separate, especially since they were split up.

I just reshuffled my whole project again today and rebuilt it. I have a ton of expanders, most of them on parallel branches, and literally NONE of them built simultaneously.

It's actually not just the expanders acting funky; there are also illogical delays elsewhere, like in the transition from building all the noise devices to the Voronoi devices, or anything else. It simply waits for one whole group to complete before jumping to the next group, which leaves just one CPU core finishing off that group.

It's either a bug or a design flaw. Which is sad, as this program is really great, but I could speed up my builds by 50 to 100% if this were fixed. Assuming it's a flaw, I'd conclude the program can only multithread devices of exactly the same type, grouped together.

Sorry for the rant; this program is awesome, but this puts me off. It should be able to handle what I want just fine with a bit of code correction; this should be exactly the strength of the kind of system WM uses.

What you are seeing is the scheduler using statically known knowledge, which results in an overly conservative build order, as I mentioned in my previous reply.

Key to understanding the process is that the build scheduler groups together devices into level sets statically at the start of the build process.

In the image you posted above, in that small subset of devices, the curve device will build first; the first expander and inverter will build together; the second expander will build by itself; then the combiner will build.

In reality, of course, the inverter will build almost immediately, which means the second expander has everything it needs to start building, but it does not. So you see each expander slowly building by itself.
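To put rough numbers on this, here is a Python sketch comparing the static level-set schedule with a dynamic one (each device starts as soon as its own inputs are done) for the curve/expander/inverter/combiner network just described. It assumes single-threaded devices, enough cores to run every device in a level side by side, and made-up timings: 1 second for the curve, inverter and combiner, 120 seconds for each expander.

```python
from functools import lru_cache

def makespans(durations, edges):
    """Return (static level-set build time, dynamic build time)."""
    parents = {n: [] for n in durations}
    for src, dst in edges:
        parents[dst].append(src)

    @lru_cache(maxsize=None)
    def level(n):
        # 0 for devices with no inputs, else one past the deepest input.
        return 1 + max((level(p) for p in parents[n]), default=-1)

    @lru_cache(maxsize=None)
    def finish(n):
        # Earliest finish if a device starts the moment its inputs are done.
        return durations[n] + max((finish(p) for p in parents[n]), default=0)

    # Static: each level set waits for the slowest device of the previous set.
    slowest_per_level = {}
    for n in durations:
        lv = level(n)
        slowest_per_level[lv] = max(slowest_per_level.get(lv, 0), durations[n])
    static = sum(slowest_per_level.values())

    dynamic = max(finish(n) for n in durations)
    return static, dynamic

durations = {"curve": 1, "expander1": 120, "inverter": 1,
             "expander2": 120, "combiner": 1}
edges = [("curve", "expander1"), ("curve", "inverter"),
         ("inverter", "expander2"),
         ("expander1", "combiner"), ("expander2", "combiner")]
print(makespans(durations, edges))  # (242, 123)
```

Under these assumptions the static order takes almost twice as long, because expander2 sits alone in its own level set. The gap shrinks when the slow devices are multithreaded, since the otherwise idle cores are then put to work on them.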

This is not a bug, but an engineering trade-off – static build order makes the builder much simpler to create (no cross-thread communication is needed), and overwhelmingly the slow devices in WM will be erosion, blur, thermal erosion – all things that are multithreaded. Instead of idling, the extra cores usually will go to finishing off the current device, which is superior in the presence of multithreaded devices. Only with single-threaded devices does this deficiency become notable. Since most of the time the static builder doesn’t appreciably increase build times, there has been no need to sink significant effort into improving it.

In fact, although it would be nice to improve the builder to use dynamic ordering, in terms of engineering time it would be much faster to multithread the remaining slow devices (and get better performance to boot) than to design, test, and debug a new build system.

One possible interim work-around is to insert “spacer” devices in the network to line things up the way you want them; for example a clamp or a checkpoint device inserted to balance the flow will make both expanders build at the same time. I don’t generally recommend doing this (it’s rarely worth it), but if your network is being scheduled in a very frustrating way it might be worth it.
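Under the level-set model, a spacer works by padding the shorter branch so both expanders land at the same depth. A Python sketch, assuming each device's level is one past its deepest input; "spacer" is a hypothetical stand-in for a pass-through device such as a clamp or checkpoint, and placing it between the curve and the first expander is just one placement that works here:

```python
def levels_of(nodes, edges):
    """Level of each device: 0 with no inputs, else 1 + the deepest input's level."""
    parents = {n: [] for n in nodes}
    for src, dst in edges:
        parents[dst].append(src)
    memo = {}

    def level(n):
        if n not in memo:
            memo[n] = 1 + max((level(p) for p in parents[n]), default=-1)
        return memo[n]

    return {n: level(n) for n in nodes}

nodes = ["curve", "expander1", "inverter", "expander2", "combiner"]
edges = [("curve", "expander1"), ("curve", "inverter"),
         ("inverter", "expander2"),
         ("expander1", "combiner"), ("expander2", "combiner")]
before = levels_of(nodes, edges)
print(before["expander1"], before["expander2"])  # 1 2

# Insert the hypothetical pass-through spacer on the short branch.
spaced_edges = [("curve", "spacer"), ("spacer", "expander1")] + edges[1:]
after = levels_of(nodes + ["spacer"], spaced_edges)
print(after["expander1"], after["expander2"])  # 2 2
```

With both expanders in the same level set, they are scheduled side by side; the combiner still waits for both.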

It is very little work to multithread the expander device; I’ll see if I can bump that up the priority list for you.
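For a neighborhood filter, multithreading mostly comes down to splitting the output into bands of rows: each worker reads the shared input but writes only its own rows, so no locking is needed. A Python sketch, assuming the expander behaves like a grayscale dilation (a max filter over a square window of radius r); that is an assumption about the device, not its actual code. In CPython, pure-Python loops like these won't actually run faster on threads because of the GIL, so this shows the work split rather than the speedup:

```python
from concurrent.futures import ThreadPoolExecutor

def dilate_rows(grid, r, row_start, row_end):
    """Max over a (2r+1) x (2r+1) window, for output rows [row_start, row_end)."""
    h, w = len(grid), len(grid[0])
    out = []
    for y in range(row_start, row_end):
        y0, y1 = max(0, y - r), min(h, y + r + 1)
        row = []
        for x in range(w):
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            row.append(max(grid[yy][xx] for yy in range(y0, y1)
                           for xx in range(x0, x1)))
        out.append(row)
    return out

def dilate_parallel(grid, r, workers=4):
    """Split output rows into bands; each thread reads the whole (read-only)
    input and writes only its own band."""
    if not grid:
        return []
    h = len(grid)
    step = -(-h // workers)  # ceiling division
    bands = [(s, min(h, s + step)) for s in range(0, h, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda band: dilate_rows(grid, r, *band), bands)
        return [row for part in parts for row in part]  # map keeps band order

grid = [[0, 0, 0, 0],
        [0, 9, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 3]]
print(dilate_parallel(grid, 1, workers=2))
# [[9, 9, 9, 0], [9, 9, 9, 0], [9, 9, 9, 3], [0, 0, 3, 3]]
```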

+1 for multithreading the expander!

I've finished the infrastructure work to simplify multithreading various other sections of WM that have languished.

I have also extended multithreading to the expander as of this evening. It will appear in the next build.

Decided to go ahead and push this out along with some bugfixes that were about to go out anyway.

Enjoy the faster builds :slight_smile:

OMG Remnant, that was fast! And thanks for the response.

Sorry for being such a plague to you, but this is great! I guess my blabbing about deficiencies has been working out after all :stuck_out_tongue:

This will save SO much time when tweaking my world.

Wow. My build times for my current project went from 8+ minutes to a little over 3 minutes. Damn. :smiley:

So happy :slight_smile:

Btw, is there an optimal number of threads that is recommended, like when compiling in programming languages? Let's say for a 4-core CPU, are 5 worker threads ideal? Or is this not the case here?