Runs forever?

So I was trying to generate something pretty high-res (yes, I need it; no, a tiled build doesn’t fit the bill).
And I’m wondering if I should let it run. As you can see, the previous device reached 100% after about 30 seconds, and since then it’s been running for... over 26 hours, seemingly doing nothing!
Any clues?

Before jumping at RAM being the culprit: there’s little paging going on (it’s a 64GB RAM server running only World Machine).

Sounds normal, to me.

65535 is an “omg” resolution. It says 76GB in the screenshot, but actual usage is always higher than that.
So it’s paging in and out like crazy, which slows the whole process quite a lot.

No, but there’s a difference between slowing down and going from 30 seconds (0 to 100% for erosion) to 25 HOURS of doing nothing in between 2 steps. I’m really wondering what it was doing.
Also, the estimate always seems pretty accurate. This graph takes just a few minutes at 32K, and I’ve generated graphs at 128K just fine. I’m just stumped, as I’ve got no clue what it was stuck on: previous step at 100%, next step not started, and it stayed like that for 26 hours.

This is definitely a forensic investigation… you should not see a freeze with no activity if neither WM nor the operating system is actively swapping.

I don’t currently have the ability to reproduce this situation as my dev machine tops out at 32GB of RAM, but I will work from what is visible to see if anything is up.

Most likely – I suspect at that resolution there might be a counter overflow causing the issue. Also, the erosion is extremely unlikely to finish as quickly as indicated there, even if the erosion amount was set extremely low.
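To illustrate the kind of counter overflow suspected above, here is a minimal sketch. It assumes, purely for illustration, that some per-pixel counter is stored as an unsigned 32-bit integer; the thread does not say which counter is involved or how wide it is, and `pixel_count` and `wrap_u32` are hypothetical helper names, not anything from World Machine.

```python
# Hypothetical illustration only: the actual counter type inside
# World Machine is not stated in this thread. An unsigned 32-bit
# counter is ASSUMED here.
U32_MAX = 2**32 - 1


def pixel_count(side: int) -> int:
    """Number of samples in a square heightfield with the given side length."""
    return side * side


def wrap_u32(value: int) -> int:
    """Simulate storing a value in an unsigned 32-bit counter (wraps on overflow)."""
    return value & 0xFFFFFFFF


if __name__ == "__main__":
    for side in (32768, 60000, 65534, 65536):
        n = pixel_count(side)
        status = "overflows" if n > U32_MAX else "fits"
        print(f"{side} x {side}: {n} pixels, {status}, stored as {wrap_u32(n)}")
```

Under that assumption, a 65536 x 65536 map has exactly 2^32 pixels, one more than the counter can hold, so the stored value wraps around to 0. A loop waiting for such a counter to reach its target could plausibly spin without visible progress.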

If this can help, I can provide you remote desktop access to the server. You’re free to trash it and install whatever debugging tools you need there, even Visual Studio Express plus your code to debug if you want; I’ll just reformat when you’re done. It’s pretty much dedicated to WM anyway, so helping you fix this is important to me: I’m releasing that server soon and need to decide if I go for a “much bigger” server, which is substantial money and wouldn’t make sense if large builds just hang.
I’m subscribed to the thread, so just post here if you want to take up that offer, or email me at ronan.thibaudau@rt-informationtechnology.com

Well, I’m fairly certain I’ve found the counter overflow issue – reducing the size by a couple pixels per direction in this case will probably work – but there is a second issue you’re going to run into here:

At minimum, you need enough main memory to hold every input and output the device needs to access at once. For a 65536^2 map, that is 16GB per heightfield, and Erosion needs to keep at least 4 in memory at once, plus about 33% more than that if you enable Geologic Time. On a 64GB server, you will definitely be hitting swap, which is essentially the same as “will not work”. Something smaller like 60000 x 60000 should fit in your envelope, however (and should not have counter issues either).
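The memory arithmetic above can be sketched as follows. This is a rough estimate, not World Machine’s actual allocator: it assumes 4 bytes per height sample (which matches the 16GB-per-heightfield figure quoted for 65536^2), and the function names are hypothetical.

```python
# Rough working-set estimate for the Erosion device, based on the
# figures quoted in this thread. ASSUMPTION: 4 bytes per height sample.
BYTES_PER_SAMPLE = 4
GIB = 2**30


def heightfield_gib(side: int) -> float:
    """Size in GiB of one square heightfield with the given side length."""
    return side * side * BYTES_PER_SAMPLE / GIB


def erosion_working_set_gib(side: int, heightfields: int = 4,
                            geologic_time: bool = False) -> float:
    """Minimum memory Erosion needs resident at once, per the thread's numbers."""
    total = heightfields * heightfield_gib(side)
    if geologic_time:
        total *= 1.33  # "about 33% more" with Geologic Time enabled
    return total


if __name__ == "__main__":
    for side in (60000, 65536):
        base = erosion_working_set_gib(side)
        with_gt = erosion_working_set_gib(side, geologic_time=True)
        print(f"{side}: {heightfield_gib(side):.1f} GiB per heightfield, "
              f"{base:.1f} GiB for Erosion, {with_gt:.1f} GiB with Geologic Time")
```

At 65536 the four heightfields alone come to 64 GiB, which leaves nothing for the operating system on a 64GB machine, while 60000 x 60000 stays a bit under 54 GiB.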

World Machine was certainly not designed for such large non-tiled world sizes; and building a non-trivial world at that size remains quite a challenge. There is no reason to have artificial barriers however, so I will roll the erosion counter issue fix into the currently-pending beta.

Cheers

Could you also fix this: right now WM lets you allocate at most 80% of RAM to itself, which kind of hurts on a 64+GB server! Also, swapping should be OK with 2 Intel SSDs in RAID 0. Is it possible to get a build without the swap management at all? I expect it should run OK if I allocate 128GB to WM and let Windows handle paging.

That’s not a bad idea. I’ve gone ahead and added the ability to disable WM-driven swap, and also allowed up to 100% memory usage with WM-controlled swap. I am not sure the swap speed would be acceptable even with SSDs, but it’s certainly worth having that option there!

Do you have an ETA on the next beta? I would love a build before the end of the month, so I know whether to renew my server.

I just released it a moment ago to the beta site actually… give it a try and see if the fix works!

Will test it immediately then on the same graph & server & let you know :) Hopefully it will take a “bit less than 25 hours” between 2 steps now, hehe.