Details About Server Disconnect Problem - some insights

So, Saturday night 8/29, I had an occurrence of a problem that had occasionally popped up with one of my worlds: I would start the game world…within 30 seconds, whether players were joining or not, I would get a “Lost connection” alert followed by a “Connection re-established” alert. After that, no interface buttons would respond except Return to Forge, and attempts to refresh would land on a “No active game” page on the Foundry setup page.

I reached out to the Discord channel, and they suggested disabling all modules in the Edit World page - which worked, and I was able, once in, to re-enable all mods. This remained stable for the rest of the session with my players.

However, I was irritated because I have another world for a game I run on alternate Sundays, which has all the same mods installed and enabled. Different maps, players, and characters - but many of the same journal entries, and access to all the same shared compendiums.

After Saturday night’s session, I proceeded to test the two worlds. The Sunday world never had a problem logging in. The Saturday world continued to have the login disconnect every time, UNLESS I disabled all mods in Edit World, and re-enabled them in session. The Saturday world would then remain stable for the remainder of that session.

So, I proceeded to extract all the content from the Saturday world into JSON exports, then created a new world with the same settings. I then imported all the individual data into the new world, enabled all the modules that I use, and made sure their settings matched what I used before.

Lo and behold, the new Saturday game world loads more quickly, and never has the server disconnect error. I tested logging in and out after each block of content (journals, actors, items, scenes, etc), after each swath of module settings, and after the final swing of permission edits for player visibility of content. Not a problem once.

So, while the server disconnect, ID’d as a server memory issue, was clearly having some interaction with the modules (resetting them avoided the crash in the old world file), it seems that it may not be wholly a module issue, but perhaps some additional setting stored or corrupted in the overall world.

If you are having this problem, and can’t get around it, you may wish to try exporting your data, and reconstructing your world file. My original Saturday game’s world was made via a local Foundry install, and then uploaded when I got the Forge hosting account. Perhaps something happened in the original creation from back in the .5 versions of the program, or something did not upload correctly - those are pure conjecture, but a clean world built on Forge with the discrete JSON files re-imported seems to have solved my problem.

Thank you for your insights on the problem.
I wonder what’s different between the two worlds to make one work and not the other. If you have both worlds in your account still and can authorize me to have a look at your data, I’d check it out to try and figure out the cause.
For your information, this is what the problem actually is :


Basically, the Foundry server itself starts using more and more RAM until it runs out of RAM and the process gets killed. A Foundry server will generally use between 100MB and 200MB and rarely ever goes above that, but in some specific situations (like what you had), it can start using up over 2GB of RAM.
The thing with modules is that memory spikes can also happen when modules flood the server with a huge amount of requests at once. You can read more about the cycle-token-stack module which had this issue (it’s fixed now) here: https://github.com/aka-beer-buddy/fvtt-cycle-token-stack/issues/11
I know that about-face does something similar, I’m not sure about others but there must be some that can cause the same problem, which is why disabling all the modules helps.

I read up on that post when you and the other folks linked it in the discord when I was asking about it - made perfect sense - I just couldn’t pin down which particular thing was the culprit (especially since I don’t use either of the well known problem modules you noted).

That said, yes, I’d happily provide authorization (let me know if there is anything I need to do other than simply saying so here in text) - the old world is the 01_fr_tsar folder for the Saturday game. The other worlds are the Sunday game (which has been on the server since I began hosting with you) and the remade Saturday game, both of which have shown no login problems to date.

No, that was all that was needed. And I checked, it’s a small world, Foundry didn’t even go above 120MB while I was testing it. So I have no idea why it would go that high for you. At least, not with me copying just the world. So this has to be a module that did all the damage. I think the reason why it worked on one world and not the other might be because you had the problematic module enabled in 01_fr_tsar but didn’t enable it on the new copy you made.
So I copied your entire Data folder, including your modules, to test, and indeed, the RAM usage shot up right away to 1GB as soon as I loaded the world. So now I need to poke until I find which module is causing it.
Will let you know when I figure it out!

FINALLY figured it out! This was a tough one because I wasn’t able to find “the one module” that was causing it. It turned out to be two modules that had to be enabled together.
Specifically, the problem is the Compendium Browser. When you have it enabled, it will use up a lot of RAM, but when you have it enabled along with your Shared compendium, that will cause the server RAM to spike to 600MB/900MB when the page loads. I think the module on load will flood the server with requests to load every compendium and serve its data, which is bad in every aspect because it’s flooding the network, forcing the server to unnecessarily read files it doesn’t need, process them and load them in RAM, and it puts strain on the client side’s RAM as well.
It seems that it wasn’t that big of a deal for most compendiums, but when you enable your shared compendium module which had 75MB of compendium data in it, that’s where things got bad.
Note that your new copy of the world also has those modules enabled, and I just tried loading it, and it got to 950MB of server RAM used, so you’ll potentially have the same problem.
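
For anyone who wants to check whether one of their own modules carries an unusually large amount of compendium data (like the 75MB mentioned above), here is a minimal sketch in Python. It assumes you have a local copy of your modules folder and that compendium packs are stored as `.db` files somewhere under each module's directory. The function name, the folder layout, and the suffix are my assumptions for illustration, so adjust them to what you actually see on disk.

```python
from pathlib import Path


def pack_sizes_mb(data_root, suffix=".db"):
    """Return {directory name: total MB of pack files}, largest first.

    Assumption: each direct child of data_root is one module (or world)
    and its compendium packs are files ending in `suffix`.
    """
    totals = {}
    for child in Path(data_root).iterdir():
        if not child.is_dir():
            continue
        # Sum the size of every matching file anywhere below this directory.
        size = sum(
            f.stat().st_size
            for f in child.rglob(f"*{suffix}")
            if f.is_file()
        )
        if size:
            totals[child.name] = size / (1024 * 1024)
    # Sort so the heaviest directory is listed first.
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))
```

Calling `pack_sizes_mb("path/to/Data/modules")` (a hypothetical path) and printing the result gives a quick list of which modules hold the most pack data. It says nothing about RAM usage by itself, it only points at where the big compendiums live.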

Well, there we go!

Thanks for figuring that out.

I like the compendium browser for filtering, but I don’t need it except when planning between sessions. So, it can be safely disabled in all my worlds.

I feel your pain, I’ve been trying to set up a new file server and the samba shares weren’t working AT ALL, but I finally figured out the confluence of problems. So I get the “It’s not just ONE culprit” problem.

With that known, I’m going to turn that bad boy off, and hopefully not see that server spike going forward. Because, let’s be clear, the shared compendiums are WAY more useful than that module. So thanks for the wonder that is the shared compendiums.

Again, thanks for the investigation. My players will be thrilled to not ping pong during login.

PS: I just disabled it in both the remaining worlds - in the fresh tsar game I was able to login, disable it, and hit save module settings as it had the spike event…and it reconnected, and then was stable.

Actually, while on the topic…is there a way, by module or otherwise for the GM to SEE what the server memory usage is during game, such that if things start getting “spikey”, it’s a red flag that something we are doing or using could be causing a problem, so we as users could provide more or clearer details on what we were doing before a crash or other similar error?

Just curious.
Thanks

No, unfortunately, there’s no way to check that. Ideally, it should also not be the concern of the user.

Understood, and agreed.

The IT manager in me just loves data useful for when I need to get external support!

Anyway, thanks again!

I am experiencing the same problem of “Lost connection” alert followed by a “Connection re-established” alert, followed by “No active game” page on the Foundry setup page.
I have uninstalled all modules and don’t use a shared compendium - nevertheless it is impossible for any of my players to even log in, rendering this service unusable.
Is there any insight from the forge team as to what else can be done to find out about the underlying issue?

I just checked the server logs and I see the same issue where the server runs out of RAM and gets killed.
But since you said you’re not using any modules, I looked at your files and found this :

Basically, you have 80MB in the actors, with an image pasted entirely as text within the biography of one of your actors. And you also have another 120MB in the journals where an image is pasted as text within the content of a journal entry.
This makes your world over 200MB in size total (while it should be a couple of MBs at most). That means that when you connect to Foundry, you’re downloading the full 200MB every single time, and all your players are too. Foundry is caching that data in RAM (so the usual 100MB of RAM + another 200MB for all this data) and caching the request that it’s sending you (so probably another 400MB; we’re at 700MB total already, which is 5 times more than it should be). If you all connect at the same time (or close enough), Foundry will duplicate that data again until everyone is done downloading, so with 5 people connecting, you’re already at a few GBs of RAM instead of the expected 100MB.
My advice: connect to your game, find which actors and journals have an image embedded inside them, and remove it. Instead, upload that image to your assets library and link to it.
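
If hunting for those entries by hand is tedious, something like the following Python sketch could help. It scans JSON data for images pasted in as base64 text (a `data:image/...;base64,` URI) and reports which entries contain a lot of it. It is only an illustration: the function names and the size threshold are made up, and it assumes the input is one JSON document per line with a `name` field, so treat the file layout as an assumption to verify against your own data.

```python
import json
import re

# Matches inline images such as <img src="data:image/png;base64,iVBOR...">
DATA_URI = re.compile(r"data:image/[a-zA-Z0-9.+-]+;base64,[A-Za-z0-9+/=]+")


def iter_strings(value):
    """Yield every string found anywhere inside a decoded JSON value."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for v in value.values():
            yield from iter_strings(v)
    elif isinstance(value, list):
        for v in value:
            yield from iter_strings(v)


def find_embedded_images(lines, min_chars=100_000):
    """Return [(entry name, embedded characters)], largest first.

    `lines` is any iterable of strings, each holding one JSON document.
    Entries with fewer than `min_chars` of inline image text are skipped.
    """
    hits = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            doc = json.loads(line)
        except json.JSONDecodeError:
            continue  # not a JSON document, ignore it
        total = sum(
            len(match.group(0))
            for text in iter_strings(doc)
            for match in DATA_URI.finditer(text)
        )
        if total >= min_chars:
            name = doc.get("name", "<unnamed>") if isinstance(doc, dict) else "<unnamed>"
            hits.append((name, total))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

To use it on a file with one document per line, open the file and pass the file object straight in, e.g. `find_embedded_images(open("actors.db", encoding="utf-8"))`, where `actors.db` is a hypothetical file name. For a single exported JSON file spread over many lines, pass the whole text as a one-element list instead.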

Hopefully that helps, and thanks for showing me another possible use case for why things can break, I hadn’t thought of something like that happening.

Thank you very much for your fast reply and for looking into this! I will remove all the images (which can take a while) and try again. I was using the copy-paste function for inserting images into the journals and character biographies since this was a much faster workflow for me, rather than first uploading the images and then linking them. Knowing the kind of issues this can cause, I will of course rethink this approach :wink:

No problem, though I’m surprised as copy pasting shouldn’t work that way. I think that there was a module that specifically did that and I think that it recently had an update to stop using this method which is far from efficient.
Good luck with the cleaning up process! :+1:

Ahh that’s good to know, I will try to find out which module this might be.
In any case, the clean-up is done and now everything works flawlessly!
Many thanks again for your help!!