Advanced Question Debug version works, release version crashes

jedidia

shoemaker without legs
Addon Developer
Joined
Mar 19, 2008
Messages
10,891
Reaction score
2,141
Points
203
Location
between the planets
Well, I thought I was finally ready to put Orbiter Galaxy up, but it turns out that the release version crashes Orbiter, while the debug version works flawlessly (thank god I checked the release version before uploading!).

My experience is pretty limited, so I have no idea where to start looking in the code. I don't even know the technical differences between debug and release versions (I know what they are good for, but have no idea what exactly differs in terms of compiling). I hope someone with a bit more experience can give me a hint where to look. The crash happens in the Orbiter code, not in mine, so I guess my release version is stomping on a pointer somewhere...

If it is of any help, here's the disassembly around where the crash occurs:

Code:
00C43EE9  inc         dword ptr [esp+10h] 
00C43EED  add         ecx,60h 
00C43EF0  sub         edx,1 
00C43EF3  jne         00C43EA2 
00C43EF5  mov         ecx,dword ptr [esp+18h] 
00C43EF9  cmp         ebx,esi 
00C43EFB  jge         00C43F34 
00C43EFD  mov         eax,dword ptr [esp+ecx*8+30h] 
00C43F01  mov         ecx,dword ptr [esp+14h] 
00C43F05  mov         edi,dword ptr [ecx+eax]           <<<<<<<Crash!
00C43F08  mov         eax,dword ptr [esp+24h] 
00C43F0C  lea         edx,[ebx+ebx*2] 
00C43F0F  mov         ecx,esi 
00C43F11  lea         edx,[eax+edx*8] 
00C43F14  sub         ecx,ebx 
00C43F16  cmp         dword ptr [edx],edi 
00C43F18  jne         00C43F28 
00C43F1A  fcom        qword ptr [edx+10h] 
00C43F1D  fnstsw      ax   
00C43F1F  test        ah,5

Oh yeah, I'm compiling with VS2008, under Win7, if that can be of any consequence.
 

Artlav

Aperiodic traveller
Addon Developer
Beta Tester
Joined
Jan 7, 2008
Messages
5,790
Reaction score
780
Points
203
Location
Earth
Website
orbides.org
Preferred Pronouns
she/her
Can you trace the crash in your code?
That is, what is the last point in your code that it reaches?
Does it happen on a specific OAPI call, or somewhere between callbacks?

Other than that, remote debugging isn't possible; disassembly is only useful on a live process, not as a piece of text.
 

orb

New member
News Reporter
Joined
Oct 30, 2009
Messages
14,020
Reaction score
4
Points
0
It doesn't seem to be in Orbiter.exe. The Orbiter.exe module is loaded between 0x00400000 and 0x006E8000. The contents of the registers and the stack would at least help somewhat.
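The range check behind that conclusion can be written down as a minimal sketch. The helper name in_module is made up for illustration, and treating the upper bound as exclusive is an assumption; the addresses are the ones quoted in this thread.

```cpp
#include <cstdint>

// Hypothetical helper: true if addr lies in the half-open range [base, end).
// Treating `end` as exclusive is an assumption, not something the thread states.
constexpr bool in_module(std::uint32_t addr, std::uint32_t base, std::uint32_t end) {
    return addr >= base && addr < end;
}

// Load range of Orbiter.exe as reported in this thread.
constexpr std::uint32_t kOrbiterBase = 0x00400000;
constexpr std::uint32_t kOrbiterEnd  = 0x006E8000;

// Address of the faulting instruction from the disassembly.
constexpr std::uint32_t kCrashAddr = 0x00C43F05;

// The faulting address is above the end of Orbiter.exe, so the crash is in another module.
static_assert(!in_module(kCrashAddr, kOrbiterBase, kOrbiterEnd),
              "crash address lies outside Orbiter.exe");
```

The same comparison against the load ranges of the other modules (as listed in the debugger's module window) would show which DLL the address belongs to.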
 

jedidia

No, it isn't in Orbiter.exe, I just realised that after following some hunches... Since the crash didn't happen in my code, I assumed it was in Orbiter, but there's another piece of foreign code at work here, which would be the texture library. I have pretty much identified it as the source of the trouble by now, but I don't really know what to make of it: the library is a release build anyway, so how can MY release-built code lead to a crash IN the library that doesn't happen if my code is a debug version? I don't get it, really.

Anyways, I noticed that config files for planets are exported up to the atmosphere block of the first planet with an atmosphere. That's where the first call to the library happens, to get the colors. I removed the call to the library and tried again. Now all cfg's were exported, but the crash happens immediately afterwards, when the textures are exported, so I'm about 99.9% sure that it's something in connection with the TexGen library.

Here's what my code for getting the atmosphere colors looks like. It's a default initialization, since only the atmosphere composition is of interest for this call. The other function (for exporting the textures) is a lot larger, but since the crash happens with either one, I guess we can take this code as a minimal example. If we fix it here, we can fix it in the other:

Code:
// Default initialization: only the atmosphere composition matters for this call.
gen.atmos_comp = GetAtmosphereForTex(i, j);
gen.radius = 1;

gen.world_type = 1;
gen.atmos_type = 1;
gen.clouds = 0;
gen.ice = 0;
gen.name = "Guyneapig";
gen.water = 0;

swtx_just_initialize(&gen);

// atmcolor is a packed DWORD; split it into its four bytes.
BYTE a = gen.atmcolor >> 24;            // top byte (not used below)
BYTE b = (gen.atmcolor >> 16) & 0xFF;
BYTE g = (gen.atmcolor >> 8) & 0xFF;
BYTE r = gen.atmcolor & 0xFF;

Planets[i].Bodies[j].AtmoColor.X = r;
Planets[i].Bodies[j].AtmoColor.Y = g;
Planets[i].Bodies[j].AtmoColor.Z = b;

swtx_clear_gen(&gen);
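For reference, the byte-splitting at the end of that snippet can be isolated into a small helper, which makes it easy to test on its own. This is only a sketch: Channels and unpack_abgr are made-up names, not part of the TexGen API, and the byte order (top byte first, then b, g, r) is simply the one the snippet above uses.

```cpp
#include <cstdint>

// Hypothetical struct for illustration; not part of the TexGen API.
struct Channels {
    std::uint8_t a, b, g, r;
};

// Splits a packed 32-bit color into four bytes, using the same shifts
// and masks as the snippet above (top byte -> a, lowest byte -> r).
inline Channels unpack_abgr(std::uint32_t packed) {
    Channels c;
    c.a = static_cast<std::uint8_t>(packed >> 24);
    c.b = static_cast<std::uint8_t>((packed >> 16) & 0xFF);
    c.g = static_cast<std::uint8_t>((packed >> 8) & 0xFF);
    c.r = static_cast<std::uint8_t>(packed & 0xFF);
    return c;
}
```

Pure arithmetic like this cannot crash on its own, which suggests looking at what goes into gen before the call rather than at what comes out of it.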
 

Artlav

If you're sure it happens in texgen library, your best bet would be to send me your release build and reproduction instructions.

-Is it related in some way to GPU generation being on or off?
-Are you sure you NULL the gen.intern before first use?
-Are you sure you don't send illegal values for atmospheric/composition/world types (i.e. are you sure you initialized all fields of the struct)?
-Are you sure you don't actually try to make a planet smaller than 1 km in radius?

Debug versions tend to have different values in uninitialized variables than release ones, and stray pointers also point somewhere else, so such bugs can easily show up in one but not in the other.
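One common way to take uninitialized fields out of the equation is to zero the whole struct before filling in the fields you care about. A minimal sketch, assuming a plain C-style struct; DemoGen is a made-up stand-in and does not match the real TexGen struct layout:

```cpp
// Hypothetical stand-in for a C-style library struct.
// The real TexGen struct has different fields and layout.
struct DemoGen {
    int world_type;
    int atmos_type;
    double radius;
    unsigned int atmcolor;
    int inited;
    void* intern;
};

// Value-initialization zeroes every member of a plain struct, so debug and
// release builds start from the same state instead of whatever was in memory.
inline DemoGen make_zeroed_gen() {
    DemoGen gen = DemoGen();
    return gen;
}
```

After that, any field that is not set explicitly is 0 or a null pointer in both build types, so a forgotten field at least fails the same way everywhere.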
 

jedidia

-Is it related in some way to GPU generation being on or off?

Nope.

-Are you sure you NULL the gen.intern before first use?

Yes. It's NULLed right after declaration.

(i.e. are you sure you initialized all fields of the struct)?

All except atmcolor and inited.

Are you sure you don't actually try to make a planet smaller than 1 km in radius?

I didn't know that was a problem... Anyways, that might lead to a bug with some very small asteroids, but none of the systems I used for the tests have any. So yes, I'm sure that's not a problem here.

If you're sure it happens in texgen library,

I'm absolutely sure it is in some way related to it, but I have no experience with release builds: If I run it from Visual Studio and there is a crash, would the debugger still show me the line where it crashed if it was in my code? If it doesn't, it might as well be crashing in my code, not in the library itself...

However, the Assembly seems to indicate that there's something going wrong with a DWORD operation, and your atmosphere colors are the only DWORD anywhere in my code.

I'll upload a version and send you the link, hopefully you can track the bugger down!
 

orb

I'm absolutely sure it is in some way related to it, but I have no experience with release builds: If I run it from Visual Studio and there is a crash, would the debugger still show me the line where it crashed if it was in my code? If it doesn't, it might as well be crashing in my code, not in the library itself...
Unless the release build has debug symbols (a .pdb), it won't show the line in the source code, but it will show the assembly and the module in which the exception happened. If that .dll module uses WinAPI to communicate with your module, you can check which function the exception occurred in, though preferably with a debugger other than Visual Studio, one that shows the names of exported functions.
 

computerex

Addon Developer
Addon Developer
Joined
Oct 16, 2007
Messages
1,282
Reaction score
17
Points
0
Location
Florida
The dword ptr is just the size directive for a 32-bit value. You can't really tell much at all by just looking at a chunk of assembly. I would check the stack frame at the time of the crash to see if it contains anything useful. If it doesn't (as is usually the case), then comment out code, and keep narrowing your search by uncommenting it piece by piece.
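To make the size directive concrete: the faulting instruction, mov edi, dword ptr [ecx+eax], reads four bytes from the address ecx + eax. A rough C++ equivalent is sketched below; load_dword is a made-up name used only for this illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Rough C++ equivalent of `mov edi, dword ptr [ecx+eax]`:
// read 4 bytes from the address base + offset.
// load_dword is a hypothetical name for illustration only.
inline std::uint32_t load_dword(const unsigned char* base, std::size_t offset) {
    std::uint32_t value = 0;
    // This is the read that faults when base + offset is not a valid address.
    std::memcpy(&value, base + offset, sizeof value);
    return value;
}
```

So the "dword" says nothing about which variable is involved; the access violation only means that the base or the offset held a bad value at that moment.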
 

jedidia

More confusion: there is a standalone exe of the program, and it hadn't occurred to me to test that yet. Now that I did, I was quite astonished to find it working as a release build. Which is REALLY weird, because it uses the same source files for its build, and it's a release build. So, technically, it should crash too.

But it doesn't, which means that the trouble isn't just between my code and the TexGen library, but also involves Orbiter. This is getting better and better... There is no doubt that the error is related to the calls I am making to the library, since it doesn't occur if I comment them out, but Orbiter obviously has a role in the drama. Ah heck!

The only thing I can get from the assembly is that the crash happens in Orbiter.exe, but I'm not sure whether that means that it really happened there, or in the dll that is after all running with it.

Edit: slow progress by checking line by line. Ended up at one of my functions that gets called in connection with the library calls, and that appears to have a memory leak. Let's hope that's what I'm looking for...

---------- Post added at 06:37 PM ---------- Previous post was at 05:24 PM ----------

Hrm... there were more bugs in that function than I have seen for quite a while. A memory leak, uninitialised pointers, you name it. So yes, it was responsible for the crash, sorry for the false alarm, Artlav!

I wonder how that never showed up in the debugger, and why it worked with the standalone. There's still the crash later on, which doesn't seem to be directly related to the earlier one as I had assumed, but I have no doubt anymore that there's something screwed up in my code and that I can find it if I look hard enough...
 

Artlav

Loose pointers essentially turn your program into a Martian Weather Detector - the fallout could happen any time, anywhere.

You might encounter things like swapping two lines of code fixing the crash, or changing the size of some useless string.
That's because it re-positions the rest of the code in memory, so the pointer dangles into something else.

Of course, debug and release versions are major re-positions relative to each other.

So, double-check your pointers :)
[Attached image: compiler_complaint.png]
 

dbeachy1

O-F Administrator
Administrator
Orbiter Contributor
Addon Developer
Donator
Beta Tester
Joined
Jan 14, 2008
Messages
9,218
Reaction score
1,566
Points
203
Location
VA
Website
alteaaerospace.com
Preferred Pronouns
he/him
One of the things you can do to help detect any pointer & buffer overflow problems is to turn on the MSVC++ debugger libraries' memory checks by adding this block of code to your ovcInit method or somewhere else early in your module's startup:

Code:
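// Requires #include <crtdbg.h> at the top of the source file.
#ifdef _DEBUG
    // NOTE: _CRTDBG_CHECK_ALWAYS_DF is too slow
    _CrtSetDbgFlag(_CRTDBG_ALLOC_MEM_DF |   // use the debug heap allocator
                   _CRTDBG_CHECK_CRT_DF |   // include CRT-internal blocks in checks
                   _CRTDBG_LEAK_CHECK_DF);  // report leaks at program exit
#endif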

Then your code will break into the debugger as soon as the debug libraries detect any memory corruption (without _CRTDBG_CHECK_ALWAYS_DF that is typically when the damaged block is freed, not at the moment of the bad write). Very useful, at least for me. :tiphat: You can set the _CRTDBG_CHECK_ALWAYS_DF flag if you're really desperate and the other memory checks don't locate the problem, but _CRTDBG_CHECK_ALWAYS_DF really slows things down, so I normally leave that flag unset.

BTW Artlav, that cartoon is epic! :rofl:
 

jedidia

hilarious XKCD strip
:rofl:

Yeah well... they've got to hate me by now. I hope they don't come one day and kill me in my sleep. Coding would sure be a dangerous hobby if that were the case :shifty:

Anyways, I got the other one fixed too. Turns out I defined an array too wide, so the loop was walking calmly into the uninitialized elements.
The weird thing is, it was in the same function the other bug came from, and it worked the first few times around, but not the next few. And I thought computers were reliable in at least failing consistently when the user commands them to fail!
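A simple guard against this kind of bug is to keep the number of filled-in elements next to the array and loop over that count instead of the array's capacity. A minimal sketch; BodyList, kMaxBodies and total_radius are made-up names for illustration and have nothing to do with the actual Orbiter Galaxy code:

```cpp
#include <cstddef>

// Capacity of the array (hypothetical value).
const std::size_t kMaxBodies = 8;

// Hypothetical container: fixed storage plus the number of valid entries.
struct BodyList {
    int radius_km[kMaxBodies];  // only the first `count` slots hold real data
    std::size_t count;          // number of slots actually filled in
};

// Sums only the initialized entries by looping to `count`,
// never into the unused tail of the array.
inline int total_radius(const BodyList& list) {
    int sum = 0;
    for (std::size_t i = 0; i < list.count && i < kMaxBodies; ++i) {
        sum += list.radius_km[i];
    }
    return sum;
}
```

Whatever sits in the slots beyond count is never read, so it no longer matters what the debug or release build happens to leave there.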

Looks like I have GO for launch now. But of course it will take Orbit Hangar 24 hours to get the package through security since it's my first, so look out for it tomorrow sometime around this time...
 

jedidia

Usually ALPHA-testing brings up a lot more Bohrbugs and Mandelbugs... :lol:
 