lib0 engine » Forums » Arch Linux is not supported due to confirmed Mesa bug.
Procedural
34 posts
1 project
#13690 Arch Linux is not supported due to confirmed Mesa bug.
3 days, 15 hours ago Edited by on Dec. 8, 2017, 10:56 a.m.

Here's a screenshot of lib0 engine running with a custom Mesa build (and now Arch Linux's stock Mesa 17.2.6) on Manjaro MATE 17.0.4:



It looks normal, but it's actually broken because when I try to switch focus from a terminal to lib0 engine window the whole desktop screen turns black.

Also, Arch Linux crashes the same binary that worked a few minutes before on Ubuntu 16.04 like this:

First it returns from glXGetProcAddress just fine:



Then it moves returned pointer with no issues too:



But watch this: if I step one assembly instruction further -- it crashes. For no reason.



It executed glXGetProcAddress several times before that point, what's the problem on this particular line? I don't know. No one knows.

This forces me to ban any other distro that can't execute code that works just fine on Ubuntu 16.04 and Mesa 17.0.7 until a person who understands the inner workings of graphics drivers will explain this random behavior. At no point I, as a software developer, should care about any of this given the same binary runs perfectly fine under another environment.

lib0 engine requirements are updated to exclude any other distro except Ubuntu 16.04 which so far at least works reliably.
mmozeiko
Mārtiņš Možeiko
1517 posts
1 project
#13693 Arch Linux is not supported due to confirmed Mesa bug.
3 days, 8 hours ago Edited by Mārtiņš Možeiko on Dec. 7, 2017, 9:03 p.m.

That's not what "confirmed bug" means. Confirmed means the reason is clear and it can be explained why particular place crashes.

I am 100% sure this is not a bug in mesa, but bug in your code. Disassembly clearly shows that.

First of all - have you tried running code without any hacks? libc replacements/_start/syscalls? Because bug can be in one of those - it messes up something that compiler does not expect (for example, stack alignment), and thus you see crash later. And it was not happening before because you were lucky, the compiler produced code that did not trigger the bug.

Crash doesn't crash "without reason". There is a reason. The instruction you highlighted can crash if %rcx register contains invalid address. Check the address in debugger and why it is wrong.

I could not compile your code on my ArchLinux:

1) I needed to add "-fno-stack-protector" to compiler arguments in build_main.sh. Otherwise it failed with missing __stack_chk_fail symbol.

2) I needed to change clang to gcc in build_lib0.sh. Clang takes forever to compile, not sure if it finishes at all. Waited like for 1 minute. gcc finishes instantly.

3) I needed to change order of scripts in build.sh - build_texteditor.sh should come before build_lib0.sh

After I could run ./main. Application launched. No crashes (on ArchLinux with mesa 17.2.6). But I don't see anything in window. Its completely empty and not refreshing.

Btw your stdlib_gettimeofday implementation is super inefficient. Real gettimeofday implementation doesn't do syscalls (which are very slow). It simply reads shared memory region which is super fast operation: https://blog.packagecloud.io/eng/...ystem-calls/#virtual-system-calls
mmozeiko
Mārtiņš Možeiko
1517 posts
1 project
#13695 Arch Linux is not supported due to confirmed Mesa bug.
3 days, 8 hours ago Edited by Mārtiņš Možeiko on Dec. 7, 2017, 8:53 p.m.

Ok, I realized I need to put all the files in /tmp/lib0_engine. Then it runs. And then I see the crash.

The bug is in your code. I can explain why. The problem is that you are linking to libGL.so. This means there will be glAttachShader and other global symbols loaded into your process. These symbols are in read-only section. That's why following line will crash:
1
  glAttachShader = (void *)glXGetProcAddressARB((unsigned char *)"glAttachShader");
Because it will try to write to read-only location (that's why I asked to check %rcx value above, you would see that it belongs to libGL.so memory map).

Why this does not happen in main.c? ./main is not relocatable that's why it will use local glAttachShader functions (why? that's a long topic, read up how dynamic linking is implemented in Linux).

The fix is to not name global symbols (functions or variables) with same name as imported symbols from shared libraries. Name them differently if you want to keep them global. This is what glew, glad or others are doing for GL functions - renaming them to not conflict with original GL symbols.

In 9999 cases of 10000 the bug is in your code, not in libraries/compiler/hardware.
Procedural
34 posts
1 project
#13697 Arch Linux is not supported due to confirmed Mesa bug.
2 days, 22 hours ago

I can explain why. The problem is that you are linking to libGL.so.



...

THANK YOU!

Dependency on libGL is removed!

Any non-deterministic garbage that alters behavior of my program will be annihilated out of my code. Although at first I simply tried to dlload libGL.so, but turns out even this way global gl* procedures get defined in read only locations somehow, because I kept getting crashes even though a binary wasn't linked to libGL.so directly anymore. So I decided to encapsulate all the procedures I use in a struct, like I did for Dear ImGui state already. This way they do not change somehow still globally located alien procedures from outer space.

Btw your stdlib_gettimeofday implementation is super inefficient. Real gettimeofday implementation doesn't do syscalls (which are very slow). It simply reads shared memory region which is super fast operation: https://blog.packagecloud.io/eng/...ystem-calls/#virtual-system-calls

Yeah, I know, stdlib is a work of 20 minutes, I just copied and pasted gettimeofday from musl, they use syscalls, I didn't bother to change it because I just wanted to get rid of garbage GNU glibc. At first I wanted to link against static musl library, but it turns out you can't mix glibc and musl, they (hey, maybe because of global stuff too) interfere with each other and X11 calls just crash. The official repsonse of musl developers on forums is "build your whole distro with musl only".

Why this does not happen in main.c? ./main is not relocatable that's why it will use local glAttachShader functions (why? that's a long topic, read up how dynamic linking is implemented in Linux).

And why this crash does not happen on Ubuntu 16.04?
mmozeiko
Mārtiņš Možeiko
1517 posts
1 project
#13698 Arch Linux is not supported due to confirmed Mesa bug.
2 days, 21 hours ago

Yeah, shared libraries work differently on Linux than Windows. Symbol table & dynamic symbols are more tricker on Linux. On Windows dll files are pretty isolated.

I guess on Ubuntu they build libraries differently. Different linker hardening flags? Import relocations (or whatever they are called) are not read-only and you are overwriting them? Maybe.
Procedural
34 posts
1 project
#13700 Arch Linux is not supported due to confirmed Mesa bug.
2 days, 16 hours ago Edited by on Dec. 8, 2017, noon

I uploaded version 3 with build and SIGSEGV fixes for Arch Linux: https://procedural.itch.io/lib0-engine/devlog/17544/version-3-is-here

But the black screen Mesa bug I talked about in the first post is still present, so Arch Linux won't be supported by me for now until I see working result on this system on my machines.
mmozeiko
Mārtiņš Možeiko
1517 posts
1 project
#13703 Arch Linux is not supported due to confirmed Mesa bug.
2 days, 9 hours ago Edited by Mārtiņš Možeiko on Dec. 8, 2017, 6:11 p.m.

I don't have any black screen on my Arch. It renders magenta triangle with all the imgui UI visible.
Tried on desktop with Nvidia GPU, and on laptop with Intel GPU - both work fine.
Something is weird in your Arch setup. This is 100% not a mesa bug.
Procedural
34 posts
1 project
#13704 Arch Linux is not supported due to confirmed Mesa bug.
2 days, 7 hours ago

This is 100% not a mesa bug.
If it helps you sleep at night! :)