Description of problem:

VirtualGL is broken on Cauldron for an AMD A10-7860 using the amdgpu or radeon driver and Mesa. The same hardware has been working flawlessly on Mageia 6. As the version of VirtualGL on Cauldron and Mageia 6 is the same, it seems more likely that the incompatibility with VirtualGL is due to some combination of Mesa and the X driver + kernel module.

Version-Release number of selected component (if applicable):

I haven't been able to get it working since I installed Cauldron in October, so all versions of the kernel, X server and Mesa from then to now have exhibited the problem.

How reproducible:

Starting with a properly configured VirtualGL server is difficult, as the server config script shipped with the Mageia VirtualGL rpm has no "knowledge" of either Mageia or LXDE/LXDM. Configuration must be completed manually; I replicated the settings and changes which work on the same vgl server on Mageia 6. Running a command such as "vglrun glxspheres64" is sufficient to show the problem. Typical runs from Mageia 6 and Mageia 7 are attached below.

Steps to Reproduce:
1. Boot to a working desktop
2. In lxterminal, run glxspheres64 to confirm OpenGL acceleration is working properly
3. In lxterminal, run vglrun glxspheres64
Created attachment 10543 [details] shell output showing error
Created attachment 10544 [details] successful run in Mageia 6
Created attachment 10545 [details] result of pbinfo on Mageia 7

Some additional information may be found at https://forums.mageia.org/en/viewtopic.php?t=12570

Since then I have explored as far as I can to see whether I could confirm that the only problem is the failure to create a pbuffer (an off-screen GLX drawable). I have run the pbinfo program on both the working Mageia 6 and the non-working Mageia 7 and only got an information overload. It looks as if the code which tries to discover a usable "visual" gets a list of candidates which starts with one or more that are not in fact usable. As the code takes the first one offered, and that one fails, we get the error and stop. In case the pbuffer/visuals information is useful, I have attached the Mageia 6 and Mageia 7 results.
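For anyone wanting to poke at this themselves, here is a minimal sketch of the kind of selection logic involved. This is my own illustration, not pbinfo/pbdemo's actual code; compile with "cc sketch.c -lX11 -lGL".

/* Minimal sketch: ask GLX for pbuffer-capable FBConfigs and naively
 * take the first one.  glXChooseFBConfig() returns its matches sorted
 * by GLX's own preference rules, so if the driver reports a config as
 * pbuffer-capable when it is not, the very first pick fails exactly
 * as described above. */
#include <stdio.h>
#include <X11/Xlib.h>
#include <GL/glx.h>

int main(void)
{
    Display *dpy = XOpenDisplay(NULL);
    if (!dpy) { fprintf(stderr, "cannot open display\n"); return 1; }

    int attribs[] = {
        GLX_DRAWABLE_TYPE, GLX_PBUFFER_BIT,  /* must support pbuffers */
        GLX_RENDER_TYPE,   GLX_RGBA_BIT,
        None
    };
    int n = 0;
    GLXFBConfig *configs =
        glXChooseFBConfig(dpy, DefaultScreen(dpy), attribs, &n);
    printf("%d matching FBConfigs\n", n);

    if (n > 0) {
        int id = 0;
        glXGetFBConfigAttrib(dpy, configs[0], GLX_FBCONFIG_ID, &id);
        printf("naively using the first one, ID 0x%x\n", id);
        /* a robust program would test each config in turn instead */
    }
    if (configs) XFree(configs);
    XCloseDisplay(dpy);
    return 0;
}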
Created attachment 10546 [details] pbinfo from Mageia 6 - the working platform
Assigning to the registered virtualgl maintainer.
CC: (none) => marja11
Assignee: bugsquad => rverschelde
Created attachment 10563 [details] pbdemo as provided for VirtualGL with extra info

The pbdemo program was built from the source provided with VirtualGL, with a short "debug" piece of code enabled. The attached result shows details of 276 possible framebuffer configurations matching the requirements of single- or double-buffered, with or without a depth buffer. In this console output, 108 failures to create a drawable were recorded, so presumably the other 168 configurations were suitable for creating the drawable. Then again, 216 of them claim to support pbuffers.

Looking at the attributes passed to ChooseFBConfig(), they each require GLX_DRAWABLE_TYPE to include GLX_PBUFFER_BIT. I don't understand all of this, but in particular I don't understand why framebuffer configurations advertising pbuffer support should be selected and then deny they can do it. The only (failed) attempt to create a single-buffered config with a depth buffer fails because it can't get an XVisualInfo from the FBConfig; but then the IDs chosen as possibly suitable start with ID 0xbf, while the program tried to use ID 131 (0xB8).

There is just one conclusion I feel it is safe to draw from this: the problem does not lie in VirtualGL. The pbdemo program uses nothing other than the system OpenGL stack (Mesa) - VirtualGL is not involved. Do we have any OpenGL gurus, or should I report this "upstream"?
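To illustrate the pattern (again my own sketch, not pbdemo's literal code): pbuffer creation failures arrive as asynchronous X errors, so a config can appear to "accept" glXCreatePbuffer() and only turn out to have failed after an XSync() round trip. A test therefore has to trap errors explicitly, something like:

/* Sketch of how a config that advertises GLX_PBUFFER_BIT can still
 * fail at creation time.  BadAlloc/GLXBadPbuffer arrive as async X
 * errors, so we trap them with a handler and force a round trip with
 * XSync() to learn whether the create actually worked.  This is my
 * reading of what pbdemo does, not its literal code. */
#include <X11/Xlib.h>
#include <GL/glx.h>

static int pbuffer_error = 0;

static int handler(Display *dpy, XErrorEvent *e)
{
    (void)dpy; (void)e;
    pbuffer_error = 1;   /* remember that the create failed */
    return 0;
}

/* returns 1 if a small pbuffer could really be created on this config */
int config_really_works(Display *dpy, GLXFBConfig cfg)
{
    int pbattribs[] = { GLX_PBUFFER_WIDTH, 32, GLX_PBUFFER_HEIGHT, 32, None };
    pbuffer_error = 0;
    XErrorHandler old = XSetErrorHandler(handler);
    GLXPbuffer pb = glXCreatePbuffer(dpy, cfg, pbattribs);
    XSync(dpy, False);            /* flush so any error reaches us now */
    XSetErrorHandler(old);
    if (pbuffer_error || !pb) return 0;
    glXDestroyPbuffer(dpy, pb);
    return 1;
}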
Oops, stupid typo in Comment 6: ID 131 (0xB8) should read ID 131 (0x83).
Created attachment 10603 [details] Console output running servertest from VirtualGL git build 2018-12-25

As my expectation that the cause of this problem lies in VirtualGL is dwindling to nothing, I thought it would do no harm to see whether the recent tweaks to VirtualGL, which may help when it is used with VirtualBox, might accidentally have some other benefit for me. The results posted here are sadly no different from earlier runs of this server test script (one of the VGL utilities) and merely show that the problem is still present.

I have looked into the possibility of building earlier versions of Mesa, going back to the version I currently use on Mageia 6. From what I can see, I would also need to build the corresponding Xorg server and amdgpu video driver. Can anyone help with information on which versions of Mesa/Xorg were used in Cauldron before I started testing in October? Is there any way to obtain historical packages now, other than fetching the sources from Xorg and building from scratch?

I would be very grateful for any small assistance, as I must get VirtualGL working with my hardware before committing to migrate multiple machines from Mageia 6.
Created attachment 10604 [details] The real record of console output from servertest

I am definitely getting old! This is what the previous attachment should have been. Apologies,
Richard
Is this bug considered dead in the water?
I pushed VirtualGL 2.6.1 to Cauldron, but as per comment 8 I assume that it won't fix the issue.

Sadly, I'm not familiar enough with VGL or its interaction with Mesa to help debug this; I only took over maintainership back when I needed it as a bridge for bumblebee. Maybe tmb can provide some input on the Mesa side, but otherwise I would advise getting in touch with upstream for support from people who know VGL. You can use their users mailing list, for example: https://groups.google.com/forum/#!forum/virtualgl-users

They might send you back here if it's a Mageia-specific issue, but since nothing changed in our packaging and our Mesa packaging is relatively "normal", the issue you're experiencing might impact other distros/VGL users too.
CC tmb for above comment.
CC: (none) => tmb
The test procedure in the initial report works here in gnome-terminal on Intel hardware, so virtualgl itself seems to be OK.

The only thing I noticed is that installing virtualgl did not install the matching lib(64)virtualgl, causing:

$ vglrun glxspheres64
ERROR: ld.so: object 'libdlfaker.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
ERROR: ld.so: object 'libvglfaker.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.

so I added that dependency in virtualgl-2.6.1-4.mga7.
Thank you, gentlemen, both. I have installed the new vgl package and I can confirm that it makes no difference. The only place left to find the problem is, I suspect, the amdgpu kernel module or the Xorg amdgpu video driver, as one of them is probably misleading Mesa.

I have hacked the pbdemo file supplied in a full VGL implementation so that it will not bail out with an error. From memory, all I did was refine the selection criteria for a usable GLXFBConfig to the point where it would reliably pick the first one that works (see the sketch after this comment). This suggests to me that Mesa will do what is expected of it when presented with valid info from the hardware. My problem now is to figure out where that information is derived from: the kernel or the Xorg module. Actually, before that I have to work out whether it is significant that running my pbdemo gives different (though successful) results when run directly versus prefixed with vglrun.
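Roughly, the hack amounts to something like this. It is a hypothetical sketch of my change, building on the error-trapping helper sketched in an earlier comment; pick_working_config is my name for it, not anything from pbdemo:

#include <X11/Xlib.h>
#include <GL/glx.h>

/* from the earlier sketch: tries an actual 32x32 pbuffer create */
int config_really_works(Display *dpy, GLXFBConfig cfg);

/* Instead of trusting the first FBConfig glXChooseFBConfig() returns,
 * walk the list and keep the first one that survives a real pbuffer
 * creation test. */
GLXFBConfig pick_working_config(Display *dpy, GLXFBConfig *configs, int n)
{
    for (int i = 0; i < n; i++)
        if (config_really_works(dpy, configs[i]))
            return configs[i];   /* first candidate that truly works */
    return NULL;                 /* no candidate survived the test */
}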
Sorry, it had slipped my mind that this bug is still open. I reported it upstream, or rather I added my fourpenceworth to an existing issue: https://groups.google.com/forum/#!msg/virtualgl-users/orJUPt0a94o/ScxuCIqhFQAJ

The real resolution appears to be something only AMD can provide, though I suspect it might be possible to make the approach through the Mesa team. For the record, the workaround is to set VGL_FORCEALPHA=1 before invoking a program via VGL. I find it easiest to add the line "export VGL_FORCEALPHA=1" to ~/.bashrc and/or ~/.bash_profile, though a system-wide solution may be better for some users.

DRC <drc@virtualgl.org> has subsequently made some changes to the way VGL handles GLXFBConfigs in response to similar-sounding issues discovered by users of other, similar hardware (see https://groups.google.com/forum/#!msg/virtualgl-users/psJrdPuxcPQ/BpvvOxoDBwAJ), and it is probably a worthwhile improvement for Mageia users to have these changes too.

Meanwhile I have uncovered another workaround which appears to be as effective as setting the VGL_FORCEALPHA environment variable. It is accessible through the driconf package in Cauldron and is easy for each user to apply and check. It has the advantage of applying to the Mesa implementation, not just to the VGL interpretation of it. It also suggests that the issue might be directly related to the bit width of the pbuffer, and not necessarily to which type of pbuffer was requested. The default request for DirectColor RGB expects a 24-bit width; the VGL workaround changes this to 32-bit RGBA. If I read it correctly, the change we can make in driconf is "Create all visuals with a depth buffer" (on the "Miscellaneous" tab), which likely also results in a 32-bit structure. NOTE though that the third option on the driconf "Miscellaneous" tab, "Allow exposure of visuals and fbconfigs with RGB10A2 formats", should NOT be set: that would also be 32 bits, but it breaks the fix again. A sketch of the resulting configuration follows below.

Disclaimer: Any irrelevant nonsense and misinterpretations in this comment are entirely my own and derive exclusively from my ignorance in these matters.
Richard
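For reference, this is what I believe the relevant fragment of ~/.drirc ends up looking like after making the change in driconf. The option names are my assumption from reading Mesa's default drirc, so verify them against what driconf actually writes on your system:

<!-- ~/.drirc sketch; option names assumed from Mesa's default drirc -->
<driconf>
  <device>
    <application name="Default">
      <!-- "Create all visuals with a depth buffer" -->
      <option name="always_have_depth_buffer" value="true" />
      <!-- keep RGB10A2 configs hidden: setting this true re-breaks the fix -->
      <option name="allow_rgb10_configs" value="false" />
    </application>
  </device>
</driconf>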