Description of problem:

The problem was observed while using Blender to execute a GPU-assisted Cycles render. The console reported:

CUDA version 9.1 detected, build may succeed but only CUDA 8.0 is officially supported.
Compiling CUDA kernel ...
"nvcc" -arch=sm_61 --cubin "/usr/share/blender/2.79/scripts/addons/cycles/source/kernel/kernels/cuda/kernel.cu" -o "/home/richard/.cache/cycles/kernels/cycles_kernel_sm61_7541DDBE6B1A613331389550DF3BCB6B.cubin" -m64 --ptxas-options="-v" --use_fast_math -DNVCC -I"/usr/share/blender/2.79/scripts/addons/cycles/source"
In file included from /usr/include/host_config.h:50,
                 from /usr/include/cuda_runtime.h:78,
                 from <command-line>:
/usr/include/crt/host_config.h:121:2: error: #error -- unsupported GNU version! gcc versions later than 6 are not supported!
 #error -- unsupported GNU version! gcc versions later than 6 are not supported!
 ^~~~~
CUDA kernel compilation failed, see console for details.

Investigation suggests that the failure to compile the CUDA kernel is due to the /usr/include/crt/host_config.h file dating from some time in 2017. It is possible that using headers and libraries from a more recent version of CUDA would fix the problem.

Version-Release number of selected component (if applicable):

How reproducible:
Every time

Steps to Reproduce:
1. Update all Cauldron packages
2. Install and run Blender
3. Configure Blender to use an Nvidia GPU (requires the nvidia-current drivers) for Cycles rendering
4. Render the default scene
Text below copied from Comment 8 in the history of bug 24379:

It may solve the problem if our CUDA packages were updated to CUDA 10.0.130, but that might prevent users with nvidia hardware requiring pre-410.48 nvidia drivers from getting CUDA-accelerated rendering in Blender.

Perhaps we should consider driver-versioned rpms for the CUDA toolkit stuff too...

Meanwhile I can show that copying the Blender.org-supplied cycles addon from blender-2.79-667033e89e7f-linux-glibc224-x86_64/2.79/scripts/addons/cycles to ~/.config/blender/2.79/scripts/addons/cycles does indeed allow the Cauldron rpm version to start and render using my nvidia gpu.
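The workaround described above can be sketched in a couple of shell commands. This is only an illustration: the tarball directory name is the one quoted above, and the guard simply prints a hint when the blender.org build has not been unpacked in the current directory.

```shell
# Copy the blender.org cycles addon (which includes prebuilt CUDA
# cubins) into the per-user addons path, where the Cauldron rpm build
# of Blender will find it before trying to compile its own kernel.
SRC=blender-2.79-667033e89e7f-linux-glibc224-x86_64/2.79/scripts/addons/cycles
DST=~/.config/blender/2.79/scripts/addons
if [ -d "$SRC" ]; then
  mkdir -p "$DST"
  cp -a "$SRC" "$DST"/
else
  echo "unpack the blender.org 2.79 tarball first: $SRC not found"
fi
```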
(In reply to Richard Walker from comment #1)
> Text below copied from Comment 8 in the history of bug 24379

Thanks for that. I had forgotten about that bug, so at first I didn't understand why you didn't file this bug against blender ;-)

> It may solve the problem if our CUDA packages were updated to CUDA 10.0.130
> but that might prevent users, with nvidia hardware requiring pre-410.48
> nvidia drivers, from getting CUDA accelerated rendering in Blender.
>
> Perhaps we should consider driver-versioned rpms for the CUDA toolkit stuff
> too....
>
> Meanwhile I can show that copying the Blender.org-supplied cycles addon from
> blender-2.79-667033e89e7f-linux-glibc224-x86_64/2.79/scripts/addons/cycles
> to ~/.config/blender/2.79/scripts/addons/cycles does indeed allow the
> Cauldron rpm version to start and render using my nvidia gpu.

Assigning to all packagers collectively, since there is no registered maintainer for this package. Also CC'ing daviddavid and some committers.
See Also: (none) => https://bugs.mageia.org/show_bug.cgi?id=24379
Source RPM: nvidia-cuda-toolkit-9.1.85-2.mga7.nonfree.src.rpm => nvidia-cuda-toolkit-9.1.85-2.mga7.nonfree
Assignee: bugsquad => pkg-bugs
CC: (none) => geiger.david68210, ghibomgx, marja11, mitya, tmb
I had packaged cuda 10.0.130 some weeks ago, but in the end the cuda compiler didn't work at all; I don't know why (the CPU stays at 100% forever, even when compiling a basic cuda program).

An alternative attempt could be to try a release between 10.0.130 and 9.1.85, namely 9.2.148 plus its PatchLevel1, available here:

https://developer.nvidia.com/cuda-92-download-archive?target_os=Linux&target_arch=x86_64&target_distro=Fedora&target_version=27&target_type=runfilelocal

It might also make sense to have versioned cuda toolkits (i.e. cuda9, cuda10, etc.), i.e. the whole tree installed in /opt/cuda-<version>/ (which CUDADIR usually points to), plus a second package with alternatives softlinks.
Giuseppe, I would be happy to try your 10.0.130 build if it is still available. My hardware setup is likely quite different from yours: an AMD A10-7860 APU drives all screens with the amdgpu and radeonsi drm, while the Nvidia 1050 Ti is headless and provides only CUDA for Blender. Would that help?
Created attachment 10794 [details] nvidia cuda 10 spec file
Created attachment 10795 [details] patch for nvidia cuda 10 spec file
Created attachment 10796 [details] nvidia cuda 9.2 spec file
I attached the spec files for cuda 10 and 9.2.148; the spec file for release 9.2.148 still needs to be completed to merge Patch1 from upstream, which I missed.

The resulting src.rpm is nearly 1.9GB for 10.0 and 1.6GB for 9.2.148, and I'm a little short of upload bandwidth to put the .src.rpm somewhere. But the package can easily be built from the spec files above, downloading the nvidia binaries from:

https://developer.nvidia.com/compute/cuda/10.0/Prod/local_installers/cuda_10.0.130_410.48_linux
OK, downloading from nvidia now. I'll see if I can make sense of the spec files while I am waiting:~) Thanks
Sorry Giuseppe, I hit a snag:

/usr/bin/install: cannot stat '/home/richard/rpmbuild/SOURCES/nvvp.desktop': No such file or directory
error: Bad exit status from /home/richard/rpmbuild/tmp/rpm-tmp.0sHbbZ (%install)

These are mentioned in the spec file:

Source2: nvidia
Source10: nvvp.desktop
Source11: nsight.desktop

I don't think I have any of those. What are they? Sorry if there is an obvious answer, but I am not too literate in rpmbuild skills.
Those files are unchanged from the previous release; you can pick them up from the previous src.rpm package or here:

http://svnweb.mageia.org/packages/cauldron/nvidia-cuda-toolkit/current/SOURCES/
OK, rebuilding now. Could take a while, but I think I still see a problem in /usr/include/crt/host_config.h at line 127:

#if __GNUC__ > 7
#error -- unsupported GNU version! gcc versions later than 7 are not supported!
#endif /* __GNUC__ > 7 */

What do you think? Is it worth hand-hacking that to pass our gcc and see what happens?
There's a relevant discussion of the gcc 7 vs. gcc 8 problem at:

https://stackoverflow.com/questions/53344283/gcc-versions-later-than-7-are-not-supported-by-cuda-10-qt-error-in-arch-linux/53828864#53828864

It looks like the safest thing to do is use gcc 7! We have gcc 5.5 in MGA6, so I am guessing that it was the Cauldron change from gcc 7 to 8 which provoked the CUDA kernel build failure in Blender. Did we have gcc 7 in Cauldron a few weeks ago? Can we get it back?

They say it is possible to install multiple versions of gcc and use the 'update-alternatives' mechanism to establish a default compiler. The trick then would be to arrange somehow that CUDA always gets gcc 7 when working with nvcc.

I don't see this problem going away unless Nvidia does something about it. We cannot distribute a fully functioning Blender if the user cannot build the required CUDA kernel. Blender.org distributes a number of pre-built CUDA kernels with its binary downloads, so we could always advise Blender users to get one of those to have a working CUDA Cycles renderer, but seriously, that would be an awful solution. On balance, bringing back gcc 7 seems the least bad answer.
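If a secondary gcc 7 were packaged, nvcc would not even need update-alternatives: its -ccbin (alias --compiler-bindir) option selects the host compiler explicitly, per invocation. A minimal sketch; the /usr/bin/gcc-7 path is an assumption (a versioned gcc package could install it elsewhere), and the command is printed rather than executed since nvcc may not be installed:

```shell
# Prefer a CUDA-supported host compiler when one is present; -ccbin is
# nvcc's option for overriding the default host compiler.
HOST_CC=/usr/bin/gcc-7   # hypothetical path to a side-installed gcc 7
if [ -x "$HOST_CC" ]; then
  CMD="nvcc -ccbin $HOST_CC -arch=sm_61 --cubin kernel.cu -o kernel.cubin"
else
  CMD="nvcc -arch=sm_61 --cubin kernel.cu -o kernel.cubin"
fi
echo "$CMD"   # show the command that would be run
```

Cycles would still have to be taught to pass that flag when it invokes nvcc at run time, so this only helps builds we control.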
I have installed my build of your CUDA 10 spec file.

I tested it first by rendering a model for which I had a timing from last night. The model rendered successfully within a second of the time taken using CUDA 9.1. The program was yesterday's Blender 2.80 beta1.

My second test was a simple render of the default cube in the current Mageia Cauldron Blender 2.79 (git, so not 2.79b), which I have "fixed" by copying the Cycles addon (including the pre-built CUDA kernels) to my user addons directory from a Blender.org 2.79 nightly build. This rendered correctly.

Next I removed my Cycles addon to create the situation a normal Mageia Blender user would encounter and re-tested the default cube render with the default Cauldron Blender. I got the previously noted "CUDA kernel compilation failed" error.

Finally I changed the gcc version test in /usr/include/crt/host_config.h so that our gcc 8 would pass and repeated the previous test. This time the render failed with these errors:

CUDA version 10.0 detected, build may succeed but only CUDA 8.0 is officially supported.
Compiling CUDA kernel ...
"nvcc" -arch=sm_61 --cubin "/usr/share/blender/2.79/scripts/addons/cycles/source/kernel/kernels/cuda/kernel.cu" -o "/home/richard/.cache/cycles/kernels/cycles_kernel_sm61_7541DDBE6B1A613331389550DF3BCB6B.cubin" -m64 --ptxas-options="-v" --use_fast_math -DNVCC -I"/usr/share/blender/2.79/scripts/addons/cycles/source"
/usr/include/c++/8.3.0/type_traits(1049): error: type name is not allowed
/usr/include/c++/8.3.0/type_traits(1049): error: type name is not allowed
/usr/include/c++/8.3.0/type_traits(1049): error: identifier "__is_assignable" is undefined
3 errors detected in the compilation of "/tmp/tmpxft_00005922_00000000-6_kernel.cpp1.ii".
CUDA kernel compilation failed, see console for details.

So it looks like gcc 8.3.0 really will not work. It must be 7.x.x.
However, the Blender.org CUDA kernels get you over the first hump and the limited rendering I have done with the CUDA 10 toolkit is successful.
As for gcc, the current compiler is gcc 8.3.0, and I think it will be the final system compiler version in cauldron/mga7. Consider that gcc 8.x was introduced in cauldron a long time ago (24 Jul 2018). As you found, faking the gcc version check didn't work.

However, I noticed that cuda 10.1.105 has just come out. The format of its internal tree has changed a bit, so the nvidia-cuda-toolkit.spec will need to be reworked, but its host_config.h contains:

#if __GNUC__ > 8
#error -- unsupported GNU version! gcc versions later than 8 are not supported!
#endif /* __GNUC__ > 8 */

so gcc 8.x should at least be officially supported in that cuda version.
Created attachment 10801 [details] nvidia-cuda 10.1 spec file
Here is the spec file for version 10.1. The runfile can be downloaded from:

https://developer.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda_10.1.105_418.39_linux.run

The spec file still needs some work; e.g. the -nsight subpackage provides a set of bundled libQt5*.so* libraries that interferes with the Provides of the system ones, so they must be placed in a requires_exclude list. As for further enhancements: man pages could be exported to %{_mandir}, and cublas and the other cuda libraries could be libified, so as to keep one major version (e.g. 9, 10, etc.).
(In reply to Giuseppe Ghibò from comment #17) It sounds like you have a plan. I am envious of your skill with this monster. I will be checking only the cuda toolkit and devel packages for now as my focus is really just on getting Blender to work a bit faster, and maybe later exploring OpenCL assistance with other programs. I'll put on another pot of coffee and get cracking on this rpm build. Thanks again...
The build completed OK and I have installed the 10.1.105 CUDA toolkit and devel rpms.

The first test was to run the Mageia Cauldron Blender and render the default cube. The CUDA kernel build process completed without error in a couple of minutes and the cube rendered correctly.

All other renders I have tried have frozen before completion, usually with all but 3 tiles rendered, regardless of the total number of render tiles. I suspect the new CUDA kernel build, but I have no evidence yet. This could take a few hours, I'm afraid.

Richard
Created attachment 10804 [details] failed render from cauldron Blender
I think I am more confused than ever, but I have a couple of tentative conclusions based on a number of test renders. I have tried to render a number of models in a variety of Blender versions. In particular I have used a version of Blender equivalent (from the same day) to the version currently in Cauldron. I also removed the pre-compiled CUDA kernels from yesterday's nightly build of Blender 2.80 beta1 and tested that too.

The results with cuda_10.1.105_418.39_linux:

Using Cauldron's Blender 2.79 git build from 22 Feb (I think), the kernel build completed in about 2 minutes. The model render failed to complete: only 9 of the 12 render tiles were finished and the render engine froze. See pic 2019-02-27 23-32-40-blender2.79git-mga7.png above.

Using the Blender.org equivalent build from Feb 22, I deleted the pre-compiled CUDA kernels in my blender-2.79-3b86c99260bc-linux-glibc224-x86_64/2.79/scripts/addons/cycles/lib directory, ran Blender, loaded the model and started a render. The missing kernels provoked a new kernel build, which completed in the usual 120 seconds or so, but the render took as long as a CPU-only render: about six and a half minutes. The new kernel had been put in the "wrong" place for a Blender.org build, so I transferred the kernel and filter files to blender-2.79-3b86c99260bc-linux-glibc224-x86_64/2.79/scripts/addons/cycles/lib and hit F12 (render) again. This render completed normally in about the right time for a GPU Compute render: a little under four minutes. See pic 2019-02-28 00-37-07-blender.org2.79-3b86c99260bc.png below.
Created attachment 10805 [details] successful render with cuda_10.1.105_418.39_linux
Doing the same delete/rebuild/install dodge for yesterday's nightly Blender 2.80 build also succeeded in using the CUDA 10.1 kernel. See pic below.

With the new CUDA kernel appearing to work in Blender.org builds hacked to use freshly compiled CUDA kernels, while the Cauldron rpm of Blender crashes with the new kernel, I have no idea what is going on. At first I thought the Blender CUDA code really might be sensitive to the CUDA version (8 versus 10.1), but getting it to work in every 2.79 and 2.80 Blender.org build I have tried certainly puts that theory in some doubt. I will need to do some more work on the Cauldron version of Blender to determine whether there may yet be a problem with how we prepare its rpm for release. In the meantime I would tentatively vote in favour of this experiment with CUDA 10.1.
Created attachment 10806 [details] successful render with cuda_10.1.105_418.39_linux and Blender 2.80 b1
Does cuda-z compile for you? (Package sources can be taken from http://svnweb.mageia.org/packages/cauldron/cuda-z/current/.)

Furthermore, IIRC, blender can (re)compile fresh cuda kernel cubins offline; adding this flag to the cmake configuration stage:

-DWITH_CYCLES_CUDA_BINARIES:BOOL=ON

should do the job, providing cubins for all the cuda architectures from 3.0 to 7.5. Alternatively, with:

-DCYCLES_CUDA_BINARIES_ARCH:STRING=sm_61

you can specify a single architecture (in this case sm_61 for the GTX 1050 Ti).
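Put together, the configuration step might look like the following sketch. It assumes a blender 2.79 source tree in ../blender, an out-of-tree build directory, and a working nvcc on $PATH; the two -D flags are exactly the ones named above.

```shell
# Configure blender to pre-build the Cycles CUDA cubins offline,
# restricted to a single GPU architecture for a faster build.
mkdir -p build && cd build
cmake ../blender \
  -DWITH_CYCLES_CUDA_BINARIES:BOOL=ON \
  -DCYCLES_CUDA_BINARIES_ARCH:STRING=sm_61   # GTX 1050 Ti only
make
```

Dropping the CYCLES_CUDA_BINARIES_ARCH line builds cubins for every supported architecture, which takes much longer.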
(In reply to Giuseppe Ghibò from comment #25)

There is no sign of the actual cuda-z source, cuda-z-0.11.273.tar.xz. I tried looking for the cuda-z source rpm, but unless I am doing something really stupid, I can't find that either. I tried following the comment in the spec file and I do now have a copy of the source from subversion, but it is revision 291, not 273. Would that do?

The spec file also mentions a few other patch files which don't appear to be in http://svnweb.mageia.org/packages/cauldron/cuda-z/current/SOURCES/

Again, I am sure there is a very simple answer, but my knowledge doesn't stretch that far :~(
OK, I found the source archive in cuda-z-0.11.273-1.mga6.nonfree.src.rpm. I still need to find:

Patch1: cuda-z-0.11.273-fix-host-defines-include.patch
Patch2: cuda-z-0.11.273-path-and-verbose.patch
Patch3: cuda-z-0.11.273-add-extra-arch.patch
Retry here: http://svnweb.mageia.org/packages/cauldron/cuda-z/current/SOURCES/
Got it. Building now...
It is running the command:

#$ cicc --c++14 --gnu_version=80300 --allow_managed -arch compute_70 -m64 -ftz=0 -prec_div=1 -prec_sqrt=1 -fmad=1 --include_file_name "tmpxft_0000198b_00000000-2_cudainfo.fatbin.c" -tused -nvvmir-library "/bin/../lib64/nvvm/libdevice/libdevice.10.bc" --gen_module_id_file --module_id_file_name "/tmp/tmpxft_0000198b_00000000-3_cudainfo.module_id" --orig_src_file_name "src/cudainfo.cu" --gen_c_file_name "/tmp/tmpxft_0000198b_00000000-5_cudainfo.compute_70.cudafe1.c" --stub_file_name "/tmp/tmpxft_0000198b_00000000-5_cudainfo.compute_70.cudafe1.stub.c" --gen_device_file_name "/tmp/tmpxft_0000198b_00000000-5_cudainfo.compute_70.cudafe1.gpu" "/tmp/tmpxft_0000198b_00000000-16_cudainfo.compute_70.cpp1.ii" -o "/tmp/tmpxft_0000198b_00000000-5_cudainfo.compute_70.ptx"

...and using 100% of one core and 6G (and rising) of my RAM. Run time is 8 minutes so far for cicc.
OK, that's 44 minutes and it has filled RAM (14G) and moved into swap. The build has definitely failed. I'll try that blender rpm rebuild now with the option to build all CUDA kernels.
The new blender rpm is in place, but I think I have more work to do on it. It may well have built the CUDA kernels but there is still no sign of them. My guess is that I will have to do a bit more work on the spec to include these new CUDA kernels in the output, but first I have to find out where Blender expects them to be, and what they are called. It is late now, Giuseppe, I'll get back to this on Friday evening. Richard
The cuda kernels are installed by the blender make install script and placed into:

/usr/share/blender/<blender_release>/scripts/addons/cycles/lib/kernel_sm_<cuda_capability>.cubin
/usr/share/blender/<blender_release>/scripts/addons/cycles/lib/filter_sm_<cuda_capability>.cubin

I'll attach a modified version of blender's .spec file allowing the cuda cubins to be built offline; just build with bm (build manager) or rpmbuild from the spec file, adding the parameter "--with cuda_cubins", or set the parameter:

%define build_cuda_cubins 1

in the .spec file.
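A quick way to check whether the cubins landed in the paths given above is a small shell loop. The blender release (2.79) and the short list of architectures here are illustrative; extend them as needed.

```shell
# Report which Cycles CUDA kernels are present in the installed
# blender addons tree (path pattern as described above).
LIB=/usr/share/blender/2.79/scripts/addons/cycles/lib
for arch in 30 35 52 61 75; do
  for k in kernel filter; do
    f="$LIB/${k}_sm_${arch}.cubin"
    if [ -e "$f" ]; then echo "found   $f"; else echo "missing $f"; fi
  done
done
```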
Created attachment 10811 [details] blender spec file with conditional flag for building offline cuda kernels
Created attachment 10812 [details] blender spec file with conditional flag for building offline cuda kernels
Attachment 10811 is obsolete: 0 => 1
Created attachment 10813 [details]
nvidia cuda 10.1 spec file

I'm adding a more polished version of the cuda 10.1 spec file, where I excluded the libQt libraries to avoid dependency problems. I think it will be the version used to update the current nvidia-cuda-toolkit package to release 10.1.

I think the cuda code compilation works even if you don't have an nvidia card installed (just install the package and the nvidia drivers). To test nvcc compilation, try this method: install the nvidia-cuda-toolkit-samples package, then copy the CUDA examples to any writable directory in your $HOME, for instance using:

cp -pr /usr/share/nvidia-cuda-toolkit/samples .

then go into samples and compile everything with make:

cd ./samples
make

Here it worked flawlessly.
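Before building the whole samples tree, a quick sanity check that nvcc is actually installed and on $PATH can save a long wait; this sketch only reports, it never fails:

```shell
# Confirm the CUDA compiler driver is reachable; command -v avoids a
# hard error when the toolkit is not installed.
if command -v nvcc >/dev/null 2>&1; then
  nvcc --version
else
  echo "nvcc not found: install nvidia-cuda-toolkit-devel first"
fi
```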
Attachment 10801 is obsolete: 0 => 1
Created attachment 10814 [details] blender spec file with conditional flag for building offline cuda kernels
Attachment 10812 is obsolete: 0 => 1
As for cuda-z, I wonder whether it's a problem of lack of memory (can anyone with 32GB or more of RAM test it?), some memory leak, or some other kind of bug in cuda or the cuda-z sources, as with previous cuda toolkit versions it completed without needing much RAM.
Preparing to:

1. rebuild nvidia-cuda-toolkit with the revised spec file
2. test by compiling nvidia-cuda-toolkit-samples
3. rebuild the blender-2.79b-14.git20190219.1.mga7.src rpm
4. test CUDA kernel building and Cycles render on GPU
5. rebuild the cuda-z rpm against the new cuda toolkit

This will take a while... and a lot of coffee...

The cuda-z thing was strange. The makefile produced a reasonable amount of screen output until it got to that first invocation of nvcc at about line 124 (I'll check that later). Then it just continued to thrash one core and bleed memory. I have 16G RAM with 16G swap and run a fairly lightweight LXDE, so I started with 13G free (and 2G shared with video).

I feel very much an observer using rpmbuild to do all the work. I do have the svn copy, which is revision 291, so I can try building that the old-fashioned way and see if I can get more information about what may be going wrong.
Steps 1 & 2 complete without incident - yay! moving on as planned
I forgot to mention that every dir inside "samples" contains a number of cuda executables that can be launched and should produce some output (these require all the other nvidia stuff to be correctly initialized and working).
While waiting for the blender re-build I have tried a few random sample programs. So far they have all worked or failed for reasons not related to CUDA (can't find libGL? It didn't try very hard!).
Created attachment 10815 [details]
Console output from blender build (422 lines)

Giuseppe,

The blender rebuild, using your modified .spec file, has completed but without building the kernel_sm_ files. I have a file containing the build console output. It is 9MBytes, so I searched it for the "kernel_sm_" string without success. I have attached the first 400 lines or so of this console record. It covers all the rpm stuff up to launching the make, and I think it shows how the build was configured. Let me know if you need to see the whole 31000 lines :~)
This could be the problem, at line 304 of attachment 10815 [details]:

CUDA_TOOLKIT_ROOT_DIR not found or specified
-- Could NOT find CUDA (missing: CUDA_TOOLKIT_ROOT_DIR) (found version "10.1")
-- CUDA compiler not found, disabling WITH_CYCLES_CUDA_BINARIES

Looks like I need a few more directives or environment variables or constants to fix this.
I am trying again with this CUDA section in your .spec file:

%if %{build_cuda_cubins}
        -DWITH_CYCLES_CUDA_BINARIES:BOOL=ON \
        -DCYCLES_CUDA_BINARIES_ARCH:STRING="sm_30;sm_32;sm_35;sm_37;sm_50;sm_52;sm_53;sm_60;sm_61;sm_62;sm_70;sm_72;sm_75" \
        -DCUDA_TOOLKIT_ROOT_DIR:STRING=%{_bindir} \
%endif
Pretty weird; it sounds like you haven't installed nvidia-cuda-toolkit and nvidia-cuda-toolkit-devel, or are mixing in a cuda toolkit from different sources. I got here:

-- Found CUDA: /usr (found version "10.1")
-- CUDA nvcc = /usr/bin/nvcc

which produced:

/usr/share/blender/2.79/scripts/addons/cycles/lib/filter_sm_30.cubin
/usr/share/blender/2.79/scripts/addons/cycles/lib/filter_sm_32.cubin
/usr/share/blender/2.79/scripts/addons/cycles/lib/filter_sm_35.cubin
/usr/share/blender/2.79/scripts/addons/cycles/lib/filter_sm_37.cubin
/usr/share/blender/2.79/scripts/addons/cycles/lib/filter_sm_50.cubin
/usr/share/blender/2.79/scripts/addons/cycles/lib/filter_sm_52.cubin
/usr/share/blender/2.79/scripts/addons/cycles/lib/filter_sm_53.cubin
/usr/share/blender/2.79/scripts/addons/cycles/lib/filter_sm_60.cubin
/usr/share/blender/2.79/scripts/addons/cycles/lib/filter_sm_61.cubin
/usr/share/blender/2.79/scripts/addons/cycles/lib/filter_sm_62.cubin
/usr/share/blender/2.79/scripts/addons/cycles/lib/filter_sm_70.cubin
/usr/share/blender/2.79/scripts/addons/cycles/lib/filter_sm_72.cubin
/usr/share/blender/2.79/scripts/addons/cycles/lib/filter_sm_75.cubin
/usr/share/blender/2.79/scripts/addons/cycles/lib/kernel_sm_30.cubin
/usr/share/blender/2.79/scripts/addons/cycles/lib/kernel_sm_32.cubin
/usr/share/blender/2.79/scripts/addons/cycles/lib/kernel_sm_35.cubin
/usr/share/blender/2.79/scripts/addons/cycles/lib/kernel_sm_37.cubin
/usr/share/blender/2.79/scripts/addons/cycles/lib/kernel_sm_50.cubin
/usr/share/blender/2.79/scripts/addons/cycles/lib/kernel_sm_52.cubin
/usr/share/blender/2.79/scripts/addons/cycles/lib/kernel_sm_53.cubin
/usr/share/blender/2.79/scripts/addons/cycles/lib/kernel_sm_60.cubin
/usr/share/blender/2.79/scripts/addons/cycles/lib/kernel_sm_61.cubin
/usr/share/blender/2.79/scripts/addons/cycles/lib/kernel_sm_62.cubin
/usr/share/blender/2.79/scripts/addons/cycles/lib/kernel_sm_70.cubin
/usr/share/blender/2.79/scripts/addons/cycles/lib/kernel_sm_72.cubin
/usr/share/blender/2.79/scripts/addons/cycles/lib/kernel_sm_75.cubin
Right then, I'll uninstall the three cuda toolkit rpms and then go hunting for remnants. Then I will re-boot, re-install nvidia-cuda-toolkit, -devel and -samples. Then I will re-build blender. But first I will wait to see what I get from my modified blender.spec file. If there really is a problem with my toolkit installation it may still fail to find what is so plainly present. It mostly worked for nvidia-toolkit-samples building, though there too it reported missing libGL and missing libGLU (? I think) files.
Created attachment 10817 [details]
Console output from home directory build of CUDA sample programs

As we have built nvidia-cuda-toolkit-samples and we are using it, in part, to validate the cuda toolkit installation, I have attached the console output from that build. Some libraries said to be missing are, in fact, present (e.g. libGL and lib64mesagl1-devel, libGLU and lib64mesaglu1-devel). Some of them (vulkan.h for instance) appear not to be available in MGA7 rpms.

In general, these programs failed due to libraries not being found:

- libX11.so - Vulkan SDK - libvulkan.so - vulkan.h
  samples/2_Graphics/simpleVulkan

- libEGL.so
  samples/3_Imaging/EGLStreams_CUDA_Interop
  samples/3_Imaging/EGLSync_CUDAEvent_Interop
  samples/3_Imaging/EGLStream_CUDA_CrossGPU

- libGL.so - libGLU.so
  samples/2_Graphics/volumeRender
  samples/2_Graphics/volumeFiltering
  samples/2_Graphics/Mandelbrot
  samples/2_Graphics/bindlessTexture
  samples/2_Graphics/marchingCubes
  samples/2_Graphics/simpleGL
  samples/2_Graphics/simpleTexture3D
  samples/3_Imaging/imageDenoising
  samples/3_Imaging/bilateralFilter
  samples/3_Imaging/recursiveGaussian
  samples/3_Imaging/bicubicTexture
  samples/3_Imaging/simpleCUDA2GL
  samples/3_Imaging/boxFilter
  samples/3_Imaging/SobelFilter
  samples/3_Imaging/postProcessGL
  samples/5_Simulations/oceanFFT
  samples/5_Simulations/smokeParticles
  samples/5_Simulations/nbody
  samples/5_Simulations/particles
  samples/5_Simulations/fluidsGL
  samples/6_Advanced/FunctionPointers
  samples/7_CUDALibraries/randomFog

- libEGL.so - libGLES.so - libX11.so
  samples/2_Graphics/simpleGLES_screen
  samples/2_Graphics/simpleGLES
  samples/2_Graphics/simpleGLES_EGLOutput
  samples/5_Simulations/nbody_screen
  samples/5_Simulations/fluidsGLES
  samples/5_Simulations/nbody_opengles
vulkan.h should be in the vulkan-headers-1.1.92 package (btw, release 1.1.101 is out upstream). libvulkan.so is in lib64vulkan-loader-devel.
For the others that were missed, it seems the build looks in /usr/lib64/nvidia, while the nvidia stuff is installed in /usr/lib64/nvidia-current. Using:

make GLPATH=/usr/lib64/nvidia-current

should compile (some of) the missed samples.
Created attachment 10818 [details]
Console output from CUDA toolkit sample build of Vulkan example

(In reply to Richard Walker from comment #47)

I have finished my re-build of blender using my modified .spec file. Checking through the console output I seem to have at least 13 different kernel_sm_ files, so that looks good. I have checked the installed files and all the /usr/share/blender/2.79/scripts/addons/cycles/lib files are present. I am waiting for it to reboot now as the linux kernel has just been updated... Looking good so far. I will check the operation of GPU Cycles rendering next, but it is getting late. I may postpone the rebuild of cuda-z until tomorrow.

As for the vulkan stuff, I have installed:

lib64mesavulkan-drivers
lib64mesavulkan-devel
lib64vulkan-loader-devel
vulkan-headers

and re-run the makefile in samples/2_Graphics/simpleVulkan. I attach the result.
(In reply to Giuseppe Ghibò from comment #50)

That's OK then; it won't build with mesa files, only nvidia. I do have /usr/lib(64)/nvidia symlinked to nvidia-current, but of course I had to delete all of the OpenGL stuff from that location to prevent other programs from finding it and trying to use those libraries. You might (or might not) be surprised that there are lots of programs which refuse to link to /usr/lib64/libGL.so and will ferret out the nvidia version wherever you have it. I find deleting those files to be the easiest way to get headless nvidia GPU assistance while continuing to display via amdgpu and radeonsi.
(In reply to Giuseppe Ghibò from comment #49) I have checked and all the relevant headers and libraries appear to be present and correct. Maybe I need to do a completely fresh rebuild of the samples sources. It may be using previously detected configuration info, before I installed GLFW3 and the vulkan devel stuff. It may have to wait until Saturday afternoon. I want to check out Blender operation now, and I am trying to debug a boot failure on one of my brother's machines and that's a hundred miles away. Isn't the internet a wonderful thing! Goodnight for now Richard
I think getting this working with vulkan is more complicated than that, and requires tweaking the file simpleVulkan/findvulkan.mk specifically for mageia. There is also another vulkan SDK here: https://vulkan.lunarg.com/sdk/home#linux. Using the attached findvulkan.mk and installing the packages vulkan-devel, vulkan-headers, glslang and glfw-devel should at least get the example compiling. Another trick to attempt could be to softlink /usr/lib64/nvidia to /usr/lib64/nvidia-current, but that probably won't work either.
Created attachment 10819 [details] modified findvulkan.mk for mageia support
I promise I will try that tomorrow - honest :~)

Right now every render in Blender still finishes all but three of the render tiles, whatever the total number the model image requires, and then freezes. I am going to try to build CUDA 8, eventually, just to rule out the possibility that this is a Blender problem and the Blender CUDA 8 code really doesn't like being compiled with CUDA 10.x. Otherwise the CUDA 10 toolkit appears to work very well, even with the complications I introduce by not having any nvidia OpenGL stuff and not having the right flavour of vulkan (yet).

In other news, the cuda-z build has failed again, just like yesterday. So when I have got Blender working, found a usable vulkan, sorted out the nvidia GL problem and fixed my brother's boot issue (did I leave anything out?), I will see if a newer cuda-z helps.

Goodnight, really, I am on my way to bed...
(In reply to Giuseppe Ghibò from comment #54)

You are so right. Getting the Vulkan example to work is way too complicated for me and beyond the scope of the current problem. First, get a working CUDA toolkit with as many checks on its goodness as Mageia Cauldron resources provide; then build Blender (as per the modified .spec file) to verify correct operation of the CUDA kernels in a GPU-assisted Cycles render; and finally verify that cuda-z can be built correctly.

There are a few other anomalies it would be nice to fix along the way: headless operation of CUDA on a system using Mesa, a working Vulkan toolset, and getting Nvidia OpenCL running while we wait for OpenCL support for AMD Sea Islands and later.

The current situation is that we have a partially validated CUDA 10.1 toolkit; locally built CUDA kernels in the modified Cauldron Blender which fail during render; locally built CUDA kernels which appear to successfully replace the pre-built ones distributed with Blender.org nightly binary builds; and a cuda-z build which still hangs on the first invocation of nvcc.

I'll be back on this in a couple of hours.
Created attachment 10822 [details]
nvidia cuda 10.1 spec file

Update of the current cuda 10.1 spec file.
Attachment 10813 is obsolete: 0 => 1
Created attachment 10823 [details] patch1 for nvidia cuda 10.1 spec file
Created attachment 10824 [details] patch2 for nvidia 10.1 cuda spec file
Created attachment 10825 [details] patch3 for nvidia cuda 10.1 spec file
Created attachment 10826 [details] patch4 for nvidia cuda 10.1 spec file
Created attachment 10827 [details] nvidia cuda 10.1 samples binaries spec file
For cuda-z: later. Passing a debugging flag to nvcc (i.e. -G) lets the compilation pass, but the executable will probably be a lot slower, which means there is still something wrong, either in the code itself or in the optimizer. What is still not clear is whether there is a leak or the PTXAS optimizer is simply memory-hungry.

For cuda toolkit 10.1, the evidence shows that this is the only release that could be shipped with mageia7/cauldron, as any older version, including 10.0, won't work with mageia7's gcc compiler. I updated the cuda 10.1 spec file and also provided a spec file for compiling the cuda 10.1 samples; this should compile all the cuda samples, including vulkan, and merge them all in a bin dir. The compilation should not require an nvidia card, but of course running those binaries to see whether they are working does.
BTW, blender 2.79 compilation shows this warning:

CMake Warning at intern/cycles/kernel/CMakeLists.txt:349 (message):
  CUDA version 10.1 detected, build may succeed but only CUDA 9.0, 9.1 and 10.0 are officially supported

so probably 10.0 works but they haven't yet checked the code.
s/10.0 works/10.1 works/
Giuseppe, I have built and installed:
nvidia-cuda-toolkit
nvidia-cuda-toolkit-devel
nvidia-cuda-toolkit-samples
nvidia-cuda-toolkit-samples-bins
blender

There appears to be little difference building and using the toolkit compared to yesterday. That's good, but not exciting. The new samples binary package built and installed quietly enough, but it has produced a different number of programs in samples/bin/x86_64/linux/release (170 files) compared with yesterday's manual build in my home directory source tree, where samples/bin/x86_64/linux/release has 148 files. Presumably the difference is due to your patches fixing installed library discovery and linkage. The downside is that I think 170 is still not the expected total. At least one file wasn't built: simpleVulkan. Not a big surprise, perhaps, but also simpleGLES, simpleGLES_EGLOutput, simpleGLES_screen and maybe others.

I have not yet checked the operation of all 170 sample programs but I have noted some general truths. Any sample program which has nothing to do with screen manipulation works correctly and passes any built-in tests. Other sample programs fail with a variety of errors. For instance:

[richard@Midnight6 release]$ /usr/share/nvidia-cuda-toolkit/samples/bin/x86_64/linux/release/Mandelbrot
[CUDA Mandelbrot/Julia Set] - Starting...
GPU Device 0: "GeForce GTX 1050 Ti" with compute capability 6.1
Data initialization done.
Initializing GLUT...
OpenGL window created.
Creating GL texture...
Texture created.
Creating PBO...
CUDA error at Mandelbrot.cpp:971 code=304(cudaErrorOperatingSystem) "cudaGraphicsGLRegisterBuffer(&cuda_pbo_resource, gl_PBO, cudaGraphicsMapFlagsWriteDiscard)"

[richard@Midnight6 release]$ /usr/share/nvidia-cuda-toolkit/samples/bin/x86_64/linux/release/simpleGL
simpleGL (VBO) starting...
GPU Device 0: "GeForce GTX 1050 Ti" with compute capability 6.1
CUDA error at simpleGL.cu:422 code=304(cudaErrorOperatingSystem) "cudaGraphicsGLRegisterBuffer(vbo_res, *vbo, vbo_res_flags)"
CUDA error at simpleGL.cu:434 code=400(cudaErrorInvalidResourceHandle) "cudaGraphicsUnregisterResource(vbo_res)"

There was one other which complained about some GL extensions not being available, but I haven't found it again. Tomorrow I will go through them from 1 to 170 and record the failures.
Moving on to Cauldron's Blender, labelled 2.79b (actually a git snapshot from 19 Feb). There is no change in the manner in which GPU-assisted Blender Cycles renders fail: all but three of the tiles render, then it stops. Blender.org binary downloads continue to operate correctly when the supplied cycles/lib/ directory is replaced by the one we built with toolkit 10.1. As before, console logs are available for each of these builds, should you wish to see them. The toolkit build log is 510kB, the samples-bin log is 399kB and the Blender log is 10.7MB.
So none of the samples in 2_Graphics works? As for blender, does the blender RPM built from the spec file with the generated cubins work (i.e. from the spec file provided)? Or does it only work when taking those cubins and merging them into the latest blender.org binaries (which date?)? What about upgrading the blender git version in the spec file to the current git?
Created attachment 10828 [details] patch4 for nvidia cuda 10.1 spec file Updated patch4 for vulkan/simpleVulkan.
Attachment 10826 is obsolete: 0 => 1
Does nvidia-smi show correct output during operations?
Created attachment 10829 [details] Console output running 2_Graphics sample programs

Thanks for the Vulkan update, I'll update the build accordingly. The Blender .spec file is the one with your added cubins stuff, plus my "-DCUDA_TOOLKIT_ROOT_DIR:STRING=%{_bindir}", and for my own convenience I changed "%define rel 2" 'cos I can get confused about which rpm is installed. The resulting Blender rpm has the full set of CUDA kernels and filters in cycles/lib/. To test a Blender.org nightly build I empty its cycles/lib/ directory, which forces a CUDA kernel/filter build which ends up in ~/.cache/cycles/kernels with a name like cycles_[filter|kernel]_sm61_0D29129DD439CCE0DA8F8CD2A681C9A1.cubin. The big number is different for each version/build of blender. The attached file is the console output from running each of the toolkit samples/2_Graphics/ programs. They all either fail, or are not present!
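For the record, the emptying step amounts to something like this (BLENDER_DIR is a hypothetical unpack location, not a fixed path):

```shell
# Empty the bundled cycles/lib/ so Cycles is forced to rebuild its CUDA
# kernels with the locally installed toolkit on the next GPU render.
BLENDER_DIR="$HOME/blender-nightly"
mkdir -p "$BLENDER_DIR/2.79/scripts/addons/cycles/lib"
rm -f "$BLENDER_DIR/2.79/scripts/addons/cycles/lib"/*.cubin
# The freshly built kernels then land in the user cache:
ls "$HOME/.cache/cycles/kernels" 2>/dev/null || true
```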
(In reply to Giuseppe Ghibò from comment #69) "What if upgrading the blender git version in the spec file to the current git?" That will probably work too, though I predict that the Cycles GPU renders will still stop 3 tiles short of completion. I will fetch a snapshot as well as the equivalent binary download. Actually, I already have the latest download. It is dated 2019-03-01 00:41:28 and works fine when forced to build its own sm_61 kernel/filter using toolkit 10.1.
(In reply to Giuseppe Ghibò from comment #71) I usually only run nvidia-smi when using Blender, and then not always - only if I need confirmation that I have CUDA set up correctly after an nvidia-current update. I usually have to force an initrd rebuild, don't ask me why, after every nvidia update and very occasionally after a kernel update too. I'll make sure I have it running on a spare screen if you like...
I am hitting another snag. I don't know where to look to get a git snapshot. The current master is version 2.80, but we would want the latest 2.79 development version, which produced the 1st March binary download. My git skills are not up to that; in fact I don't have any :~[
Finally for tonight, I removed the installed toolkit 10.1 rpms and it forced blender out as well. Not a welcome dependency, but hopefully it is only temporary. I will rebuild with the new Vulkan patch on Sunday.
Created attachment 10831 [details] script for downloading current blender 2.7 git

I add this script for downloading the current git 2.7 branch and producing the daily tarball. The script shows some errors like 'Previous HEAD position was d4100298 x3d import: make it work without internet connection', but those come from internal blender commands, so it should be ok. In this way it is possible to compare the same blender binaries from upstream with the current RPM build. I also suspect that blender is very sensitive to cuda 10, as I see commits like this: https://git.blender.org/gitweb/gitweb.cgi/blender.git/commit/f63da3dcf59f87b34aa916b2c65ce5a40a48fd92 which apparently builds using two different cuda sources in the same build tree, one for 10 and one for 9.
Created attachment 10832 [details] blender spec file with conditional flag for building offline cuda kernels
Attachment 10814 is obsolete: 0 => 1
(In reply to Giuseppe Ghibò from comment #70) simpleVulkan is now generated correctly(?) and executes, but...

Instance created successfully!!
WARNING: radv is not a conformant vulkan implementation, testing use only.
Selected physical device = 0x247ff00
Swapchain created!!
failed to open shader spv file!

It looks like the frag.spv and vert.spv files are generated by the makefile but then clobbered before the sample is installed. I was able to run /usr/bin/glslangValidator -H shader_sine.[frag,vert] to re-create these files and see what they contain. Everything looked OK, so I re-ran the simpleVulkan example and got this:

[richard@Midnight6 release]$ ./simpleVulkan
Instance created successfully!!
WARNING: radv is not a conformant vulkan implementation, testing use only.
Selected physical device = 0x19ccf00
Swapchain created!!
Pipeline created successfully!!
CUDA error at vulkanCUDASinewave.cu:1510 code=1(cudaErrorInvalidValue) "cudaImportExternalMemory(&cudaExtMemVertexBuffer, &cudaExtMemHandleDesc)"
Thanks Giuseppe, I have the .tgz in place with the new .spec and new knowledge; pigz can fly! I didn't know about that one until your script told me I didn't have it :~) I should have a report in an hour or so...
Delay. Just had a crash - PCManFM, the LXDE file manager (and a bit more perhaps). It happens from time to time, but this time it caused a make error. That's the first time a file manager crash has affected a process other than the task bar. Yes, it is a bug, but no, I haven't filed it ... yet. It is very unpredictable and, so far, impossible to cause deliberately. Meanwhile I have re-started rpmbuild. Hopefully it will be ok this time if I resist the temptation to go browsing through the Cycles addon sources.
Created attachment 10833 [details] rpmbuild error building blender 2.79 git

rpmbuild failed at 93% complete. The file manager crash I referred to earlier must have been coincidental. This is the second rebuild after that first crash and all have stopped at the same place. The attachment contains five files:
CMakeLists-freestyle is CMakeLists.txt from BUILD/blender-2.79b-git20190301/source/blender/freestyle. The last line is: blender_add_lib(bf_freestyle "${SRC}" "${INC}" "${INC_SYS}")
Makefile2 is Makefile from BUILD/blender-2.79b-git20190301/build.
rpmbuild-full-output is the full 9MB console output from rpmbuild.
rpmbuild-BUILD-ERROR is the last dozen or so lines from the full console output.
rpm-tmp.Ab7cEz is the file which rpmbuild was executing.
I will be on the road until about 22:00GMT
The good news is that the download from Blender.org dated 2019-03-01 can build and use its own cycles_filter_sm61_C266701B6DA7F04AEAABA3328AC151A4.cubin and cycles_kernel_sm61_C266701B6DA7F04AEAABA3328AC151A4.cubin.
Created attachment 10835 [details] Console output with verbose nvcc Meanwhile I have re-run the cuda-z build with your .spec and patches. It hangs with the results in the attached file. Some of the files referenced are expected to be found in /tmp. I have added the relevant file list to the bottom of the console output.
Created attachment 10837 [details] blender spec file with conditional flag for building offline cuda kernels

The blender build fails due to an upstream bug in the code. Just rerun the script for downloading the current git as of today (20190304) and it will complete the compilation.
Attachment 10832 is obsolete: 0 => 1 Attachment 10833 is obsolete: 0 => 1
Got there ahead of you. I have successfully built two versions of the March 4 git; one with and one without the CUDA kernels build.

With the kernels included in the rpm: test render completes in 1min 5sec
With the kernels omitted and built on-demand: test render completes in 1min 5sec
Using the Blender.org build of March 4 with the included kernels: test render completes in 1min 13sec
Using the Blender.org build of March 4, kernels omitted and built on-demand: test render completes in 1min 9sec

Locally built CUDA kernels in all cases perform slightly faster!
Created attachment 10839 [details] Console output - failed build

I tried rebuilding cuda-z without the extra architectures (sm_70, sm_72, sm_75). It made little difference: cicc now hangs while working on sm_62 instead.
Apologies for Comment 87 - you had the answer before I was even home from work! Your spec file is essentially the same as mine. I have since completed testing the Blender.org 2.80 download with similarly successful results.

With the supplied CUDA kernels the test render finished in 1min 8sec
With the CUDA 10.1 kernel built on demand: 1min 5sec
With a prebuilt CUDA 10.1 kernel in place of those supplied: 1min 5sec

All tests of nvidia-cuda-toolkit-10.1.105-1.mga7.x86_64.rpm and its -devel- have now been completely successful with the March 4 Blender 2.79 built as blender-2.79b-14.git20190304.1.mga7.x86_64.rpm, either with or without the inclusion of built CUDA kernels. All tests of nvidia-cuda-toolkit-10.1.105 have also been successful in building and running the sample programs from nvidia-cuda-toolkit-samples-10.1.105-1.mga7.x86_64.rpm with your fixes for Vulkan and GL/EGL/GLES. Nevertheless there are still run-time problems for all example programs in 2_Graphics, 3_Imaging and elsewhere. This may be a problem associated with the Mesa implementation for AMD Kaveri which has been exposed by VirtualGL in particular (https://bugs.mageia.org/show_bug.cgi?id=23990 and https://groups.google.com/d/msg/virtualgl-users/orJUPt0a94o/OjrcvIy_AgAJ) and by any program trying to use 24-bit pbuffers in general. As such it might be beyond the scope of this bug. I could set up a machine to test all of this using, say, nouveau. I would be nervous about doing this because I have always found it difficult to remember how to get CUDA working on a card which isn't being used for screen output.
Giuseppe, I have been trying to come up with a test environment which I can use to check the nvidia-cuda-toolkit-10.1.105 example programs in OpenGL environments other than nvidia-current and Mesa's amdgpu/radeonsi support. I don't think I can do this with the hardware I have available: AMD APUs, Nvidia GTX 960 and 1050, and an old 6200. Furthermore, I have uncovered another bug which affects the current Cauldron version of Blender, and all releases of Blender since about 9 January 2019. I have reported this at https://developer.blender.org/T60379. In the absence of any other test results I am happy to close this bug as solved for my specific combination of hardware and software:
AMD A10-7860 Kaveri screen drivers and Mesa
Nvidia GTX1050 GPU for CUDA only, using 418.43-1.mga7.nonfree and nvidia-cuda-toolkit-10.1.105
Would you agree?
Let's wait a bit. I think the nvidia-cuda-toolkit.spec can soon be merged into the current cauldron svn, as it is already better than the current 9.1, which is not working anyway. Regarding the other tests involving OpenGL, I think nvidia-current should also be enabled as the device driver and the GL libraries switched to the nvidia proprietary ones.

/usr/sbin/update-alternatives --set gl_conf /etc/nvidia-current/ld.so.conf

should do the switch manually, but it also needs to be enabled/configured in the rest of the Xorg configuration to use the proprietary drivers. As for bug https://developer.blender.org/T60379, it talks about a Win10 version and is also for the blender 2.80 series. Is that number right, or was it a typo? As for the old nvidia 6200, I don't think it has CUDA capability; the list of CUDA-capable GPUs is here: https://developer.nvidia.com/cuda-gpus
(In reply to Giuseppe Ghibò from comment #91) I think that there may be a variety of issues with osd-3.3.3, but I am only sort of guessing. The Blender 2.80 report was, I think, a little misleading, as it describes a way, which I can duplicate, to provoke a massive memory leak. You have to be quick to catch it and kill blender, but you get all your memory back. At the moment we do not package 2.80 and it is beta 1 after all, maybe beta 2, but certainly still getting many bug fixes and improvements to existing features. I suspect (again, I am only guessing) that as part of the backporting of improvements from 2.80 to 2.79, we got something a little unexpected some time in early January. It is interesting to see how the startup time of a January 6 Blender 2.79 is very quick while the next one I have, from around January 9, is noticeably slower. The reason I am banging on about OSD is that it has never been included in our Blender build, but is part of the binary releases from Blender.org. When I rebuild our current Cauldron Blender with opensubdiv-3.3.3 it inherits the instability I recorded in the T60379 bug report. It is exactly the same behaviour that my copy of the nearest-dated Blender.org daily exhibits, and it is a hard crash. When I load my test file, as included in the T60379 report, into our Cauldron Blender, with no OSD, then adaptive subdivision, classified as an "experimental" feature, simply does not work. Neither the simple torus nor the vehicle roadwheel appears smoothly curved. They both display the underlying jaggedness of the simple, unmodified mesh geometry. The vehicle model from which the roadwheel was copied was developed in Blender 2.79 and had not shown this viewport anomaly until I saw it in the Mageia Cauldron git rpm from February; it has shown it ever since.
I didn't see it in the ship model I was using for our CUDA render tests, but if I look closely I may find it in the wheelhouse :~( As for testing with Nvidia GL, if I am careful about backing up critical files (and there are a few of those - getting amdgpu working properly was not trivial) I am sure I can disable the onboard graphics and set up the nvidia card to take at least one of my screens. That should do for testing, I reckon. I will tackle that very soon, perhaps Wednesday. For now I am preparing two systems for migration to Cauldron. They are sound studio machines and I have a number of applications to rebuild for current Cauldron drivers, glibc and gcc.
You can use ulimit -Sv 8000000 before running blender to limit the amount of memory it can allocate, without having it leak away the whole system's memory (8000000 means about 8GB).
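As a concrete sketch (the limit is per-shell, so blender must be started from the same shell):

```shell
# Cap the soft virtual-memory limit before launching blender so a leak
# kills blender rather than exhausting the whole system's memory.
ulimit -Sv 8000000   # value is in kB, so roughly 8 GB
ulimit -Sv           # confirm the new soft limit
# blender            # then start blender from this same shell
```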
Created attachment 10869 [details] borked xorg.conf

I have been struggling with the change to using the Nvidia card as the screen driver. In a little over two hours I have achieved some progress:
Backed up and replaced my grub/menu.lst file
Disabled on-board graphics in the BIOS
Removed, rebooted and configured the nvidia graphics via XFdrake
Rebuilt initrd to get rid of the amdgpu driver

My Xorg.0.log tells me that my monitor is connected to DFP-1 and that DFP-0, DFP-2 and DFP-3 are disconnected. I can only see three sockets on the back of the card - DVI, HDMI and a flat-looking one which might be DisplayPort(?). I only have an HDMI lead for this monitor, so the others don't matter. I am pretty sure the desktop is starting and displaying on a port I can't use. My two-plus hours of struggling have been directed at trying to make the proper changes to my xorg.conf file to get the damn thing to put the picture where I can see it. In case I am doing something really stupid which everyone else knows about, I have attached it here.
Created attachment 10870 [details] It looks like it should be working

I am close to the end of the road with this one. As far as I can tell from the log, I should be looking at my MGA7 screen on the monitor it is attached to. The log tells me that the monitor has been detected on DFP-1, its resolution has been set correctly (1920x1080), and the monitor is used during system boot, right up to the graphical login screen. It just goes black when I log in. The only evidence I have that it doesn't work properly is the blackness of the screen. The only thing I can think to do is buy a DVI cable and disconnect the HDMI. Should I raise a bug for this too? The list is growing....
Maybe it's not able to query the EDID information correctly. What if you provide the monitor modeline manually, just for your Acer monitor, in the "Monitor" section of xorg.conf, and add:

Option "UseEDID" "False"
Option "ModeDebug" "True"

to the "Screen" section? You can get some EDID information with:

urpmi monitor-edid

and then:

monitor-get-edid | monitor-parse-edid

should give the ModeLine info.
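Put together, the suggestion would look something like this hypothetical xorg.conf fragment; the ModeLine shown is the standard 1920x1080@60 timing and should be replaced with whatever monitor-parse-edid reports for the Acer:

```
Section "Monitor"
    Identifier "Monitor0"
    # Standard CEA 1080p60 timing; substitute the monitor-parse-edid output
    ModeLine "1920x1080" 148.50 1920 2008 2052 2200 1080 1084 1089 1125 +hsync +vsync
EndSection

Section "Screen"
    Identifier "Screen0"
    Option "UseEDID"   "False"
    Option "ModeDebug" "True"
EndSection
```

With ModeDebug enabled, the nvidia driver logs its mode validation decisions to Xorg.0.log, which should show why a mode is being rejected.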
Created attachment 10871 [details] new debug xorg.conf with no EDID
Attachment 10869 is obsolete: 0 => 1
Created attachment 10872 [details] results from new xorg.conf

This took much too long - sorry. To get the EDID from the 27" Acer I had to plug it into this machine and run your "monitor-get-edid | monitor-parse-edid", as it failed when run on the target PC's tty2 or via ssh from this PC. The result looks just like the mode line reported previously and the effect is exactly the same - no screen, despite the log indicating it all worked properly. It seems this "black screen on nvidia HDMI 10x0 series" is not unique to me, but nobody seems to have a definitive answer. The "solutions" mostly involve either backing off to an earlier nvidia driver or rebuilding/reinstalling the current driver. I think it just doesn't work. I'll get a DVI lead tomorrow - my brother can make good use of it when I have finished this nvidia-cuda-toolkit test (almost forgot what this was all about :~)
Attachment 10870 is obsolete: 0 => 1
The DVI connection worked but I had no GL. I tried everything "proper" to fix it and in the end I had to hide the Mesa lib64/libGL.so and substitute symlinks to lib64/nvidia-current. That worked for all tests so far: foobillard, glxspheres64 and Blender. Unfortunately Blender no longer finds the card for CUDA support! I have been hacking at this for hours and I am now so far away from my original "default" configuration I begin to wonder if I will be able to get back to it when this toolkit testing is done. I will take a longer, closer look at the nvidia stuff to see if I can spot what has gone missing, but I'll get a proper night's sleep first.
It is probably missing some of the "slave" softlinks that update-alternatives sets, or there is some "debris" from a previous configuration. The problem is that this situation seems not that uncommon, but we still remain in a vague position, as this can't yet be turned into an out-of-the-box "fixing" script (we don't know exactly which softlink is missing, or which nvidia*.ko files are missing or interfere with the installation). In bug https://bugs.mageia.org/show_bug.cgi?id=24436 there is, near the end, a recently posted procedure to restore the nvidia drivers to working order. Many people had success following it (the second problem is that once you have fixed things you no longer know exactly what the culprit was, assuming it was a single one). With the nvidia drivers working, CUDA should be available inside blender properly, and with the latest blender 2.7x git there should also appear, in the CUDA menu, a checkbox that allows "hybrid" rendering, i.e. using both CUDA and the CPU at the same time to perform rendering. Back on CUDA, don't forget there is also an init script /etc/init.d/nvidia that is run once and sets up the CUDA device nodes and permissions properly. Also, nvidia-smi has an option called "--persistence-mode" that can be used to get the CUDA stuff pre-initialized. This can be useful to avoid the small initialization delay in the case where an nvidia card is used remotely as a rendering machine and doesn't have the X11 stack to preload it. I was evaluating whether such an extra command could be merged into the cuda toolkit's /etc/init.d/nvidia script, but that would require that every cuda card (think of multiple nvidia cards) is detected and --persistence-mode is sent to each of them.
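A sketch of what that multi-card loop might look like (the index parsing of `nvidia-smi -L` output and the exact option spelling are assumptions to check against the installed nvidia-smi):

```shell
# Enable persistence mode on every detected CUDA device (needs root).
# gpu_indexes pulls the numeric index out of "GPU N: name (UUID: ...)" lines.
gpu_indexes() { sed -n 's/^GPU \([0-9]*\):.*/\1/p'; }

if command -v nvidia-smi >/dev/null; then
    for idx in $(nvidia-smi -L | gpu_indexes); do
        nvidia-smi -i "$idx" --persistence-mode=1
    done
fi
```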
I am back from my brother's place with a DVI lead and a working monitor on the Nvidia card. I have also discovered why CUDA seemed to stop working: I was missing some essential devel rpms, which are now re-installed. A quick check with Blender tells me CUDA operation is restored and I can now return to testing the nvidia-cuda-toolkit package. It would seem that in preparing for the change to the nvidia screen driver by removing the installed packages, a lot of other stuff was removed as well and I didn't notice. I will continue on Tuesday evening.
I have finished re-testing the "samples" build from the nvidia-cuda-toolkit and I think I now know why there are 31 samples which fail to build. The initial suspect was my mangled Nvidia OpenGL installation, and before that it was suspected that my Mesa OpenGL (for the AMD A10 APU) was not up to the task. In fact it seems that Nvidia has included distribution-specific tests to find various required libraries, and they all fail because Mageia isn't Red Hat or Fedora. The tests are done by these files:

findgleslib.mk
findgllib.mk
findegl.mk
2_Graphics/simpleGLES_screen/findgleslib.mk
2_Graphics/volumeRender/findgllib.mk
2_Graphics/simpleGLES/findgleslib.mk
2_Graphics/volumeFiltering/findgllib.mk
2_Graphics/Mandelbrot/findgllib.mk
2_Graphics/bindlessTexture/findgllib.mk
2_Graphics/simpleGLES_EGLOutput/findgleslib.mk
2_Graphics/marchingCubes/findgllib.mk
2_Graphics/simpleGL/findgllib.mk
2_Graphics/simpleTexture3D/findgllib.mk
3_Imaging/imageDenoising/findgllib.mk
3_Imaging/EGLStreams_CUDA_Interop/findegl.mk
3_Imaging/bilateralFilter/findgllib.mk
3_Imaging/recursiveGaussian/findgllib.mk
3_Imaging/bicubicTexture/findgllib.mk
3_Imaging/simpleCUDA2GL/findgllib.mk
3_Imaging/EGLSync_CUDAEvent_Interop/findegl.mk
3_Imaging/boxFilter/findgllib.mk
3_Imaging/EGLStream_CUDA_CrossGPU/findegl.mk
3_Imaging/SobelFilter/findgllib.mk
3_Imaging/postProcessGL/findgllib.mk
5_Simulations/nbody_screen/findgleslib.mk
5_Simulations/fluidsGLES/findgleslib.mk
5_Simulations/oceanFFT/findgllib.mk
5_Simulations/smokeParticles/findgllib.mk
5_Simulations/nbody/findgllib.mk
5_Simulations/particles/findgllib.mk
5_Simulations/fluidsGL/findgllib.mk
5_Simulations/nbody_opengles/findgleslib.mk
6_Advanced/FunctionPointers/findgllib.mk
7_CUDALibraries/randomFog/findgllib.mk
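To fix them in bulk, one approach would be a single Mageia patch applied to each copy of these helpers; this sketch only locates them (the patch command in the comment and the SAMPLES default are assumptions):

```shell
# List every distro-detection helper in the samples tree so one patch
# can be applied to each copy.
SAMPLES=${SAMPLES:-/usr/share/nvidia-cuda-toolkit/samples}
find "$SAMPLES" \( -name 'findgllib.mk' -o -name 'findgleslib.mk' \
                   -o -name 'findegl.mk' \) 2>/dev/null |
while read -r mk; do
    echo "needs Mageia support: $mk"   # e.g. patch -p0 "$mk" < mageia-findlib.patch
done
```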
The results for 165 compiled sample programs: 5_Simulations/fluidsGL appears to just hang with a green textured screen. It might be the "right" result, and ESC will quit. Everything else either passed or failed as expected (e.g. only one GPU when two were required). Looks like a good'un. I am switching back to the amdgpu screen and Mesa now (or tomorrow). I could recommend that the various findxxxx.mk files be patched to work with Mageia. I think you have already done the Vulkan one, or was that for something else..?
Thank you Giuseppe for all your hard work. In particular, thank you for the extra work you did on the blender spec and for the script to fetch and pack the Blender dailies from git. I will be putting it all to good use as I try to build OpenShadingLanguage and OpenSubDiv support. Richard
Status: NEW => RESOLVEDResolution: (none) => FIXED
The patches for the gles, egl and gl libs were already included in comments 59, 60, 61, 62. With those, the cuda toolkit should be able to compile all 171 samples. In particular, the package nvidia-cuda-toolkit-samples-bins (release 3) in non-free can be retrieved with those files compiled. To test them all at once, just install the package nvidia-cuda-toolkit-samples-bins, then issue:

for i in /usr/share/nvidia-cuda-toolkit/samples/bin/x86_64/linux/release/*; do echo ${i} && ${i}; done

For the samples involving graphics, an interactive window should pop up. Of course you can compile them yourself from the nvidia-cuda-toolkit-samples package, by just copying the samples dir to a writable dir and running "make -j1". The cuda compilation (and just that) doesn't even require an nvidia card installed.
I blame old age for my failing memory. Of course you had already patched the findxxxx.mk scripts. I had a moment of madness at the end of last week - I accidentally deleted my rpmbuild directory thinking it was something else. It took an evening to fetch all of the sources, specs and patches again, and then I spent the weekend getting the nvidia screen and driver to work. I built the samples from what I thought was an up-to-date directory in my home, but it was older than that. Silly me. I have the Cauldron toolkit updates in place now, so I will retry the samples when I have put this system back in its normal working state: AMD screens and Nvidia for CUDA only. I think you said earlier that the Nvidia CUDA toolkit does not require a working nvidia screen, so I will strip out all nvidia rpms and see if I can find a way to do that. Then I will get back to Blender tests. Thank you again for all your help and guidance.