| Summary: | drakx11 installing nvidia470 fails with conflicts to earlier installed nvidia-current | ||
|---|---|---|---|
| Product: | Mageia | Reporter: | Morgan Leijström <fri> |
| Component: | RPM Packages | Assignee: | Kernel and Drivers maintainers <kernel> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | normal | ||
| Priority: | Normal | CC: | ghibomgx |
| Version: | Cauldron | ||
| Target Milestone: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Source RPM: | CVE: | ||
| Status comment: | |||
| Attachments: | fix for nvidia470 to avoid conflicts with nvidia-current; patch to move gtx 750 to nvidia470; patch for spec file for GTX 750 and nvidia470; fix for nvidia470 for upgrading from mga8 | ||
|
Description
Morgan Leijström
2023-05-06 18:04:17 CEST
I add some information. According to the NVIDIA docs, the NVIDIA GTX 750 is supported by both the 525.xx series (i.e. currently nvidia-current, see https://www.nvidia.com/Download/driverResults.aspx/204639/en-us/) and the 470.xx series (i.e. nvidia470, see https://www.nvidia.com/Download/driverResults.aspx/200634/en-us/). drakx11 automatically suggests the driver with the higher rank (in this case nvidia-current). It actually seems there are two kinds of GTX 750 cards, one with PCI ID "0x1381" and the other with "0x1407". According to your lspcidrake output (vendor:10de device:1381 subv:3842 subd:2751), you have the version with ID "0x1381", which is probably the older GTX 750 model that has difficulties with the 525.xx series. This is probably an upstream bug and should be reported to NVIDIA too (IMHO only the newer model is supported by the newer drivers). A further test would be to package nvidia530 (530.41.03) locally and see whether it fails too.

CC: (none) => ghibomgx

Created attachment 13811 [details]
fix for nvidia470 to avoid conflicts with nvidia-current
Here is a suggested patch for the nvidia470 SPEC to avoid conflicts with nvidia-current when the latter is already installed and one switches to another series in drakx11.

Created attachment 13812 [details]
patch to move gtx 750 to nvidia470
As a reminder, I add a patch to move the GTX 750 (older device ID 0x1381) to the nvidia470 driver. The resulting binary package can be tested here: https://copr-be.cloud.fedoraproject.org/results/ghibo/mageia9-bonus/mageia-cauldron-x86_64/05893859-ldetect-lst/

Created attachment 13813 [details]
patch for spec file for GTX 750 and nvidia470
Reminder for the SPEC file (for the final release it should be moved to the upstream ldetect-lst git).
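
To illustrate what moving the older GTX 750 to nvidia470 means in practice, here is a minimal sketch that maps the PCI IDs reported by lspcidrake to a driver series. The table and the function names are hypothetical and this is not the ldetect-lst file format; the entry for device 1407 is an assumption based on the remark that only the newer model seems to work with the newer drivers.

```python
import re

# Hypothetical lookup table for illustration only; this is NOT the
# ldetect-lst file format. Keys are (vendor, device) PCI IDs in hex.
DRIVER_BY_PCI_ID = {
    ("10de", "1381"): "nvidia470",       # older GTX 750, moved by the patch
    ("10de", "1407"): "nvidia-current",  # assumed: the other GTX 750 variant
}


def parse_lspcidrake_ids(text):
    """Extract (vendor, device) from a 'vendor:XXXX device:YYYY' fragment."""
    match = re.search(
        r"vendor:([0-9a-fA-F]{4})\s+device:([0-9a-fA-F]{4})", text
    )
    if match is None:
        raise ValueError("no vendor/device IDs found")
    return match.group(1).lower(), match.group(2).lower()


def suggested_series(text, table=DRIVER_BY_PCI_ID):
    """Return the driver series for the card, or None if it is not listed."""
    return table.get(parse_lspcidrake_ids(text))


# The ID string quoted earlier in this report.
print(suggested_series("(vendor:10de device:1381 subv:3842 subd:2751)"))
```

With the ID string from this report the sketch selects nvidia470; a card that is not in the table yields None, so the caller would fall back to whatever default it uses.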

(In reply to Giuseppe Ghibò from comment #2)
> Created attachment 13811 [details]
> fix for nvidia470 to avoid conflicts with nvidia-current
> Here is a suggested patch for nvidia470 SPEC to avoid conflicts with nvidia-current when the latter is previously installed and one switches to another series in drakx11.

That means they would not be parallel installable anymore.

Technically nvidia470 should only conflict with nvidia-current <= nvidia470 version, as that is the real conflict.

So no conflict wanted because there may be more than one GPU installed.
But drakx11 must be able to switch driver per user request.

(In reply to Thomas Backlund from comment #5)
> (In reply to Giuseppe Ghibò from comment #2)
> > Created attachment 13811 [details]
> > fix for nvidia470 to avoid conflicts with nvidia-current
> > Here is a suggested patch for nvidia470 SPEC to avoid conflicts with nvidia-current when the latter is previously installed and one switches to another series in drakx11.
> That means they would not be parallel installable anymore.
> Technically nvidia470 should only conflict with nvidia-current <= nvidia470 version, as that is the real conflict.

Isn't it already conflicting in the current version when the version is older?

But what I wonder is: did parallel installation ever work? E.g. the nvidia470 dkms modules can coexist with dkms nvidia-current (currently not conflicting in the spec file), but in an installation with two different nvidia cards can they be modprobed together? Ditto for the libraries: x11-driver-video-nvidia-current and x11-driver-video-nvidia470 provide the same major version of the libraries, so which one to take in a two card installation? (The same goes for nvidia-current-utils and nvidia470-utils.)

An alternative, instead of trying to add conflicts to packages (things would get more complicated if we introduced a 3rd or a 4th series, e.g. suppose nvidia530: then you would have to make all of them conflict in each package, which would be a mess), is to introduce an option or just a script to clean up the nvidia-current (or nvidia470) installation just before installing another series.

(In reply to Giuseppe Ghibò from comment #7)
> (In reply to Thomas Backlund from comment #5)
> > (In reply to Giuseppe Ghibò from comment #2)
> > > Created attachment 13811 [details]
> > > fix for nvidia470 to avoid conflicts with nvidia-current
> > > Here is a suggested patch for nvidia470 SPEC to avoid conflicts with nvidia-current when the latter is previously installed and one switches to another series in drakx11.
> > That means they would not be parallel installable anymore.
> > Technically nvidia470 should only conflict with nvidia-current <= nvidia470 version, as that is the real conflict.
> Isn't it already conflicting in the current version when the version is older?

No.

In mga8 we currently have nvidia-current-470.161.03-1.mga8.nonfree

That R470 branch is now packaged as nvidia470 packages in mga9, and nvidia-current is updated to the R525 branch, currently 525.116.03

The current R470 driver in mga9 is now at: 470.182.03-2.mga9

So for upgrades from mga8 -> mga9 to work you need to inform rpm of that branch renaming, so nvidia470.spec should have:

Conflicts: <matching nvidia-current (sub)package> <= %{version}-%{release}

in the proper places...

> But what I wonder is: did parallel installation ever work?

Oh, they have at least worked for eons... we even shipped live isos with at least 3 different nvidia branches installed...

Of course some of it has probably been broken by your rewriting of the nvidia spec files.

> E.g. the nvidia470 dkms modules can coexist with dkms nvidia-current (currently not conflicting in the spec file), but in an installation with two different nvidia cards can they be modprobed together?

Yes, when they support different hw they can. If they support the same hw, then the driver loading first will claim the hw.

> Ditto for the libraries: x11-driver-video-nvidia-current and x11-driver-video-nvidia470 provide the same major version of the libraries, so which one to take in a two card installation? (The same goes for nvidia-current-utils and nvidia470-utils.)

Well, maybe not all packages can be parallel installable anymore due to how they are now packaged, but at least the core bits need to be...

For example, with 2 different nvidia gpus in the system you might want to run one for display stuff and the other maybe for cuda stuff, in which case you need to be able to install dkms-nvidia* for both, and the matching cuda packages...

We even fixed a packaging bug due to that example for mga7 or 8, so it's not a theoretical use case...

> An alternative, instead of trying to add conflicts to packages (things would get more complicated if we introduced a 3rd or a 4th series, e.g. suppose nvidia530: then you would have to make all of them conflict in each package, which would be a mess), is to introduce an option or just a script to clean up the nvidia-current (or nvidia470) installation just before installing another series.

Please no, no "script hacks" to "avoid" proper packaging.

Setting for errata for now...

Keywords: (none) => FOR_ERRATA9

Created attachment 13818 [details]
fix for nvidia470 for upgrading from mga8
(In reply to Thomas Backlund from comment #8)
> Oh, they have at least worked for eons... we even shipped live isos with at least 3 different nvidia branches installed...
> Of course some of it has probably been broken by your rewriting of the nvidia spec files.

You're right. The side-by-side installation of multiple package versions (e.g. for the x11 libs/modules) was lost during my spec rewriting in 2020 for mga8, at the time of the GLVND migration.

> > E.g. the nvidia470 dkms modules can coexist with dkms nvidia-current (currently not conflicting in the spec file), but in an installation with two different nvidia cards can they be modprobed together?
> Yes, when they support different hw they can. If they support the same hw, then the driver loading first will claim the hw.
> > Ditto for the libraries: x11-driver-video-nvidia-current and x11-driver-video-nvidia470 provide the same major version of the libraries, so which one to take in a two card installation? (The same goes for nvidia-current-utils and nvidia470-utils.)
> Well, maybe not all packages can be parallel installable anymore due to how they are now packaged, but at least the core bits need to be...
> For example, with 2 different nvidia gpus in the system you might want to run one for display stuff and the other maybe for cuda stuff, in which case you need to be able to install dkms-nvidia* for both, and the matching cuda packages...

Actually the dkms-nvidia* packages can be installed side by side (the x11-driver-video-* packages cannot).

I attached the spec with the versioned conflicts. However, IMHO it would be better if the Conflicts were unversioned (with the same ones, mirrored, added to nvidia-current), because this way we can switch from one set to the other directly from drakx11 (e.g. from "GeForce 635 to GeForce 920" to "GeForce 735 and later" and vice versa), so that drakx11 will uninstall one set and install the other.

> so that drakx11 will uninstall one set and install the other

But what if you have two GPUs that need different drivers?
Probably not common, and I don't know how that ever worked with drakx11.
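
To make the difference between the versioned and the unversioned Conflicts discussed above concrete, here is a minimal sketch using the version numbers mentioned in this report. It assumes purely numeric version-release strings with the distro suffix already stripped; it is a simplification and not rpm's real version comparison, and the function names are made up for this example.

```python
def parse_evr(evr):
    """Split 'version-release' into tuples of ints, e.g. '470.182.03-2'.

    Simplification: only numeric dotted fields are handled, and distro
    suffixes such as '.mga9.nonfree' must be stripped by the caller.
    This is not rpm's full version comparison algorithm.
    """
    version, _, release = evr.partition("-")

    def to_ints(field):
        return tuple(int(part) for part in field.split("."))

    return to_ints(version), to_ints(release or "0")


def versioned_conflict(installed_evr, limit_evr):
    """True if the installed package matches 'Conflicts: name <= limit_evr'."""
    return parse_evr(installed_evr) <= parse_evr(limit_evr)


def unversioned_conflict(installed_evr):
    """An unversioned 'Conflicts: name' matches every installed version."""
    return True


NVIDIA470_EVR = "470.182.03-2"  # nvidia470 in mga9, from this report

# nvidia-current as shipped in mga8 (R470) and in mga9 (R525).
for installed in ("470.161.03-1", "525.116.04-2"):
    print(
        installed,
        versioned_conflict(installed, NVIDIA470_EVR),
        unversioned_conflict(installed),
    )
```

Under these assumptions the versioned form only hits the old mga8 nvidia-current (the branch rename case), so nvidia-current 525 could stay installed next to nvidia470, while the unversioned form hits both, which is what lets drakx11 swap one set for the other.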

(In reply to Morgan Leijström from comment #12)
> > so that drakx11 will uninstall one set and install the other
> But what if you have two GPUs that need different drivers?
> Probably not common, and I don't know how that ever worked with drakx11.

From what I could test with the pre-GLVND packages (e.g. in mga5, 6, 7), in the typical scenario of an old nvidia card for display (with an older driver) and a more recent card used as a compute engine, e.g. for CUDA/HPC (e.g. a Tesla card), with a newer and different driver, I was never able to get them working together, at least with our standard packages and with the basics on one card. In the end only one card was working. I can't say a priori that it is not possible with some further tweaks, but with everything I tried I wasn't able to get them working together. In the end one opted (for the display) to get an NVIDIA card, even the cheapest possible or a used one, that is supported by the same driver level as the one for the compute engine.

So no regression if only one nvidia driver can be installed at the same time.

So if people need more than one nvidia graphics card, the cards need to use the same driver.

What about one nvidia and one AMD?

(Trying to get this somewhat clear - we ought to have it documented somewhere)

(In reply to Morgan Leijström from comment #14)
> So no regression if only one nvidia driver can be installed at the same time.
> So if people need more than one nvidia graphics card, the cards need to use the same driver.

From what I tried, it is so. Maybe some further tweak exists and it could be possible, but I had exhausted my attempts. Here they say it's not possible, on any OS:

https://forums.developer.nvidia.com/t/setting-up-two-different-drivers-for-two-different-graphics-card/42252/23

> What about one nvidia and one AMD?
> (Trying to get this somewhat clear - we ought to have it documented somewhere)

Speaking of discrete cards (and not entering into the complexity of iGPUs, Optimus and other hybrid graphics, etc.), an Nvidia and an AMD card for what? Both for display? One as a computation engine with CUDA (e.g. NVIDIA for CUDA and AMD for display/graphics)? Both as two different computation engines (e.g. NVIDIA with CUDA, and AMD with OpenCL, maybe with ROCm installed from extra repositories, outside the mga packages)? Or to use PCI GPU passthrough, e.g. one card for Linux and the other card reserved for another OS, using the extra card in the emulated OS with native drivers? Multiple displays? Xdmx?

AFAIK drakx11 configures only one card at a time and doesn't cycle through all the recognized graphics cards to configure them together. But manually it should be a matter of adding a "Device" Section in xorg.conf with a Driver/BusID pair for each card, for two displays.

To do better documentation, for each scenario there could be a) a real test by someone who owns both pieces of hardware, b) an example.

So for now let's just have drakx11 be good at configuring one card.

And try not to be in the way of users who try to do something more.

(In reply to Morgan Leijström from comment #16)
> So for now let's just have drakx11 be good at configuring one card.
> And try not to be in the way of users who try to do something more.

You might try to get a cheap AMD card, to pair with the current NVidia one, and see what happens. It uses the free driver, so there shouldn't be driver package conflicts (maybe not an RTX 4090 bundled with a 7900XTX, because both are 3X in size and require at least 6 free slots of space, 3.5 for the 4090 and 2.5 for the 7900XTX).

I uploaded a version with explicit conflicts between nvidia470 and nvidia-current to nonfree/updates_testing. Now selecting one series (e.g. nvidia470) in drakx11 will uninstall the other series if it is installed (e.g. nvidia-current).

With the current version in nonfree/updates_testing it should be possible to switch from nvidia470 to nvidia-current and vice versa directly from drakx11, at least for cards which are supported by both drivers. Please test too.

System is using 470 (selected manually, see comment 0).
System is updated.

Testing now by simply selecting the default nvidia-current, I get conflicts, i.e.
nvidia-current-utils-525.116.04-1.mga9.nonfree.x86_64 conflicts nvidia470-utils-470.182.03-2.mga9.nonfree.x86_64

Have you enabled the nonfree/updates_testing repo? Have you installed ldetect-lst-0.6.50-1.mga9?

duh :)
Yes, I forgot the _testing repo.
Updated now to the 470.182.03-3 version packages.

> Have you installed ldetect-lst-0.6.50-1.mga9?

Yep, now
- and now "NVIDIA GeForce 635 to GeForce 920" is preselected.
- which is nvidia470 and correct for at least my version of GTX450

Rebooted to verify that the new 470 version is really OK.

Again under Plasma, in drakx11 (from MCC), I (now manually) selected "745 series and later", and it seems to work.

Rebooted, working extremely well!?

Installed is a mix of 470 and 525:
$ rpm -qa | grep nvidia
lib64nvidia-egl-wayland1-1.1.11-1.mga9
dkms-nvidia470-470.182.03-3.mga9.nonfree
nvidia470-doc-html-470.182.03-3.mga9.nonfree
dkms-nvidia-current-525.116.04-2.mga9.nonfree
nvidia-current-utils-525.116.04-2.mga9.nonfree
nvidia-current-doc-html-525.116.04-2.mga9.nonfree
nvidia-current-cuda-opencl-525.116.04-2.mga9.nonfree
x11-driver-video-nvidia-current-525.116.04-2.mga9.nonfree

In the journal I see that 525 is loaded:
kernel: nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 525.116.04 Thu Apr 27 17:57:02 UTC 2023

Both are installed:
dkms-autorebuild.sh[1170]: nvidia470 (470.182.03-3.mga9.nonfree): Already installed on this kernel.
dkms-autorebuild.sh[1170]: nvidia-current (525.116.04-2.mga9.nonfree): Already installed on this kernel.

$ inxi -G
Graphics:
  Device-1: NVIDIA GM107 [GeForce GTX 750] driver: nvidia v: 525.116.04
  Display: x11 server: X.org v: 1.21.1.8 with: Xwayland v: 22.1.9 driver: X: loaded: nvidia,v4l gpu: nvidia resolution: 3840x2160~60Hz
  API: OpenGL v: 4.6.0 NVIDIA 525.116.04 renderer: NVIDIA GeForce GTX 750/PCIe/SSE2

In xorg.conf, BoardName seems wrong... is it set by the default selection and not by my manual selection of "745 series and later"? Or is it old and not updated?

Section "Device"
    Identifier "device1"
    BoardName "NVIDIA GeForce 635 to GeForce 920"
    Driver "nvidia"
    Option "DPMS"
    Option "DynamicTwinView" "false"
    Option "AddARGBGLXVisuals"
EndSection

Anyway, I assume it is really using the 525 driver.

** BUT! **
This is a surprise: now the 525 driver works well with this card! As well as 470 does.
Or is it really using 470 somehow?

Maybe 525 is now really OK with this GTX750, with the newer kernel and possibly other updates since the negative test result in comment 0.
If so, that change in ldetect-lst should be reverted.
And we need not open a bug upstream.

I will keep running 525 and report any problems.
For starters, I will remove dkms-nvidia470 and reboot.

Yes, I have not added conflicts between dkms-nvidia-current and dkms-nvidia470, so you are basically using the kernel modules of nvidia470 but the libraries (glx, X11) of 525.xx. Of course if you install both dkms packages, only one will be chosen (and it will be a mess). At this point it is probably better to also add conflicts between dkms-nvidia470 and dkms-nvidia-current, so as to have a cleaner installation and avoid strange deviations.

nvidia470-doc-html-470.182.03-3.mga9.nonfree does not conflict because in theory you can install it side by side with nvidia-current-doc-html without any clash; in the end it's just documentation. But to keep things cleaner they could conflict too.

> so you are basically using the kernel modules of nvidia470 but the libraries (glx, X11) of 525.xx.

Ah. And that happened to work well.
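
As a small aid for spotting the kind of mixed installation shown above, here is a hedged sketch that groups package names by driver series. The function names are invented for this example, the classification is a plain substring match on the package names from this report, and nothing here is part of drakx11 or rpm.

```python
def series_of(package):
    """Classify a package name as 'nvidia470', 'nvidia-current', or None."""
    if "nvidia470" in package:
        return "nvidia470"
    if "nvidia-current" in package:
        return "nvidia-current"
    return None  # e.g. lib64nvidia-egl-wayland1, which belongs to neither


def group_by_series(packages):
    """Map each driver series to the packages that belong to it."""
    groups = {}
    for package in packages:
        series = series_of(package)
        if series is not None:
            groups.setdefault(series, []).append(package)
    return groups


def is_mixed(packages):
    """True if packages from more than one driver series are present."""
    return len(group_by_series(packages)) > 1


# Package list as pasted above from 'rpm -qa | grep nvidia'.
INSTALLED = [
    "lib64nvidia-egl-wayland1-1.1.11-1.mga9",
    "dkms-nvidia470-470.182.03-3.mga9.nonfree",
    "nvidia470-doc-html-470.182.03-3.mga9.nonfree",
    "dkms-nvidia-current-525.116.04-2.mga9.nonfree",
    "nvidia-current-utils-525.116.04-2.mga9.nonfree",
    "nvidia-current-doc-html-525.116.04-2.mga9.nonfree",
    "nvidia-current-cuda-opencl-525.116.04-2.mga9.nonfree",
    "x11-driver-video-nvidia-current-525.116.04-2.mga9.nonfree",
]

for series, names in group_by_series(INSTALLED).items():
    print(series, len(names))
print("mixed:", is_mixed(INSTALLED))
```

For the list above this reports two nvidia470 packages and five nvidia-current packages, i.e. a mixed installation with both dkms packages present.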

Tested: with no 470 package, it boots to a black screen instead of the DM.

> At this point it is probably better to also add conflicts between dkms-nvidia470 and dkms-nvidia-current, so as to have a cleaner installation and avoid strange deviations.

Yep...

> nvidia470-doc-html-470.182.03-3.mga9.nonfree does not conflict because in theory you can install it side by side with nvidia-current-doc-html without any clash; in the end it's just documentation. But to keep things cleaner they could conflict too.

Clean and consistent :)

I am back on 470-only packages, so I can test switching again when you have added more conflicts.

Do you also have conflicts on the -lib32 and -devel packages?
(I don't know, I did not have them installed)

The problem with having both dkms-nvidia-current and dkms-nvidia470 installed is that the right main nvidia470.ko kernel module gets loaded (because it is selected by a proper /etc/modprobe.d/display-driver.conf which corresponds to the right driver), but the other modules like nvidia-drm.ko (which are called from the first one) might not be the matching ones, and might give unpredictable results. The problem with adding dkms-* conflicts is that when you install one package, it will uninstall the other one, which is the one that is running, like when you uninstall a running kernel. So the problem remains on one side or the other, and it is not robust: if you don't add any dkms conflicts you need a cleanup mechanism that later would uninstall the non-matching version of the dkms-* modules. Another alternative is to also have the other nvidia submodules (i.e. nvidia-drm, nvidia-uvm, etc.) renamed according to the package version. But that needs to be explored in depth (assuming it would be possible with the internal calls).

Per earlier tests in another thread this seems to work.

Resolution: (none) => FIXED |