Bug 31874 - drakx11 installing nvidia470 fails with conflicts to earlier installed nvidia-current
Status: RESOLVED FIXED
Alias: None
Product: Mageia
Classification: Unclassified
Component: RPM Packages
Version: Cauldron
Hardware: All Linux
Priority: Normal normal
Target Milestone: ---
Assignee: Kernel and Drivers maintainers
 
Reported: 2023-05-06 18:04 CEST by Morgan Leijström
Modified: 2023-06-21 08:31 CEST

Attachments
fix for nvidia470 to avoid conflicts with nvidia-current (2.67 KB, patch)
2023-05-07 13:54 CEST, Giuseppe Ghibò
patch to move gtx 750 to nvidia470 (1.34 KB, patch)
2023-05-07 14:05 CEST, Giuseppe Ghibò
patch for spec file for GTX 750 and nvidia470 (1023 bytes, patch)
2023-05-07 14:06 CEST, Giuseppe Ghibò
fix for nvidia470 for upgrading from mga8 (2.77 KB, patch)
2023-05-08 14:59 CEST, Giuseppe Ghibò

Description Morgan Leijström 2023-05-06 18:04:17 CEST
I have an old Nvidia GTX750 [GM107]

This system was freshly installed last week using the previous version of the internal beta2 classic installer.

I first tried the suggested nvidia-current
(it is painfully slow, seems to fall back to software rendering, and also triggers bug 29845).

I then tried nouveau: 2.5 times better, but suspend-resume fails :(

So I tried modesetting: 50% faster than nouveau, and suspend-resume works :)

Then I decided to try nvidia470, as it says "Geforce 635 to 920".
But drakx11 failed and said it fell back to nouveau.
Looking in the journal, I saw conflicts with nvidia-current packages.

*  After using drakrpm to uninstall all *nvidia*,  *
*  drakx11 installed nvidia470 OK.                 *

(and this is the fastest driver, and suspend-resume works - Hurray :) )


I guess this is a packaging bug, so I am assigning it to the drivers people.
Comment 1 Giuseppe Ghibò 2023-05-07 13:52:30 CEST
Adding some information.

According to the NVIDIA docs, the GTX 750 is supported by both the 525.xx series (i.e. currently nvidia-current, see https://www.nvidia.com/Download/driverResults.aspx/204639/en-us/) and the 470.xx series (i.e. nvidia470, see https://www.nvidia.com/Download/driverResults.aspx/200634/en-us/).

drakx11 automatically suggests the driver with the higher rank (in this case nvidia-current).

It seems there are actually two kinds of GTX 750 cards, one with PCI device ID "0x1381" and the other with "0x1407". According to your lspcidrake output
(vendor:10de device:1381 subv:3842 subd:2751), you have the version with ID "0x1381", which is probably the older GTX 750 model that has difficulties with the 525.xx series.

This is probably an upstream bug and should be reported to NVIDIA too (IMHO only the newer model is supported by the newer drivers). A further test would be to package an nvidia530 (530.41.03) locally and see whether it fails too.
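
To illustrate the distinction between the two device IDs, here is a minimal Python sketch that pulls the vendor and device IDs out of an lspcidrake-style fragment and looks up a preferred driver series. The table PREFERRED_SERIES, the function names and the default are hypothetical and only for illustration; the real mapping lives in ldetect-lst and is not reproduced here, and the entry for 0x1407 only encodes the guess that the newer model works with the 525.xx series.

```python
import re

# Hypothetical lookup table, for illustration only; the real mapping lives in
# ldetect-lst. Keys are (vendor, device) PCI IDs as lowercase hex strings.
PREFERRED_SERIES = {
    ("10de", "1381"): "nvidia470",       # older GTX 750, per this report
    ("10de", "1407"): "nvidia-current",  # newer GTX 750 (assumption)
}


def parse_pci_ids(text):
    """Extract (vendor, device) from text like 'vendor:10de device:1381 ...'."""
    match = re.search(r"vendor:([0-9a-fA-F]{4})\s+device:([0-9a-fA-F]{4})", text)
    if match is None:
        raise ValueError("no vendor/device IDs found")
    return match.group(1).lower(), match.group(2).lower()


def preferred_series(text, default="nvidia-current"):
    """Return the driver series for the card described by `text`.

    Falling back to `default` for unknown cards is an assumption of this sketch.
    """
    return PREFERRED_SERIES.get(parse_pci_ids(text), default)


ids_line = "(vendor:10de device:1381 subv:3842 subd:2751)"
print(parse_pci_ids(ids_line))
print(preferred_series(ids_line))
```

With the IDs quoted in this comment, the lookup selects nvidia470 for the reporter's card.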

CC: (none) => ghibomgx

Comment 2 Giuseppe Ghibò 2023-05-07 13:54:57 CEST
Created attachment 13811
fix for nvidia470 to avoid conflicts with nvidia-current

Here is a suggested patch for the nvidia470 SPEC to avoid conflicts with nvidia-current when the latter is already installed and one switches to another series in drakx11.
Comment 3 Giuseppe Ghibò 2023-05-07 14:05:08 CEST
Created attachment 13812
patch to move gtx 750 to nvidia470

As a reminder, here is a patch to move the GTX 750 (older device ID 0x1381) to the nvidia470 driver.

The resulting binary package can be tested here: https://copr-be.cloud.fedoraproject.org/results/ghibo/mageia9-bonus/mageia-cauldron-x86_64/05893859-ldetect-lst/
Comment 4 Giuseppe Ghibò 2023-05-07 14:06:38 CEST
Created attachment 13813
patch for spec file for GTX 750 and nvidia470

Reminder for the SPEC file (for the final release it should be moved to the upstream ldetect-lst git).
Comment 5 Thomas Backlund 2023-05-07 15:00:27 CEST
(In reply to Giuseppe Ghibò from comment #2)
> Created attachment 13811 [details]
> fix for nvidia470 to avoid conflicts with nvidia-current
> 
> Here is a suggested patch for nvidia470 SPEC to avoid conflicts with
> nvidia-current when the latter is previously installed and one switch to
> another series in drakx11.

That means they would not be parallel-installable anymore.

Technically, nvidia470 should only conflict with nvidia-current <= the nvidia470 version, as that is the real conflict.
Comment 6 Morgan Leijström 2023-05-07 15:12:56 CEST
So no conflict is wanted, because there may be more than one GPU installed.

But drakx11 must be able to switch drivers at the user's request.
Comment 7 Giuseppe Ghibò 2023-05-07 15:22:55 CEST
(In reply to Thomas Backlund from comment #5)

> (In reply to Giuseppe Ghibò from comment #2)
> > Created attachment 13811 [details]
> > fix for nvidia470 to avoid conflicts with nvidia-current
> > 
> > Here is a suggested patch for nvidia470 SPEC to avoid conflicts with
> > nvidia-current when the latter is previously installed and one switch to
> > another series in drakx11.
> 
> that means they would not be parallell installable anymore..
> 
> tchnically nvidia470 should only conflict nvidia-current <= nvidia470
> version as that is the real conflict

Isn't the case where the version is older already a conflict in the current package?

But what I wonder is: did parallel installation ever work? E.g. the nvidia470 dkms modules can coexist with the nvidia-current dkms modules (they currently do not conflict in the spec file), but in an installation with two different NVIDIA cards, can they be modprobed together? Likewise for the libraries: x11-driver-video-nvidia-current and x11-driver-video-nvidia470 provide the same major version of the libraries, so which one should be used in a two-card installation? (The same goes for nvidia-current-utils and nvidia470-utils.)

As an alternative to adding conflicts to the packages (things would get more complicated if we introduced a 3rd or 4th series, e.g. nvidia530: every package would then have to conflict with all of the others, which would be a mess), we might introduce an option or just a script to clean up the nvidia-current (or nvidia470) installation just before installing another series.
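
To put a number on the "mess" of pairwise conflicts, here is a rough Python sketch that counts how many unversioned Conflicts lines would be needed as series are added. The three subpackage patterns are taken from package names mentioned in this thread; the real spec files have more subpackages, nvidia530 is only the hypothetical third series from above, and the function name is made up for this example.

```python
from itertools import permutations

# Subpackage name patterns taken from this thread; "{s}" is the series name.
# The real spec files contain more subpackages than these three.
SUBPACKAGES = ["dkms-{s}", "{s}-utils", "x11-driver-video-{s}"]


def conflict_lines(series):
    """Return (package, Conflicts line) pairs for every ordered pair of series."""
    lines = []
    for owner, other in permutations(series, 2):
        for pattern in SUBPACKAGES:
            package = pattern.format(s=owner)
            lines.append((package, "Conflicts: " + pattern.format(s=other)))
    return lines


for series in (["nvidia470", "nvidia-current"],
               ["nvidia470", "nvidia-current", "nvidia530"]):
    print(len(series), "series ->", len(conflict_lines(series)), "Conflicts lines")
```

With these three subpackages, two series need 6 lines and three series need 18, so the count grows with n*(n-1) rather than with n.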
Comment 8 Thomas Backlund 2023-05-07 16:37:46 CEST
(In reply to Giuseppe Ghibò from comment #7)
> (In reply to Thomas Backlund from comment #5)
> 
> > (In reply to Giuseppe Ghibò from comment #2)
> > > Created attachment 13811 [details]
> > > fix for nvidia470 to avoid conflicts with nvidia-current
> > > 
> > > Here is a suggested patch for nvidia470 SPEC to avoid conflicts with
> > > nvidia-current when the latter is previously installed and one switch to
> > > another series in drakx11.
> > 
> > that means they would not be parallell installable anymore..
> > 
> > tchnically nvidia470 should only conflict nvidia-current <= nvidia470
> > version as that is the real conflict
> 
> Isn't when the version is older already conflicting in the current version.
> 

No.

In mga8 we currently have nvidia-current-470.161.03-1.mga8.nonfree

That R470 branch is now packaged as nvidia470 packages in mga9 and nvidia-current is updated to R525 branch, currently 525.116.03

The current R470 driver in mga9 is now at: 470.182.03-2.mga9

So for upgrades to work from mga8 -> mga9 you need to inform rpm of that branch renaming change, so nvidia470.spec should have:

Conflicts: <matching nvidia-current (sub)package> <= %{version}-%{release} 

in proper places...
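
As a rough illustration of what such a versioned Conflicts achieves, the Python sketch below checks which installed nvidia-current versions fall under "<= %{version}-%{release}" of the mga9 nvidia470 package. The comparison is a deliberate simplification (dotted numeric version plus the leading integer of the release) and is not rpm's real version comparison; the function names are hypothetical.

```python
import re


def evr_key(evr):
    """Turn 'version[-release]' into a sortable key.

    Simplification: only the dotted numeric version and the leading integer
    of the release are compared; this is NOT rpm's real comparison algorithm.
    """
    version, _, release = evr.partition("-")
    numbers = tuple(int(part) for part in version.split("."))
    leading = re.match(r"\d+", release)
    return numbers, int(leading.group()) if leading else 0


def hits_conflict(installed_evr, limit_evr):
    """True if an installed package matches 'Conflicts: name <= limit_evr'."""
    return evr_key(installed_evr) <= evr_key(limit_evr)


limit = "470.182.03-2.mga9"  # version-release of nvidia470 in mga9
print(hits_conflict("470.161.03-1.mga8", limit))  # mga8 nvidia-current
print(hits_conflict("525.116.03", limit))         # mga9 nvidia-current
```

Under these assumptions the old mga8 nvidia-current (R470 branch) matches the conflict, while the mga9 nvidia-current (R525 branch) does not, so only the renamed branch is affected.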



> But what I wonder is do parallel installation ever worked? 

Oh, they have worked for eons at least...
We even shipped live ISOs with at least 3 different nvidia branches installed...

Of course, some of it has probably been broken by your rewriting of the nvidia spec files.


> E.g. nvidia470
> dkms modules can coexist with dkms nvidia-current (actually not conflicting
> in the spec file) but in an installation with two different nvidia cards can
> they be modprobed together? 

Yes, when they support different hw they can. If they support the same hw, then the driver that loads first will claim the hw.

> Ditto for the libraries
> x11-driver-video-nvidia-current and x11-driver-video-nvidia470 are providing
> same major version of libraries, which one to take in a two card
> installation? (or for nvidia-current-utils and nvidia470-utils too).
> 

Well, maybe not all packages can be parallel-installable anymore due to how they are now packaged, but at least the core bits need to be...

For example, with 2 different nvidia GPUs in the system, you might want to run one for display stuff and the other maybe for CUDA stuff, in which case you need to be able to install dkms-nvidia* for both, plus the matching CUDA packages...

We even fixed a packaging bug due to that example for mga7 or 8, so it's not a theoretical use case...

> An alternative instead of trying to add conflicts to packages (as things
> would get more complicate if we would introduce a 3rd or a 4th series, e.g.
> suppose by example nvidia530, then you would have to conflicts all of them
> in each package which would be a mess) we might introduce an option or just
> a script to cleanup nvidia-current (or nvidia470) installation just before
> installing another series.


Please no, no "script hacks" to "avoid" proper packaging
Comment 9 Morgan Leijström 2023-05-07 23:47:58 CEST
Setting for errata for now...

Keywords: (none) => FOR_ERRATA9

Comment 10 Giuseppe Ghibò 2023-05-08 14:59:27 CEST
Created attachment 13818
fix for nvidia470 for upgrading from mga8
Comment 11 Giuseppe Ghibò 2023-05-11 11:34:20 CEST
(In reply to Thomas Backlund from comment #8)

>
> Oh, they have atleast worked for eons...  
> we even shipped live isos with atleast 3 different nvidia branches installed
> ...
> 
> of course some of it has probably been broken by  your rewriting of the
> nvidia spec files

You're right. The side-by-side installation of multiple package versions (e.g. for the x11 libs/modules) was lost during my spec rewrite in 2020 for mga8, at the time of the GLVND migration.

> 
> 
> > E.g. nvidia470
> > dkms modules can coexist with dkms nvidia-current (actually not conflicting
> > in the spec file) but in an installation with two different nvidia cards can
> > they be modprobed together? 
> 
> Yes, when they support different hw they can. if they support the same hw,
> then the driver loading first will claim the hw.
> 
> > Ditto for the libraries
> > x11-driver-video-nvidia-current and x11-driver-video-nvidia470 are providing
> > same major version of libraries, which one to take in a two card
> > installation? (or for nvidia-current-utils and nvidia470-utils too).
> > 
> 
> well, not all packages can maye get parallell installable anymore due to how
> they are now packaged, but atleast core bits need to be...
> 
> for example with 2 different nvidia gpus in the system, you might want to
> run one for display stuff, and the other maybe for cuda stuff, in which case
> you need to be able to install dkms-nvidia* for both, and  macching cuda
> packages...

Actually the dkms-nvidia* packages can be installed side-by-side (but not the x11-driver-video-* ones).

I attached the spec with the versioned conflicts. However, IMHO, it would be better if the Conflicts were unversioned (with the same, mirrored, added to nvidia-current), because that way we can switch from one set to the other directly from drakx11 (e.g. from "GeForce 635 to GeForce 920" to "GeForce 735 and later" and vice versa, so that drakx11 will uninstall one set and install the other).
Comment 12 Morgan Leijström 2023-05-11 12:24:21 CEST
> so that drakx11 will uninstall one set and install the other

But what if you have two GPUs that need different drivers?

Probably not common, and I don't know how that ever worked with drakx11.
Comment 13 Giuseppe Ghibò 2023-05-11 17:13:45 CEST
(In reply to Morgan Leijström from comment #12)
> > so that drakx11 will uninstall one set and install the other
> 
> But what if you have two GPU that need different drivers?
> 
> Probably not common, and I dont know how that ever worked with drakx11.

From what I could test with the pre-GLVND packages (e.g. in mga5, 6, 7), in the typical scenario of an old NVIDIA card for display (with an older driver) and a more recent card as a compute engine, e.g. for CUDA/HPC (e.g. a Tesla card), with a newer and different driver, I was never able to get them working together, at least with our standard packages and for the basics in one card. In the end only one card was working. I can't say a priori that it's impossible with some further tweaks, but with everything I tried I wasn't able to get them working together. In the end you opted (for the display) for an NVIDIA card, even the cheapest possible or a used one, that is supported at the same driver level as the one for the compute engine.
Comment 14 Morgan Leijström 2023-05-11 17:21:38 CEST
So it is no regression if only one nvidia driver can be installed at a time.

So if people need more than one nvidia graphics card, the cards need to use the same driver.

What about one nvidia and one AMD?

(Trying to get this somewhat clear - we ought to have it documented somewhere)
Comment 15 Giuseppe Ghibò 2023-05-11 18:14:17 CEST
(In reply to Morgan Leijström from comment #14)

> So no regression if only one nvidia driver can be installed at the same time.
> 
> So if people need more than one nvidia graphics card, they need to be using
> use same driver.

From what I tried, that is so. Maybe some further tweak exists that would make it possible, but I have exhausted my attempts.

Here they say it's not possible on any OS:

https://forums.developer.nvidia.com/t/setting-up-two-different-drivers-for-two-different-graphics-card/42252/23

> 
> What about one nvidia and one AMD?
> 
> (Tryng to get his somewhat clear - we ought to have it documented somewhere)

Speaking of discrete cards (and without going into the complexity of iGPUs, Optimus, other hybrid graphics, etc.), an NVIDIA and an AMD card for what? Both for display? One as a compute engine with CUDA (e.g. NVIDIA for CUDA and AMD for display/graphics)? Both as two different compute engines (e.g. NVIDIA with CUDA, and AMD with OpenCL, maybe with ROCm installed from extra repositories outside the mga packages)? Or to use PCI GPU passthrough, e.g. one card for Linux and the other card reserved for another, emulated OS that uses it with native drivers? Multiple displays? Xdmx?

AFAIK drakx11 configures only one card at a time and doesn't cycle through all the recognized graphics cards to configure them together. But doing it manually should be a matter of adding a "Device" section to xorg.conf with a Driver/BusID pair for each card, for two displays.

For better documentation, each scenario would need a) a real test by someone who owns both pieces of hardware, b) an example.
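
As a starting point for such an example, here is a small Python sketch that renders one xorg.conf "Device" section per card with a Driver/BusID pair. The identifiers, drivers and bus IDs are placeholders and not taken from any real system, and this is untested on real two-card hardware; a complete configuration would probably also need matching Screen and ServerLayout sections, which are left out.

```python
def device_section(identifier, driver, bus_id):
    """Render one xorg.conf "Device" section as text."""
    return "\n".join([
        'Section "Device"',
        f'    Identifier "{identifier}"',
        f'    Driver "{driver}"',
        f'    BusID "{bus_id}"',
        "EndSection",
    ])


# Hypothetical cards: identifiers, drivers and bus IDs are placeholders.
cards = [
    ("device1", "nvidia", "PCI:1:0:0"),
    ("device2", "amdgpu", "PCI:2:0:0"),
]

# One section per card, separated by a blank line.
print("\n\n".join(device_section(*card) for card in cards))
```

The actual bus IDs would have to be read from the machine in question (e.g. from lspci output) before pasting the result into xorg.conf.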
Comment 16 Morgan Leijström 2023-05-11 20:43:52 CEST
So for now let's just have drakx11 be good at configuring one card.

And try not to get in the way of users who want to do something more.
Comment 17 Giuseppe Ghibò 2023-05-11 21:01:08 CEST
(In reply to Morgan Leijström from comment #16)
> So for now lets just drakx11 be good at configuring one card.
> 
> And try to not be in the way for users who try to do something more.

You might try to get a cheap AMD card to pair with your current NVIDIA one, and see what happens. It uses the free driver, so there shouldn't be any driver package conflicts (maybe not an RTX 4090 paired with a 7900XTX, because both are 3X in size and together require at least 6 free slots of space: 3.5 for the 4090 and 2.5 for the 7900XTX).
Comment 18 Giuseppe Ghibò 2023-05-17 21:57:54 CEST
I uploaded a version with explicit conflicts between nvidia470 and nvidia-current in nonfree/updates_testing. Now selecting one series (e.g. nvidia470) in drakx11 would uninstall the other series if installed (e.g. nvidia-current).
Comment 19 Giuseppe Ghibò 2023-05-25 19:04:27 CEST
With the current version in nonfree/updates_testing it should be possible to switch from nvidia470 to nvidia-current and vice versa directly from drakx11, at least for cards that are supported by both drivers. Please test it too.
Comment 20 Morgan Leijström 2023-05-26 15:15:43 CEST
System is using 470 (selected manually, see comment 0)

System is updated.

Now testing by simply selecting the default nvidia-current, I get conflicts, i.e.

 nvidia-current-utils-525.116.04-1.mga9.nonfree.x86_64
conflicts
 nvidia470-utils-470.182.03-2.mga9.nonfree.x86_64
Comment 21 Giuseppe Ghibò 2023-05-26 15:16:54 CEST
Have you enabled the nonfree/updates_testing repo?
Comment 22 Giuseppe Ghibò 2023-05-26 15:27:03 CEST
Have you installed ldetect-lst-0.6.50-1.mga9?
Comment 23 Morgan Leijström 2023-05-26 17:34:16 CEST
duh  :)  Yes forgot _testing repo

Updated now to the 470.182.03-3 version packages.

> Have you installed ldetect-lst-0.6.50-1.mga9?

Yep, now I have - and now "NVIDIA GeForce 635 to GeForce 920" is preselected,
which is nvidia470 and correct for at least my version of the GTX750.

Rebooted to verify that the new 470 version is really OK.


Again under Plasma, in drakx11 (from MCC),
I (now manually) selected "745 series and later", and it seems to work.

Rebooted, working extremely well!?

Installed is a mix of 470 and 525:
$ rpm -qa | grep nvidia
lib64nvidia-egl-wayland1-1.1.11-1.mga9
dkms-nvidia470-470.182.03-3.mga9.nonfree
nvidia470-doc-html-470.182.03-3.mga9.nonfree
dkms-nvidia-current-525.116.04-2.mga9.nonfree
nvidia-current-utils-525.116.04-2.mga9.nonfree
nvidia-current-doc-html-525.116.04-2.mga9.nonfree
nvidia-current-cuda-opencl-525.116.04-2.mga9.nonfree
x11-driver-video-nvidia-current-525.116.04-2.mga9.nonfree



In the journal I see 525 is loaded:
kernel: nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  525.116.04  Thu Apr 27 17:57:02 UTC 2023

Both are installed:
dkms-autorebuild.sh[1170]: nvidia470 (470.182.03-3.mga9.nonfree): Already installed on this kernel.
dkms-autorebuild.sh[1170]: nvidia-current (525.116.04-2.mga9.nonfree): Already installed on this kernel.

$ inxi -G
Graphics:
  Device-1: NVIDIA GM107 [GeForce GTX 750] driver: nvidia v: 525.116.04
  Display: x11 server: X.org v: 1.21.1.8 with: Xwayland v: 22.1.9 driver: X:
    loaded: nvidia,v4l gpu: nvidia resolution: 3840x2160~60Hz
  API: OpenGL v: 4.6.0 NVIDIA 525.116.04 renderer: NVIDIA GeForce GTX
    750/PCIe/SSE2

In xorg.conf, BoardName seems wrong... Is it set by the default selection and not by my manual selection of "745 series and later"? Or is it old and not updated?
Section "Device"
    Identifier "device1"
    BoardName "NVIDIA GeForce 635 to GeForce 920"
    Driver "nvidia"
    Option "DPMS"
    Option "DynamicTwinView" "false"
    Option "AddARGBGLXVisuals"
EndSection



Anyway, I assume it is really using the 525 driver.

 ** BUT! **
This is a surprise: now the 525 driver works well with this card!
As well as 470 does. Or is it really using 470 somehow?

Maybe 525 really is OK now with this GTX750, with the newer kernel and possibly other updates since the negative test result in comment 0.

If so, that change in ldetect-lst should be reverted.

And we need not open a bug upstream.

I will keep running 525 and report any problems.

For starters, I will remove dkms-nvidia470 and reboot.
Comment 24 Giuseppe Ghibò 2023-05-26 17:45:18 CEST
Yes, I have not added conflicts between dkms-nvidia-current and dkms-nvidia470, so you are basically using the kernel modules of nvidia470 but the libraries (glx, X11) of 525.xx. Of course, if you install both dkms packages, only one will be chosen (and it will be a mess). At this point it is probably better to also add conflicts between dkms-nvidia470 and dkms-nvidia-current, so as to have a cleaner installation and no strange drift.

nvidia470-doc-html-470.182.03-3.mga9.nonfree does not conflict because in theory you can install it side-by-side with nvidia-current-doc-html without problems; in the end it's just documentation. But to keep things cleaner they could conflict too.
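
A quick way to spot this kind of mixed installation is to compare which series the dkms packages and the X11 driver packages come from. The Python sketch below does that on the package list pasted in comment 23; the helper name series_for is made up for this example, and only the two series discussed in this report are considered.

```python
SERIES = ("nvidia470", "nvidia-current")  # the two series in this report


def series_for(package_names, prefix):
    """Return the series that have a package named '<prefix><series>-...'."""
    return {series
            for series in SERIES
            for name in package_names
            if name.startswith(prefix + series + "-")}


# Output of "rpm -qa | grep nvidia" as pasted in comment 23.
installed = [
    "lib64nvidia-egl-wayland1-1.1.11-1.mga9",
    "dkms-nvidia470-470.182.03-3.mga9.nonfree",
    "nvidia470-doc-html-470.182.03-3.mga9.nonfree",
    "dkms-nvidia-current-525.116.04-2.mga9.nonfree",
    "nvidia-current-utils-525.116.04-2.mga9.nonfree",
    "nvidia-current-doc-html-525.116.04-2.mga9.nonfree",
    "nvidia-current-cuda-opencl-525.116.04-2.mga9.nonfree",
    "x11-driver-video-nvidia-current-525.116.04-2.mga9.nonfree",
]

kernel = series_for(installed, "dkms-")
userspace = series_for(installed, "x11-driver-video-")
print("kernel modules:", sorted(kernel))
print("X11 driver:", sorted(userspace))
if kernel != userspace:
    print("warning: kernel modules and X11 driver come from different series")
```

For that list the warning is printed, because dkms packages from both series are installed while the X11 driver comes only from nvidia-current.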
Comment 25 Morgan Leijström 2023-05-26 18:23:51 CEST
> so you are basically using the the kernel modules of nvidia470
> but the libraries (glx, X11) of 525.xx.

Ah. And that happened to work well.

Tested: With no 470 package, it boots to a black screen instead of the DM.

> probably is better to add the conflicts also between the dkms-nvidia470 and
> dkms-nvidia-current so to have a cleaner installation and not strange derives.

Yep...

> For nvidia470-doc-html-470.182.03-3.mga9.nonfree they don't conflicts because 
> in theory you can install with nvidia-current-doc-html side-by-side without
> conflicting, in the end it's just documentation. But for keeping things
> cleaner they could conflict too.

Clean and consistent :)
Comment 26 Morgan Leijström 2023-05-26 18:39:15 CEST
I am back on 470-only packages, so I can test switching again when you have added more conflicts.

Do you also have conflicts on the -lib32 and -devel packages?
(I don't know, I did not have them installed)
Comment 27 Giuseppe Ghibò 2023-05-27 19:51:57 CEST
The problem with having both dkms-nvidia-current and dkms-nvidia470 installed is that the right main kernel module (nvidia470.ko) gets loaded (because it is covered by a proper /etc/modprobe.d/display-driver.conf that corresponds to the right driver), but the other modules, like nvidia-drm.ko (which are called from the first one), might not be the right ones, and that might give unpredictable results.

The problem with adding dkms-* conflicts is that installing one package will uninstall the other one while it is running, like uninstalling a running kernel.

So the problem remains on one side or the other, and the result is not robust: if you don't add any dkms conflicts, you need a cleanup mechanism that later uninstalls the non-matching version of the dkms-* modules.

Another alternative is to also rename the other nvidia submodules (i.e. nvidia-drm, nvidia-uvm, etc.) after the package version. But that needs to be explored in depth (assuming it is even possible with the internal calls).
Comment 28 Morgan Leijström 2023-06-21 08:31:30 CEST
Per earlier tests in another thread this seems to work.

Resolution: (none) => FIXED
Status: NEW => RESOLVED
Keywords: FOR_ERRATA9 => (none)

