Bug 32646 - Bring ROCm HIP
Status: NEW
Alias: None
Product: Mageia
Classification: Unclassified
Component: New RPM package request
Version: 9
Hardware: All Linux
Priority: Normal, Severity: normal
Target Milestone: ---
Assignee: christian barranco
QA Contact:
URL:
Whiteboard:
Keywords: advisory
Depends on:
Blocks:
 
Reported: 2023-12-22 16:29 CET by christian barranco
Modified: 2024-03-24 21:01 CET

See Also:
Source RPM: rocm-hip, python-cppheaderparser
CVE:
Status comment:


Attachments
mixbench opencl test for RX6600 GPU (6.78 KB, text/plain)
2024-01-06 23:56 CET, christian barranco
mixbench HIP test for RX6600 GPU (8.00 KB, text/plain)
2024-01-06 23:56 CET, christian barranco
mixbench opencl test for 7530U iGPU (6.76 KB, text/plain)
2024-01-06 23:57 CET, christian barranco
mixbench HIP test for 7530U iGPU (1.41 KB, text/plain)
2024-01-06 23:57 CET, christian barranco
strace of hipcc (72.94 KB, text/plain)
2024-01-10 01:00 CET, PC LX

christian barranco 2023-12-22 17:35:06 CET

CC: (none) => animtim, ezequiel_partida, ghibomgx, joselp, marja11

Comment 1 christian barranco 2023-12-22 18:26:54 CET Comment hidden (obsolete)

Assignee: chb0 => qa-bugs

Comment 2 christian barranco 2023-12-22 18:45:46 CET
How to test?

I welcome ideas.

One way is to install Blender.
Run Blender and go to the menu Edit > Preferences > System.
You should then be able to see your AMD GPU in the HIP section.
Marja Van Waes 2023-12-23 22:13:12 CET

Source RPM: (none) => rocm-hip, python-cppheaderparser

Comment 3 Marja Van Waes 2023-12-23 22:21:31 CET
Advisory from comment 1 added to SVN. Please remove the "advisory" keyword if it needs to be changed. It also helps when obsolete advisories are tagged as "obsolete"

Keywords: (none) => advisory

PC LX 2023-12-24 02:46:24 CET

CC: (none) => mageia

Comment 4 PC LX 2023-12-24 10:50:50 CET
I would like to give this a test, but what are the supported GPUs? From the AMD site it seems that only a few cards are supported. Is this correct?


From https://rocm.docs.amd.com/projects/radeon/en/latest/docs/compatibility.html
"""""""
GPU support matrix

ROCm Version : 5.7.0
Radeon Software for Linux® Version: 23.20.00.48
Supported AMD Radeon Hardware:
  AMD Radeon RX 7900 XTX
  AMD Radeon RX 7900 XT
  AMD Radeon PRO W7900
"""""""
Comment 5 christian barranco 2023-12-24 11:11:41 CET
Thanks PC LX for your support.
The comprehensive list of supported GPUs is unclear, but it is much bigger than the ones you listed; in my opinion, that list only covers the newest ones.

There is a long thread here:
https://github.com/ROCm/ROCm/issues/1714

providing, at the end, an interesting link from Debian that I will add to the description (sorry Marja, I will need to update the advisory):
https://salsa.debian.org/rocm-team/community/team-project/-/wikis/supported-gpu-list

I have tested it successfully with a RX6600.
christian barranco 2023-12-24 11:14:20 CET

Keywords: advisory => (none)

Comment 6 Marja Van Waes 2023-12-24 17:47:34 CET
(In reply to christian squidf from comment #5)

> 
> providing, at the end, an interesting link from Debian that I will add to
> the description (sorry Marja, I will need to update the advisory):
> https://salsa.debian.org/rocm-team/community/team-project/-/wikis/supported-
> gpu-list
> 

No problem, because you let me know and removed the advisory keyword :-)

The advisory in SVN has been updated with that link.

Keywords: (none) => advisory

Comment 7 Herman Viaene 2023-12-26 11:38:11 CET
List of rpm's please.

CC: (none) => herman.viaene

Comment 8 christian barranco 2023-12-27 10:39:45 CET Comment hidden (obsolete)
Comment 9 christian barranco 2023-12-29 10:35:16 CET
Successfully tested on:

- MSI Modern 15 B7M, Ryzen 5 7530U, iGPU AMD Barcelo vendor: Micro-Star MSI driver: amdgpu v: kernel arch: GCN-5 code: Vega
- Desktop computer with Ryzen 9 5900X, GPU AMD Navi 23 [Radeon RX 6600/6600 XT/6600M] vendor: Sapphire driver: amdgpu v: kernel arch: RDNA-2 code: Navi-2x

What other tests are still required?

CC: (none) => davidwhodgins, fri

Comment 10 Giuseppe Ghibò 2023-12-29 10:56:00 CET
(In reply to christian squidf from comment #9)
> Successfully tested on:
> 
> - MSI Modern 15 B7M, Ryzen 5 7530U, iGPU AMD Barcelo vendor: Micro-Star MSI
> driver: amdgpu v: kernel arch: GCN-5 code: Vega
> - Desktop computer with Ryzen 9 5900X, GPU AMD Navi 23 [Radeon RX 6600/6600
> XT/6600M] vendor: Sapphire driver: amdgpu v: kernel arch: RDNA-2 code:
> Navi-2x
> 
> What other tests are still required?

Just to confirm: under Navi 23, does HIP work (e.g. in Blender) without either amdgpupro-opencl-orca or amdgpupro-opencl-pal?
Comment 11 christian barranco 2023-12-29 11:46:51 CET
(In reply to Giuseppe Ghibò from comment #10)
> Just to know, under Navi 23, HIP works (e.g. in blender) without either any
> of amdgpupro-opencl-orca or amdgpupro-opencl-pal?

Correct, Blender works and there is no need for any of the proprietary drivers you mentioned. However, some libs from the ROCm stack will be installed; I listed them in the advisory.
Comment 12 Giuseppe Ghibò 2023-12-29 12:11:40 CET
(In reply to christian squidf from comment #11)
> Correct, blender works and there is no need of any of the proprietary
> drivers you mentioned. However, some libs from the ROCm stack will be
> installed; I listed them on the advisory.

Does that also apply to OpenCL? I quickly tried some time ago on a Navi 10 card (Radeon Pro, Navi generation), which as far as I remember should be well supported with the free driver only. However, without one of the above drivers (orca or pal, I can't remember which), clinfo doesn't show anything, and for instance this benchmark won't run: https://github.com/ekondis/mixbench
Comment 13 christian barranco 2023-12-29 12:42:11 CET
(In reply to Giuseppe Ghibò from comment #12)
> Does that apply also to opencl? As I quickly tried some time ago on a Navi
> 10  card (of a Radeon Pro Navi generation), which AFAIK remember should be
> well supported with free driver only, however without any of the above
> driver (orca or pal, can't remember), clinfo doesn't show anything, and for
> instance this bench: https://github.com/ekondis/mixbench won't run.

Hi. OpenCL should work now after installing rocm-amd-opencl.
Look at: https://bugs.mageia.org/show_bug.cgi?id=32580
Comment 14 christian barranco 2023-12-29 20:36:22 CET Comment hidden (obsolete)

Keywords: advisory => (none)

Comment 15 Marja Van Waes 2023-12-30 17:01:17 CET
No problem, Christian ;-)

Advisory updated to rocm-hip-5.7.1-1.1.mga9 SRPM

Keywords: (none) => advisory

Comment 16 Giuseppe Ghibò 2023-12-30 20:23:02 CET
I quickly tried installing just the rocm-amd-opencl and rocm-core packages and their deps (after removing and cleaning up any previous ROCm 5.5 packages), but I get weird results. In OpenCL, mixbench (see URL above) sometimes works and sometimes just hangs.

The biggest problems come with Blender, though. As soon as I open the menu Edit/Preferences/System and click on the HIP tab, the GPU is apparently recognized, but the kernel immediately crashes with this error:

[   70.416962] BUG: kernel NULL pointer dereference, address: 0000000000000260
[   70.416972] #PF: supervisor write access in kernel mode
[   70.416976] #PF: error_code(0x0002) - not-present page
...
[   70.417306]  ? __die+0x1f/0x70
[   70.417313]  ? page_fault_oops+0x159/0x450
[   70.417319]  ? update_load_avg+0x7e/0x780
[   70.417326]  ? srso_alias_return_thunk+0x5/0x7f
[   70.417335]  ? exc_page_fault+0x73/0x170
[   70.417341]  ? asm_exc_page_fault+0x22/0x30
[   70.417350]  ? amdgpu_gmc_set_pte_pde+0x1f/0x30 [amdgpu]
[   70.417534]  amdgpu_vm_cpu_update+0x8e/0x100 [amdgpu]
[   70.417721]  amdgpu_vm_ptes_update+0x2e8/0x8f0 [amdgpu]
[   70.417905]  amdgpu_vm_update_range+0x211/0x6f0 [amdgpu]
[   70.418088]  amdgpu_vm_bo_update+0x1f0/0x600 [amdgpu]
[   70.418270]  amdgpu_gem_va_ioctl+0x4bc/0x500 [amdgpu]
[   70.418447]  ? __pfx_amdgpu_bo_user_destroy+0x10/0x10 [amdgpu]
[   70.418618]  ? __pfx_amdgpu_gem_va_ioctl+0x10/0x10 [amdgpu]
[   70.418792]  drm_ioctl_kernel+0xc1/0x160 [drm]
[   70.418842]  drm_ioctl+0x24c/0x490 [drm]
Comment 17 christian barranco 2023-12-30 23:12:55 CET
Hi Giuseppe. What is your GPU model? Do you have Blender crashing?
For OpenCL, you just need to install rocm-amd-opencl and all deps will follow. 
Have you had to install rocm-core explicitly?
Comment 18 christian barranco 2023-12-30 23:15:43 CET
@PC LX: have you done any tests? Thanks
Comment 19 Giuseppe Ghibò 2023-12-30 23:30:15 CET
(In reply to christian squidf from comment #17)

> Hi Giuseppe. What is your GPU model? Do you have Blender crashing?
> For OpenCL, you just need to install rocm-amd-opencl and all deps will
> follow. 
> Have you had to install rocm-core explicitly?

I only tested quickly; I had ROCm 5.5 installed, which I used during the partial June/July tests. 5.5 was completely removed (also because the 5.5->5.7 update wasn't smooth, but 5.5 was probably incomplete). I had to install rocm-core explicitly because otherwise the deps didn't seem to be pulled in. Note that Blender doesn't crash, but as soon as you go into the edit/preferences/HIP tab, the GUI becomes totally unresponsive. As a further test, the crashes also occur with the OpenCL orca packages installed.

The CPU is a Ryzen 7 Pro 58xx with Cezanne integrated graphics. I don't remember such crashes during the June experiments with 5.5. It crashes with kernel 6.5.13-desktop-6.mga9 as well as with the upcoming 6.6.8 (almost 6.6.9).

Note that it doesn't freeze the machine. It simply hangs the desktop GUI partially, but the mouse continues to be responsive.

BTW, I suggest you try mixbench too (https://github.com/ekondis/mixbench); it also has a small code to compile with HIP. It should compile with cmake/make.
Comment 20 christian barranco 2024-01-01 17:21:37 CET
The behavior with Blender is weird. I have never seen this on the two machines I used for testing. One machine has an RX6600 GPU (Navi) and the other one has a Vega iGPU like yours.

ROCm 5.5 was built from a fork of ROCm so that it could be built with our LLVM version. OpenCL never worked and HIP might have partly worked.

ROCm 5.7.1 is now built from the upstream source, with its own LLVM version.
That is one possible explanation for why the update doesn't work.

I would recommend cleaning everything up and redoing the installation. If installing only rocm-hip (which pulls in all necessary deps) is not enough, something is not clean; the same goes for rocm-amd-opencl.

Please note that amdgpupro-opencl-pal is the proprietary version for your iGPU, not -orca.

Before installing again, to check your uninstall is complete:

rpm -qa | grep rocm 
should not return anything 

hsakmt should not be installed either

Folder /etc/OpenCL/vendors should be empty or should not exist
Likewise for /usr/lib64/rocm 

ls /usr/lib64/*amd* should not return anything linked to ROCm
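
The checks above could be scripted roughly as follows. This is only a sketch: `rocm_leftovers` is a made-up helper name, the optional root argument exists just so the function can be tried on a scratch directory, and the `*amd*` matches are only listed, because deciding whether they belong to ROCm still needs a human eye.

```shell
# Sketch only: rocm_leftovers is a hypothetical helper, not shipped by any package.
# Usage: rocm_leftovers [ROOT]   (ROOT defaults to the real root)
# Returns 0 when nothing from the checklist is found, 1 otherwise.
rocm_leftovers() {
    root="${1:-}"
    found=0

    # Package check: only for the real root, and only when rpm is available.
    if [ -z "$root" ] && command -v rpm >/dev/null 2>&1; then
        pkgs=$(rpm -qa | grep -E 'rocm|hsakmt' || true)
        if [ -n "$pkgs" ]; then
            echo "still installed:"
            echo "$pkgs"
            found=1
        fi
    fi

    # These two folders should be empty or should not exist.
    for dir in "$root/etc/OpenCL/vendors" "$root/usr/lib64/rocm"; do
        if [ -d "$dir" ] && [ -n "$(ls -A "$dir")" ]; then
            echo "not empty: $dir"
            found=1
        fi
    done

    # Listed for manual review only: not every *amd* file belongs to ROCm.
    for f in "$root"/usr/lib64/*amd*; do
        if [ -e "$f" ]; then
            echo "check by hand: $f"
        fi
    done

    return "$found"
}
```

Calling `rocm_leftovers` without arguments checks the installed system; a non-zero return means something from the list is still there.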


I will check mixbench when I have some free time.
Comment 21 Giuseppe Ghibò 2024-01-02 00:29:26 CET
(In reply to christian squidf from comment #20)
> The behavior with Blender is weird. I have never got such a thing on the 2
> machines I used for testing. One machine as a RX6600 GPU (Navi) and the
> other one as a Vega iGPU like yours. 

But is it at least recognized on your machines, or not?

> 
> Rocm 5.5 was made of a fork of ROCm to be able to build it with our LLVM
> version. OpenCL has never worked and HIP might have partly worked. 
> 
> ROCm 5.7.1 is now made of the upstream source, with its specific LLVM
> version. 
> It is one explanation why the update doesn’t work. 
> 
> I would recommend to clean everything up and to do again the installation.
> If installing only rocm-hip (which will trigger all necessary deps) is not
> enough, there is something not clean ; same with rocm-amd-opencl
> 
> Please do note amdgpupro-opencl-pal is the proprietary version for your
> iGPU; not -orca. 
> 
> Before installing again, to check your uninstall is complete:
> 

It was. However, with amdgpupro-opencl-pal the mixbench OpenCL test works, while with the free driver rocm-amd-opencl it hangs.

BTW, I think amdgpupro-opencl-pal should also add an explicit Conflicts with rocm-amd-opencl, as they are mutually exclusive.

What also causes Blender to hang is the installation of lib64rocm-hip-devel (which also pulls in lib64hsakmt1 and lib64hsakmt-devel).

Without that, HIP is not recognized in Blender. What I should try next is the whole monolithic upstream proprietary ROCm tree in /opt/rocm/hip, to see whether it behaves differently in comparison.
Comment 22 PC LX 2024-01-02 19:20:27 CET
Installation worked without issues but I'm having difficulties getting it to actually work.

I'm getting the following error when using hipcc to compile some HIP examples.

"""
clang-15: error: cannot find HIP runtime; provide its path via '--rocm-path', or pass '-nogpuinc' to build without HIP runtime
"""

According to hipconfig the HIP runtime is "rocclr" but I can't find that file anywhere in the testing packages or in the repositories.
"""
HIP_RUNTIME  : rocclr
"""


Is the HIP runtime missing or am I doing something wrong?



System: Mageia 9, x86_64, AMD Ryzen 5 5600G with Radeon Graphics using amdgpu driver.




# uname -a
Linux jupiter 6.5.13-desktop-6.mga9 #1 SMP PREEMPT_DYNAMIC Sun Dec 17 22:42:25 UTC 2023 x86_64 GNU/Linux
# lscpu | grep name
Model name:                         AMD Ryzen 5 5600G with Radeon Graphics
BIOS Model name:                    AMD Ryzen 5 5600G with Radeon Graphics          Unknown CPU @ 3.9GHz
# lspci | grep VGA
03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 24 [Radeon RX 6400/6500 XT/6500M] (rev c1)
0c:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Cezanne [Radeon Vega Series / Radeon Vega Mobile Series] (rev c9)
# urpmi rocm-hip
To satisfy dependencies, the following packages are going to be installed:
  Package                        Version      Release       Arch    
(medium "QA Testing (64-bit)")
  lib64hsakmt-devel              1.0.6        0.5.7.1.2.mg> x86_64  
  lib64rocm-compilersupport-dev> 5.7.1        1.mga9        x86_64  
  lib64rocm-hip-devel            5.7.1        1.1.mga9      x86_64  
  lib64rocm-hip5                 5.7.1        1.1.mga9      x86_64  
  lib64rocm-runtime-devel        5.7.1        1.1.mga9      x86_64  
  lib64rocm-runtime1             5.7.1        1.1.mga9      x86_64  
  rocm-hip                       5.7.1        1.1.mga9      x86_64  
(medium "Core Release")
  clang                          15.0.6       5.mga9        x86_64  
  lib64numa-devel                2.0.16       1.mga9        x86_64  
  lib64pciaccess-devel           0.17         1.mga9        x86_64  
  libstdc++-static-devel         12.3.0       3.mga9        x86_64  
  perl-URI-Encode                1.1.1        4.mga9        noarch  
(medium "Core Updates")
  lib64drm-devel                 2.4.116      2.mga9        x86_64  
114MB of additional disk space will be used.
26MB of packages will be retrieved.
Proceed with the installation of the 13 packages? (Y/n)
<SNIP>

$ export HIPCC_VERBOSE=10
$ export DEVICE_LIB_PATH=/usr/lib64/amdgcn/bitcode
$ hipconfig --full
HIP version  : 5.7.31921-

== hipconfig
HIP_PATH     : /usr
ROCM_PATH    : /usr
HIP_COMPILER : clang
HIP_PLATFORM : amd
HIP_RUNTIME  : rocclr
CPP_CONFIG   :  -D__HIP_PLATFORM_HCC__= -D__HIP_PLATFORM_AMD__= -I/usr/include -I/usr/lib64/clang/15.0.6
 

== hip-clang
HIP_CLANG_PATH   : /usr/bin
clang version 15.0.6 (Mageia 15.0.6-5.mga9)
Target: x86_64-mageia-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
LLVM (http://llvm.org/):
  LLVM version 15.0.6
  Optimized build.
  Default target: x86_64-mageia-linux-gnu
  Host CPU: znver3

  Registered Targets:
    aarch64    - AArch64 (little endian)
    aarch64_32 - AArch64 (little endian ILP32)
    aarch64_be - AArch64 (big endian)
    amdgcn     - AMD GCN GPUs
    arm        - ARM
    arm64      - ARM64 (little endian)
    arm64_32   - ARM64 (little endian ILP32)
    armeb      - ARM (big endian)
    bpf        - BPF (host endian)
    bpfeb      - BPF (big endian)
    bpfel      - BPF (little endian)
    nvptx      - NVIDIA PTX 32-bit
    nvptx64    - NVIDIA PTX 64-bit
    r600       - AMD GPUs HD2XXX-HD6XXX
    thumb      - Thumb
    thumbeb    - Thumb (big endian)
    wasm32     - WebAssembly 32-bit
    wasm64     - WebAssembly 64-bit
    x86        - 32-bit X86: Pentium-Pro and above
    x86-64     - 64-bit X86: EM64T and AMD64
hip-clang-cxxflags : HIP_PATH=/usr
HIP_PLATFORM=amd
HIP_COMPILER=clang
HIP_RUNTIME=rocclr
ROCM_PATH=/usr
HIP_ROCCLR_HOME=/usr
HIP_CLANG_PATH=/usr/bin
HIP_INCLUDE_PATH=/usr/include
HIP_LIB_PATH=/usr/lib
DEVICE_LIB_PATH=/usr/lib64/amdgcn/bitcode
HIP_CLANG_RT_LIB=/usr/lib64/clang/15.0.6/lib/linux
 -isystem "/usr/include" -O3 --hip-path="/usr"
hip-clang-ldflags  : HIP_PATH=/usr
HIP_PLATFORM=amd
HIP_COMPILER=clang
HIP_RUNTIME=rocclr
ROCM_PATH=/usr
HIP_ROCCLR_HOME=/usr
HIP_CLANG_PATH=/usr/bin
HIP_INCLUDE_PATH=/usr/include
HIP_LIB_PATH=/usr/lib
DEVICE_LIB_PATH=/usr/lib64/amdgcn/bitcode
HIP_CLANG_RT_LIB=/usr/lib64/clang/15.0.6/lib/linux
 -O3 --hip-path="/usr" --hip-link --rtlib=compiler-rt -unwindlib=libgcc

=== Environment Variables
PATH=/opt/emsdk:/opt/emsdk/upstream/emscripten:/opt/emsdk/node/14.18.2_64bit/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/usr/local/sbin:/usr/sbin:/usr/lib64/qt5/bin:/usr/lib64/qt6/bin:/home/pclx/bin
egrep: warning: egrep is obsolescent; using grep -E
HIPCC_VERBOSE=10
HIP_PATH=/usr
HIP_PLATFORM=amd

== Linux Kernel
Hostname     : jupiter
Linux jupiter 6.5.13-desktop-6.mga9 #1 SMP PREEMPT_DYNAMIC Sun Dec 17 22:42:25 UTC 2023 x86_64 GNU/Linux
LSB Version:    *
Distributor ID: Mageia
Description:    Mageia 9
Release:        9
Codename:       mga9

# hipcc --version
HIP version: 5.7.31921-
clang version 15.0.6 (Mageia 15.0.6-5.mga9)
Target: x86_64-mageia-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

$ rocm_agent_enumerator 
gfx000
gfx90c

$ strace -o /tmp/hipcc.strace hipcc -v -isystem "/usr/include" -O3 --hip-link --rtlib=compiler-rt -unwindlib=libgcc vectoradd_hip.cpp 
HIP_PATH=/usr
HIP_PLATFORM=amd
HIP_COMPILER=clang
HIP_RUNTIME=rocclr
ROCM_PATH=/usr
HIP_ROCCLR_HOME=/usr
HIP_CLANG_PATH=/usr/bin
HIP_INCLUDE_PATH=/usr/include
HIP_LIB_PATH=/usr/lib
DEVICE_LIB_PATH=/usr/lib64/amdgcn/bitcode
HIP_CLANG_RT_LIB=/usr/lib64/clang/15.0.6/lib/linux
clang version 15.0.6 (Mageia 15.0.6-5.mga9)
Target: x86_64-mageia-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-mageia-linux-gnu/10
Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-mageia-linux/12
Selected GCC installation: /usr/bin/../lib/gcc/x86_64-mageia-linux/12
Candidate multilib: .;@m64
Candidate multilib: 32;@m32
Selected multilib: .;@m64
clang-15: error: cannot find HIP runtime; provide its path via '--rocm-path', or pass '-nogpuinc' to build without HIP runtime
clang-15: error: cannot find HIP runtime; provide its path via '--rocm-path', or pass '-nogpuinc' to build without HIP runtime
Comment 23 christian barranco 2024-01-02 22:02:55 CET
(In reply to Giuseppe Ghibò from comment #21)
> (In reply to christian squidf from comment #20)
> > The behavior with Blender is weird. I have never got such a thing on the 2
> > machines I used for testing. One machine as a RX6600 GPU (Navi) and the
> > other one as a Vega iGPU like yours. 
> 
> but you get it recognized at least or not?
> 

Yes, HIP is recognized with these 2 configurations and I can use Blender without any crash.

> > 
> > Rocm 5.5 was made of a fork of ROCm to be able to build it with our LLVM
> > version. OpenCL has never worked and HIP might have partly worked. 
> > 
> > ROCm 5.7.1 is now made of the upstream source, with its specific LLVM
> > version. 
> > It is one explanation why the update doesn’t work. 
> > 
> > I would recommend to clean everything up and to do again the installation.
> > If installing only rocm-hip (which will trigger all necessary deps) is not
> > enough, there is something not clean ; same with rocm-amd-opencl
> > 
> > Please do note amdgpupro-opencl-pal is the proprietary version for your
> > iGPU; not -orca. 
> > 
> > Before installing again, to check your uninstall is complete:
> > 
> 
> It was, however with amdgpupro-opencl-pal the mixbench opencl works, while
> with free driver rocm-amd-opencl it hangs.

I will really need to test this then.

> 
> BTW, I think amdgpupro-opencl-pal should also add an explicit Conflicts with
> rocm-amd-opencl as they are mutually exclusives.
> 

Yes, it makes sense. I will adjust this.

> What causes also blender hangs is the installation of lib64rocm-hip-devel
> (which retrieve also lib64hsakmt1 and lib64hsakmt-devel).
> 

Have you cleaned up any /usr/lib64/libhsakmt.so* and /usr/include/hsakmt* and /usr/lib64/cmake/hsakmt* before updating to 5.7.1?
Comment 24 christian barranco 2024-01-02 22:18:05 CET
(In reply to PC LX from comment #22)
> Installation worked without issues but I'm having difficulties getting it to
> actually work.
> 
> I'm getting the following error when using hipcc to compile some HIP
> examples.
> 
> """
> clang-15: error: cannot find HIP runtime; provide its path via
> '--rocm-path', or pass '-nogpuinc' to build without HIP runtime
> """
> 
> According to hipconfig the HIP runtime is "rocclr" but I can't find that
> file anywhere in the testing packages or in the repositories.
> """
> HIP_RUNTIME  : rocclr
> """
Thank you for your thorough test. Actually, you are going further than I expected people to go :) I mean using it to compile programs; that is very good.

rocclr is part of the source used to build rocm-hip. I don't think it is the issue, but I could be wrong.
The thing is that the ROCm stack is built with an LLVM 17 version (package rocm-llvm).
I didn't want the rocm-llvm package to be pulled in when installing rocm-hip because it is a heavy one; I also don't think it is required "just" to run Blender with HIP.
But I might need to, because it looks like clang 15 cannot find HIP.

Could you share an example of a hipcc compilation (and how you ran it), so that I can test it?


BTW, does Blender work for you when you configure it with HIP?
Comment 25 PC LX 2024-01-03 17:14:33 CET
I never used blender so the tests are going to be the bare minimum.

Using blender with HIP results in a "BUG: kernel NULL pointer dereference".

WARNING: save all work before trying the following steps as the desktop can become unresponsive.

Steps to reproduce:

- Run in a terminal the command "dmesg --follow-new" to see the kernel messages;

- Run in another terminal the command "blender --debug-all";

- In Blender, go to menu "Edit" > menu "Preferences" > button "System" > button "HIP"

Blender will output:
"""
I0103 14:44:02.028831 27628 device.cpp:32] HIPEW initialization succeeded
I0103 14:44:02.030375 27628 device.cpp:38] Found HIPCC hipcc
I0103 14:44:02.040884 27628 device.cpp:175] Device has compute preemption or is not used for display.
I0103 14:44:02.040907 27628 device.cpp:178] Added device "AMD Radeon Graphics" with id "HIP_AMD Radeon Graphics_0000:0c:00".
"""

- After that, changing the blender preferences window size or closing the preferences window will result in a "BUG: kernel NULL pointer dereference" in dmesg.
  It may also cause the Xorg process, and as a consequence the desktop, to become unresponsive. It does not happen every time, but a few tries always ended up in an unresponsive desktop.
  The system can be connected to using ssh but the Xorg process can't even be SIGKILLed so a reboot is needed to get the desktop usable again.
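
When following dmesg as in the first step, the full oops is long. A small filter along these lines may help to pull out just the essentials. It is a sketch: `oops_summary` is a hypothetical name, and it assumes the x86_64 log format where the kernel-side RIP line starts with "RIP: 0010:".

```shell
# Sketch only: oops_summary is a hypothetical helper.
# Reads kernel log text on stdin; prints the first NULL-dereference BUG line
# and the first kernel-side RIP line that follows it, then stops.
oops_summary() {
    awk '
        !seen && /BUG: kernel NULL pointer dereference/ { seen = 1; print; next }
        seen && /RIP: 0010:/ { print; exit }
    '
}
```

It can be fed from a pipe, for example `dmesg | oops_summary`, and prints the BUG line followed by the RIP line that names the faulting kernel function.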

"""
$ dmesg --follow-new
[18844.852016] BUG: kernel NULL pointer dereference, address: 0000000000000188
[18844.852022] #PF: supervisor write access in kernel mode
[18844.852025] #PF: error_code(0x0002) - not-present page
[18844.852027] PGD 2ade26067 P4D 2ade26067 PUD 2c0557067 PMD 0 
[18844.852032] Oops: 0002 [#1] PREEMPT SMP NOPTI
[18844.852036] CPU: 3 PID: 23904 Comm: blender Not tainted 6.5.13-desktop-6.mga9 #1
[18844.852039] Hardware name: ASUS System Product Name/TUF GAMING B450-PLUS II, BIOS 3802 04/28/2022
[18844.852041] RIP: 0010:amdgpu_gmc_set_pte_pde+0x1f/0x30 [amdgpu]
[18844.852213] Code: 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 48 b8 00 f0 ff ff ff ff 00 00 48 21 c1 8d 04 d5 00 00 00 00 4c 09 c1 48 01 c6 <48> 89 0e 31 c0 e9 67 86 75 ec 0f 1f 80 00 00 00 00 90 90 90 90 90
[18844.852216] RSP: 0018:ffffab7718d57a28 EFLAGS: 00010206
[18844.852218] RAX: 0000000000000000 RBX: 0000000000200000 RCX: 0040000000000480
[18844.852220] RDX: 0000000000000000 RSI: 0000000000000188 RDI: ffff91a613880000
[18844.852222] RBP: ffffab7718d57b90 R08: 0040000000000480 R09: 0000000000200000
[18844.852224] R10: ffff91a7b2a88000 R11: 0000000000000009 R12: 0000000000200000
[18844.852225] R13: 0000000000000001 R14: 0000000000000188 R15: 0000000000000001
[18844.852227] FS:  00007ff3b2802800(0000) GS:ffff91ad1e4c0000(0000) knlGS:0000000000000000
[18844.852229] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[18844.852231] CR2: 0000000000000188 CR3: 0000000208e12000 CR4: 0000000000750ee0
[18844.852233] PKRU: 55555554
[18844.852235] Call Trace:
[18844.852237]  <TASK>
[18844.852242]  ? __die+0x1f/0x70
[18844.852246]  ? page_fault_oops+0x159/0x450
[18844.852250]  ? srso_alias_return_thunk+0x5/0x7f
[18844.852255]  ? newidle_balance+0x2ee/0x420
[18844.852261]  ? exc_page_fault+0x73/0x170
[18844.852265]  ? asm_exc_page_fault+0x22/0x30
[18844.852272]  ? amdgpu_gmc_set_pte_pde+0x1f/0x30 [amdgpu]
[18844.852418]  ? srso_alias_return_thunk+0x5/0x7f
[18844.852421]  amdgpu_vm_cpu_update+0x8e/0x100 [amdgpu]
[18844.852570]  amdgpu_vm_ptes_update+0x2e8/0x8f0 [amdgpu]
[18844.852717]  amdgpu_vm_update_range+0x211/0x6f0 [amdgpu]
[18844.852862]  amdgpu_vm_clear_freed+0x106/0x240 [amdgpu]
[18844.853006]  amdgpu_gem_va_ioctl+0x3ad/0x500 [amdgpu]
[18844.853147]  ? srso_alias_return_thunk+0x5/0x7f
[18844.853150]  ? __pfx_amdgpu_gem_va_ioctl+0x10/0x10 [amdgpu]
[18844.853289]  drm_ioctl_kernel+0xc1/0x160 [drm]
[18844.853321]  drm_ioctl+0x24c/0x490 [drm]
[18844.853347]  ? __pfx_amdgpu_gem_va_ioctl+0x10/0x10 [amdgpu]
[18844.853486]  ? update_blocked_averages+0x5bf/0x770
[18844.853489]  ? srso_alias_return_thunk+0x5/0x7f
[18844.853492]  ? srso_alias_return_thunk+0x5/0x7f
[18844.853495]  ? rebalance_domains+0xf3/0x3c0
[18844.853500]  amdgpu_drm_ioctl+0x4a/0x80 [amdgpu]
[18844.853632]  __x64_sys_ioctl+0x93/0xd0
[18844.853637]  do_syscall_64+0x3a/0x90
[18844.853641]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[18844.853645] RIP: 0033:0x7ff3c2321e58
[18844.853648] Code: 00 00 48 8d 44 24 08 48 89 54 24 e0 48 89 44 24 c0 48 8d 44 24 d0 48 89 44 24 c8 b8 10 00 00 00 c7 44 24 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 07 89 d0 c3 0f 1f 40 00 48 8b 15 81 ef 0c
[18844.853650] RSP: 002b:00007fffbcf4fdd8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[18844.853653] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ff3c2321e58
[18844.853655] RDX: 00007fffbcf4fe20 RSI: 00000000c0286448 RDI: 0000000000000007
[18844.853656] RBP: 00007fffbcf4fe20 R08: ffff800106200000 R09: 000000000000000e
[18844.853658] R10: 0000000000000021 R11: 0000000000000246 R12: 00000000c0286448
[18844.853659] R13: 0000000000000007 R14: 0000000000250000 R15: 0000000000000002
[18844.853664]  </TASK>
[18844.853665] Modules linked in: tls ip6t_REJECT nf_reject_ipv6 xt_comment ip6table_raw xt_recent ipt_IFWLOG ipt_psd xt_set ip_set_hash_ip ip_set xt_multiport xt_hashlimit xt_addrtype xt_mark xt_CT iptable_raw xt_NFLOG nfnetlink_log xt_LOG nf_log_syslog nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp nf_nat_sip nf_nat_pptp nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_pptp nf_conntrack_netlink nfnetlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp af_packet xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat ip6table_filter ip6_tables iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter binfmt_misc nls_utf8 nls_cp437 vfat fat intel_rapl_msr intel_rapl_common snd_hda_codec_realtek snd_hda_codec_generic uvcvideo snd_hda_codec_hdmi uvc videobuf2_vmalloc kvm_amd videobuf2_memops
[18844.853730]  snd_hda_intel videobuf2_v4l2 snd_intel_dspcfg snd_intel_sdw_acpi snd_usb_audio snd_hda_codec videodev snd_usbmidi_lib r8169 snd_ump videobuf2_common kvm snd_rawmidi snd_hda_core snd_seq_device eeepc_wmi snd_hwdep mc asus_wmi snd_pcm ledtrig_audio sparse_keymap rapl realtek platform_profile wmi_bmof snd_timer mdio_devres snd k10temp libphy soundcore i2c_piix4 wireguard curve25519_x86_64 input_leds libchacha20poly1305 chacha_x86_64 joydev poly1305_x86_64 libcurve25519_generic gpio_amdpt libchacha ip6_udp_tunnel tpm_crb udp_tunnel tpm_tis gpio_generic tpm_tis_core bridge stp llc evdev cfg80211 rfkill zram msr fuse loop configfs efivarfs dmi_sysfs ip_tables x_tables ipv6 crc_ccitt dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio hid_dr ff_memless dm_crypt trusted asn1_encoder tee tpm crc32_pclmul crc32c_intel polyval_clmulni polyval_generic gf128mul ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 aesni_intel ccp crypto_simd cryptd sp5100_tco sha1_generic xhci_pci xhci_pci_renesas amdgpu
[18844.853810]  i2c_algo_bit drm_ttm_helper ttm drm_suballoc_helper amdxcp iommu_v2 drm_buddy gpu_sched drm_display_helper drm_kms_helper video wmi drm cec rc_core dm_mirror dm_region_hash dm_log dm_mod vfio_pci vfio_pci_core irqbypass vfio_iommu_type1 vfio iommufd
[18844.853835] CR2: 0000000000000188
[18844.853838] ---[ end trace 0000000000000000 ]---
[18844.853839] RIP: 0010:amdgpu_gmc_set_pte_pde+0x1f/0x30 [amdgpu]
[18844.853985] Code: 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 48 b8 00 f0 ff ff ff ff 00 00 48 21 c1 8d 04 d5 00 00 00 00 4c 09 c1 48 01 c6 <48> 89 0e 31 c0 e9 67 86 75 ec 0f 1f 80 00 00 00 00 90 90 90 90 90
[18844.853987] RSP: 0018:ffffab7718d57a28 EFLAGS: 00010206
[18844.853990] RAX: 0000000000000000 RBX: 0000000000200000 RCX: 0040000000000480
[18844.853991] RDX: 0000000000000000 RSI: 0000000000000188 RDI: ffff91a613880000
[18844.853993] RBP: ffffab7718d57b90 R08: 0040000000000480 R09: 0000000000200000
[18844.853995] R10: ffff91a7b2a88000 R11: 0000000000000009 R12: 0000000000200000
[18844.853996] R13: 0000000000000001 R14: 0000000000000188 R15: 0000000000000001
[18844.853998] FS:  00007ff3b2802800(0000) GS:ffff91ad1e4c0000(0000) knlGS:0000000000000000
[18844.854000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[18844.854002] CR2: 0000000000000188 CR3: 0000000208e12000 CR4: 0000000000750ee0
[18844.854003] PKRU: 55555554
[18844.854005] note: blender[23904] exited with irqs disabled
"""
Comment 26 PC LX 2024-01-03 17:39:14 CET
(In reply to christian squidf from comment #24)
> Could you share an example (and how) of hipcc compilation, in order I test
> it?

I was not successful at compiling any example, but I think the info below should help.

Set DEVICE_LIB_PATH to the path in the package rocm-device-libs-5.7.1-1.mga9:

$ export DEVICE_LIB_PATH=/usr/lib64/amdgcn/bitcode

Get the examples from github:

$ git clone https://github.com/amd/rocm-examples.git

Go to one of the examples:

$ cd ./rocm-examples/Libraries/rocThrust/vectors

Edit the Makefile and change the line with ROCM_INSTALL_DIR to the following:

ROCM_INSTALL_DIR := /usr
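
The Makefile edit can also be scripted instead of done by hand. This is a minimal sketch, assuming GNU sed and that the Makefile assigns ROCM_INSTALL_DIR on a line of its own; the function name set_rocm_install_dir is made up for illustration and is not part of rocm-examples.

```sh
# Hypothetical helper: point ROCM_INSTALL_DIR in a Makefile at a given prefix.
# Assumes GNU sed (-i, -E) and an assignment starting at the beginning of a line
# (":=", "?=" or "=" are all accepted).
set_rocm_install_dir() {
    makefile=$1
    prefix=$2
    sed -i -E "s|^ROCM_INSTALL_DIR[[:space:]]*[?:]?=.*|ROCM_INSTALL_DIR := ${prefix}|" "$makefile"
}
```

It would be called as `set_rocm_install_dir Makefile /usr` before running make.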


Run make to try to compile the example. The following is the output I get.

$ make
/usr/bin/hipcc -std=c++17 -Wall -Wextra -I ../../../Common -isystem /usr/include -D__HIP_PLATFORM_AMD__   -o rocthrust_vectors main.hip 
clang-15: error: cannot find HIP runtime; provide its path via '--rocm-path', or pass '-nogpuinc' to build without HIP runtime
clang-15: error: cannot find HIP runtime; provide its path via '--rocm-path', or pass '-nogpuinc' to build without HIP runtime
make: *** [Makefile:48: rocthrust_vectors] Error 1


Run "hipconfig --runtime" to see what the runtime is called.

$ hipconfig --runtime
rocclr


That is as far as I got. Hope it helps.
Comment 27 christian barranco 2024-01-06 23:54:21 CET
Hi. Thanks for your tests, they gave me some food for thought.

First of all, I can reproduce the issue you both faced with Blender on hardware with an integrated GPU (Ryzen 5 7530U, iGPU Barcelo GCN-5 Vega).
My first test on the laptop might have been way too quick. I am not a Blender user.
Based on my reading, integrated GPUs might not be supported; yet to be confirmed.
https://github.com/ROCm/HIP/issues/3353
Or it might be a Blender issue? or both?

That being said, the HIP package was not fully ready, mainly because some environment variable settings were missing. I have just updated rocm-hip and I can now build mixbench-hip.
@Giuseppe: I don't think you have been able to build it so far. Have you?

I also managed to build mixbench-opencl. Actually, I have a package ready for both mixbench-opencl and mixbench-hip. I have built it locally with mock, but our build system doesn't like it. I am trying to sort out why on the dev mailing list.
If you want to build it as well, keep in mind that you will have to install rocm-llvm-static.


@PC LX: the rocm-examples suite seems buggy to me. It has many hardcoded paths and some tests require additional packages (like rocThrust).

So, following your path, I started to look at https://github.com/ROCm/hip-tests/tree/develop instead.
Here as well, additional HIP modules (like https://github.com/ROCm/HIPIFY) are sometimes required. I could build some more, but where to stop? What is the minimum needed? What are the users' needs?

I finally found a test which doesn't require additional packages:
https://github.com/ROCm/hip-tests/tree/develop/samples/1_Utils/hipInfo

Please note that I am now on ROCm 6.0.0. What I am writing below should work with 5.7.1 though. If not, apologies, and I will downgrade to test.
I am not updating our repo to 6.0.0 yet because the python migration does not seem to be over.

So, to build hipInfo, after cloning it or downloading https://github.com/ROCm/hip-tests/archive/refs/tags/rocm-5.7.0.tar.gz:
$ cd hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo
$ mkdir build && cd build
$ cmake -Wno-dev -S ".." -B "."
$ cmake --build "." --verbose
To test it, run:
$ ./hipInfo

With my external GPU, I get:
```
$ ./hipInfo
--------------------------------------------------------------------------------
device#                           0
Name:                             AMD Radeon RX 6600
pciBusID:                         9
pciDeviceID:                      0
pciDomainID:                      0
multiProcessorCount:              14
maxThreadsPerMultiProcessor:      2048
isMultiGpuBoard:                  0
clockRate:                        2750 Mhz
memoryClockRate:                  875 Mhz
memoryBusWidth:                   128
totalGlobalMem:                   7.98 GB
totalConstMem:                    2147483647
sharedMemPerBlock:                64.00 KB
canMapHostMemory:                 1
regsPerBlock:                     65536
warpSize:                         32
l2CacheSize:                      2097152
computeMode:                      0
maxThreadsPerBlock:               1024
maxThreadsDim.x:                  1024
maxThreadsDim.y:                  1024
maxThreadsDim.z:                  1024
maxGridSize.x:                    2147483647
maxGridSize.y:                    65536
maxGridSize.z:                    65536
major:                            10
minor:                            3
concurrentKernels:                1
cooperativeLaunch:                1
cooperativeMultiDeviceLaunch:     1
isIntegrated:                     0
maxTexture1D:                     16384
maxTexture2D.width:               16384
maxTexture2D.height:              16384
maxTexture3D.width:               16384
maxTexture3D.height:              16384
maxTexture3D.depth:               8192
isLargeBar:                       0
asicRevision:                     0
maxSharedMemoryPerMultiProcessor: 64.00 KB
clockInstructionRate:             1000.00 Mhz
arch.hasGlobalInt32Atomics:       1
arch.hasGlobalFloatAtomicExch:    1
arch.hasSharedInt32Atomics:       1
arch.hasSharedFloatAtomicExch:    1
arch.hasFloatAtomicAdd:           1
arch.hasGlobalInt64Atomics:       1
arch.hasSharedInt64Atomics:       1
arch.hasDoubles:                  1
arch.hasWarpVote:                 1
arch.hasWarpBallot:               1
arch.hasWarpShuffle:              1
arch.hasFunnelShift:              0
arch.hasThreadFenceSystem:        1
arch.hasSyncThreadsExt:           0
arch.hasSurfaceFuncs:             0
arch.has3dGrid:                   1
arch.hasDynamicParallelism:       0
gcnArchName:                      gfx1032
peers:                            
non-peers:                        device#0 

memInfo.total:                    7.98 GB
memInfo.free:                     7.93 GB (99%)
```
I have not been able to test it on my laptop with integrated GPU yet.


Being able to build mixbench-hip and hipInfo now makes me confident the settings are correct. The open question might still be: do I need to package more ROCm modules?


To summarize:

*With my external GPU RX6600 (arch: RDNA-2 code: Navi-2x):
-OpenCL works.
-Blender works with HIP activated.
-mixbench-opencl gives the attached file: mixbench-opencl-rx6600.txt
-mixbench-hip gives the attached file: mixbench-hip-rx6600.txt

*On my laptop, Ryzen 5 7530U, iGPU Barcelo GCN-5 Vega:
-OpenCL works
-Blender crashes with HIP activated (iGPU support issue?)
-mixbench-opencl gives the attached file: mixbench-opencl-7530U.txt
   I have run it multiple times. Sometimes, it takes longer to start or to complete. But if you wait 20 to 30 seconds, it goes until the end.
-mixbench-hip doesn't crash but does not compute: mixbench-hip-7530U.txt

Looking forward to your feedback. Thanks.
Comment 28 christian barranco 2024-01-06 23:56:25 CET
Created attachment 14252 [details]
mixbench opencl test for RX6600 GPU
Comment 29 christian barranco 2024-01-06 23:56:47 CET
Created attachment 14253 [details]
mixbench HIP test for RX6600 GPU
Comment 30 christian barranco 2024-01-06 23:57:20 CET
Created attachment 14254 [details]
mixbench opencl test for 7530U iGPU
Comment 31 christian barranco 2024-01-06 23:57:46 CET
Created attachment 14255 [details]
mixbench HIP test for 7530U iGPU
Comment 32 christian barranco 2024-01-07 00:01:08 CET
ADVISORY NOTICE PROPOSAL
========================
ROCm HIP 5.7.1


Description
Bring HIP support for AMD GPUs, thanks to the ROCm stack 5.7.1.
Insights on compatible GPUs at https://salsa.debian.org/rocm-team/community/team-project/-/wikis/supported-gpu-list
                
References
https://bugs.mageia.org/show_bug.cgi?id=32646
https://github.com/ROCm-Developer-Tools/HIP
https://github.com/ROCm-Developer-Tools/HIPCC
https://github.com/ROCm-Developer-Tools/clr
https://senexcanis.com/open-source/cppheaderparser/
https://salsa.debian.org/rocm-team/community/team-project/-/wikis/supported-gpu-list

SRPMS
9/core
  rocm-hip-5.7.1-1.2.mga9.src.rpm
  python-cppheaderparser-2.7.4-1.mga9.src.rpm


PACKAGES FOR QA TESTING
=======================
x86_64:

rocm-hip-5.7.1-1.2.mga9.x86_64.rpm
lib64rocm-hip-devel-5.7.1-1.2.mga9.x86_64.rpm
lib64rocm-hip5-5.7.1-1.2.mga9.x86_64.rpm
python3-cppheaderparser-2.7.4-1.mga9.noarch.rpm
lib64rocm-runtime-devel-5.7.1-1.1.mga9.x86_64.rpm
lib64rocm-runtime1-5.7.1-1.1.mga9.x86_64.rpm
lib64rocm-compilersupport-devel-5.7.1-1.mga9.x86_64.rpm
lib64rocm-compilersupport5.7.1-5.7.1-1.mga9.x86_64.rpm
rocm-device-libs-5.7.1-1.mga9.x86_64.rpm
lib64hsakmt-devel-1.0.6-0.5.7.1.2.mga9.x86_64.rpm
lib64hsakmt1-1.0.6-0.5.7.1.2.mga9.x86_64.rpm

Keywords: advisory => (none)

Comment 33 Marja Van Waes 2024-01-07 22:37:23 CET
The advisory has been updated with the rocm-hip-5.7.1-1.2.mga9 SRPM, i.e. the version in comment 32.

Keywords: (none) => advisory

Comment 34 PC LX 2024-01-08 16:26:04 CET
(In reply to christian squidf from comment #27)
> Base on my readings, it might be so integrated GPU are not supported... ;
> yet to be confirmed.
> https://github.com/ROCm/HIP/issues/3353
> Or it might be a Blender issue? or both?

At the very least there is a kernel bug. The kernel should not be doing NULL pointer dereferences even if there are bugs in the user binaries. Maybe reporting upstream would be helpful. I have never filed such a bug report but will look into it.
Comment 35 PC LX 2024-01-08 16:52:40 CET
(In reply to christian squidf from comment #27)
> @PC LX: the rocm-example suite seems buggy to me. It has many hardcoded path
> and some tests require additional packages (like rocThrust). 

Yes, the paths in the build script are incorrect (I had to fix them, as I mentioned in comment 26), but the errors I mentioned are not related to that.

> I finally found a test which doesn't require additional packages:
> https://github.com/ROCm/hip-tests/tree/develop/samples/1_Utils/hipInfo
> 
> Please, do note, I am now with ROCm 6.0.0. What I am writing below should
> work with 5.7.1 though. If not, apologies and I will downgrade to test.
> I am not updating to 6.0.0 yet our repo because of the python migration
> which seems not to be over yet.
> 
> So, to build hipInfo, after cloning it or downloading
> https://github.com/ROCm/hip-tests/archive/refs/tags/rocm-5.7.0.tar.gz:
> $ hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo
> $ cmake -Wno-dev -S ".." -B "."
> $ cmake --build "." --verbose
> To test it, run:
> $ ./hipInfo

I'm getting cmake errors (see below), so I will have to try to debug the issue.

$ find /usr/ -ipath '*/Findhip.cmake' 2> /dev/null
/usr/share/cmake/Modules/hip/FindHIP.cmake

I have it installed. Will have to investigate further.
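
Note that FindHIP.cmake is a find module, whereas a package configuration file for "hip" is conventionally named hip-config.cmake or hipConfig.cmake. A minimal sketch for locating such a file under a prefix follows; the function name find_hip_cmake_dir is hypothetical, and whether the installed packages ship the file at all is not something this sketch can tell in advance.

```sh
# Hypothetical helper: print the directory that holds hip's CMake package
# configuration file. $1 is the prefix to search (for example /usr).
# Prints nothing and returns non-zero when no such file is found.
find_hip_cmake_dir() {
    cfg=$(find "$1" \( -name 'hip-config.cmake' -o -name 'hipConfig.cmake' \) -type f 2>/dev/null | head -n 1)
    [ -n "$cfg" ] && dirname "$cfg"
}
```

If it prints a directory, that directory can be passed to cmake, e.g. `cmake -S .. -B . -Dhip_DIR="$(find_hip_cmake_dir /usr)"`.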


"""
CMake Warning at CMakeLists.txt:47 (find_package):
  By not providing "Findhip.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "hip", but
  CMake did not find one.

  Could not find a package configuration file provided by "hip" with any of
  the following names:

    hipConfig.cmake
    hip-config.cmake

  Add the installation prefix of "hip" to CMAKE_PREFIX_PATH or set "hip_DIR"
  to a directory containing one of the above files.  If "hip" provides a
  separate development package or SDK, be sure it has been installed.


-- Configuring done (0.0s)
CMake Error at CMakeLists.txt:58 (target_link_libraries):
  Target "hipInfo" links to:

    hip::host

  but the target was not found.  Possible reasons include:

    * There is a typo in the target name.
    * A find_package call is missing for an IMPORTED target.
    * An ALIAS target is missing.

"""
Comment 36 christian barranco 2024-01-08 21:52:10 CET
Hi PC LX. Have you updated rocm-hip to version 1.2?
Comment 37 PC LX 2024-01-08 23:10:53 CET
(In reply to christian squidf from comment #36)
> Hi PC LX. Have you updated rocm-hip to version 1.2?

Now I have.

"""
$ rpm -qa | grep rocm | sort
lib64rocm-compilersupport5.7.1-5.7.1-1.mga9
lib64rocm-compilersupport-devel-5.7.1-1.mga9
lib64rocm-hip5-5.7.1-1.2.mga9
lib64rocm-hip-devel-5.7.1-1.2.mga9
lib64rocm-opencl-runtime5.7-5.7.1-3.1.mga9
lib64rocm-opencl-runtime-devel-5.7.1-3.1.mga9
lib64rocm-runtime1-5.7.1-1.1.mga9
lib64rocm-runtime-devel-5.7.1-1.1.mga9
procmail-3.24-1.mga9
rocm-amd-opencl-5.7.1-3.1.mga9
rocm-clinfo-5.7.1-3.1.mga9
rocm-core-5.7.1-1.mga9
rocm-device-libs-5.7.1-1.mga9
rocm-hip-5.7.1-1.2.mga9
rocminfo-5.7.1-1.1.mga9
"""


After that, the cmake setup shows no errors but the cmake build does. See below.

"""
$ cmake -S .. -B . -Wno-dev
-- The C compiler identification is GNU 12.3.0
-- The CXX compiler identification is GNU 12.3.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
-- hip::amdhip64 is SHARED_LIBRARY
-- /usr/bin/c++: CLANGRT compiler options not supported.
-- Configuring done (0.4s)
-- Generating done (0.0s)
-- Build files have been written to: /home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build
$ cmake --build . --verbose
/usr/bin/cmake -S/home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo -B/home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build --check-build-system CMakeFiles/Makefile.cmake 0
/usr/bin/cmake -E cmake_progress_start /home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build/CMakeFiles /home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build//CMakeFiles/progress.marks
/usr/bin/gmake  -f CMakeFiles/Makefile2 all
gmake[1]: Entering directory '/home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build'
/usr/bin/gmake  -f CMakeFiles/hipInfo.dir/build.make CMakeFiles/hipInfo.dir/depend
gmake[2]: Entering directory '/home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build'
cd /home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build && /usr/bin/cmake -E cmake_depends "Unix Makefiles" /home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo /home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo /home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build /home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build /home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build/CMakeFiles/hipInfo.dir/DependInfo.cmake --color=
gmake[2]: Leaving directory '/home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build'
/usr/bin/gmake  -f CMakeFiles/hipInfo.dir/build.make CMakeFiles/hipInfo.dir/build
gmake[2]: Entering directory '/home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build'
[ 50%] Building CXX object CMakeFiles/hipInfo.dir/hipInfo.cpp.o
/usr/bin/hipcc -D__HIP_PLATFORM_AMD__=1 -D__HIP_PLATFORM_HCC__=1  -O3 -DNDEBUG -MD -MT CMakeFiles/hipInfo.dir/hipInfo.cpp.o -MF CMakeFiles/hipInfo.dir/hipInfo.cpp.o.d -o CMakeFiles/hipInfo.dir/hipInfo.cpp.o -c /home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/hipInfo.cpp
clang-15: error: cannot find ROCm device library; provide its path via '--rocm-path' or '--rocm-device-lib-path', or pass '-nogpulib' to build without ROCm device library
clang-15: error: cannot find HIP runtime; provide its path via '--rocm-path', or pass '-nogpuinc' to build without HIP runtime
clang-15: error: cannot find HIP runtime; provide its path via '--rocm-path', or pass '-nogpuinc' to build without HIP runtime
gmake[2]: *** [CMakeFiles/hipInfo.dir/build.make:76: CMakeFiles/hipInfo.dir/hipInfo.cpp.o] Error 1
gmake[2]: Leaving directory '/home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build'
gmake[1]: *** [CMakeFiles/Makefile2:83: CMakeFiles/hipInfo.dir/all] Error 2
gmake[1]: Leaving directory '/home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build'
gmake: *** [Makefile:91: all] Error 2
"""


Using ccmake I see the following:


"""
 AMDDeviceLibs_DIR                /usr/lib64/cmake/AMDDeviceLibs
 AMDGPU_TARGETS                   
 CMAKE_BUILD_TYPE                 
 CMAKE_INSTALL_PREFIX             /usr/local
 GPU_TARGETS                      
 HIPINFO_INTERNAL_BUILD           OFF
 ROCM_PATH                        /opt/rocm
 amd_comgr_DIR                    /usr/lib64/cmake/amd_comgr
 hip_DIR                          /usr/lib64/cmake/hip
 hsa-runtime64_DIR                /usr/lib64/cmake/hsa-runtime64
"""


At least ROCM_PATH looks wrong, so I tried setting it explicitly using the value from hipconfig.


"""
$ hipconfig | grep ROCM_PATH
ROCM_PATH    : /usr
$ cmake -S .. -B . -Wno-dev -DROCM_PATH=/usr
-- The C compiler identification is GNU 12.3.0
-- The CXX compiler identification is GNU 12.3.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
-- hip::amdhip64 is SHARED_LIBRARY
-- /usr/bin/c++: CLANGRT compiler options not supported.
-- Configuring done (0.4s)
-- Generating done (0.0s)
-- Build files have been written to: /home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build
$ cmake --build . --verbose
/usr/bin/cmake -S/home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo -B/home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build --check-build-system CMakeFiles/Makefile.cmake 0
/usr/bin/cmake -E cmake_progress_start /home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build/CMakeFiles /home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build//CMakeFiles/progress.marks
/usr/bin/gmake  -f CMakeFiles/Makefile2 all
gmake[1]: Entering directory '/home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build'
/usr/bin/gmake  -f CMakeFiles/hipInfo.dir/build.make CMakeFiles/hipInfo.dir/depend
gmake[2]: Entering directory '/home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build'
cd /home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build && /usr/bin/cmake -E cmake_depends "Unix Makefiles" /home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo /home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo /home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build /home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build /home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build/CMakeFiles/hipInfo.dir/DependInfo.cmake --color=
gmake[2]: Leaving directory '/home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build'
/usr/bin/gmake  -f CMakeFiles/hipInfo.dir/build.make CMakeFiles/hipInfo.dir/build
gmake[2]: Entering directory '/home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build'
[ 50%] Building CXX object CMakeFiles/hipInfo.dir/hipInfo.cpp.o
/usr/bin/hipcc -D__HIP_PLATFORM_AMD__=1 -D__HIP_PLATFORM_HCC__=1  -O3 -DNDEBUG -MD -MT CMakeFiles/hipInfo.dir/hipInfo.cpp.o -MF CMakeFiles/hipInfo.dir/hipInfo.cpp.o.d -o CMakeFiles/hipInfo.dir/hipInfo.cpp.o -c /home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/hipInfo.cpp
clang-15: error: cannot find ROCm device library; provide its path via '--rocm-path' or '--rocm-device-lib-path', or pass '-nogpulib' to build without ROCm device library
clang-15: error: cannot find HIP runtime; provide its path via '--rocm-path', or pass '-nogpuinc' to build without HIP runtime
clang-15: error: cannot find HIP runtime; provide its path via '--rocm-path', or pass '-nogpuinc' to build without HIP runtime
gmake[2]: *** [CMakeFiles/hipInfo.dir/build.make:76: CMakeFiles/hipInfo.dir/hipInfo.cpp.o] Error 1
gmake[2]: Leaving directory '/home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build'
gmake[1]: *** [CMakeFiles/Makefile2:83: CMakeFiles/hipInfo.dir/all] Error 2
gmake[1]: Leaving directory '/home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build'
gmake: *** [Makefile:91: all] Error 2
"""


But as you can see the result is the same.
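
The values shown by ccmake are stored in CMakeCache.txt in the build directory, so they can be compared with the hipconfig output without opening the UI. A minimal sketch; cache_value is a hypothetical helper name, and the only assumption is the usual NAME:TYPE=VALUE layout of cache entries.

```sh
# Hypothetical helper: read one entry from a CMakeCache.txt.
# Cache entries have the form NAME:TYPE=VALUE; comment lines start with // or #.
cache_value() {
    # $1: path to CMakeCache.txt, $2: variable name
    sed -n "s|^$2:[^=]*=||p" "$1" | head -n 1
}
```

Running `cache_value CMakeCache.txt ROCM_PATH` before and after passing -DROCM_PATH=/usr shows whether the override reached the cache. As the log above shows, a correct cache value alone did not make the build succeed.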


I tried directly calling hipcc with "--rocm-path=/usr --rocm-device-lib-path=/usr/lib64/amdgcn/bitcode" added but now I get a different error.

$ /usr/bin/hipcc -D__HIP_PLATFORM_AMD__=1 -D__HIP_PLATFORM_HCC__=1  -O3 -DNDEBUG -MD -MT CMakeFiles/hipInfo.dir/hipInfo.cpp.o -MF CMakeFiles/hipInfo.dir/hipInfo.cpp.o.d -o CMakeFiles/hipInfo.dir/hipInfo.cpp.o -c /home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/hipInfo.cpp --rocm-path=/usr --rocm-device-lib-path=/usr/lib64/amdgcn/bitcode
fatal error: cannot open file '/usr/lib64/amdgcn/bitcode/ocml.bc': Unknown attribute kind (86) (Producer: 'LLVM17.0.0git' Reader: 'LLVM 15.0.6')
1 error generated when compiling for gfx90c.


For now I will leave it at that. I will think about it more tomorrow.
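
The last error names both LLVM versions involved: ocml.bc was produced by LLVM 17 and is being read by LLVM 15.0.6. One reading, not confirmed here, is that hipcc ended up using the system clang 15 rather than an LLVM matching the ROCm device libraries. A minimal sketch that pulls the two major versions out of such a message; the function name llvm_mismatch is hypothetical.

```sh
# Hypothetical helper: extract the producer and reader LLVM major versions from
# a clang bitcode error message and report whether the producer is newer.
llvm_mismatch() {
    msg=$1
    producer=$(printf '%s\n' "$msg" | sed -n "s/.*Producer: 'LLVM *\([0-9][0-9]*\).*/\1/p")
    reader=$(printf '%s\n' "$msg" | sed -n "s/.*Reader: 'LLVM *\([0-9][0-9]*\).*/\1/p")
    if [ -z "$producer" ] || [ -z "$reader" ]; then
        echo "no version info found"
        return 2
    fi
    if [ "$producer" -gt "$reader" ]; then
        echo "bitcode from LLVM $producer is newer than reader LLVM $reader"
        return 1
    fi
    echo "producer LLVM $producer is not newer than reader LLVM $reader"
}
```

The error text from the failing hipcc call is passed as the single argument.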
Comment 38 christian barranco 2024-01-09 09:27:31 CET
Hi. You need to install rocm-llvm-static. If you updated from the command line, open a new terminal (or log out and back in) afterwards.
Comment 39 PC LX 2024-01-09 09:59:59 CET
Installed rocm-llvm-static and now it shows the error message:
"""
cannot find HIP runtime; provide its path via '--rocm-path', or pass '-nogpuinc' to build without HIP runtime
"""


$ rpm -qa | grep -E 'rocm|hip' | sort
lib64rocm-compilersupport5.7.1-5.7.1-1.mga9
lib64rocm-compilersupport-devel-5.7.1-1.mga9
lib64rocm-hip5-5.7.1-1.2.mga9
lib64rocm-hip-devel-5.7.1-1.2.mga9
lib64rocm-opencl-runtime5.7-5.7.1-3.1.mga9
lib64rocm-opencl-runtime-devel-5.7.1-3.1.mga9
lib64rocm-runtime1-5.7.1-1.1.mga9
lib64rocm-runtime-devel-5.7.1-1.1.mga9
procmail-3.24-1.mga9
rocm-amd-opencl-5.7.1-3.1.mga9
rocm-clinfo-5.7.1-3.1.mga9
rocm-core-5.7.1-1.mga9
rocm-device-libs-5.7.1-1.mga9
rocm-hip-5.7.1-1.2.mga9
rocminfo-5.7.1-1.1.mga9
rocm-llvm-static-5.7.1-1.mga9
$ hipconfig --runtime
rocclr
$ find / -xdev -ipath '*rocclr*' 2> /dev/null
/usr/lib64/librocclr.a
$ rpm -qf /usr/lib64/librocclr.a
lib64rocm-hip-devel-5.7.1-1.2.mga9



"""
$ cmake --build . --verbose
/usr/bin/cmake -S/home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo -B/home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build --check-build-system CMakeFiles/Makefile.cmake 0
/usr/bin/cmake -E cmake_progress_start /home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build/CMakeFiles /home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build//CMakeFiles/progress.marks
/usr/bin/gmake  -f CMakeFiles/Makefile2 all
gmake[1]: Entering directory '/home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build'
/usr/bin/gmake  -f CMakeFiles/hipInfo.dir/build.make CMakeFiles/hipInfo.dir/depend
gmake[2]: Entering directory '/home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build'
cd /home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build && /usr/bin/cmake -E cmake_depends "Unix Makefiles" /home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo /home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo /home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build /home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build /home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build/CMakeFiles/hipInfo.dir/DependInfo.cmake --color=
gmake[2]: Leaving directory '/home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build'
/usr/bin/gmake  -f CMakeFiles/hipInfo.dir/build.make CMakeFiles/hipInfo.dir/build
gmake[2]: Entering directory '/home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build'
[ 50%] Building CXX object CMakeFiles/hipInfo.dir/hipInfo.cpp.o
/usr/bin/hipcc -D__HIP_PLATFORM_AMD__=1 -D__HIP_PLATFORM_HCC__=1  -O3 -DNDEBUG -MD -MT CMakeFiles/hipInfo.dir/hipInfo.cpp.o -MF CMakeFiles/hipInfo.dir/hipInfo.cpp.o.d -o CMakeFiles/hipInfo.dir/hipInfo.cpp.o -c /home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/hipInfo.cpp
clang-15: error: cannot find HIP runtime; provide its path via '--rocm-path', or pass '-nogpuinc' to build without HIP runtime
clang-15: error: cannot find HIP runtime; provide its path via '--rocm-path', or pass '-nogpuinc' to build without HIP runtime
gmake[2]: *** [CMakeFiles/hipInfo.dir/build.make:76: CMakeFiles/hipInfo.dir/hipInfo.cpp.o] Error 1
gmake[2]: Leaving directory '/home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build'
gmake[1]: *** [CMakeFiles/Makefile2:83: CMakeFiles/hipInfo.dir/all] Error 2
gmake[1]: Leaving directory '/home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build'
gmake: *** [Makefile:91: all] Error 2
"""


Manually running the failed build command with --rocm-path=/usr produces the same result:


"""
$ /usr/bin/hipcc -D__HIP_PLATFORM_AMD__=1 -D__HIP_PLATFORM_HCC__=1  -O3 -DNDEBUG -MD -MT CMakeFiles/hipInfo.dir/hipInfo.cpp.o -MF CMakeFiles/hipInfo.dir/hipInfo.cpp.o.d -o CMakeFiles/hipInfo.dir/hipInfo.cpp.o -c /home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/hipInfo.cpp --rocm-path=/usr
clang-15: error: cannot find HIP runtime; provide its path via '--rocm-path', or pass '-nogpuinc' to build without HIP runtime
clang-15: error: cannot find HIP runtime; provide its path via '--rocm-path', or pass '-nogpuinc' to build without HIP runtime
"""
Comment 40 christian barranco 2024-01-09 22:11:33 CET
It looks as if the environment variable settings are not applied.

What does the following return:
cat /etc/profile.d/*rocm.*

and, if you open a new console:
echo $ROCM_PATH

Thanks again for all your tests
Comment 41 PC LX 2024-01-10 00:43:07 CET
The paths seem to be correct.


$ ls -la /etc/profile.d/*rocm*
-rw-r--r-- 1 root root 145 jan  6 16:26 /etc/profile.d/40rocm-hip.sh
$ rpm -qf /etc/profile.d/40rocm-hip.sh
lib64rocm-hip-devel-5.7.1-1.2.mga9
$ rpm -V $(rpm -qf /etc/profile.d/40rocm-hip.sh)
$ cat /etc/profile.d/40rocm-hip.sh
export DEVICE_LIB_PATH=/usr/lib64/amdgcn/bitcode
export HIP_CLANG_PATH="/usr/lib64/rocm/llvm/bin"
export HIP_PATH="/usr"
export ROCM_PATH="/usr"
$ set | grep -E '(HIP|ROCM|DEVICE_LIB)'
DEVICE_LIB_PATH=/usr/lib64/amdgcn/bitcode
HIP_CLANG_PATH=/usr/lib64/rocm/llvm/bin
HIP_PATH=/usr
ROCM_PATH=/usr
$ cmake -S .. -B . -Wno-dev
-- The C compiler identification is GNU 12.3.0
-- The CXX compiler identification is GNU 12.3.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
-- hip::amdhip64 is SHARED_LIBRARY
-- /usr/bin/c++: CLANGRT compiler options not supported.
-- Configuring done (0.4s)
-- Generating done (0.0s)
-- Build files have been written to: /home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build
$ cmake --build . --verbose
/usr/bin/cmake -S/home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo -B/home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build --check-build-system CMakeFiles/Makefile.cmake 0
/usr/bin/cmake -E cmake_progress_start /home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build/CMakeFiles /home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build//CMakeFiles/progress.marks
/usr/bin/gmake  -f CMakeFiles/Makefile2 all
gmake[1]: Entering directory '/home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build'
/usr/bin/gmake  -f CMakeFiles/hipInfo.dir/build.make CMakeFiles/hipInfo.dir/depend
gmake[2]: Entering directory '/home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build'
cd /home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build && /usr/bin/cmake -E cmake_depends "Unix Makefiles" /home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo /home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo /home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build /home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build /home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build/CMakeFiles/hipInfo.dir/DependInfo.cmake --color=
gmake[2]: Leaving directory '/home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build'
/usr/bin/gmake  -f CMakeFiles/hipInfo.dir/build.make CMakeFiles/hipInfo.dir/build
gmake[2]: Entering directory '/home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build'
[ 50%] Building CXX object CMakeFiles/hipInfo.dir/hipInfo.cpp.o
/usr/bin/hipcc -D__HIP_PLATFORM_AMD__=1 -D__HIP_PLATFORM_HCC__=1  -O3 -DNDEBUG -MD -MT CMakeFiles/hipInfo.dir/hipInfo.cpp.o -MF CMakeFiles/hipInfo.dir/hipInfo.cpp.o.d -o CMakeFiles/hipInfo.dir/hipInfo.cpp.o -c /home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/hipInfo.cpp
clang-15: error: cannot find HIP runtime; provide its path via '--rocm-path', or pass '-nogpuinc' to build without HIP runtime
clang-15: error: cannot find HIP runtime; provide its path via '--rocm-path', or pass '-nogpuinc' to build without HIP runtime
gmake[2]: *** [CMakeFiles/hipInfo.dir/build.make:76: CMakeFiles/hipInfo.dir/hipInfo.cpp.o] Error 1
gmake[2]: Leaving directory '/home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build'
gmake[1]: *** [CMakeFiles/Makefile2:83: CMakeFiles/hipInfo.dir/all] Error 2
gmake[1]: Leaving directory '/home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/build'
gmake: *** [Makefile:91: all] Error 2
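
The environment check at the top of this log can be wrapped in a small function for other testers. This is only a sketch: check_hip_env is a hypothetical name, the variable list is taken from 40rocm-hip.sh as shown above, and passing the check does not guarantee that hipcc works, as this very log demonstrates.

```sh
# Hypothetical helper: verify that the variables exported by 40rocm-hip.sh are
# set and point at existing directories. Prints one line per variable and
# returns non-zero if any of them is unset or points nowhere.
check_hip_env() {
    status=0
    for name in DEVICE_LIB_PATH HIP_CLANG_PATH HIP_PATH ROCM_PATH; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name: not set"
            status=1
        elif [ ! -d "$value" ]; then
            echo "$name: $value (missing directory)"
            status=1
        else
            echo "$name: $value"
        fi
    done
    return "$status"
}
```

Calling `check_hip_env` in a freshly opened console shows whether the profile script was sourced.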
Comment 42 PC LX 2024-01-10 00:59:10 CET
I have run the failing command under strace. The strace output is attached.

"""""
# LANGUAGE=C strace --output-separately --follow-forks -o hipcc.strace /usr/bin/hipcc -D__HIP_PLATFORM_AMD__=1 -D__HIP_PLATFORM_HCC__=1  -O3 -DNDEBUG -MD -MT CMakeFiles/hipInfo.dir/hipInfo.cpp.o -MF CMakeFiles/hipInfo.dir/hipInfo.cpp.o.d -o CMakeFiles/hipInfo.dir/hipInfo.cpp.o -c /home/pclx/tmp/hip-tests-rocm-5.7.0/samples/1_Utils/hipInfo/hipInfo.cpp
clang-15: error: cannot find HIP runtime; provide its path via '--rocm-path', or pass '-nogpuinc' to build without HIP runtime
clang-15: error: cannot find HIP runtime; provide its path via '--rocm-path', or pass '-nogpuinc' to build without HIP runtime
"""""

I then looked for the "cannot find HIP runtime" string to see where it is failing.
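
That search can be narrowed to the failed file lookups that happen just before the message is written. A minimal sketch using awk; failed_before is a hypothetical name, and it assumes the message and the lookups end up in the same per-process strace file, which may not hold for every run.

```sh
# Hypothetical helper: print the last failed lookups (ENOENT or ELOOP) that
# precede the first occurrence of a message in an strace log.
failed_before() {
    # $1: strace log file, $2: message to look for, $3: lookups to keep (default 5)
    awk -v msg="$2" -v keep="${3:-5}" '
        index($0, msg) {                  # message reached: dump what was kept
            start = n - keep + 1
            if (start < 1) start = 1
            for (i = start; i <= n; i++) print buf[i]
            exit
        }
        / = -1 (ENOENT|ELOOP) / { buf[++n] = $0 }   # remember each failed lookup
    ' "$1"
}
```

For example, `failed_before hipcc.strace.62839 "cannot find HIP runtime" 10` would list the ten failures closest to the error.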

I see a few lines that may be relevant.

This one is strange:
"""""
openat(AT_FDCWD, "/usr/bin/.hipVersion", O_RDONLY|O_CLOEXEC) = -1 ELOOP (Too many levels of symbolic links)
"""""

The file is from package lib64rocm-hip-devel-5.7.1-1.2.mga9.
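
ELOOP usually means a symbolic link that ends up pointing back at itself, directly or through other links. A minimal sketch for confirming that by following the chain by hand; check_link is a hypothetical name, only links in the last path component are followed, and the limit of 40 hops mirrors the limit Linux applies during path resolution.

```sh
# Hypothetical helper: follow a symlink chain step by step and report whether
# it ends at an existing file, dangles, or loops.
check_link() {
    start=$1
    path=$1
    hops=0
    while [ -L "$path" ]; do
        hops=$((hops + 1))
        if [ "$hops" -gt 40 ]; then
            echo "loop: $start"
            return 1
        fi
        target=$(readlink "$path")
        case $target in
            /*) path=$target ;;                      # absolute target
            *)  path=$(dirname "$path")/$target ;;   # relative to the link's directory
        esac
    done
    if [ -e "$path" ]; then
        echo "ok: $path"
    else
        echo "dangling: $path"
        return 1
    fi
}
```

Here it would be run as `check_link /usr/bin/.hipVersion`.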

Then there are these missing files that reference rocm, hip or amdgcn:

"""""
$ grep ENOENT hipcc.strace.62839 | grep -Ei 'roc|hip|amd|gcn'
newfstatat(AT_FDCWD, "/usr/lib64/clang/15.0.6/lib/amdgcn-amd-amdhsa", 0x7ffcd6a64560, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/bin/../lib/amdgcn-amd-amdhsa", 0x7ffcd6a64560, 0) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/usr/lib64/clang/15.0.6/lib/amdhsa/amdgcn", 0x7ffcd6a64560, 0) = -1 ENOENT (No such file or directory)
access("/tmp/hipInfo-074d83/hipInfo-gfx90c.o", W_OK) = -1 ENOENT (No such file or directory)
access("/tmp/hipInfo-9e029b/hipInfo-gfx90c.out", W_OK) = -1 ENOENT (No such file or directory)
"""""

I'm not certain which file the clang compiler is looking for and not finding, but it must be one of these:

""""""
$ grep ENOENT hipcc.strace.62839 | wc -l
202
$ grep ENOENT hipcc.strace.62839 | grep -Eo '"[^"]+"' | sort -u
"/bin/ld.lld"
"/bin/lld"
"/bin/ptxas"
"/bin/x86_64-mageia-linux-gnu-clang-offload-bundler"
"/bin/x86_64-mageia-linux-gnu-ld.lld"
"/bin/x86_64-mageia-linux-gnu-lld"
"/etc/env.d/gcc"
"/etc/ld.so.preload"
"/home/pclx/bin/ld.lld"
"/home/pclx/bin/lld"
"/home/pclx/bin/ptxas"
"/home/pclx/bin/x86_64-mageia-linux-gnu-clang-offload-bundler"
"/home/pclx/bin/x86_64-mageia-linux-gnu-ld.lld"
"/home/pclx/bin/x86_64-mageia-linux-gnu-lld"
"/home/pclx/.terminfo"
"/lib/x86_64-linux-gnu"
"/opt/emsdk/ld.lld"
"/opt/emsdk/lld"
"/opt/emsdk/node/14.18.2_64bit/bin/ld.lld"
"/opt/emsdk/node/14.18.2_64bit/bin/lld"
"/opt/emsdk/node/14.18.2_64bit/bin/ptxas"
"/opt/emsdk/node/14.18.2_64bit/bin/sh"
"/opt/emsdk/node/14.18.2_64bit/bin/x86_64-mageia-linux-gnu-clang-offload-bundler"
"/opt/emsdk/node/14.18.2_64bit/bin/x86_64-mageia-linux-gnu-ld.lld"
"/opt/emsdk/node/14.18.2_64bit/bin/x86_64-mageia-linux-gnu-lld"
"/opt/emsdk/ptxas"
"/opt/emsdk/sh"
"/opt/emsdk/upstream/emscripten/ld.lld"
"/opt/emsdk/upstream/emscripten/lld"
"/opt/emsdk/upstream/emscripten/ptxas"
"/opt/emsdk/upstream/emscripten/sh"
"/opt/emsdk/upstream/emscripten/x86_64-mageia-linux-gnu-clang-offload-bundler"
"/opt/emsdk/upstream/emscripten/x86_64-mageia-linux-gnu-ld.lld"
"/opt/emsdk/upstream/emscripten/x86_64-mageia-linux-gnu-lld"
"/opt/emsdk/x86_64-mageia-linux-gnu-clang-offload-bundler"
"/opt/emsdk/x86_64-mageia-linux-gnu-ld.lld"
"/opt/emsdk/x86_64-mageia-linux-gnu-lld"
"/opt/rh"
"/tmp/hipInfo-074d83/hipInfo-gfx90c.o"
"/tmp/hipInfo-9e029b/hipInfo-gfx90c.out"
"/usr/bin/ld.lld"
"/usr/bin/../lib32"
"/usr/bin/../lib64/gcc"
"/usr/bin/../lib64/gcc-cross"
"/usr/bin/../lib/amdgcn-amd-amdhsa"
"/usr/bin/../lib/gcc-cross"
"/usr/bin/../lib/gcc/i386-mageia-linux-gnu"
"/usr/bin/../lib/gcc/i386-redhat-linux"
"/usr/bin/../lib/gcc/i386-redhat-linux6E"
"/usr/bin/../lib/gcc/i586-linux-gnu"
"/usr/bin/../lib/gcc/i586-mageia-linux"
"/usr/bin/../lib/gcc/i586-suse-linux"
"/usr/bin/../lib/gcc/i686-gnu"
"/usr/bin/../lib/gcc/i686-linux-gnu"
"/usr/bin/../lib/gcc/i686-montavista-linux"
"/usr/bin/../lib/gcc/i686-pc-linux-gnu"
"/usr/bin/../lib/gcc/i686-redhat-linux"
"/usr/bin/../lib/gcc/x86_64-amazon-linux"
"/usr/bin/../lib/gcc/x86_64-linux-gnu"
"/usr/bin/../lib/gcc/x86_64-linux-gnux32"
"/usr/bin/../lib/gcc/x86_64-mageia-linux/12/64/crtbegin.o"
"/usr/bin/../lib/gcc/x86_64-mageia-linux/12/../../../gcc/x86_64-mageia-linux/12/include/c++/"
"/usr/bin/../lib/gcc/x86_64-mageia-linux/12/../../../../include/x86_64-mageia-linux/c++/12"
"/usr/bin/../lib/gcc/x86_64-mageia-linux/12/../lib64"
"/usr/bin/../lib/gcc/x86_64-mageia-linux/12/x32/crtbegin.o"
"/usr/bin/../lib/gcc/x86_64-mageia-linux/12/../../../../x86_64-mageia-linux/include/c++/12"
"/usr/bin/../lib/gcc/x86_64-mageia-linux/12/../../../../x86_64-mageia-linux/lib"
"/usr/bin/../lib/gcc/x86_64-mageia-linux/12/../../../../x86_64-mageia-linux/lib/../lib64"
"/usr/bin/../lib/gcc/x86_64-mageia-linux-gnu/10/32/crtbegin.o"
"/usr/bin/../lib/gcc/x86_64-mageia-linux-gnu/10/64/crtbegin.o"
"/usr/bin/../lib/gcc/x86_64-mageia-linux-gnu/10/crtbegin.o"
"/usr/bin/../lib/gcc/x86_64-mageia-linux-gnu/10/x32/crtbegin.o"
"/usr/bin/../lib/gcc/x86_64-manbo-linux-gnu"
"/usr/bin/../lib/gcc/x86_64-pc-linux-gnu"
"/usr/bin/../lib/gcc/x86_64-pc-linux-gnux32"
"/usr/bin/../lib/gcc/x86_64-redhat-linux"
"/usr/bin/../lib/gcc/x86_64-redhat-linux6E"
"/usr/bin/../lib/gcc/x86_64-slackware-linux"
"/usr/bin/../lib/gcc/x86_64-suse-linux"
"/usr/bin/../lib/gcc/x86_64-unknown-linux"
"/usr/bin/../lib/gcc/x86_64-unknown-linux-gnu"
"/usr/bin/../lib/libc++.so"
"/usr/bin/../libx32"
"/usr/bin/../lib/x86_64-mageia-linux-gnu"
"/usr/bin/lld"
"/usr/bin/ptxas"
"/usr/bin/x86_64-mageia-linux-gnu-clang-offload-bundler"
"/usr/bin/x86_64-mageia-linux-gnu-ld.lld"
"/usr/bin/x86_64-mageia-linux-gnu-lld"
"/usr/games/ld.lld"
"/usr/games/lld"
"/usr/games/ptxas"
"/usr/games/x86_64-mageia-linux-gnu-clang-offload-bundler"
"/usr/games/x86_64-mageia-linux-gnu-ld.lld"
"/usr/games/x86_64-mageia-linux-gnu-lld"
"/usr/include/x86_64-linux-gnu"
"/usr/lib64/clang/15.0.6/lib/amdgcn-amd-amdhsa"
"/usr/lib64/clang/15.0.6/lib/amdhsa/amdgcn"
"/usr/lib64/clang/15.0.6/lib/linux/x86_64"
"/usr/lib64/clang/15.0.6/lib/x86_64-mageia-linux-gnu"
"/usr/lib64/qt5/bin/ld.lld"
"/usr/lib64/qt5/bin/lld"
"/usr/lib64/qt5/bin/ptxas"
"/usr/lib64/qt5/bin/x86_64-mageia-linux-gnu-clang-offload-bundler"
"/usr/lib64/qt5/bin/x86_64-mageia-linux-gnu-ld.lld"
"/usr/lib64/qt5/bin/x86_64-mageia-linux-gnu-lld"
"/usr/lib64/qt6/bin/ld.lld"
"/usr/lib64/qt6/bin/lld"
"/usr/lib64/qt6/bin/ptxas"
"/usr/lib64/qt6/bin/x86_64-mageia-linux-gnu-clang-offload-bundler"
"/usr/lib64/qt6/bin/x86_64-mageia-linux-gnu-ld.lld"
"/usr/lib64/qt6/bin/x86_64-mageia-linux-gnu-lld"
"/usr/lib/x86_64-linux-gnu"
"/usr/local/bin/ld.lld"
"/usr/local/bin/lld"
"/usr/local/bin/ptxas"
"/usr/local/bin/sh"
"/usr/local/bin/x86_64-mageia-linux-gnu-clang-offload-bundler"
"/usr/local/bin/x86_64-mageia-linux-gnu-ld.lld"
"/usr/local/bin/x86_64-mageia-linux-gnu-lld"
"/usr/local/cuda"
"/usr/local/cuda-7.0"
"/usr/local/cuda-7.5"
"/usr/local/cuda-8.0"
"/usr/local/games/ld.lld"
"/usr/local/games/lld"
"/usr/local/games/ptxas"
"/usr/local/games/x86_64-mageia-linux-gnu-clang-offload-bundler"
"/usr/local/games/x86_64-mageia-linux-gnu-ld.lld"
"/usr/local/games/x86_64-mageia-linux-gnu-lld"
"/usr/local/sbin/ld.lld"
"/usr/local/sbin/lld"
"/usr/local/sbin/ptxas"
"/usr/local/sbin/x86_64-mageia-linux-gnu-clang-offload-bundler"
"/usr/local/sbin/x86_64-mageia-linux-gnu-ld.lld"
"/usr/local/sbin/x86_64-mageia-linux-gnu-lld"
"/usr/sbin/ld.lld"
"/usr/sbin/lld"
"/usr/sbin/ptxas"
"/usr/sbin/x86_64-mageia-linux-gnu-clang-offload-bundler"
"/usr/sbin/x86_64-mageia-linux-gnu-ld.lld"
"/usr/sbin/x86_64-mageia-linux-gnu-lld"
"""""

Hope this helps.
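
The ENOENT list above is dominated by $PATH probes for helper tools (ld.lld, lld, ptxas, sh, clang-offload-bundler) and by GCC toolchain directory probes. A rough sketch of a filter that drops those so the remaining candidates are easier to read; the function name `filter_enoent_paths` and the choice of what counts as noise are assumptions, not something taken from the strace itself.

```shell
# Hypothetical filter: read strace output on stdin, print unique quoted paths
# that failed with ENOENT, minus tool lookups along $PATH and GCC dir probes.
filter_enoent_paths() {
    grep ENOENT \
        | grep -Eo '"[^"]+"' \
        | sort -u \
        | grep -Ev '/(ld\.lld|lld|ptxas|sh|[^/"]*-(ld\.lld|lld|clang-offload-bundler))"$' \
        | grep -v '/lib/gcc'
}

# Usage (file name as in the comment above):
#   filter_enoent_paths < hipcc.strace.62839
```

This only shortens the list; it does not by itself say which missing path triggers the "cannot find HIP runtime" error.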
Comment 43 PC LX 2024-01-10 01:00:31 CET
Created attachment 14259
strace of hipcc
Comment 44 Giuseppe Ghibò 2024-01-10 16:52:06 CET
Christian, I think what we need is actually two things:

a) Currently HIP is detected (by the various cmake scripts) mainly as "monolithic", i.e. in /opt/rocm/hip (or some other single tree). We instead installed it "scattered", so there is something in /usr, something in /usr/lib64, etc., and the assumption of HIP_PATH or ROCM_PATH=/usr is not fully detected. What we would need is a collection of symlinks that mirror the whole original /opt/rocm/hip tree in a path that we decide, e.g. /usr/lib64/rocm/; that way we would have /usr/lib64/rocm/bin, /usr/lib64/rocm/<whatever> (or even versioned, e.g. /usr/lib64/rocm-5.7.x). For example, /usr/lib64/rocm/bin/hipcc would be a link to /usr/bin/hipcc, ditto for include files, etc.
In this way HIP_PATH and ROCM_PATH can point to /usr/lib64/rocm, and the scripts won't have detection problems.
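
A minimal sketch of the symlink-tree idea in (a), assuming only two entries (bin/hipcc and include) for illustration. The function name `make_rocm_view` is hypothetical, and a real package would need the full set of links found in the upstream tree, which is not listed here.

```shell
# Hypothetical helper: expose a scattered install under one monolithic-looking prefix.
# Usage: make_rocm_view SYSROOT PREFIX
make_rocm_view() {
    sysroot="$1"   # where files are actually installed, e.g. /usr
    prefix="$2"    # tree that detection scripts should see, e.g. /usr/lib64/rocm
    mkdir -p "$prefix/bin" || return 1
    # Link the compiler driver; other tools would be added the same way.
    ln -sfn "$sysroot/bin/hipcc" "$prefix/bin/hipcc"
    # Link a whole directory where the layout already matches.
    ln -sfn "$sysroot/include" "$prefix/include"
}

# Example (needs root for a system path, so left commented out):
#   make_rocm_view /usr /usr/lib64/rocm
#   export HIP_PATH=/usr/lib64/rocm ROCM_PATH=/usr/lib64/rocm
```

In a package these links would normally be created in the SPEC file's install section rather than by a script at run time.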

b) The second thing we need is a list of the upstream binary RPM packages that need to be repackaged into a monolithic Mageia package installed in /opt/rocm/. HIP is similar to Java in this respect (i.e. it is both closed and open). The way to do it is to take all the HIP RPM packages and put them in a single SPEC file (which would convert the upstream packages with rpm2cpio). It could be distributed as SPEC only, for instance. This can be interesting for comparisons, especially when there are crash cases to analyze.
Comment 45 christian barranco 2024-01-10 17:23:21 CET
Hi. 
It works with ROCm 6.0.0.
I will give it a try with 5.7.1.
It may be that 6.0.0 allows paths more in line with the system.

I don't see the point of unbundling the upstream RPMs if we can build from source.
Could you elaborate, Giuseppe?
christian barranco 2024-03-24 21:01:56 CET

Assignee: qa-bugs => chb0

