Multiwfn forum / Patch: omp collapse(2) in grid.f90 - //www.umsyar.com/wfnbbs/viewtopic.php?id=752 (FluxBB feed, Sun, 18 Dec 2022 09:38:33 +0000)

Re: Patch: omp collapse(2) in grid.f90 (//www.umsyar.com/wfnbbs/viewtopic.php?pid=2901#p2901)

Dear Igor,

collapse(2) is really fantastic! Your patch has been merged into the official source code.

I tested 704atoms.wfn on my dual AMD EPYC 7R32 server (96 physical cores). With the new version, calculating high-quality grid data of electron density and ELF took 2 s and 6 s, respectively, whereas the old version took 5 s and 20 s, i.e. the new version is 2.5 and about 3.3 times faster. The speed-up from collapse(2) on a server with a large number of cores is surprisingly high!

However, I removed the `if(mod(ifinish,256)==0)` condition; otherwise, after the calculation finishes, the output is:

Calculation of grid data took up wall clock time         2 s-]   99.89 %     /

That is, the progress bar does not reach 100%. A brief test showed that removing `if(mod(ifinish,256)==0)` causes no detectable performance loss, at least on my 8-core notebook and 96-core server.

Best regards,

Tian

Posted: Sun, 18 Dec 2022 09:38:33 +0000

Re: Patch: omp collapse(2) in grid.f90 (//www.umsyar.com/wfnbbs/viewtopic.php?pid=2891#p2891)

Dear Igor,

Thanks, I'll check and test it shortly. I have just been infected with COVID-19 and my productivity has been greatly affected, so it may take me longer to respond...

Best regards,

Tian

Posted: Fri, 16 Dec 2022 06:02:06 +0000

Patch: omp collapse(2) in grid.f90 (//www.umsyar.com/wfnbbs/viewtopic.php?pid=2886#p2886)

Dear Tian,

As I mentioned in the //www.umsyar.com/wfnbbs/viewtopic.php?id=732 topic, I found a way to get a speed-up.

The patch is attached below. It mainly benefits machines with a large number of threads. A similar patch can probably be applied throughout the whole code.

Multiwfn_collapse.patch.txt

I tested the effect of the patch on 704atoms.wfn. The plot below shows the speed-up for different numbers of cores; the black line indicates ideal scaling. After the patch, scaling is ideal up to 26 cores, whereas before the patch it was ideal only up to 19 cores (?).

collapse.png
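For reference, the usual definitions behind a speed-up plot like this, assuming (the post does not say so explicitly) that speed-up is measured relative to a single-core run, with $T(p)$ the wall time on $p$ cores:

$$
S(p) = \frac{T(1)}{T(p)}, \qquad E(p) = \frac{S(p)}{p}
$$

The ideal-scaling line is $S(p) = p$, i.e. $E(p) = 1$, so "ideal up to 26 cores" means $E(p)$ stays close to 1 up to $p = 26$.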

To demonstrate better scalability, I probably need a larger system (or a slower computer): even without collapse, the run time at around 32 cores is only about 5 seconds, and with `collapse(2)` it is about 3 seconds.
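A simple model, not derived in the thread, illustrates why a larger system should scale further: split the run time into a part $T_0$ that does not shrink with more cores (serial setup, output, thread start-up) and perfectly parallel work $W$ measured on one core. Then the speed-up on $p$ cores is

$$
T(p) \approx T_0 + \frac{W}{p}, \qquad S(p) = \frac{T_0 + W}{T_0 + W/p} \le \frac{T_0 + W}{T_0}
$$

If $T_0$ is a noticeable fraction of a 3 to 5 s run, it caps $S(p)$; a larger system increases $W$ relative to $T_0$ and raises that cap, so ideal scaling extends to more cores.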

Best regards,
Igor

Posted: Thu, 15 Dec 2022 13:31:45 +0000