- 16 Jul, 2019 6 commits
- 15 Jul, 2019 7 commits
- 14 Jul, 2019 5 commits
- 12 Jul, 2019 7 commits
- 11 Jul, 2019 4 commits
-
-
ulrich_y authored
This is the algorithm used by GiNaC. In theory one could extend it with a caching mechanism such as

    ! cache(i,n) = x(i)**n/n**m(i). The column index reaches q+j-1,
    ! so the second dimension needs MPLMaxQ+j-1 entries (j = size(x)).
    ! real(...,prec) avoids integer overflow in n**m for large n.
    complex(kind=prec) :: cache(size(x),MPLMaxQ+size(x)-1)
    do q=1,j
      cache(:,q) = x**q/real(q,prec)**m
    enddo
    do q=1,MPLMaxQ
      res = t(1)
      ! Fortran uses column-major order, hence cache(:,q) is
      ! faster than cache(q,:).
      cache(:,q+j-1) = x**(q+j-1)/real(q+j-1,prec)**m
      t(j) = t(j) + cache(j,q)
      do k=1,j-1
        t(j-k) = t(j-k) + t(j-k+1) * cache(j-k,k+q)
      enddo
      if (mod(q,2) .eq. 1) then
        if (abs(t(1)-res).lt.MPLdel) exit
      endif
    enddo

In practice this doesn't really help, because any time saved with the cache is paid back through the allocation and clearing of cache(:,:). Both variations work similarly well now. If at some point we need MPLs with many more arguments (size(x)), this might change.
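For comparison, here is a minimal sketch of the uncached recursion as a self-contained module, assuming the GiNaC convention Li_{m1,...,mj}(x1,...,xj) = sum over n1 > n2 > ... > nj > 0 of prod_i x_i**n_i / n_i**m_i. The kind `prec` (taken as double precision), the values of `MPLMaxQ` and `MPLdel`, and the function name `mpl_nested` are hypothetical placeholders, not the project's actual definitions. Each pass over q adds exactly one new term to every partial sum t(j-k), with an outer index q+k that is larger than any index already contained in t(j-k+1), so the nesting condition n1 > n2 > ... is respected.

```fortran
module mpl_sketch
  implicit none
  integer, parameter :: prec = kind(1.d0)        ! assumption: double precision
  integer, parameter :: MPLMaxQ = 2000           ! hypothetical iteration cap
  real(prec), parameter :: MPLdel = 1.e-15_prec  ! hypothetical tolerance
contains
  ! Nested-sum evaluation of Li_{m(1),...,m(j)}(x(1),...,x(j)), no cache.
  function mpl_nested(m, x) result(res)
    integer, intent(in) :: m(:)
    complex(prec), intent(in) :: x(:)
    complex(prec) :: res
    complex(prec) :: t(size(x))
    integer :: j, k, q

    j = size(x)
    t = (0._prec, 0._prec)
    do q = 1, MPLMaxQ
      res = t(1)
      ! innermost sum: t(j) = sum_{n=1}^{q} x(j)**n / n**m(j)
      t(j) = t(j) + x(j)**q / real(q, prec)**m(j)
      ! work outwards: t(j-k) gains the term with index q+k,
      ! weighted by the already updated inner sum t(j-k+1)
      do k = 1, j-1
        t(j-k) = t(j-k) + t(j-k+1) * x(j-k)**(q+k) / real(q+k, prec)**m(j-k)
      enddo
      ! as in the commit, only test convergence on odd q
      if (mod(q, 2) == 1) then
        if (abs(t(1) - res) < MPLdel) exit
      endif
    enddo
    res = t(1)
  end function mpl_nested
end module mpl_sketch
```

As a check, `mpl_nested([1,1], [cmplx(0.5_prec,0._prec,kind=prec), cmplx(1._prec,0._prec,kind=prec)])` should reproduce log(2)**2/2, since Li_{1,1}(x,1) = log(1-x)**2/2.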
-
ulrich_y authored
-
ulrich_y authored
-
ulrich_y authored
-
- 10 Jul, 2019 10 commits
-
-
ulrich_y authored
I've studied the timing a bit and the message is quite clear:

1) Use -O3.
2) Use -march=native and -mtune=native (in some cases it might be better to actually work out what the architecture is, as Kaby Lake (7th gen i5) is misdetected as Broadwell (5th gen)).
3) Even though -ffast-math speeds the code up tremendously, it also produces very wrong results.

Below is a list of G/s for each set of flags (five runs each):

    Nothing           :  5907.27,  5852.97,  4255.59,  5627.56,  5886.03
    O3                :  9780.68, 11269.85, 11464.97, 10475.08, 10966.49
    O3+unroll         : 11385.20, 10785.49, 11361.03, 10225.69, 11134.86
    O3+tree vec       : 11028.18, 11232.13, 11349.96, 11257.04, 11410.13
    O3+native         :  9124.84,  8609.82,  9330.82,  9912.89,  9503.70
    O3+skylake        : 11894.75, 11966.64, 11882.61, 12000.73, 11666.96
    O3+march+mtune    : 11818.07, 11943.54, 11963.75, 11780.81, 11560.69
    O3+native+nati    : 11390.44, 11873.35, 11827.96, 11781.51, 11725.69
    O3+ffast-math     : 19014.69, 19016.52, 18849.96, 19213.97, 19067.80
    O3+un+vec+nat+nat : 11521.01, 11666.27, 11341.51, 11290.41, 11488.92
    O3+vec+nat+nat    : 10712.55, 11211.78, 11328.53, 11140.49, 11298.79
    O3+un+nat+nat     : 11702.13, 11442.44, 11680.73, 11498.26, 11677.28

Legend:

    unroll      : -funroll-loops
    tree vec    : -ftree-vectorize
    skylake     : -march=skylake
    native      : -march=native
    march+mtune : -march=skylake -mtune=skylake
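One mechanism behind point 3: in GCC, -ffast-math implies -funsafe-math-optimizations, which in turn enables -fassociative-math, so the compiler may reorder floating-point additions as if they were associative (it also turns on relaxations such as -ffinite-math-only and -fcx-limited-range, the latter affecting complex division). The hypothetical module below is an illustration of reassociation only, not code from this project, and it does not identify which sub-option is responsible for the wrong MPL results: two algebraically equal expressions give 0 and 1 in double precision, because 1e16 + 1 rounds back to 1e16.

```fortran
module reassoc_demo
  implicit none
  integer, parameter :: dp = kind(1.d0)
contains
  ! (a + b) - a, evaluated in the order written (IEEE semantics)
  pure function left_to_right(a, b) result(r)
    real(dp), intent(in) :: a, b
    real(dp) :: r
    r = (a + b) - a
  end function left_to_right

  ! the same expression after the reassociation (a - a) + b,
  ! a rewrite that -fassociative-math allows the compiler to make
  pure function reassociated(a, b) result(r)
    real(dp), intent(in) :: a, b
    real(dp) :: r
    r = (a - a) + b
  end function reassociated
end module reassoc_demo
```

With a = 1e16 and b = 1, `left_to_right` returns 0 and `reassociated` returns 1 when compiled without -ffast-math.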
-
ulrich_y authored
-
ulrich_y authored
-
ulrich_y authored
-
ulrich_y authored
-
ulrich_y authored
-
ulrich_y authored
-
ulrich_y authored
-
ulrich_y authored
-
Luca Naterop authored
-
- 09 Jul, 2019 1 commit
-
-
Luca Naterop authored
-