ulrich_y authored
I've studied the timing a bit and the message is quite clear:

1) Use -O3.
2) Use -march=native and -mtune=native. In some cases it might be better to work out the actual architecture and pass it explicitly, as Kaby Lake (7th gen i5) is misdetected as Broadwell (5th gen).
3) Even though -ffast-math speeds the code up tremendously, it also produces very wrong results.

Below is a list of G/s for five runs of each configuration:

Nothing           :  5907.27,  5852.97,  4255.59,  5627.56,  5886.03
O3                :  9780.68, 11269.85, 11464.97, 10475.08, 10966.49
O3+unroll         : 11385.20, 10785.49, 11361.03, 10225.69, 11134.86
O3+tree vec       : 11028.18, 11232.13, 11349.96, 11257.04, 11410.13
O3+native         :  9124.84,  8609.82,  9330.82,  9912.89,  9503.70
O3+skylake        : 11894.75, 11966.64, 11882.61, 12000.73, 11666.96
O3+march+mtune    : 11818.07, 11943.54, 11963.75, 11780.81, 11560.69
O3+native+nati    : 11390.44, 11873.35, 11827.96, 11781.51, 11725.69
O3+ffast-math     : 19014.69, 19016.52, 18849.96, 19213.97, 19067.80
O3+un+vec+nat+nat : 11521.01, 11666.27, 11341.51, 11290.41, 11488.92
O3+vec+nat+nat    : 10712.55, 11211.78, 11328.53, 11140.49, 11298.79
O3+un+nat+nat     : 11702.13, 11442.44, 11680.73, 11498.26, 11677.28

Flags used:

unroll      : -funroll-loops
tree vec    : -ftree-vectorize
skylake     : -march=skylake
native      : -march=native
march+mtune : -march=skylake -mtune=skylake
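
To make the comparison easier to read, here is a minimal sketch that averages the five runs for a few of the configurations in this message and reports each average relative to the unoptimized build. The numbers are copied from the figures in this message; the names `runs` and `summarize` are made up for this illustration and are not part of the code base.

```python
from statistics import mean

# G/s samples copied from this commit message (five runs each).
# Only a subset of the configurations is listed here.
runs = {
    "Nothing": [5907.27, 5852.97, 4255.59, 5627.56, 5886.03],
    "O3": [9780.68, 11269.85, 11464.97, 10475.08, 10966.49],
    "O3+native": [9124.84, 8609.82, 9330.82, 9912.89, 9503.70],
    "O3+skylake": [11894.75, 11966.64, 11882.61, 12000.73, 11666.96],
    "O3+ffast-math": [19014.69, 19016.52, 18849.96, 19213.97, 19067.80],
}


def summarize(samples, baseline="Nothing"):
    """Return {name: (mean G/s, ratio to the baseline mean)}."""
    base = mean(samples[baseline])
    return {
        name: (mean(values), mean(values) / base)
        for name, values in samples.items()
    }


summary = summarize(runs)
for name, (avg, ratio) in summary.items():
    print(f"{name:<14} {avg:10.2f} G/s  x{ratio:.2f}")
```

With only five samples per configuration the averages are rough, so small differences between the -O3 variants should not be over-interpreted.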
3c052e0e