Results of parallel 2D FFT (1024x1024)
esp3:ce107/work/SP2-tutorial% mpirun -np 1 2dfft1x1
2dfft achieves 23.6892973841184826 Mflop/s
( 10 iterations in 44.2637020000000021 secs)
Using unit-stride loads for copying
************************************
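The Mflop/s figures in this log appear to follow the conventional 5 N log2 N operation count for a complex FFT of N points, with N = 1024 x 1024 and 10 iterations per timing. The sketch below is an assumption worked back from the printed numbers, not the benchmark's own source; the function name mflops is hypothetical.

```python
import math

def mflops(n, iterations, seconds):
    """Mflop/s for `iterations` n-by-n complex 2D FFTs.

    Assumes the usual 5 * N * log2(N) flop count per transform,
    where N = n * n is the total number of points.
    """
    points = n * n
    flops_per_fft = 5 * points * math.log2(points)
    return iterations * flops_per_fft / seconds / 1e6

# Time taken from the first result above (10 iterations in 44.263702 secs).
print(mflops(1024, 10, 44.263702))  # roughly 23.69
```

Under this assumption the value agrees with the first reported rate, which suggests all rates in the log are derived from wall-clock time in the same way.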
2dfft achieves 25.8000219770237216 Mflop/s
( 10 iterations in 40.6424460000000067 secs)
Using unit-stride stores for copying
************************************
row_2dfft achieves 29.1567400283460358 Mflop/s
( 10 iterations in 35.9634169999999926 secs)
************************************
"transpose" row_2dfft achieves 48.1264008311380067 Mflop/s
( 10 iterations in 21.7879580000000033 secs)
************************************
in-place pdcft2 achieves 69.1892987583912742 Mflop/s
( 10 iterations in 15.1551759999999831 secs)
************************************
pdcft2 achieves 69.7629223868726740 Mflop/s
( 10 iterations in 15.0305630000000008 secs)
************************************
"transpose" pdcft2 achieves 97.0241513870488035 Mflop/s
( 10 iterations in 10.8073709999999892 secs)
************************************
in-place dual dcft achieves 55.3922864815433584 Mflop/s
( 10 iterations in 18.9300003051757812 secs)
************************************
The leading dimension is 1028 for m = 1024
dual dcft achieves 73.1734825947448542 Mflop/s
( 10 iterations in 14.3299999237060547 secs)
************************************
in-place dcft2 achieves 55.7456687810419567 Mflop/s
( 10 iterations in 18.8099994659423828 secs)
************************************
The leading dimension is 1028 for m = 1024
dcft2 achieves 88.0416468038697815 Mflop/s
( 10 iterations in 11.9099998474121094 secs)
************************************
The leading dimension is 1028 for n = 1024
"transpose" dcft2 achieves 100.438318213179656 Mflop/s
( 10 iterations in 10.4399995803833008 secs)
************************************
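The out-of-place runs above report a leading dimension of 1028 rather than 1024. A common reason for such padding is to keep the stride between consecutive elements of a row (in column-major storage) from being a large power of two, which tends to map them onto very few cache sets. The sketch below illustrates that effect only. The cache geometry (128-byte lines, 256 sets) and the function name are hypothetical and are not taken from the machine used here.

```python
def cache_sets_touched(ld, n=1024, elem_bytes=16, line_bytes=128, num_sets=256):
    """Count distinct cache sets hit when walking one row of a column-major
    array of complex doubles whose leading dimension is `ld`.

    The cache parameters are illustrative assumptions, not measured values.
    """
    sets = set()
    for j in range(n):
        addr = j * ld * elem_bytes              # byte offset of element (0, j)
        sets.add((addr // line_bytes) % num_sets)
    return len(sets)

for ld in (1024, 1028):
    print(ld, cache_sets_touched(ld))
```

With these assumed parameters the unpadded stride touches only 2 sets, while the padded stride spreads the same row across all 256.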
esp3:ce107/work/SP2-tutorial% mpirun -np 2 2dfft2x1
2dfft achieves 33.9549647643603976 Mflop/s
( 10 iterations in 30.8813749999999985 secs)
Using unit-stride loads for copying
************************************
2dfft achieves 36.4694179923623238 Mflop/s
( 10 iterations in 28.7522000000000020 secs)
Using unit-stride stores for copying
************************************
esp3:ce107/work/SP2-tutorial% mpirun -np 2 2dfft1x2
2dfft achieves 45.7082437833447628 Mflop/s
( 10 iterations in 22.9406320000000008 secs)
Using unit-stride loads for copying
************************************
2dfft achieves 51.2392250278313597 Mflop/s
( 10 iterations in 20.4643220000000028 secs)
Using unit-stride stores for copying
************************************
row_2dfft achieves 52.2715384629572597 Mflop/s
( 10 iterations in 20.0601710000000040 secs)
************************************
"transpose" row_2dfft achieves 84.5677501898905746 Mflop/s
( 10 iterations in 12.3992420000000081 secs)
************************************
in-place pdcft2 achieves 63.5959653301261767 Mflop/s
( 10 iterations in 16.4880899999999997 secs)
************************************
pdcft2 achieves 63.6752036062146800 Mflop/s
( 10 iterations in 16.4675720000000041 secs)
************************************
"transpose" pdcft2 achieves 119.927017375633582 Mflop/s
( 10 iterations in 8.74345099999999320 secs)
************************************
esp3:ce107/work/SP2-tutorial% mpirun -np 4 2dfft4x1
2dfft achieves 58.4136069941060327 Mflop/s
( 10 iterations in 17.9508860000000006 secs)
Using unit-stride loads for copying
************************************
2dfft achieves 61.9241500972761756 Mflop/s
( 10 iterations in 16.9332320000000003 secs)
Using unit-stride stores for copying
************************************
esp3:ce107/work/SP2-tutorial% mpirun -np 4 2dfft2x2
2dfft achieves 63.2847555689566121 Mflop/s
( 10 iterations in 16.5691720000000018 secs)
Using unit-stride loads for copying
************************************
2dfft achieves 66.5042056342389145 Mflop/s
( 10 iterations in 15.7670630000000003 secs)
Using unit-stride stores for copying
************************************
esp3:ce107/work/SP2-tutorial% mpirun -np 4 2dfft1x4
2dfft achieves 79.7560258593117624 Mflop/s
( 10 iterations in 13.1472949999999997 secs)
Using unit-stride loads for copying
************************************
2dfft achieves 85.1894098186982376 Mflop/s
( 10 iterations in 12.3087599999999995 secs)
Using unit-stride stores for copying
************************************
row_2dfft achieves 84.3598991841536900 Mflop/s
( 10 iterations in 12.4297919999999991 secs)
************************************
"transpose" row_2dfft achieves 145.764395084766505 Mflop/s
( 10 iterations in 7.19363600000000503 secs)
************************************
in-place pdcft2 achieves 141.969354713799078 Mflop/s
( 10 iterations in 7.38593200000000394 secs)
************************************
pdcft2 achieves 144.872379930778521 Mflop/s
( 10 iterations in 7.23792900000000117 secs)
************************************
"transpose" pdcft2 achieves 276.454384773410936 Mflop/s
( 10 iterations in 3.79294399999999854 secs)
************************************
esp3:ce107/work/SP2-tutorial% mpirun -np 8 2dfft8x1
2dfft achieves 95.6782305600671634 Mflop/s
( 10 iterations in 10.9594000000000005 secs)
Using unit-stride loads for copying
************************************
2dfft achieves 103.349381141253232 Mflop/s
( 10 iterations in 10.1459340000000005 secs)
Using unit-stride stores for copying
************************************
esp3:ce107/work/SP2-tutorial% mpirun -np 8 2dfft4x2
2dfft achieves 112.462095727533054 Mflop/s
( 10 iterations in 9.32381700000000002 secs)
Using unit-stride loads for copying
************************************
2dfft achieves 115.498105516752418 Mflop/s
( 10 iterations in 9.07872899999999916 secs)
Using unit-stride stores for copying
************************************
esp3:ce107/work/SP2-tutorial% mpirun -np 8 2dfft2x4
2dfft achieves 111.974724699364060 Mflop/s
( 10 iterations in 9.36439900000000058 secs)
Using unit-stride loads for copying
************************************
2dfft achieves 114.444076755904405 Mflop/s
( 10 iterations in 9.16234399999999916 secs)
Using unit-stride stores for copying
************************************
esp3:ce107/work/SP2-tutorial% mpirun -np 8 2dfft1x8
2dfft achieves 121.504788865243739 Mflop/s
( 10 iterations in 8.62991500000000045 secs)
Using unit-stride loads for copying
************************************
2dfft achieves 128.355217423609702 Mflop/s
( 10 iterations in 8.16932899999999940 secs)
Using unit-stride stores for copying
************************************
row_2dfft achieves 143.933640213942482 Mflop/s
( 10 iterations in 7.28513500000000036 secs)
************************************
"transpose" row_2dfft achieves 241.600637401583981 Mflop/s
( 10 iterations in 4.34012099999999990 secs)
************************************
in-place pdcft2 achieves 276.854307564558553 Mflop/s
( 10 iterations in 3.78746499999999742 secs)
************************************
pdcft2 achieves 279.769103829904225 Mflop/s
( 10 iterations in 3.74800499999999914 secs)
************************************
"transpose" pdcft2 achieves 517.565854679423637 Mflop/s
( 10 iterations in 2.02597600000000000 secs)
************************************
esp3:ce107/work/SP2-tutorial% mpirun -np 16 2dfft16x1
2dfft achieves 169.356879743688012 Mflop/s
( 10 iterations in 6.19151700000000016 secs)
Using unit-stride loads for copying
************************************
2dfft achieves 174.396550175664544 Mflop/s
( 10 iterations in 6.01259600000000027 secs)
Using unit-stride stores for copying
************************************
esp3:ce107/work/SP2-tutorial% mpirun -np 16 2dfft8x2
2dfft achieves 187.965016124203885 Mflop/s
( 10 iterations in 5.57857000000000003 secs)
Using unit-stride loads for copying
************************************
2dfft achieves 196.006908065123071 Mflop/s
( 10 iterations in 5.34968899999999969 secs)
Using unit-stride stores for copying
************************************
esp3:ce107/work/SP2-tutorial% mpirun -np 16 2dfft4x4
2dfft achieves 196.886840988550063 Mflop/s
( 10 iterations in 5.32577999999999996 secs)
Using unit-stride loads for copying
************************************
2dfft achieves 209.046960486110095 Mflop/s
( 10 iterations in 5.01598300000000119 secs)
Using unit-stride stores for copying
************************************
esp3:ce107/work/SP2-tutorial% mpirun -np 16 2dfft2x8
2dfft achieves 184.055248029163067 Mflop/s
( 10 iterations in 5.69707200000000036 secs)
Using unit-stride loads for copying
************************************
2dfft achieves 192.622283781773035 Mflop/s
( 10 iterations in 5.44369000000000014 secs)
Using unit-stride stores for copying
************************************
esp3:ce107/work/SP2-tutorial% mpirun -np 16 2dfft1x16
2dfft achieves 220.817320762925959 Mflop/s
( 10 iterations in 4.74861299999999975 secs)
Using unit-stride loads for copying
************************************
2dfft achieves 222.642490952154958 Mflop/s
( 10 iterations in 4.70968500000000034 secs)
Using unit-stride stores for copying
************************************
row_2dfft achieves 237.790345119108736 Mflop/s
( 10 iterations in 4.40966600000000142 secs)
************************************
"transpose" row_2dfft achieves 414.412036008748771 Mflop/s
( 10 iterations in 2.53027399999999858 secs)
************************************
in-place pdcft2 achieves 553.576632470763457 Mflop/s
( 10 iterations in 1.89418399999999920 secs)
************************************
pdcft2 achieves 557.139153018587763 Mflop/s
( 10 iterations in 1.88207200000000086 secs)
************************************
"transpose" pdcft2 achieves 911.919405560348537 Mflop/s
( 10 iterations in 1.14985599999999977 secs)
************************************
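To summarize scaling, the timings printed above for the "transpose" pdcft2 variant can be turned into speedup and parallel efficiency relative to the single-process run. This is a small sketch: the times are copied from the log (rounded to six decimals), the process counts are read from the grid names (1x1, 1x2, 1x4, 1x8, 1x16), and the names times and scaling are hypothetical.

```python
# Seconds for 10 iterations of "transpose" pdcft2, keyed by process count.
times = {1: 10.807371, 2: 8.743451, 4: 3.792944, 8: 2.025976, 16: 1.149856}

def scaling(times):
    """Return {processes: (speedup, efficiency)} relative to the 1-process time."""
    base = times[1]
    return {p: (base / t, base / t / p) for p, t in sorted(times.items())}

for p, (speedup, efficiency) in scaling(times).items():
    print(f"{p:2d} processes: speedup {speedup:5.2f}, efficiency {efficiency:4.2f}")
```

On 16 processes this gives a speedup of about 9.4, or an efficiency of about 0.59. The same dictionary can be filled with the times of any other variant in the log to compare how they scale.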