==============================================================================================================
► Per combination instance (t1 fused vs. t2 separate)

[1] mobilevit_A │ CONV │ /net/mobilevit/conv_stem/convolution/Conv
    t1 fused    : 0.019936 ms
    t2 separate : 0.017760 ms  (Δ -0.002176 ms, speed‑up 0.89×)
    ── t2 breakdown ──  normal : 0.013792 ms │ Sigmoid : 0.002336 ms │ Mul : 0.001632 ms
[2] mobilevit_A │ CONV │ /net/mobilevit/encoder/0/layer.0/expand_1x1/convolution/Conv
    t1 fused    : 0.016320 ms
    t2 separate : 0.018912 ms  (Δ 0.002592 ms, speed‑up 1.16×)
    ── t2 breakdown ──  normal : 0.009472 ms │ Sigmoid : 0.006304 ms │ Mul : 0.003136 ms
[3] mobilevit_A │ CONV │ /net/mobilevit/encoder/1/0/expand_1x1/convolution/Conv
    t1 fused    : 0.017536 ms
    t2 separate : 0.027424 ms  (Δ 0.009888 ms, speed‑up 1.56×)
    ── t2 breakdown ──  normal : 0.010656 ms │ Sigmoid : 0.011520 ms │ Mul : 0.005248 ms
[4] mobilevit_A │ CONV │ /net/mobilevit/encoder/1/1/expand_1x1/convolution/Conv
    t1 fused    : 0.010496 ms
    t2 separate : 0.016224 ms  (Δ 0.005728 ms, speed‑up 1.55×)
    ── t2 breakdown ──  normal : 0.006880 ms │ Sigmoid : 0.006304 ms │ Mul : 0.003040 ms
[5] mobilevit_A │ CONV │ /net/mobilevit/encoder/1/2/expand_1x1/convolution/Conv
    t1 fused    : 0.010496 ms
    t2 separate : 0.016384 ms  (Δ 0.005888 ms, speed‑up 1.56×)
    ── t2 breakdown ──  normal : 0.006912 ms │ Sigmoid : 0.006304 ms │ Mul : 0.003168 ms
[6] mobilevit_A │ CONV │ /net/mobilevit/encoder/2/downsampling_layer/expand_1x1/convolution/Conv
    t1 fused    : 0.010624 ms
    t2 separate : 0.016320 ms  (Δ 0.005696 ms, speed‑up 1.54×)
    ── t2 breakdown ──  normal : 0.006880 ms │ Sigmoid : 0.006336 ms │ Mul : 0.003104 ms
[7] mobilevit_A │ CONV │ /net/mobilevit/encoder/2/conv_kxk/convolution/Conv
    t1 fused    : 0.030688 ms
    t2 separate : 0.029408 ms  (Δ -0.001280 ms, speed‑up 0.96×)
    ── t2 breakdown ──  normal : 0.026688 ms │ Sigmoid : 0.001505 ms │ Mul : 0.001215 ms
[8] div │ MATMUL │ /net/mobilevit/encoder/2/transformer/0/attention/attention/MatMul
    t1 fused    : 0.006817 ms
    t2 separate : 0.016960 ms  (Δ 0.010143 ms, speed‑up 2.49×)
    ── t2 breakdown ──  normal : 0.005632 ms │ Div : 0.011328 ms
[9] double_add │ MATMUL │ /net/mobilevit/encoder/2/transformer/0/attention/output/dense/MatMul
    t1 fused    : 0.008448 ms
    t2 separate : 0.010689 ms  (Δ 0.002241 ms, speed‑up 1.27×)
    ── t2 breakdown ──  normal : 0.007361 ms │ Add : 0.003328 ms
[10] Add_Sig_Mul │ MATMUL │ /net/mobilevit/encoder/2/transformer/0/intermediate/dense/MatMul
    t1 fused    : 0.011104 ms
    t2 separate : 0.016033 ms  (Δ 0.004929 ms, speed‑up 1.44×)
    ── t2 breakdown ──  normal : 0.007521 ms │ Add : 0.003872 ms │ Sigmoid : 0.002368 ms │ Mul : 0.002272 ms
[11] double_add │ MATMUL │ /net/mobilevit/encoder/2/transformer/0/output/dense/MatMul
    t1 fused    : 0.011743 ms
    t2 separate : 0.015392 ms  (Δ 0.003649 ms, speed‑up 1.31×)
    ── t2 breakdown ──  normal : 0.010880 ms │ Add : 0.002880 ms │ Add : 0.001632 ms
[12] div │ MATMUL │ /net/mobilevit/encoder/2/transformer/1/attention/attention/MatMul
    t1 fused    : 0.006752 ms
    t2 separate : 0.016831 ms  (Δ 0.010079 ms, speed‑up 2.49×)
    ── t2 breakdown ──  normal : 0.005631 ms │ Div : 0.011200 ms
[13] double_add │ MATMUL │ /net/mobilevit/encoder/2/transformer/1/attention/output/dense/MatMul
    t1 fused    : 0.008320 ms
    t2 separate : 0.010688 ms  (Δ 0.002368 ms, speed‑up 1.28×)
    ── t2 breakdown ──  normal : 0.007392 ms │ Add : 0.003296 ms
[14] Add_Sig_Mul │ MATMUL │ /net/mobilevit/encoder/2/transformer/1/intermediate/dense/MatMul
    t1 fused    : 0.011040 ms
    t2 separate : 0.015776 ms  (Δ 0.004736 ms, speed‑up 1.43×)
    ── t2 breakdown ──  normal : 0.007424 ms │ Add : 0.003648 ms │ Sigmoid : 0.002368 ms │ Mul : 0.002336 ms
[15] double_add │ MATMUL │ /net/mobilevit/encoder/2/transformer/1/output/dense/MatMul
    t1 fused    : 0.011841 ms
    t2 separate : 0.015264 ms  (Δ 0.003423 ms, speed‑up 1.29×)
    ── t2 breakdown ──  normal : 0.010848 ms │ Add : 0.002848 ms │ Add : 0.001568 ms
[16] mobilevit_A │ CONV │ /net/mobilevit/encoder/2/conv_projection/convolution/Conv
    t1 fused    : 0.012576 ms
    t2 separate : 0.011648 ms  (Δ -0.000928 ms, speed‑up 0.93×)
    ── t2 breakdown ──  normal : 0.008928 ms │ Sigmoid : 0.001504 ms │ Mul : 0.001216 ms
[17] mobilevit_A │ CONV │ /net/mobilevit/encoder/2/fusion/convolution/Conv
    t1 fused    : 0.052385 ms
    t2 separate : 0.051585 ms  (Δ -0.000800 ms, speed‑up 0.98×)
    ── t2 breakdown ──  normal : 0.048865 ms │ Sigmoid : 0.001504 ms │ Mul : 0.001216 ms
[18] mobilevit_A │ CONV │ /net/mobilevit/encoder/3/downsampling_layer/expand_1x1/convolution/Conv
    t1 fused    : 0.010400 ms
    t2 separate : 0.011968 ms  (Δ 0.001568 ms, speed‑up 1.15×)
    ── t2 breakdown ──  normal : 0.007168 ms │ Sigmoid : 0.003008 ms │ Mul : 0.001792 ms
[19] mobilevit_A │ CONV │ /net/mobilevit/encoder/3/conv_kxk/convolution/Conv
    t1 fused    : 0.037952 ms
    t2 separate : 0.036128 ms  (Δ -0.001824 ms, speed‑up 0.95×)
    ── t2 breakdown ──  normal : 0.034016 ms │ Sigmoid : 0.001088 ms │ Mul : 0.001024 ms
[20] div │ MATMUL │ /net/mobilevit/encoder/3/transformer/0/attention/attention/MatMul
    t1 fused    : 0.005280 ms
    t2 separate : 0.007744 ms  (Δ 0.002464 ms, speed‑up 1.47×)
    ── t2 breakdown ──  normal : 0.004384 ms │ Div : 0.003360 ms
[21] double_add │ MATMUL │ /net/mobilevit/encoder/3/transformer/0/attention/output/dense/MatMul
    t1 fused    : 0.008320 ms
    t2 separate : 0.011232 ms  (Δ 0.002912 ms, speed‑up 1.35×)
    ── t2 breakdown ──  normal : 0.008224 ms │ Add : 0.003008 ms
[22] Add_Sig_Mul │ MATMUL │ /net/mobilevit/encoder/3/transformer/0/intermediate/dense/MatMul
    t1 fused    : 0.011199 ms
    t2 separate : 0.014240 ms  (Δ 0.003041 ms, speed‑up 1.27×)
    ── t2 breakdown ──  normal : 0.008256 ms │ Add : 0.003072 ms │ Sigmoid : 0.001472 ms │ Mul : 0.001440 ms
[23] double_add │ MATMUL │ /net/mobilevit/encoder/3/transformer/0/output/dense/MatMul
    t1 fused    : 0.013312 ms
    t2 separate : 0.017055 ms  (Δ 0.003743 ms, speed‑up 1.28×)
    ── t2 breakdown ──  normal : 0.013184 ms │ Add : 0.002656 ms │ Add : 0.001215 ms
[24] div │ MATMUL │ /net/mobilevit/encoder/3/transformer/1/attention/attention/MatMul
    t1 fused    : 0.005152 ms
    t2 separate : 0.007649 ms  (Δ 0.002497 ms, speed‑up 1.48×)
    ── t2 breakdown ──  normal : 0.004352 ms │ Div : 0.003297 ms
[25] double_add │ MATMUL │ /net/mobilevit/encoder/3/transformer/1/attention/output/dense/MatMul
    t1 fused    : 0.008288 ms
    t2 separate : 0.011231 ms  (Δ 0.002943 ms, speed‑up 1.36×)
    ── t2 breakdown ──  normal : 0.008224 ms │ Add : 0.003007 ms
[26] Add_Sig_Mul │ MATMUL │ /net/mobilevit/encoder/3/transformer/1/intermediate/dense/MatMul
    t1 fused    : 0.011168 ms
    t2 separate : 0.014080 ms  (Δ 0.002912 ms, speed‑up 1.26×)
    ── t2 breakdown ──  normal : 0.008256 ms │ Add : 0.002944 ms │ Sigmoid : 0.001440 ms │ Mul : 0.001440 ms
[27] double_add │ MATMUL │ /net/mobilevit/encoder/3/transformer/1/output/dense/MatMul
    t1 fused    : 0.013344 ms
    t2 separate : 0.017120 ms  (Δ 0.003776 ms, speed‑up 1.28×)
    ── t2 breakdown ──  normal : 0.013184 ms │ Add : 0.002720 ms │ Add : 0.001216 ms
[28] div │ MATMUL │ /net/mobilevit/encoder/3/transformer/2/attention/attention/MatMul
    t1 fused    : 0.005152 ms
    t2 separate : 0.007648 ms  (Δ 0.002496 ms, speed‑up 1.48×)
    ── t2 breakdown ──  normal : 0.004352 ms │ Div : 0.003296 ms
[29] double_add │ MATMUL │ /net/mobilevit/encoder/3/transformer/2/attention/output/dense/MatMul
    t1 fused    : 0.008353 ms
    t2 separate : 0.011232 ms  (Δ 0.002879 ms, speed‑up 1.34×)
    ── t2 breakdown ──  normal : 0.008192 ms │ Add : 0.003040 ms
[30] Add_Sig_Mul │ MATMUL │ /net/mobilevit/encoder/3/transformer/2/intermediate/dense/MatMul
    t1 fused    : 0.011168 ms
    t2 separate : 0.014080 ms  (Δ 0.002912 ms, speed‑up 1.26×)
    ── t2 breakdown ──  normal : 0.008288 ms │ Add : 0.002912 ms │ Sigmoid : 0.001440 ms │ Mul : 0.001440 ms
[31] double_add │ MATMUL │ /net/mobilevit/encoder/3/transformer/2/output/dense/MatMul
    t1 fused    : 0.013344 ms
    t2 separate : 0.016992 ms  (Δ 0.003648 ms, speed‑up 1.27×)
    ── t2 breakdown ──  normal : 0.013184 ms │ Add : 0.002624 ms │ Add : 0.001184 ms
[32] div │ MATMUL │ /net/mobilevit/encoder/3/transformer/3/attention/attention/MatMul
    t1 fused    : 0.005152 ms
    t2 separate : 0.007680 ms  (Δ 0.002528 ms, speed‑up 1.49×)
    ── t2 breakdown ──  normal : 0.004384 ms │ Div : 0.003296 ms
[33] double_add │ MATMUL │ /net/mobilevit/encoder/3/transformer/3/attention/output/dense/MatMul
    t1 fused    : 0.008352 ms
    t2 separate : 0.011264 ms  (Δ 0.002912 ms, speed‑up 1.35×)
    ── t2 breakdown ──  normal : 0.008224 ms │ Add : 0.003040 ms
[34] Add_Sig_Mul │ MATMUL │ /net/mobilevit/encoder/3/transformer/3/intermediate/dense/MatMul
    t1 fused    : 0.011168 ms
    t2 separate : 0.014111 ms  (Δ 0.002943 ms, speed‑up 1.26×)
    ── t2 breakdown ──  normal : 0.008288 ms │ Add : 0.002912 ms │ Sigmoid : 0.001440 ms │ Mul : 0.001471 ms
[35] double_add │ MATMUL │ /net/mobilevit/encoder/3/transformer/3/output/dense/MatMul
    t1 fused    : 0.013312 ms
    t2 separate : 0.017088 ms  (Δ 0.003776 ms, speed‑up 1.28×)
    ── t2 breakdown ──  normal : 0.013152 ms │ Add : 0.002720 ms │ Add : 0.001216 ms
[36] mobilevit_A │ CONV │ /net/mobilevit/encoder/3/conv_projection/convolution/Conv
    t1 fused    : 0.013280 ms
    t2 separate : 0.011872 ms  (Δ -0.001408 ms, speed‑up 0.89×)
    ── t2 breakdown ──  normal : 0.009696 ms │ Sigmoid : 0.001120 ms │ Mul : 0.001056 ms
[37] mobilevit_A │ CONV │ /net/mobilevit/encoder/3/fusion/convolution/Conv
    t1 fused    : 0.066880 ms
    t2 separate : 0.065664 ms  (Δ -0.001216 ms, speed‑up 0.98×)
    ── t2 breakdown ──  normal : 0.063488 ms │ Sigmoid : 0.001120 ms │ Mul : 0.001056 ms
[38] mobilevit_A │ CONV │ /net/mobilevit/encoder/4/downsampling_layer/expand_1x1/convolution/Conv
    t1 fused    : 0.011424 ms
    t2 separate : 0.011039 ms  (Δ -0.000385 ms, speed‑up 0.97×)
    ── t2 breakdown ──  normal : 0.008032 ms │ Sigmoid : 0.001696 ms │ Mul : 0.001311 ms
[39] mobilevit_A │ CONV │ /net/mobilevit/encoder/4/conv_kxk/convolution/Conv
    t1 fused    : 0.044928 ms
    t2 separate : 0.042880 ms  (Δ -0.002048 ms, speed‑up 0.95×)
    ── t2 breakdown ──  normal : 0.040896 ms │ Sigmoid : 0.001024 ms │ Mul : 0.000960 ms
[40] div │ MATMUL │ /net/mobilevit/encoder/4/transformer/0/attention/attention/MatMul
    t1 fused    : 0.005536 ms
    t2 separate : 0.007841 ms  (Δ 0.002305 ms, speed‑up 1.42×)
    ── t2 breakdown ──  normal : 0.004608 ms │ Div : 0.003233 ms
[41] double_add │ MATMUL │ /net/mobilevit/encoder/4/transformer/0/attention/output/dense/MatMul
    t1 fused    : 0.009760 ms
    t2 separate : 0.012415 ms  (Δ 0.002655 ms, speed‑up 1.27×)
    ── t2 breakdown ──  normal : 0.009408 ms │ Add : 0.003007 ms
[42] Add_Sig_Mul │ MATMUL │ /net/mobilevit/encoder/4/transformer/0/intermediate/dense/MatMul
    t1 fused    : 0.012704 ms
    t2 separate : 0.014720 ms  (Δ 0.002016 ms, speed‑up 1.16×)
    ── t2 breakdown ──  normal : 0.009536 ms │ Add : 0.002944 ms │ Sigmoid : 0.001088 ms │ Mul : 0.001152 ms
[43] double_add │ MATMUL │ /net/mobilevit/encoder/4/transformer/0/output/dense/MatMul
    t1 fused    : 0.015584 ms
    t2 separate : 0.018848 ms  (Δ 0.003264 ms, speed‑up 1.21×)
    ── t2 breakdown ──  normal : 0.015296 ms │ Add : 0.002528 ms │ Add : 0.001024 ms
[44] div │ MATMUL │ /net/mobilevit/encoder/4/transformer/1/attention/attention/MatMul
    t1 fused    : 0.005568 ms
    t2 separate : 0.007777 ms  (Δ 0.002209 ms, speed‑up 1.40×)
    ── t2 breakdown ──  normal : 0.004641 ms │ Div : 0.003136 ms
[45] double_add │ MATMUL │ /net/mobilevit/encoder/4/transformer/1/attention/output/dense/MatMul
    t1 fused    : 0.009728 ms
    t2 separate : 0.012416 ms  (Δ 0.002688 ms, speed‑up 1.28×)
    ── t2 breakdown ──  normal : 0.009408 ms │ Add : 0.003008 ms
[46] Add_Sig_Mul │ MATMUL │ /net/mobilevit/encoder/4/transformer/1/intermediate/dense/MatMul
    t1 fused    : 0.012672 ms
    t2 separate : 0.014495 ms  (Δ 0.001823 ms, speed‑up 1.14×)
    ── t2 breakdown ──  normal : 0.009568 ms │ Add : 0.002751 ms │ Sigmoid : 0.001088 ms │ Mul : 0.001088 ms
[47] double_add │ MATMUL │ /net/mobilevit/encoder/4/transformer/1/output/dense/MatMul
    t1 fused    : 0.015552 ms
    t2 separate : 0.018784 ms  (Δ 0.003232 ms, speed‑up 1.21×)
    ── t2 breakdown ──  normal : 0.015232 ms │ Add : 0.002560 ms │ Add : 0.000992 ms
[48] div │ MATMUL │ /net/mobilevit/encoder/4/transformer/2/attention/attention/MatMul
    t1 fused    : 0.005568 ms
    t2 separate : 0.007744 ms  (Δ 0.002176 ms, speed‑up 1.39×)
    ── t2 breakdown ──  normal : 0.004608 ms │ Div : 0.003136 ms
[49] double_add │ MATMUL │ /net/mobilevit/encoder/4/transformer/2/attention/output/dense/MatMul
    t1 fused    : 0.009696 ms
    t2 separate : 0.012448 ms  (Δ 0.002752 ms, speed‑up 1.28×)
    ── t2 breakdown ──  normal : 0.009408 ms │ Add : 0.003040 ms
[50] Add_Sig_Mul │ MATMUL │ /net/mobilevit/encoder/4/transformer/2/intermediate/dense/MatMul
    t1 fused    : 0.012704 ms
    t2 separate : 0.014431 ms  (Δ 0.001727 ms, speed‑up 1.14×)
    ── t2 breakdown ──  normal : 0.009535 ms │ Add : 0.002752 ms │ Sigmoid : 0.001056 ms │ Mul : 0.001088 ms
[51] double_add │ MATMUL │ /net/mobilevit/encoder/4/transformer/2/output/dense/MatMul
    t1 fused    : 0.015552 ms
    t2 separate : 0.018784 ms  (Δ 0.003232 ms, speed‑up 1.21×)
    ── t2 breakdown ──  normal : 0.015264 ms │ Add : 0.002528 ms │ Add : 0.000992 ms
[52] mobilevit_A │ CONV │ /net/mobilevit/encoder/4/conv_projection/convolution/Conv
    t1 fused    : 0.014528 ms
    t2 separate : 0.012672 ms  (Δ -0.001856 ms, speed‑up 0.87×)
    ── t2 breakdown ──  normal : 0.010656 ms │ Sigmoid : 0.001024 ms │ Mul : 0.000992 ms
[53] mobilevit_A │ CONV │ /net/mobilevit/encoder/4/fusion/convolution/Conv
    t1 fused    : 0.081312 ms
    t2 separate : 0.079681 ms  (Δ -0.001631 ms, speed‑up 0.98×)
    ── t2 breakdown ──  normal : 0.077665 ms │ Sigmoid : 0.001024 ms │ Mul : 0.000992 ms
[54] mobilevit_A │ CONV │ /net/mobilevit/conv_1x1_exp/convolution/Conv
    t1 fused    : 0.011744 ms
    t2 separate : 0.010240 ms  (Δ -0.001504 ms, speed‑up 0.87×)
    ── t2 breakdown ──  normal : 0.008064 ms │ Sigmoid : 0.001120 ms │ Mul : 0.001056 ms

==============================================================================================================
► Totals per combination type

mobilevit_A : fused 0.473505 ms   sep 0.487809 ms   Δ 0.014304 ms   speed‑up 1.03×
div         : fused 0.050977 ms   sep 0.087874 ms   Δ 0.036897 ms   speed‑up 1.72×
double_add  : fused 0.202849 ms   sep 0.258942 ms   Δ 0.056093 ms   speed‑up 1.28×
Add_Sig_Mul : fused 0.104927 ms   sep 0.131966 ms   Δ 0.027039 ms   speed‑up 1.26×

==============================================================================================================
► Cumulative Σ over the four combination types

Σ fused      : 0.832258 ms
Σ separate   : 0.966591 ms
Δ (sep‑fuse) : 0.134333 ms   speed‑up 1.16×

==============================================================================================================
► Remaining Run_with_type summary (ops not part of any combination)

Add                : t1 0.079968 ms │ t2 0.093728 ms │ Δ 0.013760 ms
Concat             : t1 0.007232 ms │ t2 0.008032 ms │ Δ 0.000800 ms
Gemm               : t1 0.091424 ms │ t2 0.100064 ms │ Δ 0.008640 ms
LayerNormalization : t1 1.993962 ms │ t2 1.998125 ms │ Δ 0.004163 ms
Mul                : t1 0.015808 ms │ t2 0.015424 ms │ Δ -0.000384 ms
ReduceMean         : t1 0.043969 ms │ t2 0.037824 ms │ Δ -0.006145 ms
Sigmoid            : t1 0.027776 ms │ t2 0.027745 ms │ Δ -0.000031 ms
Softmax            : t1 0.019488 ms │ t2 0.024128 ms │ Δ 0.004640 ms
Transpose          : t1 0.094146 ms │ t2 0.094497 ms │ Δ 0.000351 ms

==============================================================================================================
► Cumulative sub‑op time (including normal) across all combinations

normal  : 0.714083 ms
Add     : 0.090684 ms
Sigmoid : 0.069601 ms
Mul     : 0.046941 ms
Div     : 0.045282 ms

==============================================================================================================
► Total execution time for the whole file

t1 total : 3.518770 ms
t2 total : 3.684593 ms
Δ        : 0.165823 ms   speed‑up 1.05×

► CPU

      op       count   total (ms)   avg (ms)
t1:   Split      9.0    0.276524    0.030725
      Reshape   54.0    0.214966    0.003981
      sum               0.49149
t2:   Split      9.0    0.250503    0.027834
      Reshape   54.0    0.211196    0.003911
      sum               0.461699

► Total execution time for the whole file + CPU

t1 total : 4.010260 ms
t2 total : 4.146292 ms
Δ        : 0.136032 ms   speed‑up 1.03×
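
The Δ and speed‑up columns appear to follow one convention throughout the report: Δ = t2 separate − t1 fused, and speed‑up = t2 separate / t1 fused, so a value below 1.00× means the fused run was slower. The sketch below is not the script that produced the report; it is a minimal check, with a hypothetical helper name `compare`, that re-derives the four-combination totals from the per-type figures above.

```python
def compare(t1_fused_ms: float, t2_separate_ms: float) -> tuple[float, float]:
    """Return (delta_ms, speed_up) under the convention the report appears to use.

    Assumption: delta = separate - fused, speed_up = separate / fused.
    """
    delta_ms = t2_separate_ms - t1_fused_ms
    speed_up = t2_separate_ms / t1_fused_ms
    return delta_ms, speed_up


# Per-type totals copied from the report: (fused ms, separate ms).
type_totals = {
    "mobilevit_A": (0.473505, 0.487809),
    "div": (0.050977, 0.087874),
    "double_add": (0.202849, 0.258942),
    "Add_Sig_Mul": (0.104927, 0.131966),
}

# Sum each column across the four combination types.
sum_fused = sum(fused for fused, _ in type_totals.values())
sum_separate = sum(sep for _, sep in type_totals.values())
delta, speed_up = compare(sum_fused, sum_separate)

print(f"Σ fused      : {sum_fused:.6f} ms")
print(f"Σ separate   : {sum_separate:.6f} ms")
print(f"Δ (sep-fuse) : {delta:.6f} ms   speed-up {speed_up:.2f}×")
```

Running the same helper on a single instance, for example `compare(0.019936, 0.017760)` for entry [1], gives a negative Δ and a ratio below 1, which matches the rows where the separate run was faster than the fused one.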