==============================================================================================================
► Each "combination instance" (t1 fused vs. t2 separate)

[1] mobilevit_A │ CONV │ /net/mobilevit/conv_stem/convolution/Conv
  t1 fused    : 0.019936 ms
  t2 separate : 0.017760 ms  (Δ -0.002176 ms, speed‑up 0.89×)
  ── t2 breakdown ──
    normal  : 0.013792 ms
    Sigmoid : 0.002336 ms
    Mul     : 0.001632 ms

[2] mobilevit_A │ CONV │ /net/mobilevit/encoder/0/layer.0/expand_1x1/convolution/Conv
  t1 fused    : 0.016320 ms
  t2 separate : 0.018912 ms  (Δ 0.002592 ms, speed‑up 1.16×)
  ── t2 breakdown ──
    normal  : 0.009472 ms
    Sigmoid : 0.006304 ms
    Mul     : 0.003136 ms

[3] mobilevit_A │ CONV │ /net/mobilevit/encoder/1/0/expand_1x1/convolution/Conv
  t1 fused    : 0.017536 ms
  t2 separate : 0.027424 ms  (Δ 0.009888 ms, speed‑up 1.56×)
  ── t2 breakdown ──
    normal  : 0.010656 ms
    Sigmoid : 0.011520 ms
    Mul     : 0.005248 ms

[4] mobilevit_A │ CONV │ /net/mobilevit/encoder/1/1/expand_1x1/convolution/Conv
  t1 fused    : 0.010496 ms
  t2 separate : 0.016224 ms  (Δ 0.005728 ms, speed‑up 1.55×)
  ── t2 breakdown ──
    normal  : 0.006880 ms
    Sigmoid : 0.006304 ms
    Mul     : 0.003040 ms

[5] mobilevit_A │ CONV │ /net/mobilevit/encoder/1/2/expand_1x1/convolution/Conv
  t1 fused    : 0.010496 ms
  t2 separate : 0.016384 ms  (Δ 0.005888 ms, speed‑up 1.56×)
  ── t2 breakdown ──
    normal  : 0.006912 ms
    Sigmoid : 0.006304 ms
    Mul     : 0.003168 ms

[6] mobilevit_A │ CONV │ /net/mobilevit/encoder/2/downsampling_layer/expand_1x1/convolution/Conv
  t1 fused    : 0.010624 ms
  t2 separate : 0.016320 ms  (Δ 0.005696 ms, speed‑up 1.54×)
  ── t2 breakdown ──
    normal  : 0.006880 ms
    Sigmoid : 0.006336 ms
    Mul     : 0.003104 ms

[7] mobilevit_A │ CONV │ /net/mobilevit/encoder/2/conv_kxk/convolution/Conv
  t1 fused    : 0.030688 ms
  t2 separate : 0.029408 ms  (Δ -0.001280 ms, speed‑up 0.96×)
  ── t2 breakdown ──
    normal  : 0.026688 ms
    Sigmoid : 0.001505 ms
    Mul     : 0.001215 ms

[8] div │ MATMUL │ /net/mobilevit/encoder/2/transformer/0/attention/attention/MatMul
  t1 fused    : 0.006817 ms
  t2 separate : 0.016960 ms  (Δ 0.010143 ms, speed‑up 2.49×)
  ── t2 breakdown ──
    normal  : 0.005632 ms
    Div     : 0.011328 ms

[9] double_add │ MATMUL │ /net/mobilevit/encoder/2/transformer/0/attention/output/dense/MatMul
  t1 fused    : 0.008448 ms
  t2 separate : 0.010689 ms  (Δ 0.002241 ms, speed‑up 1.27×)
  ── t2 breakdown ──
    normal  : 0.007361 ms
    Add     : 0.003328 ms

[10] Add_Sig_Mul │ MATMUL │ /net/mobilevit/encoder/2/transformer/0/intermediate/dense/MatMul
  t1 fused    : 0.011104 ms
  t2 separate : 0.016033 ms  (Δ 0.004929 ms, speed‑up 1.44×)
  ── t2 breakdown ──
    normal  : 0.007521 ms
    Add     : 0.003872 ms
    Sigmoid : 0.002368 ms
    Mul     : 0.002272 ms

[11] double_add │ MATMUL │ /net/mobilevit/encoder/2/transformer/0/output/dense/MatMul
  t1 fused    : 0.011743 ms
  t2 separate : 0.015392 ms  (Δ 0.003649 ms, speed‑up 1.31×)
  ── t2 breakdown ──
    normal  : 0.010880 ms
    Add     : 0.002880 ms
    Add     : 0.001632 ms

[12] div │ MATMUL │ /net/mobilevit/encoder/2/transformer/1/attention/attention/MatMul
  t1 fused    : 0.006752 ms
  t2 separate : 0.016831 ms  (Δ 0.010079 ms, speed‑up 2.49×)
  ── t2 breakdown ──
    normal  : 0.005631 ms
    Div     : 0.011200 ms

[13] double_add │ MATMUL │ /net/mobilevit/encoder/2/transformer/1/attention/output/dense/MatMul
  t1 fused    : 0.008320 ms
  t2 separate : 0.010688 ms  (Δ 0.002368 ms, speed‑up 1.28×)
  ── t2 breakdown ──
    normal  : 0.007392 ms
    Add     : 0.003296 ms

[14] Add_Sig_Mul │ MATMUL │ /net/mobilevit/encoder/2/transformer/1/intermediate/dense/MatMul
  t1 fused    : 0.011040 ms
  t2 separate : 0.015776 ms  (Δ 0.004736 ms, speed‑up 1.43×)
  ── t2 breakdown ──
    normal  : 0.007424 ms
    Add     : 0.003648 ms
    Sigmoid : 0.002368 ms
    Mul     : 0.002336 ms

[15] double_add │ MATMUL │ /net/mobilevit/encoder/2/transformer/1/output/dense/MatMul
  t1 fused    : 0.011841 ms
  t2 separate : 0.015264 ms  (Δ 0.003423 ms, speed‑up 1.29×)
  ── t2 breakdown ──
    normal  : 0.010848 ms
    Add     : 0.002848 ms
    Add     : 0.001568 ms

[16] mobilevit_A │ CONV │ /net/mobilevit/encoder/2/conv_projection/convolution/Conv
  t1 fused    : 0.012576 ms
  t2 separate : 0.011648 ms  (Δ -0.000928 ms, speed‑up 0.93×)
  ── t2 breakdown ──
    normal  : 0.008928 ms
    Sigmoid : 0.001504 ms
    Mul     : 0.001216 ms

[17] mobilevit_A │ CONV │ /net/mobilevit/encoder/2/fusion/convolution/Conv
  t1 fused    : 0.052385 ms
  t2 separate : 0.051585 ms  (Δ -0.000800 ms, speed‑up 0.98×)
  ── t2 breakdown ──
    normal  : 0.048865 ms
    Sigmoid : 0.001504 ms
    Mul     : 0.001216 ms

[18] mobilevit_A │ CONV │ /net/mobilevit/encoder/3/downsampling_layer/expand_1x1/convolution/Conv
  t1 fused    : 0.010400 ms
  t2 separate : 0.011968 ms  (Δ 0.001568 ms, speed‑up 1.15×)
  ── t2 breakdown ──
    normal  : 0.007168 ms
    Sigmoid : 0.003008 ms
    Mul     : 0.001792 ms

[19] mobilevit_A │ CONV │ /net/mobilevit/encoder/3/conv_kxk/convolution/Conv
  t1 fused    : 0.037952 ms
  t2 separate : 0.036128 ms  (Δ -0.001824 ms, speed‑up 0.95×)
  ── t2 breakdown ──
    normal  : 0.034016 ms
    Sigmoid : 0.001088 ms
    Mul     : 0.001024 ms

[20] div │ MATMUL │ /net/mobilevit/encoder/3/transformer/0/attention/attention/MatMul
  t1 fused    : 0.005280 ms
  t2 separate : 0.007744 ms  (Δ 0.002464 ms, speed‑up 1.47×)
  ── t2 breakdown ──
    normal  : 0.004384 ms
    Div     : 0.003360 ms

[21] double_add │ MATMUL │ /net/mobilevit/encoder/3/transformer/0/attention/output/dense/MatMul
  t1 fused    : 0.008320 ms
  t2 separate : 0.011232 ms  (Δ 0.002912 ms, speed‑up 1.35×)
  ── t2 breakdown ──
    normal  : 0.008224 ms
    Add     : 0.003008 ms

[22] Add_Sig_Mul │ MATMUL │ /net/mobilevit/encoder/3/transformer/0/intermediate/dense/MatMul
  t1 fused    : 0.011199 ms
  t2 separate : 0.014240 ms  (Δ 0.003041 ms, speed‑up 1.27×)
  ── t2 breakdown ──
    normal  : 0.008256 ms
    Add     : 0.003072 ms
    Sigmoid : 0.001472 ms
    Mul     : 0.001440 ms

[23] double_add │ MATMUL │ /net/mobilevit/encoder/3/transformer/0/output/dense/MatMul
  t1 fused    : 0.013312 ms
  t2 separate : 0.017055 ms  (Δ 0.003743 ms, speed‑up 1.28×)
  ── t2 breakdown ──
    normal  : 0.013184 ms
    Add     : 0.002656 ms
    Add     : 0.001215 ms

[24] div │ MATMUL │ /net/mobilevit/encoder/3/transformer/1/attention/attention/MatMul
  t1 fused    : 0.005152 ms
  t2 separate : 0.007649 ms  (Δ 0.002497 ms, speed‑up 1.48×)
  ── t2 breakdown ──
    normal  : 0.004352 ms
    Div     : 0.003297 ms

[25] double_add │ MATMUL │ /net/mobilevit/encoder/3/transformer/1/attention/output/dense/MatMul
  t1 fused    : 0.008288 ms
  t2 separate : 0.011231 ms  (Δ 0.002943 ms, speed‑up 1.36×)
  ── t2 breakdown ──
    normal  : 0.008224 ms
    Add     : 0.003007 ms

[26] Add_Sig_Mul │ MATMUL │ /net/mobilevit/encoder/3/transformer/1/intermediate/dense/MatMul
  t1 fused    : 0.011168 ms
  t2 separate : 0.014080 ms  (Δ 0.002912 ms, speed‑up 1.26×)
  ── t2 breakdown ──
    normal  : 0.008256 ms
    Add     : 0.002944 ms
    Sigmoid : 0.001440 ms
    Mul     : 0.001440 ms

[27] double_add │ MATMUL │ /net/mobilevit/encoder/3/transformer/1/output/dense/MatMul
  t1 fused    : 0.013344 ms
  t2 separate : 0.017120 ms  (Δ 0.003776 ms, speed‑up 1.28×)
  ── t2 breakdown ──
    normal  : 0.013184 ms
    Add     : 0.002720 ms
    Add     : 0.001216 ms

[28] div │ MATMUL │ /net/mobilevit/encoder/3/transformer/2/attention/attention/MatMul
  t1 fused    : 0.005152 ms
  t2 separate : 0.007648 ms  (Δ 0.002496 ms, speed‑up 1.48×)
  ── t2 breakdown ──
    normal  : 0.004352 ms
    Div     : 0.003296 ms

[29] double_add │ MATMUL │ /net/mobilevit/encoder/3/transformer/2/attention/output/dense/MatMul
  t1 fused    : 0.008353 ms
  t2 separate : 0.011232 ms  (Δ 0.002879 ms, speed‑up 1.34×)
  ── t2 breakdown ──
    normal  : 0.008192 ms
    Add     : 0.003040 ms

[30] Add_Sig_Mul │ MATMUL │ /net/mobilevit/encoder/3/transformer/2/intermediate/dense/MatMul
  t1 fused    : 0.011168 ms
  t2 separate : 0.014080 ms  (Δ 0.002912 ms, speed‑up 1.26×)
  ── t2 breakdown ──
    normal  : 0.008288 ms
    Add     : 0.002912 ms
    Sigmoid : 0.001440 ms
    Mul     : 0.001440 ms

[31] double_add │ MATMUL │ /net/mobilevit/encoder/3/transformer/2/output/dense/MatMul
  t1 fused    : 0.013344 ms
  t2 separate : 0.016992 ms  (Δ 0.003648 ms, speed‑up 1.27×)
  ── t2 breakdown ──
    normal  : 0.013184 ms
    Add     : 0.002624 ms
    Add     : 0.001184 ms

[32] div │ MATMUL │ /net/mobilevit/encoder/3/transformer/3/attention/attention/MatMul
  t1 fused    : 0.005152 ms
  t2 separate : 0.007680 ms  (Δ 0.002528 ms, speed‑up 1.49×)
  ── t2 breakdown ──
    normal  : 0.004384 ms
    Div     : 0.003296 ms

[33] double_add │ MATMUL │ /net/mobilevit/encoder/3/transformer/3/attention/output/dense/MatMul
  t1 fused    : 0.008352 ms
  t2 separate : 0.011264 ms  (Δ 0.002912 ms, speed‑up 1.35×)
  ── t2 breakdown ──
    normal  : 0.008224 ms
    Add     : 0.003040 ms

[34] Add_Sig_Mul │ MATMUL │ /net/mobilevit/encoder/3/transformer/3/intermediate/dense/MatMul
  t1 fused    : 0.011168 ms
  t2 separate : 0.014111 ms  (Δ 0.002943 ms, speed‑up 1.26×)
  ── t2 breakdown ──
    normal  : 0.008288 ms
    Add     : 0.002912 ms
    Sigmoid : 0.001440 ms
    Mul     : 0.001471 ms

[35] double_add │ MATMUL │ /net/mobilevit/encoder/3/transformer/3/output/dense/MatMul
  t1 fused    : 0.013312 ms
  t2 separate : 0.017088 ms  (Δ 0.003776 ms, speed‑up 1.28×)
  ── t2 breakdown ──
    normal  : 0.013152 ms
    Add     : 0.002720 ms
    Add     : 0.001216 ms

[36] mobilevit_A │ CONV │ /net/mobilevit/encoder/3/conv_projection/convolution/Conv
  t1 fused    : 0.013280 ms
  t2 separate : 0.011872 ms  (Δ -0.001408 ms, speed‑up 0.89×)
  ── t2 breakdown ──
    normal  : 0.009696 ms
    Sigmoid : 0.001120 ms
    Mul     : 0.001056 ms

[37] mobilevit_A │ CONV │ /net/mobilevit/encoder/3/fusion/convolution/Conv
  t1 fused    : 0.066880 ms
  t2 separate : 0.065664 ms  (Δ -0.001216 ms, speed‑up 0.98×)
  ── t2 breakdown ──
    normal  : 0.063488 ms
    Sigmoid : 0.001120 ms
    Mul     : 0.001056 ms

[38] mobilevit_A │ CONV │ /net/mobilevit/encoder/4/downsampling_layer/expand_1x1/convolution/Conv
  t1 fused    : 0.011424 ms
  t2 separate : 0.011039 ms  (Δ -0.000385 ms, speed‑up 0.97×)
  ── t2 breakdown ──
    normal  : 0.008032 ms
    Sigmoid : 0.001696 ms
    Mul     : 0.001311 ms

[39] mobilevit_A │ CONV │ /net/mobilevit/encoder/4/conv_kxk/convolution/Conv
  t1 fused    : 0.044928 ms
  t2 separate : 0.042880 ms  (Δ -0.002048 ms, speed‑up 0.95×)
  ── t2 breakdown ──
    normal  : 0.040896 ms
    Sigmoid : 0.001024 ms
    Mul     : 0.000960 ms

[40] div │ MATMUL │ /net/mobilevit/encoder/4/transformer/0/attention/attention/MatMul
  t1 fused    : 0.005536 ms
  t2 separate : 0.007841 ms  (Δ 0.002305 ms, speed‑up 1.42×)
  ── t2 breakdown ──
    normal  : 0.004608 ms
    Div     : 0.003233 ms

[41] double_add │ MATMUL │ /net/mobilevit/encoder/4/transformer/0/attention/output/dense/MatMul
  t1 fused    : 0.009760 ms
  t2 separate : 0.012415 ms  (Δ 0.002655 ms, speed‑up 1.27×)
  ── t2 breakdown ──
    normal  : 0.009408 ms
    Add     : 0.003007 ms

[42] Add_Sig_Mul │ MATMUL │ /net/mobilevit/encoder/4/transformer/0/intermediate/dense/MatMul
  t1 fused    : 0.012704 ms
  t2 separate : 0.014720 ms  (Δ 0.002016 ms, speed‑up 1.16×)
  ── t2 breakdown ──
    normal  : 0.009536 ms
    Add     : 0.002944 ms
    Sigmoid : 0.001088 ms
    Mul     : 0.001152 ms

[43] double_add │ MATMUL │ /net/mobilevit/encoder/4/transformer/0/output/dense/MatMul
  t1 fused    : 0.015584 ms
  t2 separate : 0.018848 ms  (Δ 0.003264 ms, speed‑up 1.21×)
  ── t2 breakdown ──
    normal  : 0.015296 ms
    Add     : 0.002528 ms
    Add     : 0.001024 ms

[44] div │ MATMUL │ /net/mobilevit/encoder/4/transformer/1/attention/attention/MatMul
  t1 fused    : 0.005568 ms
  t2 separate : 0.007777 ms  (Δ 0.002209 ms, speed‑up 1.40×)
  ── t2 breakdown ──
    normal  : 0.004641 ms
    Div     : 0.003136 ms

[45] double_add │ MATMUL │ /net/mobilevit/encoder/4/transformer/1/attention/output/dense/MatMul
  t1 fused    : 0.009728 ms
  t2 separate : 0.012416 ms  (Δ 0.002688 ms, speed‑up 1.28×)
  ── t2 breakdown ──
    normal  : 0.009408 ms
    Add     : 0.003008 ms

[46] Add_Sig_Mul │ MATMUL │ /net/mobilevit/encoder/4/transformer/1/intermediate/dense/MatMul
  t1 fused    : 0.012672 ms
  t2 separate : 0.014495 ms  (Δ 0.001823 ms, speed‑up 1.14×)
  ── t2 breakdown ──
    normal  : 0.009568 ms
    Add     : 0.002751 ms
    Sigmoid : 0.001088 ms
    Mul     : 0.001088 ms

[47] double_add │ MATMUL │ /net/mobilevit/encoder/4/transformer/1/output/dense/MatMul
  t1 fused    : 0.015552 ms
  t2 separate : 0.018784 ms  (Δ 0.003232 ms, speed‑up 1.21×)
  ── t2 breakdown ──
    normal  : 0.015232 ms
    Add     : 0.002560 ms
    Add     : 0.000992 ms

[48] div │ MATMUL │ /net/mobilevit/encoder/4/transformer/2/attention/attention/MatMul
  t1 fused    : 0.005568 ms
  t2 separate : 0.007744 ms  (Δ 0.002176 ms, speed‑up 1.39×)
  ── t2 breakdown ──
    normal  : 0.004608 ms
    Div     : 0.003136 ms

[49] double_add │ MATMUL │ /net/mobilevit/encoder/4/transformer/2/attention/output/dense/MatMul
  t1 fused    : 0.009696 ms
  t2 separate : 0.012448 ms  (Δ 0.002752 ms, speed‑up 1.28×)
  ── t2 breakdown ──
    normal  : 0.009408 ms
    Add     : 0.003040 ms

[50] Add_Sig_Mul │ MATMUL │ /net/mobilevit/encoder/4/transformer/2/intermediate/dense/MatMul
  t1 fused    : 0.012704 ms
  t2 separate : 0.014431 ms  (Δ 0.001727 ms, speed‑up 1.14×)
  ── t2 breakdown ──
    normal  : 0.009535 ms
    Add     : 0.002752 ms
    Sigmoid : 0.001056 ms
    Mul     : 0.001088 ms

[51] double_add │ MATMUL │ /net/mobilevit/encoder/4/transformer/2/output/dense/MatMul
  t1 fused    : 0.015552 ms
  t2 separate : 0.018784 ms  (Δ 0.003232 ms, speed‑up 1.21×)
  ── t2 breakdown ──
    normal  : 0.015264 ms
    Add     : 0.002528 ms
    Add     : 0.000992 ms

[52] mobilevit_A │ CONV │ /net/mobilevit/encoder/4/conv_projection/convolution/Conv
  t1 fused    : 0.014528 ms
  t2 separate : 0.012672 ms  (Δ -0.001856 ms, speed‑up 0.87×)
  ── t2 breakdown ──
    normal  : 0.010656 ms
    Sigmoid : 0.001024 ms
    Mul     : 0.000992 ms

[53] mobilevit_A │ CONV │ /net/mobilevit/encoder/4/fusion/convolution/Conv
  t1 fused    : 0.081312 ms
  t2 separate : 0.079681 ms  (Δ -0.001631 ms, speed‑up 0.98×)
  ── t2 breakdown ──
    normal  : 0.077665 ms
    Sigmoid : 0.001024 ms
    Mul     : 0.000992 ms

[54] mobilevit_A │ CONV │ /net/mobilevit/conv_1x1_exp/convolution/Conv
  t1 fused    : 0.011744 ms
  t2 separate : 0.010240 ms  (Δ -0.001504 ms, speed‑up 0.87×)
  ── t2 breakdown ──
    normal  : 0.008064 ms
    Sigmoid : 0.001120 ms
    Mul     : 0.001056 ms

==============================================================================================================
► Totals per "combination type"
  mobilevit_A : fused 0.473505 ms   sep 0.487809 ms   Δ 0.014304 ms   speed‑up 1.03×
  div         : fused 0.050977 ms   sep 0.087874 ms   Δ 0.036897 ms   speed‑up 1.72×
  double_add  : fused 0.202849 ms   sep 0.258942 ms   Δ 0.056093 ms   speed‑up 1.28×
  Add_Sig_Mul : fused 0.104927 ms   sep 0.131966 ms   Δ 0.027039 ms   speed‑up 1.26×

==============================================================================================================
► Cumulative Σ over the four combinations
  Σ fused     : 0.832258 ms
  Σ separate  : 0.966591 ms
  Δ (sep‑fuse): 0.134333 ms   speed‑up 1.16×
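
The cumulative Σ line can be cross-checked against the four per-type totals. The sketch below assumes the same definitions as the rest of the report (Δ = sep − fused, speed‑up = sep / fused); the dictionary layout and variable names are hypothetical.

```python
# (fused ms, separate ms) per combination type, copied from the report above.
totals = {
    "mobilevit_A": (0.473505, 0.487809),
    "div":         (0.050977, 0.087874),
    "double_add":  (0.202849, 0.258942),
    "Add_Sig_Mul": (0.104927, 0.131966),
}

# Per-type delta and speed-up, assuming speed-up = separate / fused.
for name, (fused, sep) in totals.items():
    print(f"{name:<12}: Δ {sep - fused:.6f} ms   speed-up {sep / fused:.2f}×")

# Cumulative figures over all four types.
sum_fused = sum(fused for fused, _ in totals.values())
sum_sep = sum(sep for _, sep in totals.values())
print(f"Σ fused    : {sum_fused:.6f} ms")
print(f"Σ separate : {sum_sep:.6f} ms")
print(f"Δ          : {sum_sep - sum_fused:.6f} ms   speed-up {sum_sep / sum_fused:.2f}×")
```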

==============================================================================================================
► Summary of the remaining Run_with_type ops (not included in any combination)
  Add       : t1     0.079968 ms │ t2     0.093728 ms │ Δ     0.013760 ms
  Concat    : t1     0.007232 ms │ t2     0.008032 ms │ Δ     0.000800 ms
  Gemm      : t1     0.091424 ms │ t2     0.100064 ms │ Δ     0.008640 ms
  LayerNormalization: t1     1.993962 ms │ t2     1.998125 ms │ Δ     0.004163 ms
  Mul       : t1     0.015808 ms │ t2     0.015424 ms │ Δ    -0.000384 ms
  ReduceMean: t1     0.043969 ms │ t2     0.037824 ms │ Δ    -0.006145 ms
  Sigmoid   : t1     0.027776 ms │ t2     0.027745 ms │ Δ    -0.000031 ms
  Softmax   : t1     0.019488 ms │ t2     0.024128 ms │ Δ     0.004640 ms
  Transpose : t1     0.094146 ms │ t2     0.094497 ms │ Δ     0.000351 ms

==============================================================================================================
► Cumulative time per sub‑op (including normal) across all combinations
  normal  : 0.714083 ms
  Add     : 0.090684 ms
  Sigmoid : 0.069601 ms
  Mul     : 0.046941 ms
  Div     : 0.045282 ms

==============================================================================================================
► Total execution time for the whole file
  t1 total : 3.518770 ms
  t2 total : 3.684593 ms
  Δ        : 0.165823 ms   speed‑up 1.05×

► CPU
t1:
op                    count      total (ms)        avg (ms)
Split                   9.0        0.276524        0.030725
Reshape                54.0        0.214966        0.003981
sum                                0.491490

t2:
op                    count      total (ms)        avg (ms)
Split                   9.0        0.250503        0.027834
Reshape                54.0        0.211196        0.003911
sum                                0.461699

► Total execution time for the whole file + CPU
  t1 total : 4.010260 ms
  t2 total : 4.146292 ms
  Δ        : 0.136032 ms   speed‑up 1.03×
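
The combined totals above look like the whole-file totals plus the summed CPU op times (Split and Reshape). A short sketch under that assumption, with hypothetical variable names, reproduces the three figures:

```python
# Whole-file totals (ms) and CPU op totals (ms), copied from the report above.
file_total = {"t1": 3.518770, "t2": 3.684593}
cpu_ops = {
    "t1": {"Split": 0.276524, "Reshape": 0.214966},
    "t2": {"Split": 0.250503, "Reshape": 0.211196},
}

# Assumption: combined total = whole-file total + sum of CPU op times.
combined = {run: file_total[run] + sum(cpu_ops[run].values()) for run in file_total}
delta = combined["t2"] - combined["t1"]
speed_up = combined["t2"] / combined["t1"]

for run, value in combined.items():
    print(f"{run} total : {value:.6f} ms")
print(f"Δ        : {delta:.6f} ms   speed-up {speed_up:.2f}×")
```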