Model Input Name: unique_ids_raw_output___9:0, Shape: [0]
Model Input Name: segment_ids:0, Shape: [0, 256]
Model Input Name: input_mask:0, Shape: [0, 256]
Model Input Name: input_ids:0, Shape: [0, 256]
Starting model execution...
Inputs Details:
Input Name: input_ids:0
Shape: (1, 256)
Data (first 10 values): [ 101 2054 2003 1996 3007 1997 2605 1029 102 1996]...
--------------------------------------------------
Input Name: segment_ids:0
Shape: (1, 256)
Data (first 10 values): [0 0 0 0 0 0 0 0 0 1]...
--------------------------------------------------
Input Name: input_mask:0
Shape: (1, 256)
Data (first 10 values): [1 1 1 1 1 1 1 1 1 1]...
--------------------------------------------------
Input Name: unique_ids_raw_output___9:0
Shape: (1,)
Data (first 10 values): [0]...
--------------------------------------------------
Node: unique_ids_graph_outputs_Identity__10, Execution Time: 0.000511 seconds
Node: bert/encoder/Shape, Execution Time: 0.000030 seconds
Node: bert/encoder/Shape__12, Execution Time: 0.000038 seconds
Node: bert/encoder/strided_slice, Execution Time: 0.000173 seconds
Node: bert/encoder/strided_slice__16, Execution Time: 0.000029 seconds
Node: bert/encoder/strided_slice__17, Execution Time: 0.000020 seconds
Node: bert/encoder/ones/packed_Unsqueeze__18, Execution Time: 0.000035 seconds
Node: bert/encoder/ones/packed_Concat__21, Execution Time: 0.004840 seconds
Node: bert/encoder/ones__22, Execution Time: 0.000027 seconds
Node: bert/encoder/ones, Execution Time: 0.000075 seconds
Node: bert/encoder/Reshape, Execution Time: 0.000039 seconds
Node: bert/encoder/Cast, Execution Time: 0.000020 seconds
Node: bert/encoder/mul, Execution Time: 0.007645 seconds
Node: bert/encoder/layer_9/attention/self/ExpandDims, Execution Time: 0.000020 seconds
Node: bert/encoder/layer_9/attention/self/sub, Execution Time: 0.006671 seconds
Node: bert/encoder/layer_9/attention/self/mul_1, Execution Time: 0.000213 seconds
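The Cast/ExpandDims/sub/mul_1 nodes above compute BERT's additive attention-mask bias once, up front, for reuse by every layer's softmax. A sketch of the arithmetic, assuming the standard -10000 fill value for padded positions:

```python
import numpy as np

def attention_mask_bias(input_mask):
    """Turn a 0/1 padding mask into an additive attention bias,
    mirroring the Cast -> ExpandDims -> sub -> mul_1 nodes in the log.
    Padded positions get a large negative value so softmax zeroes them out."""
    mask = input_mask.astype(np.float32)   # Cast
    mask = mask[:, np.newaxis, :]          # ExpandDims -> (batch, 1, seq)
    return (1.0 - mask) * -10000.0         # sub, then mul_1

m = np.array([[1, 1, 1, 0, 0]])  # 3 real tokens, 2 padding slots
bias = attention_mask_bias(m)
```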
Node: bert/embeddings/Reshape_2, Execution Time: 0.000020 seconds
Node: bert/embeddings/Reshape, Execution Time: 0.000005 seconds
Node: bert/embeddings/GatherV2, Execution Time: 0.000162 seconds
Node: bert/embeddings/Reshape_1, Execution Time: 0.000020 seconds
Node: bert/embeddings/one_hot, Execution Time: 0.000219 seconds
Input size: (None, 256, 2, 768)
No Add node related to MatMul output: bert/embeddings/MatMul. Executing regular MatMul.
MatMul Node: bert/embeddings/MatMul, Execution Time: 0.027465 seconds
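The one_hot followed by MatMul at the embedding stage is how the exported graph gathers token-type embeddings: with only two segment types, the lookup is expressed as a dense matmul against the 2x768 table. A sketch of the equivalence, using a randomly initialized stand-in table:

```python
import numpy as np

rng = np.random.default_rng(0)
token_type_table = rng.standard_normal((2, 768)).astype(np.float32)  # 2 segment types
segment_ids = np.array([0, 0, 0, 1, 1])

# one_hot followed by MatMul, as the log executes it...
one_hot = np.eye(2, dtype=np.float32)[segment_ids]   # (5, 2)
via_matmul = one_hot @ token_type_table              # (5, 768)

# ...is just an embedding gather in disguise:
via_gather = token_type_table[segment_ids]
```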
Node: bert/embeddings/Reshape_3, Execution Time: 0.000025 seconds
Add Node: bert/embeddings/add, Execution Time: 0.000611 seconds
Add Node: bert/embeddings/add_1, Execution Time: 0.000467 seconds
Node: bert/embeddings/LayerNorm/moments/mean, Execution Time: 0.005089 seconds
Node: bert/embeddings/LayerNorm/moments/SquaredDifference, Execution Time: 0.000502 seconds
Node: bert/embeddings/LayerNorm/moments/SquaredDifference__72, Execution Time: 0.000517 seconds
Node: bert/embeddings/LayerNorm/moments/variance, Execution Time: 0.000074 seconds
Add Node: bert/embeddings/LayerNorm/batchnorm/add, Execution Time: 0.000064 seconds
Node: bert/embeddings/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.010280 seconds
Node: bert/embeddings/LayerNorm/batchnorm/Rsqrt__74, Execution Time: 0.005450 seconds
Node: bert/embeddings/LayerNorm/batchnorm/mul, Execution Time: 0.000053 seconds
Node: bert/embeddings/LayerNorm/batchnorm/mul_2, Execution Time: 0.000053 seconds
Node: bert/embeddings/LayerNorm/batchnorm/sub, Execution Time: 0.000069 seconds
Node: bert/embeddings/LayerNorm/batchnorm/mul_1, Execution Time: 0.000455 seconds
Add Node: bert/embeddings/LayerNorm/batchnorm/add_1, Execution Time: 0.000453 seconds
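The eleven LayerNorm nodes above (moments/mean through batchnorm/add_1) are a single layer normalization, exported as its constituent arithmetic. A numpy sketch that follows the same node order; gamma, beta, and the epsilon value are assumed parameters:

```python
import numpy as np

def layernorm_decomposed(x, gamma, beta, eps=1e-12):
    """LayerNorm as the log executes it, node by node."""
    mean = x.mean(-1, keepdims=True)                   # moments/mean
    var = np.mean((x - mean) ** 2, -1, keepdims=True)  # SquaredDifference + variance
    rstd = 1.0 / np.sqrt(var + eps)                    # batchnorm/add, then Rsqrt
    scale = rstd * gamma                               # batchnorm/mul
    # mul_1, mul_2, sub, add_1 combine into: x * scale + (beta - mean * scale)
    return x * scale + (beta - mean * scale)

rng = np.random.default_rng(1)
x = rng.standard_normal((4, 768))
gamma = np.ones(768)
beta = np.zeros(768)
y = layernorm_decomposed(x, gamma, beta)
```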
Node: bert/encoder/Reshape_1, Execution Time: 0.000024 seconds
Input size: (None, 256, 768, 768)
Fusing MatMul with Add for node: bert/encoder/layer_0/attention/self/value/MatMul
torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_0/attention/self/value/MatMul, Execution Time: 0.001809 seconds
Skipping already processed Node: bert/encoder/layer_0/attention/self/value/BiasAdd
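The "Fusing MatMul with Add" lines mean the runtime collapses a MatMul and its downstream BiasAdd into one call, then skips the BiasAdd node (the torch.Size([256, 768]) line is the broadcast bias it absorbs). The fusion must be a numerical no-op; a sketch of the invariant it preserves, with stand-in weights:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal((256, 768)).astype(np.float32)  # one sequence of hidden states
W = rng.standard_normal((768, 768)).astype(np.float32)  # projection weight
b = rng.standard_normal(768).astype(np.float32)         # projection bias

# Two-node path the graph declares: MatMul, then a separate BiasAdd.
two_nodes = x @ W
two_nodes = two_nodes + b

# Fused path the runtime executes: a single GEMM-with-bias (sketched here
# as one expression; a real backend dispatches one fused kernel for it).
fused = x @ W + b
```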
Node: bert/encoder/layer_0/attention/self/Reshape_2, Execution Time: 0.000020 seconds
Node: bert/encoder/layer_0/attention/self/transpose_2, Execution Time: 0.000505 seconds
Input size: (None, 256, 768, 768)
Fusing MatMul with Add for node: bert/encoder/layer_0/attention/self/query/MatMul
torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_0/attention/self/query/MatMul, Execution Time: 0.000672 seconds
Skipping already processed Node: bert/encoder/layer_0/attention/self/query/BiasAdd
Node: bert/encoder/layer_0/attention/self/Reshape, Execution Time: 0.000020 seconds
Node: bert/encoder/layer_0/attention/self/transpose, Execution Time: 0.000450 seconds
Input size: (None, 256, 768, 768)
Fusing MatMul with Add for node: bert/encoder/layer_0/attention/self/key/MatMul
torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_0/attention/self/key/MatMul, Execution Time: 0.000619 seconds
Skipping already processed Node: bert/encoder/layer_0/attention/self/key/BiasAdd
Node: bert/encoder/layer_0/attention/self/Reshape_1, Execution Time: 0.000009 seconds
Node: bert/encoder/layer_0/attention/self/MatMul__306, Execution Time: 0.000444 seconds
Input size: (12, 256, 64, 256)
No Add node related to MatMul output: bert/encoder/layer_0/attention/self/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_0/attention/self/MatMul, Execution Time: 0.001491 seconds
Node: bert/encoder/layer_0/attention/self/Mul, Execution Time: 0.001327 seconds
Add Node: bert/encoder/layer_0/attention/self/add, Execution Time: 0.001349 seconds
Node: bert/encoder/layer_0/attention/self/Softmax, Execution Time: 0.009065 seconds
Input size: (12, 256, 256, 64)
No Add node related to MatMul output: bert/encoder/layer_0/attention/self/MatMul_1. Executing regular MatMul.
MatMul Node: bert/encoder/layer_0/attention/self/MatMul_1, Execution Time: 0.000635 seconds
Node: bert/encoder/layer_0/attention/self/transpose_3, Execution Time: 0.000550 seconds
Node: bert/encoder/layer_0/attention/self/Reshape_3, Execution Time: 0.000058 seconds
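Each layer's MatMul -> Mul -> add -> Softmax -> MatMul_1 chain over the (12, 256, 64, 256) and (12, 256, 256, 64) shapes is scaled dot-product attention for 12 heads of width 64. A sketch of the same chain; the additive mask bias is passed as zero here:

```python
import numpy as np

def attention(q, k, v, bias):
    """One attention pass per the log's node order, over (heads, seq, head_dim)."""
    scores = q @ k.transpose(0, 2, 1)                # attention/self/MatMul: Q K^T
    scores = scores * (1.0 / np.sqrt(q.shape[-1]))   # Mul: scale by 1/sqrt(64)
    scores = scores + bias                           # add: the padding-mask bias
    scores -= scores.max(-1, keepdims=True)          # numerically stable Softmax
    p = np.exp(scores)
    p /= p.sum(-1, keepdims=True)
    return p @ v                                     # MatMul_1: weights times V

rng = np.random.default_rng(3)
q = rng.standard_normal((12, 256, 64))
k = rng.standard_normal((12, 256, 64))
v = rng.standard_normal((12, 256, 64))
out = attention(q, k, v, bias=0.0)
```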
Input size: (None, 256, 768, 768)
Fusing MatMul with 2Add for node: bert/encoder/layer_0/attention/output/dense/MatMul
torch.Size([256, 768]) , torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_0/attention/output/dense/MatMul, Execution Time: 0.001760 seconds
Skipping already processed Node: bert/encoder/layer_0/attention/output/dense/BiasAdd
Skipping already processed Node: bert/encoder/layer_0/attention/output/add
Node: bert/encoder/layer_0/attention/output/LayerNorm/moments/mean, Execution Time: 0.000082 seconds
Node: bert/encoder/layer_0/attention/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000634 seconds
Node: bert/encoder/layer_0/attention/output/LayerNorm/moments/SquaredDifference__309, Execution Time: 0.000473 seconds
Node: bert/encoder/layer_0/attention/output/LayerNorm/moments/variance, Execution Time: 0.000054 seconds
Add Node: bert/encoder/layer_0/attention/output/LayerNorm/batchnorm/add, Execution Time: 0.000064 seconds
Node: bert/encoder/layer_0/attention/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000047 seconds
Node: bert/encoder/layer_0/attention/output/LayerNorm/batchnorm/Rsqrt__311, Execution Time: 0.000068 seconds
Node: bert/encoder/layer_0/attention/output/LayerNorm/batchnorm/mul, Execution Time: 0.000055 seconds
Node: bert/encoder/layer_0/attention/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000041 seconds
Node: bert/encoder/layer_0/attention/output/LayerNorm/batchnorm/sub, Execution Time: 0.000046 seconds
Node: bert/encoder/layer_0/attention/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000464 seconds
Add Node: bert/encoder/layer_0/attention/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000457 seconds
Input size: (None, 256, 768, 3072)
Fusing MatMul with Add for node: bert/encoder/layer_0/intermediate/dense/MatMul
torch.Size([256, 3072])
MatMul Fuse node: bert/encoder/layer_0/intermediate/dense/MatMul, Execution Time: 0.000690 seconds
Skipping already processed Node: bert/encoder/layer_0/intermediate/dense/BiasAdd
Node: bert/encoder/layer_0/intermediate/dense/Pow, Execution Time: 0.018049 seconds
Node: bert/encoder/layer_0/intermediate/dense/mul, Execution Time: 0.001407 seconds
Add Node: bert/encoder/layer_0/intermediate/dense/add, Execution Time: 0.001314 seconds
Node: bert/encoder/layer_0/intermediate/dense/mul_1, Execution Time: 0.001507 seconds
Node: bert/encoder/layer_0/intermediate/dense/Tanh, Execution Time: 0.003959 seconds
Add Node: bert/encoder/layer_0/intermediate/dense/add_1, Execution Time: 0.001380 seconds
Node: bert/encoder/layer_0/intermediate/dense/mul_2, Execution Time: 0.001314 seconds
Node: bert/encoder/layer_0/intermediate/dense/mul_3, Execution Time: 0.001374 seconds
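The Pow/mul/add/mul_1/Tanh/add_1/mul_2/mul_3 run above is BERT's tanh approximation of GELU, decomposed into primitive ops by the exporter. A sketch of the same arithmetic in one function:

```python
import numpy as np

def gelu_tanh(x):
    """GELU, tanh approximation, matching the log's node chain:
    Pow (x^3), mul (*0.044715), add (+x), mul_1 (*sqrt(2/pi)),
    Tanh, add_1 (+1), mul_2 (*0.5), mul_3 (*x)."""
    inner = np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + np.tanh(inner))

y = gelu_tanh(np.linspace(-3.0, 3.0, 7))
```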
Input size: (None, 256, 3072, 768)
Fusing MatMul with 2Add for node: bert/encoder/layer_0/output/dense/MatMul
torch.Size([256, 768]) , torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_0/output/dense/MatMul, Execution Time: 0.001047 seconds
Skipping already processed Node: bert/encoder/layer_0/output/dense/BiasAdd
Skipping already processed Node: bert/encoder/layer_0/output/add
Node: bert/encoder/layer_0/output/LayerNorm/moments/mean, Execution Time: 0.000100 seconds
Node: bert/encoder/layer_0/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000494 seconds
Node: bert/encoder/layer_0/output/LayerNorm/moments/SquaredDifference__313, Execution Time: 0.000547 seconds
Node: bert/encoder/layer_0/output/LayerNorm/moments/variance, Execution Time: 0.000057 seconds
Add Node: bert/encoder/layer_0/output/LayerNorm/batchnorm/add, Execution Time: 0.000063 seconds
Node: bert/encoder/layer_0/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000046 seconds
Node: bert/encoder/layer_0/output/LayerNorm/batchnorm/Rsqrt__315, Execution Time: 0.000076 seconds
Node: bert/encoder/layer_0/output/LayerNorm/batchnorm/mul, Execution Time: 0.000056 seconds
Node: bert/encoder/layer_0/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000051 seconds
Node: bert/encoder/layer_0/output/LayerNorm/batchnorm/sub, Execution Time: 0.000053 seconds
Node: bert/encoder/layer_0/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000486 seconds
Add Node: bert/encoder/layer_0/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000471 seconds
Input size: (None, 256, 768, 768)
Fusing MatMul with Add for node: bert/encoder/layer_1/attention/self/value/MatMul
torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_1/attention/self/value/MatMul, Execution Time: 0.000654 seconds
Skipping already processed Node: bert/encoder/layer_1/attention/self/value/BiasAdd
Node: bert/encoder/layer_1/attention/self/Reshape_2, Execution Time: 0.000020 seconds
Node: bert/encoder/layer_1/attention/self/transpose_2, Execution Time: 0.000449 seconds
Input size: (None, 256, 768, 768)
Fusing MatMul with Add for node: bert/encoder/layer_1/attention/self/query/MatMul
torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_1/attention/self/query/MatMul, Execution Time: 0.000632 seconds
Skipping already processed Node: bert/encoder/layer_1/attention/self/query/BiasAdd
Node: bert/encoder/layer_1/attention/self/Reshape, Execution Time: 0.000020 seconds
Node: bert/encoder/layer_1/attention/self/transpose, Execution Time: 0.000474 seconds
Input size: (None, 256, 768, 768)
Fusing MatMul with Add for node: bert/encoder/layer_1/attention/self/key/MatMul
torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_1/attention/self/key/MatMul, Execution Time: 0.000604 seconds
Skipping already processed Node: bert/encoder/layer_1/attention/self/key/BiasAdd
Node: bert/encoder/layer_1/attention/self/Reshape_1, Execution Time: 0.000009 seconds
Node: bert/encoder/layer_1/attention/self/MatMul__320, Execution Time: 0.000483 seconds
Input size: (12, 256, 64, 256)
No Add node related to MatMul output: bert/encoder/layer_1/attention/self/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_1/attention/self/MatMul, Execution Time: 0.000508 seconds
Node: bert/encoder/layer_1/attention/self/Mul, Execution Time: 0.001349 seconds
Add Node: bert/encoder/layer_1/attention/self/add, Execution Time: 0.001579 seconds
Node: bert/encoder/layer_1/attention/self/Softmax, Execution Time: 0.001335 seconds
Input size: (12, 256, 256, 64)
No Add node related to MatMul output: bert/encoder/layer_1/attention/self/MatMul_1. Executing regular MatMul.
MatMul Node: bert/encoder/layer_1/attention/self/MatMul_1, Execution Time: 0.000563 seconds
Node: bert/encoder/layer_1/attention/self/transpose_3, Execution Time: 0.000447 seconds
Node: bert/encoder/layer_1/attention/self/Reshape_3, Execution Time: 0.000047 seconds
Input size: (None, 256, 768, 768)
Fusing MatMul with 2Add for node: bert/encoder/layer_1/attention/output/dense/MatMul
torch.Size([256, 768]) , torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_1/attention/output/dense/MatMul, Execution Time: 0.000678 seconds
Skipping already processed Node: bert/encoder/layer_1/attention/output/dense/BiasAdd
Skipping already processed Node: bert/encoder/layer_1/attention/output/add
Node: bert/encoder/layer_1/attention/output/LayerNorm/moments/mean, Execution Time: 0.000081 seconds
Node: bert/encoder/layer_1/attention/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000606 seconds
Node: bert/encoder/layer_1/attention/output/LayerNorm/moments/SquaredDifference__323, Execution Time: 0.000474 seconds
Node: bert/encoder/layer_1/attention/output/LayerNorm/moments/variance, Execution Time: 0.000053 seconds
Add Node: bert/encoder/layer_1/attention/output/LayerNorm/batchnorm/add, Execution Time: 0.000064 seconds
Node: bert/encoder/layer_1/attention/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000050 seconds
Node: bert/encoder/layer_1/attention/output/LayerNorm/batchnorm/Rsqrt__325, Execution Time: 0.000074 seconds
Node: bert/encoder/layer_1/attention/output/LayerNorm/batchnorm/mul, Execution Time: 0.000055 seconds
Node: bert/encoder/layer_1/attention/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000053 seconds
Node: bert/encoder/layer_1/attention/output/LayerNorm/batchnorm/sub, Execution Time: 0.000041 seconds
Node: bert/encoder/layer_1/attention/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000466 seconds
Add Node: bert/encoder/layer_1/attention/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000446 seconds
Input size: (None, 256, 768, 3072)
Fusing MatMul with Add for node: bert/encoder/layer_1/intermediate/dense/MatMul
torch.Size([256, 3072])
MatMul Fuse node: bert/encoder/layer_1/intermediate/dense/MatMul, Execution Time: 0.000661 seconds
Skipping already processed Node: bert/encoder/layer_1/intermediate/dense/BiasAdd
Node: bert/encoder/layer_1/intermediate/dense/Pow, Execution Time: 0.001371 seconds
Node: bert/encoder/layer_1/intermediate/dense/mul, Execution Time: 0.001382 seconds
Add Node: bert/encoder/layer_1/intermediate/dense/add, Execution Time: 0.001623 seconds
Node: bert/encoder/layer_1/intermediate/dense/mul_1, Execution Time: 0.001303 seconds
Node: bert/encoder/layer_1/intermediate/dense/Tanh, Execution Time: 0.001375 seconds
Add Node: bert/encoder/layer_1/intermediate/dense/add_1, Execution Time: 0.001320 seconds
Node: bert/encoder/layer_1/intermediate/dense/mul_2, Execution Time: 0.001378 seconds
Node: bert/encoder/layer_1/intermediate/dense/mul_3, Execution Time: 0.001307 seconds
Input size: (None, 256, 3072, 768)
Fusing MatMul with 2Add for node: bert/encoder/layer_1/output/dense/MatMul
torch.Size([256, 768]) , torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_1/output/dense/MatMul, Execution Time: 0.001064 seconds
Skipping already processed Node: bert/encoder/layer_1/output/dense/BiasAdd
Skipping already processed Node: bert/encoder/layer_1/output/add
Node: bert/encoder/layer_1/output/LayerNorm/moments/mean, Execution Time: 0.000084 seconds
Node: bert/encoder/layer_1/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000484 seconds
Node: bert/encoder/layer_1/output/LayerNorm/moments/SquaredDifference__327, Execution Time: 0.000571 seconds
Node: bert/encoder/layer_1/output/LayerNorm/moments/variance, Execution Time: 0.000056 seconds
Add Node: bert/encoder/layer_1/output/LayerNorm/batchnorm/add, Execution Time: 0.000055 seconds
Node: bert/encoder/layer_1/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000047 seconds
Node: bert/encoder/layer_1/output/LayerNorm/batchnorm/Rsqrt__329, Execution Time: 0.000080 seconds
Node: bert/encoder/layer_1/output/LayerNorm/batchnorm/mul, Execution Time: 0.000052 seconds
Node: bert/encoder/layer_1/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000042 seconds
Node: bert/encoder/layer_1/output/LayerNorm/batchnorm/sub, Execution Time: 0.000051 seconds
Node: bert/encoder/layer_1/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000450 seconds
Add Node: bert/encoder/layer_1/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000466 seconds
Input size: (None, 256, 768, 768)
Fusing MatMul with Add for node: bert/encoder/layer_2/attention/self/value/MatMul
torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_2/attention/self/value/MatMul, Execution Time: 0.000678 seconds
Skipping already processed Node: bert/encoder/layer_2/attention/self/value/BiasAdd
Node: bert/encoder/layer_2/attention/self/Reshape_2, Execution Time: 0.000020 seconds
Node: bert/encoder/layer_2/attention/self/transpose_2, Execution Time: 0.000461 seconds
Input size: (None, 256, 768, 768)
Fusing MatMul with Add for node: bert/encoder/layer_2/attention/self/query/MatMul
torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_2/attention/self/query/MatMul, Execution Time: 0.000645 seconds
Skipping already processed Node: bert/encoder/layer_2/attention/self/query/BiasAdd
Node: bert/encoder/layer_2/attention/self/Reshape, Execution Time: 0.000010 seconds
Node: bert/encoder/layer_2/attention/self/transpose, Execution Time: 0.000476 seconds
Input size: (None, 256, 768, 768)
Fusing MatMul with Add for node: bert/encoder/layer_2/attention/self/key/MatMul
torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_2/attention/self/key/MatMul, Execution Time: 0.000615 seconds
Skipping already processed Node: bert/encoder/layer_2/attention/self/key/BiasAdd
Node: bert/encoder/layer_2/attention/self/Reshape_1, Execution Time: 0.000008 seconds
Node: bert/encoder/layer_2/attention/self/MatMul__334, Execution Time: 0.000464 seconds
Input size: (12, 256, 64, 256)
No Add node related to MatMul output: bert/encoder/layer_2/attention/self/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_2/attention/self/MatMul, Execution Time: 0.000499 seconds
Node: bert/encoder/layer_2/attention/self/Mul, Execution Time: 0.001384 seconds
Add Node: bert/encoder/layer_2/attention/self/add, Execution Time: 0.001380 seconds
Node: bert/encoder/layer_2/attention/self/Softmax, Execution Time: 0.001305 seconds
Input size: (12, 256, 256, 64)
No Add node related to MatMul output: bert/encoder/layer_2/attention/self/MatMul_1. Executing regular MatMul.
MatMul Node: bert/encoder/layer_2/attention/self/MatMul_1, Execution Time: 0.000562 seconds
Node: bert/encoder/layer_2/attention/self/transpose_3, Execution Time: 0.000456 seconds
Node: bert/encoder/layer_2/attention/self/Reshape_3, Execution Time: 0.000037 seconds
Input size: (None, 256, 768, 768)
Fusing MatMul with 2Add for node: bert/encoder/layer_2/attention/output/dense/MatMul
torch.Size([256, 768]) , torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_2/attention/output/dense/MatMul, Execution Time: 0.000755 seconds
Skipping already processed Node: bert/encoder/layer_2/attention/output/dense/BiasAdd
Skipping already processed Node: bert/encoder/layer_2/attention/output/add
Node: bert/encoder/layer_2/attention/output/LayerNorm/moments/mean, Execution Time: 0.000100 seconds
Node: bert/encoder/layer_2/attention/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000583 seconds
Node: bert/encoder/layer_2/attention/output/LayerNorm/moments/SquaredDifference__337, Execution Time: 0.000602 seconds
Node: bert/encoder/layer_2/attention/output/LayerNorm/moments/variance, Execution Time: 0.000071 seconds
Add Node: bert/encoder/layer_2/attention/output/LayerNorm/batchnorm/add, Execution Time: 0.000054 seconds
Node: bert/encoder/layer_2/attention/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000078 seconds
Node: bert/encoder/layer_2/attention/output/LayerNorm/batchnorm/Rsqrt__339, Execution Time: 0.000089 seconds
Node: bert/encoder/layer_2/attention/output/LayerNorm/batchnorm/mul, Execution Time: 0.000055 seconds
Node: bert/encoder/layer_2/attention/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000042 seconds
Node: bert/encoder/layer_2/attention/output/LayerNorm/batchnorm/sub, Execution Time: 0.000053 seconds
Node: bert/encoder/layer_2/attention/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000518 seconds
Add Node: bert/encoder/layer_2/attention/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000451 seconds
Input size: (None, 256, 768, 3072)
Fusing MatMul with Add for node: bert/encoder/layer_2/intermediate/dense/MatMul
torch.Size([256, 3072])
MatMul Fuse node: bert/encoder/layer_2/intermediate/dense/MatMul, Execution Time: 0.000782 seconds
Skipping already processed Node: bert/encoder/layer_2/intermediate/dense/BiasAdd
Node: bert/encoder/layer_2/intermediate/dense/Pow, Execution Time: 0.001319 seconds
Node: bert/encoder/layer_2/intermediate/dense/mul, Execution Time: 0.001400 seconds
Add Node: bert/encoder/layer_2/intermediate/dense/add, Execution Time: 0.001352 seconds
Node: bert/encoder/layer_2/intermediate/dense/mul_1, Execution Time: 0.001411 seconds
Node: bert/encoder/layer_2/intermediate/dense/Tanh, Execution Time: 0.001316 seconds
Add Node: bert/encoder/layer_2/intermediate/dense/add_1, Execution Time: 0.001329 seconds
Node: bert/encoder/layer_2/intermediate/dense/mul_2, Execution Time: 0.001370 seconds
Node: bert/encoder/layer_2/intermediate/dense/mul_3, Execution Time: 0.001295 seconds
Input size: (None, 256, 3072, 768)
Fusing MatMul with 2Add for node: bert/encoder/layer_2/output/dense/MatMul
torch.Size([256, 768]) , torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_2/output/dense/MatMul, Execution Time: 0.000986 seconds
Skipping already processed Node: bert/encoder/layer_2/output/dense/BiasAdd
Skipping already processed Node: bert/encoder/layer_2/output/add
Node: bert/encoder/layer_2/output/LayerNorm/moments/mean, Execution Time: 0.000085 seconds
Node: bert/encoder/layer_2/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000505 seconds
Node: bert/encoder/layer_2/output/LayerNorm/moments/SquaredDifference__341, Execution Time: 0.000457 seconds
Node: bert/encoder/layer_2/output/LayerNorm/moments/variance, Execution Time: 0.000055 seconds
Add Node: bert/encoder/layer_2/output/LayerNorm/batchnorm/add, Execution Time: 0.000064 seconds
Node: bert/encoder/layer_2/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000070 seconds
Node: bert/encoder/layer_2/output/LayerNorm/batchnorm/Rsqrt__343, Execution Time: 0.000066 seconds
Node: bert/encoder/layer_2/output/LayerNorm/batchnorm/mul, Execution Time: 0.000054 seconds
Node: bert/encoder/layer_2/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000052 seconds
Node: bert/encoder/layer_2/output/LayerNorm/batchnorm/sub, Execution Time: 0.000056 seconds
Node: bert/encoder/layer_2/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000513 seconds
Add Node: bert/encoder/layer_2/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000452 seconds
Input size: (None, 256, 768, 768)
Fusing MatMul with Add for node: bert/encoder/layer_3/attention/self/value/MatMul
torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_3/attention/self/value/MatMul, Execution Time: 0.000684 seconds
Skipping already processed Node: bert/encoder/layer_3/attention/self/value/BiasAdd
Node: bert/encoder/layer_3/attention/self/Reshape_2, Execution Time: 0.000020 seconds
Node: bert/encoder/layer_3/attention/self/transpose_2, Execution Time: 0.000478 seconds
Input size: (None, 256, 768, 768)
Fusing MatMul with Add for node: bert/encoder/layer_3/attention/self/query/MatMul
torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_3/attention/self/query/MatMul, Execution Time: 0.000721 seconds
Skipping already processed Node: bert/encoder/layer_3/attention/self/query/BiasAdd
Node: bert/encoder/layer_3/attention/self/Reshape, Execution Time: 0.000010 seconds
Node: bert/encoder/layer_3/attention/self/transpose, Execution Time: 0.000443 seconds
Input size: (None, 256, 768, 768)
Fusing MatMul with Add for node: bert/encoder/layer_3/attention/self/key/MatMul
torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_3/attention/self/key/MatMul, Execution Time: 0.000608 seconds
Skipping already processed Node: bert/encoder/layer_3/attention/self/key/BiasAdd
Node: bert/encoder/layer_3/attention/self/Reshape_1, Execution Time: 0.000007 seconds
Node: bert/encoder/layer_3/attention/self/MatMul__348, Execution Time: 0.000437 seconds
Input size: (12, 256, 64, 256)
No Add node related to MatMul output: bert/encoder/layer_3/attention/self/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_3/attention/self/MatMul, Execution Time: 0.000544 seconds
Node: bert/encoder/layer_3/attention/self/Mul, Execution Time: 0.001320 seconds
Add Node: bert/encoder/layer_3/attention/self/add, Execution Time: 0.001428 seconds
Node: bert/encoder/layer_3/attention/self/Softmax, Execution Time: 0.001303 seconds
Input size: (12, 256, 256, 64)
No Add node related to MatMul output: bert/encoder/layer_3/attention/self/MatMul_1. Executing regular MatMul.
MatMul Node: bert/encoder/layer_3/attention/self/MatMul_1, Execution Time: 0.000561 seconds
Node: bert/encoder/layer_3/attention/self/transpose_3, Execution Time: 0.000469 seconds
Node: bert/encoder/layer_3/attention/self/Reshape_3, Execution Time: 0.000038 seconds
Input size: (None, 256, 768, 768)
Fusing MatMul with 2Add for node: bert/encoder/layer_3/attention/output/dense/MatMul
torch.Size([256, 768]) , torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_3/attention/output/dense/MatMul, Execution Time: 0.000677 seconds
Skipping already processed Node: bert/encoder/layer_3/attention/output/dense/BiasAdd
Skipping already processed Node: bert/encoder/layer_3/attention/output/add
Node: bert/encoder/layer_3/attention/output/LayerNorm/moments/mean, Execution Time: 0.000088 seconds
Node: bert/encoder/layer_3/attention/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000476 seconds
Node: bert/encoder/layer_3/attention/output/LayerNorm/moments/SquaredDifference__351, Execution Time: 0.000554 seconds
Node: bert/encoder/layer_3/attention/output/LayerNorm/moments/variance, Execution Time: 0.000055 seconds
Add Node: bert/encoder/layer_3/attention/output/LayerNorm/batchnorm/add, Execution Time: 0.000055 seconds
Node: bert/encoder/layer_3/attention/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000047 seconds
Node: bert/encoder/layer_3/attention/output/LayerNorm/batchnorm/Rsqrt__353, Execution Time: 0.000072 seconds
Node: bert/encoder/layer_3/attention/output/LayerNorm/batchnorm/mul, Execution Time: 0.000055 seconds
Node: bert/encoder/layer_3/attention/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000056 seconds
Node: bert/encoder/layer_3/attention/output/LayerNorm/batchnorm/sub, Execution Time: 0.000053 seconds
Node: bert/encoder/layer_3/attention/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000458 seconds
Add Node: bert/encoder/layer_3/attention/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000449 seconds
Input size: (None, 256, 768, 3072)
Fusing MatMul with Add for node: bert/encoder/layer_3/intermediate/dense/MatMul
torch.Size([256, 3072])
MatMul Fuse node: bert/encoder/layer_3/intermediate/dense/MatMul, Execution Time: 0.000654 seconds
Skipping already processed Node: bert/encoder/layer_3/intermediate/dense/BiasAdd
Node: bert/encoder/layer_3/intermediate/dense/Pow, Execution Time: 0.001374 seconds
Node: bert/encoder/layer_3/intermediate/dense/mul, Execution Time: 0.001344 seconds
Add Node: bert/encoder/layer_3/intermediate/dense/add, Execution Time: 0.001312 seconds
Node: bert/encoder/layer_3/intermediate/dense/mul_1, Execution Time: 0.001383 seconds
Node: bert/encoder/layer_3/intermediate/dense/Tanh, Execution Time: 0.001316 seconds
Add Node: bert/encoder/layer_3/intermediate/dense/add_1, Execution Time: 0.001338 seconds
Node: bert/encoder/layer_3/intermediate/dense/mul_2, Execution Time: 0.001379 seconds
Node: bert/encoder/layer_3/intermediate/dense/mul_3, Execution Time: 0.001310 seconds
Input size: (None, 256, 3072, 768)
Fusing MatMul with 2Add for node: bert/encoder/layer_3/output/dense/MatMul
torch.Size([256, 768]) , torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_3/output/dense/MatMul, Execution Time: 0.000992 seconds
Skipping already processed Node: bert/encoder/layer_3/output/dense/BiasAdd
Skipping already processed Node: bert/encoder/layer_3/output/add
Node: bert/encoder/layer_3/output/LayerNorm/moments/mean, Execution Time: 0.000085 seconds
Node: bert/encoder/layer_3/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000485 seconds
Node: bert/encoder/layer_3/output/LayerNorm/moments/SquaredDifference__355, Execution Time: 0.000449 seconds
Node: bert/encoder/layer_3/output/LayerNorm/moments/variance, Execution Time: 0.000054 seconds
Add Node: bert/encoder/layer_3/output/LayerNorm/batchnorm/add, Execution Time: 0.000064 seconds
Node: bert/encoder/layer_3/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000046 seconds
Node: bert/encoder/layer_3/output/LayerNorm/batchnorm/Rsqrt__357, Execution Time: 0.000070 seconds
Node: bert/encoder/layer_3/output/LayerNorm/batchnorm/mul, Execution Time: 0.000061 seconds
Node: bert/encoder/layer_3/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000054 seconds
Node: bert/encoder/layer_3/output/LayerNorm/batchnorm/sub, Execution Time: 0.000052 seconds
Node: bert/encoder/layer_3/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000545 seconds
Add Node: bert/encoder/layer_3/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000445 seconds
Input size: (None, 256, 768, 768)
Fusing MatMul with Add for node: bert/encoder/layer_4/attention/self/value/MatMul
torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_4/attention/self/value/MatMul, Execution Time: 0.000668 seconds
Skipping already processed Node: bert/encoder/layer_4/attention/self/value/BiasAdd
Node: bert/encoder/layer_4/attention/self/Reshape_2, Execution Time: 0.000020 seconds
Node: bert/encoder/layer_4/attention/self/transpose_2, Execution Time: 0.000548 seconds
Input size: (None, 256, 768, 768)
Fusing MatMul with Add for node: bert/encoder/layer_4/attention/self/query/MatMul
torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_4/attention/self/query/MatMul, Execution Time: 0.000681 seconds
Skipping already processed Node: bert/encoder/layer_4/attention/self/query/BiasAdd
Node: bert/encoder/layer_4/attention/self/Reshape, Execution Time: 0.000009 seconds
Node: bert/encoder/layer_4/attention/self/transpose, Execution Time: 0.000567 seconds
Input size: (None, 256, 768, 768)
Fusing MatMul with Add for node: bert/encoder/layer_4/attention/self/key/MatMul
torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_4/attention/self/key/MatMul, Execution Time: 0.000655 seconds
Skipping already processed Node: bert/encoder/layer_4/attention/self/key/BiasAdd
Node: bert/encoder/layer_4/attention/self/Reshape_1, Execution Time: 0.000007 seconds
Node: bert/encoder/layer_4/attention/self/MatMul__362, Execution Time: 0.000541 seconds
Input size: (12, 256, 64, 256)
No Add node related to MatMul output: bert/encoder/layer_4/attention/self/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_4/attention/self/MatMul, Execution Time: 0.000483 seconds
Node: bert/encoder/layer_4/attention/self/Mul, Execution Time: 0.001326 seconds
Add Node: bert/encoder/layer_4/attention/self/add, Execution Time: 0.001472 seconds
Node: bert/encoder/layer_4/attention/self/Softmax, Execution Time: 0.001326 seconds
Input size: (12, 256, 256, 64)
No Add node related to MatMul output: bert/encoder/layer_4/attention/self/MatMul_1. Executing regular MatMul.
MatMul Node: bert/encoder/layer_4/attention/self/MatMul_1, Execution Time: 0.000573 seconds
Node: bert/encoder/layer_4/attention/self/transpose_3, Execution Time: 0.000484 seconds
Node: bert/encoder/layer_4/attention/self/Reshape_3, Execution Time: 0.000037 seconds
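The attention node sequence above (MatMul → Mul → add → Softmax → MatMul_1 → transpose_3 → Reshape_3) is scaled dot-product attention over 12 heads of dimension 64, matching the logged `(12, 256, 64, 256)` and `(12, 256, 256, 64)` shapes. A numpy sketch under those assumptions (the mask value of zero stands in for the real `mul_1` attention-mask tensor):

```python
import numpy as np

def attention(q, k, v, mask):
    # MatMul: scores of shape (heads, seq, seq)
    scores = q @ k.transpose(0, 2, 1)
    scores = scores / np.sqrt(64.0)   # Mul: scale by 1/sqrt(head_dim)
    scores = scores + mask            # add: attention mask (large negative for padding)
    # Softmax over the key axis, computed stably
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    probs = e / e.sum(axis=-1, keepdims=True)
    return probs @ v                  # MatMul_1: weighted sum of values

heads, seq, dim = 12, 256, 64
q = np.random.randn(heads, seq, dim).astype(np.float32)
k = np.random.randn(heads, seq, dim).astype(np.float32)
v = np.random.randn(heads, seq, dim).astype(np.float32)
out = attention(q, k, v, np.zeros((1, seq, seq), dtype=np.float32))
```

The trailing transpose_3/Reshape_3 then merges the 12 heads back into a (256, 768) activation for the output projection.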
Input size: (None, 256, 768, 768)
Fusing MatMul with 2Add for node: bert/encoder/layer_4/attention/output/dense/MatMul
torch.Size([256, 768]) , torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_4/attention/output/dense/MatMul, Execution Time: 0.000743 seconds
Skipping already processed Node: bert/encoder/layer_4/attention/output/dense/BiasAdd
Skipping already processed Node: bert/encoder/layer_4/attention/output/add
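The "Fusing MatMul with 2Add" message above, together with the two skipped nodes (`BiasAdd` and `output/add`), suggests the output projection is fused with both its bias add and the residual-connection add. A sketch of that interpretation (function name hypothetical):

```python
import numpy as np

def fused_matmul_bias_residual(x, w, b, residual):
    # MatMul fused with two Adds: the BiasAdd and the residual connection
    return x @ w + b + residual

x = np.random.randn(256, 768).astype(np.float32)
w = np.random.randn(768, 768).astype(np.float32)
b = np.random.randn(768).astype(np.float32)
res = np.random.randn(256, 768).astype(np.float32)   # attention-block input

out = fused_matmul_bias_residual(x, w, b, res)
```

Both added tensors are `torch.Size([256, 768])` in the log, consistent with a per-row bias broadcast plus an elementwise residual.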
Node: bert/encoder/layer_4/attention/output/LayerNorm/moments/mean, Execution Time: 0.000082 seconds
Node: bert/encoder/layer_4/attention/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000565 seconds
Node: bert/encoder/layer_4/attention/output/LayerNorm/moments/SquaredDifference__365, Execution Time: 0.000463 seconds
Node: bert/encoder/layer_4/attention/output/LayerNorm/moments/variance, Execution Time: 0.000060 seconds
Add Node: bert/encoder/layer_4/attention/output/LayerNorm/batchnorm/add, Execution Time: 0.000051 seconds
Node: bert/encoder/layer_4/attention/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000048 seconds
Node: bert/encoder/layer_4/attention/output/LayerNorm/batchnorm/Rsqrt__367, Execution Time: 0.000067 seconds
Node: bert/encoder/layer_4/attention/output/LayerNorm/batchnorm/mul, Execution Time: 0.000054 seconds
Node: bert/encoder/layer_4/attention/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000053 seconds
Node: bert/encoder/layer_4/attention/output/LayerNorm/batchnorm/sub, Execution Time: 0.000053 seconds
Node: bert/encoder/layer_4/attention/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000457 seconds
Add Node: bert/encoder/layer_4/attention/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000459 seconds
Input size: (None, 256, 768, 3072)
Fusing MatMul with Add for node: bert/encoder/layer_4/intermediate/dense/MatMul
torch.Size([256, 3072])
MatMul Fuse node: bert/encoder/layer_4/intermediate/dense/MatMul, Execution Time: 0.000646 seconds
Skipping already processed Node: bert/encoder/layer_4/intermediate/dense/BiasAdd
Node: bert/encoder/layer_4/intermediate/dense/Pow, Execution Time: 0.001339 seconds
Node: bert/encoder/layer_4/intermediate/dense/mul, Execution Time: 0.001356 seconds
Add Node: bert/encoder/layer_4/intermediate/dense/add, Execution Time: 0.001398 seconds
Node: bert/encoder/layer_4/intermediate/dense/mul_1, Execution Time: 0.001317 seconds
Node: bert/encoder/layer_4/intermediate/dense/Tanh, Execution Time: 0.001311 seconds
Add Node: bert/encoder/layer_4/intermediate/dense/add_1, Execution Time: 0.001370 seconds
Node: bert/encoder/layer_4/intermediate/dense/mul_2, Execution Time: 0.001508 seconds
Node: bert/encoder/layer_4/intermediate/dense/mul_3, Execution Time: 0.001303 seconds
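The intermediate-dense node run above (Pow, mul, add, mul_1, Tanh, add_1, mul_2, mul_3) is the tanh approximation of GELU that BERT uses, unrolled into primitive ops. A numpy sketch mapping each node to a term:

```python
import numpy as np

def gelu_tanh(x):
    # Pow -> x**3; mul -> 0.044715 * x**3; add -> x + ...; mul_1 -> sqrt(2/pi) * ...
    inner = np.sqrt(2.0 / np.pi) * (x + 0.044715 * np.power(x, 3))
    # Tanh; add_1 -> 1 + tanh; mul_2 / mul_3 -> 0.5 * x * (...)
    return 0.5 * x * (1.0 + np.tanh(inner))

x = np.linspace(-3, 3, 7).astype(np.float32)
y = gelu_tanh(x)
```

The function is ~0 at the origin and approaches the identity for large positive inputs, which is why the eight elementwise nodes appear between the two dense MatMuls.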
Input size: (None, 256, 3072, 768)
Fusing MatMul with 2Add for node: bert/encoder/layer_4/output/dense/MatMul
torch.Size([256, 768]) , torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_4/output/dense/MatMul, Execution Time: 0.000987 seconds
Skipping already processed Node: bert/encoder/layer_4/output/dense/BiasAdd
Skipping already processed Node: bert/encoder/layer_4/output/add
Node: bert/encoder/layer_4/output/LayerNorm/moments/mean, Execution Time: 0.000072 seconds
Node: bert/encoder/layer_4/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000470 seconds
Node: bert/encoder/layer_4/output/LayerNorm/moments/SquaredDifference__369, Execution Time: 0.000466 seconds
Node: bert/encoder/layer_4/output/LayerNorm/moments/variance, Execution Time: 0.000052 seconds
Add Node: bert/encoder/layer_4/output/LayerNorm/batchnorm/add, Execution Time: 0.000048 seconds
Node: bert/encoder/layer_4/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000052 seconds
Node: bert/encoder/layer_4/output/LayerNorm/batchnorm/Rsqrt__371, Execution Time: 0.000066 seconds
Node: bert/encoder/layer_4/output/LayerNorm/batchnorm/mul, Execution Time: 0.000055 seconds
Node: bert/encoder/layer_4/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000052 seconds
Node: bert/encoder/layer_4/output/LayerNorm/batchnorm/sub, Execution Time: 0.000052 seconds
Node: bert/encoder/layer_4/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000466 seconds
Add Node: bert/encoder/layer_4/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000463 seconds
Input size: (None, 256, 768, 768)
Fusing MatMul with Add for node: bert/encoder/layer_5/attention/self/value/MatMul
torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_5/attention/self/value/MatMul, Execution Time: 0.001840 seconds
Skipping already processed Node: bert/encoder/layer_5/attention/self/value/BiasAdd
Node: bert/encoder/layer_5/attention/self/Reshape_2, Execution Time: 0.000020 seconds
Node: bert/encoder/layer_5/attention/self/transpose_2, Execution Time: 0.000459 seconds
Input size: (None, 256, 768, 768)
Fusing MatMul with Add for node: bert/encoder/layer_5/attention/self/query/MatMul
torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_5/attention/self/query/MatMul, Execution Time: 0.000622 seconds
Skipping already processed Node: bert/encoder/layer_5/attention/self/query/BiasAdd
Node: bert/encoder/layer_5/attention/self/Reshape, Execution Time: 0.000009 seconds
Node: bert/encoder/layer_5/attention/self/transpose, Execution Time: 0.000436 seconds
Input size: (None, 256, 768, 768)
Fusing MatMul with Add for node: bert/encoder/layer_5/attention/self/key/MatMul
torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_5/attention/self/key/MatMul, Execution Time: 0.000607 seconds
Skipping already processed Node: bert/encoder/layer_5/attention/self/key/BiasAdd
Node: bert/encoder/layer_5/attention/self/Reshape_1, Execution Time: 0.000009 seconds
Node: bert/encoder/layer_5/attention/self/MatMul__376, Execution Time: 0.000448 seconds
Input size: (12, 256, 64, 256)
No Add node related to MatMul output: bert/encoder/layer_5/attention/self/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_5/attention/self/MatMul, Execution Time: 0.000485 seconds
Node: bert/encoder/layer_5/attention/self/Mul, Execution Time: 0.001392 seconds
Add Node: bert/encoder/layer_5/attention/self/add, Execution Time: 0.001310 seconds
Node: bert/encoder/layer_5/attention/self/Softmax, Execution Time: 0.001333 seconds
Input size: (12, 256, 256, 64)
No Add node related to MatMul output: bert/encoder/layer_5/attention/self/MatMul_1. Executing regular MatMul.
MatMul Node: bert/encoder/layer_5/attention/self/MatMul_1, Execution Time: 0.000640 seconds
Node: bert/encoder/layer_5/attention/self/transpose_3, Execution Time: 0.000455 seconds
Node: bert/encoder/layer_5/attention/self/Reshape_3, Execution Time: 0.000037 seconds
Input size: (None, 256, 768, 768)
Fusing MatMul with 2Add for node: bert/encoder/layer_5/attention/output/dense/MatMul
torch.Size([256, 768]) , torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_5/attention/output/dense/MatMul, Execution Time: 0.000660 seconds
Skipping already processed Node: bert/encoder/layer_5/attention/output/dense/BiasAdd
Skipping already processed Node: bert/encoder/layer_5/attention/output/add
Node: bert/encoder/layer_5/attention/output/LayerNorm/moments/mean, Execution Time: 0.000081 seconds
Node: bert/encoder/layer_5/attention/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000477 seconds
Node: bert/encoder/layer_5/attention/output/LayerNorm/moments/SquaredDifference__379, Execution Time: 0.000461 seconds
Node: bert/encoder/layer_5/attention/output/LayerNorm/moments/variance, Execution Time: 0.000053 seconds
Add Node: bert/encoder/layer_5/attention/output/LayerNorm/batchnorm/add, Execution Time: 0.000047 seconds
Node: bert/encoder/layer_5/attention/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000048 seconds
Node: bert/encoder/layer_5/attention/output/LayerNorm/batchnorm/Rsqrt__381, Execution Time: 0.000068 seconds
Node: bert/encoder/layer_5/attention/output/LayerNorm/batchnorm/mul, Execution Time: 0.000063 seconds
Node: bert/encoder/layer_5/attention/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000046 seconds
Node: bert/encoder/layer_5/attention/output/LayerNorm/batchnorm/sub, Execution Time: 0.000055 seconds
Node: bert/encoder/layer_5/attention/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000468 seconds
Add Node: bert/encoder/layer_5/attention/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000451 seconds
Input size: (None, 256, 768, 3072)
Fusing MatMul with Add for node: bert/encoder/layer_5/intermediate/dense/MatMul
torch.Size([256, 3072])
MatMul Fuse node: bert/encoder/layer_5/intermediate/dense/MatMul, Execution Time: 0.000666 seconds
Skipping already processed Node: bert/encoder/layer_5/intermediate/dense/BiasAdd
Node: bert/encoder/layer_5/intermediate/dense/Pow, Execution Time: 0.001391 seconds
Node: bert/encoder/layer_5/intermediate/dense/mul, Execution Time: 0.001312 seconds
Add Node: bert/encoder/layer_5/intermediate/dense/add, Execution Time: 0.001391 seconds
Node: bert/encoder/layer_5/intermediate/dense/mul_1, Execution Time: 0.001297 seconds
Node: bert/encoder/layer_5/intermediate/dense/Tanh, Execution Time: 0.001306 seconds
Add Node: bert/encoder/layer_5/intermediate/dense/add_1, Execution Time: 0.001386 seconds
Node: bert/encoder/layer_5/intermediate/dense/mul_2, Execution Time: 0.001291 seconds
Node: bert/encoder/layer_5/intermediate/dense/mul_3, Execution Time: 0.001279 seconds
Input size: (None, 256, 3072, 768)
Fusing MatMul with 2Add for node: bert/encoder/layer_5/output/dense/MatMul
torch.Size([256, 768]) , torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_5/output/dense/MatMul, Execution Time: 0.001012 seconds
Skipping already processed Node: bert/encoder/layer_5/output/dense/BiasAdd
Skipping already processed Node: bert/encoder/layer_5/output/add
Node: bert/encoder/layer_5/output/LayerNorm/moments/mean, Execution Time: 0.000083 seconds
Node: bert/encoder/layer_5/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000461 seconds
Node: bert/encoder/layer_5/output/LayerNorm/moments/SquaredDifference__383, Execution Time: 0.000457 seconds
Node: bert/encoder/layer_5/output/LayerNorm/moments/variance, Execution Time: 0.000056 seconds
Add Node: bert/encoder/layer_5/output/LayerNorm/batchnorm/add, Execution Time: 0.000053 seconds
Node: bert/encoder/layer_5/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000049 seconds
Node: bert/encoder/layer_5/output/LayerNorm/batchnorm/Rsqrt__385, Execution Time: 0.000066 seconds
Node: bert/encoder/layer_5/output/LayerNorm/batchnorm/mul, Execution Time: 0.000054 seconds
Node: bert/encoder/layer_5/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000052 seconds
Node: bert/encoder/layer_5/output/LayerNorm/batchnorm/sub, Execution Time: 0.000052 seconds
Node: bert/encoder/layer_5/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000465 seconds
Add Node: bert/encoder/layer_5/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000463 seconds
Input size: (None, 256, 768, 768)
Fusing MatMul with Add for node: bert/encoder/layer_6/attention/self/value/MatMul
torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_6/attention/self/value/MatMul, Execution Time: 0.000639 seconds
Skipping already processed Node: bert/encoder/layer_6/attention/self/value/BiasAdd
Node: bert/encoder/layer_6/attention/self/Reshape_2, Execution Time: 0.000020 seconds
Node: bert/encoder/layer_6/attention/self/transpose_2, Execution Time: 0.000466 seconds
Input size: (None, 256, 768, 768)
Fusing MatMul with Add for node: bert/encoder/layer_6/attention/self/query/MatMul
torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_6/attention/self/query/MatMul, Execution Time: 0.000643 seconds
Skipping already processed Node: bert/encoder/layer_6/attention/self/query/BiasAdd
Node: bert/encoder/layer_6/attention/self/Reshape, Execution Time: 0.000009 seconds
Node: bert/encoder/layer_6/attention/self/transpose, Execution Time: 0.000510 seconds
Input size: (None, 256, 768, 768)
Fusing MatMul with Add for node: bert/encoder/layer_6/attention/self/key/MatMul
torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_6/attention/self/key/MatMul, Execution Time: 0.000669 seconds
Skipping already processed Node: bert/encoder/layer_6/attention/self/key/BiasAdd
Node: bert/encoder/layer_6/attention/self/Reshape_1, Execution Time: 0.000008 seconds
Node: bert/encoder/layer_6/attention/self/MatMul__390, Execution Time: 0.000553 seconds
Input size: (12, 256, 64, 256)
No Add node related to MatMul output: bert/encoder/layer_6/attention/self/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_6/attention/self/MatMul, Execution Time: 0.000546 seconds
Node: bert/encoder/layer_6/attention/self/Mul, Execution Time: 0.002146 seconds
Add Node: bert/encoder/layer_6/attention/self/add, Execution Time: 0.001294 seconds
Node: bert/encoder/layer_6/attention/self/Softmax, Execution Time: 0.001295 seconds
Input size: (12, 256, 256, 64)
No Add node related to MatMul output: bert/encoder/layer_6/attention/self/MatMul_1. Executing regular MatMul.
MatMul Node: bert/encoder/layer_6/attention/self/MatMul_1, Execution Time: 0.000554 seconds
Node: bert/encoder/layer_6/attention/self/transpose_3, Execution Time: 0.000507 seconds
Node: bert/encoder/layer_6/attention/self/Reshape_3, Execution Time: 0.000047 seconds
Input size: (None, 256, 768, 768)
Fusing MatMul with 2Add for node: bert/encoder/layer_6/attention/output/dense/MatMul
torch.Size([256, 768]) , torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_6/attention/output/dense/MatMul, Execution Time: 0.000683 seconds
Skipping already processed Node: bert/encoder/layer_6/attention/output/dense/BiasAdd
Skipping already processed Node: bert/encoder/layer_6/attention/output/add
Node: bert/encoder/layer_6/attention/output/LayerNorm/moments/mean, Execution Time: 0.000087 seconds
Node: bert/encoder/layer_6/attention/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000460 seconds
Node: bert/encoder/layer_6/attention/output/LayerNorm/moments/SquaredDifference__393, Execution Time: 0.000455 seconds
Node: bert/encoder/layer_6/attention/output/LayerNorm/moments/variance, Execution Time: 0.000062 seconds
Add Node: bert/encoder/layer_6/attention/output/LayerNorm/batchnorm/add, Execution Time: 0.000064 seconds
Node: bert/encoder/layer_6/attention/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000047 seconds
Node: bert/encoder/layer_6/attention/output/LayerNorm/batchnorm/Rsqrt__395, Execution Time: 0.000072 seconds
Node: bert/encoder/layer_6/attention/output/LayerNorm/batchnorm/mul, Execution Time: 0.000054 seconds
Node: bert/encoder/layer_6/attention/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000053 seconds
Node: bert/encoder/layer_6/attention/output/LayerNorm/batchnorm/sub, Execution Time: 0.000057 seconds
Node: bert/encoder/layer_6/attention/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000443 seconds
Add Node: bert/encoder/layer_6/attention/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000454 seconds
Input size: (None, 256, 768, 3072)
Fusing MatMul with Add for node: bert/encoder/layer_6/intermediate/dense/MatMul
torch.Size([256, 3072])
MatMul Fuse node: bert/encoder/layer_6/intermediate/dense/MatMul, Execution Time: 0.000655 seconds
Skipping already processed Node: bert/encoder/layer_6/intermediate/dense/BiasAdd
Node: bert/encoder/layer_6/intermediate/dense/Pow, Execution Time: 0.001311 seconds
Node: bert/encoder/layer_6/intermediate/dense/mul, Execution Time: 0.001315 seconds
Add Node: bert/encoder/layer_6/intermediate/dense/add, Execution Time: 0.001377 seconds
Node: bert/encoder/layer_6/intermediate/dense/mul_1, Execution Time: 0.001305 seconds
Node: bert/encoder/layer_6/intermediate/dense/Tanh, Execution Time: 0.001307 seconds
Add Node: bert/encoder/layer_6/intermediate/dense/add_1, Execution Time: 0.001387 seconds
Node: bert/encoder/layer_6/intermediate/dense/mul_2, Execution Time: 0.001303 seconds
Node: bert/encoder/layer_6/intermediate/dense/mul_3, Execution Time: 0.001365 seconds
Input size: (None, 256, 3072, 768)
Fusing MatMul with 2Add for node: bert/encoder/layer_6/output/dense/MatMul
torch.Size([256, 768]) , torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_6/output/dense/MatMul, Execution Time: 0.000988 seconds
Skipping already processed Node: bert/encoder/layer_6/output/dense/BiasAdd
Skipping already processed Node: bert/encoder/layer_6/output/add
Node: bert/encoder/layer_6/output/LayerNorm/moments/mean, Execution Time: 0.000092 seconds
Node: bert/encoder/layer_6/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000490 seconds
Node: bert/encoder/layer_6/output/LayerNorm/moments/SquaredDifference__397, Execution Time: 0.000460 seconds
Node: bert/encoder/layer_6/output/LayerNorm/moments/variance, Execution Time: 0.000055 seconds
Add Node: bert/encoder/layer_6/output/LayerNorm/batchnorm/add, Execution Time: 0.000063 seconds
Node: bert/encoder/layer_6/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000051 seconds
Node: bert/encoder/layer_6/output/LayerNorm/batchnorm/Rsqrt__399, Execution Time: 0.000071 seconds
Node: bert/encoder/layer_6/output/LayerNorm/batchnorm/mul, Execution Time: 0.000063 seconds
Node: bert/encoder/layer_6/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000045 seconds
Node: bert/encoder/layer_6/output/LayerNorm/batchnorm/sub, Execution Time: 0.000052 seconds
Node: bert/encoder/layer_6/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000481 seconds
Add Node: bert/encoder/layer_6/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000447 seconds
Input size: (None, 256, 768, 768)
Fusing MatMul with Add for node: bert/encoder/layer_7/attention/self/value/MatMul
torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_7/attention/self/value/MatMul, Execution Time: 0.000656 seconds
Skipping already processed Node: bert/encoder/layer_7/attention/self/value/BiasAdd
Node: bert/encoder/layer_7/attention/self/Reshape_2, Execution Time: 0.000020 seconds
Node: bert/encoder/layer_7/attention/self/transpose_2, Execution Time: 0.000444 seconds
Input size: (None, 256, 768, 768)
Fusing MatMul with Add for node: bert/encoder/layer_7/attention/self/query/MatMul
torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_7/attention/self/query/MatMul, Execution Time: 0.000674 seconds
Skipping already processed Node: bert/encoder/layer_7/attention/self/query/BiasAdd
Node: bert/encoder/layer_7/attention/self/Reshape, Execution Time: 0.000009 seconds
Node: bert/encoder/layer_7/attention/self/transpose, Execution Time: 0.000441 seconds
Input size: (None, 256, 768, 768)
Fusing MatMul with Add for node: bert/encoder/layer_7/attention/self/key/MatMul
torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_7/attention/self/key/MatMul, Execution Time: 0.000600 seconds
Skipping already processed Node: bert/encoder/layer_7/attention/self/key/BiasAdd
Node: bert/encoder/layer_7/attention/self/Reshape_1, Execution Time: 0.000008 seconds
Node: bert/encoder/layer_7/attention/self/MatMul__404, Execution Time: 0.000440 seconds
Input size: (12, 256, 64, 256)
No Add node related to MatMul output: bert/encoder/layer_7/attention/self/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_7/attention/self/MatMul, Execution Time: 0.000509 seconds
Node: bert/encoder/layer_7/attention/self/Mul, Execution Time: 0.001363 seconds
Add Node: bert/encoder/layer_7/attention/self/add, Execution Time: 0.001514 seconds
Node: bert/encoder/layer_7/attention/self/Softmax, Execution Time: 0.001384 seconds
Input size: (12, 256, 256, 64)
No Add node related to MatMul output: bert/encoder/layer_7/attention/self/MatMul_1. Executing regular MatMul.
MatMul Node: bert/encoder/layer_7/attention/self/MatMul_1, Execution Time: 0.000567 seconds
Node: bert/encoder/layer_7/attention/self/transpose_3, Execution Time: 0.000458 seconds
Node: bert/encoder/layer_7/attention/self/Reshape_3, Execution Time: 0.000047 seconds
Input size: (None, 256, 768, 768)
Fusing MatMul with 2Add for node: bert/encoder/layer_7/attention/output/dense/MatMul
torch.Size([256, 768]) , torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_7/attention/output/dense/MatMul, Execution Time: 0.000650 seconds
Skipping already processed Node: bert/encoder/layer_7/attention/output/dense/BiasAdd
Skipping already processed Node: bert/encoder/layer_7/attention/output/add
Node: bert/encoder/layer_7/attention/output/LayerNorm/moments/mean, Execution Time: 0.000081 seconds
Node: bert/encoder/layer_7/attention/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000473 seconds
Node: bert/encoder/layer_7/attention/output/LayerNorm/moments/SquaredDifference__407, Execution Time: 0.000465 seconds
Node: bert/encoder/layer_7/attention/output/LayerNorm/moments/variance, Execution Time: 0.000053 seconds
Add Node: bert/encoder/layer_7/attention/output/LayerNorm/batchnorm/add, Execution Time: 0.000047 seconds
Node: bert/encoder/layer_7/attention/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000045 seconds
Node: bert/encoder/layer_7/attention/output/LayerNorm/batchnorm/Rsqrt__409, Execution Time: 0.000066 seconds
Node: bert/encoder/layer_7/attention/output/LayerNorm/batchnorm/mul, Execution Time: 0.000054 seconds
Node: bert/encoder/layer_7/attention/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000053 seconds
Node: bert/encoder/layer_7/attention/output/LayerNorm/batchnorm/sub, Execution Time: 0.000051 seconds
Node: bert/encoder/layer_7/attention/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000451 seconds
Add Node: bert/encoder/layer_7/attention/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000458 seconds
Input size: (None, 256, 768, 3072)
Fusing MatMul with Add for node: bert/encoder/layer_7/intermediate/dense/MatMul
torch.Size([256, 3072])
MatMul Fuse node: bert/encoder/layer_7/intermediate/dense/MatMul, Execution Time: 0.000650 seconds
Skipping already processed Node: bert/encoder/layer_7/intermediate/dense/BiasAdd
Node: bert/encoder/layer_7/intermediate/dense/Pow, Execution Time: 0.001369 seconds
Node: bert/encoder/layer_7/intermediate/dense/mul, Execution Time: 0.001377 seconds
Add Node: bert/encoder/layer_7/intermediate/dense/add, Execution Time: 0.001498 seconds
Node: bert/encoder/layer_7/intermediate/dense/mul_1, Execution Time: 0.001320 seconds
Node: bert/encoder/layer_7/intermediate/dense/Tanh, Execution Time: 0.001377 seconds
Add Node: bert/encoder/layer_7/intermediate/dense/add_1, Execution Time: 0.001314 seconds
Node: bert/encoder/layer_7/intermediate/dense/mul_2, Execution Time: 0.001305 seconds
Node: bert/encoder/layer_7/intermediate/dense/mul_3, Execution Time: 0.002071 seconds
Input size: (None, 256, 3072, 768)
Fusing MatMul with 2Add for node: bert/encoder/layer_7/output/dense/MatMul
torch.Size([256, 768]) , torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_7/output/dense/MatMul, Execution Time: 0.001035 seconds
Skipping already processed Node: bert/encoder/layer_7/output/dense/BiasAdd
Skipping already processed Node: bert/encoder/layer_7/output/add
Node: bert/encoder/layer_7/output/LayerNorm/moments/mean, Execution Time: 0.000083 seconds
Node: bert/encoder/layer_7/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000452 seconds
Node: bert/encoder/layer_7/output/LayerNorm/moments/SquaredDifference__411, Execution Time: 0.000452 seconds
Node: bert/encoder/layer_7/output/LayerNorm/moments/variance, Execution Time: 0.000056 seconds
Add Node: bert/encoder/layer_7/output/LayerNorm/batchnorm/add, Execution Time: 0.000051 seconds
Node: bert/encoder/layer_7/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000045 seconds
Node: bert/encoder/layer_7/output/LayerNorm/batchnorm/Rsqrt__413, Execution Time: 0.000071 seconds
Node: bert/encoder/layer_7/output/LayerNorm/batchnorm/mul, Execution Time: 0.000053 seconds
Node: bert/encoder/layer_7/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000052 seconds
Node: bert/encoder/layer_7/output/LayerNorm/batchnorm/sub, Execution Time: 0.000052 seconds
Node: bert/encoder/layer_7/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000450 seconds
Add Node: bert/encoder/layer_7/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000447 seconds
Input size: (None, 256, 768, 768)
Fusing MatMul with Add for node: bert/encoder/layer_8/attention/self/value/MatMul
torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_8/attention/self/value/MatMul, Execution Time: 0.000658 seconds
Skipping already processed Node: bert/encoder/layer_8/attention/self/value/BiasAdd
Node: bert/encoder/layer_8/attention/self/Reshape_2, Execution Time: 0.000020 seconds
Node: bert/encoder/layer_8/attention/self/transpose_2, Execution Time: 0.000448 seconds
Input size: (None, 256, 768, 768)
Fusing MatMul with Add for node: bert/encoder/layer_8/attention/self/query/MatMul
torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_8/attention/self/query/MatMul, Execution Time: 0.000630 seconds
Skipping already processed Node: bert/encoder/layer_8/attention/self/query/BiasAdd
Node: bert/encoder/layer_8/attention/self/Reshape, Execution Time: 0.000020 seconds
Node: bert/encoder/layer_8/attention/self/transpose, Execution Time: 0.000449 seconds
Input size: (None, 256, 768, 768)
Fusing MatMul with Add for node: bert/encoder/layer_8/attention/self/key/MatMul
torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_8/attention/self/key/MatMul, Execution Time: 0.000614 seconds
Skipping already processed Node: bert/encoder/layer_8/attention/self/key/BiasAdd
Node: bert/encoder/layer_8/attention/self/Reshape_1, Execution Time: 0.000008 seconds
Node: bert/encoder/layer_8/attention/self/MatMul__418, Execution Time: 0.000443 seconds
Input size: (12, 256, 64, 256)
No Add node related to MatMul output: bert/encoder/layer_8/attention/self/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_8/attention/self/MatMul, Execution Time: 0.000495 seconds
Node: bert/encoder/layer_8/attention/self/Mul, Execution Time: 0.001312 seconds
Add Node: bert/encoder/layer_8/attention/self/add, Execution Time: 0.001359 seconds
Node: bert/encoder/layer_8/attention/self/Softmax, Execution Time: 0.001416 seconds
Input size: (12, 256, 256, 64)
No Add node related to MatMul output: bert/encoder/layer_8/attention/self/MatMul_1. Executing regular MatMul.
MatMul Node: bert/encoder/layer_8/attention/self/MatMul_1, Execution Time: 0.000587 seconds
Node: bert/encoder/layer_8/attention/self/transpose_3, Execution Time: 0.000445 seconds
Node: bert/encoder/layer_8/attention/self/Reshape_3, Execution Time: 0.000051 seconds
Input size: (None, 256, 768, 768)
Fusing MatMul with 2Add for node: bert/encoder/layer_8/attention/output/dense/MatMul
torch.Size([256, 768]) , torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_8/attention/output/dense/MatMul, Execution Time: 0.000746 seconds
Skipping already processed Node: bert/encoder/layer_8/attention/output/dense/BiasAdd
Skipping already processed Node: bert/encoder/layer_8/attention/output/add
Node: bert/encoder/layer_8/attention/output/LayerNorm/moments/mean, Execution Time: 0.000085 seconds
Node: bert/encoder/layer_8/attention/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000469 seconds
Node: bert/encoder/layer_8/attention/output/LayerNorm/moments/SquaredDifference__421, Execution Time: 0.000466 seconds
Node: bert/encoder/layer_8/attention/output/LayerNorm/moments/variance, Execution Time: 0.000054 seconds
Add Node: bert/encoder/layer_8/attention/output/LayerNorm/batchnorm/add, Execution Time: 0.000063 seconds
Node: bert/encoder/layer_8/attention/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000047 seconds
Node: bert/encoder/layer_8/attention/output/LayerNorm/batchnorm/Rsqrt__423, Execution Time: 0.000066 seconds
Node: bert/encoder/layer_8/attention/output/LayerNorm/batchnorm/mul, Execution Time: 0.000059 seconds
Node: bert/encoder/layer_8/attention/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000054 seconds
Node: bert/encoder/layer_8/attention/output/LayerNorm/batchnorm/sub, Execution Time: 0.000055 seconds
Node: bert/encoder/layer_8/attention/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000446 seconds
Add Node: bert/encoder/layer_8/attention/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000448 seconds
Input size: (None, 256, 768, 3072)
Fusing MatMul with Add for node: bert/encoder/layer_8/intermediate/dense/MatMul
torch.Size([256, 3072])
MatMul Fuse node: bert/encoder/layer_8/intermediate/dense/MatMul, Execution Time: 0.000650 seconds
Skipping already processed Node: bert/encoder/layer_8/intermediate/dense/BiasAdd
Node: bert/encoder/layer_8/intermediate/dense/Pow, Execution Time: 0.001652 seconds
Node: bert/encoder/layer_8/intermediate/dense/mul, Execution Time: 0.001383 seconds
Add Node: bert/encoder/layer_8/intermediate/dense/add, Execution Time: 0.001327 seconds
Node: bert/encoder/layer_8/intermediate/dense/mul_1, Execution Time: 0.001308 seconds
Node: bert/encoder/layer_8/intermediate/dense/Tanh, Execution Time: 0.001390 seconds
Add Node: bert/encoder/layer_8/intermediate/dense/add_1, Execution Time: 0.001313 seconds
Node: bert/encoder/layer_8/intermediate/dense/mul_2, Execution Time: 0.001375 seconds
Node: bert/encoder/layer_8/intermediate/dense/mul_3, Execution Time: 0.001365 seconds
Input size: (None, 256, 3072, 768)
Fusing MatMul with 2Add for node: bert/encoder/layer_8/output/dense/MatMul
torch.Size([256, 768]) , torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_8/output/dense/MatMul, Execution Time: 0.000986 seconds
Skipping already processed Node: bert/encoder/layer_8/output/dense/BiasAdd
Skipping already processed Node: bert/encoder/layer_8/output/add
Node: bert/encoder/layer_8/output/LayerNorm/moments/mean, Execution Time: 0.000085 seconds
Node: bert/encoder/layer_8/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000489 seconds
Node: bert/encoder/layer_8/output/LayerNorm/moments/SquaredDifference__425, Execution Time: 0.000483 seconds
Node: bert/encoder/layer_8/output/LayerNorm/moments/variance, Execution Time: 0.000054 seconds
Add Node: bert/encoder/layer_8/output/LayerNorm/batchnorm/add, Execution Time: 0.000064 seconds
Node: bert/encoder/layer_8/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000046 seconds
Node: bert/encoder/layer_8/output/LayerNorm/batchnorm/Rsqrt__427, Execution Time: 0.000073 seconds
Node: bert/encoder/layer_8/output/LayerNorm/batchnorm/mul, Execution Time: 0.000057 seconds
Node: bert/encoder/layer_8/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000053 seconds
Node: bert/encoder/layer_8/output/LayerNorm/batchnorm/sub, Execution Time: 0.000053 seconds
Node: bert/encoder/layer_8/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000444 seconds
Add Node: bert/encoder/layer_8/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000456 seconds
Input size: (None, 256, 768, 768)
Fusing MatMul with Add for node: bert/encoder/layer_9/attention/self/value/MatMul
torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_9/attention/self/value/MatMul, Execution Time: 0.000708 seconds
Skipping already processed Node: bert/encoder/layer_9/attention/self/value/BiasAdd
Node: bert/encoder/layer_9/attention/self/Reshape_2, Execution Time: 0.000020 seconds
Node: bert/encoder/layer_9/attention/self/transpose_2, Execution Time: 0.000458 seconds
Input size: (None, 256, 768, 768)
Fusing MatMul with Add for node: bert/encoder/layer_9/attention/self/query/MatMul
torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_9/attention/self/query/MatMul, Execution Time: 0.000642 seconds
Skipping already processed Node: bert/encoder/layer_9/attention/self/query/BiasAdd
Node: bert/encoder/layer_9/attention/self/Reshape, Execution Time: 0.000010 seconds
Node: bert/encoder/layer_9/attention/self/transpose, Execution Time: 0.000452 seconds
Input size: (None, 256, 768, 768)
Fusing MatMul with Add for node: bert/encoder/layer_9/attention/self/key/MatMul
torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_9/attention/self/key/MatMul, Execution Time: 0.000621 seconds
Skipping already processed Node: bert/encoder/layer_9/attention/self/key/BiasAdd
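The paired "Fusing MatMul with Add" / "Skipping already processed Node" messages above come from a pass that folds each MatMul's downstream BiasAdd into a single `x @ W + b` and then marks the BiasAdd as done. A hypothetical sketch of that dispatch loop (the node-record layout and field names are invented for illustration, not the runner's actual data structures):

```python
import numpy as np

def run_graph(nodes, tensors):
    """Execute a toy node list, fusing MatMul -> BiasAdd into one y = x @ W + b."""
    processed = set()
    for n in nodes:
        if n["name"] in processed:
            continue  # "Skipping already processed Node: ..."
        if n["op"] == "MatMul" and n.get("bias_add") is not None:
            # "Fusing MatMul with Add for node: ..." -- one GEMM, bias folded in
            ba = n["bias_add"]  # the Add node consuming this MatMul's output
            tensors[ba["out"]] = tensors[n["a"]] @ tensors[n["w"]] + tensors[ba["b"]]
            processed.add(ba["name"])  # BiasAdd will be skipped when reached
        elif n["op"] == "MatMul":
            tensors[n["out"]] = tensors[n["a"]] @ tensors[n["w"]]
        elif n["op"] == "Add":
            tensors[n["out"]] = tensors[n["a"]] + tensors[n["b"]]
    return tensors
```

The payoff is one pass over the output tensor instead of two, which is why the log tracks a separate "Total Matmul + Add Execution Time" bucket.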
Node: bert/encoder/layer_9/attention/self/Reshape_1, Execution Time: 0.000010 seconds
Node: bert/encoder/layer_9/attention/self/MatMul__432, Execution Time: 0.000462 seconds
Input size: (12, 256, 64, 256)
No Add node related to MatMul output: bert/encoder/layer_9/attention/self/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_9/attention/self/MatMul, Execution Time: 0.000492 seconds
Node: bert/encoder/layer_9/attention/self/Mul, Execution Time: 0.001414 seconds
Add Node: bert/encoder/layer_9/attention/self/add, Execution Time: 0.001318 seconds
Node: bert/encoder/layer_9/attention/self/Softmax, Execution Time: 0.001571 seconds
Input size: (12, 256, 256, 64)
No Add node related to MatMul output: bert/encoder/layer_9/attention/self/MatMul_1. Executing regular MatMul.
MatMul Node: bert/encoder/layer_9/attention/self/MatMul_1, Execution Time: 0.000562 seconds
Node: bert/encoder/layer_9/attention/self/transpose_3, Execution Time: 0.000447 seconds
Node: bert/encoder/layer_9/attention/self/Reshape_3, Execution Time: 0.000038 seconds
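The layer_9 attention run above (query/key/value MatMuls, Mul by the scaling factor, additive mask add, Softmax, then MatMul_1 against the values) is plain scaled dot-product attention over 12 heads of width 64 — hence the (12, 256, 64, 256) and (12, 256, 256, 64) input sizes. A minimal sketch, assuming the mask argument is the additive bias built from `mul` / `sub` / `mul_1` at the top of the graph:

```python
import numpy as np

def scaled_dot_product_attention(q, k, v, mask_bias):
    # q, k, v: (heads, seq, head_dim); mask_bias broadcastable to (heads, seq, seq)
    scale = 1.0 / np.sqrt(q.shape[-1])           # the Mul node: 1/sqrt(64)
    scores = q @ k.transpose(0, 2, 1) * scale    # attention/self/MatMul + Mul
    scores = scores + mask_bias                  # attention/self/add (masked positions ~ -10000)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerically stable Softmax
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)
    return probs @ v                             # attention/self/MatMul_1
```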
Input size: (None, 256, 768, 768)
Fusing MatMul with 2Add for node: bert/encoder/layer_9/attention/output/dense/MatMul
torch.Size([256, 768]) , torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_9/attention/output/dense/MatMul, Execution Time: 0.000661 seconds
Skipping already processed Node: bert/encoder/layer_9/attention/output/dense/BiasAdd
Skipping already processed Node: bert/encoder/layer_9/attention/output/add
Node: bert/encoder/layer_9/attention/output/LayerNorm/moments/mean, Execution Time: 0.000082 seconds
Node: bert/encoder/layer_9/attention/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000456 seconds
Node: bert/encoder/layer_9/attention/output/LayerNorm/moments/SquaredDifference__435, Execution Time: 0.000499 seconds
Node: bert/encoder/layer_9/attention/output/LayerNorm/moments/variance, Execution Time: 0.000067 seconds
Add Node: bert/encoder/layer_9/attention/output/LayerNorm/batchnorm/add, Execution Time: 0.000051 seconds
Node: bert/encoder/layer_9/attention/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000047 seconds
Node: bert/encoder/layer_9/attention/output/LayerNorm/batchnorm/Rsqrt__437, Execution Time: 0.000076 seconds
Node: bert/encoder/layer_9/attention/output/LayerNorm/batchnorm/mul, Execution Time: 0.000052 seconds
Node: bert/encoder/layer_9/attention/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000051 seconds
Node: bert/encoder/layer_9/attention/output/LayerNorm/batchnorm/sub, Execution Time: 0.000052 seconds
Node: bert/encoder/layer_9/attention/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000524 seconds
Add Node: bert/encoder/layer_9/attention/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000565 seconds
Input size: (None, 256, 768, 3072)
Fusing MatMul with Add for node: bert/encoder/layer_9/intermediate/dense/MatMul
torch.Size([256, 3072])
MatMul Fuse node: bert/encoder/layer_9/intermediate/dense/MatMul, Execution Time: 0.000738 seconds
Skipping already processed Node: bert/encoder/layer_9/intermediate/dense/BiasAdd
Node: bert/encoder/layer_9/intermediate/dense/Pow, Execution Time: 0.001530 seconds
Node: bert/encoder/layer_9/intermediate/dense/mul, Execution Time: 0.001426 seconds
Add Node: bert/encoder/layer_9/intermediate/dense/add, Execution Time: 0.001411 seconds
Node: bert/encoder/layer_9/intermediate/dense/mul_1, Execution Time: 0.001332 seconds
Node: bert/encoder/layer_9/intermediate/dense/Tanh, Execution Time: 0.001435 seconds
Add Node: bert/encoder/layer_9/intermediate/dense/add_1, Execution Time: 0.001343 seconds
Node: bert/encoder/layer_9/intermediate/dense/mul_2, Execution Time: 0.001372 seconds
Node: bert/encoder/layer_9/intermediate/dense/mul_3, Execution Time: 0.001386 seconds
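The intermediate/dense Pow, mul, add, mul_1, Tanh, add_1, mul_2, mul_3 chain above is the tanh approximation of GELU used by BERT, spelled out as eight elementwise nodes. The same computation as one expression:

```python
import numpy as np

def gelu_tanh(x):
    # Pow(x,3)*0.044715 -> +x -> *sqrt(2/pi) -> Tanh -> +1 -> *0.5 -> *x
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))
```

Fusing this chain into a single kernel is a common optimization, since each of the eight nodes here costs ~1.3 ms mostly in memory traffic.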
Input size: (None, 256, 3072, 768)
Fusing MatMul with 2Add for node: bert/encoder/layer_9/output/dense/MatMul
torch.Size([256, 768]) , torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_9/output/dense/MatMul, Execution Time: 0.001089 seconds
Skipping already processed Node: bert/encoder/layer_9/output/dense/BiasAdd
Skipping already processed Node: bert/encoder/layer_9/output/add
Node: bert/encoder/layer_9/output/LayerNorm/moments/mean, Execution Time: 0.000101 seconds
Node: bert/encoder/layer_9/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000596 seconds
Node: bert/encoder/layer_9/output/LayerNorm/moments/SquaredDifference__439, Execution Time: 0.000592 seconds
Node: bert/encoder/layer_9/output/LayerNorm/moments/variance, Execution Time: 0.000066 seconds
Add Node: bert/encoder/layer_9/output/LayerNorm/batchnorm/add, Execution Time: 0.000058 seconds
Node: bert/encoder/layer_9/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000059 seconds
Node: bert/encoder/layer_9/output/LayerNorm/batchnorm/Rsqrt__441, Execution Time: 0.000091 seconds
Node: bert/encoder/layer_9/output/LayerNorm/batchnorm/mul, Execution Time: 0.000063 seconds
Node: bert/encoder/layer_9/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000061 seconds
Node: bert/encoder/layer_9/output/LayerNorm/batchnorm/sub, Execution Time: 0.000057 seconds
Node: bert/encoder/layer_9/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000564 seconds
Add Node: bert/encoder/layer_9/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000584 seconds
Input size: (None, 256, 768, 768)
Fusing MatMul with Add for node: bert/encoder/layer_10/attention/self/value/MatMul
torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_10/attention/self/value/MatMul, Execution Time: 0.001988 seconds
Skipping already processed Node: bert/encoder/layer_10/attention/self/value/BiasAdd
Node: bert/encoder/layer_10/attention/self/Reshape_2, Execution Time: 0.000020 seconds
Node: bert/encoder/layer_10/attention/self/transpose_2, Execution Time: 0.000438 seconds
Input size: (None, 256, 768, 768)
Fusing MatMul with Add for node: bert/encoder/layer_10/attention/self/query/MatMul
torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_10/attention/self/query/MatMul, Execution Time: 0.000623 seconds
Skipping already processed Node: bert/encoder/layer_10/attention/self/query/BiasAdd
Node: bert/encoder/layer_10/attention/self/Reshape, Execution Time: 0.000009 seconds
Node: bert/encoder/layer_10/attention/self/transpose, Execution Time: 0.000460 seconds
Input size: (None, 256, 768, 768)
Fusing MatMul with Add for node: bert/encoder/layer_10/attention/self/key/MatMul
torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_10/attention/self/key/MatMul, Execution Time: 0.000663 seconds
Skipping already processed Node: bert/encoder/layer_10/attention/self/key/BiasAdd
Node: bert/encoder/layer_10/attention/self/Reshape_1, Execution Time: 0.000009 seconds
Node: bert/encoder/layer_10/attention/self/MatMul__446, Execution Time: 0.000453 seconds
Input size: (12, 256, 64, 256)
No Add node related to MatMul output: bert/encoder/layer_10/attention/self/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_10/attention/self/MatMul, Execution Time: 0.000487 seconds
Node: bert/encoder/layer_10/attention/self/Mul, Execution Time: 0.001345 seconds
Add Node: bert/encoder/layer_10/attention/self/add, Execution Time: 0.001318 seconds
Node: bert/encoder/layer_10/attention/self/Softmax, Execution Time: 0.001414 seconds
Input size: (12, 256, 256, 64)
No Add node related to MatMul output: bert/encoder/layer_10/attention/self/MatMul_1. Executing regular MatMul.
MatMul Node: bert/encoder/layer_10/attention/self/MatMul_1, Execution Time: 0.000694 seconds
Node: bert/encoder/layer_10/attention/self/transpose_3, Execution Time: 0.000443 seconds
Node: bert/encoder/layer_10/attention/self/Reshape_3, Execution Time: 0.000048 seconds
Input size: (None, 256, 768, 768)
Fusing MatMul with 2Add for node: bert/encoder/layer_10/attention/output/dense/MatMul
torch.Size([256, 768]) , torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_10/attention/output/dense/MatMul, Execution Time: 0.000693 seconds
Skipping already processed Node: bert/encoder/layer_10/attention/output/dense/BiasAdd
Skipping already processed Node: bert/encoder/layer_10/attention/output/add
Node: bert/encoder/layer_10/attention/output/LayerNorm/moments/mean, Execution Time: 0.000084 seconds
Node: bert/encoder/layer_10/attention/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000475 seconds
Node: bert/encoder/layer_10/attention/output/LayerNorm/moments/SquaredDifference__449, Execution Time: 0.000465 seconds
Node: bert/encoder/layer_10/attention/output/LayerNorm/moments/variance, Execution Time: 0.000054 seconds
Add Node: bert/encoder/layer_10/attention/output/LayerNorm/batchnorm/add, Execution Time: 0.000047 seconds
Node: bert/encoder/layer_10/attention/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000047 seconds
Node: bert/encoder/layer_10/attention/output/LayerNorm/batchnorm/Rsqrt__451, Execution Time: 0.000067 seconds
Node: bert/encoder/layer_10/attention/output/LayerNorm/batchnorm/mul, Execution Time: 0.000057 seconds
Node: bert/encoder/layer_10/attention/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000052 seconds
Node: bert/encoder/layer_10/attention/output/LayerNorm/batchnorm/sub, Execution Time: 0.000057 seconds
Node: bert/encoder/layer_10/attention/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000531 seconds
Add Node: bert/encoder/layer_10/attention/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000460 seconds
Input size: (None, 256, 768, 3072)
Fusing MatMul with Add for node: bert/encoder/layer_10/intermediate/dense/MatMul
torch.Size([256, 3072])
MatMul Fuse node: bert/encoder/layer_10/intermediate/dense/MatMul, Execution Time: 0.000681 seconds
Skipping already processed Node: bert/encoder/layer_10/intermediate/dense/BiasAdd
Node: bert/encoder/layer_10/intermediate/dense/Pow, Execution Time: 0.001327 seconds
Node: bert/encoder/layer_10/intermediate/dense/mul, Execution Time: 0.001411 seconds
Add Node: bert/encoder/layer_10/intermediate/dense/add, Execution Time: 0.001332 seconds
Node: bert/encoder/layer_10/intermediate/dense/mul_1, Execution Time: 0.001390 seconds
Node: bert/encoder/layer_10/intermediate/dense/Tanh, Execution Time: 0.001319 seconds
Add Node: bert/encoder/layer_10/intermediate/dense/add_1, Execution Time: 0.001312 seconds
Node: bert/encoder/layer_10/intermediate/dense/mul_2, Execution Time: 0.001759 seconds
Node: bert/encoder/layer_10/intermediate/dense/mul_3, Execution Time: 0.001331 seconds
Input size: (None, 256, 3072, 768)
Fusing MatMul with 2Add for node: bert/encoder/layer_10/output/dense/MatMul
torch.Size([256, 768]) , torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_10/output/dense/MatMul, Execution Time: 0.000994 seconds
Skipping already processed Node: bert/encoder/layer_10/output/dense/BiasAdd
Skipping already processed Node: bert/encoder/layer_10/output/add
Node: bert/encoder/layer_10/output/LayerNorm/moments/mean, Execution Time: 0.000082 seconds
Node: bert/encoder/layer_10/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000477 seconds
Node: bert/encoder/layer_10/output/LayerNorm/moments/SquaredDifference__453, Execution Time: 0.000459 seconds
Node: bert/encoder/layer_10/output/LayerNorm/moments/variance, Execution Time: 0.000053 seconds
Add Node: bert/encoder/layer_10/output/LayerNorm/batchnorm/add, Execution Time: 0.000064 seconds
Node: bert/encoder/layer_10/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000046 seconds
Node: bert/encoder/layer_10/output/LayerNorm/batchnorm/Rsqrt__455, Execution Time: 0.000067 seconds
Node: bert/encoder/layer_10/output/LayerNorm/batchnorm/mul, Execution Time: 0.000057 seconds
Node: bert/encoder/layer_10/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000053 seconds
Node: bert/encoder/layer_10/output/LayerNorm/batchnorm/sub, Execution Time: 0.000059 seconds
Node: bert/encoder/layer_10/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000454 seconds
Add Node: bert/encoder/layer_10/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000557 seconds
Input size: (None, 256, 768, 768)
Fusing MatMul with Add for node: bert/encoder/layer_11/attention/self/value/MatMul
torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_11/attention/self/value/MatMul, Execution Time: 0.000667 seconds
Skipping already processed Node: bert/encoder/layer_11/attention/self/value/BiasAdd
Node: bert/encoder/layer_11/attention/self/Reshape_2, Execution Time: 0.000020 seconds
Node: bert/encoder/layer_11/attention/self/transpose_2, Execution Time: 0.000451 seconds
Input size: (None, 256, 768, 768)
Fusing MatMul with Add for node: bert/encoder/layer_11/attention/self/query/MatMul
torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_11/attention/self/query/MatMul, Execution Time: 0.000632 seconds
Skipping already processed Node: bert/encoder/layer_11/attention/self/query/BiasAdd
Node: bert/encoder/layer_11/attention/self/Reshape, Execution Time: 0.000020 seconds
Node: bert/encoder/layer_11/attention/self/transpose, Execution Time: 0.000466 seconds
Input size: (None, 256, 768, 768)
Fusing MatMul with Add for node: bert/encoder/layer_11/attention/self/key/MatMul
torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_11/attention/self/key/MatMul, Execution Time: 0.000609 seconds
Skipping already processed Node: bert/encoder/layer_11/attention/self/key/BiasAdd
Node: bert/encoder/layer_11/attention/self/Reshape_1, Execution Time: 0.000007 seconds
Node: bert/encoder/layer_11/attention/self/MatMul__460, Execution Time: 0.000451 seconds
Input size: (12, 256, 64, 256)
No Add node related to MatMul output: bert/encoder/layer_11/attention/self/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_11/attention/self/MatMul, Execution Time: 0.000494 seconds
Node: bert/encoder/layer_11/attention/self/Mul, Execution Time: 0.001331 seconds
Add Node: bert/encoder/layer_11/attention/self/add, Execution Time: 0.001391 seconds
Node: bert/encoder/layer_11/attention/self/Softmax, Execution Time: 0.001305 seconds
Input size: (12, 256, 256, 64)
No Add node related to MatMul output: bert/encoder/layer_11/attention/self/MatMul_1. Executing regular MatMul.
MatMul Node: bert/encoder/layer_11/attention/self/MatMul_1, Execution Time: 0.000559 seconds
Node: bert/encoder/layer_11/attention/self/transpose_3, Execution Time: 0.000445 seconds
Node: bert/encoder/layer_11/attention/self/Reshape_3, Execution Time: 0.000047 seconds
Input size: (None, 256, 768, 768)
Fusing MatMul with 2Add for node: bert/encoder/layer_11/attention/output/dense/MatMul
torch.Size([256, 768]) , torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_11/attention/output/dense/MatMul, Execution Time: 0.000668 seconds
Skipping already processed Node: bert/encoder/layer_11/attention/output/dense/BiasAdd
Skipping already processed Node: bert/encoder/layer_11/attention/output/add
Node: bert/encoder/layer_11/attention/output/LayerNorm/moments/mean, Execution Time: 0.000082 seconds
Node: bert/encoder/layer_11/attention/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000474 seconds
Node: bert/encoder/layer_11/attention/output/LayerNorm/moments/SquaredDifference__463, Execution Time: 0.000541 seconds
Node: bert/encoder/layer_11/attention/output/LayerNorm/moments/variance, Execution Time: 0.000054 seconds
Add Node: bert/encoder/layer_11/attention/output/LayerNorm/batchnorm/add, Execution Time: 0.000048 seconds
Node: bert/encoder/layer_11/attention/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000048 seconds
Node: bert/encoder/layer_11/attention/output/LayerNorm/batchnorm/Rsqrt__465, Execution Time: 0.000071 seconds
Node: bert/encoder/layer_11/attention/output/LayerNorm/batchnorm/mul, Execution Time: 0.000075 seconds
Node: bert/encoder/layer_11/attention/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000053 seconds
Node: bert/encoder/layer_11/attention/output/LayerNorm/batchnorm/sub, Execution Time: 0.000052 seconds
Node: bert/encoder/layer_11/attention/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000450 seconds
Add Node: bert/encoder/layer_11/attention/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000453 seconds
Input size: (None, 256, 768, 3072)
Fusing MatMul with Add for node: bert/encoder/layer_11/intermediate/dense/MatMul
torch.Size([256, 3072])
MatMul Fuse node: bert/encoder/layer_11/intermediate/dense/MatMul, Execution Time: 0.000818 seconds
Skipping already processed Node: bert/encoder/layer_11/intermediate/dense/BiasAdd
Node: bert/encoder/layer_11/intermediate/dense/Pow, Execution Time: 0.002038 seconds
Node: bert/encoder/layer_11/intermediate/dense/mul, Execution Time: 0.001370 seconds
Add Node: bert/encoder/layer_11/intermediate/dense/add, Execution Time: 0.001295 seconds
Node: bert/encoder/layer_11/intermediate/dense/mul_1, Execution Time: 0.001367 seconds
Node: bert/encoder/layer_11/intermediate/dense/Tanh, Execution Time: 0.001366 seconds
Add Node: bert/encoder/layer_11/intermediate/dense/add_1, Execution Time: 0.001344 seconds
Node: bert/encoder/layer_11/intermediate/dense/mul_2, Execution Time: 0.001409 seconds
Node: bert/encoder/layer_11/intermediate/dense/mul_3, Execution Time: 0.001320 seconds
Input size: (None, 256, 3072, 768)
Fusing MatMul with 2Add for node: bert/encoder/layer_11/output/dense/MatMul
torch.Size([256, 768]) , torch.Size([256, 768])
MatMul Fuse node: bert/encoder/layer_11/output/dense/MatMul, Execution Time: 0.000977 seconds
Skipping already processed Node: bert/encoder/layer_11/output/dense/BiasAdd
Skipping already processed Node: bert/encoder/layer_11/output/add
Node: bert/encoder/layer_11/output/LayerNorm/moments/mean, Execution Time: 0.000082 seconds
Node: bert/encoder/layer_11/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000461 seconds
Node: bert/encoder/layer_11/output/LayerNorm/moments/SquaredDifference__467, Execution Time: 0.000485 seconds
Node: bert/encoder/layer_11/output/LayerNorm/moments/variance, Execution Time: 0.000055 seconds
Add Node: bert/encoder/layer_11/output/LayerNorm/batchnorm/add, Execution Time: 0.000049 seconds
Node: bert/encoder/layer_11/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000048 seconds
Node: bert/encoder/layer_11/output/LayerNorm/batchnorm/Rsqrt__469, Execution Time: 0.000070 seconds
Node: bert/encoder/layer_11/output/LayerNorm/batchnorm/mul, Execution Time: 0.000045 seconds
Node: bert/encoder/layer_11/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000052 seconds
Node: bert/encoder/layer_11/output/LayerNorm/batchnorm/sub, Execution Time: 0.000053 seconds
Node: bert/encoder/layer_11/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000533 seconds
Add Node: bert/encoder/layer_11/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000473 seconds
Input size: (None, 256, 768, 2)
Fusing MatMul with Add for node: MatMul
torch.Size([256, 2])
MatMul Fuse node: MatMul, Execution Time: 0.001725 seconds
Skipping already processed Node: BiasAdd
Node: Reshape_1, Execution Time: 0.000026 seconds
Node: transpose, Execution Time: 0.000045 seconds
Node: unstack, Execution Time: 0.000050 seconds
Node: unstack__490, Execution Time: 0.000020 seconds
Node: unstack__488, Execution Time: 0.000007 seconds
Node Execution Times:
Total Execution Time: 0.436412 seconds
Total Matmul + Add Execution Time: 0.163752 seconds
Execution complete.
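The per-node "Execution Time" lines and the two totals above suggest a simple wall-clock wrapper around each node's execution; a hypothetical sketch using `time.perf_counter` (the separate MatMul+Add accumulator is an assumption about how the "Total Matmul + Add Execution Time" figure is gathered):

```python
import time

class NodeTimer:
    """Times each node call and accumulates a grand total plus a fused-GEMM bucket."""

    def __init__(self):
        self.total = 0.0
        self.matmul_add_total = 0.0

    def run(self, name, fn, *args, fused=False):
        t0 = time.perf_counter()
        out = fn(*args)
        dt = time.perf_counter() - t0
        self.total += dt
        if fused:
            self.matmul_add_total += dt  # feeds "Total Matmul + Add Execution Time"
        print(f"Node: {name}, Execution Time: {dt:.6f} seconds")
        return out
```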
Model outputs: {'unstack:1': array([[-4.9148726, -4.6251225, -4.132886 , -4.1499195, -4.7828836,
-4.250844 , -4.77094 , -4.348463 , -2.7006364, -4.424177 ,
-4.510866 , -4.39433 , -4.773833 , -4.480716 , -4.7714205,
-4.6485815, -3.1330094, -4.7139587, -4.7148943, -4.7223635,
-4.7008233, -4.6960616, -4.7121487, -4.708615 , -4.703374 ,
-4.7024655, -4.687359 , -4.693113 , -4.698162 , -4.692563 ,
-4.711712 , -4.7003703, -4.7027717, -4.7279253, -4.709934 ,
-4.715551 , -4.7324576, -4.7294855, -4.7329216, -4.7218866,
-4.7014203, -4.694692 , -4.6925716, -4.700892 , -4.7044754,
-4.68252 , -4.679993 , -4.6824126, -4.6833754, -4.690988 ,
-4.695919 , -4.6797957, -4.683871 , -4.6834297, -4.680781 ,
-4.686977 , -4.681429 , -4.680897 , -4.694978 , -4.685382 ,
-4.70324 , -4.7010674, -4.693331 , -4.7089696, -4.71908 ,
-4.7188516, -4.70435 , -4.685466 , -4.6962924, -4.6972375,
-4.691828 , -4.688009 , -4.691449 , -4.693622 , -4.6890097,
-4.6876435, -4.684474 , -4.7056074, -4.6984677, -4.7068577,
-4.689911 , -4.687499 , -4.6927333, -4.693831 , -4.6965637,
-4.693646 , -4.693519 , -4.71067 , -4.722037 , -4.718479 ,
-4.729904 , -4.721483 , -4.739112 , -4.7325935, -4.7295456,
-4.712435 , -4.712704 , -4.7114053, -4.712399 , -4.704262 ,
-4.6972833, -4.6926665, -4.717176 , -4.6937675, -4.694539 ,
-4.711683 , -4.685275 , -4.6935816, -4.701117 , -4.6866083,
-4.6843753, -4.6876745, -4.684178 , -4.694061 , -4.6890798,
-4.6861553, -4.7003927, -4.7103863, -4.710601 , -4.7194986,
-4.7016277, -4.718649 , -4.743214 , -4.7109504, -4.711556 ,
-4.7007613, -4.7009783, -4.6995244, -4.7007017, -4.7026825,
-4.706376 , -4.7061615, -4.7284904, -4.724841 , -4.7082043,
-4.7080393, -4.7098503, -4.7207146, -4.733838 , -4.7125974,
-4.7276387, -4.721991 , -4.7300687, -4.7229652, -4.7133346,
-4.7109923, -4.71963 , -4.7312083, -4.733224 , -4.7362647,
-4.739877 , -4.74243 , -4.727128 , -4.737834 , -4.74598 ,
-4.738839 , -4.744508 , -4.728359 , -4.726734 , -4.7255516,
-4.7363386, -4.73214 , -4.7196693, -4.721826 , -4.7047076,
-4.7190104, -4.7156587, -4.706273 , -4.7116737, -4.701518 ,
-4.6943965, -4.6903934, -4.6890545, -4.6862764, -4.6875463,
-4.684304 , -4.688264 , -4.691186 , -4.7027955, -4.6910152,
-4.6985803, -4.7152886, -4.723945 , -4.7293673, -4.7427354,
-4.73977 , -4.7290154, -4.7378254, -4.7355986, -4.731869 ,
-4.724579 , -4.7262163, -4.71887 , -4.7058587, -4.7122684,
-4.7009015, -4.696829 , -4.7094407, -4.703914 , -4.703702 ,
-4.7195215, -4.7118044, -4.709847 , -4.721358 , -4.723019 ,
-4.71298 , -4.7218485, -4.724691 , -4.725982 , -4.726673 ,
-4.7187834, -4.709004 , -4.7109466, -4.737439 , -4.7246385,
-4.73252 , -4.7404885, -4.7261868, -4.734698 , -4.732445 ,
-4.736647 , -4.724646 , -4.73208 , -4.7321663, -4.7037077,
-4.718028 , -4.726786 , -4.7345347, -4.7328334, -4.7220054,
-4.7327023, -4.7200413, -4.7459936, -4.728972 , -4.7290406,
-4.7259574, -4.730495 , -4.723769 , -4.7380366, -4.7268267,
-4.692981 , -4.718449 , -4.6935935, -4.6961823, -4.713647 ,
-4.6950507, -4.700345 , -4.7232556, -4.708386 , -4.737004 ,
-4.7273254, -4.716681 , -4.7106347, -4.714922 , -4.7030454,
-4.7468524]], dtype=float32), 'unstack:0': array([[-5.339778 , -4.878685 , -4.312428 , -4.3309417, -5.125337 ,
-4.442749 , -5.1271124, -4.5656004, -4.683339 , -4.6350813,
-4.8042274, -4.6028423, -5.1304255, -4.7185884, -5.0999007,
-4.9003377, -5.1724668, -5.1058035, -5.1073008, -5.1120396,
-5.0958624, -5.092071 , -5.104314 , -5.1013465, -5.0973773,
-5.0955014, -5.086265 , -5.089708 , -5.093198 , -5.089909 ,
-5.1028776, -5.0938663, -5.0976443, -5.1154556, -5.102868 ,
-5.1068664, -5.1185074, -5.1169963, -5.118672 , -5.1110716,
-5.0957775, -5.0914636, -5.089892 , -5.096351 , -5.099577 ,
-5.084194 , -5.082636 , -5.0841656, -5.0848293, -5.089616 ,
-5.0918293, -5.083179 , -5.084272 , -5.0856056, -5.0826926,
-5.087329 , -5.0841713, -5.0831146, -5.092702 , -5.084974 ,
-5.0978565, -5.0952926, -5.090936 , -5.102818 , -5.110067 ,
-5.1097775, -5.0976253, -5.0851665, -5.0931044, -5.093152 ,
-5.089941 , -5.0872903, -5.0898356, -5.0923924, -5.0875926,
-5.086853 , -5.085301 , -5.100186 , -5.094749 , -5.099969 ,
-5.0874996, -5.0855126, -5.0895004, -5.09137 , -5.0918326,
-5.0898056, -5.090782 , -5.1034665, -5.112412 , -5.109096 ,
-5.1174197, -5.1111536, -5.1241746, -5.1188 , -5.116848 ,
-5.1029363, -5.1041894, -5.103745 , -5.105212 , -5.098095 ,
-5.093282 , -5.090341 , -5.1087084, -5.0905395, -5.0906925,
-5.1039257, -5.084995 , -5.090868 , -5.0939407, -5.0842586,
-5.0840406, -5.0855136, -5.08409 , -5.089621 , -5.0858765,
-5.0852404, -5.09481 , -5.1036887, -5.1036325, -5.1107006,
-5.0964427, -5.109834 , -5.128194 , -5.104343 , -5.10455 ,
-5.0965843, -5.0981956, -5.0968714, -5.0971923, -5.096769 ,
-5.1019425, -5.1022315, -5.119105 , -5.116201 , -5.102627 ,
-5.102922 , -5.1034007, -5.111492 , -5.121706 , -5.1049304,
-5.116994 , -5.111964 , -5.1179514, -5.1140733, -5.1069007,
-5.1045523, -5.1113954, -5.119346 , -5.1202354, -5.1230803,
-5.1247115, -5.125494 , -5.1167865, -5.1235557, -5.127506 ,
-5.1223035, -5.124693 , -5.116798 , -5.1166444, -5.1148844,
-5.1223955, -5.1191473, -5.111838 , -5.112754 , -5.1008034,
-5.1111383, -5.1085505, -5.100999 , -5.1052284, -5.0974274,
-5.0922704, -5.0895066, -5.089077 , -5.086511 , -5.0866723,
-5.0855794, -5.0879817, -5.0893273, -5.0967927, -5.08802 ,
-5.093814 , -5.1059337, -5.112577 , -5.1154685, -5.121607 ,
-5.12036 , -5.114813 , -5.1212907, -5.1178846, -5.117335 ,
-5.1129055, -5.1143084, -5.109348 , -5.100045 , -5.1053514,
-5.0964003, -5.0934987, -5.102238 , -5.0983605, -5.0989766,
-5.1099577, -5.10423 , -5.1023245, -5.1104093, -5.111489 ,
-5.1045485, -5.110909 , -5.112187 , -5.1123652, -5.113932 ,
-5.10867 , -5.0995913, -5.101586 , -5.1216726, -5.111117 ,
-5.116669 , -5.12195 , -5.112778 , -5.1199346, -5.117032 ,
-5.120798 , -5.11272 , -5.117168 , -5.1175523, -5.09827 ,
-5.1082807, -5.1146145, -5.1200075, -5.1190424, -5.112625 ,
-5.1200185, -5.1110024, -5.126168 , -5.1168666, -5.11615 ,
-5.113571 , -5.118028 , -5.1132293, -5.122775 , -5.1154203,
-5.091564 , -5.1100745, -5.0914884, -5.0932784, -5.105365 ,
-5.092105 , -5.0959387, -5.1119223, -5.101221 , -5.1215677,
-5.114091 , -5.10658 , -5.101732 , -5.105737 , -5.0961223,
-5.1260395]], dtype=float32), 'unique_ids:0': array([0])}
Question: What is the capital of France?
Context: The capital of France is Paris.
Answer:
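The final `unstack` splits the (seq, 2) head output into two 256-long logit rows, surfaced as `unstack:0` and `unstack:1` in the output dict above. Assuming those are the start and end logits respectively, SQuAD-style decoding picks the pair (i, j) with i <= j maximizing start[i] + end[j]; a minimal sketch (the `max_answer_len` cutoff is a conventional choice, not taken from this run):

```python
import numpy as np

def best_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end) token indices maximizing start_logits[i] + end_logits[j], i <= j."""
    best_score, best = -np.inf, (0, 0)
    for i, s in enumerate(start_logits):
        stop = min(i + max_answer_len, len(end_logits))
        for j in range(i, stop):
            if s + end_logits[j] > best_score:
                best_score, best = s + end_logits[j], (i, j)
    return best
```

The chosen token range is then mapped back through the tokenizer offsets to produce the answer text (here, "Paris").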
Generating '/tmp/nsys-report-b145.qdstrm'
[1/8] [0%                          ] nsys-report-048e.nsys-rep
[1/8] [========================100%] nsys-report-048e.nsys-rep
[2/8] [0%                          ] nsys-report-b910.sqlite
[2/8] [========================100%] nsys-report-b910.sqlite
[3/8] Executing 'nvtx_sum' stats report
[4/8] Executing 'osrt_sum' stats report
Time (%) Total Time (ns) Num Calls Avg (ns) Med (ns) Min (ns) Max (ns) StdDev (ns) Name
-------- --------------- --------- ------------- ------------- ----------- ----------- ------------ ----------------------
53.8 5,534,228,333 66 83,851,944.4 100,143,002.5 1,170 545,269,062 71,605,661.3 poll
43.7 4,500,777,207 9 500,086,356.3 500,086,682.0 500,079,031 500,089,732 3,160.8 pthread_cond_timedwait
1.6 169,470,404 5,645 30,021.3 800.0 290 156,067,077 2,077,185.0 read
0.6 65,433,966 3,057 21,404.6 7,290.0 210 10,657,324 253,752.7 ioctl
0.1 9,565,910 3,192 2,996.8 2,730.0 1,150 37,190 1,560.3 open64
0.0 5,062,319 1 5,062,319.0 5,062,319.0 5,062,319 5,062,319 0.0 nanosleep
0.0 3,515,399 133,713 26.3 20.0 20 7,690 37.9 pthread_cond_signal
0.0 3,019,548 138 21,880.8 5,050.0 2,120 1,585,212 135,490.4 mmap64
0.0 888,370 10 88,837.0 61,496.0 16,131 321,794 89,799.7 sem_timedwait
0.0 875,984 13 67,383.4 60,021.0 54,961 81,122 11,142.0 sleep
0.0 507,661 583 870.8 50.0 20 57,101 5,351.7 fgets
0.0 344,517 32 10,766.2 5,985.0 430 48,080 13,305.5 write
0.0 339,116 8 42,389.5 38,491.0 23,730 62,011 14,666.4 pthread_create
0.0 303,824 27 11,252.7 7,160.0 1,910 78,201 14,616.3 mmap
0.0 211,907 44 4,816.1 2,895.0 1,130 23,071 4,821.3 fopen
0.0 187,553 9 20,839.2 4,420.0 2,370 83,491 31,534.9 munmap
0.0 167,402 173 967.6 820.0 500 3,971 515.7 pread64
0.0 124,571 1 124,571.0 124,571.0 124,571 124,571 0.0 pthread_cond_wait
0.0 100,471 1 100,471.0 100,471.0 100,471 100,471 0.0 waitpid
0.0 61,040 1,622 37.6 30.0 20 4,320 147.5 pthread_cond_broadcast
0.0 57,899 41 1,412.2 1,150.0 660 4,790 867.4 fclose
0.0 54,840 15 3,656.0 3,270.0 1,820 6,590 1,615.8 open
0.0 38,309 6 6,384.8 4,239.5 2,220 18,640 6,173.0 pipe2
0.0 32,631 2 16,315.5 16,315.5 9,130 23,501 10,161.8 connect
0.0 31,867 133 239.6 250.0 20 1,480 163.8 sigaction
0.0 29,977 1,211 24.8 20.0 20 151 6.3 flockfile
0.0 29,391 4 7,347.8 7,470.0 3,370 11,081 4,026.6 socket
0.0 22,437 68 330.0 300.0 180 1,160 173.5 fcntl
0.0 20,210 6 3,368.3 2,620.0 1,360 7,370 2,188.4 fopen64
0.0 16,430 192 85.6 100.0 20 550 66.3 pthread_mutex_trylock
0.0 15,540 3 5,180.0 5,620.0 1,600 8,320 3,381.5 fread
0.0 8,140 2 4,070.0 4,070.0 2,350 5,790 2,432.4 bind
0.0 3,480 2 1,740.0 1,740.0 800 2,680 1,329.4 fwrite
0.0 2,629 10 262.9 260.0 189 360 49.6 dup
0.0 2,602 30 86.7 30.0 20 900 182.5 fflush
0.0 2,250 2 1,125.0 1,125.0 660 1,590 657.6 dup2
0.0 769 1 769.0 769.0 769 769 0.0 getc
0.0 680 1 680.0 680.0 680 680 0.0 listen
[5/8] Executing 'cuda_api_sum' stats report
Time (%) Total Time (ns) Num Calls Avg (ns) Med (ns) Min (ns) Max (ns) StdDev (ns) Name
-------- --------------- --------- --------- --------- -------- ---------- ----------- ---------------------------------
66.8 458,889,319 1,804 254,373.2 53,460.5 2,211 2,177,000 394,198.6 cudaMemcpyAsync
16.4 112,515,093 1,804 62,369.8 11,100.0 650 257,654 79,474.5 cudaStreamSynchronize
10.9 75,217,217 707 106,389.3 7,460.0 2,850 16,497,323 927,843.2 cudaLaunchKernel
1.5 10,141,562 98 103,485.3 91,441.5 5,390 327,454 88,149.1 cuCtxSynchronize
1.4 9,551,675 2,624 3,640.1 3,085.0 490 20,001 2,831.1 cudaDeviceSynchronize
1.0 6,839,815 2,624 2,606.6 1,560.0 1,190 32,571 2,225.5 cudaEventRecord
0.9 6,327,816 26 243,377.5 715.0 290 6,308,675 1,237,082.8 cudaStreamIsCapturing_v10000
0.4 2,729,205 23 118,661.1 126,411.0 73,641 167,492 30,706.1 cudaMalloc
0.3 1,776,952 2,624 677.2 600.0 240 18,670 548.3 cudaEventCreateWithFlags
0.2 1,274,525 98 13,005.4 12,935.0 7,760 27,621 1,979.2 cuLaunchKernel
0.1 922,031 2,624 351.4 300.0 180 7,720 263.5 cudaEventDestroy
0.1 361,385 5 72,277.0 70,091.0 56,771 89,731 12,660.4 cuModuleLoadData
0.0 326,636 1,149 284.3 200.0 50 7,880 367.3 cuGetProcAddress_v2
0.0 262,753 50 5,255.1 5,465.0 3,130 9,450 1,868.7 cudaMemsetAsync
0.0 171,663 1 171,663.0 171,663.0 171,663 171,663 0.0 cudaGetDeviceProperties_v2_v12000
0.0 3,930 3 1,310.0 1,300.0 510 2,120 805.0 cuInit
0.0 3,530 1 3,530.0 3,530.0 3,530 3,530 0.0 cuMemFree_v2
0.0 950 3 316.7 240.0 60 650 302.4 cuModuleGetLoadingMode
0.0 840 1 840.0 840.0 840 840 0.0 cuCtxSetCurrent
[6/8] Executing 'cuda_gpu_kern_sum' stats report
Time (%) Total Time (ns) Instances Avg (ns) Med (ns) Min (ns) Max (ns) StdDev (ns) Name
-------- --------------- --------- -------- -------- -------- -------- ----------- ----------------------------------------------------------------------------------------------------
84.2 9,532,306 97 98,271.2 84,480.0 11,072 322,784 89,061.0 cutlass_tensorop_s1688tf32gemm_256x128_16x3_tt_align4
3.1 345,470 125 2,763.8 2,368.0 1,343 6,016 1,403.3 void at::native::elementwise_kernel<(int)128, (int)2, void at::native::gpu_kernel_impl<at::native::…
2.8 315,425 121 2,606.8 2,304.0 1,280 4,288 724.6 void at::native::elementwise_kernel<(int)128, (int)2, void at::native::gpu_kernel_impl<at::native::…
2.0 225,953 75 3,012.7 2,368.0 1,600 4,993 1,136.4 void at::native::elementwise_kernel<(int)128, (int)2, void at::native::gpu_kernel_impl<at::native::…
1.9 217,087 123 1,764.9 1,280.0 800 3,136 758.5 void at::native::vectorized_elementwise_kernel<(int)4, at::native::FillFunctor<float>, at::detail::…
1.6 182,945 50 3,658.9 3,712.0 3,488 4,000 149.1 void at::native::reduce_kernel<(int)512, (int)1, at::native::ReduceOp<float, at::native::MeanOps<fl…
0.9 104,833 12 8,736.1 8,688.0 8,608 9,248 171.0 void at::native::elementwise_kernel<(int)128, (int)2, void at::native::gpu_kernel_impl<at::native::…
0.9 103,266 64 1,613.5 960.0 864 4,544 1,348.0 void at::native::vectorized_elementwise_kernel<(int)4, at::native::CUDAFunctor_add<float>, at::deta…
0.9 96,288 37 2,602.4 1,824.0 1,728 4,384 1,188.8 void at::native::vectorized_elementwise_kernel<(int)4, at::native::BinaryFunctor<float, float, floa…
0.6 71,392 12 5,949.3 5,952.0 5,856 6,048 66.1 void <unnamed>::softmax_warp_forward<float, float, float, (int)8, (bool)0, (bool)0>(T2 *, const T1 …
0.4 45,536 12 3,794.7 3,792.0 3,712 3,872 53.6 void at::native::vectorized_elementwise_kernel<(int)4, at::native::tanh_kernel_cuda(at::TensorItera…
0.2 25,922 25 1,036.9 1,024.0 1,024 1,057 16.1 void at::native::vectorized_elementwise_kernel<(int)4, at::native::reciprocal_kernel_cuda(at::Tenso…
0.2 25,664 25 1,026.6 1,024.0 992 1,056 12.8 void at::native::vectorized_elementwise_kernel<(int)4, at::native::sqrt_kernel_cuda(at::TensorItera…
0.2 22,911 25 916.4 928.0 895 928 15.7 void at::native::vectorized_elementwise_kernel<(int)4, at::native::AUnaryFunctor<float, float, floa…
0.1 5,760 1 5,760.0 5,760.0 5,760 5,760 0.0 cutlass_tensorop_s1688tf32gemm_256x128_16x3_tt_align2
0.0 1,600 1 1,600.0 1,600.0 1,600 1,600 0.0 void at::native::<unnamed>::CatArrayBatchedCopy_aligned16_contig<int, unsigned int, (int)1, (int)12…
[7/8] Executing 'cuda_gpu_mem_time_sum' stats report
Time (%) Total Time (ns) Count Avg (ns) Med (ns) Min (ns) Max (ns) StdDev (ns) Operation
-------- --------------- ----- --------- --------- -------- --------- ----------- ----------------------------
56.6 187,729,069 1,157 162,255.0 119,617.0 287 2,133,603 254,865.0 [CUDA memcpy Host-to-Device]
43.4 143,824,334 647 222,294.2 117,216.0 1,056 1,011,362 282,104.0 [CUDA memcpy Device-to-Host]
0.0 24,355 50 487.1 320.0 288 1,088 294.8 [CUDA memset]
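A quick sanity check on the numbers above: total GPU kernel time (summing the Total Time column of the `cuda_gpu_kern_sum` report) is tiny compared with total memcpy time. A minimal sketch, with the values copied by hand from the two tables:

```python
# Compare total GPU kernel time against total CUDA memcpy time,
# using values from the cuda_gpu_kern_sum and cuda_gpu_mem_time_sum
# tables above (all times in nanoseconds).
kernel_total_ns = sum([
    9_532_306, 345_470, 315_425, 225_953, 217_087, 182_945,
    104_833, 103_266, 96_288, 71_392, 45_536, 25_922,
    25_664, 22_911, 5_760, 1_600,
])
memcpy_total_ns = 187_729_069 + 143_824_334  # H2D + D2H

print(f"kernels: {kernel_total_ns / 1e6:.1f} ms")   # ~11.3 ms
print(f"memcpy:  {memcpy_total_ns / 1e6:.1f} ms")   # ~331.6 ms
print(f"transfers take ~{memcpy_total_ns / kernel_total_ns:.0f}x longer")
```

In other words, this run is dominated by host-device traffic, not compute, which is consistent with `cudaMemcpyAsync` and `cudaStreamSynchronize` topping the `cuda_api_sum` report (83.2% of API time between them).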
[8/8] Executing 'cuda_gpu_mem_size_sum' stats report
Total (MB) Count Avg (MB) Med (MB) Min (MB) Max (MB) StdDev (MB) Operation
---------- ----- -------- -------- -------- -------- ----------- ----------------------------
1,224.510 1,157 1.058 0.786 0.000 9.437 1.648 [CUDA memcpy Host-to-Device]
707.510 647 1.094 0.786 0.000 3.146 1.206 [CUDA memcpy Device-to-Host]
0.000 50 0.000 0.000 0.000 0.000 0.000 [CUDA memset]
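Combining the memcpy time and size tables gives the effective transfer bandwidth achieved during the run (an average over all transfers, including many sub-MB copies, so well below the link's peak; this assumes nsys reports decimal MB):

```python
# Effective transfer bandwidth from the cuda_gpu_mem_time_sum and
# cuda_gpu_mem_size_sum tables above.
def bandwidth_gb_s(total_mb: float, total_ns: float) -> float:
    # MB -> bytes, then bytes/ns == GB/s (both are factors of 1e9 vs. seconds)
    return (total_mb * 1e6) / total_ns

h2d = bandwidth_gb_s(1_224.510, 187_729_069)
d2h = bandwidth_gb_s(707.510, 143_824_334)
print(f"H2D: {h2d:.1f} GB/s, D2H: {d2h:.1f} GB/s")  # ~6.5 and ~4.9 GB/s
```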
Generated:
/tmp/nsys-report-048e.nsys-rep
/tmp/nsys-report-b910.sqlite
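The exported `.sqlite` file can be re-queried later without re-profiling: `nsys stats -r <report>` regenerates a single stats report (the report names are the ones shown in steps [3/8] through [8/8] above). A small sketch that builds such a command; the helper name is hypothetical:

```python
# Build an `nsys stats` invocation that regenerates one stats report
# from the exported SQLite file, e.g. the cuda_api_sum table above.
import shlex

def nsys_stats_cmd(sqlite_path: str, report: str) -> list:
    # Report names match the steps in the log: nvtx_sum, osrt_sum,
    # cuda_api_sum, cuda_gpu_kern_sum, cuda_gpu_mem_time_sum, ...
    return ["nsys", "stats", "-r", report, sqlite_path]

cmd = nsys_stats_cmd("/tmp/nsys-report-b910.sqlite", "cuda_api_sum")
print(shlex.join(cmd))
# -> nsys stats -r cuda_api_sum /tmp/nsys-report-b910.sqlite
```

Passing the list to `subprocess.run` would execute it; the `.nsys-rep` file can likewise be opened in the Nsight Systems GUI for a timeline view.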