Model Input Name: unique_ids_raw_output___9:0, Shape: [0]
Model Input Name: segment_ids:0, Shape: [0, 256]
Model Input Name: input_mask:0, Shape: [0, 256]
Model Input Name: input_ids:0, Shape: [0, 256]
Starting model execution...

Input Details:
Input Name: input_ids:0
Shape: (1, 256)
Data (first 10 values): [ 101 2054 2003 1996 3007 1997 2605 1029  102 1996]...
--------------------------------------------------
Input Name: segment_ids:0
Shape: (1, 256)
Data (first 10 values): [0 0 0 0 0 0 0 0 0 1]...
--------------------------------------------------
Input Name: input_mask:0
Shape: (1, 256)
Data (first 10 values): [1 1 1 1 1 1 1 1 1 1]...
--------------------------------------------------
Input Name: unique_ids_raw_output___9:0
Shape: (1,)
Data (first 10 values): [0]...
--------------------------------------------------
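The four tensors above follow the usual BERT-SQuAD feature layout (token ids, segment ids, attention mask, unique example id) padded to the 256-token maximum sequence length; the first token ids decode to "[CLS] what is the capital of france ? [SEP] the ...", and the 0 in the declared shapes is presumably a dynamic batch dimension. A minimal sketch of how such a feed could be built and run with onnxruntime is shown below; the model file name, dtypes, and the padding beyond the first 10 values are assumptions, not taken from this log.

import numpy as np
import onnxruntime as ort

# First 10 token ids printed above; the rest of the 256-token sequence is not shown in the log.
ids = [101, 2054, 2003, 1996, 3007, 1997, 2605, 1029, 102, 1996]
input_ids = np.zeros((1, 256), dtype=np.int64)    # int64 is an assumption; the model may expect int32
input_ids[0, :len(ids)] = ids
input_mask = np.zeros((1, 256), dtype=np.int64)
input_mask[0, :len(ids)] = 1                       # 1 for real tokens, 0 for padding
segment_ids = np.zeros((1, 256), dtype=np.int64)
segment_ids[0, 9:len(ids)] = 1                     # question tokens = 0, context tokens = 1

sess = ort.InferenceSession("bertsquad.onnx")      # hypothetical file name
outputs = sess.run(None, {
    "input_ids:0": input_ids,
    "input_mask:0": input_mask,
    "segment_ids:0": segment_ids,
    "unique_ids_raw_output___9:0": np.array([0], dtype=np.int64),
})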
Node: unique_ids_graph_outputs_Identity__10, Execution Time: 0.000497 seconds

Node: bert/encoder/Shape, Execution Time: 0.000030 seconds

Node: bert/encoder/Shape__12, Execution Time: 0.000043 seconds

Node: bert/encoder/strided_slice, Execution Time: 0.000166 seconds

Node: bert/encoder/strided_slice__16, Execution Time: 0.000030 seconds

Node: bert/encoder/strided_slice__17, Execution Time: 0.000020 seconds

Node: bert/encoder/ones/packed_Unsqueeze__18, Execution Time: 0.000035 seconds

Node: bert/encoder/ones/packed_Concat__21, Execution Time: 0.004864 seconds

Node: bert/encoder/ones__22, Execution Time: 0.000045 seconds

Node: bert/encoder/ones, Execution Time: 0.000072 seconds

Node: bert/encoder/Reshape, Execution Time: 0.000041 seconds

Node: bert/encoder/Cast, Execution Time: 0.000020 seconds

Node: bert/encoder/mul, Execution Time: 0.007905 seconds

Node: bert/encoder/layer_9/attention/self/ExpandDims, Execution Time: 0.000021 seconds

Node: bert/encoder/layer_9/attention/self/sub, Execution Time: 0.006667 seconds

Node: bert/encoder/layer_9/attention/self/mul_1, Execution Time: 0.000229 seconds
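The bert/encoder/Reshape -> Cast -> mul chain and the layer_9/attention/self ExpandDims -> sub -> mul_1 nodes above appear to build BERT's additive attention mask: padding positions become large negative biases that are later added to the attention scores before each Softmax. A plausible numpy equivalent (a sketch, not the executor's code):

import numpy as np

def additive_attention_mask(input_mask):
    # input_mask: (batch, seq_len), 1 for real tokens, 0 for padding
    mask = input_mask[:, None, :].astype(np.float32)   # ExpandDims -> (batch, 1, seq_len)
    return (1.0 - mask) * -10000.0                     # sub, then mul_1 by -10000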

Node: bert/embeddings/Reshape_2, Execution Time: 0.000020 seconds

Node: bert/embeddings/Reshape, Execution Time: 0.000004 seconds

Node: bert/embeddings/GatherV2, Execution Time: 0.000160 seconds

Node: bert/embeddings/Reshape_1, Execution Time: 0.000020 seconds

Node: bert/embeddings/one_hot, Execution Time: 0.000218 seconds

Input size: (None, 256, 2, 768)
No Add node related to MatMul output: bert/embeddings/MatMul. Executing regular MatMul.
MatMul Node: bert/embeddings/MatMul, Execution Time: 0.025803 seconds
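This first MatMul multiplies the one-hot segment ids (depth 2) by the token-type embedding table, which is why the logged input size is (None, 256, 2, 768). The "No Add node related to MatMul output" message suggests the executor checks whether the MatMul result feeds a bias Add so the pair can be executed as a fused matmul-plus-bias; finding none here, it falls back to a plain matrix multiply. A rough sketch of that dispatch, assuming ONNX-style node objects and numpy tensors (helper names are illustrative, not the executor's actual code):

import numpy as np

def run_matmul(node, graph, tensors):
    # Look for a downstream Add that consumes this MatMul's output (candidate bias add).
    consumers = [n for n in graph.node if node.output[0] in n.input]
    bias_add = next((n for n in consumers if n.op_type == "Add"), None)
    a, b = tensors[node.input[0]], tensors[node.input[1]]
    if bias_add is None:
        print(f"No Add node related to MatMul output: {node.name}. Executing regular MatMul.")
        tensors[node.output[0]] = a @ b
    else:
        bias = tensors[bias_add.input[1]]
        tensors[bias_add.output[0]] = a @ b + bias     # fused MatMul + bias add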

Node: bert/embeddings/Reshape_3, Execution Time: 0.000024 seconds

Add Node: bert/embeddings/add, Execution Time: 0.000617 seconds

Add Node: bert/embeddings/add_1, Execution Time: 0.000539 seconds

Node: bert/embeddings/LayerNorm/moments/mean, Execution Time: 0.005122 seconds

Node: bert/embeddings/LayerNorm/moments/SquaredDifference, Execution Time: 0.000512 seconds

Node: bert/embeddings/LayerNorm/moments/SquaredDifference__72, Execution Time: 0.000581 seconds

Node: bert/embeddings/LayerNorm/moments/variance, Execution Time: 0.000065 seconds

Add Node: bert/embeddings/LayerNorm/batchnorm/add, Execution Time: 0.000063 seconds

Node: bert/embeddings/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.010223 seconds

Node: bert/embeddings/LayerNorm/batchnorm/Rsqrt__74, Execution Time: 0.005414 seconds

Node: bert/embeddings/LayerNorm/batchnorm/mul, Execution Time: 0.000056 seconds

Node: bert/embeddings/LayerNorm/batchnorm/mul_2, Execution Time: 0.000059 seconds

Node: bert/embeddings/LayerNorm/batchnorm/sub, Execution Time: 0.000057 seconds

Node: bert/embeddings/LayerNorm/batchnorm/mul_1, Execution Time: 0.000468 seconds

Add Node: bert/embeddings/LayerNorm/batchnorm/add_1, Execution Time: 0.000573 seconds
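The LayerNorm/moments/* and LayerNorm/batchnorm/* nodes above are the unfused layer normalization that tf2onnx typically emits: mean and variance over the hidden axis, an epsilon add, Rsqrt, then the gamma scale and beta shift. Numerically they amount to the following sketch (epsilon value and parameter names are assumptions):

import numpy as np

def layer_norm(x, gamma, beta, eps=1e-12):
    # x: (batch, seq_len, hidden); normalize over the last axis
    mean = x.mean(axis=-1, keepdims=True)                  # moments/mean
    var = ((x - mean) ** 2).mean(axis=-1, keepdims=True)   # SquaredDifference, moments/variance
    scale = gamma / np.sqrt(var + eps)                     # batchnorm/add, Rsqrt, mul
    return x * scale + (beta - mean * scale)               # mul_1, mul_2, sub, add_1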

Node: bert/encoder/Reshape_1, Execution Time: 0.000024 seconds

Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_0/attention/self/value/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_0/attention/self/value/MatMul, Execution Time: 0.001978 seconds

Add Node: bert/encoder/layer_0/attention/self/value/BiasAdd, Execution Time: 0.000459 seconds

Node: bert/encoder/layer_0/attention/self/Reshape_2, Execution Time: 0.000020 seconds

Node: bert/encoder/layer_0/attention/self/transpose_2, Execution Time: 0.000455 seconds

Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_0/attention/self/query/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_0/attention/self/query/MatMul, Execution Time: 0.000855 seconds

Add Node: bert/encoder/layer_0/attention/self/query/BiasAdd, Execution Time: 0.000456 seconds

Node: bert/encoder/layer_0/attention/self/Reshape, Execution Time: 0.000010 seconds

Node: bert/encoder/layer_0/attention/self/transpose, Execution Time: 0.000475 seconds

Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_0/attention/self/key/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_0/attention/self/key/MatMul, Execution Time: 0.000611 seconds

Add Node: bert/encoder/layer_0/attention/self/key/BiasAdd, Execution Time: 0.000486 seconds

Node: bert/encoder/layer_0/attention/self/Reshape_1, Execution Time: 0.000009 seconds

Node: bert/encoder/layer_0/attention/self/MatMul__306, Execution Time: 0.000471 seconds

Input size: (12, 256, 64, 256)
No Add node related to MatMul output: bert/encoder/layer_0/attention/self/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_0/attention/self/MatMul, Execution Time: 0.001572 seconds

Node: bert/encoder/layer_0/attention/self/Mul, Execution Time: 0.001380 seconds

Add Node: bert/encoder/layer_0/attention/self/add, Execution Time: 0.001374 seconds

Node: bert/encoder/layer_0/attention/self/Softmax, Execution Time: 0.009023 seconds

Input size: (12, 256, 256, 64)
No Add node related to MatMul output: bert/encoder/layer_0/attention/self/MatMul_1. Executing regular MatMul.
MatMul Node: bert/encoder/layer_0/attention/self/MatMul_1, Execution Time: 0.000642 seconds

Node: bert/encoder/layer_0/attention/self/transpose_3, Execution Time: 0.000459 seconds

Node: bert/encoder/layer_0/attention/self/Reshape_3, Execution Time: 0.000065 seconds
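Within each layer, the MatMul -> Mul -> add -> Softmax -> MatMul_1 chain is scaled dot-product attention over 12 heads of size 64, which matches the logged per-head shapes (12, 256, 64, 256) and (12, 256, 256, 64): scores are Q·K^T scaled by 1/sqrt(64), the additive mask from the earlier sketch is added, a Softmax is taken, and the result is multiplied by V. Roughly (a sketch with assumed array layouts):

import numpy as np

def self_attention(q, k, v, mask_bias):
    # q, k, v: (heads, seq_len, head_dim) = (12, 256, 64); mask_bias: additive attention mask
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])   # MatMul, then Mul by 1/sqrt(64)
    scores = scores + mask_bias                                # add
    probs = np.exp(scores - scores.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)                 # Softmax
    return probs @ v                                           # MatMul_1 -> (12, 256, 64)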

Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_0/attention/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_0/attention/output/dense/MatMul, Execution Time: 0.000608 seconds

Add Node: bert/encoder/layer_0/attention/output/dense/BiasAdd, Execution Time: 0.000476 seconds

Add Node: bert/encoder/layer_0/attention/output/add, Execution Time: 0.000619 seconds

Node: bert/encoder/layer_0/attention/output/LayerNorm/moments/mean, Execution Time: 0.000072 seconds

Node: bert/encoder/layer_0/attention/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000467 seconds

Node: bert/encoder/layer_0/attention/output/LayerNorm/moments/SquaredDifference__309, Execution Time: 0.000468 seconds

Node: bert/encoder/layer_0/attention/output/LayerNorm/moments/variance, Execution Time: 0.000065 seconds

Add Node: bert/encoder/layer_0/attention/output/LayerNorm/batchnorm/add, Execution Time: 0.000053 seconds

Node: bert/encoder/layer_0/attention/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000051 seconds

Node: bert/encoder/layer_0/attention/output/LayerNorm/batchnorm/Rsqrt__311, Execution Time: 0.000068 seconds

Node: bert/encoder/layer_0/attention/output/LayerNorm/batchnorm/mul, Execution Time: 0.000054 seconds

Node: bert/encoder/layer_0/attention/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000054 seconds

Node: bert/encoder/layer_0/attention/output/LayerNorm/batchnorm/sub, Execution Time: 0.000052 seconds

Node: bert/encoder/layer_0/attention/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000454 seconds

Add Node: bert/encoder/layer_0/attention/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000539 seconds

Input size: (None, 256, 768, 3072)
No Add node related to MatMul output: bert/encoder/layer_0/intermediate/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_0/intermediate/dense/MatMul, Execution Time: 0.000634 seconds

Add Node: bert/encoder/layer_0/intermediate/dense/BiasAdd, Execution Time: 0.001340 seconds

Node: bert/encoder/layer_0/intermediate/dense/Pow, Execution Time: 0.018156 seconds

Node: bert/encoder/layer_0/intermediate/dense/mul, Execution Time: 0.001935 seconds

Add Node: bert/encoder/layer_0/intermediate/dense/add, Execution Time: 0.001330 seconds

Node: bert/encoder/layer_0/intermediate/dense/mul_1, Execution Time: 0.001392 seconds

Node: bert/encoder/layer_0/intermediate/dense/Tanh, Execution Time: 0.003783 seconds

Add Node: bert/encoder/layer_0/intermediate/dense/add_1, Execution Time: 0.001652 seconds

Node: bert/encoder/layer_0/intermediate/dense/mul_2, Execution Time: 0.001321 seconds

Node: bert/encoder/layer_0/intermediate/dense/mul_3, Execution Time: 0.001385 seconds
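The Pow -> mul -> add -> mul_1 -> Tanh -> add_1 -> mul_2 -> mul_3 chain in each intermediate/dense block is the tanh approximation of GELU used by BERT, 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3))). As a sketch:

import numpy as np

def gelu_tanh(x):
    # Matches the node chain: Pow(x, 3), mul, add, mul_1, Tanh, add_1, mul_2, mul_3
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))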

Input size: (None, 256, 3072, 768)
No Add node related to MatMul output: bert/encoder/layer_0/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_0/output/dense/MatMul, Execution Time: 0.000917 seconds

Add Node: bert/encoder/layer_0/output/dense/BiasAdd, Execution Time: 0.000492 seconds

Add Node: bert/encoder/layer_0/output/add, Execution Time: 0.000489 seconds

Node: bert/encoder/layer_0/output/LayerNorm/moments/mean, Execution Time: 0.000070 seconds

Node: bert/encoder/layer_0/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000472 seconds

Node: bert/encoder/layer_0/output/LayerNorm/moments/SquaredDifference__313, Execution Time: 0.000493 seconds

Node: bert/encoder/layer_0/output/LayerNorm/moments/variance, Execution Time: 0.000053 seconds

Add Node: bert/encoder/layer_0/output/LayerNorm/batchnorm/add, Execution Time: 0.000043 seconds

Node: bert/encoder/layer_0/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000047 seconds

Node: bert/encoder/layer_0/output/LayerNorm/batchnorm/Rsqrt__315, Execution Time: 0.000067 seconds

Node: bert/encoder/layer_0/output/LayerNorm/batchnorm/mul, Execution Time: 0.000056 seconds

Node: bert/encoder/layer_0/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000054 seconds

Node: bert/encoder/layer_0/output/LayerNorm/batchnorm/sub, Execution Time: 0.000055 seconds

Node: bert/encoder/layer_0/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000472 seconds

Add Node: bert/encoder/layer_0/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000493 seconds

Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_1/attention/self/value/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_1/attention/self/value/MatMul, Execution Time: 0.000622 seconds

Add Node: bert/encoder/layer_1/attention/self/value/BiasAdd, Execution Time: 0.000484 seconds

Node: bert/encoder/layer_1/attention/self/Reshape_2, Execution Time: 0.000020 seconds

Node: bert/encoder/layer_1/attention/self/transpose_2, Execution Time: 0.000481 seconds

Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_1/attention/self/query/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_1/attention/self/query/MatMul, Execution Time: 0.000583 seconds

Add Node: bert/encoder/layer_1/attention/self/query/BiasAdd, Execution Time: 0.000481 seconds

Node: bert/encoder/layer_1/attention/self/Reshape, Execution Time: 0.000010 seconds

Node: bert/encoder/layer_1/attention/self/transpose, Execution Time: 0.000438 seconds

Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_1/attention/self/key/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_1/attention/self/key/MatMul, Execution Time: 0.000589 seconds

Add Node: bert/encoder/layer_1/attention/self/key/BiasAdd, Execution Time: 0.000462 seconds

Node: bert/encoder/layer_1/attention/self/Reshape_1, Execution Time: 0.000010 seconds

Node: bert/encoder/layer_1/attention/self/MatMul__320, Execution Time: 0.000445 seconds

Input size: (12, 256, 64, 256)
No Add node related to MatMul output: bert/encoder/layer_1/attention/self/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_1/attention/self/MatMul, Execution Time: 0.000498 seconds

Node: bert/encoder/layer_1/attention/self/Mul, Execution Time: 0.001336 seconds

Add Node: bert/encoder/layer_1/attention/self/add, Execution Time: 0.001386 seconds

Node: bert/encoder/layer_1/attention/self/Softmax, Execution Time: 0.001339 seconds

Input size: (12, 256, 256, 64)
No Add node related to MatMul output: bert/encoder/layer_1/attention/self/MatMul_1. Executing regular MatMul.
MatMul Node: bert/encoder/layer_1/attention/self/MatMul_1, Execution Time: 0.000655 seconds

Node: bert/encoder/layer_1/attention/self/transpose_3, Execution Time: 0.000478 seconds

Node: bert/encoder/layer_1/attention/self/Reshape_3, Execution Time: 0.000052 seconds

Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_1/attention/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_1/attention/output/dense/MatMul, Execution Time: 0.000575 seconds

Add Node: bert/encoder/layer_1/attention/output/dense/BiasAdd, Execution Time: 0.000460 seconds

Add Node: bert/encoder/layer_1/attention/output/add, Execution Time: 0.000628 seconds

Node: bert/encoder/layer_1/attention/output/LayerNorm/moments/mean, Execution Time: 0.000069 seconds

Node: bert/encoder/layer_1/attention/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000452 seconds

Node: bert/encoder/layer_1/attention/output/LayerNorm/moments/SquaredDifference__323, Execution Time: 0.000468 seconds

Node: bert/encoder/layer_1/attention/output/LayerNorm/moments/variance, Execution Time: 0.000052 seconds

Add Node: bert/encoder/layer_1/attention/output/LayerNorm/batchnorm/add, Execution Time: 0.000041 seconds

Node: bert/encoder/layer_1/attention/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000052 seconds

Node: bert/encoder/layer_1/attention/output/LayerNorm/batchnorm/Rsqrt__325, Execution Time: 0.000072 seconds

Node: bert/encoder/layer_1/attention/output/LayerNorm/batchnorm/mul, Execution Time: 0.000057 seconds

Node: bert/encoder/layer_1/attention/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000046 seconds

Node: bert/encoder/layer_1/attention/output/LayerNorm/batchnorm/sub, Execution Time: 0.000053 seconds

Node: bert/encoder/layer_1/attention/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000453 seconds

Add Node: bert/encoder/layer_1/attention/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000458 seconds

Input size: (None, 256, 768, 3072)
No Add node related to MatMul output: bert/encoder/layer_1/intermediate/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_1/intermediate/dense/MatMul, Execution Time: 0.000684 seconds

Add Node: bert/encoder/layer_1/intermediate/dense/BiasAdd, Execution Time: 0.001391 seconds

Node: bert/encoder/layer_1/intermediate/dense/Pow, Execution Time: 0.001334 seconds

Node: bert/encoder/layer_1/intermediate/dense/mul, Execution Time: 0.001634 seconds

Add Node: bert/encoder/layer_1/intermediate/dense/add, Execution Time: 0.001318 seconds

Node: bert/encoder/layer_1/intermediate/dense/mul_1, Execution Time: 0.001405 seconds

Node: bert/encoder/layer_1/intermediate/dense/Tanh, Execution Time: 0.001327 seconds

Add Node: bert/encoder/layer_1/intermediate/dense/add_1, Execution Time: 0.001342 seconds

Node: bert/encoder/layer_1/intermediate/dense/mul_2, Execution Time: 0.001412 seconds

Node: bert/encoder/layer_1/intermediate/dense/mul_3, Execution Time: 0.001328 seconds

Input size: (None, 256, 3072, 768)
No Add node related to MatMul output: bert/encoder/layer_1/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_1/output/dense/MatMul, Execution Time: 0.000919 seconds

Add Node: bert/encoder/layer_1/output/dense/BiasAdd, Execution Time: 0.000513 seconds

Add Node: bert/encoder/layer_1/output/add, Execution Time: 0.000639 seconds

Node: bert/encoder/layer_1/output/LayerNorm/moments/mean, Execution Time: 0.000080 seconds

Node: bert/encoder/layer_1/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000468 seconds

Node: bert/encoder/layer_1/output/LayerNorm/moments/SquaredDifference__327, Execution Time: 0.000491 seconds

Node: bert/encoder/layer_1/output/LayerNorm/moments/variance, Execution Time: 0.000054 seconds

Add Node: bert/encoder/layer_1/output/LayerNorm/batchnorm/add, Execution Time: 0.000051 seconds

Node: bert/encoder/layer_1/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000051 seconds

Node: bert/encoder/layer_1/output/LayerNorm/batchnorm/Rsqrt__329, Execution Time: 0.000069 seconds

Node: bert/encoder/layer_1/output/LayerNorm/batchnorm/mul, Execution Time: 0.000053 seconds

Node: bert/encoder/layer_1/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000041 seconds

Node: bert/encoder/layer_1/output/LayerNorm/batchnorm/sub, Execution Time: 0.000053 seconds

Node: bert/encoder/layer_1/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000468 seconds

Add Node: bert/encoder/layer_1/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000599 seconds

Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_2/attention/self/value/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_2/attention/self/value/MatMul, Execution Time: 0.000905 seconds

Add Node: bert/encoder/layer_2/attention/self/value/BiasAdd, Execution Time: 0.000607 seconds

Node: bert/encoder/layer_2/attention/self/Reshape_2, Execution Time: 0.000028 seconds

Node: bert/encoder/layer_2/attention/self/transpose_2, Execution Time: 0.000581 seconds

Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_2/attention/self/query/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_2/attention/self/query/MatMul, Execution Time: 0.000616 seconds

Add Node: bert/encoder/layer_2/attention/self/query/BiasAdd, Execution Time: 0.000477 seconds

Node: bert/encoder/layer_2/attention/self/Reshape, Execution Time: 0.000011 seconds

Node: bert/encoder/layer_2/attention/self/transpose, Execution Time: 0.000478 seconds

Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_2/attention/self/key/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_2/attention/self/key/MatMul, Execution Time: 0.000656 seconds

Add Node: bert/encoder/layer_2/attention/self/key/BiasAdd, Execution Time: 0.000499 seconds

Node: bert/encoder/layer_2/attention/self/Reshape_1, Execution Time: 0.000010 seconds

Node: bert/encoder/layer_2/attention/self/MatMul__334, Execution Time: 0.000461 seconds

Input size: (12, 256, 64, 256)
No Add node related to MatMul output: bert/encoder/layer_2/attention/self/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_2/attention/self/MatMul, Execution Time: 0.000500 seconds

Node: bert/encoder/layer_2/attention/self/Mul, Execution Time: 0.001413 seconds

Add Node: bert/encoder/layer_2/attention/self/add, Execution Time: 0.002262 seconds

Node: bert/encoder/layer_2/attention/self/Softmax, Execution Time: 0.001362 seconds

Input size: (12, 256, 256, 64)
No Add node related to MatMul output: bert/encoder/layer_2/attention/self/MatMul_1. Executing regular MatMul.
MatMul Node: bert/encoder/layer_2/attention/self/MatMul_1, Execution Time: 0.000561 seconds

Node: bert/encoder/layer_2/attention/self/transpose_3, Execution Time: 0.000498 seconds

Node: bert/encoder/layer_2/attention/self/Reshape_3, Execution Time: 0.000050 seconds

Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_2/attention/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_2/attention/output/dense/MatMul, Execution Time: 0.000587 seconds

Add Node: bert/encoder/layer_2/attention/output/dense/BiasAdd, Execution Time: 0.000457 seconds

Add Node: bert/encoder/layer_2/attention/output/add, Execution Time: 0.000584 seconds

Node: bert/encoder/layer_2/attention/output/LayerNorm/moments/mean, Execution Time: 0.000088 seconds

Node: bert/encoder/layer_2/attention/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000456 seconds

Node: bert/encoder/layer_2/attention/output/LayerNorm/moments/SquaredDifference__337, Execution Time: 0.000495 seconds

Node: bert/encoder/layer_2/attention/output/LayerNorm/moments/variance, Execution Time: 0.000054 seconds

Add Node: bert/encoder/layer_2/attention/output/LayerNorm/batchnorm/add, Execution Time: 0.000051 seconds

Node: bert/encoder/layer_2/attention/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000051 seconds

Node: bert/encoder/layer_2/attention/output/LayerNorm/batchnorm/Rsqrt__339, Execution Time: 0.000074 seconds

Node: bert/encoder/layer_2/attention/output/LayerNorm/batchnorm/mul, Execution Time: 0.000056 seconds

Node: bert/encoder/layer_2/attention/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000052 seconds

Node: bert/encoder/layer_2/attention/output/LayerNorm/batchnorm/sub, Execution Time: 0.000058 seconds

Node: bert/encoder/layer_2/attention/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000442 seconds

Add Node: bert/encoder/layer_2/attention/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000456 seconds

Input size: (None, 256, 768, 3072)
No Add node related to MatMul output: bert/encoder/layer_2/intermediate/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_2/intermediate/dense/MatMul, Execution Time: 0.000642 seconds

Add Node: bert/encoder/layer_2/intermediate/dense/BiasAdd, Execution Time: 0.001408 seconds

Node: bert/encoder/layer_2/intermediate/dense/Pow, Execution Time: 0.001425 seconds

Node: bert/encoder/layer_2/intermediate/dense/mul, Execution Time: 0.001326 seconds

Add Node: bert/encoder/layer_2/intermediate/dense/add, Execution Time: 0.001330 seconds

Node: bert/encoder/layer_2/intermediate/dense/mul_1, Execution Time: 0.001393 seconds

Node: bert/encoder/layer_2/intermediate/dense/Tanh, Execution Time: 0.001312 seconds

Add Node: bert/encoder/layer_2/intermediate/dense/add_1, Execution Time: 0.001741 seconds

Node: bert/encoder/layer_2/intermediate/dense/mul_2, Execution Time: 0.001384 seconds

Node: bert/encoder/layer_2/intermediate/dense/mul_3, Execution Time: 0.001297 seconds

Input size: (None, 256, 3072, 768)
No Add node related to MatMul output: bert/encoder/layer_2/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_2/output/dense/MatMul, Execution Time: 0.000920 seconds

Add Node: bert/encoder/layer_2/output/dense/BiasAdd, Execution Time: 0.000510 seconds

Add Node: bert/encoder/layer_2/output/add, Execution Time: 0.000488 seconds

Node: bert/encoder/layer_2/output/LayerNorm/moments/mean, Execution Time: 0.000071 seconds

Node: bert/encoder/layer_2/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000541 seconds

Node: bert/encoder/layer_2/output/LayerNorm/moments/SquaredDifference__341, Execution Time: 0.000462 seconds

Node: bert/encoder/layer_2/output/LayerNorm/moments/variance, Execution Time: 0.000053 seconds

Add Node: bert/encoder/layer_2/output/LayerNorm/batchnorm/add, Execution Time: 0.000051 seconds

Node: bert/encoder/layer_2/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000047 seconds

Node: bert/encoder/layer_2/output/LayerNorm/batchnorm/Rsqrt__343, Execution Time: 0.000073 seconds

Node: bert/encoder/layer_2/output/LayerNorm/batchnorm/mul, Execution Time: 0.000054 seconds

Node: bert/encoder/layer_2/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000052 seconds

Node: bert/encoder/layer_2/output/LayerNorm/batchnorm/sub, Execution Time: 0.000053 seconds

Node: bert/encoder/layer_2/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000454 seconds

Add Node: bert/encoder/layer_2/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000455 seconds

Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_3/attention/self/value/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_3/attention/self/value/MatMul, Execution Time: 0.000614 seconds

Add Node: bert/encoder/layer_3/attention/self/value/BiasAdd, Execution Time: 0.000466 seconds

Node: bert/encoder/layer_3/attention/self/Reshape_2, Execution Time: 0.000020 seconds

Node: bert/encoder/layer_3/attention/self/transpose_2, Execution Time: 0.000468 seconds

Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_3/attention/self/query/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_3/attention/self/query/MatMul, Execution Time: 0.000611 seconds

Add Node: bert/encoder/layer_3/attention/self/query/BiasAdd, Execution Time: 0.000453 seconds

Node: bert/encoder/layer_3/attention/self/Reshape, Execution Time: 0.000010 seconds

Node: bert/encoder/layer_3/attention/self/transpose, Execution Time: 0.000478 seconds

Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_3/attention/self/key/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_3/attention/self/key/MatMul, Execution Time: 0.000578 seconds

Add Node: bert/encoder/layer_3/attention/self/key/BiasAdd, Execution Time: 0.000452 seconds

Node: bert/encoder/layer_3/attention/self/Reshape_1, Execution Time: 0.000009 seconds

Node: bert/encoder/layer_3/attention/self/MatMul__348, Execution Time: 0.000477 seconds

Input size: (12, 256, 64, 256)
No Add node related to MatMul output: bert/encoder/layer_3/attention/self/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_3/attention/self/MatMul, Execution Time: 0.001466 seconds

Node: bert/encoder/layer_3/attention/self/Mul, Execution Time: 0.001347 seconds

Add Node: bert/encoder/layer_3/attention/self/add, Execution Time: 0.001328 seconds

Node: bert/encoder/layer_3/attention/self/Softmax, Execution Time: 0.001364 seconds

Input size: (12, 256, 256, 64)
No Add node related to MatMul output: bert/encoder/layer_3/attention/self/MatMul_1. Executing regular MatMul.
MatMul Node: bert/encoder/layer_3/attention/self/MatMul_1, Execution Time: 0.000567 seconds

Node: bert/encoder/layer_3/attention/self/transpose_3, Execution Time: 0.000470 seconds

Node: bert/encoder/layer_3/attention/self/Reshape_3, Execution Time: 0.000048 seconds

Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_3/attention/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_3/attention/output/dense/MatMul, Execution Time: 0.000573 seconds

Add Node: bert/encoder/layer_3/attention/output/dense/BiasAdd, Execution Time: 0.000461 seconds

Add Node: bert/encoder/layer_3/attention/output/add, Execution Time: 0.000479 seconds

Node: bert/encoder/layer_3/attention/output/LayerNorm/moments/mean, Execution Time: 0.000068 seconds

Node: bert/encoder/layer_3/attention/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000468 seconds

Node: bert/encoder/layer_3/attention/output/LayerNorm/moments/SquaredDifference__351, Execution Time: 0.000559 seconds

Node: bert/encoder/layer_3/attention/output/LayerNorm/moments/variance, Execution Time: 0.000053 seconds

Add Node: bert/encoder/layer_3/attention/output/LayerNorm/batchnorm/add, Execution Time: 0.000054 seconds

Node: bert/encoder/layer_3/attention/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000050 seconds

Node: bert/encoder/layer_3/attention/output/LayerNorm/batchnorm/Rsqrt__353, Execution Time: 0.000068 seconds

Node: bert/encoder/layer_3/attention/output/LayerNorm/batchnorm/mul, Execution Time: 0.000056 seconds

Node: bert/encoder/layer_3/attention/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000042 seconds

Node: bert/encoder/layer_3/attention/output/LayerNorm/batchnorm/sub, Execution Time: 0.000053 seconds

Node: bert/encoder/layer_3/attention/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000459 seconds

Add Node: bert/encoder/layer_3/attention/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000474 seconds

Input size: (None, 256, 768, 3072)
No Add node related to MatMul output: bert/encoder/layer_3/intermediate/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_3/intermediate/dense/MatMul, Execution Time: 0.000606 seconds

Add Node: bert/encoder/layer_3/intermediate/dense/BiasAdd, Execution Time: 0.001397 seconds

Node: bert/encoder/layer_3/intermediate/dense/Pow, Execution Time: 0.001356 seconds

Node: bert/encoder/layer_3/intermediate/dense/mul, Execution Time: 0.001531 seconds

Add Node: bert/encoder/layer_3/intermediate/dense/add, Execution Time: 0.001359 seconds

Node: bert/encoder/layer_3/intermediate/dense/mul_1, Execution Time: 0.001323 seconds

Node: bert/encoder/layer_3/intermediate/dense/Tanh, Execution Time: 0.001316 seconds

Add Node: bert/encoder/layer_3/intermediate/dense/add_1, Execution Time: 0.001360 seconds

Node: bert/encoder/layer_3/intermediate/dense/mul_2, Execution Time: 0.001329 seconds

Node: bert/encoder/layer_3/intermediate/dense/mul_3, Execution Time: 0.001352 seconds

Input size: (None, 256, 3072, 768)
No Add node related to MatMul output: bert/encoder/layer_3/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_3/output/dense/MatMul, Execution Time: 0.000910 seconds

Add Node: bert/encoder/layer_3/output/dense/BiasAdd, Execution Time: 0.000477 seconds

Add Node: bert/encoder/layer_3/output/add, Execution Time: 0.000456 seconds

Node: bert/encoder/layer_3/output/LayerNorm/moments/mean, Execution Time: 0.000070 seconds

Node: bert/encoder/layer_3/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000571 seconds

Node: bert/encoder/layer_3/output/LayerNorm/moments/SquaredDifference__355, Execution Time: 0.000565 seconds

Node: bert/encoder/layer_3/output/LayerNorm/moments/variance, Execution Time: 0.000060 seconds

Add Node: bert/encoder/layer_3/output/LayerNorm/batchnorm/add, Execution Time: 0.000056 seconds

Node: bert/encoder/layer_3/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000064 seconds

Node: bert/encoder/layer_3/output/LayerNorm/batchnorm/Rsqrt__357, Execution Time: 0.000086 seconds

Node: bert/encoder/layer_3/output/LayerNorm/batchnorm/mul, Execution Time: 0.000064 seconds

Node: bert/encoder/layer_3/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000057 seconds

Node: bert/encoder/layer_3/output/LayerNorm/batchnorm/sub, Execution Time: 0.000059 seconds

Node: bert/encoder/layer_3/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000572 seconds

Add Node: bert/encoder/layer_3/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000580 seconds

Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_4/attention/self/value/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_4/attention/self/value/MatMul, Execution Time: 0.000795 seconds

Add Node: bert/encoder/layer_4/attention/self/value/BiasAdd, Execution Time: 0.000488 seconds

Node: bert/encoder/layer_4/attention/self/Reshape_2, Execution Time: 0.000020 seconds

Node: bert/encoder/layer_4/attention/self/transpose_2, Execution Time: 0.000460 seconds

Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_4/attention/self/query/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_4/attention/self/query/MatMul, Execution Time: 0.000605 seconds

Add Node: bert/encoder/layer_4/attention/self/query/BiasAdd, Execution Time: 0.000484 seconds

Node: bert/encoder/layer_4/attention/self/Reshape, Execution Time: 0.000010 seconds

Node: bert/encoder/layer_4/attention/self/transpose, Execution Time: 0.000438 seconds

Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_4/attention/self/key/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_4/attention/self/key/MatMul, Execution Time: 0.000582 seconds

Add Node: bert/encoder/layer_4/attention/self/key/BiasAdd, Execution Time: 0.000486 seconds

Node: bert/encoder/layer_4/attention/self/Reshape_1, Execution Time: 0.000009 seconds

Node: bert/encoder/layer_4/attention/self/MatMul__362, Execution Time: 0.000439 seconds

Input size: (12, 256, 64, 256)
No Add node related to MatMul output: bert/encoder/layer_4/attention/self/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_4/attention/self/MatMul, Execution Time: 0.000488 seconds

Node: bert/encoder/layer_4/attention/self/Mul, Execution Time: 0.001312 seconds

Add Node: bert/encoder/layer_4/attention/self/add, Execution Time: 0.001385 seconds

Node: bert/encoder/layer_4/attention/self/Softmax, Execution Time: 0.001311 seconds

Input size: (12, 256, 256, 64)
No Add node related to MatMul output: bert/encoder/layer_4/attention/self/MatMul_1. Executing regular MatMul.
MatMul Node: bert/encoder/layer_4/attention/self/MatMul_1, Execution Time: 0.000636 seconds

Node: bert/encoder/layer_4/attention/self/transpose_3, Execution Time: 0.000449 seconds

Node: bert/encoder/layer_4/attention/self/Reshape_3, Execution Time: 0.000038 seconds

Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_4/attention/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_4/attention/output/dense/MatMul, Execution Time: 0.000573 seconds

Add Node: bert/encoder/layer_4/attention/output/dense/BiasAdd, Execution Time: 0.000459 seconds

Add Node: bert/encoder/layer_4/attention/output/add, Execution Time: 0.000449 seconds

Node: bert/encoder/layer_4/attention/output/LayerNorm/moments/mean, Execution Time: 0.000083 seconds

Node: bert/encoder/layer_4/attention/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000516 seconds

Node: bert/encoder/layer_4/attention/output/LayerNorm/moments/SquaredDifference__365, Execution Time: 0.000445 seconds

Node: bert/encoder/layer_4/attention/output/LayerNorm/moments/variance, Execution Time: 0.000059 seconds

Add Node: bert/encoder/layer_4/attention/output/LayerNorm/batchnorm/add, Execution Time: 0.000056 seconds

Node: bert/encoder/layer_4/attention/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000049 seconds

Node: bert/encoder/layer_4/attention/output/LayerNorm/batchnorm/Rsqrt__367, Execution Time: 0.000067 seconds

Node: bert/encoder/layer_4/attention/output/LayerNorm/batchnorm/mul, Execution Time: 0.000056 seconds

Node: bert/encoder/layer_4/attention/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000059 seconds

Node: bert/encoder/layer_4/attention/output/LayerNorm/batchnorm/sub, Execution Time: 0.000057 seconds

Node: bert/encoder/layer_4/attention/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000445 seconds

Add Node: bert/encoder/layer_4/attention/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000447 seconds

Input size: (None, 256, 768, 3072)
No Add node related to MatMul output: bert/encoder/layer_4/intermediate/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_4/intermediate/dense/MatMul, Execution Time: 0.000721 seconds

Add Node: bert/encoder/layer_4/intermediate/dense/BiasAdd, Execution Time: 0.001380 seconds

Node: bert/encoder/layer_4/intermediate/dense/Pow, Execution Time: 0.001323 seconds

Node: bert/encoder/layer_4/intermediate/dense/mul, Execution Time: 0.001327 seconds

Add Node: bert/encoder/layer_4/intermediate/dense/add, Execution Time: 0.001417 seconds

Node: bert/encoder/layer_4/intermediate/dense/mul_1, Execution Time: 0.001328 seconds

Node: bert/encoder/layer_4/intermediate/dense/Tanh, Execution Time: 0.001388 seconds

Add Node: bert/encoder/layer_4/intermediate/dense/add_1, Execution Time: 0.001321 seconds

Node: bert/encoder/layer_4/intermediate/dense/mul_2, Execution Time: 0.001313 seconds

Node: bert/encoder/layer_4/intermediate/dense/mul_3, Execution Time: 0.001348 seconds

Input size: (None, 256, 3072, 768)
No Add node related to MatMul output: bert/encoder/layer_4/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_4/output/dense/MatMul, Execution Time: 0.000919 seconds

Add Node: bert/encoder/layer_4/output/dense/BiasAdd, Execution Time: 0.000462 seconds

Add Node: bert/encoder/layer_4/output/add, Execution Time: 0.000495 seconds

Node: bert/encoder/layer_4/output/LayerNorm/moments/mean, Execution Time: 0.000070 seconds

Node: bert/encoder/layer_4/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000446 seconds

Node: bert/encoder/layer_4/output/LayerNorm/moments/SquaredDifference__369, Execution Time: 0.000488 seconds

Node: bert/encoder/layer_4/output/LayerNorm/moments/variance, Execution Time: 0.000053 seconds

Add Node: bert/encoder/layer_4/output/LayerNorm/batchnorm/add, Execution Time: 0.000041 seconds

Node: bert/encoder/layer_4/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000046 seconds

Node: bert/encoder/layer_4/output/LayerNorm/batchnorm/Rsqrt__371, Execution Time: 0.000070 seconds

Node: bert/encoder/layer_4/output/LayerNorm/batchnorm/mul, Execution Time: 0.000061 seconds

Node: bert/encoder/layer_4/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000044 seconds

Node: bert/encoder/layer_4/output/LayerNorm/batchnorm/sub, Execution Time: 0.000043 seconds

Node: bert/encoder/layer_4/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000455 seconds

Add Node: bert/encoder/layer_4/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000448 seconds

Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_5/attention/self/value/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_5/attention/self/value/MatMul, Execution Time: 0.000642 seconds

Add Node: bert/encoder/layer_5/attention/self/value/BiasAdd, Execution Time: 0.000496 seconds

Node: bert/encoder/layer_5/attention/self/Reshape_2, Execution Time: 0.000020 seconds

Node: bert/encoder/layer_5/attention/self/transpose_2, Execution Time: 0.000448 seconds

Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_5/attention/self/query/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_5/attention/self/query/MatMul, Execution Time: 0.000588 seconds

Add Node: bert/encoder/layer_5/attention/self/query/BiasAdd, Execution Time: 0.000455 seconds

Node: bert/encoder/layer_5/attention/self/Reshape, Execution Time: 0.000014 seconds

Node: bert/encoder/layer_5/attention/self/transpose, Execution Time: 0.000442 seconds

Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_5/attention/self/key/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_5/attention/self/key/MatMul, Execution Time: 0.000567 seconds

Add Node: bert/encoder/layer_5/attention/self/key/BiasAdd, Execution Time: 0.000444 seconds

Node: bert/encoder/layer_5/attention/self/Reshape_1, Execution Time: 0.000013 seconds

Node: bert/encoder/layer_5/attention/self/MatMul__376, Execution Time: 0.000500 seconds

Input size: (12, 256, 64, 256)
No Add node related to MatMul output: bert/encoder/layer_5/attention/self/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_5/attention/self/MatMul, Execution Time: 0.000501 seconds

Node: bert/encoder/layer_5/attention/self/Mul, Execution Time: 0.001309 seconds

Add Node: bert/encoder/layer_5/attention/self/add, Execution Time: 0.001395 seconds

Node: bert/encoder/layer_5/attention/self/Softmax, Execution Time: 0.001304 seconds

Input size: (12, 256, 256, 64)
No Add node related to MatMul output: bert/encoder/layer_5/attention/self/MatMul_1. Executing regular MatMul.
MatMul Node: bert/encoder/layer_5/attention/self/MatMul_1, Execution Time: 0.000555 seconds

Node: bert/encoder/layer_5/attention/self/transpose_3, Execution Time: 0.000481 seconds

Node: bert/encoder/layer_5/attention/self/Reshape_3, Execution Time: 0.000047 seconds

Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_5/attention/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_5/attention/output/dense/MatMul, Execution Time: 0.000663 seconds

Add Node: bert/encoder/layer_5/attention/output/dense/BiasAdd, Execution Time: 0.000540 seconds

Add Node: bert/encoder/layer_5/attention/output/add, Execution Time: 0.000479 seconds

Node: bert/encoder/layer_5/attention/output/LayerNorm/moments/mean, Execution Time: 0.000067 seconds

Node: bert/encoder/layer_5/attention/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000482 seconds

Node: bert/encoder/layer_5/attention/output/LayerNorm/moments/SquaredDifference__379, Execution Time: 0.000475 seconds

Node: bert/encoder/layer_5/attention/output/LayerNorm/moments/variance, Execution Time: 0.000056 seconds

Add Node: bert/encoder/layer_5/attention/output/LayerNorm/batchnorm/add, Execution Time: 0.000054 seconds

Node: bert/encoder/layer_5/attention/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000049 seconds

Node: bert/encoder/layer_5/attention/output/LayerNorm/batchnorm/Rsqrt__381, Execution Time: 0.000068 seconds

Node: bert/encoder/layer_5/attention/output/LayerNorm/batchnorm/mul, Execution Time: 0.000054 seconds

Node: bert/encoder/layer_5/attention/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000041 seconds

Node: bert/encoder/layer_5/attention/output/LayerNorm/batchnorm/sub, Execution Time: 0.000045 seconds

Node: bert/encoder/layer_5/attention/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000464 seconds

Add Node: bert/encoder/layer_5/attention/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000575 seconds

Input size: (None, 256, 768, 3072)
No Add node related to MatMul output: bert/encoder/layer_5/intermediate/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_5/intermediate/dense/MatMul, Execution Time: 0.000763 seconds

Add Node: bert/encoder/layer_5/intermediate/dense/BiasAdd, Execution Time: 0.001429 seconds

Node: bert/encoder/layer_5/intermediate/dense/Pow, Execution Time: 0.001294 seconds

Node: bert/encoder/layer_5/intermediate/dense/mul, Execution Time: 0.001361 seconds

Add Node: bert/encoder/layer_5/intermediate/dense/add, Execution Time: 0.001307 seconds

Node: bert/encoder/layer_5/intermediate/dense/mul_1, Execution Time: 0.001307 seconds

Node: bert/encoder/layer_5/intermediate/dense/Tanh, Execution Time: 0.001370 seconds

Add Node: bert/encoder/layer_5/intermediate/dense/add_1, Execution Time: 0.001283 seconds

Node: bert/encoder/layer_5/intermediate/dense/mul_2, Execution Time: 0.001304 seconds

Node: bert/encoder/layer_5/intermediate/dense/mul_3, Execution Time: 0.001364 seconds

Input size: (None, 256, 3072, 768)
No Add node related to MatMul output: bert/encoder/layer_5/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_5/output/dense/MatMul, Execution Time: 0.001011 seconds

Add Node: bert/encoder/layer_5/output/dense/BiasAdd, Execution Time: 0.000497 seconds

Add Node: bert/encoder/layer_5/output/add, Execution Time: 0.000463 seconds

Node: bert/encoder/layer_5/output/LayerNorm/moments/mean, Execution Time: 0.000083 seconds

Node: bert/encoder/layer_5/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000456 seconds

Node: bert/encoder/layer_5/output/LayerNorm/moments/SquaredDifference__383, Execution Time: 0.000471 seconds

Node: bert/encoder/layer_5/output/LayerNorm/moments/variance, Execution Time: 0.000056 seconds

Add Node: bert/encoder/layer_5/output/LayerNorm/batchnorm/add, Execution Time: 0.000051 seconds

Node: bert/encoder/layer_5/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000049 seconds

Node: bert/encoder/layer_5/output/LayerNorm/batchnorm/Rsqrt__385, Execution Time: 0.000067 seconds

Node: bert/encoder/layer_5/output/LayerNorm/batchnorm/mul, Execution Time: 0.000056 seconds

Node: bert/encoder/layer_5/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000053 seconds

Node: bert/encoder/layer_5/output/LayerNorm/batchnorm/sub, Execution Time: 0.000053 seconds

Node: bert/encoder/layer_5/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000479 seconds

Add Node: bert/encoder/layer_5/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000451 seconds

Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_6/attention/self/value/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_6/attention/self/value/MatMul, Execution Time: 0.000685 seconds

Add Node: bert/encoder/layer_6/attention/self/value/BiasAdd, Execution Time: 0.000451 seconds

Node: bert/encoder/layer_6/attention/self/Reshape_2, Execution Time: 0.000020 seconds

Node: bert/encoder/layer_6/attention/self/transpose_2, Execution Time: 0.000459 seconds

Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_6/attention/self/query/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_6/attention/self/query/MatMul, Execution Time: 0.000654 seconds

Add Node: bert/encoder/layer_6/attention/self/query/BiasAdd, Execution Time: 0.000448 seconds

Node: bert/encoder/layer_6/attention/self/Reshape, Execution Time: 0.000009 seconds

Node: bert/encoder/layer_6/attention/self/transpose, Execution Time: 0.000467 seconds

Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_6/attention/self/key/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_6/attention/self/key/MatMul, Execution Time: 0.000576 seconds

Add Node: bert/encoder/layer_6/attention/self/key/BiasAdd, Execution Time: 0.000455 seconds

Node: bert/encoder/layer_6/attention/self/Reshape_1, Execution Time: 0.000010 seconds

Node: bert/encoder/layer_6/attention/self/MatMul__390, Execution Time: 0.000441 seconds

Input size: (12, 256, 64, 256)
No Add node related to MatMul output: bert/encoder/layer_6/attention/self/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_6/attention/self/MatMul, Execution Time: 0.000488 seconds

Node: bert/encoder/layer_6/attention/self/Mul, Execution Time: 0.001314 seconds

Add Node: bert/encoder/layer_6/attention/self/add, Execution Time: 0.001356 seconds

Node: bert/encoder/layer_6/attention/self/Softmax, Execution Time: 0.001345 seconds

Input size: (12, 256, 256, 64)
No Add node related to MatMul output: bert/encoder/layer_6/attention/self/MatMul_1. Executing regular MatMul.
MatMul Node: bert/encoder/layer_6/attention/self/MatMul_1, Execution Time: 0.000570 seconds

Node: bert/encoder/layer_6/attention/self/transpose_3, Execution Time: 0.000473 seconds

Node: bert/encoder/layer_6/attention/self/Reshape_3, Execution Time: 0.000037 seconds

Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_6/attention/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_6/attention/output/dense/MatMul, Execution Time: 0.000584 seconds

Add Node: bert/encoder/layer_6/attention/output/dense/BiasAdd, Execution Time: 0.000483 seconds

Add Node: bert/encoder/layer_6/attention/output/add, Execution Time: 0.000607 seconds

Node: bert/encoder/layer_6/attention/output/LayerNorm/moments/mean, Execution Time: 0.000073 seconds

Node: bert/encoder/layer_6/attention/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000443 seconds

Node: bert/encoder/layer_6/attention/output/LayerNorm/moments/SquaredDifference__393, Execution Time: 0.000462 seconds

Node: bert/encoder/layer_6/attention/output/LayerNorm/moments/variance, Execution Time: 0.000054 seconds

Add Node: bert/encoder/layer_6/attention/output/LayerNorm/batchnorm/add, Execution Time: 0.000042 seconds

Node: bert/encoder/layer_6/attention/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000046 seconds

Node: bert/encoder/layer_6/attention/output/LayerNorm/batchnorm/Rsqrt__395, Execution Time: 0.000072 seconds

Node: bert/encoder/layer_6/attention/output/LayerNorm/batchnorm/mul, Execution Time: 0.000055 seconds

Node: bert/encoder/layer_6/attention/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000041 seconds

Node: bert/encoder/layer_6/attention/output/LayerNorm/batchnorm/sub, Execution Time: 0.000053 seconds

Node: bert/encoder/layer_6/attention/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000473 seconds

Add Node: bert/encoder/layer_6/attention/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000446 seconds

Input size: (None, 256, 768, 3072)
No Add node related to MatMul output: bert/encoder/layer_6/intermediate/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_6/intermediate/dense/MatMul, Execution Time: 0.000619 seconds

Add Node: bert/encoder/layer_6/intermediate/dense/BiasAdd, Execution Time: 0.001369 seconds

Node: bert/encoder/layer_6/intermediate/dense/Pow, Execution Time: 0.001318 seconds

Node: bert/encoder/layer_6/intermediate/dense/mul, Execution Time: 0.001365 seconds

Add Node: bert/encoder/layer_6/intermediate/dense/add, Execution Time: 0.001338 seconds

Node: bert/encoder/layer_6/intermediate/dense/mul_1, Execution Time: 0.001392 seconds

Node: bert/encoder/layer_6/intermediate/dense/Tanh, Execution Time: 0.001564 seconds

Add Node: bert/encoder/layer_6/intermediate/dense/add_1, Execution Time: 0.001328 seconds

Node: bert/encoder/layer_6/intermediate/dense/mul_2, Execution Time: 0.001371 seconds

Node: bert/encoder/layer_6/intermediate/dense/mul_3, Execution Time: 0.001315 seconds

Input size: (None, 256, 3072, 768)
No Add node related to MatMul output: bert/encoder/layer_6/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_6/output/dense/MatMul, Execution Time: 0.000912 seconds

Add Node: bert/encoder/layer_6/output/dense/BiasAdd, Execution Time: 0.000472 seconds

Add Node: bert/encoder/layer_6/output/add, Execution Time: 0.000454 seconds

Node: bert/encoder/layer_6/output/LayerNorm/moments/mean, Execution Time: 0.000080 seconds

Node: bert/encoder/layer_6/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000532 seconds

Node: bert/encoder/layer_6/output/LayerNorm/moments/SquaredDifference__397, Execution Time: 0.000452 seconds

Node: bert/encoder/layer_6/output/LayerNorm/moments/variance, Execution Time: 0.000054 seconds

Add Node: bert/encoder/layer_6/output/LayerNorm/batchnorm/add, Execution Time: 0.000052 seconds

Node: bert/encoder/layer_6/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000052 seconds

Node: bert/encoder/layer_6/output/LayerNorm/batchnorm/Rsqrt__399, Execution Time: 0.000067 seconds

Node: bert/encoder/layer_6/output/LayerNorm/batchnorm/mul, Execution Time: 0.000053 seconds

Node: bert/encoder/layer_6/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000052 seconds

Node: bert/encoder/layer_6/output/LayerNorm/batchnorm/sub, Execution Time: 0.000053 seconds

Node: bert/encoder/layer_6/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000479 seconds

Add Node: bert/encoder/layer_6/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000470 seconds

Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_7/attention/self/value/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_7/attention/self/value/MatMul, Execution Time: 0.000731 seconds

Add Node: bert/encoder/layer_7/attention/self/value/BiasAdd, Execution Time: 0.000454 seconds

Node: bert/encoder/layer_7/attention/self/Reshape_2, Execution Time: 0.000020 seconds

Node: bert/encoder/layer_7/attention/self/transpose_2, Execution Time: 0.000461 seconds

Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_7/attention/self/query/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_7/attention/self/query/MatMul, Execution Time: 0.000590 seconds

Add Node: bert/encoder/layer_7/attention/self/query/BiasAdd, Execution Time: 0.000451 seconds

Node: bert/encoder/layer_7/attention/self/Reshape, Execution Time: 0.000009 seconds

Node: bert/encoder/layer_7/attention/self/transpose, Execution Time: 0.000524 seconds

Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_7/attention/self/key/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_7/attention/self/key/MatMul, Execution Time: 0.000639 seconds

Add Node: bert/encoder/layer_7/attention/self/key/BiasAdd, Execution Time: 0.000482 seconds

Node: bert/encoder/layer_7/attention/self/Reshape_1, Execution Time: 0.000009 seconds

Node: bert/encoder/layer_7/attention/self/MatMul__404, Execution Time: 0.000479 seconds

Input size: (12, 256, 64, 256)
No Add node related to MatMul output: bert/encoder/layer_7/attention/self/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_7/attention/self/MatMul, Execution Time: 0.000487 seconds

Node: bert/encoder/layer_7/attention/self/Mul, Execution Time: 0.001356 seconds

Add Node: bert/encoder/layer_7/attention/self/add, Execution Time: 0.001314 seconds

Node: bert/encoder/layer_7/attention/self/Softmax, Execution Time: 0.001310 seconds

Input size: (12, 256, 256, 64)
No Add node related to MatMul output: bert/encoder/layer_7/attention/self/MatMul_1. Executing regular MatMul.
MatMul Node: bert/encoder/layer_7/attention/self/MatMul_1, Execution Time: 0.000533 seconds

Node: bert/encoder/layer_7/attention/self/transpose_3, Execution Time: 0.000475 seconds

Node: bert/encoder/layer_7/attention/self/Reshape_3, Execution Time: 0.000043 seconds

Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_7/attention/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_7/attention/output/dense/MatMul, Execution Time: 0.000734 seconds

Add Node: bert/encoder/layer_7/attention/output/dense/BiasAdd, Execution Time: 0.000624 seconds

Add Node: bert/encoder/layer_7/attention/output/add, Execution Time: 0.000640 seconds

Node: bert/encoder/layer_7/attention/output/LayerNorm/moments/mean, Execution Time: 0.000101 seconds

Node: bert/encoder/layer_7/attention/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000620 seconds

Node: bert/encoder/layer_7/attention/output/LayerNorm/moments/SquaredDifference__407, Execution Time: 0.000822 seconds

Node: bert/encoder/layer_7/attention/output/LayerNorm/moments/variance, Execution Time: 0.000097 seconds

Add Node: bert/encoder/layer_7/attention/output/LayerNorm/batchnorm/add, Execution Time: 0.000078 seconds

Node: bert/encoder/layer_7/attention/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000085 seconds

Node: bert/encoder/layer_7/attention/output/LayerNorm/batchnorm/Rsqrt__409, Execution Time: 0.000116 seconds

Node: bert/encoder/layer_7/attention/output/LayerNorm/batchnorm/mul, Execution Time: 0.000092 seconds

Node: bert/encoder/layer_7/attention/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000081 seconds

Node: bert/encoder/layer_7/attention/output/LayerNorm/batchnorm/sub, Execution Time: 0.000086 seconds

Node: bert/encoder/layer_7/attention/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000847 seconds

Add Node: bert/encoder/layer_7/attention/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000706 seconds

Input size: (None, 256, 768, 3072)
No Add node related to MatMul output: bert/encoder/layer_7/intermediate/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_7/intermediate/dense/MatMul, Execution Time: 0.000950 seconds

Add Node: bert/encoder/layer_7/intermediate/dense/BiasAdd, Execution Time: 0.001974 seconds

Node: bert/encoder/layer_7/intermediate/dense/Pow, Execution Time: 0.001916 seconds

Node: bert/encoder/layer_7/intermediate/dense/mul, Execution Time: 0.002038 seconds

Add Node: bert/encoder/layer_7/intermediate/dense/add, Execution Time: 0.001887 seconds

Node: bert/encoder/layer_7/intermediate/dense/mul_1, Execution Time: 0.001875 seconds

Node: bert/encoder/layer_7/intermediate/dense/Tanh, Execution Time: 0.002064 seconds

Add Node: bert/encoder/layer_7/intermediate/dense/add_1, Execution Time: 0.001889 seconds

Node: bert/encoder/layer_7/intermediate/dense/mul_2, Execution Time: 0.001939 seconds

Node: bert/encoder/layer_7/intermediate/dense/mul_3, Execution Time: 0.001944 seconds

Input size: (None, 256, 3072, 768)
No Add node related to MatMul output: bert/encoder/layer_7/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_7/output/dense/MatMul, Execution Time: 0.001181 seconds

Add Node: bert/encoder/layer_7/output/dense/BiasAdd, Execution Time: 0.000527 seconds

Add Node: bert/encoder/layer_7/output/add, Execution Time: 0.000661 seconds

Node: bert/encoder/layer_7/output/LayerNorm/moments/mean, Execution Time: 0.000089 seconds

Node: bert/encoder/layer_7/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000520 seconds

Node: bert/encoder/layer_7/output/LayerNorm/moments/SquaredDifference__411, Execution Time: 0.000544 seconds

Node: bert/encoder/layer_7/output/LayerNorm/moments/variance, Execution Time: 0.000075 seconds

Add Node: bert/encoder/layer_7/output/LayerNorm/batchnorm/add, Execution Time: 0.000050 seconds

Node: bert/encoder/layer_7/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000056 seconds

Node: bert/encoder/layer_7/output/LayerNorm/batchnorm/Rsqrt__413, Execution Time: 0.000129 seconds

Node: bert/encoder/layer_7/output/LayerNorm/batchnorm/mul, Execution Time: 0.000044 seconds

Node: bert/encoder/layer_7/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000043 seconds

Node: bert/encoder/layer_7/output/LayerNorm/batchnorm/sub, Execution Time: 0.000058 seconds

Node: bert/encoder/layer_7/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000562 seconds

Add Node: bert/encoder/layer_7/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000626 seconds
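
Each LayerNorm runs in its decomposed moments/batchnorm form rather than as a single fused op: mean and SquaredDifference give the variance, an epsilon is added, Rsqrt produces the reciprocal standard deviation, and the remaining mul/mul_2/sub/mul_1/add_1 nodes apply gamma and beta. A NumPy sketch of that decomposition, assuming BERT's usual epsilon of 1e-12:

import numpy as np

def layer_norm(x, gamma, beta, eps=1e-12):
    mean = x.mean(axis=-1, keepdims=True)                  # moments/mean
    var = ((x - mean) ** 2).mean(axis=-1, keepdims=True)   # SquaredDifference + moments/variance
    rstd = 1.0 / np.sqrt(var + eps)                        # batchnorm/add + Rsqrt
    scale = gamma * rstd                                   # batchnorm/mul
    shift = beta - mean * scale                            # batchnorm/mul_2 + sub
    return x * scale + shift                               # batchnorm/mul_1 + add_1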

Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_8/attention/self/value/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_8/attention/self/value/MatMul, Execution Time: 0.000742 seconds

Add Node: bert/encoder/layer_8/attention/self/value/BiasAdd, Execution Time: 0.000571 seconds

Node: bert/encoder/layer_8/attention/self/Reshape_2, Execution Time: 0.000023 seconds

Node: bert/encoder/layer_8/attention/self/transpose_2, Execution Time: 0.000514 seconds

Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_8/attention/self/query/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_8/attention/self/query/MatMul, Execution Time: 0.000766 seconds

Add Node: bert/encoder/layer_8/attention/self/query/BiasAdd, Execution Time: 0.000573 seconds

Node: bert/encoder/layer_8/attention/self/Reshape, Execution Time: 0.000023 seconds

Node: bert/encoder/layer_8/attention/self/transpose, Execution Time: 0.000567 seconds

Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_8/attention/self/key/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_8/attention/self/key/MatMul, Execution Time: 0.000825 seconds

Add Node: bert/encoder/layer_8/attention/self/key/BiasAdd, Execution Time: 0.000538 seconds

Node: bert/encoder/layer_8/attention/self/Reshape_1, Execution Time: 0.000022 seconds

Node: bert/encoder/layer_8/attention/self/MatMul__418, Execution Time: 0.000509 seconds

Input size: (12, 256, 64, 256)
No Add node related to MatMul output: bert/encoder/layer_8/attention/self/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_8/attention/self/MatMul, Execution Time: 0.000715 seconds

Node: bert/encoder/layer_8/attention/self/Mul, Execution Time: 0.001661 seconds

Add Node: bert/encoder/layer_8/attention/self/add, Execution Time: 0.001515 seconds

Node: bert/encoder/layer_8/attention/self/Softmax, Execution Time: 0.001514 seconds

Input size: (12, 256, 256, 64)
No Add node related to MatMul output: bert/encoder/layer_8/attention/self/MatMul_1. Executing regular MatMul.
MatMul Node: bert/encoder/layer_8/attention/self/MatMul_1, Execution Time: 0.000726 seconds

Node: bert/encoder/layer_8/attention/self/transpose_3, Execution Time: 0.000521 seconds

Node: bert/encoder/layer_8/attention/self/Reshape_3, Execution Time: 0.000063 seconds
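
The self-attention block above follows the usual multi-head pattern: value/query/key projections (MatMul + BiasAdd), reshape/transpose into 12 heads of size 64 (hence the (12, 256, 64, 256) and (12, 256, 256, 64) MatMul shapes), scaled scores plus the additive attention mask, Softmax, and a final transpose/reshape back to (256, 768). A NumPy sketch of that flow; names and shapes are illustrative and dropout is omitted:

import numpy as np

def self_attention(x, Wq, bq, Wk, bk, Wv, bv, mask, num_heads=12, head_size=64):
    def split_heads(t):                            # Reshape + transpose
        return t.reshape(t.shape[0], num_heads, head_size).transpose(1, 0, 2)
    q = split_heads(x @ Wq + bq)                   # query MatMul + BiasAdd
    k = split_heads(x @ Wk + bk)                   # key MatMul + BiasAdd
    v = split_heads(x @ Wv + bv)                   # value MatMul + BiasAdd
    scores = q @ k.transpose(0, 2, 1)              # self/MatMul  -> (heads, seq, seq)
    scores = scores / np.sqrt(head_size) + mask    # Mul (scale) + add (attention mask)
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerically stable Softmax
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)
    ctx = probs @ v                                # MatMul_1     -> (heads, seq, head_size)
    return ctx.transpose(1, 0, 2).reshape(x.shape[0], num_heads * head_size)  # transpose_3 + Reshape_3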

Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_8/attention/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_8/attention/output/dense/MatMul, Execution Time: 0.000861 seconds

Add Node: bert/encoder/layer_8/attention/output/dense/BiasAdd, Execution Time: 0.000516 seconds

Add Node: bert/encoder/layer_8/attention/output/add, Execution Time: 0.000510 seconds

Node: bert/encoder/layer_8/attention/output/LayerNorm/moments/mean, Execution Time: 0.000094 seconds

Node: bert/encoder/layer_8/attention/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000504 seconds

Node: bert/encoder/layer_8/attention/output/LayerNorm/moments/SquaredDifference__421, Execution Time: 0.000531 seconds

Node: bert/encoder/layer_8/attention/output/LayerNorm/moments/variance, Execution Time: 0.000079 seconds

Add Node: bert/encoder/layer_8/attention/output/LayerNorm/batchnorm/add, Execution Time: 0.000049 seconds

Node: bert/encoder/layer_8/attention/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000052 seconds

Node: bert/encoder/layer_8/attention/output/LayerNorm/batchnorm/Rsqrt__423, Execution Time: 0.000087 seconds

Node: bert/encoder/layer_8/attention/output/LayerNorm/batchnorm/mul, Execution Time: 0.000065 seconds

Node: bert/encoder/layer_8/attention/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000063 seconds

Node: bert/encoder/layer_8/attention/output/LayerNorm/batchnorm/sub, Execution Time: 0.000084 seconds

Node: bert/encoder/layer_8/attention/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000503 seconds

Add Node: bert/encoder/layer_8/attention/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000522 seconds

Input size: (None, 256, 768, 3072)
No Add node related to MatMul output: bert/encoder/layer_8/intermediate/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_8/intermediate/dense/MatMul, Execution Time: 0.000727 seconds

Add Node: bert/encoder/layer_8/intermediate/dense/BiasAdd, Execution Time: 0.001507 seconds

Node: bert/encoder/layer_8/intermediate/dense/Pow, Execution Time: 0.001634 seconds

Node: bert/encoder/layer_8/intermediate/dense/mul, Execution Time: 0.001581 seconds

Add Node: bert/encoder/layer_8/intermediate/dense/add, Execution Time: 0.001411 seconds

Node: bert/encoder/layer_8/intermediate/dense/mul_1, Execution Time: 0.002158 seconds

Node: bert/encoder/layer_8/intermediate/dense/Tanh, Execution Time: 0.002181 seconds

Add Node: bert/encoder/layer_8/intermediate/dense/add_1, Execution Time: 0.002447 seconds

Node: bert/encoder/layer_8/intermediate/dense/mul_2, Execution Time: 0.001522 seconds

Node: bert/encoder/layer_8/intermediate/dense/mul_3, Execution Time: 0.001564 seconds

Input size: (None, 256, 3072, 768)
No Add node related to MatMul output: bert/encoder/layer_8/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_8/output/dense/MatMul, Execution Time: 0.001133 seconds

Add Node: bert/encoder/layer_8/output/dense/BiasAdd, Execution Time: 0.000553 seconds

Add Node: bert/encoder/layer_8/output/add, Execution Time: 0.000525 seconds

Node: bert/encoder/layer_8/output/LayerNorm/moments/mean, Execution Time: 0.000081 seconds

Node: bert/encoder/layer_8/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000554 seconds

Node: bert/encoder/layer_8/output/LayerNorm/moments/SquaredDifference__425, Execution Time: 0.000521 seconds

Node: bert/encoder/layer_8/output/LayerNorm/moments/variance, Execution Time: 0.000072 seconds

Add Node: bert/encoder/layer_8/output/LayerNorm/batchnorm/add, Execution Time: 0.000053 seconds

Node: bert/encoder/layer_8/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000072 seconds

Node: bert/encoder/layer_8/output/LayerNorm/batchnorm/Rsqrt__427, Execution Time: 0.000072 seconds

Node: bert/encoder/layer_8/output/LayerNorm/batchnorm/mul, Execution Time: 0.000059 seconds

Node: bert/encoder/layer_8/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000055 seconds

Node: bert/encoder/layer_8/output/LayerNorm/batchnorm/sub, Execution Time: 0.000055 seconds

Node: bert/encoder/layer_8/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000489 seconds

Add Node: bert/encoder/layer_8/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000502 seconds

Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_9/attention/self/value/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_9/attention/self/value/MatMul, Execution Time: 0.000749 seconds

Add Node: bert/encoder/layer_9/attention/self/value/BiasAdd, Execution Time: 0.000525 seconds

Node: bert/encoder/layer_9/attention/self/Reshape_2, Execution Time: 0.000023 seconds

Node: bert/encoder/layer_9/attention/self/transpose_2, Execution Time: 0.000478 seconds

Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_9/attention/self/query/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_9/attention/self/query/MatMul, Execution Time: 0.000729 seconds

Add Node: bert/encoder/layer_9/attention/self/query/BiasAdd, Execution Time: 0.000517 seconds

Node: bert/encoder/layer_9/attention/self/Reshape, Execution Time: 0.000029 seconds

Node: bert/encoder/layer_9/attention/self/transpose, Execution Time: 0.000518 seconds

Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_9/attention/self/key/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_9/attention/self/key/MatMul, Execution Time: 0.000738 seconds

Add Node: bert/encoder/layer_9/attention/self/key/BiasAdd, Execution Time: 0.000548 seconds

Node: bert/encoder/layer_9/attention/self/Reshape_1, Execution Time: 0.000026 seconds

Node: bert/encoder/layer_9/attention/self/MatMul__432, Execution Time: 0.000496 seconds

Input size: (12, 256, 64, 256)
No Add node related to MatMul output: bert/encoder/layer_9/attention/self/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_9/attention/self/MatMul, Execution Time: 0.000644 seconds

Node: bert/encoder/layer_9/attention/self/Mul, Execution Time: 0.001557 seconds

Add Node: bert/encoder/layer_9/attention/self/add, Execution Time: 0.001600 seconds

Node: bert/encoder/layer_9/attention/self/Softmax, Execution Time: 0.001492 seconds

Input size: (12, 256, 256, 64)
No Add node related to MatMul output: bert/encoder/layer_9/attention/self/MatMul_1. Executing regular MatMul.
MatMul Node: bert/encoder/layer_9/attention/self/MatMul_1, Execution Time: 0.000706 seconds

Node: bert/encoder/layer_9/attention/self/transpose_3, Execution Time: 0.000526 seconds

Node: bert/encoder/layer_9/attention/self/Reshape_3, Execution Time: 0.000126 seconds

Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_9/attention/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_9/attention/output/dense/MatMul, Execution Time: 0.000759 seconds

Add Node: bert/encoder/layer_9/attention/output/dense/BiasAdd, Execution Time: 0.000531 seconds

Add Node: bert/encoder/layer_9/attention/output/add, Execution Time: 0.000754 seconds

Node: bert/encoder/layer_9/attention/output/LayerNorm/moments/mean, Execution Time: 0.000087 seconds

Node: bert/encoder/layer_9/attention/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000511 seconds

Node: bert/encoder/layer_9/attention/output/LayerNorm/moments/SquaredDifference__435, Execution Time: 0.000521 seconds

Node: bert/encoder/layer_9/attention/output/LayerNorm/moments/variance, Execution Time: 0.000084 seconds

Add Node: bert/encoder/layer_9/attention/output/LayerNorm/batchnorm/add, Execution Time: 0.000048 seconds

Node: bert/encoder/layer_9/attention/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000050 seconds

Node: bert/encoder/layer_9/attention/output/LayerNorm/batchnorm/Rsqrt__437, Execution Time: 0.000089 seconds

Node: bert/encoder/layer_9/attention/output/LayerNorm/batchnorm/mul, Execution Time: 0.000059 seconds

Node: bert/encoder/layer_9/attention/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000052 seconds

Node: bert/encoder/layer_9/attention/output/LayerNorm/batchnorm/sub, Execution Time: 0.000056 seconds

Node: bert/encoder/layer_9/attention/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000505 seconds

Add Node: bert/encoder/layer_9/attention/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000526 seconds

Input size: (None, 256, 768, 3072)
No Add node related to MatMul output: bert/encoder/layer_9/intermediate/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_9/intermediate/dense/MatMul, Execution Time: 0.000951 seconds

Add Node: bert/encoder/layer_9/intermediate/dense/BiasAdd, Execution Time: 0.001550 seconds

Node: bert/encoder/layer_9/intermediate/dense/Pow, Execution Time: 0.001605 seconds

Node: bert/encoder/layer_9/intermediate/dense/mul, Execution Time: 0.001486 seconds

Add Node: bert/encoder/layer_9/intermediate/dense/add, Execution Time: 0.001552 seconds

Node: bert/encoder/layer_9/intermediate/dense/mul_1, Execution Time: 0.001474 seconds

Node: bert/encoder/layer_9/intermediate/dense/Tanh, Execution Time: 0.001496 seconds

Add Node: bert/encoder/layer_9/intermediate/dense/add_1, Execution Time: 0.001672 seconds

Node: bert/encoder/layer_9/intermediate/dense/mul_2, Execution Time: 0.001510 seconds

Node: bert/encoder/layer_9/intermediate/dense/mul_3, Execution Time: 0.001506 seconds

Input size: (None, 256, 3072, 768)
No Add node related to MatMul output: bert/encoder/layer_9/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_9/output/dense/MatMul, Execution Time: 0.000965 seconds

Add Node: bert/encoder/layer_9/output/dense/BiasAdd, Execution Time: 0.000566 seconds

Add Node: bert/encoder/layer_9/output/add, Execution Time: 0.000555 seconds

Node: bert/encoder/layer_9/output/LayerNorm/moments/mean, Execution Time: 0.000087 seconds

Node: bert/encoder/layer_9/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000504 seconds

Node: bert/encoder/layer_9/output/LayerNorm/moments/SquaredDifference__439, Execution Time: 0.000708 seconds

Node: bert/encoder/layer_9/output/LayerNorm/moments/variance, Execution Time: 0.000077 seconds

Add Node: bert/encoder/layer_9/output/LayerNorm/batchnorm/add, Execution Time: 0.000058 seconds

Node: bert/encoder/layer_9/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000055 seconds

Node: bert/encoder/layer_9/output/LayerNorm/batchnorm/Rsqrt__441, Execution Time: 0.000077 seconds

Node: bert/encoder/layer_9/output/LayerNorm/batchnorm/mul, Execution Time: 0.000058 seconds

Node: bert/encoder/layer_9/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000046 seconds

Node: bert/encoder/layer_9/output/LayerNorm/batchnorm/sub, Execution Time: 0.000047 seconds

Node: bert/encoder/layer_9/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000488 seconds

Add Node: bert/encoder/layer_9/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000522 seconds

Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_10/attention/self/value/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_10/attention/self/value/MatMul, Execution Time: 0.002145 seconds

Add Node: bert/encoder/layer_10/attention/self/value/BiasAdd, Execution Time: 0.000565 seconds

Node: bert/encoder/layer_10/attention/self/Reshape_2, Execution Time: 0.000023 seconds

Node: bert/encoder/layer_10/attention/self/transpose_2, Execution Time: 0.000578 seconds

Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_10/attention/self/query/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_10/attention/self/query/MatMul, Execution Time: 0.000732 seconds

Add Node: bert/encoder/layer_10/attention/self/query/BiasAdd, Execution Time: 0.000525 seconds

Node: bert/encoder/layer_10/attention/self/Reshape, Execution Time: 0.000022 seconds

Node: bert/encoder/layer_10/attention/self/transpose, Execution Time: 0.000506 seconds

Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_10/attention/self/key/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_10/attention/self/key/MatMul, Execution Time: 0.000711 seconds

Add Node: bert/encoder/layer_10/attention/self/key/BiasAdd, Execution Time: 0.000510 seconds

Node: bert/encoder/layer_10/attention/self/Reshape_1, Execution Time: 0.000021 seconds

Node: bert/encoder/layer_10/attention/self/MatMul__446, Execution Time: 0.000484 seconds

Input size: (12, 256, 64, 256)
No Add node related to MatMul output: bert/encoder/layer_10/attention/self/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_10/attention/self/MatMul, Execution Time: 0.000691 seconds

Node: bert/encoder/layer_10/attention/self/Mul, Execution Time: 0.001509 seconds

Add Node: bert/encoder/layer_10/attention/self/add, Execution Time: 0.001477 seconds

Node: bert/encoder/layer_10/attention/self/Softmax, Execution Time: 0.001505 seconds

Input size: (12, 256, 256, 64)
No Add node related to MatMul output: bert/encoder/layer_10/attention/self/MatMul_1. Executing regular MatMul.
MatMul Node: bert/encoder/layer_10/attention/self/MatMul_1, Execution Time: 0.000802 seconds

Node: bert/encoder/layer_10/attention/self/transpose_3, Execution Time: 0.000508 seconds

Node: bert/encoder/layer_10/attention/self/Reshape_3, Execution Time: 0.000071 seconds

Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_10/attention/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_10/attention/output/dense/MatMul, Execution Time: 0.001301 seconds

Add Node: bert/encoder/layer_10/attention/output/dense/BiasAdd, Execution Time: 0.000725 seconds

Add Node: bert/encoder/layer_10/attention/output/add, Execution Time: 0.000648 seconds

Node: bert/encoder/layer_10/attention/output/LayerNorm/moments/mean, Execution Time: 0.000116 seconds

Node: bert/encoder/layer_10/attention/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000646 seconds

Node: bert/encoder/layer_10/attention/output/LayerNorm/moments/SquaredDifference__449, Execution Time: 0.000779 seconds

Node: bert/encoder/layer_10/attention/output/LayerNorm/moments/variance, Execution Time: 0.000078 seconds

Add Node: bert/encoder/layer_10/attention/output/LayerNorm/batchnorm/add, Execution Time: 0.000062 seconds

Node: bert/encoder/layer_10/attention/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000050 seconds

Node: bert/encoder/layer_10/attention/output/LayerNorm/batchnorm/Rsqrt__451, Execution Time: 0.000078 seconds

Node: bert/encoder/layer_10/attention/output/LayerNorm/batchnorm/mul, Execution Time: 0.000068 seconds

Node: bert/encoder/layer_10/attention/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000069 seconds

Node: bert/encoder/layer_10/attention/output/LayerNorm/batchnorm/sub, Execution Time: 0.000056 seconds

Node: bert/encoder/layer_10/attention/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000544 seconds

Add Node: bert/encoder/layer_10/attention/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000511 seconds

Input size: (None, 256, 768, 3072)
No Add node related to MatMul output: bert/encoder/layer_10/intermediate/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_10/intermediate/dense/MatMul, Execution Time: 0.000758 seconds

Add Node: bert/encoder/layer_10/intermediate/dense/BiasAdd, Execution Time: 0.001694 seconds

Node: bert/encoder/layer_10/intermediate/dense/Pow, Execution Time: 0.001672 seconds

Node: bert/encoder/layer_10/intermediate/dense/mul, Execution Time: 0.001566 seconds

Add Node: bert/encoder/layer_10/intermediate/dense/add, Execution Time: 0.001636 seconds

Node: bert/encoder/layer_10/intermediate/dense/mul_1, Execution Time: 0.001593 seconds

Node: bert/encoder/layer_10/intermediate/dense/Tanh, Execution Time: 0.001675 seconds

Add Node: bert/encoder/layer_10/intermediate/dense/add_1, Execution Time: 0.001609 seconds

Node: bert/encoder/layer_10/intermediate/dense/mul_2, Execution Time: 0.001731 seconds

Node: bert/encoder/layer_10/intermediate/dense/mul_3, Execution Time: 0.001667 seconds

Input size: (None, 256, 3072, 768)
No Add node related to MatMul output: bert/encoder/layer_10/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_10/output/dense/MatMul, Execution Time: 0.001178 seconds

Add Node: bert/encoder/layer_10/output/dense/BiasAdd, Execution Time: 0.000525 seconds

Add Node: bert/encoder/layer_10/output/add, Execution Time: 0.000566 seconds

Node: bert/encoder/layer_10/output/LayerNorm/moments/mean, Execution Time: 0.000088 seconds

Node: bert/encoder/layer_10/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000522 seconds

Node: bert/encoder/layer_10/output/LayerNorm/moments/SquaredDifference__453, Execution Time: 0.000492 seconds

Node: bert/encoder/layer_10/output/LayerNorm/moments/variance, Execution Time: 0.000065 seconds

Add Node: bert/encoder/layer_10/output/LayerNorm/batchnorm/add, Execution Time: 0.000057 seconds

Node: bert/encoder/layer_10/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000059 seconds

Node: bert/encoder/layer_10/output/LayerNorm/batchnorm/Rsqrt__455, Execution Time: 0.000077 seconds

Node: bert/encoder/layer_10/output/LayerNorm/batchnorm/mul, Execution Time: 0.000053 seconds

Node: bert/encoder/layer_10/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000046 seconds

Node: bert/encoder/layer_10/output/LayerNorm/batchnorm/sub, Execution Time: 0.000058 seconds

Node: bert/encoder/layer_10/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000527 seconds

Add Node: bert/encoder/layer_10/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000467 seconds

Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_11/attention/self/value/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_11/attention/self/value/MatMul, Execution Time: 0.000770 seconds

Add Node: bert/encoder/layer_11/attention/self/value/BiasAdd, Execution Time: 0.000548 seconds

Node: bert/encoder/layer_11/attention/self/Reshape_2, Execution Time: 0.000024 seconds

Node: bert/encoder/layer_11/attention/self/transpose_2, Execution Time: 0.000496 seconds

Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_11/attention/self/query/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_11/attention/self/query/MatMul, Execution Time: 0.000994 seconds

Add Node: bert/encoder/layer_11/attention/self/query/BiasAdd, Execution Time: 0.000512 seconds

Node: bert/encoder/layer_11/attention/self/Reshape, Execution Time: 0.000021 seconds

Node: bert/encoder/layer_11/attention/self/transpose, Execution Time: 0.000501 seconds

Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_11/attention/self/key/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_11/attention/self/key/MatMul, Execution Time: 0.000724 seconds

Add Node: bert/encoder/layer_11/attention/self/key/BiasAdd, Execution Time: 0.000537 seconds

Node: bert/encoder/layer_11/attention/self/Reshape_1, Execution Time: 0.000020 seconds

Node: bert/encoder/layer_11/attention/self/MatMul__460, Execution Time: 0.000478 seconds

Input size: (12, 256, 64, 256)
No Add node related to MatMul output: bert/encoder/layer_11/attention/self/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_11/attention/self/MatMul, Execution Time: 0.000700 seconds

Node: bert/encoder/layer_11/attention/self/Mul, Execution Time: 0.001564 seconds

Add Node: bert/encoder/layer_11/attention/self/add, Execution Time: 0.001570 seconds

Node: bert/encoder/layer_11/attention/self/Softmax, Execution Time: 0.001483 seconds

Input size: (12, 256, 256, 64)
No Add node related to MatMul output: bert/encoder/layer_11/attention/self/MatMul_1. Executing regular MatMul.
MatMul Node: bert/encoder/layer_11/attention/self/MatMul_1, Execution Time: 0.000719 seconds

Node: bert/encoder/layer_11/attention/self/transpose_3, Execution Time: 0.000530 seconds

Node: bert/encoder/layer_11/attention/self/Reshape_3, Execution Time: 0.000068 seconds

Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_11/attention/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_11/attention/output/dense/MatMul, Execution Time: 0.000749 seconds

Add Node: bert/encoder/layer_11/attention/output/dense/BiasAdd, Execution Time: 0.000514 seconds

Add Node: bert/encoder/layer_11/attention/output/add, Execution Time: 0.000556 seconds

Node: bert/encoder/layer_11/attention/output/LayerNorm/moments/mean, Execution Time: 0.000100 seconds

Node: bert/encoder/layer_11/attention/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000525 seconds

Node: bert/encoder/layer_11/attention/output/LayerNorm/moments/SquaredDifference__463, Execution Time: 0.000520 seconds

Node: bert/encoder/layer_11/attention/output/LayerNorm/moments/variance, Execution Time: 0.000067 seconds

Add Node: bert/encoder/layer_11/attention/output/LayerNorm/batchnorm/add, Execution Time: 0.000055 seconds

Node: bert/encoder/layer_11/attention/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000048 seconds

Node: bert/encoder/layer_11/attention/output/LayerNorm/batchnorm/Rsqrt__465, Execution Time: 0.000086 seconds

Node: bert/encoder/layer_11/attention/output/LayerNorm/batchnorm/mul, Execution Time: 0.000049 seconds

Node: bert/encoder/layer_11/attention/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000046 seconds

Node: bert/encoder/layer_11/attention/output/LayerNorm/batchnorm/sub, Execution Time: 0.000064 seconds

Node: bert/encoder/layer_11/attention/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000474 seconds

Add Node: bert/encoder/layer_11/attention/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000592 seconds

Input size: (None, 256, 768, 3072)
No Add node related to MatMul output: bert/encoder/layer_11/intermediate/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_11/intermediate/dense/MatMul, Execution Time: 0.000806 seconds

Add Node: bert/encoder/layer_11/intermediate/dense/BiasAdd, Execution Time: 0.001625 seconds

Node: bert/encoder/layer_11/intermediate/dense/Pow, Execution Time: 0.001478 seconds

Node: bert/encoder/layer_11/intermediate/dense/mul, Execution Time: 0.001571 seconds

Add Node: bert/encoder/layer_11/intermediate/dense/add, Execution Time: 0.001557 seconds

Node: bert/encoder/layer_11/intermediate/dense/mul_1, Execution Time: 0.001958 seconds

Node: bert/encoder/layer_11/intermediate/dense/Tanh, Execution Time: 0.002749 seconds

Add Node: bert/encoder/layer_11/intermediate/dense/add_1, Execution Time: 0.001997 seconds

Node: bert/encoder/layer_11/intermediate/dense/mul_2, Execution Time: 0.001461 seconds

Node: bert/encoder/layer_11/intermediate/dense/mul_3, Execution Time: 0.001569 seconds

Input size: (None, 256, 3072, 768)
No Add node related to MatMul output: bert/encoder/layer_11/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_11/output/dense/MatMul, Execution Time: 0.000994 seconds

Add Node: bert/encoder/layer_11/output/dense/BiasAdd, Execution Time: 0.000538 seconds

Add Node: bert/encoder/layer_11/output/add, Execution Time: 0.000495 seconds

Node: bert/encoder/layer_11/output/LayerNorm/moments/mean, Execution Time: 0.000099 seconds

Node: bert/encoder/layer_11/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000514 seconds

Node: bert/encoder/layer_11/output/LayerNorm/moments/SquaredDifference__467, Execution Time: 0.000520 seconds

Node: bert/encoder/layer_11/output/LayerNorm/moments/variance, Execution Time: 0.000106 seconds

Add Node: bert/encoder/layer_11/output/LayerNorm/batchnorm/add, Execution Time: 0.000049 seconds

Node: bert/encoder/layer_11/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000053 seconds

Node: bert/encoder/layer_11/output/LayerNorm/batchnorm/Rsqrt__469, Execution Time: 0.000093 seconds

Node: bert/encoder/layer_11/output/LayerNorm/batchnorm/mul, Execution Time: 0.000062 seconds

Node: bert/encoder/layer_11/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000056 seconds

Node: bert/encoder/layer_11/output/LayerNorm/batchnorm/sub, Execution Time: 0.000059 seconds

Node: bert/encoder/layer_11/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000480 seconds

Add Node: bert/encoder/layer_11/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000476 seconds

Input size: (None, 256, 768, 2)
No Add node related to MatMul output: MatMul. Executing regular MatMul.
MatMul Node: MatMul, Execution Time: 0.002046 seconds

Add Node: BiasAdd, Execution Time: 0.000067 seconds

Node: Reshape_1, Execution Time: 0.000024 seconds

Node: transpose, Execution Time: 0.000048 seconds

Node: unstack, Execution Time: 0.000057 seconds

Node: unstack__490, Execution Time: 0.000021 seconds

Node: unstack__488, Execution Time: 0.000011 seconds


Node Execution Times:

Total Execution Time: 0.519998 seconds

Total Matmul + Add Execution Time: 0.233186 seconds
Execution complete.
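
The per-node lines and the two totals above could be produced by timing each node as it executes and accumulating the times per op type. A minimal sketch, assuming a run_node(node, feeds) callable and a node object with name and op_type attributes (both illustrative, not the runner's actual code):

import time
from collections import defaultdict

node_times = defaultdict(float)   # seconds accumulated per op type

def timed_run(node, feeds, run_node):
    t0 = time.perf_counter()
    out = run_node(node, feeds)    # hypothetical single-node executor
    dt = time.perf_counter() - t0
    node_times[node.op_type] += dt
    prefix = {"MatMul": "MatMul Node", "Add": "Add Node"}.get(node.op_type, "Node")
    print(f"{prefix}: {node.name}, Execution Time: {dt:.6f} seconds")
    return out

# After the run:
# total = sum(node_times.values())
# matmul_add = node_times["MatMul"] + node_times["Add"]
# print(f"Total Execution Time: {total:.6f} seconds")
# print(f"Total Matmul + Add Execution Time: {matmul_add:.6f} seconds")
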
Model outputs: {'unstack:1': array([[-4.9148726, -4.6251225, -4.132886 , -4.1499195, -4.7828836,
        -4.250844 , -4.77094  , -4.348463 , -2.7006364, -4.424177 ,
        -4.510866 , -4.39433  , -4.773833 , -4.480716 , -4.7714205,
        -4.6485815, -3.1330094, -4.7139587, -4.7148943, -4.7223635,
        -4.7008233, -4.6960616, -4.7121487, -4.708615 , -4.703374 ,
        -4.7024655, -4.687359 , -4.693113 , -4.698162 , -4.692563 ,
        -4.711712 , -4.7003703, -4.7027717, -4.7279253, -4.709934 ,
        -4.715551 , -4.7324576, -4.7294855, -4.7329216, -4.7218866,
        -4.7014203, -4.694692 , -4.6925716, -4.700892 , -4.7044754,
        -4.68252  , -4.679993 , -4.6824126, -4.6833754, -4.690988 ,
        -4.695919 , -4.6797957, -4.683871 , -4.6834297, -4.680781 ,
        -4.686977 , -4.681429 , -4.680897 , -4.694978 , -4.685382 ,
        -4.70324  , -4.7010674, -4.693331 , -4.7089696, -4.71908  ,
        -4.7188516, -4.70435  , -4.685466 , -4.6962924, -4.6972375,
        -4.691828 , -4.688009 , -4.691449 , -4.693622 , -4.6890097,
        -4.6876435, -4.684474 , -4.7056074, -4.6984677, -4.7068577,
        -4.689911 , -4.687499 , -4.6927333, -4.693831 , -4.6965637,
        -4.693646 , -4.693519 , -4.71067  , -4.722037 , -4.718479 ,
        -4.729904 , -4.721483 , -4.739112 , -4.7325935, -4.7295456,
        -4.712435 , -4.712704 , -4.7114053, -4.712399 , -4.704262 ,
        -4.6972833, -4.6926665, -4.717176 , -4.6937675, -4.694539 ,
        -4.711683 , -4.685275 , -4.6935816, -4.701117 , -4.6866083,
        -4.6843753, -4.6876745, -4.684178 , -4.694061 , -4.6890798,
        -4.6861553, -4.7003927, -4.7103863, -4.710601 , -4.7194986,
        -4.7016277, -4.718649 , -4.743214 , -4.7109504, -4.711556 ,
        -4.7007613, -4.7009783, -4.6995244, -4.7007017, -4.7026825,
        -4.706376 , -4.7061615, -4.7284904, -4.724841 , -4.7082043,
        -4.7080393, -4.7098503, -4.7207146, -4.733838 , -4.7125974,
        -4.7276387, -4.721991 , -4.7300687, -4.7229652, -4.7133346,
        -4.7109923, -4.71963  , -4.7312083, -4.733224 , -4.7362647,
        -4.739877 , -4.74243  , -4.727128 , -4.737834 , -4.74598  ,
        -4.738839 , -4.744508 , -4.728359 , -4.726734 , -4.7255516,
        -4.7363386, -4.73214  , -4.7196693, -4.721826 , -4.7047076,
        -4.7190104, -4.7156587, -4.706273 , -4.7116737, -4.701518 ,
        -4.6943965, -4.6903934, -4.6890545, -4.6862764, -4.6875463,
        -4.684304 , -4.688264 , -4.691186 , -4.7027955, -4.6910152,
        -4.6985803, -4.7152886, -4.723945 , -4.7293673, -4.7427354,
        -4.73977  , -4.7290154, -4.7378254, -4.7355986, -4.731869 ,
        -4.724579 , -4.7262163, -4.71887  , -4.7058587, -4.7122684,
        -4.7009015, -4.696829 , -4.7094407, -4.703914 , -4.703702 ,
        -4.7195215, -4.7118044, -4.709847 , -4.721358 , -4.723019 ,
        -4.71298  , -4.7218485, -4.724691 , -4.725982 , -4.726673 ,
        -4.7187834, -4.709004 , -4.7109466, -4.737439 , -4.7246385,
        -4.73252  , -4.7404885, -4.7261868, -4.734698 , -4.732445 ,
        -4.736647 , -4.724646 , -4.73208  , -4.7321663, -4.7037077,
        -4.718028 , -4.726786 , -4.7345347, -4.7328334, -4.7220054,
        -4.7327023, -4.7200413, -4.7459936, -4.728972 , -4.7290406,
        -4.7259574, -4.730495 , -4.723769 , -4.7380366, -4.7268267,
        -4.692981 , -4.718449 , -4.6935935, -4.6961823, -4.713647 ,
        -4.6950507, -4.700345 , -4.7232556, -4.708386 , -4.737004 ,
        -4.7273254, -4.716681 , -4.7106347, -4.714922 , -4.7030454,
        -4.7468524]], dtype=float32), 'unstack:0': array([[-5.339778 , -4.878685 , -4.312428 , -4.3309417, -5.125337 ,
        -4.442749 , -5.1271124, -4.5656004, -4.683339 , -4.6350813,
        -4.8042274, -4.6028423, -5.1304255, -4.7185884, -5.0999007,
        -4.9003377, -5.1724668, -5.1058035, -5.1073008, -5.1120396,
        -5.0958624, -5.092071 , -5.104314 , -5.1013465, -5.0973773,
        -5.0955014, -5.086265 , -5.089708 , -5.093198 , -5.089909 ,
        -5.1028776, -5.0938663, -5.0976443, -5.1154556, -5.102868 ,
        -5.1068664, -5.1185074, -5.1169963, -5.118672 , -5.1110716,
        -5.0957775, -5.0914636, -5.089892 , -5.096351 , -5.099577 ,
        -5.084194 , -5.082636 , -5.0841656, -5.0848293, -5.089616 ,
        -5.0918293, -5.083179 , -5.084272 , -5.0856056, -5.0826926,
        -5.087329 , -5.0841713, -5.0831146, -5.092702 , -5.084974 ,
        -5.0978565, -5.0952926, -5.090936 , -5.102818 , -5.110067 ,
        -5.1097775, -5.0976253, -5.0851665, -5.0931044, -5.093152 ,
        -5.089941 , -5.0872903, -5.0898356, -5.0923924, -5.0875926,
        -5.086853 , -5.085301 , -5.100186 , -5.094749 , -5.099969 ,
        -5.0874996, -5.0855126, -5.0895004, -5.09137  , -5.0918326,
        -5.0898056, -5.090782 , -5.1034665, -5.112412 , -5.109096 ,
        -5.1174197, -5.1111536, -5.1241746, -5.1188   , -5.116848 ,
        -5.1029363, -5.1041894, -5.103745 , -5.105212 , -5.098095 ,
        -5.093282 , -5.090341 , -5.1087084, -5.0905395, -5.0906925,
        -5.1039257, -5.084995 , -5.090868 , -5.0939407, -5.0842586,
        -5.0840406, -5.0855136, -5.08409  , -5.089621 , -5.0858765,
        -5.0852404, -5.09481  , -5.1036887, -5.1036325, -5.1107006,
        -5.0964427, -5.109834 , -5.128194 , -5.104343 , -5.10455  ,
        -5.0965843, -5.0981956, -5.0968714, -5.0971923, -5.096769 ,
        -5.1019425, -5.1022315, -5.119105 , -5.116201 , -5.102627 ,
        -5.102922 , -5.1034007, -5.111492 , -5.121706 , -5.1049304,
        -5.116994 , -5.111964 , -5.1179514, -5.1140733, -5.1069007,
        -5.1045523, -5.1113954, -5.119346 , -5.1202354, -5.1230803,
        -5.1247115, -5.125494 , -5.1167865, -5.1235557, -5.127506 ,
        -5.1223035, -5.124693 , -5.116798 , -5.1166444, -5.1148844,
        -5.1223955, -5.1191473, -5.111838 , -5.112754 , -5.1008034,
        -5.1111383, -5.1085505, -5.100999 , -5.1052284, -5.0974274,
        -5.0922704, -5.0895066, -5.089077 , -5.086511 , -5.0866723,
        -5.0855794, -5.0879817, -5.0893273, -5.0967927, -5.08802  ,
        -5.093814 , -5.1059337, -5.112577 , -5.1154685, -5.121607 ,
        -5.12036  , -5.114813 , -5.1212907, -5.1178846, -5.117335 ,
        -5.1129055, -5.1143084, -5.109348 , -5.100045 , -5.1053514,
        -5.0964003, -5.0934987, -5.102238 , -5.0983605, -5.0989766,
        -5.1099577, -5.10423  , -5.1023245, -5.1104093, -5.111489 ,
        -5.1045485, -5.110909 , -5.112187 , -5.1123652, -5.113932 ,
        -5.10867  , -5.0995913, -5.101586 , -5.1216726, -5.111117 ,
        -5.116669 , -5.12195  , -5.112778 , -5.1199346, -5.117032 ,
        -5.120798 , -5.11272  , -5.117168 , -5.1175523, -5.09827  ,
        -5.1082807, -5.1146145, -5.1200075, -5.1190424, -5.112625 ,
        -5.1200185, -5.1110024, -5.126168 , -5.1168666, -5.11615  ,
        -5.113571 , -5.118028 , -5.1132293, -5.122775 , -5.1154203,
        -5.091564 , -5.1100745, -5.0914884, -5.0932784, -5.105365 ,
        -5.092105 , -5.0959387, -5.1119223, -5.101221 , -5.1215677,
        -5.114091 , -5.10658  , -5.101732 , -5.105737 , -5.0961223,
        -5.1260395]], dtype=float32), 'unique_ids:0': array([0])}

Question: What is the capital of France?
Context: The capital of France is Paris.
Answer: 
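
The two arrays in the output dict look like SQuAD-style start/end logits over the 256 token positions (presumably unstack:0 for the answer start and unstack:1 for the end; the Answer field above is left blank by the log). A minimal sketch of the usual post-processing that would turn them into a span; mapping token indices back to text via the tokenizer is omitted and the names are illustrative:

import numpy as np

def best_span(start_logits, end_logits, max_answer_len=30, top_k=20):
    # Consider the top-k start and end positions and keep the highest-scoring
    # pair with end >= start and a bounded span length.
    starts = np.argsort(start_logits)[::-1][:top_k]
    ends = np.argsort(end_logits)[::-1][:top_k]
    best = (0, 0, -np.inf)
    for s in starts:
        for e in ends:
            if e < s or e - s + 1 > max_answer_len:
                continue
            score = start_logits[s] + end_logits[e]
            if score > best[2]:
                best = (int(s), int(e), score)
    return best[:2]   # token indices of the answer span

# e.g. best_span(outputs["unstack:0"][0], outputs["unstack:1"][0])
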
Generating '/tmp/nsys-report-dbd3.qdstrm'

[1/8] [========================100%] nsys-report-a359.nsys-rep

[2/8] [========================100%] nsys-report-4332.sqlite
[3/8] Executing 'nvtx_sum' stats report
[4/8] Executing 'osrt_sum' stats report

 Time (%)  Total Time (ns)  Num Calls    Avg (ns)       Med (ns)      Min (ns)     Max (ns)    StdDev (ns)            Name         
 --------  ---------------  ---------  -------------  -------------  -----------  -----------  ------------  ----------------------
     53.0    5,379,923,819         68   79,116,526.8  100,140,946.5        1,120  195,318,656  44,171,310.2  poll                  
     44.3    4,501,072,548          9  500,119,172.0  500,089,739.0  500,083,210  500,365,743      92,611.1  pthread_cond_timedwait
      1.7      169,966,424      5,645       30,109.2          790.0          290  156,348,110   2,080,925.3  read                  
      0.7       75,547,694      3,053       24,745.4        7,400.0          210   13,567,883     347,835.0  ioctl                 
      0.1        9,689,995      3,189        3,038.6        2,760.0        1,100       47,310       1,529.4  open64                
      0.0        5,062,449          1    5,062,449.0    5,062,449.0    5,062,449    5,062,449           0.0  nanosleep             
      0.0        3,655,800    135,467           27.0           20.0           20        6,820          46.9  pthread_cond_signal   
      0.0        3,051,371        139       21,952.3        5,090.0        1,990    1,588,811     135,272.1  mmap64                
      0.0          970,652         10       97,065.2       55,206.0       16,790      336,714     113,411.1  sem_timedwait         
      0.0          896,861         13       68,989.3       60,501.0       54,070      102,672      14,607.2  sleep                 
      0.0          527,226        583          904.3           50.0           20       69,801       5,637.4  fgets                 
      0.0          379,756          8       47,469.5       34,285.5       27,340       90,271      23,307.7  pthread_create        
      0.0          334,766         27       12,398.7        6,731.0        1,890       79,391      16,644.2  mmap                  
      0.0          306,232         31        9,878.5        6,580.0          590       51,641      13,059.2  write                 
      0.0          298,423         12       24,868.6        9,260.0        2,420       73,561      28,255.3  munmap                
      0.0          221,827         44        5,041.5        2,970.5          960       24,511       5,539.6  fopen                 
      0.0          129,122        133          970.8          800.0          491        3,360         520.2  pread64               
      0.0          126,131          1      126,131.0      126,131.0      126,131      126,131           0.0  pthread_cond_wait     
      0.0           92,441          1       92,441.0       92,441.0       92,441       92,441           0.0  waitpid               
      0.0           58,821         41        1,434.7        1,120.0          620        4,630         883.9  fclose                
      0.0           55,951         15        3,730.1        3,190.0        1,820        6,870       1,786.7  open                  
      0.0           55,646      1,622           34.3           30.0           20        5,050         150.7  pthread_cond_broadcast
      0.0           35,250          2       17,625.0       17,625.0        9,240       26,010      11,858.2  connect               
      0.0           30,919        133          232.5          269.0           20        1,020         125.4  sigaction             
      0.0           30,130      1,211           24.9           20.0           20          230           8.2  flockfile             
      0.0           29,160          6        4,860.0        4,095.0        2,020       10,640       3,356.3  pipe2                 
      0.0           27,791          4        6,947.8        6,830.0        3,010       11,121       4,054.6  socket                
      0.0           22,113         68          325.2          295.5          180        1,191         168.8  fcntl                 
      0.0           19,880          6        3,313.3        2,584.5        1,211        7,190       2,139.7  fopen64               
      0.0           17,775        192           92.6          110.0           20          430          49.7  pthread_mutex_trylock 
      0.0           15,640          3        5,213.3        5,310.0        1,670        8,660       3,496.0  fread                 
      0.0            6,840          2        3,420.0        3,420.0        1,580        5,260       2,602.2  bind                  
      0.0            3,360          2        1,680.0        1,680.0        1,030        2,330         919.2  fwrite                
      0.0            2,670         30           89.0           30.0           20          860         174.5  fflush                
      0.0            2,641         10          264.1          260.0          200          340          53.7  dup                   
      0.0            1,440          2          720.0          720.0          450          990         381.8  dup2                  
      0.0              900          1          900.0          900.0          900          900           0.0  getc                  
      0.0              750          1          750.0          750.0          750          750           0.0  listen                

[5/8] Executing 'cuda_api_sum' stats report

 Time (%)  Total Time (ns)  Num Calls  Avg (ns)   Med (ns)   Min (ns)   Max (ns)   StdDev (ns)                Name               
 --------  ---------------  ---------  ---------  ---------  --------  ----------  -----------  ---------------------------------
     69.5      554,863,571      1,998  277,709.5   60,631.0     2,210   2,639,775    418,746.2  cudaMemcpyAsync                  
     15.4      123,069,139      1,998   61,596.2   11,050.5       650     266,844     76,253.9  cudaStreamSynchronize            
      9.6       76,880,061        804   95,622.0    7,510.0     2,640  16,544,823    873,475.2  cudaLaunchKernel                 
      1.4       10,995,963      3,012    3,650.7    2,935.0       490     130,372      3,635.0  cudaDeviceSynchronize            
      1.2        9,791,616         98   99,914.4   86,646.0     3,490     325,235     87,991.0  cuCtxSynchronize                 
      1.0        7,914,589      3,012    2,627.7    1,610.0     1,180      17,040      2,147.4  cudaEventRecord                  
      0.8        6,255,114         25  250,204.6      900.0       280   6,234,124  1,246,649.9  cudaStreamIsCapturing_v10000     
      0.4        2,854,366         22  129,743.9  138,422.0    74,141     180,112     27,463.6  cudaMalloc                       
      0.3        2,139,262      3,012      710.2      610.0       250      12,520        550.6  cudaEventCreateWithFlags         
      0.2        1,402,750         98   14,313.8   13,155.0     8,100      53,211      5,475.2  cuLaunchKernel                   
      0.1        1,129,074      3,012      374.9      320.0       170       4,860        210.7  cudaEventDestroy                 
      0.0          289,084          4   72,271.0   73,451.0    55,660      86,522     13,790.2  cuModuleLoadData                 
      0.0          277,234         50    5,544.7    5,391.0     3,000      11,160      2,037.8  cudaMemsetAsync                  
      0.0          271,102      1,149      235.9      200.0        50       5,130        255.2  cuGetProcAddress_v2              
      0.0          161,892          1  161,892.0  161,892.0   161,892     161,892          0.0  cudaGetDeviceProperties_v2_v12000
      0.0            3,320          1    3,320.0    3,320.0     3,320       3,320          0.0  cuMemFree_v2                     
      0.0            3,320          3    1,106.7    1,340.0       480       1,500        548.6  cuInit                           
      0.0              770          1      770.0      770.0       770         770          0.0  cuCtxSetCurrent                  
      0.0              670          3      223.3      250.0        60         360        151.8  cuModuleGetLoadingMode           

[6/8] Executing 'cuda_gpu_kern_sum' stats report

 Time (%)  Total Time (ns)  Instances  Avg (ns)  Med (ns)  Min (ns)  Max (ns)  StdDev (ns)                                                  Name                                                
 --------  ---------------  ---------  --------  --------  --------  --------  -----------  ----------------------------------------------------------------------------------------------------
     82.8        9,405,772         97  96,966.7  83,200.0    11,008   319,904     88,359.2  cutlass_tensorop_s1688tf32gemm_256x128_16x3_tt_align4                                               
      3.8          427,746        148   2,890.2   2,399.5     1,568     4,993      1,038.5  void at::native::elementwise_kernel<(int)128, (int)2, void at::native::gpu_kernel_impl<at::native::…
      3.1          349,890        125   2,799.1   2,368.0     1,312     7,937      1,488.7  void at::native::elementwise_kernel<(int)128, (int)2, void at::native::gpu_kernel_impl<at::native::…
      2.9          328,035        196   1,673.6   1,280.0       768     3,104        708.1  void at::native::vectorized_elementwise_kernel<(int)4, at::native::FillFunctor<float>, at::detail::…
      1.6          178,464         50   3,569.3   3,520.0     3,488     3,968        108.2  void at::native::reduce_kernel<(int)512, (int)1, at::native::ReduceOp<float, at::native::MeanOps<fl…
      1.3          144,578         88   1,642.9     960.0       863     4,353      1,127.0  void at::native::vectorized_elementwise_kernel<(int)4, at::native::CUDAFunctor_add<float>, at::deta…
      1.2          131,327         48   2,736.0   2,368.0     2,304     3,968        652.0  void at::native::elementwise_kernel<(int)128, (int)2, void at::native::gpu_kernel_impl<at::native::…
      0.9          103,808         12   8,650.7   8,640.0     8,608     8,673         21.0  void at::native::elementwise_kernel<(int)128, (int)2, void at::native::gpu_kernel_impl<at::native::…
      0.8           96,448         37   2,606.7   1,824.0     1,760     4,384      1,175.8  void at::native::vectorized_elementwise_kernel<(int)4, at::native::BinaryFunctor<float, float, floa…
      0.6           71,105         12   5,925.4   5,936.0     5,856     6,016         66.5  void <unnamed>::softmax_warp_forward<float, float, float, (int)8, (bool)0, (bool)0>(T2 *, const T1 …
      0.4           45,790         12   3,815.8   3,808.0     3,712     3,936         65.5  void at::native::vectorized_elementwise_kernel<(int)4, at::native::tanh_kernel_cuda(at::TensorItera…
      0.2           25,120         25   1,004.8     992.0       991     1,024         16.0  void at::native::vectorized_elementwise_kernel<(int)4, at::native::sqrt_kernel_cuda(at::TensorItera…
      0.2           24,896         25     995.8     992.0       960     1,056         26.6  void at::native::vectorized_elementwise_kernel<(int)4, at::native::reciprocal_kernel_cuda(at::Tenso…
      0.2           22,528         25     901.1     896.0       864       928         20.0  void at::native::vectorized_elementwise_kernel<(int)4, at::native::AUnaryFunctor<float, float, floa…
      0.1            5,728          1   5,728.0   5,728.0     5,728     5,728          0.0  cutlass_tensorop_s1688tf32gemm_256x128_16x3_tt_align2                                               
      0.0            1,600          1   1,600.0   1,600.0     1,600     1,600          0.0  void at::native::<unnamed>::CatArrayBatchedCopy_aligned16_contig<int, unsigned int, (int)1, (int)12…

[7/8] Executing 'cuda_gpu_mem_time_sum' stats report

 Time (%)  Total Time (ns)  Count  Avg (ns)   Med (ns)   Min (ns)  Max (ns)   StdDev (ns)           Operation          
 --------  ---------------  -----  ---------  ---------  --------  ---------  -----------  ----------------------------
     55.0      205,251,787  1,254  163,677.7  119,681.0       287  2,364,355    253,334.2  [CUDA memcpy Host-to-Device]
     45.0      167,735,526    744  225,451.0  117,216.0       960  1,134,081    287,579.0  [CUDA memcpy Device-to-Host]
      0.0           24,832     50      496.6      320.0       287      1,088        261.5  [CUDA memset]               

[8/8] Executing 'cuda_gpu_mem_size_sum' stats report

 Total (MB)  Count  Avg (MB)  Med (MB)  Min (MB)  Max (MB)  StdDev (MB)           Operation          
 ----------  -----  --------  --------  --------  --------  -----------  ----------------------------
  1,328.322  1,254     1.059     0.786     0.000     9.437        1.597  [CUDA memcpy Host-to-Device]
    811.321    744     1.090     0.786     0.000     3.146        1.160  [CUDA memcpy Device-to-Host]
      0.000     50     0.000     0.000     0.000     0.000        0.000  [CUDA memset]               

Generated:
    /tmp/nsys-report-a359.nsys-rep
    /tmp/nsys-report-4332.sqlite