Model Input Name: unique_ids_raw_output___9:0, Shape: [0]
Model Input Name: segment_ids:0, Shape: [0, 256]
Model Input Name: input_mask:0, Shape: [0, 256]
Model Input Name: input_ids:0, Shape: [0, 256]
Starting model execution...
Inputs Details:
Input Name: input_ids:0
Shape: (1, 256)
Data (first 10 values): [ 101 2054 2003 1996 3007 1997 2605 1029 102 1996]...
--------------------------------------------------
Input Name: segment_ids:0
Shape: (1, 256)
Data (first 10 values): [0 0 0 0 0 0 0 0 0 1]...
--------------------------------------------------
Input Name: input_mask:0
Shape: (1, 256)
Data (first 10 values): [1 1 1 1 1 1 1 1 1 1]...
--------------------------------------------------
Input Name: unique_ids_raw_output___9:0
Shape: (1,)
Data (first 10 values): [0]...
--------------------------------------------------
Node: unique_ids_graph_outputs_Identity__10, Execution Time: 0.000497 seconds
Node: bert/encoder/Shape, Execution Time: 0.000030 seconds
Node: bert/encoder/Shape__12, Execution Time: 0.000043 seconds
Node: bert/encoder/strided_slice, Execution Time: 0.000166 seconds
Node: bert/encoder/strided_slice__16, Execution Time: 0.000030 seconds
Node: bert/encoder/strided_slice__17, Execution Time: 0.000020 seconds
Node: bert/encoder/ones/packed_Unsqueeze__18, Execution Time: 0.000035 seconds
Node: bert/encoder/ones/packed_Concat__21, Execution Time: 0.004864 seconds
Node: bert/encoder/ones__22, Execution Time: 0.000045 seconds
Node: bert/encoder/ones, Execution Time: 0.000072 seconds
Node: bert/encoder/Reshape, Execution Time: 0.000041 seconds
Node: bert/encoder/Cast, Execution Time: 0.000020 seconds
Node: bert/encoder/mul, Execution Time: 0.007905 seconds
Node: bert/encoder/layer_9/attention/self/ExpandDims, Execution Time: 0.000021 seconds
Node: bert/encoder/layer_9/attention/self/sub, Execution Time: 0.006667 seconds
Node: bert/encoder/layer_9/attention/self/mul_1, Execution Time: 0.000229 seconds
Node: bert/embeddings/Reshape_2, Execution Time: 0.000020 seconds
Node: bert/embeddings/Reshape, Execution Time: 0.000004 seconds
Node: bert/embeddings/GatherV2, Execution Time: 0.000160 seconds
Node: bert/embeddings/Reshape_1, Execution Time: 0.000020 seconds
Node: bert/embeddings/one_hot, Execution Time: 0.000218 seconds
Input size: (None, 256, 2, 768)
No Add node related to MatMul output: bert/embeddings/MatMul. Executing regular MatMul.
MatMul Node: bert/embeddings/MatMul, Execution Time: 0.025803 seconds
Node: bert/embeddings/Reshape_3, Execution Time: 0.000024 seconds
Add Node: bert/embeddings/add, Execution Time: 0.000617 seconds
Add Node: bert/embeddings/add_1, Execution Time: 0.000539 seconds
Node: bert/embeddings/LayerNorm/moments/mean, Execution Time: 0.005122 seconds
Node: bert/embeddings/LayerNorm/moments/SquaredDifference, Execution Time: 0.000512 seconds
Node: bert/embeddings/LayerNorm/moments/SquaredDifference__72, Execution Time: 0.000581 seconds
Node: bert/embeddings/LayerNorm/moments/variance, Execution Time: 0.000065 seconds
Add Node: bert/embeddings/LayerNorm/batchnorm/add, Execution Time: 0.000063 seconds
Node: bert/embeddings/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.010223 seconds
Node: bert/embeddings/LayerNorm/batchnorm/Rsqrt__74, Execution Time: 0.005414 seconds
Node: bert/embeddings/LayerNorm/batchnorm/mul, Execution Time: 0.000056 seconds
Node: bert/embeddings/LayerNorm/batchnorm/mul_2, Execution Time: 0.000059 seconds
Node: bert/embeddings/LayerNorm/batchnorm/sub, Execution Time: 0.000057 seconds
Node: bert/embeddings/LayerNorm/batchnorm/mul_1, Execution Time: 0.000468 seconds
Add Node: bert/embeddings/LayerNorm/batchnorm/add_1, Execution Time: 0.000573 seconds
Node: bert/encoder/Reshape_1, Execution Time: 0.000024 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_0/attention/self/value/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_0/attention/self/value/MatMul, Execution Time: 0.001978 seconds
Add Node: bert/encoder/layer_0/attention/self/value/BiasAdd, Execution Time: 0.000459 seconds
Node: bert/encoder/layer_0/attention/self/Reshape_2, Execution Time: 0.000020 seconds
Node: bert/encoder/layer_0/attention/self/transpose_2, Execution Time: 0.000455 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_0/attention/self/query/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_0/attention/self/query/MatMul, Execution Time: 0.000855 seconds
Add Node: bert/encoder/layer_0/attention/self/query/BiasAdd, Execution Time: 0.000456 seconds
Node: bert/encoder/layer_0/attention/self/Reshape, Execution Time: 0.000010 seconds
Node: bert/encoder/layer_0/attention/self/transpose, Execution Time: 0.000475 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_0/attention/self/key/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_0/attention/self/key/MatMul, Execution Time: 0.000611 seconds
Add Node: bert/encoder/layer_0/attention/self/key/BiasAdd, Execution Time: 0.000486 seconds
Node: bert/encoder/layer_0/attention/self/Reshape_1, Execution Time: 0.000009 seconds
Node: bert/encoder/layer_0/attention/self/MatMul__306, Execution Time: 0.000471 seconds
Input size: (12, 256, 64, 256)
No Add node related to MatMul output: bert/encoder/layer_0/attention/self/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_0/attention/self/MatMul, Execution Time: 0.001572 seconds
Node: bert/encoder/layer_0/attention/self/Mul, Execution Time: 0.001380 seconds
Add Node: bert/encoder/layer_0/attention/self/add, Execution Time: 0.001374 seconds
Node: bert/encoder/layer_0/attention/self/Softmax, Execution Time: 0.009023 seconds
Input size: (12, 256, 256, 64)
No Add node related to MatMul output: bert/encoder/layer_0/attention/self/MatMul_1. Executing regular MatMul.
MatMul Node: bert/encoder/layer_0/attention/self/MatMul_1, Execution Time: 0.000642 seconds
Node: bert/encoder/layer_0/attention/self/transpose_3, Execution Time: 0.000459 seconds
Node: bert/encoder/layer_0/attention/self/Reshape_3, Execution Time: 0.000065 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_0/attention/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_0/attention/output/dense/MatMul, Execution Time: 0.000608 seconds
Add Node: bert/encoder/layer_0/attention/output/dense/BiasAdd, Execution Time: 0.000476 seconds
Add Node: bert/encoder/layer_0/attention/output/add, Execution Time: 0.000619 seconds
Node: bert/encoder/layer_0/attention/output/LayerNorm/moments/mean, Execution Time: 0.000072 seconds
Node: bert/encoder/layer_0/attention/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000467 seconds
Node: bert/encoder/layer_0/attention/output/LayerNorm/moments/SquaredDifference__309, Execution Time: 0.000468 seconds
Node: bert/encoder/layer_0/attention/output/LayerNorm/moments/variance, Execution Time: 0.000065 seconds
Add Node: bert/encoder/layer_0/attention/output/LayerNorm/batchnorm/add, Execution Time: 0.000053 seconds
Node: bert/encoder/layer_0/attention/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000051 seconds
Node: bert/encoder/layer_0/attention/output/LayerNorm/batchnorm/Rsqrt__311, Execution Time: 0.000068 seconds
Node: bert/encoder/layer_0/attention/output/LayerNorm/batchnorm/mul, Execution Time: 0.000054 seconds
Node: bert/encoder/layer_0/attention/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000054 seconds
Node: bert/encoder/layer_0/attention/output/LayerNorm/batchnorm/sub, Execution Time: 0.000052 seconds
Node: bert/encoder/layer_0/attention/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000454 seconds
Add Node: bert/encoder/layer_0/attention/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000539 seconds
Input size: (None, 256, 768, 3072)
No Add node related to MatMul output: bert/encoder/layer_0/intermediate/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_0/intermediate/dense/MatMul, Execution Time: 0.000634 seconds
Add Node: bert/encoder/layer_0/intermediate/dense/BiasAdd, Execution Time: 0.001340 seconds
Node: bert/encoder/layer_0/intermediate/dense/Pow, Execution Time: 0.018156 seconds
Node: bert/encoder/layer_0/intermediate/dense/mul, Execution Time: 0.001935 seconds
Add Node: bert/encoder/layer_0/intermediate/dense/add, Execution Time: 0.001330 seconds
Node: bert/encoder/layer_0/intermediate/dense/mul_1, Execution Time: 0.001392 seconds
Node: bert/encoder/layer_0/intermediate/dense/Tanh, Execution Time: 0.003783 seconds
Add Node: bert/encoder/layer_0/intermediate/dense/add_1, Execution Time: 0.001652 seconds
Node: bert/encoder/layer_0/intermediate/dense/mul_2, Execution Time: 0.001321 seconds
Node: bert/encoder/layer_0/intermediate/dense/mul_3, Execution Time: 0.001385 seconds
Input size: (None, 256, 3072, 768)
No Add node related to MatMul output: bert/encoder/layer_0/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_0/output/dense/MatMul, Execution Time: 0.000917 seconds
Add Node: bert/encoder/layer_0/output/dense/BiasAdd, Execution Time: 0.000492 seconds
Add Node: bert/encoder/layer_0/output/add, Execution Time: 0.000489 seconds
Node: bert/encoder/layer_0/output/LayerNorm/moments/mean, Execution Time: 0.000070 seconds
Node: bert/encoder/layer_0/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000472 seconds
Node: bert/encoder/layer_0/output/LayerNorm/moments/SquaredDifference__313, Execution Time: 0.000493 seconds
Node: bert/encoder/layer_0/output/LayerNorm/moments/variance, Execution Time: 0.000053 seconds
Add Node: bert/encoder/layer_0/output/LayerNorm/batchnorm/add, Execution Time: 0.000043 seconds
Node: bert/encoder/layer_0/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000047 seconds
Node: bert/encoder/layer_0/output/LayerNorm/batchnorm/Rsqrt__315, Execution Time: 0.000067 seconds
Node: bert/encoder/layer_0/output/LayerNorm/batchnorm/mul, Execution Time: 0.000056 seconds
Node: bert/encoder/layer_0/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000054 seconds
Node: bert/encoder/layer_0/output/LayerNorm/batchnorm/sub, Execution Time: 0.000055 seconds
Node: bert/encoder/layer_0/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000472 seconds
Add Node: bert/encoder/layer_0/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000493 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_1/attention/self/value/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_1/attention/self/value/MatMul, Execution Time: 0.000622 seconds
Add Node: bert/encoder/layer_1/attention/self/value/BiasAdd, Execution Time: 0.000484 seconds
Node: bert/encoder/layer_1/attention/self/Reshape_2, Execution Time: 0.000020 seconds
Node: bert/encoder/layer_1/attention/self/transpose_2, Execution Time: 0.000481 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_1/attention/self/query/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_1/attention/self/query/MatMul, Execution Time: 0.000583 seconds
Add Node: bert/encoder/layer_1/attention/self/query/BiasAdd, Execution Time: 0.000481 seconds
Node: bert/encoder/layer_1/attention/self/Reshape, Execution Time: 0.000010 seconds
Node: bert/encoder/layer_1/attention/self/transpose, Execution Time: 0.000438 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_1/attention/self/key/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_1/attention/self/key/MatMul, Execution Time: 0.000589 seconds
Add Node: bert/encoder/layer_1/attention/self/key/BiasAdd, Execution Time: 0.000462 seconds
Node: bert/encoder/layer_1/attention/self/Reshape_1, Execution Time: 0.000010 seconds
Node: bert/encoder/layer_1/attention/self/MatMul__320, Execution Time: 0.000445 seconds
Input size: (12, 256, 64, 256)
No Add node related to MatMul output: bert/encoder/layer_1/attention/self/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_1/attention/self/MatMul, Execution Time: 0.000498 seconds
Node: bert/encoder/layer_1/attention/self/Mul, Execution Time: 0.001336 seconds
Add Node: bert/encoder/layer_1/attention/self/add, Execution Time: 0.001386 seconds
Node: bert/encoder/layer_1/attention/self/Softmax, Execution Time: 0.001339 seconds
Input size: (12, 256, 256, 64)
No Add node related to MatMul output: bert/encoder/layer_1/attention/self/MatMul_1. Executing regular MatMul.
MatMul Node: bert/encoder/layer_1/attention/self/MatMul_1, Execution Time: 0.000655 seconds
Node: bert/encoder/layer_1/attention/self/transpose_3, Execution Time: 0.000478 seconds
Node: bert/encoder/layer_1/attention/self/Reshape_3, Execution Time: 0.000052 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_1/attention/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_1/attention/output/dense/MatMul, Execution Time: 0.000575 seconds
Add Node: bert/encoder/layer_1/attention/output/dense/BiasAdd, Execution Time: 0.000460 seconds
Add Node: bert/encoder/layer_1/attention/output/add, Execution Time: 0.000628 seconds
Node: bert/encoder/layer_1/attention/output/LayerNorm/moments/mean, Execution Time: 0.000069 seconds
Node: bert/encoder/layer_1/attention/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000452 seconds
Node: bert/encoder/layer_1/attention/output/LayerNorm/moments/SquaredDifference__323, Execution Time: 0.000468 seconds
Node: bert/encoder/layer_1/attention/output/LayerNorm/moments/variance, Execution Time: 0.000052 seconds
Add Node: bert/encoder/layer_1/attention/output/LayerNorm/batchnorm/add, Execution Time: 0.000041 seconds
Node: bert/encoder/layer_1/attention/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000052 seconds
Node: bert/encoder/layer_1/attention/output/LayerNorm/batchnorm/Rsqrt__325, Execution Time: 0.000072 seconds
Node: bert/encoder/layer_1/attention/output/LayerNorm/batchnorm/mul, Execution Time: 0.000057 seconds
Node: bert/encoder/layer_1/attention/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000046 seconds
Node: bert/encoder/layer_1/attention/output/LayerNorm/batchnorm/sub, Execution Time: 0.000053 seconds
Node: bert/encoder/layer_1/attention/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000453 seconds
Add Node: bert/encoder/layer_1/attention/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000458 seconds
Input size: (None, 256, 768, 3072)
No Add node related to MatMul output: bert/encoder/layer_1/intermediate/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_1/intermediate/dense/MatMul, Execution Time: 0.000684 seconds
Add Node: bert/encoder/layer_1/intermediate/dense/BiasAdd, Execution Time: 0.001391 seconds
Node: bert/encoder/layer_1/intermediate/dense/Pow, Execution Time: 0.001334 seconds
Node: bert/encoder/layer_1/intermediate/dense/mul, Execution Time: 0.001634 seconds
Add Node: bert/encoder/layer_1/intermediate/dense/add, Execution Time: 0.001318 seconds
Node: bert/encoder/layer_1/intermediate/dense/mul_1, Execution Time: 0.001405 seconds
Node: bert/encoder/layer_1/intermediate/dense/Tanh, Execution Time: 0.001327 seconds
Add Node: bert/encoder/layer_1/intermediate/dense/add_1, Execution Time: 0.001342 seconds
Node: bert/encoder/layer_1/intermediate/dense/mul_2, Execution Time: 0.001412 seconds
Node: bert/encoder/layer_1/intermediate/dense/mul_3, Execution Time: 0.001328 seconds
Input size: (None, 256, 3072, 768)
No Add node related to MatMul output: bert/encoder/layer_1/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_1/output/dense/MatMul, Execution Time: 0.000919 seconds
Add Node: bert/encoder/layer_1/output/dense/BiasAdd, Execution Time: 0.000513 seconds
Add Node: bert/encoder/layer_1/output/add, Execution Time: 0.000639 seconds
Node: bert/encoder/layer_1/output/LayerNorm/moments/mean, Execution Time: 0.000080 seconds
Node: bert/encoder/layer_1/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000468 seconds
Node: bert/encoder/layer_1/output/LayerNorm/moments/SquaredDifference__327, Execution Time: 0.000491 seconds
Node: bert/encoder/layer_1/output/LayerNorm/moments/variance, Execution Time: 0.000054 seconds
Add Node: bert/encoder/layer_1/output/LayerNorm/batchnorm/add, Execution Time: 0.000051 seconds
Node: bert/encoder/layer_1/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000051 seconds
Node: bert/encoder/layer_1/output/LayerNorm/batchnorm/Rsqrt__329, Execution Time: 0.000069 seconds
Node: bert/encoder/layer_1/output/LayerNorm/batchnorm/mul, Execution Time: 0.000053 seconds
Node: bert/encoder/layer_1/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000041 seconds
Node: bert/encoder/layer_1/output/LayerNorm/batchnorm/sub, Execution Time: 0.000053 seconds
Node: bert/encoder/layer_1/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000468 seconds
Add Node: bert/encoder/layer_1/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000599 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_2/attention/self/value/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_2/attention/self/value/MatMul, Execution Time: 0.000905 seconds
Add Node: bert/encoder/layer_2/attention/self/value/BiasAdd, Execution Time: 0.000607 seconds
Node: bert/encoder/layer_2/attention/self/Reshape_2, Execution Time: 0.000028 seconds
Node: bert/encoder/layer_2/attention/self/transpose_2, Execution Time: 0.000581 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_2/attention/self/query/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_2/attention/self/query/MatMul, Execution Time: 0.000616 seconds
Add Node: bert/encoder/layer_2/attention/self/query/BiasAdd, Execution Time: 0.000477 seconds
Node: bert/encoder/layer_2/attention/self/Reshape, Execution Time: 0.000011 seconds
Node: bert/encoder/layer_2/attention/self/transpose, Execution Time: 0.000478 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_2/attention/self/key/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_2/attention/self/key/MatMul, Execution Time: 0.000656 seconds
Add Node: bert/encoder/layer_2/attention/self/key/BiasAdd, Execution Time: 0.000499 seconds
Node: bert/encoder/layer_2/attention/self/Reshape_1, Execution Time: 0.000010 seconds
Node: bert/encoder/layer_2/attention/self/MatMul__334, Execution Time: 0.000461 seconds
Input size: (12, 256, 64, 256)
No Add node related to MatMul output: bert/encoder/layer_2/attention/self/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_2/attention/self/MatMul, Execution Time: 0.000500 seconds
Node: bert/encoder/layer_2/attention/self/Mul, Execution Time: 0.001413 seconds
Add Node: bert/encoder/layer_2/attention/self/add, Execution Time: 0.002262 seconds
Node: bert/encoder/layer_2/attention/self/Softmax, Execution Time: 0.001362 seconds
Input size: (12, 256, 256, 64)
No Add node related to MatMul output: bert/encoder/layer_2/attention/self/MatMul_1. Executing regular MatMul.
MatMul Node: bert/encoder/layer_2/attention/self/MatMul_1, Execution Time: 0.000561 seconds
Node: bert/encoder/layer_2/attention/self/transpose_3, Execution Time: 0.000498 seconds
Node: bert/encoder/layer_2/attention/self/Reshape_3, Execution Time: 0.000050 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_2/attention/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_2/attention/output/dense/MatMul, Execution Time: 0.000587 seconds
Add Node: bert/encoder/layer_2/attention/output/dense/BiasAdd, Execution Time: 0.000457 seconds
Add Node: bert/encoder/layer_2/attention/output/add, Execution Time: 0.000584 seconds
Node: bert/encoder/layer_2/attention/output/LayerNorm/moments/mean, Execution Time: 0.000088 seconds
Node: bert/encoder/layer_2/attention/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000456 seconds
Node: bert/encoder/layer_2/attention/output/LayerNorm/moments/SquaredDifference__337, Execution Time: 0.000495 seconds
Node: bert/encoder/layer_2/attention/output/LayerNorm/moments/variance, Execution Time: 0.000054 seconds
Add Node: bert/encoder/layer_2/attention/output/LayerNorm/batchnorm/add, Execution Time: 0.000051 seconds
Node: bert/encoder/layer_2/attention/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000051 seconds
Node: bert/encoder/layer_2/attention/output/LayerNorm/batchnorm/Rsqrt__339, Execution Time: 0.000074 seconds
Node: bert/encoder/layer_2/attention/output/LayerNorm/batchnorm/mul, Execution Time: 0.000056 seconds
Node: bert/encoder/layer_2/attention/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000052 seconds
Node: bert/encoder/layer_2/attention/output/LayerNorm/batchnorm/sub, Execution Time: 0.000058 seconds
Node: bert/encoder/layer_2/attention/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000442 seconds
Add Node: bert/encoder/layer_2/attention/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000456 seconds
Input size: (None, 256, 768, 3072)
No Add node related to MatMul output: bert/encoder/layer_2/intermediate/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_2/intermediate/dense/MatMul, Execution Time: 0.000642 seconds
Add Node: bert/encoder/layer_2/intermediate/dense/BiasAdd, Execution Time: 0.001408 seconds
Node: bert/encoder/layer_2/intermediate/dense/Pow, Execution Time: 0.001425 seconds
Node: bert/encoder/layer_2/intermediate/dense/mul, Execution Time: 0.001326 seconds
Add Node: bert/encoder/layer_2/intermediate/dense/add, Execution Time: 0.001330 seconds
Node: bert/encoder/layer_2/intermediate/dense/mul_1, Execution Time: 0.001393 seconds
Node: bert/encoder/layer_2/intermediate/dense/Tanh, Execution Time: 0.001312 seconds
Add Node: bert/encoder/layer_2/intermediate/dense/add_1, Execution Time: 0.001741 seconds
Node: bert/encoder/layer_2/intermediate/dense/mul_2, Execution Time: 0.001384 seconds
Node: bert/encoder/layer_2/intermediate/dense/mul_3, Execution Time: 0.001297 seconds
Input size: (None, 256, 3072, 768)
No Add node related to MatMul output: bert/encoder/layer_2/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_2/output/dense/MatMul, Execution Time: 0.000920 seconds
Add Node: bert/encoder/layer_2/output/dense/BiasAdd, Execution Time: 0.000510 seconds
Add Node: bert/encoder/layer_2/output/add, Execution Time: 0.000488 seconds
Node: bert/encoder/layer_2/output/LayerNorm/moments/mean, Execution Time: 0.000071 seconds
Node: bert/encoder/layer_2/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000541 seconds
Node: bert/encoder/layer_2/output/LayerNorm/moments/SquaredDifference__341, Execution Time: 0.000462 seconds
Node: bert/encoder/layer_2/output/LayerNorm/moments/variance, Execution Time: 0.000053 seconds
Add Node: bert/encoder/layer_2/output/LayerNorm/batchnorm/add, Execution Time: 0.000051 seconds
Node: bert/encoder/layer_2/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000047 seconds
Node: bert/encoder/layer_2/output/LayerNorm/batchnorm/Rsqrt__343, Execution Time: 0.000073 seconds
Node: bert/encoder/layer_2/output/LayerNorm/batchnorm/mul, Execution Time: 0.000054 seconds
Node: bert/encoder/layer_2/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000052 seconds
Node: bert/encoder/layer_2/output/LayerNorm/batchnorm/sub, Execution Time: 0.000053 seconds
Node: bert/encoder/layer_2/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000454 seconds
Add Node: bert/encoder/layer_2/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000455 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_3/attention/self/value/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_3/attention/self/value/MatMul, Execution Time: 0.000614 seconds
Add Node: bert/encoder/layer_3/attention/self/value/BiasAdd, Execution Time: 0.000466 seconds
Node: bert/encoder/layer_3/attention/self/Reshape_2, Execution Time: 0.000020 seconds
Node: bert/encoder/layer_3/attention/self/transpose_2, Execution Time: 0.000468 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_3/attention/self/query/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_3/attention/self/query/MatMul, Execution Time: 0.000611 seconds
Add Node: bert/encoder/layer_3/attention/self/query/BiasAdd, Execution Time: 0.000453 seconds
Node: bert/encoder/layer_3/attention/self/Reshape, Execution Time: 0.000010 seconds
Node: bert/encoder/layer_3/attention/self/transpose, Execution Time: 0.000478 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_3/attention/self/key/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_3/attention/self/key/MatMul, Execution Time: 0.000578 seconds
Add Node: bert/encoder/layer_3/attention/self/key/BiasAdd, Execution Time: 0.000452 seconds
Node: bert/encoder/layer_3/attention/self/Reshape_1, Execution Time: 0.000009 seconds
Node: bert/encoder/layer_3/attention/self/MatMul__348, Execution Time: 0.000477 seconds
Input size: (12, 256, 64, 256)
No Add node related to MatMul output: bert/encoder/layer_3/attention/self/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_3/attention/self/MatMul, Execution Time: 0.001466 seconds
Node: bert/encoder/layer_3/attention/self/Mul, Execution Time: 0.001347 seconds
Add Node: bert/encoder/layer_3/attention/self/add, Execution Time: 0.001328 seconds
Node: bert/encoder/layer_3/attention/self/Softmax, Execution Time: 0.001364 seconds
Input size: (12, 256, 256, 64)
No Add node related to MatMul output: bert/encoder/layer_3/attention/self/MatMul_1. Executing regular MatMul.
MatMul Node: bert/encoder/layer_3/attention/self/MatMul_1, Execution Time: 0.000567 seconds
Node: bert/encoder/layer_3/attention/self/transpose_3, Execution Time: 0.000470 seconds
Node: bert/encoder/layer_3/attention/self/Reshape_3, Execution Time: 0.000048 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_3/attention/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_3/attention/output/dense/MatMul, Execution Time: 0.000573 seconds
Add Node: bert/encoder/layer_3/attention/output/dense/BiasAdd, Execution Time: 0.000461 seconds
Add Node: bert/encoder/layer_3/attention/output/add, Execution Time: 0.000479 seconds
Node: bert/encoder/layer_3/attention/output/LayerNorm/moments/mean, Execution Time: 0.000068 seconds
Node: bert/encoder/layer_3/attention/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000468 seconds
Node: bert/encoder/layer_3/attention/output/LayerNorm/moments/SquaredDifference__351, Execution Time: 0.000559 seconds
Node: bert/encoder/layer_3/attention/output/LayerNorm/moments/variance, Execution Time: 0.000053 seconds
Add Node: bert/encoder/layer_3/attention/output/LayerNorm/batchnorm/add, Execution Time: 0.000054 seconds
Node: bert/encoder/layer_3/attention/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000050 seconds
Node: bert/encoder/layer_3/attention/output/LayerNorm/batchnorm/Rsqrt__353, Execution Time: 0.000068 seconds
Node: bert/encoder/layer_3/attention/output/LayerNorm/batchnorm/mul, Execution Time: 0.000056 seconds
Node: bert/encoder/layer_3/attention/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000042 seconds
Node: bert/encoder/layer_3/attention/output/LayerNorm/batchnorm/sub, Execution Time: 0.000053 seconds
Node: bert/encoder/layer_3/attention/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000459 seconds
Add Node: bert/encoder/layer_3/attention/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000474 seconds
Input size: (None, 256, 768, 3072)
No Add node related to MatMul output: bert/encoder/layer_3/intermediate/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_3/intermediate/dense/MatMul, Execution Time: 0.000606 seconds
Add Node: bert/encoder/layer_3/intermediate/dense/BiasAdd, Execution Time: 0.001397 seconds
Node: bert/encoder/layer_3/intermediate/dense/Pow, Execution Time: 0.001356 seconds
Node: bert/encoder/layer_3/intermediate/dense/mul, Execution Time: 0.001531 seconds
Add Node: bert/encoder/layer_3/intermediate/dense/add, Execution Time: 0.001359 seconds
Node: bert/encoder/layer_3/intermediate/dense/mul_1, Execution Time: 0.001323 seconds
Node: bert/encoder/layer_3/intermediate/dense/Tanh, Execution Time: 0.001316 seconds
Add Node: bert/encoder/layer_3/intermediate/dense/add_1, Execution Time: 0.001360 seconds
Node: bert/encoder/layer_3/intermediate/dense/mul_2, Execution Time: 0.001329 seconds
Node: bert/encoder/layer_3/intermediate/dense/mul_3, Execution Time: 0.001352 seconds
Input size: (None, 256, 3072, 768)
No Add node related to MatMul output: bert/encoder/layer_3/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_3/output/dense/MatMul, Execution Time: 0.000910 seconds
Add Node: bert/encoder/layer_3/output/dense/BiasAdd, Execution Time: 0.000477 seconds
Add Node: bert/encoder/layer_3/output/add, Execution Time: 0.000456 seconds
Node: bert/encoder/layer_3/output/LayerNorm/moments/mean, Execution Time: 0.000070 seconds
Node: bert/encoder/layer_3/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000571 seconds
Node: bert/encoder/layer_3/output/LayerNorm/moments/SquaredDifference__355, Execution Time: 0.000565 seconds
Node: bert/encoder/layer_3/output/LayerNorm/moments/variance, Execution Time: 0.000060 seconds
Add Node: bert/encoder/layer_3/output/LayerNorm/batchnorm/add, Execution Time: 0.000056 seconds
Node: bert/encoder/layer_3/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000064 seconds
Node: bert/encoder/layer_3/output/LayerNorm/batchnorm/Rsqrt__357, Execution Time: 0.000086 seconds
Node: bert/encoder/layer_3/output/LayerNorm/batchnorm/mul, Execution Time: 0.000064 seconds
Node: bert/encoder/layer_3/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000057 seconds
Node: bert/encoder/layer_3/output/LayerNorm/batchnorm/sub, Execution Time: 0.000059 seconds
Node: bert/encoder/layer_3/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000572 seconds
Add Node: bert/encoder/layer_3/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000580 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_4/attention/self/value/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_4/attention/self/value/MatMul, Execution Time: 0.000795 seconds
Add Node: bert/encoder/layer_4/attention/self/value/BiasAdd, Execution Time: 0.000488 seconds
Node: bert/encoder/layer_4/attention/self/Reshape_2, Execution Time: 0.000020 seconds
Node: bert/encoder/layer_4/attention/self/transpose_2, Execution Time: 0.000460 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_4/attention/self/query/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_4/attention/self/query/MatMul, Execution Time: 0.000605 seconds
Add Node: bert/encoder/layer_4/attention/self/query/BiasAdd, Execution Time: 0.000484 seconds
Node: bert/encoder/layer_4/attention/self/Reshape, Execution Time: 0.000010 seconds
Node: bert/encoder/layer_4/attention/self/transpose, Execution Time: 0.000438 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_4/attention/self/key/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_4/attention/self/key/MatMul, Execution Time: 0.000582 seconds
Add Node: bert/encoder/layer_4/attention/self/key/BiasAdd, Execution Time: 0.000486 seconds
Node: bert/encoder/layer_4/attention/self/Reshape_1, Execution Time: 0.000009 seconds
Node: bert/encoder/layer_4/attention/self/MatMul__362, Execution Time: 0.000439 seconds
Input size: (12, 256, 64, 256)
No Add node related to MatMul output: bert/encoder/layer_4/attention/self/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_4/attention/self/MatMul, Execution Time: 0.000488 seconds
Node: bert/encoder/layer_4/attention/self/Mul, Execution Time: 0.001312 seconds
Add Node: bert/encoder/layer_4/attention/self/add, Execution Time: 0.001385 seconds
Node: bert/encoder/layer_4/attention/self/Softmax, Execution Time: 0.001311 seconds
Input size: (12, 256, 256, 64)
No Add node related to MatMul output: bert/encoder/layer_4/attention/self/MatMul_1. Executing regular MatMul.
MatMul Node: bert/encoder/layer_4/attention/self/MatMul_1, Execution Time: 0.000636 seconds
Node: bert/encoder/layer_4/attention/self/transpose_3, Execution Time: 0.000449 seconds
Node: bert/encoder/layer_4/attention/self/Reshape_3, Execution Time: 0.000038 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_4/attention/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_4/attention/output/dense/MatMul, Execution Time: 0.000573 seconds
Add Node: bert/encoder/layer_4/attention/output/dense/BiasAdd, Execution Time: 0.000459 seconds
Add Node: bert/encoder/layer_4/attention/output/add, Execution Time: 0.000449 seconds
Node: bert/encoder/layer_4/attention/output/LayerNorm/moments/mean, Execution Time: 0.000083 seconds
Node: bert/encoder/layer_4/attention/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000516 seconds
Node: bert/encoder/layer_4/attention/output/LayerNorm/moments/SquaredDifference__365, Execution Time: 0.000445 seconds
Node: bert/encoder/layer_4/attention/output/LayerNorm/moments/variance, Execution Time: 0.000059 seconds
Add Node: bert/encoder/layer_4/attention/output/LayerNorm/batchnorm/add, Execution Time: 0.000056 seconds
Node: bert/encoder/layer_4/attention/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000049 seconds
Node: bert/encoder/layer_4/attention/output/LayerNorm/batchnorm/Rsqrt__367, Execution Time: 0.000067 seconds
Node: bert/encoder/layer_4/attention/output/LayerNorm/batchnorm/mul, Execution Time: 0.000056 seconds
Node: bert/encoder/layer_4/attention/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000059 seconds
Node: bert/encoder/layer_4/attention/output/LayerNorm/batchnorm/sub, Execution Time: 0.000057 seconds
Node: bert/encoder/layer_4/attention/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000445 seconds
Add Node: bert/encoder/layer_4/attention/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000447 seconds
Input size: (None, 256, 768, 3072)
No Add node related to MatMul output: bert/encoder/layer_4/intermediate/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_4/intermediate/dense/MatMul, Execution Time: 0.000721 seconds
Add Node: bert/encoder/layer_4/intermediate/dense/BiasAdd, Execution Time: 0.001380 seconds
Node: bert/encoder/layer_4/intermediate/dense/Pow, Execution Time: 0.001323 seconds
Node: bert/encoder/layer_4/intermediate/dense/mul, Execution Time: 0.001327 seconds
Add Node: bert/encoder/layer_4/intermediate/dense/add, Execution Time: 0.001417 seconds
Node: bert/encoder/layer_4/intermediate/dense/mul_1, Execution Time: 0.001328 seconds
Node: bert/encoder/layer_4/intermediate/dense/Tanh, Execution Time: 0.001388 seconds
Add Node: bert/encoder/layer_4/intermediate/dense/add_1, Execution Time: 0.001321 seconds
Node: bert/encoder/layer_4/intermediate/dense/mul_2, Execution Time: 0.001313 seconds
Node: bert/encoder/layer_4/intermediate/dense/mul_3, Execution Time: 0.001348 seconds
Input size: (None, 256, 3072, 768)
No Add node related to MatMul output: bert/encoder/layer_4/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_4/output/dense/MatMul, Execution Time: 0.000919 seconds
Add Node: bert/encoder/layer_4/output/dense/BiasAdd, Execution Time: 0.000462 seconds
Add Node: bert/encoder/layer_4/output/add, Execution Time: 0.000495 seconds
Node: bert/encoder/layer_4/output/LayerNorm/moments/mean, Execution Time: 0.000070 seconds
Node: bert/encoder/layer_4/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000446 seconds
Node: bert/encoder/layer_4/output/LayerNorm/moments/SquaredDifference__369, Execution Time: 0.000488 seconds
Node: bert/encoder/layer_4/output/LayerNorm/moments/variance, Execution Time: 0.000053 seconds
Add Node: bert/encoder/layer_4/output/LayerNorm/batchnorm/add, Execution Time: 0.000041 seconds
Node: bert/encoder/layer_4/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000046 seconds
Node: bert/encoder/layer_4/output/LayerNorm/batchnorm/Rsqrt__371, Execution Time: 0.000070 seconds
Node: bert/encoder/layer_4/output/LayerNorm/batchnorm/mul, Execution Time: 0.000061 seconds
Node: bert/encoder/layer_4/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000044 seconds
Node: bert/encoder/layer_4/output/LayerNorm/batchnorm/sub, Execution Time: 0.000043 seconds
Node: bert/encoder/layer_4/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000455 seconds
Add Node: bert/encoder/layer_4/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000448 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_5/attention/self/value/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_5/attention/self/value/MatMul, Execution Time: 0.000642 seconds
Add Node: bert/encoder/layer_5/attention/self/value/BiasAdd, Execution Time: 0.000496 seconds
Node: bert/encoder/layer_5/attention/self/Reshape_2, Execution Time: 0.000020 seconds
Node: bert/encoder/layer_5/attention/self/transpose_2, Execution Time: 0.000448 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_5/attention/self/query/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_5/attention/self/query/MatMul, Execution Time: 0.000588 seconds
Add Node: bert/encoder/layer_5/attention/self/query/BiasAdd, Execution Time: 0.000455 seconds
Node: bert/encoder/layer_5/attention/self/Reshape, Execution Time: 0.000014 seconds
Node: bert/encoder/layer_5/attention/self/transpose, Execution Time: 0.000442 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_5/attention/self/key/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_5/attention/self/key/MatMul, Execution Time: 0.000567 seconds
Add Node: bert/encoder/layer_5/attention/self/key/BiasAdd, Execution Time: 0.000444 seconds
Node: bert/encoder/layer_5/attention/self/Reshape_1, Execution Time: 0.000013 seconds
Node: bert/encoder/layer_5/attention/self/MatMul__376, Execution Time: 0.000500 seconds
Input size: (12, 256, 64, 256)
No Add node related to MatMul output: bert/encoder/layer_5/attention/self/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_5/attention/self/MatMul, Execution Time: 0.000501 seconds
Node: bert/encoder/layer_5/attention/self/Mul, Execution Time: 0.001309 seconds
Add Node: bert/encoder/layer_5/attention/self/add, Execution Time: 0.001395 seconds
Node: bert/encoder/layer_5/attention/self/Softmax, Execution Time: 0.001304 seconds
Input size: (12, 256, 256, 64)
No Add node related to MatMul output: bert/encoder/layer_5/attention/self/MatMul_1. Executing regular MatMul.
MatMul Node: bert/encoder/layer_5/attention/self/MatMul_1, Execution Time: 0.000555 seconds
Node: bert/encoder/layer_5/attention/self/transpose_3, Execution Time: 0.000481 seconds
Node: bert/encoder/layer_5/attention/self/Reshape_3, Execution Time: 0.000047 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_5/attention/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_5/attention/output/dense/MatMul, Execution Time: 0.000663 seconds
Add Node: bert/encoder/layer_5/attention/output/dense/BiasAdd, Execution Time: 0.000540 seconds
Add Node: bert/encoder/layer_5/attention/output/add, Execution Time: 0.000479 seconds
Node: bert/encoder/layer_5/attention/output/LayerNorm/moments/mean, Execution Time: 0.000067 seconds
Node: bert/encoder/layer_5/attention/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000482 seconds
Node: bert/encoder/layer_5/attention/output/LayerNorm/moments/SquaredDifference__379, Execution Time: 0.000475 seconds
Node: bert/encoder/layer_5/attention/output/LayerNorm/moments/variance, Execution Time: 0.000056 seconds
Add Node: bert/encoder/layer_5/attention/output/LayerNorm/batchnorm/add, Execution Time: 0.000054 seconds
Node: bert/encoder/layer_5/attention/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000049 seconds
Node: bert/encoder/layer_5/attention/output/LayerNorm/batchnorm/Rsqrt__381, Execution Time: 0.000068 seconds
Node: bert/encoder/layer_5/attention/output/LayerNorm/batchnorm/mul, Execution Time: 0.000054 seconds
Node: bert/encoder/layer_5/attention/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000041 seconds
Node: bert/encoder/layer_5/attention/output/LayerNorm/batchnorm/sub, Execution Time: 0.000045 seconds
Node: bert/encoder/layer_5/attention/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000464 seconds
Add Node: bert/encoder/layer_5/attention/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000575 seconds
Input size: (None, 256, 768, 3072)
No Add node related to MatMul output: bert/encoder/layer_5/intermediate/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_5/intermediate/dense/MatMul, Execution Time: 0.000763 seconds
Add Node: bert/encoder/layer_5/intermediate/dense/BiasAdd, Execution Time: 0.001429 seconds
Node: bert/encoder/layer_5/intermediate/dense/Pow, Execution Time: 0.001294 seconds
Node: bert/encoder/layer_5/intermediate/dense/mul, Execution Time: 0.001361 seconds
Add Node: bert/encoder/layer_5/intermediate/dense/add, Execution Time: 0.001307 seconds
Node: bert/encoder/layer_5/intermediate/dense/mul_1, Execution Time: 0.001307 seconds
Node: bert/encoder/layer_5/intermediate/dense/Tanh, Execution Time: 0.001370 seconds
Add Node: bert/encoder/layer_5/intermediate/dense/add_1, Execution Time: 0.001283 seconds
Node: bert/encoder/layer_5/intermediate/dense/mul_2, Execution Time: 0.001304 seconds
Node: bert/encoder/layer_5/intermediate/dense/mul_3, Execution Time: 0.001364 seconds
Input size: (None, 256, 3072, 768)
No Add node related to MatMul output: bert/encoder/layer_5/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_5/output/dense/MatMul, Execution Time: 0.001011 seconds
Add Node: bert/encoder/layer_5/output/dense/BiasAdd, Execution Time: 0.000497 seconds
Add Node: bert/encoder/layer_5/output/add, Execution Time: 0.000463 seconds
Node: bert/encoder/layer_5/output/LayerNorm/moments/mean, Execution Time: 0.000083 seconds
Node: bert/encoder/layer_5/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000456 seconds
Node: bert/encoder/layer_5/output/LayerNorm/moments/SquaredDifference__383, Execution Time: 0.000471 seconds
Node: bert/encoder/layer_5/output/LayerNorm/moments/variance, Execution Time: 0.000056 seconds
Add Node: bert/encoder/layer_5/output/LayerNorm/batchnorm/add, Execution Time: 0.000051 seconds
Node: bert/encoder/layer_5/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000049 seconds
Node: bert/encoder/layer_5/output/LayerNorm/batchnorm/Rsqrt__385, Execution Time: 0.000067 seconds
Node: bert/encoder/layer_5/output/LayerNorm/batchnorm/mul, Execution Time: 0.000056 seconds
Node: bert/encoder/layer_5/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000053 seconds
Node: bert/encoder/layer_5/output/LayerNorm/batchnorm/sub, Execution Time: 0.000053 seconds
Node: bert/encoder/layer_5/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000479 seconds
Add Node: bert/encoder/layer_5/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000451 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_6/attention/self/value/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_6/attention/self/value/MatMul, Execution Time: 0.000685 seconds
Add Node: bert/encoder/layer_6/attention/self/value/BiasAdd, Execution Time: 0.000451 seconds
Node: bert/encoder/layer_6/attention/self/Reshape_2, Execution Time: 0.000020 seconds
Node: bert/encoder/layer_6/attention/self/transpose_2, Execution Time: 0.000459 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_6/attention/self/query/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_6/attention/self/query/MatMul, Execution Time: 0.000654 seconds
Add Node: bert/encoder/layer_6/attention/self/query/BiasAdd, Execution Time: 0.000448 seconds
Node: bert/encoder/layer_6/attention/self/Reshape, Execution Time: 0.000009 seconds
Node: bert/encoder/layer_6/attention/self/transpose, Execution Time: 0.000467 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_6/attention/self/key/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_6/attention/self/key/MatMul, Execution Time: 0.000576 seconds
Add Node: bert/encoder/layer_6/attention/self/key/BiasAdd, Execution Time: 0.000455 seconds
Node: bert/encoder/layer_6/attention/self/Reshape_1, Execution Time: 0.000010 seconds
Node: bert/encoder/layer_6/attention/self/MatMul__390, Execution Time: 0.000441 seconds
Input size: (12, 256, 64, 256)
No Add node related to MatMul output: bert/encoder/layer_6/attention/self/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_6/attention/self/MatMul, Execution Time: 0.000488 seconds
Node: bert/encoder/layer_6/attention/self/Mul, Execution Time: 0.001314 seconds
Add Node: bert/encoder/layer_6/attention/self/add, Execution Time: 0.001356 seconds
Node: bert/encoder/layer_6/attention/self/Softmax, Execution Time: 0.001345 seconds
Input size: (12, 256, 256, 64)
No Add node related to MatMul output: bert/encoder/layer_6/attention/self/MatMul_1. Executing regular MatMul.
MatMul Node: bert/encoder/layer_6/attention/self/MatMul_1, Execution Time: 0.000570 seconds
Node: bert/encoder/layer_6/attention/self/transpose_3, Execution Time: 0.000473 seconds
Node: bert/encoder/layer_6/attention/self/Reshape_3, Execution Time: 0.000037 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_6/attention/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_6/attention/output/dense/MatMul, Execution Time: 0.000584 seconds
Add Node: bert/encoder/layer_6/attention/output/dense/BiasAdd, Execution Time: 0.000483 seconds
Add Node: bert/encoder/layer_6/attention/output/add, Execution Time: 0.000607 seconds
Node: bert/encoder/layer_6/attention/output/LayerNorm/moments/mean, Execution Time: 0.000073 seconds
Node: bert/encoder/layer_6/attention/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000443 seconds
Node: bert/encoder/layer_6/attention/output/LayerNorm/moments/SquaredDifference__393, Execution Time: 0.000462 seconds
Node: bert/encoder/layer_6/attention/output/LayerNorm/moments/variance, Execution Time: 0.000054 seconds
Add Node: bert/encoder/layer_6/attention/output/LayerNorm/batchnorm/add, Execution Time: 0.000042 seconds
Node: bert/encoder/layer_6/attention/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000046 seconds
Node: bert/encoder/layer_6/attention/output/LayerNorm/batchnorm/Rsqrt__395, Execution Time: 0.000072 seconds
Node: bert/encoder/layer_6/attention/output/LayerNorm/batchnorm/mul, Execution Time: 0.000055 seconds
Node: bert/encoder/layer_6/attention/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000041 seconds
Node: bert/encoder/layer_6/attention/output/LayerNorm/batchnorm/sub, Execution Time: 0.000053 seconds
Node: bert/encoder/layer_6/attention/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000473 seconds
Add Node: bert/encoder/layer_6/attention/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000446 seconds
Input size: (None, 256, 768, 3072)
No Add node related to MatMul output: bert/encoder/layer_6/intermediate/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_6/intermediate/dense/MatMul, Execution Time: 0.000619 seconds
Add Node: bert/encoder/layer_6/intermediate/dense/BiasAdd, Execution Time: 0.001369 seconds
Node: bert/encoder/layer_6/intermediate/dense/Pow, Execution Time: 0.001318 seconds
Node: bert/encoder/layer_6/intermediate/dense/mul, Execution Time: 0.001365 seconds
Add Node: bert/encoder/layer_6/intermediate/dense/add, Execution Time: 0.001338 seconds
Node: bert/encoder/layer_6/intermediate/dense/mul_1, Execution Time: 0.001392 seconds
Node: bert/encoder/layer_6/intermediate/dense/Tanh, Execution Time: 0.001564 seconds
Add Node: bert/encoder/layer_6/intermediate/dense/add_1, Execution Time: 0.001328 seconds
Node: bert/encoder/layer_6/intermediate/dense/mul_2, Execution Time: 0.001371 seconds
Node: bert/encoder/layer_6/intermediate/dense/mul_3, Execution Time: 0.001315 seconds
Input size: (None, 256, 3072, 768)
No Add node related to MatMul output: bert/encoder/layer_6/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_6/output/dense/MatMul, Execution Time: 0.000912 seconds
Add Node: bert/encoder/layer_6/output/dense/BiasAdd, Execution Time: 0.000472 seconds
Add Node: bert/encoder/layer_6/output/add, Execution Time: 0.000454 seconds
Node: bert/encoder/layer_6/output/LayerNorm/moments/mean, Execution Time: 0.000080 seconds
Node: bert/encoder/layer_6/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000532 seconds
Node: bert/encoder/layer_6/output/LayerNorm/moments/SquaredDifference__397, Execution Time: 0.000452 seconds
Node: bert/encoder/layer_6/output/LayerNorm/moments/variance, Execution Time: 0.000054 seconds
Add Node: bert/encoder/layer_6/output/LayerNorm/batchnorm/add, Execution Time: 0.000052 seconds
Node: bert/encoder/layer_6/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000052 seconds
Node: bert/encoder/layer_6/output/LayerNorm/batchnorm/Rsqrt__399, Execution Time: 0.000067 seconds
Node: bert/encoder/layer_6/output/LayerNorm/batchnorm/mul, Execution Time: 0.000053 seconds
Node: bert/encoder/layer_6/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000052 seconds
Node: bert/encoder/layer_6/output/LayerNorm/batchnorm/sub, Execution Time: 0.000053 seconds
Node: bert/encoder/layer_6/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000479 seconds
Add Node: bert/encoder/layer_6/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000470 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_7/attention/self/value/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_7/attention/self/value/MatMul, Execution Time: 0.000731 seconds
Add Node: bert/encoder/layer_7/attention/self/value/BiasAdd, Execution Time: 0.000454 seconds
Node: bert/encoder/layer_7/attention/self/Reshape_2, Execution Time: 0.000020 seconds
Node: bert/encoder/layer_7/attention/self/transpose_2, Execution Time: 0.000461 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_7/attention/self/query/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_7/attention/self/query/MatMul, Execution Time: 0.000590 seconds
Add Node: bert/encoder/layer_7/attention/self/query/BiasAdd, Execution Time: 0.000451 seconds
Node: bert/encoder/layer_7/attention/self/Reshape, Execution Time: 0.000009 seconds
Node: bert/encoder/layer_7/attention/self/transpose, Execution Time: 0.000524 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_7/attention/self/key/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_7/attention/self/key/MatMul, Execution Time: 0.000639 seconds
Add Node: bert/encoder/layer_7/attention/self/key/BiasAdd, Execution Time: 0.000482 seconds
Node: bert/encoder/layer_7/attention/self/Reshape_1, Execution Time: 0.000009 seconds
Node: bert/encoder/layer_7/attention/self/MatMul__404, Execution Time: 0.000479 seconds
Input size: (12, 256, 64, 256)
No Add node related to MatMul output: bert/encoder/layer_7/attention/self/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_7/attention/self/MatMul, Execution Time: 0.000487 seconds
Node: bert/encoder/layer_7/attention/self/Mul, Execution Time: 0.001356 seconds
Add Node: bert/encoder/layer_7/attention/self/add, Execution Time: 0.001314 seconds
Node: bert/encoder/layer_7/attention/self/Softmax, Execution Time: 0.001310 seconds
Input size: (12, 256, 256, 64)
No Add node related to MatMul output: bert/encoder/layer_7/attention/self/MatMul_1. Executing regular MatMul.
MatMul Node: bert/encoder/layer_7/attention/self/MatMul_1, Execution Time: 0.000533 seconds
Node: bert/encoder/layer_7/attention/self/transpose_3, Execution Time: 0.000475 seconds
Node: bert/encoder/layer_7/attention/self/Reshape_3, Execution Time: 0.000043 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_7/attention/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_7/attention/output/dense/MatMul, Execution Time: 0.000734 seconds
Add Node: bert/encoder/layer_7/attention/output/dense/BiasAdd, Execution Time: 0.000624 seconds
Add Node: bert/encoder/layer_7/attention/output/add, Execution Time: 0.000640 seconds
Node: bert/encoder/layer_7/attention/output/LayerNorm/moments/mean, Execution Time: 0.000101 seconds
Node: bert/encoder/layer_7/attention/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000620 seconds
Node: bert/encoder/layer_7/attention/output/LayerNorm/moments/SquaredDifference__407, Execution Time: 0.000822 seconds
Node: bert/encoder/layer_7/attention/output/LayerNorm/moments/variance, Execution Time: 0.000097 seconds
Add Node: bert/encoder/layer_7/attention/output/LayerNorm/batchnorm/add, Execution Time: 0.000078 seconds
Node: bert/encoder/layer_7/attention/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000085 seconds
Node: bert/encoder/layer_7/attention/output/LayerNorm/batchnorm/Rsqrt__409, Execution Time: 0.000116 seconds
Node: bert/encoder/layer_7/attention/output/LayerNorm/batchnorm/mul, Execution Time: 0.000092 seconds
Node: bert/encoder/layer_7/attention/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000081 seconds
Node: bert/encoder/layer_7/attention/output/LayerNorm/batchnorm/sub, Execution Time: 0.000086 seconds
Node: bert/encoder/layer_7/attention/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000847 seconds
Add Node: bert/encoder/layer_7/attention/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000706 seconds
Input size: (None, 256, 768, 3072)
No Add node related to MatMul output: bert/encoder/layer_7/intermediate/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_7/intermediate/dense/MatMul, Execution Time: 0.000950 seconds
Add Node: bert/encoder/layer_7/intermediate/dense/BiasAdd, Execution Time: 0.001974 seconds
Node: bert/encoder/layer_7/intermediate/dense/Pow, Execution Time: 0.001916 seconds
Node: bert/encoder/layer_7/intermediate/dense/mul, Execution Time: 0.002038 seconds
Add Node: bert/encoder/layer_7/intermediate/dense/add, Execution Time: 0.001887 seconds
Node: bert/encoder/layer_7/intermediate/dense/mul_1, Execution Time: 0.001875 seconds
Node: bert/encoder/layer_7/intermediate/dense/Tanh, Execution Time: 0.002064 seconds
Add Node: bert/encoder/layer_7/intermediate/dense/add_1, Execution Time: 0.001889 seconds
Node: bert/encoder/layer_7/intermediate/dense/mul_2, Execution Time: 0.001939 seconds
Node: bert/encoder/layer_7/intermediate/dense/mul_3, Execution Time: 0.001944 seconds
Input size: (None, 256, 3072, 768)
No Add node related to MatMul output: bert/encoder/layer_7/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_7/output/dense/MatMul, Execution Time: 0.001181 seconds
Add Node: bert/encoder/layer_7/output/dense/BiasAdd, Execution Time: 0.000527 seconds
Add Node: bert/encoder/layer_7/output/add, Execution Time: 0.000661 seconds
Node: bert/encoder/layer_7/output/LayerNorm/moments/mean, Execution Time: 0.000089 seconds
Node: bert/encoder/layer_7/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000520 seconds
Node: bert/encoder/layer_7/output/LayerNorm/moments/SquaredDifference__411, Execution Time: 0.000544 seconds
Node: bert/encoder/layer_7/output/LayerNorm/moments/variance, Execution Time: 0.000075 seconds
Add Node: bert/encoder/layer_7/output/LayerNorm/batchnorm/add, Execution Time: 0.000050 seconds
Node: bert/encoder/layer_7/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000056 seconds
Node: bert/encoder/layer_7/output/LayerNorm/batchnorm/Rsqrt__413, Execution Time: 0.000129 seconds
Node: bert/encoder/layer_7/output/LayerNorm/batchnorm/mul, Execution Time: 0.000044 seconds
Node: bert/encoder/layer_7/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000043 seconds
Node: bert/encoder/layer_7/output/LayerNorm/batchnorm/sub, Execution Time: 0.000058 seconds
Node: bert/encoder/layer_7/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000562 seconds
Add Node: bert/encoder/layer_7/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000626 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_8/attention/self/value/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_8/attention/self/value/MatMul, Execution Time: 0.000742 seconds
Add Node: bert/encoder/layer_8/attention/self/value/BiasAdd, Execution Time: 0.000571 seconds
Node: bert/encoder/layer_8/attention/self/Reshape_2, Execution Time: 0.000023 seconds
Node: bert/encoder/layer_8/attention/self/transpose_2, Execution Time: 0.000514 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_8/attention/self/query/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_8/attention/self/query/MatMul, Execution Time: 0.000766 seconds
Add Node: bert/encoder/layer_8/attention/self/query/BiasAdd, Execution Time: 0.000573 seconds
Node: bert/encoder/layer_8/attention/self/Reshape, Execution Time: 0.000023 seconds
Node: bert/encoder/layer_8/attention/self/transpose, Execution Time: 0.000567 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_8/attention/self/key/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_8/attention/self/key/MatMul, Execution Time: 0.000825 seconds
Add Node: bert/encoder/layer_8/attention/self/key/BiasAdd, Execution Time: 0.000538 seconds
Node: bert/encoder/layer_8/attention/self/Reshape_1, Execution Time: 0.000022 seconds
Node: bert/encoder/layer_8/attention/self/MatMul__418, Execution Time: 0.000509 seconds
Input size: (12, 256, 64, 256)
No Add node related to MatMul output: bert/encoder/layer_8/attention/self/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_8/attention/self/MatMul, Execution Time: 0.000715 seconds
Node: bert/encoder/layer_8/attention/self/Mul, Execution Time: 0.001661 seconds
Add Node: bert/encoder/layer_8/attention/self/add, Execution Time: 0.001515 seconds
Node: bert/encoder/layer_8/attention/self/Softmax, Execution Time: 0.001514 seconds
Input size: (12, 256, 256, 64)
No Add node related to MatMul output: bert/encoder/layer_8/attention/self/MatMul_1. Executing regular MatMul.
MatMul Node: bert/encoder/layer_8/attention/self/MatMul_1, Execution Time: 0.000726 seconds
Node: bert/encoder/layer_8/attention/self/transpose_3, Execution Time: 0.000521 seconds
Node: bert/encoder/layer_8/attention/self/Reshape_3, Execution Time: 0.000063 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_8/attention/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_8/attention/output/dense/MatMul, Execution Time: 0.000861 seconds
Add Node: bert/encoder/layer_8/attention/output/dense/BiasAdd, Execution Time: 0.000516 seconds
Add Node: bert/encoder/layer_8/attention/output/add, Execution Time: 0.000510 seconds
Node: bert/encoder/layer_8/attention/output/LayerNorm/moments/mean, Execution Time: 0.000094 seconds
Node: bert/encoder/layer_8/attention/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000504 seconds
Node: bert/encoder/layer_8/attention/output/LayerNorm/moments/SquaredDifference__421, Execution Time: 0.000531 seconds
Node: bert/encoder/layer_8/attention/output/LayerNorm/moments/variance, Execution Time: 0.000079 seconds
Add Node: bert/encoder/layer_8/attention/output/LayerNorm/batchnorm/add, Execution Time: 0.000049 seconds
Node: bert/encoder/layer_8/attention/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000052 seconds
Node: bert/encoder/layer_8/attention/output/LayerNorm/batchnorm/Rsqrt__423, Execution Time: 0.000087 seconds
Node: bert/encoder/layer_8/attention/output/LayerNorm/batchnorm/mul, Execution Time: 0.000065 seconds
Node: bert/encoder/layer_8/attention/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000063 seconds
Node: bert/encoder/layer_8/attention/output/LayerNorm/batchnorm/sub, Execution Time: 0.000084 seconds
Node: bert/encoder/layer_8/attention/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000503 seconds
Add Node: bert/encoder/layer_8/attention/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000522 seconds
Input size: (None, 256, 768, 3072)
No Add node related to MatMul output: bert/encoder/layer_8/intermediate/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_8/intermediate/dense/MatMul, Execution Time: 0.000727 seconds
Add Node: bert/encoder/layer_8/intermediate/dense/BiasAdd, Execution Time: 0.001507 seconds
Node: bert/encoder/layer_8/intermediate/dense/Pow, Execution Time: 0.001634 seconds
Node: bert/encoder/layer_8/intermediate/dense/mul, Execution Time: 0.001581 seconds
Add Node: bert/encoder/layer_8/intermediate/dense/add, Execution Time: 0.001411 seconds
Node: bert/encoder/layer_8/intermediate/dense/mul_1, Execution Time: 0.002158 seconds
Node: bert/encoder/layer_8/intermediate/dense/Tanh, Execution Time: 0.002181 seconds
Add Node: bert/encoder/layer_8/intermediate/dense/add_1, Execution Time: 0.002447 seconds
Node: bert/encoder/layer_8/intermediate/dense/mul_2, Execution Time: 0.001522 seconds
Node: bert/encoder/layer_8/intermediate/dense/mul_3, Execution Time: 0.001564 seconds
Input size: (None, 256, 3072, 768)
No Add node related to MatMul output: bert/encoder/layer_8/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_8/output/dense/MatMul, Execution Time: 0.001133 seconds
Add Node: bert/encoder/layer_8/output/dense/BiasAdd, Execution Time: 0.000553 seconds
Add Node: bert/encoder/layer_8/output/add, Execution Time: 0.000525 seconds
Node: bert/encoder/layer_8/output/LayerNorm/moments/mean, Execution Time: 0.000081 seconds
Node: bert/encoder/layer_8/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000554 seconds
Node: bert/encoder/layer_8/output/LayerNorm/moments/SquaredDifference__425, Execution Time: 0.000521 seconds
Node: bert/encoder/layer_8/output/LayerNorm/moments/variance, Execution Time: 0.000072 seconds
Add Node: bert/encoder/layer_8/output/LayerNorm/batchnorm/add, Execution Time: 0.000053 seconds
Node: bert/encoder/layer_8/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000072 seconds
Node: bert/encoder/layer_8/output/LayerNorm/batchnorm/Rsqrt__427, Execution Time: 0.000072 seconds
Node: bert/encoder/layer_8/output/LayerNorm/batchnorm/mul, Execution Time: 0.000059 seconds
Node: bert/encoder/layer_8/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000055 seconds
Node: bert/encoder/layer_8/output/LayerNorm/batchnorm/sub, Execution Time: 0.000055 seconds
Node: bert/encoder/layer_8/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000489 seconds
Add Node: bert/encoder/layer_8/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000502 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_9/attention/self/value/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_9/attention/self/value/MatMul, Execution Time: 0.000749 seconds
Add Node: bert/encoder/layer_9/attention/self/value/BiasAdd, Execution Time: 0.000525 seconds
Node: bert/encoder/layer_9/attention/self/Reshape_2, Execution Time: 0.000023 seconds
Node: bert/encoder/layer_9/attention/self/transpose_2, Execution Time: 0.000478 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_9/attention/self/query/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_9/attention/self/query/MatMul, Execution Time: 0.000729 seconds
Add Node: bert/encoder/layer_9/attention/self/query/BiasAdd, Execution Time: 0.000517 seconds
Node: bert/encoder/layer_9/attention/self/Reshape, Execution Time: 0.000029 seconds
Node: bert/encoder/layer_9/attention/self/transpose, Execution Time: 0.000518 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_9/attention/self/key/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_9/attention/self/key/MatMul, Execution Time: 0.000738 seconds
Add Node: bert/encoder/layer_9/attention/self/key/BiasAdd, Execution Time: 0.000548 seconds
Node: bert/encoder/layer_9/attention/self/Reshape_1, Execution Time: 0.000026 seconds
Node: bert/encoder/layer_9/attention/self/MatMul__432, Execution Time: 0.000496 seconds
Input size: (12, 256, 64, 256)
No Add node related to MatMul output: bert/encoder/layer_9/attention/self/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_9/attention/self/MatMul, Execution Time: 0.000644 seconds
Node: bert/encoder/layer_9/attention/self/Mul, Execution Time: 0.001557 seconds
Add Node: bert/encoder/layer_9/attention/self/add, Execution Time: 0.001600 seconds
Node: bert/encoder/layer_9/attention/self/Softmax, Execution Time: 0.001492 seconds
Input size: (12, 256, 256, 64)
No Add node related to MatMul output: bert/encoder/layer_9/attention/self/MatMul_1. Executing regular MatMul.
MatMul Node: bert/encoder/layer_9/attention/self/MatMul_1, Execution Time: 0.000706 seconds
Node: bert/encoder/layer_9/attention/self/transpose_3, Execution Time: 0.000526 seconds
Node: bert/encoder/layer_9/attention/self/Reshape_3, Execution Time: 0.000126 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_9/attention/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_9/attention/output/dense/MatMul, Execution Time: 0.000759 seconds
Add Node: bert/encoder/layer_9/attention/output/dense/BiasAdd, Execution Time: 0.000531 seconds
Add Node: bert/encoder/layer_9/attention/output/add, Execution Time: 0.000754 seconds
Node: bert/encoder/layer_9/attention/output/LayerNorm/moments/mean, Execution Time: 0.000087 seconds
Node: bert/encoder/layer_9/attention/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000511 seconds
Node: bert/encoder/layer_9/attention/output/LayerNorm/moments/SquaredDifference__435, Execution Time: 0.000521 seconds
Node: bert/encoder/layer_9/attention/output/LayerNorm/moments/variance, Execution Time: 0.000084 seconds
Add Node: bert/encoder/layer_9/attention/output/LayerNorm/batchnorm/add, Execution Time: 0.000048 seconds
Node: bert/encoder/layer_9/attention/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000050 seconds
Node: bert/encoder/layer_9/attention/output/LayerNorm/batchnorm/Rsqrt__437, Execution Time: 0.000089 seconds
Node: bert/encoder/layer_9/attention/output/LayerNorm/batchnorm/mul, Execution Time: 0.000059 seconds
Node: bert/encoder/layer_9/attention/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000052 seconds
Node: bert/encoder/layer_9/attention/output/LayerNorm/batchnorm/sub, Execution Time: 0.000056 seconds
Node: bert/encoder/layer_9/attention/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000505 seconds
Add Node: bert/encoder/layer_9/attention/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000526 seconds
Input size: (None, 256, 768, 3072)
No Add node related to MatMul output: bert/encoder/layer_9/intermediate/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_9/intermediate/dense/MatMul, Execution Time: 0.000951 seconds
Add Node: bert/encoder/layer_9/intermediate/dense/BiasAdd, Execution Time: 0.001550 seconds
Node: bert/encoder/layer_9/intermediate/dense/Pow, Execution Time: 0.001605 seconds
Node: bert/encoder/layer_9/intermediate/dense/mul, Execution Time: 0.001486 seconds
Add Node: bert/encoder/layer_9/intermediate/dense/add, Execution Time: 0.001552 seconds
Node: bert/encoder/layer_9/intermediate/dense/mul_1, Execution Time: 0.001474 seconds
Node: bert/encoder/layer_9/intermediate/dense/Tanh, Execution Time: 0.001496 seconds
Add Node: bert/encoder/layer_9/intermediate/dense/add_1, Execution Time: 0.001672 seconds
Node: bert/encoder/layer_9/intermediate/dense/mul_2, Execution Time: 0.001510 seconds
Node: bert/encoder/layer_9/intermediate/dense/mul_3, Execution Time: 0.001506 seconds
Input size: (None, 256, 3072, 768)
No Add node related to MatMul output: bert/encoder/layer_9/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_9/output/dense/MatMul, Execution Time: 0.000965 seconds
Add Node: bert/encoder/layer_9/output/dense/BiasAdd, Execution Time: 0.000566 seconds
Add Node: bert/encoder/layer_9/output/add, Execution Time: 0.000555 seconds
Node: bert/encoder/layer_9/output/LayerNorm/moments/mean, Execution Time: 0.000087 seconds
Node: bert/encoder/layer_9/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000504 seconds
Node: bert/encoder/layer_9/output/LayerNorm/moments/SquaredDifference__439, Execution Time: 0.000708 seconds
Node: bert/encoder/layer_9/output/LayerNorm/moments/variance, Execution Time: 0.000077 seconds
Add Node: bert/encoder/layer_9/output/LayerNorm/batchnorm/add, Execution Time: 0.000058 seconds
Node: bert/encoder/layer_9/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000055 seconds
Node: bert/encoder/layer_9/output/LayerNorm/batchnorm/Rsqrt__441, Execution Time: 0.000077 seconds
Node: bert/encoder/layer_9/output/LayerNorm/batchnorm/mul, Execution Time: 0.000058 seconds
Node: bert/encoder/layer_9/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000046 seconds
Node: bert/encoder/layer_9/output/LayerNorm/batchnorm/sub, Execution Time: 0.000047 seconds
Node: bert/encoder/layer_9/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000488 seconds
Add Node: bert/encoder/layer_9/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000522 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_10/attention/self/value/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_10/attention/self/value/MatMul, Execution Time: 0.002145 seconds
Add Node: bert/encoder/layer_10/attention/self/value/BiasAdd, Execution Time: 0.000565 seconds
Node: bert/encoder/layer_10/attention/self/Reshape_2, Execution Time: 0.000023 seconds
Node: bert/encoder/layer_10/attention/self/transpose_2, Execution Time: 0.000578 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_10/attention/self/query/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_10/attention/self/query/MatMul, Execution Time: 0.000732 seconds
Add Node: bert/encoder/layer_10/attention/self/query/BiasAdd, Execution Time: 0.000525 seconds
Node: bert/encoder/layer_10/attention/self/Reshape, Execution Time: 0.000022 seconds
Node: bert/encoder/layer_10/attention/self/transpose, Execution Time: 0.000506 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_10/attention/self/key/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_10/attention/self/key/MatMul, Execution Time: 0.000711 seconds
Add Node: bert/encoder/layer_10/attention/self/key/BiasAdd, Execution Time: 0.000510 seconds
Node: bert/encoder/layer_10/attention/self/Reshape_1, Execution Time: 0.000021 seconds
Node: bert/encoder/layer_10/attention/self/MatMul__446, Execution Time: 0.000484 seconds
Input size: (12, 256, 64, 256)
No Add node related to MatMul output: bert/encoder/layer_10/attention/self/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_10/attention/self/MatMul, Execution Time: 0.000691 seconds
Node: bert/encoder/layer_10/attention/self/Mul, Execution Time: 0.001509 seconds
Add Node: bert/encoder/layer_10/attention/self/add, Execution Time: 0.001477 seconds
Node: bert/encoder/layer_10/attention/self/Softmax, Execution Time: 0.001505 seconds
Input size: (12, 256, 256, 64)
No Add node related to MatMul output: bert/encoder/layer_10/attention/self/MatMul_1. Executing regular MatMul.
MatMul Node: bert/encoder/layer_10/attention/self/MatMul_1, Execution Time: 0.000802 seconds
Node: bert/encoder/layer_10/attention/self/transpose_3, Execution Time: 0.000508 seconds
Node: bert/encoder/layer_10/attention/self/Reshape_3, Execution Time: 0.000071 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_10/attention/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_10/attention/output/dense/MatMul, Execution Time: 0.001301 seconds
Add Node: bert/encoder/layer_10/attention/output/dense/BiasAdd, Execution Time: 0.000725 seconds
Add Node: bert/encoder/layer_10/attention/output/add, Execution Time: 0.000648 seconds
Node: bert/encoder/layer_10/attention/output/LayerNorm/moments/mean, Execution Time: 0.000116 seconds
Node: bert/encoder/layer_10/attention/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000646 seconds
Node: bert/encoder/layer_10/attention/output/LayerNorm/moments/SquaredDifference__449, Execution Time: 0.000779 seconds
Node: bert/encoder/layer_10/attention/output/LayerNorm/moments/variance, Execution Time: 0.000078 seconds
Add Node: bert/encoder/layer_10/attention/output/LayerNorm/batchnorm/add, Execution Time: 0.000062 seconds
Node: bert/encoder/layer_10/attention/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000050 seconds
Node: bert/encoder/layer_10/attention/output/LayerNorm/batchnorm/Rsqrt__451, Execution Time: 0.000078 seconds
Node: bert/encoder/layer_10/attention/output/LayerNorm/batchnorm/mul, Execution Time: 0.000068 seconds
Node: bert/encoder/layer_10/attention/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000069 seconds
Node: bert/encoder/layer_10/attention/output/LayerNorm/batchnorm/sub, Execution Time: 0.000056 seconds
Node: bert/encoder/layer_10/attention/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000544 seconds
Add Node: bert/encoder/layer_10/attention/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000511 seconds
Input size: (None, 256, 768, 3072)
No Add node related to MatMul output: bert/encoder/layer_10/intermediate/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_10/intermediate/dense/MatMul, Execution Time: 0.000758 seconds
Add Node: bert/encoder/layer_10/intermediate/dense/BiasAdd, Execution Time: 0.001694 seconds
Node: bert/encoder/layer_10/intermediate/dense/Pow, Execution Time: 0.001672 seconds
Node: bert/encoder/layer_10/intermediate/dense/mul, Execution Time: 0.001566 seconds
Add Node: bert/encoder/layer_10/intermediate/dense/add, Execution Time: 0.001636 seconds
Node: bert/encoder/layer_10/intermediate/dense/mul_1, Execution Time: 0.001593 seconds
Node: bert/encoder/layer_10/intermediate/dense/Tanh, Execution Time: 0.001675 seconds
Add Node: bert/encoder/layer_10/intermediate/dense/add_1, Execution Time: 0.001609 seconds
Node: bert/encoder/layer_10/intermediate/dense/mul_2, Execution Time: 0.001731 seconds
Node: bert/encoder/layer_10/intermediate/dense/mul_3, Execution Time: 0.001667 seconds
Input size: (None, 256, 3072, 768)
No Add node related to MatMul output: bert/encoder/layer_10/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_10/output/dense/MatMul, Execution Time: 0.001178 seconds
Add Node: bert/encoder/layer_10/output/dense/BiasAdd, Execution Time: 0.000525 seconds
Add Node: bert/encoder/layer_10/output/add, Execution Time: 0.000566 seconds
Node: bert/encoder/layer_10/output/LayerNorm/moments/mean, Execution Time: 0.000088 seconds
Node: bert/encoder/layer_10/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000522 seconds
Node: bert/encoder/layer_10/output/LayerNorm/moments/SquaredDifference__453, Execution Time: 0.000492 seconds
Node: bert/encoder/layer_10/output/LayerNorm/moments/variance, Execution Time: 0.000065 seconds
Add Node: bert/encoder/layer_10/output/LayerNorm/batchnorm/add, Execution Time: 0.000057 seconds
Node: bert/encoder/layer_10/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000059 seconds
Node: bert/encoder/layer_10/output/LayerNorm/batchnorm/Rsqrt__455, Execution Time: 0.000077 seconds
Node: bert/encoder/layer_10/output/LayerNorm/batchnorm/mul, Execution Time: 0.000053 seconds
Node: bert/encoder/layer_10/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000046 seconds
Node: bert/encoder/layer_10/output/LayerNorm/batchnorm/sub, Execution Time: 0.000058 seconds
Node: bert/encoder/layer_10/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000527 seconds
Add Node: bert/encoder/layer_10/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000467 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_11/attention/self/value/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_11/attention/self/value/MatMul, Execution Time: 0.000770 seconds
Add Node: bert/encoder/layer_11/attention/self/value/BiasAdd, Execution Time: 0.000548 seconds
Node: bert/encoder/layer_11/attention/self/Reshape_2, Execution Time: 0.000024 seconds
Node: bert/encoder/layer_11/attention/self/transpose_2, Execution Time: 0.000496 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_11/attention/self/query/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_11/attention/self/query/MatMul, Execution Time: 0.000994 seconds
Add Node: bert/encoder/layer_11/attention/self/query/BiasAdd, Execution Time: 0.000512 seconds
Node: bert/encoder/layer_11/attention/self/Reshape, Execution Time: 0.000021 seconds
Node: bert/encoder/layer_11/attention/self/transpose, Execution Time: 0.000501 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_11/attention/self/key/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_11/attention/self/key/MatMul, Execution Time: 0.000724 seconds
Add Node: bert/encoder/layer_11/attention/self/key/BiasAdd, Execution Time: 0.000537 seconds
Node: bert/encoder/layer_11/attention/self/Reshape_1, Execution Time: 0.000020 seconds
Node: bert/encoder/layer_11/attention/self/MatMul__460, Execution Time: 0.000478 seconds
Input size: (12, 256, 64, 256)
No Add node related to MatMul output: bert/encoder/layer_11/attention/self/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_11/attention/self/MatMul, Execution Time: 0.000700 seconds
Node: bert/encoder/layer_11/attention/self/Mul, Execution Time: 0.001564 seconds
Add Node: bert/encoder/layer_11/attention/self/add, Execution Time: 0.001570 seconds
Node: bert/encoder/layer_11/attention/self/Softmax, Execution Time: 0.001483 seconds
Input size: (12, 256, 256, 64)
No Add node related to MatMul output: bert/encoder/layer_11/attention/self/MatMul_1. Executing regular MatMul.
MatMul Node: bert/encoder/layer_11/attention/self/MatMul_1, Execution Time: 0.000719 seconds
Node: bert/encoder/layer_11/attention/self/transpose_3, Execution Time: 0.000530 seconds
Node: bert/encoder/layer_11/attention/self/Reshape_3, Execution Time: 0.000068 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_11/attention/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_11/attention/output/dense/MatMul, Execution Time: 0.000749 seconds
Add Node: bert/encoder/layer_11/attention/output/dense/BiasAdd, Execution Time: 0.000514 seconds
Add Node: bert/encoder/layer_11/attention/output/add, Execution Time: 0.000556 seconds
Node: bert/encoder/layer_11/attention/output/LayerNorm/moments/mean, Execution Time: 0.000100 seconds
Node: bert/encoder/layer_11/attention/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000525 seconds
Node: bert/encoder/layer_11/attention/output/LayerNorm/moments/SquaredDifference__463, Execution Time: 0.000520 seconds
Node: bert/encoder/layer_11/attention/output/LayerNorm/moments/variance, Execution Time: 0.000067 seconds
Add Node: bert/encoder/layer_11/attention/output/LayerNorm/batchnorm/add, Execution Time: 0.000055 seconds
Node: bert/encoder/layer_11/attention/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000048 seconds
Node: bert/encoder/layer_11/attention/output/LayerNorm/batchnorm/Rsqrt__465, Execution Time: 0.000086 seconds
Node: bert/encoder/layer_11/attention/output/LayerNorm/batchnorm/mul, Execution Time: 0.000049 seconds
Node: bert/encoder/layer_11/attention/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000046 seconds
Node: bert/encoder/layer_11/attention/output/LayerNorm/batchnorm/sub, Execution Time: 0.000064 seconds
Node: bert/encoder/layer_11/attention/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000474 seconds
Add Node: bert/encoder/layer_11/attention/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000592 seconds
Input size: (None, 256, 768, 3072)
No Add node related to MatMul output: bert/encoder/layer_11/intermediate/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_11/intermediate/dense/MatMul, Execution Time: 0.000806 seconds
Add Node: bert/encoder/layer_11/intermediate/dense/BiasAdd, Execution Time: 0.001625 seconds
Node: bert/encoder/layer_11/intermediate/dense/Pow, Execution Time: 0.001478 seconds
Node: bert/encoder/layer_11/intermediate/dense/mul, Execution Time: 0.001571 seconds
Add Node: bert/encoder/layer_11/intermediate/dense/add, Execution Time: 0.001557 seconds
Node: bert/encoder/layer_11/intermediate/dense/mul_1, Execution Time: 0.001958 seconds
Node: bert/encoder/layer_11/intermediate/dense/Tanh, Execution Time: 0.002749 seconds
Add Node: bert/encoder/layer_11/intermediate/dense/add_1, Execution Time: 0.001997 seconds
Node: bert/encoder/layer_11/intermediate/dense/mul_2, Execution Time: 0.001461 seconds
Node: bert/encoder/layer_11/intermediate/dense/mul_3, Execution Time: 0.001569 seconds
Input size: (None, 256, 3072, 768)
No Add node related to MatMul output: bert/encoder/layer_11/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_11/output/dense/MatMul, Execution Time: 0.000994 seconds
Add Node: bert/encoder/layer_11/output/dense/BiasAdd, Execution Time: 0.000538 seconds
Add Node: bert/encoder/layer_11/output/add, Execution Time: 0.000495 seconds
Node: bert/encoder/layer_11/output/LayerNorm/moments/mean, Execution Time: 0.000099 seconds
Node: bert/encoder/layer_11/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000514 seconds
Node: bert/encoder/layer_11/output/LayerNorm/moments/SquaredDifference__467, Execution Time: 0.000520 seconds
Node: bert/encoder/layer_11/output/LayerNorm/moments/variance, Execution Time: 0.000106 seconds
Add Node: bert/encoder/layer_11/output/LayerNorm/batchnorm/add, Execution Time: 0.000049 seconds
Node: bert/encoder/layer_11/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000053 seconds
Node: bert/encoder/layer_11/output/LayerNorm/batchnorm/Rsqrt__469, Execution Time: 0.000093 seconds
Node: bert/encoder/layer_11/output/LayerNorm/batchnorm/mul, Execution Time: 0.000062 seconds
Node: bert/encoder/layer_11/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000056 seconds
Node: bert/encoder/layer_11/output/LayerNorm/batchnorm/sub, Execution Time: 0.000059 seconds
Node: bert/encoder/layer_11/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000480 seconds
Add Node: bert/encoder/layer_11/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000476 seconds
Input size: (None, 256, 768, 2)
No Add node related to MatMul output: MatMul. Executing regular MatMul.
MatMul Node: MatMul, Execution Time: 0.002046 seconds
Add Node: BiasAdd, Execution Time: 0.000067 seconds
Node: Reshape_1, Execution Time: 0.000024 seconds
Node: transpose, Execution Time: 0.000048 seconds
Node: unstack, Execution Time: 0.000057 seconds
Node: unstack__490, Execution Time: 0.000021 seconds
Node: unstack__488, Execution Time: 0.000011 seconds
Node Execution Times:
Total Execution Time: 0.519998 seconds
Total Matmul + Add Execution Time: 0.233186 seconds
Execution complete.
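
The two totals above put MatMul and Add nodes at roughly 45% of the run (0.233186 s of 0.519998 s). The per-node lines can be re-aggregated from a saved copy of this log to cross-check that split. The sketch below is not the profiler's own code: `LINE_RE` and `summarize` are hypothetical names, and the grouping by line prefix (`MatMul Node:`, `Add Node:`, plain `Node:`) is an assumption about how the original script classifies nodes, so the sums may not match the reported totals exactly.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   "MatMul Node: bert/.../MatMul, Execution Time: 0.001133 seconds"
#   "Node: bert/.../Softmax, Execution Time: 0.001492 seconds"
LINE_RE = re.compile(
    r"^(?:(?P<kind>\w+) )?Node: (?P<name>\S+), "
    r"Execution Time: (?P<secs>[\d.]+) seconds$"
)

def summarize(log_text):
    """Sum execution times per line prefix ("MatMul", "Add", or "Other")."""
    totals = defaultdict(float)
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:  # skip "Input size: ...", "No Add node ...", totals, etc.
            totals[m.group("kind") or "Other"] += float(m.group("secs"))
    return dict(totals)
```

Calling `summarize(open(path).read())` on the saved log (with `path` being wherever the log was stored) returns a dict of seconds per prefix, which can be compared against the `Total ...` lines.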
Model outputs: {'unstack:1': array([[-4.9148726, -4.6251225, -4.132886 , -4.1499195, -4.7828836,
-4.250844 , -4.77094 , -4.348463 , -2.7006364, -4.424177 ,
-4.510866 , -4.39433 , -4.773833 , -4.480716 , -4.7714205,
-4.6485815, -3.1330094, -4.7139587, -4.7148943, -4.7223635,
-4.7008233, -4.6960616, -4.7121487, -4.708615 , -4.703374 ,
-4.7024655, -4.687359 , -4.693113 , -4.698162 , -4.692563 ,
-4.711712 , -4.7003703, -4.7027717, -4.7279253, -4.709934 ,
-4.715551 , -4.7324576, -4.7294855, -4.7329216, -4.7218866,
-4.7014203, -4.694692 , -4.6925716, -4.700892 , -4.7044754,
-4.68252 , -4.679993 , -4.6824126, -4.6833754, -4.690988 ,
-4.695919 , -4.6797957, -4.683871 , -4.6834297, -4.680781 ,
-4.686977 , -4.681429 , -4.680897 , -4.694978 , -4.685382 ,
-4.70324 , -4.7010674, -4.693331 , -4.7089696, -4.71908 ,
-4.7188516, -4.70435 , -4.685466 , -4.6962924, -4.6972375,
-4.691828 , -4.688009 , -4.691449 , -4.693622 , -4.6890097,
-4.6876435, -4.684474 , -4.7056074, -4.6984677, -4.7068577,
-4.689911 , -4.687499 , -4.6927333, -4.693831 , -4.6965637,
-4.693646 , -4.693519 , -4.71067 , -4.722037 , -4.718479 ,
-4.729904 , -4.721483 , -4.739112 , -4.7325935, -4.7295456,
-4.712435 , -4.712704 , -4.7114053, -4.712399 , -4.704262 ,
-4.6972833, -4.6926665, -4.717176 , -4.6937675, -4.694539 ,
-4.711683 , -4.685275 , -4.6935816, -4.701117 , -4.6866083,
-4.6843753, -4.6876745, -4.684178 , -4.694061 , -4.6890798,
-4.6861553, -4.7003927, -4.7103863, -4.710601 , -4.7194986,
-4.7016277, -4.718649 , -4.743214 , -4.7109504, -4.711556 ,
-4.7007613, -4.7009783, -4.6995244, -4.7007017, -4.7026825,
-4.706376 , -4.7061615, -4.7284904, -4.724841 , -4.7082043,
-4.7080393, -4.7098503, -4.7207146, -4.733838 , -4.7125974,
-4.7276387, -4.721991 , -4.7300687, -4.7229652, -4.7133346,
-4.7109923, -4.71963 , -4.7312083, -4.733224 , -4.7362647,
-4.739877 , -4.74243 , -4.727128 , -4.737834 , -4.74598 ,
-4.738839 , -4.744508 , -4.728359 , -4.726734 , -4.7255516,
-4.7363386, -4.73214 , -4.7196693, -4.721826 , -4.7047076,
-4.7190104, -4.7156587, -4.706273 , -4.7116737, -4.701518 ,
-4.6943965, -4.6903934, -4.6890545, -4.6862764, -4.6875463,
-4.684304 , -4.688264 , -4.691186 , -4.7027955, -4.6910152,
-4.6985803, -4.7152886, -4.723945 , -4.7293673, -4.7427354,
-4.73977 , -4.7290154, -4.7378254, -4.7355986, -4.731869 ,
-4.724579 , -4.7262163, -4.71887 , -4.7058587, -4.7122684,
-4.7009015, -4.696829 , -4.7094407, -4.703914 , -4.703702 ,
-4.7195215, -4.7118044, -4.709847 , -4.721358 , -4.723019 ,
-4.71298 , -4.7218485, -4.724691 , -4.725982 , -4.726673 ,
-4.7187834, -4.709004 , -4.7109466, -4.737439 , -4.7246385,
-4.73252 , -4.7404885, -4.7261868, -4.734698 , -4.732445 ,
-4.736647 , -4.724646 , -4.73208 , -4.7321663, -4.7037077,
-4.718028 , -4.726786 , -4.7345347, -4.7328334, -4.7220054,
-4.7327023, -4.7200413, -4.7459936, -4.728972 , -4.7290406,
-4.7259574, -4.730495 , -4.723769 , -4.7380366, -4.7268267,
-4.692981 , -4.718449 , -4.6935935, -4.6961823, -4.713647 ,
-4.6950507, -4.700345 , -4.7232556, -4.708386 , -4.737004 ,
-4.7273254, -4.716681 , -4.7106347, -4.714922 , -4.7030454,
-4.7468524]], dtype=float32), 'unstack:0': array([[-5.339778 , -4.878685 , -4.312428 , -4.3309417, -5.125337 ,
-4.442749 , -5.1271124, -4.5656004, -4.683339 , -4.6350813,
-4.8042274, -4.6028423, -5.1304255, -4.7185884, -5.0999007,
-4.9003377, -5.1724668, -5.1058035, -5.1073008, -5.1120396,
-5.0958624, -5.092071 , -5.104314 , -5.1013465, -5.0973773,
-5.0955014, -5.086265 , -5.089708 , -5.093198 , -5.089909 ,
-5.1028776, -5.0938663, -5.0976443, -5.1154556, -5.102868 ,
-5.1068664, -5.1185074, -5.1169963, -5.118672 , -5.1110716,
-5.0957775, -5.0914636, -5.089892 , -5.096351 , -5.099577 ,
-5.084194 , -5.082636 , -5.0841656, -5.0848293, -5.089616 ,
-5.0918293, -5.083179 , -5.084272 , -5.0856056, -5.0826926,
-5.087329 , -5.0841713, -5.0831146, -5.092702 , -5.084974 ,
-5.0978565, -5.0952926, -5.090936 , -5.102818 , -5.110067 ,
-5.1097775, -5.0976253, -5.0851665, -5.0931044, -5.093152 ,
-5.089941 , -5.0872903, -5.0898356, -5.0923924, -5.0875926,
-5.086853 , -5.085301 , -5.100186 , -5.094749 , -5.099969 ,
-5.0874996, -5.0855126, -5.0895004, -5.09137 , -5.0918326,
-5.0898056, -5.090782 , -5.1034665, -5.112412 , -5.109096 ,
-5.1174197, -5.1111536, -5.1241746, -5.1188 , -5.116848 ,
-5.1029363, -5.1041894, -5.103745 , -5.105212 , -5.098095 ,
-5.093282 , -5.090341 , -5.1087084, -5.0905395, -5.0906925,
-5.1039257, -5.084995 , -5.090868 , -5.0939407, -5.0842586,
-5.0840406, -5.0855136, -5.08409 , -5.089621 , -5.0858765,
-5.0852404, -5.09481 , -5.1036887, -5.1036325, -5.1107006,
-5.0964427, -5.109834 , -5.128194 , -5.104343 , -5.10455 ,
-5.0965843, -5.0981956, -5.0968714, -5.0971923, -5.096769 ,
-5.1019425, -5.1022315, -5.119105 , -5.116201 , -5.102627 ,
-5.102922 , -5.1034007, -5.111492 , -5.121706 , -5.1049304,
-5.116994 , -5.111964 , -5.1179514, -5.1140733, -5.1069007,
-5.1045523, -5.1113954, -5.119346 , -5.1202354, -5.1230803,
-5.1247115, -5.125494 , -5.1167865, -5.1235557, -5.127506 ,
-5.1223035, -5.124693 , -5.116798 , -5.1166444, -5.1148844,
-5.1223955, -5.1191473, -5.111838 , -5.112754 , -5.1008034,
-5.1111383, -5.1085505, -5.100999 , -5.1052284, -5.0974274,
-5.0922704, -5.0895066, -5.089077 , -5.086511 , -5.0866723,
-5.0855794, -5.0879817, -5.0893273, -5.0967927, -5.08802 ,
-5.093814 , -5.1059337, -5.112577 , -5.1154685, -5.121607 ,
-5.12036 , -5.114813 , -5.1212907, -5.1178846, -5.117335 ,
-5.1129055, -5.1143084, -5.109348 , -5.100045 , -5.1053514,
-5.0964003, -5.0934987, -5.102238 , -5.0983605, -5.0989766,
-5.1099577, -5.10423 , -5.1023245, -5.1104093, -5.111489 ,
-5.1045485, -5.110909 , -5.112187 , -5.1123652, -5.113932 ,
-5.10867 , -5.0995913, -5.101586 , -5.1216726, -5.111117 ,
-5.116669 , -5.12195 , -5.112778 , -5.1199346, -5.117032 ,
-5.120798 , -5.11272 , -5.117168 , -5.1175523, -5.09827 ,
-5.1082807, -5.1146145, -5.1200075, -5.1190424, -5.112625 ,
-5.1200185, -5.1110024, -5.126168 , -5.1168666, -5.11615 ,
-5.113571 , -5.118028 , -5.1132293, -5.122775 , -5.1154203,
-5.091564 , -5.1100745, -5.0914884, -5.0932784, -5.105365 ,
-5.092105 , -5.0959387, -5.1119223, -5.101221 , -5.1215677,
-5.114091 , -5.10658 , -5.101732 , -5.105737 , -5.0961223,
-5.1260395]], dtype=float32), 'unique_ids:0': array([0])}
Question: What is the capital of France?
Context: The capital of France is Paris.
Answer:
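
The `Answer:` line above is empty even though both logit arrays were produced. For reference, a common way to turn start and end logits into an answer span is to pick the highest-scoring pair of positions inside the context. This is a minimal sketch, not the script that produced this log: `best_span` is a hypothetical helper, and mapping `unstack:0` to start logits and `unstack:1` to end logits is an assumption based on the usual BERT SQuAD export.

```python
import numpy as np

def best_span(start_logits, end_logits, context_start, context_end,
              max_answer_len=30):
    """Return (start, end, score) maximizing start_logits[s] + end_logits[e]
    with context_start <= s <= e <= context_end and a bounded span length."""
    best = (None, None, -np.inf)
    for s in range(context_start, context_end + 1):
        last = min(context_end, s + max_answer_len - 1)
        for e in range(s, last + 1):
            score = start_logits[s] + end_logits[e]
            if score > best[2]:
                best = (s, e, score)
    return best
```

With the input shown at the top of the log, the context tokens would sit after the first `102` (`[SEP]`) token, so `context_start` would be 9; the returned indices then need to be mapped back to tokens by the tokenizer.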
Generating '/tmp/nsys-report-dbd3.qdstrm'
[1/8] [========================100%] nsys-report-a359.nsys-rep
[2/8] [========================100%] nsys-report-4332.sqlite
[3/8] Executing 'nvtx_sum' stats report
[4/8] Executing 'osrt_sum' stats report
Time (%) Total Time (ns) Num Calls Avg (ns) Med (ns) Min (ns) Max (ns) StdDev (ns) Name
-------- --------------- --------- ------------- ------------- ----------- ----------- ------------ ----------------------
53.0 5,379,923,819 68 79,116,526.8 100,140,946.5 1,120 195,318,656 44,171,310.2 poll
44.3 4,501,072,548 9 500,119,172.0 500,089,739.0 500,083,210 500,365,743 92,611.1 pthread_cond_timedwait
1.7 169,966,424 5,645 30,109.2 790.0 290 156,348,110 2,080,925.3 read
0.7 75,547,694 3,053 24,745.4 7,400.0 210 13,567,883 347,835.0 ioctl
0.1 9,689,995 3,189 3,038.6 2,760.0 1,100 47,310 1,529.4 open64
0.0 5,062,449 1 5,062,449.0 5,062,449.0 5,062,449 5,062,449 0.0 nanosleep
0.0 3,655,800 135,467 27.0 20.0 20 6,820 46.9 pthread_cond_signal
0.0 3,051,371 139 21,952.3 5,090.0 1,990 1,588,811 135,272.1 mmap64
0.0 970,652 10 97,065.2 55,206.0 16,790 336,714 113,411.1 sem_timedwait
0.0 896,861 13 68,989.3 60,501.0 54,070 102,672 14,607.2 sleep
0.0 527,226 583 904.3 50.0 20 69,801 5,637.4 fgets
0.0 379,756 8 47,469.5 34,285.5 27,340 90,271 23,307.7 pthread_create
0.0 334,766 27 12,398.7 6,731.0 1,890 79,391 16,644.2 mmap
0.0 306,232 31 9,878.5 6,580.0 590 51,641 13,059.2 write
0.0 298,423 12 24,868.6 9,260.0 2,420 73,561 28,255.3 munmap
0.0 221,827 44 5,041.5 2,970.5 960 24,511 5,539.6 fopen
0.0 129,122 133 970.8 800.0 491 3,360 520.2 pread64
0.0 126,131 1 126,131.0 126,131.0 126,131 126,131 0.0 pthread_cond_wait
0.0 92,441 1 92,441.0 92,441.0 92,441 92,441 0.0 waitpid
0.0 58,821 41 1,434.7 1,120.0 620 4,630 883.9 fclose
0.0 55,951 15 3,730.1 3,190.0 1,820 6,870 1,786.7 open
0.0 55,646 1,622 34.3 30.0 20 5,050 150.7 pthread_cond_broadcast
0.0 35,250 2 17,625.0 17,625.0 9,240 26,010 11,858.2 connect
0.0 30,919 133 232.5 269.0 20 1,020 125.4 sigaction
0.0 30,130 1,211 24.9 20.0 20 230 8.2 flockfile
0.0 29,160 6 4,860.0 4,095.0 2,020 10,640 3,356.3 pipe2
0.0 27,791 4 6,947.8 6,830.0 3,010 11,121 4,054.6 socket
0.0 22,113 68 325.2 295.5 180 1,191 168.8 fcntl
0.0 19,880 6 3,313.3 2,584.5 1,211 7,190 2,139.7 fopen64
0.0 17,775 192 92.6 110.0 20 430 49.7 pthread_mutex_trylock
0.0 15,640 3 5,213.3 5,310.0 1,670 8,660 3,496.0 fread
0.0 6,840 2 3,420.0 3,420.0 1,580 5,260 2,602.2 bind
0.0 3,360 2 1,680.0 1,680.0 1,030 2,330 919.2 fwrite
0.0 2,670 30 89.0 30.0 20 860 174.5 fflush
0.0 2,641 10 264.1 260.0 200 340 53.7 dup
0.0 1,440 2 720.0 720.0 450 990 381.8 dup2
0.0 900 1 900.0 900.0 900 900 0.0 getc
0.0 750 1 750.0 750.0 750 750 0.0 listen
[5/8] Executing 'cuda_api_sum' stats report
Time (%) Total Time (ns) Num Calls Avg (ns) Med (ns) Min (ns) Max (ns) StdDev (ns) Name
-------- --------------- --------- --------- --------- -------- ---------- ----------- ---------------------------------
69.5 554,863,571 1,998 277,709.5 60,631.0 2,210 2,639,775 418,746.2 cudaMemcpyAsync
15.4 123,069,139 1,998 61,596.2 11,050.5 650 266,844 76,253.9 cudaStreamSynchronize
9.6 76,880,061 804 95,622.0 7,510.0 2,640 16,544,823 873,475.2 cudaLaunchKernel
1.4 10,995,963 3,012 3,650.7 2,935.0 490 130,372 3,635.0 cudaDeviceSynchronize
1.2 9,791,616 98 99,914.4 86,646.0 3,490 325,235 87,991.0 cuCtxSynchronize
1.0 7,914,589 3,012 2,627.7 1,610.0 1,180 17,040 2,147.4 cudaEventRecord
0.8 6,255,114 25 250,204.6 900.0 280 6,234,124 1,246,649.9 cudaStreamIsCapturing_v10000
0.4 2,854,366 22 129,743.9 138,422.0 74,141 180,112 27,463.6 cudaMalloc
0.3 2,139,262 3,012 710.2 610.0 250 12,520 550.6 cudaEventCreateWithFlags
0.2 1,402,750 98 14,313.8 13,155.0 8,100 53,211 5,475.2 cuLaunchKernel
0.1 1,129,074 3,012 374.9 320.0 170 4,860 210.7 cudaEventDestroy
0.0 289,084 4 72,271.0 73,451.0 55,660 86,522 13,790.2 cuModuleLoadData
0.0 277,234 50 5,544.7 5,391.0 3,000 11,160 2,037.8 cudaMemsetAsync
0.0 271,102 1,149 235.9 200.0 50 5,130 255.2 cuGetProcAddress_v2
0.0 161,892 1 161,892.0 161,892.0 161,892 161,892 0.0 cudaGetDeviceProperties_v2_v12000
0.0 3,320 1 3,320.0 3,320.0 3,320 3,320 0.0 cuMemFree_v2
0.0 3,320 3 1,106.7 1,340.0 480 1,500 548.6 cuInit
0.0 770 1 770.0 770.0 770 770 0.0 cuCtxSetCurrent
0.0 670 3 223.3 250.0 60 360 151.8 cuModuleGetLoadingMode
[6/8] Executing 'cuda_gpu_kern_sum' stats report
Time (%) Total Time (ns) Instances Avg (ns) Med (ns) Min (ns) Max (ns) StdDev (ns) Name
-------- --------------- --------- -------- -------- -------- -------- ----------- ----------------------------------------------------------------------------------------------------
82.8 9,405,772 97 96,966.7 83,200.0 11,008 319,904 88,359.2 cutlass_tensorop_s1688tf32gemm_256x128_16x3_tt_align4
3.8 427,746 148 2,890.2 2,399.5 1,568 4,993 1,038.5 void at::native::elementwise_kernel<(int)128, (int)2, void at::native::gpu_kernel_impl<at::native::…
3.1 349,890 125 2,799.1 2,368.0 1,312 7,937 1,488.7 void at::native::elementwise_kernel<(int)128, (int)2, void at::native::gpu_kernel_impl<at::native::…
2.9 328,035 196 1,673.6 1,280.0 768 3,104 708.1 void at::native::vectorized_elementwise_kernel<(int)4, at::native::FillFunctor<float>, at::detail::…
1.6 178,464 50 3,569.3 3,520.0 3,488 3,968 108.2 void at::native::reduce_kernel<(int)512, (int)1, at::native::ReduceOp<float, at::native::MeanOps<fl…
1.3 144,578 88 1,642.9 960.0 863 4,353 1,127.0 void at::native::vectorized_elementwise_kernel<(int)4, at::native::CUDAFunctor_add<float>, at::deta…
1.2 131,327 48 2,736.0 2,368.0 2,304 3,968 652.0 void at::native::elementwise_kernel<(int)128, (int)2, void at::native::gpu_kernel_impl<at::native::…
0.9 103,808 12 8,650.7 8,640.0 8,608 8,673 21.0 void at::native::elementwise_kernel<(int)128, (int)2, void at::native::gpu_kernel_impl<at::native::…
0.8 96,448 37 2,606.7 1,824.0 1,760 4,384 1,175.8 void at::native::vectorized_elementwise_kernel<(int)4, at::native::BinaryFunctor<float, float, floa…
0.6 71,105 12 5,925.4 5,936.0 5,856 6,016 66.5 void <unnamed>::softmax_warp_forward<float, float, float, (int)8, (bool)0, (bool)0>(T2 *, const T1 …
0.4 45,790 12 3,815.8 3,808.0 3,712 3,936 65.5 void at::native::vectorized_elementwise_kernel<(int)4, at::native::tanh_kernel_cuda(at::TensorItera…
0.2 25,120 25 1,004.8 992.0 991 1,024 16.0 void at::native::vectorized_elementwise_kernel<(int)4, at::native::sqrt_kernel_cuda(at::TensorItera…
0.2 24,896 25 995.8 992.0 960 1,056 26.6 void at::native::vectorized_elementwise_kernel<(int)4, at::native::reciprocal_kernel_cuda(at::Tenso…
0.2 22,528 25 901.1 896.0 864 928 20.0 void at::native::vectorized_elementwise_kernel<(int)4, at::native::AUnaryFunctor<float, float, floa…
0.1 5,728 1 5,728.0 5,728.0 5,728 5,728 0.0 cutlass_tensorop_s1688tf32gemm_256x128_16x3_tt_align2
0.0 1,600 1 1,600.0 1,600.0 1,600 1,600 0.0 void at::native::<unnamed>::CatArrayBatchedCopy_aligned16_contig<int, unsigned int, (int)1, (int)12…
[7/8] Executing 'cuda_gpu_mem_time_sum' stats report
Time (%) Total Time (ns) Count Avg (ns) Med (ns) Min (ns) Max (ns) StdDev (ns) Operation
-------- --------------- ----- --------- --------- -------- --------- ----------- ----------------------------
55.0 205,251,787 1,254 163,677.7 119,681.0 287 2,364,355 253,334.2 [CUDA memcpy Host-to-Device]
45.0 167,735,526 744 225,451.0 117,216.0 960 1,134,081 287,579.0 [CUDA memcpy Device-to-Host]
0.0 24,832 50 496.6 320.0 287 1,088 261.5 [CUDA memset]
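
To put the two GPU-side tables above in proportion, the following is a minimal sketch that sums the "Total Time (ns)" column of the cuda_gpu_kern_sum report and compares it with the memcpy totals from cuda_gpu_mem_time_sum. The numbers are copied by hand from this log; the variable names are illustrative only. The totals are summed durations, so operations that overlap on different streams would be counted more than once, and the ratio should be read as a rough indicator rather than a wall-clock breakdown.

```python
# "Total Time (ns)" per kernel row, copied from the cuda_gpu_kern_sum table above.
kernel_total_ns = [
    9_405_772, 427_746, 349_890, 328_035, 178_464, 144_578, 131_327, 103_808,
    96_448, 71_105, 45_790, 25_120, 24_896, 22_528, 5_728, 1_600,
]

# "Total Time (ns)" for the memcpy rows of the cuda_gpu_mem_time_sum table above.
h2d_ns = 205_251_787  # [CUDA memcpy Host-to-Device]
d2h_ns = 167_735_526  # [CUDA memcpy Device-to-Host]

kernel_ns = sum(kernel_total_ns)
memcpy_ns = h2d_ns + d2h_ns
ratio = memcpy_ns / kernel_ns

print(f"kernel time : {kernel_ns / 1e6:.2f} ms")
print(f"memcpy time : {memcpy_ns / 1e6:.2f} ms")
print(f"memcpy / kernel ratio: {ratio:.1f}x")
```

On these figures the copies account for far more GPU time than the kernels do, which suggests looking at host/device transfers before tuning any individual kernel.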
[8/8] Executing 'cuda_gpu_mem_size_sum' stats report
Total (MB) Count Avg (MB) Med (MB) Min (MB) Max (MB) StdDev (MB) Operation
---------- ----- -------- -------- -------- -------- ----------- ----------------------------
1,328.322 1,254 1.059 0.786 0.000 9.437 1.597 [CUDA memcpy Host-to-Device]
811.321 744 1.090 0.786 0.000 3.146 1.160 [CUDA memcpy Device-to-Host]
0.000 50 0.000 0.000 0.000 0.000 0.000 [CUDA memset]
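
Combining the size table above with the cuda_gpu_mem_time_sum table gives an effective transfer rate per direction. The sketch below is a back-of-the-envelope calculation using values copied from this log; it assumes that the "MB" reported by nsys means 10^6 bytes, and the helper name `effective_gb_per_s` is hypothetical.

```python
def effective_gb_per_s(total_mb: float, total_ns: int) -> float:
    """Average throughput in GB/s, assuming 1 MB = 10**6 bytes (an assumption)."""
    total_bytes = total_mb * 1e6
    total_seconds = total_ns / 1e9
    return total_bytes / total_seconds / 1e9

# (Total MB from cuda_gpu_mem_size_sum, Total Time ns from cuda_gpu_mem_time_sum)
h2d_gbps = effective_gb_per_s(1328.322, 205_251_787)  # Host-to-Device
d2h_gbps = effective_gb_per_s(811.321, 167_735_526)   # Device-to-Host

print(f"H2D: {h2d_gbps:.2f} GB/s")
print(f"D2H: {d2h_gbps:.2f} GB/s")
```

These are averages over 1,254 and 744 copies respectively, with a median size near 0.786 MB, so they mix many small transfers with a few large ones and say little about the peak rate of the link.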
Generated:
/tmp/nsys-report-a359.nsys-rep
/tmp/nsys-report-4332.sqlite