Model Input Name: unique_ids_raw_output___9:0, Shape: [0]
Model Input Name: segment_ids:0, Shape: [0, 256]
Model Input Name: input_mask:0, Shape: [0, 256]
Model Input Name: input_ids:0, Shape: [0, 256]
Starting model execution...
Inputs Details:
Input Name: input_ids:0
Shape: (1, 256)
Data (first 10 values): [ 101 2054 2003 1996 3007 1997 2605 1029 102 1996]...
--------------------------------------------------
Input Name: segment_ids:0
Shape: (1, 256)
Data (first 10 values): [0 0 0 0 0 0 0 0 0 1]...
--------------------------------------------------
Input Name: input_mask:0
Shape: (1, 256)
Data (first 10 values): [1 1 1 1 1 1 1 1 1 1]...
--------------------------------------------------
Input Name: unique_ids_raw_output___9:0
Shape: (1,)
Data (first 10 values): [0]...
--------------------------------------------------
Node: unique_ids_graph_outputs_Identity__10, Execution Time: 0.000497 seconds
Node: bert/encoder/Shape, Execution Time: 0.000030 seconds
Node: bert/encoder/Shape__12, Execution Time: 0.000043 seconds
Node: bert/encoder/strided_slice, Execution Time: 0.000166 seconds
Node: bert/encoder/strided_slice__16, Execution Time: 0.000030 seconds
Node: bert/encoder/strided_slice__17, Execution Time: 0.000020 seconds
Node: bert/encoder/ones/packed_Unsqueeze__18, Execution Time: 0.000035 seconds
Node: bert/encoder/ones/packed_Concat__21, Execution Time: 0.004864 seconds
Node: bert/encoder/ones__22, Execution Time: 0.000045 seconds
Node: bert/encoder/ones, Execution Time: 0.000072 seconds
Node: bert/encoder/Reshape, Execution Time: 0.000041 seconds
Node: bert/encoder/Cast, Execution Time: 0.000020 seconds
Node: bert/encoder/mul, Execution Time: 0.007905 seconds
Node: bert/encoder/layer_9/attention/self/ExpandDims, Execution Time: 0.000021 seconds
Node: bert/encoder/layer_9/attention/self/sub, Execution Time: 0.006667 seconds
Node: bert/encoder/layer_9/attention/self/mul_1, Execution Time: 0.000229 seconds
Node: bert/embeddings/Reshape_2, Execution Time: 0.000020 seconds
Node: bert/embeddings/Reshape, Execution Time: 0.000004 seconds
Node: bert/embeddings/GatherV2, Execution Time: 0.000160 seconds
Node: bert/embeddings/Reshape_1, Execution Time: 0.000020 seconds
Node: bert/embeddings/one_hot, Execution Time: 0.000218 seconds
Input size: (None, 256, 2, 768)
No Add node related to MatMul output: bert/embeddings/MatMul. Executing regular MatMul.
MatMul Node: bert/embeddings/MatMul, Execution Time: 0.025803 seconds
Node: bert/embeddings/Reshape_3, Execution Time: 0.000024 seconds
Add Node: bert/embeddings/add, Execution Time: 0.000617 seconds
Add Node: bert/embeddings/add_1, Execution Time: 0.000539 seconds
Node: bert/embeddings/LayerNorm/moments/mean, Execution Time: 0.005122 seconds
Node: bert/embeddings/LayerNorm/moments/SquaredDifference, Execution Time: 0.000512 seconds
Node: bert/embeddings/LayerNorm/moments/SquaredDifference__72, Execution Time: 0.000581 seconds
Node: bert/embeddings/LayerNorm/moments/variance, Execution Time: 0.000065 seconds
Add Node: bert/embeddings/LayerNorm/batchnorm/add, Execution Time: 0.000063 seconds
Node: bert/embeddings/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.010223 seconds
Node: bert/embeddings/LayerNorm/batchnorm/Rsqrt__74, Execution Time: 0.005414 seconds
Node: bert/embeddings/LayerNorm/batchnorm/mul, Execution Time: 0.000056 seconds
Node: bert/embeddings/LayerNorm/batchnorm/mul_2, Execution Time: 0.000059 seconds
Node: bert/embeddings/LayerNorm/batchnorm/sub, Execution Time: 0.000057 seconds
Node: bert/embeddings/LayerNorm/batchnorm/mul_1, Execution Time: 0.000468 seconds
Add Node: bert/embeddings/LayerNorm/batchnorm/add_1, Execution Time: 0.000573 seconds
Node: bert/encoder/Reshape_1, Execution Time: 0.000024 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_0/attention/self/value/MatMul. Executing regular MatMul.
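The nodes bert/encoder/ones, Reshape, Cast and mul, followed by layer_9/attention/self/ExpandDims, sub and mul_1, look like the attention-mask construction from the original TensorFlow BERT code, which turns input_mask into an additive bias for the attention scores. A minimal NumPy sketch, assuming that convention (including the -10000.0 fill value, which does not appear anywhere in this log):

```python
import numpy as np


def extended_attention_mask(input_mask, neg=-10000.0):
    """Build an additive attention bias from a 0/1 input_mask.

    Mirrors the assumed meaning of bert/encoder/{ones, Reshape, Cast, mul}
    and layer_9/attention/self/{ExpandDims, sub, mul_1}.
    input_mask: integer array (batch, seq_len), 1 = real token, 0 = padding.
    Returns a float32 array of shape (batch, 1, seq_len, seq_len).
    """
    batch, seq_len = input_mask.shape
    # Reshape + Cast: (batch, 1, seq_len) float copy of the mask
    to_mask = input_mask.reshape(batch, 1, seq_len).astype(np.float32)
    # ones * to_mask: broadcast to (batch, seq_len, seq_len), one row per query
    mask = np.ones((batch, seq_len, 1), dtype=np.float32) * to_mask
    # ExpandDims: add a head axis so the bias broadcasts over all heads
    mask = np.expand_dims(mask, axis=1)
    # sub + mul_1: 0.0 where attending is allowed, `neg` on padding keys
    return (1.0 - mask) * neg
```

Padding positions receive a large negative bias, so Softmax gives them near-zero weight. In this trace the mask nodes run once, before the embeddings, even though they carry a layer_9 name.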
MatMul Node: bert/encoder/layer_0/attention/self/value/MatMul, Execution Time: 0.001978 seconds
Add Node: bert/encoder/layer_0/attention/self/value/BiasAdd, Execution Time: 0.000459 seconds
Node: bert/encoder/layer_0/attention/self/Reshape_2, Execution Time: 0.000020 seconds
Node: bert/encoder/layer_0/attention/self/transpose_2, Execution Time: 0.000455 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_0/attention/self/query/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_0/attention/self/query/MatMul, Execution Time: 0.000855 seconds
Add Node: bert/encoder/layer_0/attention/self/query/BiasAdd, Execution Time: 0.000456 seconds
Node: bert/encoder/layer_0/attention/self/Reshape, Execution Time: 0.000010 seconds
Node: bert/encoder/layer_0/attention/self/transpose, Execution Time: 0.000475 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_0/attention/self/key/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_0/attention/self/key/MatMul, Execution Time: 0.000611 seconds
Add Node: bert/encoder/layer_0/attention/self/key/BiasAdd, Execution Time: 0.000486 seconds
Node: bert/encoder/layer_0/attention/self/Reshape_1, Execution Time: 0.000009 seconds
Node: bert/encoder/layer_0/attention/self/MatMul__306, Execution Time: 0.000471 seconds
Input size: (12, 256, 64, 256)
No Add node related to MatMul output: bert/encoder/layer_0/attention/self/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_0/attention/self/MatMul, Execution Time: 0.001572 seconds
Node: bert/encoder/layer_0/attention/self/Mul, Execution Time: 0.001380 seconds
Add Node: bert/encoder/layer_0/attention/self/add, Execution Time: 0.001374 seconds
Node: bert/encoder/layer_0/attention/self/Softmax, Execution Time: 0.009023 seconds
Input size: (12, 256, 256, 64)
No Add node related to MatMul output: bert/encoder/layer_0/attention/self/MatMul_1. Executing regular MatMul.
MatMul Node: bert/encoder/layer_0/attention/self/MatMul_1, Execution Time: 0.000642 seconds
Node: bert/encoder/layer_0/attention/self/transpose_3, Execution Time: 0.000459 seconds
Node: bert/encoder/layer_0/attention/self/Reshape_3, Execution Time: 0.000065 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_0/attention/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_0/attention/output/dense/MatMul, Execution Time: 0.000608 seconds
Add Node: bert/encoder/layer_0/attention/output/dense/BiasAdd, Execution Time: 0.000476 seconds
Add Node: bert/encoder/layer_0/attention/output/add, Execution Time: 0.000619 seconds
Node: bert/encoder/layer_0/attention/output/LayerNorm/moments/mean, Execution Time: 0.000072 seconds
Node: bert/encoder/layer_0/attention/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000467 seconds
Node: bert/encoder/layer_0/attention/output/LayerNorm/moments/SquaredDifference__309, Execution Time: 0.000468 seconds
Node: bert/encoder/layer_0/attention/output/LayerNorm/moments/variance, Execution Time: 0.000065 seconds
Add Node: bert/encoder/layer_0/attention/output/LayerNorm/batchnorm/add, Execution Time: 0.000053 seconds
Node: bert/encoder/layer_0/attention/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000051 seconds
Node: bert/encoder/layer_0/attention/output/LayerNorm/batchnorm/Rsqrt__311, Execution Time: 0.000068 seconds
Node: bert/encoder/layer_0/attention/output/LayerNorm/batchnorm/mul, Execution Time: 0.000054 seconds
Node: bert/encoder/layer_0/attention/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000054 seconds
Node: bert/encoder/layer_0/attention/output/LayerNorm/batchnorm/sub, Execution Time: 0.000052 seconds
Node: bert/encoder/layer_0/attention/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000454 seconds
Add Node: bert/encoder/layer_0/attention/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000539 seconds
Input size: (None, 256, 768, 3072)
No Add node related to MatMul output: bert/encoder/layer_0/intermediate/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_0/intermediate/dense/MatMul, Execution Time: 0.000634 seconds
Add Node: bert/encoder/layer_0/intermediate/dense/BiasAdd, Execution Time: 0.001340 seconds
Node: bert/encoder/layer_0/intermediate/dense/Pow, Execution Time: 0.018156 seconds
Node: bert/encoder/layer_0/intermediate/dense/mul, Execution Time: 0.001935 seconds
Add Node: bert/encoder/layer_0/intermediate/dense/add, Execution Time: 0.001330 seconds
Node: bert/encoder/layer_0/intermediate/dense/mul_1, Execution Time: 0.001392 seconds
Node: bert/encoder/layer_0/intermediate/dense/Tanh, Execution Time: 0.003783 seconds
Add Node: bert/encoder/layer_0/intermediate/dense/add_1, Execution Time: 0.001652 seconds
Node: bert/encoder/layer_0/intermediate/dense/mul_2, Execution Time: 0.001321 seconds
Node: bert/encoder/layer_0/intermediate/dense/mul_3, Execution Time: 0.001385 seconds
Input size: (None, 256, 3072, 768)
No Add node related to MatMul output: bert/encoder/layer_0/output/dense/MatMul. Executing regular MatMul.
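Each layer's attention/self nodes follow the usual multi-head attention pattern: value, query and key projections (MatMul + BiasAdd), a Reshape/transpose into 12 heads of size 64, the score MatMul logged with input size (12, 256, 64, 256), a scaling Mul, the mask add, Softmax, then MatMul_1 (12, 256, 256, 64) and transpose_3/Reshape_3 back to 768 features. The sketch below reproduces that order for a single sequence; the parameter names are hypothetical, and the 1/sqrt(head_dim) scale is assumed from the standard formulation rather than read from the model.

```python
import numpy as np


def self_attention(x, wq, bq, wk, bk, wv, bv, add_mask, num_heads=12):
    """Multi-head self-attention for one sequence, in the trace's node order.

    x: (seq_len, hidden); w*: (hidden, hidden); b*: (hidden,).
    add_mask: additive bias broadcastable to (num_heads, seq_len, seq_len).
    Parameter names are illustrative, not the model's initializer names.
    """
    seq_len, hidden = x.shape
    head_dim = hidden // num_heads  # 768 // 12 = 64 in this model

    def project(w, b):
        # MatMul + BiasAdd, then Reshape/transpose to (heads, seq_len, head_dim)
        y = x @ w + b
        return y.reshape(seq_len, num_heads, head_dim).transpose(1, 0, 2)

    v = project(wv, bv)  # value/MatMul, value/BiasAdd, Reshape_2, transpose_2
    q = project(wq, bq)  # query/MatMul, query/BiasAdd, Reshape, transpose
    k = project(wk, bk)  # key/MatMul, key/BiasAdd, Reshape_1
    # Score MatMul (heads, M=seq, K=head_dim, N=seq), then Mul by 1/sqrt(head_dim)
    scores = (q @ k.transpose(0, 2, 1)) * (1.0 / np.sqrt(head_dim))
    scores = scores + add_mask  # add: padding bias
    # Softmax over the key axis (row max subtracted for numerical stability)
    scores = scores - scores.max(axis=-1, keepdims=True)
    probs = np.exp(scores)
    probs = probs / probs.sum(axis=-1, keepdims=True)
    ctx = probs @ v  # MatMul_1: (heads, seq, seq) @ (heads, seq, head_dim)
    # transpose_3 + Reshape_3: back to (seq_len, hidden)
    return ctx.transpose(1, 0, 2).reshape(seq_len, hidden)
```

Splitting 768 features into 12 heads keeps each score product small (64-wide), which matches the K = 64 and N = 64 entries in the logged input sizes.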
MatMul Node: bert/encoder/layer_0/output/dense/MatMul, Execution Time: 0.000917 seconds
Add Node: bert/encoder/layer_0/output/dense/BiasAdd, Execution Time: 0.000492 seconds
Add Node: bert/encoder/layer_0/output/add, Execution Time: 0.000489 seconds
Node: bert/encoder/layer_0/output/LayerNorm/moments/mean, Execution Time: 0.000070 seconds
Node: bert/encoder/layer_0/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000472 seconds
Node: bert/encoder/layer_0/output/LayerNorm/moments/SquaredDifference__313, Execution Time: 0.000493 seconds
Node: bert/encoder/layer_0/output/LayerNorm/moments/variance, Execution Time: 0.000053 seconds
Add Node: bert/encoder/layer_0/output/LayerNorm/batchnorm/add, Execution Time: 0.000043 seconds
Node: bert/encoder/layer_0/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000047 seconds
Node: bert/encoder/layer_0/output/LayerNorm/batchnorm/Rsqrt__315, Execution Time: 0.000067 seconds
Node: bert/encoder/layer_0/output/LayerNorm/batchnorm/mul, Execution Time: 0.000056 seconds
Node: bert/encoder/layer_0/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000054 seconds
Node: bert/encoder/layer_0/output/LayerNorm/batchnorm/sub, Execution Time: 0.000055 seconds
Node: bert/encoder/layer_0/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000472 seconds
Add Node: bert/encoder/layer_0/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000493 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_1/attention/self/value/MatMul. Executing regular MatMul.
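The LayerNorm/moments and LayerNorm/batchnorm node names match TensorFlow's layer-norm lowering, which computes the per-token mean and variance and then folds gamma and beta into a single scale and shift. The sketch below follows the same node order; the epsilon of 1e-12 is an assumption, since the constant is not visible in the log.

```python
import numpy as np


def layer_norm_batchnorm_form(x, gamma, beta, eps=1e-12):
    """LayerNorm over the last axis, written in the trace's node order.

    eps=1e-12 is an assumed default; the model's constant is not in the log.
    """
    mean = x.mean(axis=-1, keepdims=True)            # moments/mean
    sq_diff = (x - mean) ** 2                        # moments/SquaredDifference
    variance = sq_diff.mean(axis=-1, keepdims=True)  # moments/variance
    inv = 1.0 / np.sqrt(variance + eps)              # batchnorm/add + Rsqrt
    scale = inv * gamma                              # batchnorm/mul
    shift = beta - mean * scale                      # batchnorm/mul_2 + sub
    return x * scale + shift                         # batchnorm/mul_1 + add_1
```

The rearrangement is algebraically equal to gamma * (x - mean) / sqrt(variance + eps) + beta; it lets the final step be one multiply and one add per element (mul_1, add_1), which are the only two LayerNorm nodes here that touch the full (256, 768) tensor besides SquaredDifference.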
MatMul Node: bert/encoder/layer_1/attention/self/value/MatMul, Execution Time: 0.000622 seconds
Add Node: bert/encoder/layer_1/attention/self/value/BiasAdd, Execution Time: 0.000484 seconds
Node: bert/encoder/layer_1/attention/self/Reshape_2, Execution Time: 0.000020 seconds
Node: bert/encoder/layer_1/attention/self/transpose_2, Execution Time: 0.000481 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_1/attention/self/query/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_1/attention/self/query/MatMul, Execution Time: 0.000583 seconds
Add Node: bert/encoder/layer_1/attention/self/query/BiasAdd, Execution Time: 0.000481 seconds
Node: bert/encoder/layer_1/attention/self/Reshape, Execution Time: 0.000010 seconds
Node: bert/encoder/layer_1/attention/self/transpose, Execution Time: 0.000438 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_1/attention/self/key/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_1/attention/self/key/MatMul, Execution Time: 0.000589 seconds
Add Node: bert/encoder/layer_1/attention/self/key/BiasAdd, Execution Time: 0.000462 seconds
Node: bert/encoder/layer_1/attention/self/Reshape_1, Execution Time: 0.000010 seconds
Node: bert/encoder/layer_1/attention/self/MatMul__320, Execution Time: 0.000445 seconds
Input size: (12, 256, 64, 256)
No Add node related to MatMul output: bert/encoder/layer_1/attention/self/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_1/attention/self/MatMul, Execution Time: 0.000498 seconds
Node: bert/encoder/layer_1/attention/self/Mul, Execution Time: 0.001336 seconds
Add Node: bert/encoder/layer_1/attention/self/add, Execution Time: 0.001386 seconds
Node: bert/encoder/layer_1/attention/self/Softmax, Execution Time: 0.001339 seconds
Input size: (12, 256, 256, 64)
No Add node related to MatMul output: bert/encoder/layer_1/attention/self/MatMul_1. Executing regular MatMul.
MatMul Node: bert/encoder/layer_1/attention/self/MatMul_1, Execution Time: 0.000655 seconds
Node: bert/encoder/layer_1/attention/self/transpose_3, Execution Time: 0.000478 seconds
Node: bert/encoder/layer_1/attention/self/Reshape_3, Execution Time: 0.000052 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_1/attention/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_1/attention/output/dense/MatMul, Execution Time: 0.000575 seconds
Add Node: bert/encoder/layer_1/attention/output/dense/BiasAdd, Execution Time: 0.000460 seconds
Add Node: bert/encoder/layer_1/attention/output/add, Execution Time: 0.000628 seconds
Node: bert/encoder/layer_1/attention/output/LayerNorm/moments/mean, Execution Time: 0.000069 seconds
Node: bert/encoder/layer_1/attention/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000452 seconds
Node: bert/encoder/layer_1/attention/output/LayerNorm/moments/SquaredDifference__323, Execution Time: 0.000468 seconds
Node: bert/encoder/layer_1/attention/output/LayerNorm/moments/variance, Execution Time: 0.000052 seconds
Add Node: bert/encoder/layer_1/attention/output/LayerNorm/batchnorm/add, Execution Time: 0.000041 seconds
Node: bert/encoder/layer_1/attention/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000052 seconds
Node: bert/encoder/layer_1/attention/output/LayerNorm/batchnorm/Rsqrt__325, Execution Time: 0.000072 seconds
Node: bert/encoder/layer_1/attention/output/LayerNorm/batchnorm/mul, Execution Time: 0.000057 seconds
Node: bert/encoder/layer_1/attention/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000046 seconds
Node: bert/encoder/layer_1/attention/output/LayerNorm/batchnorm/sub, Execution Time: 0.000053 seconds
Node: bert/encoder/layer_1/attention/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000453 seconds
Add Node: bert/encoder/layer_1/attention/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000458 seconds
Input size: (None, 256, 768, 3072)
No Add node related to MatMul output: bert/encoder/layer_1/intermediate/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_1/intermediate/dense/MatMul, Execution Time: 0.000684 seconds
Add Node: bert/encoder/layer_1/intermediate/dense/BiasAdd, Execution Time: 0.001391 seconds
Node: bert/encoder/layer_1/intermediate/dense/Pow, Execution Time: 0.001334 seconds
Node: bert/encoder/layer_1/intermediate/dense/mul, Execution Time: 0.001634 seconds
Add Node: bert/encoder/layer_1/intermediate/dense/add, Execution Time: 0.001318 seconds
Node: bert/encoder/layer_1/intermediate/dense/mul_1, Execution Time: 0.001405 seconds
Node: bert/encoder/layer_1/intermediate/dense/Tanh, Execution Time: 0.001327 seconds
Add Node: bert/encoder/layer_1/intermediate/dense/add_1, Execution Time: 0.001342 seconds
Node: bert/encoder/layer_1/intermediate/dense/mul_2, Execution Time: 0.001412 seconds
Node: bert/encoder/layer_1/intermediate/dense/mul_3, Execution Time: 0.001328 seconds
Input size: (None, 256, 3072, 768)
No Add node related to MatMul output: bert/encoder/layer_1/output/dense/MatMul. Executing regular MatMul.
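The eight nodes after BiasAdd under intermediate/dense (Pow, mul, add, mul_1, Tanh, add_1, mul_2, mul_3) have the shape of the tanh approximation of GELU used in the original BERT code. A scalar sketch mapping each node to one step, assuming the standard constants 0.044715 and sqrt(2/pi), which are not shown in the trace:

```python
import math


def gelu_tanh(x: float) -> float:
    """Tanh approximation of GELU; constants are the standard (assumed) ones."""
    cubed = math.pow(x, 3)                    # Pow
    inner = x + 0.044715 * cubed              # mul, add
    inner = math.sqrt(2.0 / math.pi) * inner  # mul_1
    cdf = 0.5 * (1.0 + math.tanh(inner))      # Tanh, add_1, mul_2
    return x * cdf                            # mul_3
```

Because each step runs as a separate elementwise node over the (256, 3072) intermediate tensor, the activation makes roughly eight passes over it, which may explain why each of these nodes takes about twice as long as the intermediate MatMul itself in layers 1 to 4.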
MatMul Node: bert/encoder/layer_1/output/dense/MatMul, Execution Time: 0.000919 seconds
Add Node: bert/encoder/layer_1/output/dense/BiasAdd, Execution Time: 0.000513 seconds
Add Node: bert/encoder/layer_1/output/add, Execution Time: 0.000639 seconds
Node: bert/encoder/layer_1/output/LayerNorm/moments/mean, Execution Time: 0.000080 seconds
Node: bert/encoder/layer_1/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000468 seconds
Node: bert/encoder/layer_1/output/LayerNorm/moments/SquaredDifference__327, Execution Time: 0.000491 seconds
Node: bert/encoder/layer_1/output/LayerNorm/moments/variance, Execution Time: 0.000054 seconds
Add Node: bert/encoder/layer_1/output/LayerNorm/batchnorm/add, Execution Time: 0.000051 seconds
Node: bert/encoder/layer_1/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000051 seconds
Node: bert/encoder/layer_1/output/LayerNorm/batchnorm/Rsqrt__329, Execution Time: 0.000069 seconds
Node: bert/encoder/layer_1/output/LayerNorm/batchnorm/mul, Execution Time: 0.000053 seconds
Node: bert/encoder/layer_1/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000041 seconds
Node: bert/encoder/layer_1/output/LayerNorm/batchnorm/sub, Execution Time: 0.000053 seconds
Node: bert/encoder/layer_1/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000468 seconds
Add Node: bert/encoder/layer_1/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000599 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_2/attention/self/value/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_2/attention/self/value/MatMul, Execution Time: 0.000905 seconds
Add Node: bert/encoder/layer_2/attention/self/value/BiasAdd, Execution Time: 0.000607 seconds
Node: bert/encoder/layer_2/attention/self/Reshape_2, Execution Time: 0.000028 seconds
Node: bert/encoder/layer_2/attention/self/transpose_2, Execution Time: 0.000581 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_2/attention/self/query/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_2/attention/self/query/MatMul, Execution Time: 0.000616 seconds
Add Node: bert/encoder/layer_2/attention/self/query/BiasAdd, Execution Time: 0.000477 seconds
Node: bert/encoder/layer_2/attention/self/Reshape, Execution Time: 0.000011 seconds
Node: bert/encoder/layer_2/attention/self/transpose, Execution Time: 0.000478 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_2/attention/self/key/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_2/attention/self/key/MatMul, Execution Time: 0.000656 seconds
Add Node: bert/encoder/layer_2/attention/self/key/BiasAdd, Execution Time: 0.000499 seconds
Node: bert/encoder/layer_2/attention/self/Reshape_1, Execution Time: 0.000010 seconds
Node: bert/encoder/layer_2/attention/self/MatMul__334, Execution Time: 0.000461 seconds
Input size: (12, 256, 64, 256)
No Add node related to MatMul output: bert/encoder/layer_2/attention/self/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_2/attention/self/MatMul, Execution Time: 0.000500 seconds
Node: bert/encoder/layer_2/attention/self/Mul, Execution Time: 0.001413 seconds
Add Node: bert/encoder/layer_2/attention/self/add, Execution Time: 0.002262 seconds
Node: bert/encoder/layer_2/attention/self/Softmax, Execution Time: 0.001362 seconds
Input size: (12, 256, 256, 64)
No Add node related to MatMul output: bert/encoder/layer_2/attention/self/MatMul_1. Executing regular MatMul.
MatMul Node: bert/encoder/layer_2/attention/self/MatMul_1, Execution Time: 0.000561 seconds
Node: bert/encoder/layer_2/attention/self/transpose_3, Execution Time: 0.000498 seconds
Node: bert/encoder/layer_2/attention/self/Reshape_3, Execution Time: 0.000050 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_2/attention/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_2/attention/output/dense/MatMul, Execution Time: 0.000587 seconds
Add Node: bert/encoder/layer_2/attention/output/dense/BiasAdd, Execution Time: 0.000457 seconds
Add Node: bert/encoder/layer_2/attention/output/add, Execution Time: 0.000584 seconds
Node: bert/encoder/layer_2/attention/output/LayerNorm/moments/mean, Execution Time: 0.000088 seconds
Node: bert/encoder/layer_2/attention/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000456 seconds
Node: bert/encoder/layer_2/attention/output/LayerNorm/moments/SquaredDifference__337, Execution Time: 0.000495 seconds
Node: bert/encoder/layer_2/attention/output/LayerNorm/moments/variance, Execution Time: 0.000054 seconds
Add Node: bert/encoder/layer_2/attention/output/LayerNorm/batchnorm/add, Execution Time: 0.000051 seconds
Node: bert/encoder/layer_2/attention/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000051 seconds
Node: bert/encoder/layer_2/attention/output/LayerNorm/batchnorm/Rsqrt__339, Execution Time: 0.000074 seconds
Node: bert/encoder/layer_2/attention/output/LayerNorm/batchnorm/mul, Execution Time: 0.000056 seconds
Node: bert/encoder/layer_2/attention/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000052 seconds
Node: bert/encoder/layer_2/attention/output/LayerNorm/batchnorm/sub, Execution Time: 0.000058 seconds
Node: bert/encoder/layer_2/attention/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000442 seconds
Add Node: bert/encoder/layer_2/attention/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000456 seconds
Input size: (None, 256, 768, 3072)
No Add node related to MatMul output: bert/encoder/layer_2/intermediate/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_2/intermediate/dense/MatMul, Execution Time: 0.000642 seconds
Add Node: bert/encoder/layer_2/intermediate/dense/BiasAdd, Execution Time: 0.001408 seconds
Node: bert/encoder/layer_2/intermediate/dense/Pow, Execution Time: 0.001425 seconds
Node: bert/encoder/layer_2/intermediate/dense/mul, Execution Time: 0.001326 seconds
Add Node: bert/encoder/layer_2/intermediate/dense/add, Execution Time: 0.001330 seconds
Node: bert/encoder/layer_2/intermediate/dense/mul_1, Execution Time: 0.001393 seconds
Node: bert/encoder/layer_2/intermediate/dense/Tanh, Execution Time: 0.001312 seconds
Add Node: bert/encoder/layer_2/intermediate/dense/add_1, Execution Time: 0.001741 seconds
Node: bert/encoder/layer_2/intermediate/dense/mul_2, Execution Time: 0.001384 seconds
Node: bert/encoder/layer_2/intermediate/dense/mul_3, Execution Time: 0.001297 seconds
Input size: (None, 256, 3072, 768)
No Add node related to MatMul output: bert/encoder/layer_2/output/dense/MatMul. Executing regular MatMul.
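The "Input size" tuples appear to be (batch, M, K, N) for an (M x K) @ (K x N) product: (None, 256, 768, 768) for the 768-to-768 projections, (None, 256, 768, 3072) and (None, 256, 3072, 768) for the feed-forward layers, and (12, 256, 64, 256) / (12, 256, 256, 64) for the per-head score and context products. Under that reading, which is an inference from the shapes rather than something the log states, a rough throughput figure for any logged MatMul is 2 * B * M * K * N divided by its time. A small helper, treating None as batch 1 because every input in this run has shape (1, 256):

```python
def matmul_gflops(input_size, seconds, batch_if_none=1):
    """Rough GFLOP/s for one logged MatMul.

    Assumes the logged "Input size" tuple means (batch, M, K, N), i.e. an
    (M x K) @ (K x N) product repeated `batch` times. A None batch (the
    dynamic dimension) is replaced by batch_if_none.
    """
    batch, m, k, n = input_size
    if batch is None:
        batch = batch_if_none
    flops = 2 * batch * m * k * n  # one multiply and one add per multiply-accumulate
    return flops / seconds / 1e9


# layer_2/intermediate/dense/MatMul from the trace
print(round(matmul_gflops((None, 256, 768, 3072), 0.000642)))  # 1882
```

Treat these figures as rough: per-node wall-clock times can include dispatch overhead, and on an asynchronous device they may not measure the kernel at all.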
MatMul Node: bert/encoder/layer_2/output/dense/MatMul, Execution Time: 0.000920 seconds
Add Node: bert/encoder/layer_2/output/dense/BiasAdd, Execution Time: 0.000510 seconds
Add Node: bert/encoder/layer_2/output/add, Execution Time: 0.000488 seconds
Node: bert/encoder/layer_2/output/LayerNorm/moments/mean, Execution Time: 0.000071 seconds
Node: bert/encoder/layer_2/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000541 seconds
Node: bert/encoder/layer_2/output/LayerNorm/moments/SquaredDifference__341, Execution Time: 0.000462 seconds
Node: bert/encoder/layer_2/output/LayerNorm/moments/variance, Execution Time: 0.000053 seconds
Add Node: bert/encoder/layer_2/output/LayerNorm/batchnorm/add, Execution Time: 0.000051 seconds
Node: bert/encoder/layer_2/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000047 seconds
Node: bert/encoder/layer_2/output/LayerNorm/batchnorm/Rsqrt__343, Execution Time: 0.000073 seconds
Node: bert/encoder/layer_2/output/LayerNorm/batchnorm/mul, Execution Time: 0.000054 seconds
Node: bert/encoder/layer_2/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000052 seconds
Node: bert/encoder/layer_2/output/LayerNorm/batchnorm/sub, Execution Time: 0.000053 seconds
Node: bert/encoder/layer_2/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000454 seconds
Add Node: bert/encoder/layer_2/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000455 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_3/attention/self/value/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_3/attention/self/value/MatMul, Execution Time: 0.000614 seconds
Add Node: bert/encoder/layer_3/attention/self/value/BiasAdd, Execution Time: 0.000466 seconds
Node: bert/encoder/layer_3/attention/self/Reshape_2, Execution Time: 0.000020 seconds
Node: bert/encoder/layer_3/attention/self/transpose_2, Execution Time: 0.000468 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_3/attention/self/query/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_3/attention/self/query/MatMul, Execution Time: 0.000611 seconds
Add Node: bert/encoder/layer_3/attention/self/query/BiasAdd, Execution Time: 0.000453 seconds
Node: bert/encoder/layer_3/attention/self/Reshape, Execution Time: 0.000010 seconds
Node: bert/encoder/layer_3/attention/self/transpose, Execution Time: 0.000478 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_3/attention/self/key/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_3/attention/self/key/MatMul, Execution Time: 0.000578 seconds
Add Node: bert/encoder/layer_3/attention/self/key/BiasAdd, Execution Time: 0.000452 seconds
Node: bert/encoder/layer_3/attention/self/Reshape_1, Execution Time: 0.000009 seconds
Node: bert/encoder/layer_3/attention/self/MatMul__348, Execution Time: 0.000477 seconds
Input size: (12, 256, 64, 256)
No Add node related to MatMul output: bert/encoder/layer_3/attention/self/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_3/attention/self/MatMul, Execution Time: 0.001466 seconds
Node: bert/encoder/layer_3/attention/self/Mul, Execution Time: 0.001347 seconds
Add Node: bert/encoder/layer_3/attention/self/add, Execution Time: 0.001328 seconds
Node: bert/encoder/layer_3/attention/self/Softmax, Execution Time: 0.001364 seconds
Input size: (12, 256, 256, 64)
No Add node related to MatMul output: bert/encoder/layer_3/attention/self/MatMul_1. Executing regular MatMul.
MatMul Node: bert/encoder/layer_3/attention/self/MatMul_1, Execution Time: 0.000567 seconds
Node: bert/encoder/layer_3/attention/self/transpose_3, Execution Time: 0.000470 seconds
Node: bert/encoder/layer_3/attention/self/Reshape_3, Execution Time: 0.000048 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_3/attention/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_3/attention/output/dense/MatMul, Execution Time: 0.000573 seconds
Add Node: bert/encoder/layer_3/attention/output/dense/BiasAdd, Execution Time: 0.000461 seconds
Add Node: bert/encoder/layer_3/attention/output/add, Execution Time: 0.000479 seconds
Node: bert/encoder/layer_3/attention/output/LayerNorm/moments/mean, Execution Time: 0.000068 seconds
Node: bert/encoder/layer_3/attention/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000468 seconds
Node: bert/encoder/layer_3/attention/output/LayerNorm/moments/SquaredDifference__351, Execution Time: 0.000559 seconds
Node: bert/encoder/layer_3/attention/output/LayerNorm/moments/variance, Execution Time: 0.000053 seconds
Add Node: bert/encoder/layer_3/attention/output/LayerNorm/batchnorm/add, Execution Time: 0.000054 seconds
Node: bert/encoder/layer_3/attention/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000050 seconds
Node: bert/encoder/layer_3/attention/output/LayerNorm/batchnorm/Rsqrt__353, Execution Time: 0.000068 seconds
Node: bert/encoder/layer_3/attention/output/LayerNorm/batchnorm/mul, Execution Time: 0.000056 seconds
Node: bert/encoder/layer_3/attention/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000042 seconds
Node: bert/encoder/layer_3/attention/output/LayerNorm/batchnorm/sub, Execution Time: 0.000053 seconds
Node: bert/encoder/layer_3/attention/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000459 seconds
Add Node: bert/encoder/layer_3/attention/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000474 seconds
Input size: (None, 256, 768, 3072)
No Add node related to MatMul output: bert/encoder/layer_3/intermediate/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_3/intermediate/dense/MatMul, Execution Time: 0.000606 seconds
Add Node: bert/encoder/layer_3/intermediate/dense/BiasAdd, Execution Time: 0.001397 seconds
Node: bert/encoder/layer_3/intermediate/dense/Pow, Execution Time: 0.001356 seconds
Node: bert/encoder/layer_3/intermediate/dense/mul, Execution Time: 0.001531 seconds
Add Node: bert/encoder/layer_3/intermediate/dense/add, Execution Time: 0.001359 seconds
Node: bert/encoder/layer_3/intermediate/dense/mul_1, Execution Time: 0.001323 seconds
Node: bert/encoder/layer_3/intermediate/dense/Tanh, Execution Time: 0.001316 seconds
Add Node: bert/encoder/layer_3/intermediate/dense/add_1, Execution Time: 0.001360 seconds
Node: bert/encoder/layer_3/intermediate/dense/mul_2, Execution Time: 0.001329 seconds
Node: bert/encoder/layer_3/intermediate/dense/mul_3, Execution Time: 0.001352 seconds
Input size: (None, 256, 3072, 768)
No Add node related to MatMul output: bert/encoder/layer_3/output/dense/MatMul. Executing regular MatMul.
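Every MatMul in the trace is preceded by "No Add node related to MatMul output: ... Executing regular MatMul.", which suggests the runner checks whether an Add consumes the MatMul result so the two could run as one fused MatMul-plus-bias step. A hypothetical sketch of such a check over ONNX-style nodes (any objects with name, op_type, input and output, such as onnx NodeProto entries); the function names are illustrative and not taken from the runner:

```python
from types import SimpleNamespace


def find_add_consumer(nodes, matmul_node):
    """Return the first Add node that reads matmul_node's first output, else None."""
    produced = matmul_node.output[0]
    for node in nodes:
        if node.op_type == "Add" and produced in node.input:
            return node
    return None


def plan_matmul(nodes, matmul_node):
    """Hypothetical dispatch: fuse MatMul+Add when possible, else plain MatMul."""
    add = find_add_consumer(nodes, matmul_node)
    if add is None:
        print(f"No Add node related to MatMul output: {matmul_node.name}. "
              "Executing regular MatMul.")
        return ("MatMul", matmul_node.name)
    return ("MatMul+Add", matmul_node.name, add.name)


# Usage with stand-in nodes (names are illustrative)
mm = SimpleNamespace(name="dense/MatMul", op_type="MatMul", input=["x", "w"], output=["mm_out"])
bias = SimpleNamespace(name="dense/BiasAdd", op_type="Add", input=["mm_out", "b"], output=["y"])
print(plan_matmul([mm, bias], mm))
print(plan_matmul([mm], mm))
```

In this trace each dense MatMul is still followed by an "Add Node: .../BiasAdd" entry, so the runner's actual rule is probably not this plain consumer lookup, or the BiasAdd does not read the MatMul output directly.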
MatMul Node: bert/encoder/layer_3/output/dense/MatMul, Execution Time: 0.000910 seconds
Add Node: bert/encoder/layer_3/output/dense/BiasAdd, Execution Time: 0.000477 seconds
Add Node: bert/encoder/layer_3/output/add, Execution Time: 0.000456 seconds
Node: bert/encoder/layer_3/output/LayerNorm/moments/mean, Execution Time: 0.000070 seconds
Node: bert/encoder/layer_3/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000571 seconds
Node: bert/encoder/layer_3/output/LayerNorm/moments/SquaredDifference__355, Execution Time: 0.000565 seconds
Node: bert/encoder/layer_3/output/LayerNorm/moments/variance, Execution Time: 0.000060 seconds
Add Node: bert/encoder/layer_3/output/LayerNorm/batchnorm/add, Execution Time: 0.000056 seconds
Node: bert/encoder/layer_3/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000064 seconds
Node: bert/encoder/layer_3/output/LayerNorm/batchnorm/Rsqrt__357, Execution Time: 0.000086 seconds
Node: bert/encoder/layer_3/output/LayerNorm/batchnorm/mul, Execution Time: 0.000064 seconds
Node: bert/encoder/layer_3/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000057 seconds
Node: bert/encoder/layer_3/output/LayerNorm/batchnorm/sub, Execution Time: 0.000059 seconds
Node: bert/encoder/layer_3/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000572 seconds
Add Node: bert/encoder/layer_3/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000580 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_4/attention/self/value/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_4/attention/self/value/MatMul, Execution Time: 0.000795 seconds
Add Node: bert/encoder/layer_4/attention/self/value/BiasAdd, Execution Time: 0.000488 seconds
Node: bert/encoder/layer_4/attention/self/Reshape_2, Execution Time: 0.000020 seconds
Node: bert/encoder/layer_4/attention/self/transpose_2, Execution Time: 0.000460 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_4/attention/self/query/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_4/attention/self/query/MatMul, Execution Time: 0.000605 seconds
Add Node: bert/encoder/layer_4/attention/self/query/BiasAdd, Execution Time: 0.000484 seconds
Node: bert/encoder/layer_4/attention/self/Reshape, Execution Time: 0.000010 seconds
Node: bert/encoder/layer_4/attention/self/transpose, Execution Time: 0.000438 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_4/attention/self/key/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_4/attention/self/key/MatMul, Execution Time: 0.000582 seconds
Add Node: bert/encoder/layer_4/attention/self/key/BiasAdd, Execution Time: 0.000486 seconds
Node: bert/encoder/layer_4/attention/self/Reshape_1, Execution Time: 0.000009 seconds
Node: bert/encoder/layer_4/attention/self/MatMul__362, Execution Time: 0.000439 seconds
Input size: (12, 256, 64, 256)
No Add node related to MatMul output: bert/encoder/layer_4/attention/self/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_4/attention/self/MatMul, Execution Time: 0.000488 seconds
Node: bert/encoder/layer_4/attention/self/Mul, Execution Time: 0.001312 seconds
Add Node: bert/encoder/layer_4/attention/self/add, Execution Time: 0.001385 seconds
Node: bert/encoder/layer_4/attention/self/Softmax, Execution Time: 0.001311 seconds
Input size: (12, 256, 256, 64)
No Add node related to MatMul output: bert/encoder/layer_4/attention/self/MatMul_1. Executing regular MatMul.
MatMul Node: bert/encoder/layer_4/attention/self/MatMul_1, Execution Time: 0.000636 seconds
Node: bert/encoder/layer_4/attention/self/transpose_3, Execution Time: 0.000449 seconds
Node: bert/encoder/layer_4/attention/self/Reshape_3, Execution Time: 0.000038 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_4/attention/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_4/attention/output/dense/MatMul, Execution Time: 0.000573 seconds
Add Node: bert/encoder/layer_4/attention/output/dense/BiasAdd, Execution Time: 0.000459 seconds
Add Node: bert/encoder/layer_4/attention/output/add, Execution Time: 0.000449 seconds
Node: bert/encoder/layer_4/attention/output/LayerNorm/moments/mean, Execution Time: 0.000083 seconds
Node: bert/encoder/layer_4/attention/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000516 seconds
Node: bert/encoder/layer_4/attention/output/LayerNorm/moments/SquaredDifference__365, Execution Time: 0.000445 seconds
Node: bert/encoder/layer_4/attention/output/LayerNorm/moments/variance, Execution Time: 0.000059 seconds
Add Node: bert/encoder/layer_4/attention/output/LayerNorm/batchnorm/add, Execution Time: 0.000056 seconds
Node: bert/encoder/layer_4/attention/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000049 seconds
Node: bert/encoder/layer_4/attention/output/LayerNorm/batchnorm/Rsqrt__367, Execution Time: 0.000067 seconds
Node: bert/encoder/layer_4/attention/output/LayerNorm/batchnorm/mul, Execution Time: 0.000056 seconds
Node: bert/encoder/layer_4/attention/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000059 seconds
Node: bert/encoder/layer_4/attention/output/LayerNorm/batchnorm/sub, Execution Time: 0.000057 seconds
Node: bert/encoder/layer_4/attention/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000445 seconds
Add Node: bert/encoder/layer_4/attention/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000447 seconds
Input size: (None, 256, 768, 3072)
No Add node related to MatMul output: bert/encoder/layer_4/intermediate/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_4/intermediate/dense/MatMul, Execution Time: 0.000721 seconds
Add Node: bert/encoder/layer_4/intermediate/dense/BiasAdd, Execution Time: 0.001380 seconds
Node: bert/encoder/layer_4/intermediate/dense/Pow, Execution Time: 0.001323 seconds
Node: bert/encoder/layer_4/intermediate/dense/mul, Execution Time: 0.001327 seconds
Add Node: bert/encoder/layer_4/intermediate/dense/add, Execution Time: 0.001417 seconds
Node: bert/encoder/layer_4/intermediate/dense/mul_1, Execution Time: 0.001328 seconds
Node: bert/encoder/layer_4/intermediate/dense/Tanh, Execution Time: 0.001388 seconds
Add Node: bert/encoder/layer_4/intermediate/dense/add_1, Execution Time: 0.001321 seconds
Node: bert/encoder/layer_4/intermediate/dense/mul_2, Execution Time: 0.001313 seconds
Node: bert/encoder/layer_4/intermediate/dense/mul_3, Execution Time: 0.001348 seconds
Input size: (None, 256, 3072, 768)
No Add node related to MatMul output: bert/encoder/layer_4/output/dense/MatMul. Executing regular MatMul.
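The nodes after the 768-to-3072 intermediate MatMul are BiasAdd, Pow, mul, add, mul_1, Tanh, add_1, mul_2 and mul_3. They line up with the tanh approximation of GELU used in the original BERT code:

0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))

In this trace each of those element-wise steps takes longer than the MatMul that feeds them. For layer_4 that is roughly 1.3 ms per step versus 0.72 ms for the MatMul.

The sketch below writes the approximation one logged node per line and compares it with the exact erf-based GELU. It is a scalar illustration only, not the executor's kernel.

```python
import math


def gelu_tanh(x: float) -> float:
    """Tanh-approximate GELU, one line per intermediate/dense node after BiasAdd."""
    cubed = math.pow(x, 3)                    # Pow
    scaled = 0.044715 * cubed                 # mul
    inner = x + scaled                        # add
    inner = math.sqrt(2.0 / math.pi) * inner  # mul_1
    t = math.tanh(inner)                      # Tanh
    one_plus = 1.0 + t                        # add_1
    cdf = 0.5 * one_plus                      # mul_2
    return x * cdf                            # mul_3


def gelu_erf(x: float) -> float:
    """Exact GELU, x * Phi(x), for comparison."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


for v in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"x={v:+.1f}  tanh approx={gelu_tanh(v):+.9f}  exact={gelu_erf(v):+.9f}")
```

Over this range the two forms differ by well under 1e-3, which is why the tanh form is a common drop-in replacement. Fusing the eight element-wise steps into a single kernel is the usual way to recover the time they take in this log.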
MatMul Node: bert/encoder/layer_4/output/dense/MatMul, Execution Time: 0.000919 seconds
Add Node: bert/encoder/layer_4/output/dense/BiasAdd, Execution Time: 0.000462 seconds
Add Node: bert/encoder/layer_4/output/add, Execution Time: 0.000495 seconds
Node: bert/encoder/layer_4/output/LayerNorm/moments/mean, Execution Time: 0.000070 seconds
Node: bert/encoder/layer_4/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000446 seconds
Node: bert/encoder/layer_4/output/LayerNorm/moments/SquaredDifference__369, Execution Time: 0.000488 seconds
Node: bert/encoder/layer_4/output/LayerNorm/moments/variance, Execution Time: 0.000053 seconds
Add Node: bert/encoder/layer_4/output/LayerNorm/batchnorm/add, Execution Time: 0.000041 seconds
Node: bert/encoder/layer_4/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000046 seconds
Node: bert/encoder/layer_4/output/LayerNorm/batchnorm/Rsqrt__371, Execution Time: 0.000070 seconds
Node: bert/encoder/layer_4/output/LayerNorm/batchnorm/mul, Execution Time: 0.000061 seconds
Node: bert/encoder/layer_4/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000044 seconds
Node: bert/encoder/layer_4/output/LayerNorm/batchnorm/sub, Execution Time: 0.000043 seconds
Node: bert/encoder/layer_4/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000455 seconds
Add Node: bert/encoder/layer_4/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000448 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_5/attention/self/value/MatMul. Executing regular MatMul.
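Each LayerNorm in this trace runs as twelve separate nodes. The `moments` group computes mean, SquaredDifference and variance. The `batchnorm` group then computes add, Rsqrt, mul, mul_2, sub, mul_1 and add_1.

This matches TensorFlow's batch-normalization formulation, x * (gamma * rsqrt(var + eps)) + (beta - mean * gamma * rsqrt(var + eps)). That is algebraically the familiar gamma * (x - mean) / sqrt(var + eps) + beta.

The NumPy sketch below maps each step to a node name and checks that the two forms agree. It makes two assumptions:

- The epsilon of 1e-12 is the common BERT setting; the log does not show it.
- The extra `SquaredDifference__NNN` and `Rsqrt__NNN` nodes look like converter-inserted helpers, so they are not modeled.

```python
import numpy as np


def layer_norm_as_logged(x, gamma, beta, eps=1e-12):
    """LayerNorm over the last axis, one step per logged node (eps is an assumption)."""
    mean = x.mean(axis=-1, keepdims=True)            # moments/mean
    sq_diff = (x - mean) ** 2                        # moments/SquaredDifference
    variance = sq_diff.mean(axis=-1, keepdims=True)  # moments/variance
    inv = 1.0 / np.sqrt(variance + eps)              # batchnorm/add, batchnorm/Rsqrt
    scale = inv * gamma                              # batchnorm/mul
    shift = beta - mean * scale                      # batchnorm/mul_2, batchnorm/sub
    return x * scale + shift                         # batchnorm/mul_1, batchnorm/add_1


def layer_norm_textbook(x, gamma, beta, eps=1e-12):
    """Reference form: gamma * (x - mean) / sqrt(var + eps) + beta."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


rng = np.random.default_rng(0)
x = rng.standard_normal((256, 768))  # (seq_len, hidden), matching the logged shapes
gamma = rng.standard_normal(768)
beta = rng.standard_normal(768)
y = layer_norm_as_logged(x, gamma, beta)
print(np.allclose(y, layer_norm_textbook(x, gamma, beta)))  # True
```

The decomposed form folds gamma into the per-row scale once, then needs only one multiply and one add over the full (256, 768) tensor. That is consistent with mul_1 and add_1 being the only LayerNorm steps in the log that cost roughly as much as a BiasAdd.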
MatMul Node: bert/encoder/layer_5/attention/self/value/MatMul, Execution Time: 0.000642 seconds
Add Node: bert/encoder/layer_5/attention/self/value/BiasAdd, Execution Time: 0.000496 seconds
Node: bert/encoder/layer_5/attention/self/Reshape_2, Execution Time: 0.000020 seconds
Node: bert/encoder/layer_5/attention/self/transpose_2, Execution Time: 0.000448 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_5/attention/self/query/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_5/attention/self/query/MatMul, Execution Time: 0.000588 seconds
Add Node: bert/encoder/layer_5/attention/self/query/BiasAdd, Execution Time: 0.000455 seconds
Node: bert/encoder/layer_5/attention/self/Reshape, Execution Time: 0.000014 seconds
Node: bert/encoder/layer_5/attention/self/transpose, Execution Time: 0.000442 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_5/attention/self/key/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_5/attention/self/key/MatMul, Execution Time: 0.000567 seconds
Add Node: bert/encoder/layer_5/attention/self/key/BiasAdd, Execution Time: 0.000444 seconds
Node: bert/encoder/layer_5/attention/self/Reshape_1, Execution Time: 0.000013 seconds
Node: bert/encoder/layer_5/attention/self/MatMul__376, Execution Time: 0.000500 seconds
Input size: (12, 256, 64, 256)
No Add node related to MatMul output: bert/encoder/layer_5/attention/self/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_5/attention/self/MatMul, Execution Time: 0.000501 seconds
Node: bert/encoder/layer_5/attention/self/Mul, Execution Time: 0.001309 seconds
Add Node: bert/encoder/layer_5/attention/self/add, Execution Time: 0.001395 seconds
Node: bert/encoder/layer_5/attention/self/Softmax, Execution Time: 0.001304 seconds
Input size: (12, 256, 256, 64)
No Add node related to MatMul output: bert/encoder/layer_5/attention/self/MatMul_1. Executing regular MatMul.
MatMul Node: bert/encoder/layer_5/attention/self/MatMul_1, Execution Time: 0.000555 seconds
Node: bert/encoder/layer_5/attention/self/transpose_3, Execution Time: 0.000481 seconds
Node: bert/encoder/layer_5/attention/self/Reshape_3, Execution Time: 0.000047 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_5/attention/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_5/attention/output/dense/MatMul, Execution Time: 0.000663 seconds
Add Node: bert/encoder/layer_5/attention/output/dense/BiasAdd, Execution Time: 0.000540 seconds
Add Node: bert/encoder/layer_5/attention/output/add, Execution Time: 0.000479 seconds
Node: bert/encoder/layer_5/attention/output/LayerNorm/moments/mean, Execution Time: 0.000067 seconds
Node: bert/encoder/layer_5/attention/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000482 seconds
Node: bert/encoder/layer_5/attention/output/LayerNorm/moments/SquaredDifference__379, Execution Time: 0.000475 seconds
Node: bert/encoder/layer_5/attention/output/LayerNorm/moments/variance, Execution Time: 0.000056 seconds
Add Node: bert/encoder/layer_5/attention/output/LayerNorm/batchnorm/add, Execution Time: 0.000054 seconds
Node: bert/encoder/layer_5/attention/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000049 seconds
Node: bert/encoder/layer_5/attention/output/LayerNorm/batchnorm/Rsqrt__381, Execution Time: 0.000068 seconds
Node: bert/encoder/layer_5/attention/output/LayerNorm/batchnorm/mul, Execution Time: 0.000054 seconds
Node: bert/encoder/layer_5/attention/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000041 seconds
Node: bert/encoder/layer_5/attention/output/LayerNorm/batchnorm/sub, Execution Time: 0.000045 seconds
Node: bert/encoder/layer_5/attention/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000464 seconds
Add Node: bert/encoder/layer_5/attention/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000575 seconds
Input size: (None, 256, 768, 3072)
No Add node related to MatMul output: bert/encoder/layer_5/intermediate/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_5/intermediate/dense/MatMul, Execution Time: 0.000763 seconds
Add Node: bert/encoder/layer_5/intermediate/dense/BiasAdd, Execution Time: 0.001429 seconds
Node: bert/encoder/layer_5/intermediate/dense/Pow, Execution Time: 0.001294 seconds
Node: bert/encoder/layer_5/intermediate/dense/mul, Execution Time: 0.001361 seconds
Add Node: bert/encoder/layer_5/intermediate/dense/add, Execution Time: 0.001307 seconds
Node: bert/encoder/layer_5/intermediate/dense/mul_1, Execution Time: 0.001307 seconds
Node: bert/encoder/layer_5/intermediate/dense/Tanh, Execution Time: 0.001370 seconds
Add Node: bert/encoder/layer_5/intermediate/dense/add_1, Execution Time: 0.001283 seconds
Node: bert/encoder/layer_5/intermediate/dense/mul_2, Execution Time: 0.001304 seconds
Node: bert/encoder/layer_5/intermediate/dense/mul_3, Execution Time: 0.001364 seconds
Input size: (None, 256, 3072, 768)
No Add node related to MatMul output: bert/encoder/layer_5/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_5/output/dense/MatMul, Execution Time: 0.001011 seconds
Add Node: bert/encoder/layer_5/output/dense/BiasAdd, Execution Time: 0.000497 seconds
Add Node: bert/encoder/layer_5/output/add, Execution Time: 0.000463 seconds
Node: bert/encoder/layer_5/output/LayerNorm/moments/mean, Execution Time: 0.000083 seconds
Node: bert/encoder/layer_5/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000456 seconds
Node: bert/encoder/layer_5/output/LayerNorm/moments/SquaredDifference__383, Execution Time: 0.000471 seconds
Node: bert/encoder/layer_5/output/LayerNorm/moments/variance, Execution Time: 0.000056 seconds
Add Node: bert/encoder/layer_5/output/LayerNorm/batchnorm/add, Execution Time: 0.000051 seconds
Node: bert/encoder/layer_5/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000049 seconds
Node: bert/encoder/layer_5/output/LayerNorm/batchnorm/Rsqrt__385, Execution Time: 0.000067 seconds
Node: bert/encoder/layer_5/output/LayerNorm/batchnorm/mul, Execution Time: 0.000056 seconds
Node: bert/encoder/layer_5/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000053 seconds
Node: bert/encoder/layer_5/output/LayerNorm/batchnorm/sub, Execution Time: 0.000053 seconds
Node: bert/encoder/layer_5/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000479 seconds
Add Node: bert/encoder/layer_5/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000451 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_6/attention/self/value/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_6/attention/self/value/MatMul, Execution Time: 0.000685 seconds
Add Node: bert/encoder/layer_6/attention/self/value/BiasAdd, Execution Time: 0.000451 seconds
Node: bert/encoder/layer_6/attention/self/Reshape_2, Execution Time: 0.000020 seconds
Node: bert/encoder/layer_6/attention/self/transpose_2, Execution Time: 0.000459 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_6/attention/self/query/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_6/attention/self/query/MatMul, Execution Time: 0.000654 seconds
Add Node: bert/encoder/layer_6/attention/self/query/BiasAdd, Execution Time: 0.000448 seconds
Node: bert/encoder/layer_6/attention/self/Reshape, Execution Time: 0.000009 seconds
Node: bert/encoder/layer_6/attention/self/transpose, Execution Time: 0.000467 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_6/attention/self/key/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_6/attention/self/key/MatMul, Execution Time: 0.000576 seconds
Add Node: bert/encoder/layer_6/attention/self/key/BiasAdd, Execution Time: 0.000455 seconds
Node: bert/encoder/layer_6/attention/self/Reshape_1, Execution Time: 0.000010 seconds
Node: bert/encoder/layer_6/attention/self/MatMul__390, Execution Time: 0.000441 seconds
Input size: (12, 256, 64, 256)
No Add node related to MatMul output: bert/encoder/layer_6/attention/self/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_6/attention/self/MatMul, Execution Time: 0.000488 seconds
Node: bert/encoder/layer_6/attention/self/Mul, Execution Time: 0.001314 seconds
Add Node: bert/encoder/layer_6/attention/self/add, Execution Time: 0.001356 seconds
Node: bert/encoder/layer_6/attention/self/Softmax, Execution Time: 0.001345 seconds
Input size: (12, 256, 256, 64)
No Add node related to MatMul output: bert/encoder/layer_6/attention/self/MatMul_1. Executing regular MatMul.
MatMul Node: bert/encoder/layer_6/attention/self/MatMul_1, Execution Time: 0.000570 seconds
Node: bert/encoder/layer_6/attention/self/transpose_3, Execution Time: 0.000473 seconds
Node: bert/encoder/layer_6/attention/self/Reshape_3, Execution Time: 0.000037 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_6/attention/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_6/attention/output/dense/MatMul, Execution Time: 0.000584 seconds
Add Node: bert/encoder/layer_6/attention/output/dense/BiasAdd, Execution Time: 0.000483 seconds
Add Node: bert/encoder/layer_6/attention/output/add, Execution Time: 0.000607 seconds
Node: bert/encoder/layer_6/attention/output/LayerNorm/moments/mean, Execution Time: 0.000073 seconds
Node: bert/encoder/layer_6/attention/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000443 seconds
Node: bert/encoder/layer_6/attention/output/LayerNorm/moments/SquaredDifference__393, Execution Time: 0.000462 seconds
Node: bert/encoder/layer_6/attention/output/LayerNorm/moments/variance, Execution Time: 0.000054 seconds
Add Node: bert/encoder/layer_6/attention/output/LayerNorm/batchnorm/add, Execution Time: 0.000042 seconds
Node: bert/encoder/layer_6/attention/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000046 seconds
Node: bert/encoder/layer_6/attention/output/LayerNorm/batchnorm/Rsqrt__395, Execution Time: 0.000072 seconds
Node: bert/encoder/layer_6/attention/output/LayerNorm/batchnorm/mul, Execution Time: 0.000055 seconds
Node: bert/encoder/layer_6/attention/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000041 seconds
Node: bert/encoder/layer_6/attention/output/LayerNorm/batchnorm/sub, Execution Time: 0.000053 seconds
Node: bert/encoder/layer_6/attention/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000473 seconds
Add Node: bert/encoder/layer_6/attention/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000446 seconds
Input size: (None, 256, 768, 3072)
No Add node related to MatMul output: bert/encoder/layer_6/intermediate/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_6/intermediate/dense/MatMul, Execution Time: 0.000619 seconds
Add Node: bert/encoder/layer_6/intermediate/dense/BiasAdd, Execution Time: 0.001369 seconds
Node: bert/encoder/layer_6/intermediate/dense/Pow, Execution Time: 0.001318 seconds
Node: bert/encoder/layer_6/intermediate/dense/mul, Execution Time: 0.001365 seconds
Add Node: bert/encoder/layer_6/intermediate/dense/add, Execution Time: 0.001338 seconds
Node: bert/encoder/layer_6/intermediate/dense/mul_1, Execution Time: 0.001392 seconds
Node: bert/encoder/layer_6/intermediate/dense/Tanh, Execution Time: 0.001564 seconds
Add Node: bert/encoder/layer_6/intermediate/dense/add_1, Execution Time: 0.001328 seconds
Node: bert/encoder/layer_6/intermediate/dense/mul_2, Execution Time: 0.001371 seconds
Node: bert/encoder/layer_6/intermediate/dense/mul_3, Execution Time: 0.001315 seconds
Input size: (None, 256, 3072, 768)
No Add node related to MatMul output: bert/encoder/layer_6/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_6/output/dense/MatMul, Execution Time: 0.000912 seconds
Add Node: bert/encoder/layer_6/output/dense/BiasAdd, Execution Time: 0.000472 seconds
Add Node: bert/encoder/layer_6/output/add, Execution Time: 0.000454 seconds
Node: bert/encoder/layer_6/output/LayerNorm/moments/mean, Execution Time: 0.000080 seconds
Node: bert/encoder/layer_6/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000532 seconds
Node: bert/encoder/layer_6/output/LayerNorm/moments/SquaredDifference__397, Execution Time: 0.000452 seconds
Node: bert/encoder/layer_6/output/LayerNorm/moments/variance, Execution Time: 0.000054 seconds
Add Node: bert/encoder/layer_6/output/LayerNorm/batchnorm/add, Execution Time: 0.000052 seconds
Node: bert/encoder/layer_6/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000052 seconds
Node: bert/encoder/layer_6/output/LayerNorm/batchnorm/Rsqrt__399, Execution Time: 0.000067 seconds
Node: bert/encoder/layer_6/output/LayerNorm/batchnorm/mul, Execution Time: 0.000053 seconds
Node: bert/encoder/layer_6/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000052 seconds
Node: bert/encoder/layer_6/output/LayerNorm/batchnorm/sub, Execution Time: 0.000053 seconds
Node: bert/encoder/layer_6/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000479 seconds
Add Node: bert/encoder/layer_6/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000470 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_7/attention/self/value/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_7/attention/self/value/MatMul, Execution Time: 0.000731 seconds
Add Node: bert/encoder/layer_7/attention/self/value/BiasAdd, Execution Time: 0.000454 seconds
Node: bert/encoder/layer_7/attention/self/Reshape_2, Execution Time: 0.000020 seconds
Node: bert/encoder/layer_7/attention/self/transpose_2, Execution Time: 0.000461 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_7/attention/self/query/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_7/attention/self/query/MatMul, Execution Time: 0.000590 seconds
Add Node: bert/encoder/layer_7/attention/self/query/BiasAdd, Execution Time: 0.000451 seconds
Node: bert/encoder/layer_7/attention/self/Reshape, Execution Time: 0.000009 seconds
Node: bert/encoder/layer_7/attention/self/transpose, Execution Time: 0.000524 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_7/attention/self/key/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_7/attention/self/key/MatMul, Execution Time: 0.000639 seconds
Add Node: bert/encoder/layer_7/attention/self/key/BiasAdd, Execution Time: 0.000482 seconds
Node: bert/encoder/layer_7/attention/self/Reshape_1, Execution Time: 0.000009 seconds
Node: bert/encoder/layer_7/attention/self/MatMul__404, Execution Time: 0.000479 seconds
Input size: (12, 256, 64, 256)
No Add node related to MatMul output: bert/encoder/layer_7/attention/self/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_7/attention/self/MatMul, Execution Time: 0.000487 seconds
Node: bert/encoder/layer_7/attention/self/Mul, Execution Time: 0.001356 seconds
Add Node: bert/encoder/layer_7/attention/self/add, Execution Time: 0.001314 seconds
Node: bert/encoder/layer_7/attention/self/Softmax, Execution Time: 0.001310 seconds
Input size: (12, 256, 256, 64)
No Add node related to MatMul output: bert/encoder/layer_7/attention/self/MatMul_1. Executing regular MatMul.
MatMul Node: bert/encoder/layer_7/attention/self/MatMul_1, Execution Time: 0.000533 seconds
Node: bert/encoder/layer_7/attention/self/transpose_3, Execution Time: 0.000475 seconds
Node: bert/encoder/layer_7/attention/self/Reshape_3, Execution Time: 0.000043 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_7/attention/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_7/attention/output/dense/MatMul, Execution Time: 0.000734 seconds
Add Node: bert/encoder/layer_7/attention/output/dense/BiasAdd, Execution Time: 0.000624 seconds
Add Node: bert/encoder/layer_7/attention/output/add, Execution Time: 0.000640 seconds
Node: bert/encoder/layer_7/attention/output/LayerNorm/moments/mean, Execution Time: 0.000101 seconds
Node: bert/encoder/layer_7/attention/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000620 seconds
Node: bert/encoder/layer_7/attention/output/LayerNorm/moments/SquaredDifference__407, Execution Time: 0.000822 seconds
Node: bert/encoder/layer_7/attention/output/LayerNorm/moments/variance, Execution Time: 0.000097 seconds
Add Node: bert/encoder/layer_7/attention/output/LayerNorm/batchnorm/add, Execution Time: 0.000078 seconds
Node: bert/encoder/layer_7/attention/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000085 seconds
Node: bert/encoder/layer_7/attention/output/LayerNorm/batchnorm/Rsqrt__409, Execution Time: 0.000116 seconds
Node: bert/encoder/layer_7/attention/output/LayerNorm/batchnorm/mul, Execution Time: 0.000092 seconds
Node: bert/encoder/layer_7/attention/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000081 seconds
Node: bert/encoder/layer_7/attention/output/LayerNorm/batchnorm/sub, Execution Time: 0.000086 seconds
Node: bert/encoder/layer_7/attention/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000847 seconds
Add Node: bert/encoder/layer_7/attention/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000706 seconds
Input size: (None, 256, 768, 3072)
No Add node related to MatMul output: bert/encoder/layer_7/intermediate/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_7/intermediate/dense/MatMul, Execution Time: 0.000950 seconds
Add Node: bert/encoder/layer_7/intermediate/dense/BiasAdd, Execution Time: 0.001974 seconds
Node: bert/encoder/layer_7/intermediate/dense/Pow, Execution Time: 0.001916 seconds
Node: bert/encoder/layer_7/intermediate/dense/mul, Execution Time: 0.002038 seconds
Add Node: bert/encoder/layer_7/intermediate/dense/add, Execution Time: 0.001887 seconds
Node: bert/encoder/layer_7/intermediate/dense/mul_1, Execution Time: 0.001875 seconds
Node: bert/encoder/layer_7/intermediate/dense/Tanh, Execution Time: 0.002064 seconds
Add Node: bert/encoder/layer_7/intermediate/dense/add_1, Execution Time: 0.001889 seconds
Node: bert/encoder/layer_7/intermediate/dense/mul_2, Execution Time: 0.001939 seconds
Node: bert/encoder/layer_7/intermediate/dense/mul_3, Execution Time: 0.001944 seconds
Input size: (None, 256, 3072, 768)
No Add node related to MatMul output: bert/encoder/layer_7/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_7/output/dense/MatMul, Execution Time: 0.001181 seconds
Add Node: bert/encoder/layer_7/output/dense/BiasAdd, Execution Time: 0.000527 seconds
Add Node: bert/encoder/layer_7/output/add, Execution Time: 0.000661 seconds
Node: bert/encoder/layer_7/output/LayerNorm/moments/mean, Execution Time: 0.000089 seconds
Node: bert/encoder/layer_7/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000520 seconds
Node: bert/encoder/layer_7/output/LayerNorm/moments/SquaredDifference__411, Execution Time: 0.000544 seconds
Node: bert/encoder/layer_7/output/LayerNorm/moments/variance, Execution Time: 0.000075 seconds
Add Node: bert/encoder/layer_7/output/LayerNorm/batchnorm/add, Execution Time: 0.000050 seconds
Node: bert/encoder/layer_7/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000056 seconds
Node: bert/encoder/layer_7/output/LayerNorm/batchnorm/Rsqrt__413, Execution Time: 0.000129 seconds
Node: bert/encoder/layer_7/output/LayerNorm/batchnorm/mul, Execution Time: 0.000044 seconds
Node: bert/encoder/layer_7/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000043 seconds
Node: bert/encoder/layer_7/output/LayerNorm/batchnorm/sub, Execution Time: 0.000058 seconds
Node: bert/encoder/layer_7/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000562 seconds
Add Node: bert/encoder/layer_7/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000626 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_8/attention/self/value/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_8/attention/self/value/MatMul, Execution Time: 0.000742 seconds
Add Node: bert/encoder/layer_8/attention/self/value/BiasAdd, Execution Time: 0.000571 seconds
Node: bert/encoder/layer_8/attention/self/Reshape_2, Execution Time: 0.000023 seconds
Node: bert/encoder/layer_8/attention/self/transpose_2, Execution Time: 0.000514 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_8/attention/self/query/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_8/attention/self/query/MatMul, Execution Time: 0.000766 seconds
Add Node: bert/encoder/layer_8/attention/self/query/BiasAdd, Execution Time: 0.000573 seconds
Node: bert/encoder/layer_8/attention/self/Reshape, Execution Time: 0.000023 seconds
Node: bert/encoder/layer_8/attention/self/transpose, Execution Time: 0.000567 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_8/attention/self/key/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_8/attention/self/key/MatMul, Execution Time: 0.000825 seconds
Add Node: bert/encoder/layer_8/attention/self/key/BiasAdd, Execution Time: 0.000538 seconds
Node: bert/encoder/layer_8/attention/self/Reshape_1, Execution Time: 0.000022 seconds
Node: bert/encoder/layer_8/attention/self/MatMul__418, Execution Time: 0.000509 seconds
Input size: (12, 256, 64, 256)
No Add node related to MatMul output: bert/encoder/layer_8/attention/self/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_8/attention/self/MatMul, Execution Time: 0.000715 seconds
Node: bert/encoder/layer_8/attention/self/Mul, Execution Time: 0.001661 seconds
Add Node: bert/encoder/layer_8/attention/self/add, Execution Time: 0.001515 seconds
Node: bert/encoder/layer_8/attention/self/Softmax, Execution Time: 0.001514 seconds
Input size: (12, 256, 256, 64)
No Add node related to MatMul output: bert/encoder/layer_8/attention/self/MatMul_1. Executing regular MatMul.
MatMul Node: bert/encoder/layer_8/attention/self/MatMul_1, Execution Time: 0.000726 seconds
Node: bert/encoder/layer_8/attention/self/transpose_3, Execution Time: 0.000521 seconds
Node: bert/encoder/layer_8/attention/self/Reshape_3, Execution Time: 0.000063 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_8/attention/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_8/attention/output/dense/MatMul, Execution Time: 0.000861 seconds
Add Node: bert/encoder/layer_8/attention/output/dense/BiasAdd, Execution Time: 0.000516 seconds
Add Node: bert/encoder/layer_8/attention/output/add, Execution Time: 0.000510 seconds
Node: bert/encoder/layer_8/attention/output/LayerNorm/moments/mean, Execution Time: 0.000094 seconds
Node: bert/encoder/layer_8/attention/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000504 seconds
Node: bert/encoder/layer_8/attention/output/LayerNorm/moments/SquaredDifference__421, Execution Time: 0.000531 seconds
Node: bert/encoder/layer_8/attention/output/LayerNorm/moments/variance, Execution Time: 0.000079 seconds
Add Node: bert/encoder/layer_8/attention/output/LayerNorm/batchnorm/add, Execution Time: 0.000049 seconds
Node: bert/encoder/layer_8/attention/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000052 seconds
Node: bert/encoder/layer_8/attention/output/LayerNorm/batchnorm/Rsqrt__423, Execution Time: 0.000087 seconds
Node: bert/encoder/layer_8/attention/output/LayerNorm/batchnorm/mul, Execution Time: 0.000065 seconds
Node: bert/encoder/layer_8/attention/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000063 seconds
Node: bert/encoder/layer_8/attention/output/LayerNorm/batchnorm/sub, Execution Time: 0.000084 seconds
Node: bert/encoder/layer_8/attention/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000503 seconds
Add Node: bert/encoder/layer_8/attention/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000522 seconds
Input size: (None, 256, 768, 3072)
No Add node related to MatMul output: bert/encoder/layer_8/intermediate/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_8/intermediate/dense/MatMul, Execution Time: 0.000727 seconds
Add Node: bert/encoder/layer_8/intermediate/dense/BiasAdd, Execution Time: 0.001507 seconds
Node: bert/encoder/layer_8/intermediate/dense/Pow, Execution Time: 0.001634 seconds
Node: bert/encoder/layer_8/intermediate/dense/mul, Execution Time: 0.001581 seconds
Add Node: bert/encoder/layer_8/intermediate/dense/add, Execution Time: 0.001411 seconds
Node: bert/encoder/layer_8/intermediate/dense/mul_1, Execution Time: 0.002158 seconds
Node: bert/encoder/layer_8/intermediate/dense/Tanh, Execution Time: 0.002181 seconds
Add Node: bert/encoder/layer_8/intermediate/dense/add_1, Execution Time: 0.002447 seconds
Node: bert/encoder/layer_8/intermediate/dense/mul_2, Execution Time: 0.001522 seconds
Node: bert/encoder/layer_8/intermediate/dense/mul_3, Execution Time: 0.001564 seconds
Input size: (None, 256, 3072, 768)
No Add node related to MatMul output: bert/encoder/layer_8/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_8/output/dense/MatMul, Execution Time: 0.001133 seconds
Add Node: bert/encoder/layer_8/output/dense/BiasAdd, Execution Time: 0.000553 seconds
Add Node: bert/encoder/layer_8/output/add, Execution Time: 0.000525 seconds
Node: bert/encoder/layer_8/output/LayerNorm/moments/mean, Execution Time: 0.000081 seconds
Node: bert/encoder/layer_8/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000554 seconds
Node: bert/encoder/layer_8/output/LayerNorm/moments/SquaredDifference__425, Execution Time: 0.000521 seconds
Node: bert/encoder/layer_8/output/LayerNorm/moments/variance, Execution Time: 0.000072 seconds
Add Node: bert/encoder/layer_8/output/LayerNorm/batchnorm/add, Execution Time: 0.000053 seconds
Node: bert/encoder/layer_8/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000072 seconds
Node: bert/encoder/layer_8/output/LayerNorm/batchnorm/Rsqrt__427, Execution Time: 0.000072 seconds
Node: bert/encoder/layer_8/output/LayerNorm/batchnorm/mul, Execution Time: 0.000059 seconds
Node: bert/encoder/layer_8/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000055 seconds
Node: bert/encoder/layer_8/output/LayerNorm/batchnorm/sub, Execution Time: 0.000055 seconds
Node: bert/encoder/layer_8/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000489 seconds
Add Node: bert/encoder/layer_8/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000502 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_9/attention/self/value/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_9/attention/self/value/MatMul, Execution Time: 0.000749 seconds
Add Node: bert/encoder/layer_9/attention/self/value/BiasAdd, Execution Time: 0.000525 seconds
Node: bert/encoder/layer_9/attention/self/Reshape_2, Execution Time: 0.000023 seconds
Node: bert/encoder/layer_9/attention/self/transpose_2, Execution Time: 0.000478 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_9/attention/self/query/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_9/attention/self/query/MatMul, Execution Time: 0.000729 seconds
Add Node: bert/encoder/layer_9/attention/self/query/BiasAdd, Execution Time: 0.000517 seconds
Node: bert/encoder/layer_9/attention/self/Reshape, Execution Time: 0.000029 seconds
Node: bert/encoder/layer_9/attention/self/transpose, Execution Time: 0.000518 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_9/attention/self/key/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_9/attention/self/key/MatMul, Execution Time: 0.000738 seconds
Add Node: bert/encoder/layer_9/attention/self/key/BiasAdd, Execution Time: 0.000548 seconds
Node: bert/encoder/layer_9/attention/self/Reshape_1, Execution Time: 0.000026 seconds
Node: bert/encoder/layer_9/attention/self/MatMul__432, Execution Time: 0.000496 seconds
Input size: (12, 256, 64, 256)
No Add node related to MatMul output: bert/encoder/layer_9/attention/self/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_9/attention/self/MatMul, Execution Time: 0.000644 seconds
Node: bert/encoder/layer_9/attention/self/Mul, Execution Time: 0.001557 seconds
Add Node: bert/encoder/layer_9/attention/self/add, Execution Time: 0.001600 seconds
Node: bert/encoder/layer_9/attention/self/Softmax, Execution Time: 0.001492 seconds
Input size: (12, 256, 256, 64)
No Add node related to MatMul output: bert/encoder/layer_9/attention/self/MatMul_1. Executing regular MatMul.
MatMul Node: bert/encoder/layer_9/attention/self/MatMul_1, Execution Time: 0.000706 seconds
Node: bert/encoder/layer_9/attention/self/transpose_3, Execution Time: 0.000526 seconds
Node: bert/encoder/layer_9/attention/self/Reshape_3, Execution Time: 0.000126 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_9/attention/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_9/attention/output/dense/MatMul, Execution Time: 0.000759 seconds
Add Node: bert/encoder/layer_9/attention/output/dense/BiasAdd, Execution Time: 0.000531 seconds
Add Node: bert/encoder/layer_9/attention/output/add, Execution Time: 0.000754 seconds
Node: bert/encoder/layer_9/attention/output/LayerNorm/moments/mean, Execution Time: 0.000087 seconds
Node: bert/encoder/layer_9/attention/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000511 seconds
Node: bert/encoder/layer_9/attention/output/LayerNorm/moments/SquaredDifference__435, Execution Time: 0.000521 seconds
Node: bert/encoder/layer_9/attention/output/LayerNorm/moments/variance, Execution Time: 0.000084 seconds
Add Node: bert/encoder/layer_9/attention/output/LayerNorm/batchnorm/add, Execution Time: 0.000048 seconds
Node: bert/encoder/layer_9/attention/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000050 seconds
Node: bert/encoder/layer_9/attention/output/LayerNorm/batchnorm/Rsqrt__437, Execution Time: 0.000089 seconds
Node: bert/encoder/layer_9/attention/output/LayerNorm/batchnorm/mul, Execution Time: 0.000059 seconds
Node: bert/encoder/layer_9/attention/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000052 seconds
Node: bert/encoder/layer_9/attention/output/LayerNorm/batchnorm/sub, Execution Time: 0.000056 seconds
Node: bert/encoder/layer_9/attention/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000505 seconds
Add Node: bert/encoder/layer_9/attention/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000526 seconds
Input size: (None, 256, 768, 3072)
No Add node related to MatMul output: bert/encoder/layer_9/intermediate/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_9/intermediate/dense/MatMul, Execution Time: 0.000951 seconds
Add Node: bert/encoder/layer_9/intermediate/dense/BiasAdd, Execution Time: 0.001550 seconds
Node: bert/encoder/layer_9/intermediate/dense/Pow, Execution Time: 0.001605 seconds
Node: bert/encoder/layer_9/intermediate/dense/mul, Execution Time: 0.001486 seconds
Add Node: bert/encoder/layer_9/intermediate/dense/add, Execution Time: 0.001552 seconds
Node: bert/encoder/layer_9/intermediate/dense/mul_1, Execution Time: 0.001474 seconds
Node: bert/encoder/layer_9/intermediate/dense/Tanh, Execution Time: 0.001496 seconds
Add Node: bert/encoder/layer_9/intermediate/dense/add_1, Execution Time: 0.001672 seconds
Node: bert/encoder/layer_9/intermediate/dense/mul_2, Execution Time: 0.001510 seconds
Node: bert/encoder/layer_9/intermediate/dense/mul_3, Execution Time: 0.001506 seconds
Input size: (None, 256, 3072, 768)
No Add node related to MatMul output: bert/encoder/layer_9/output/dense/MatMul. Executing regular MatMul.
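The eight element-wise nodes after layer_9/intermediate/dense/BiasAdd are Pow, mul, add, mul_1, Tanh, add_1, mul_2 and mul_3. They match the tanh approximation of GELU used in the TensorFlow BERT reference code:

gelu(x) = 0.5 · x · (1 + tanh(√(2/π) · (x + 0.044715 · x³)))

The sketch below computes it one step per node and compares it with the exact erf-based GELU. The node-to-step mapping is an assumption inferred from the node names and that formula, not read from the ONNX graph.

```python
import math

def gelu_tanh(x: float) -> float:
    """GELU, tanh approximation, one step per logged node (assumed mapping)."""
    pow3 = x ** 3                              # Pow
    mul = 0.044715 * pow3                      # mul
    add = x + mul                              # add
    mul_1 = math.sqrt(2.0 / math.pi) * add     # mul_1
    tanh = math.tanh(mul_1)                    # Tanh
    add_1 = 1.0 + tanh                         # add_1
    mul_2 = 0.5 * add_1                        # mul_2 (approximate normal CDF)
    return x * mul_2                           # mul_3

def gelu_exact(x: float) -> float:
    """Reference GELU: x * Phi(x), using the error function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"{x:+.1f}  approx={gelu_tanh(x):+.6f}  exact={gelu_exact(x):+.6f}")
```

The chain involves eight separate element-wise passes over a (None, 256, 3072) tensor. This is consistent with these nodes each taking roughly 1.5 ms in the log, far more than the corresponding MatMul.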
MatMul Node: bert/encoder/layer_9/output/dense/MatMul, Execution Time: 0.000965 seconds
Add Node: bert/encoder/layer_9/output/dense/BiasAdd, Execution Time: 0.000566 seconds
Add Node: bert/encoder/layer_9/output/add, Execution Time: 0.000555 seconds
Node: bert/encoder/layer_9/output/LayerNorm/moments/mean, Execution Time: 0.000087 seconds
Node: bert/encoder/layer_9/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000504 seconds
Node: bert/encoder/layer_9/output/LayerNorm/moments/SquaredDifference__439, Execution Time: 0.000708 seconds
Node: bert/encoder/layer_9/output/LayerNorm/moments/variance, Execution Time: 0.000077 seconds
Add Node: bert/encoder/layer_9/output/LayerNorm/batchnorm/add, Execution Time: 0.000058 seconds
Node: bert/encoder/layer_9/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000055 seconds
Node: bert/encoder/layer_9/output/LayerNorm/batchnorm/Rsqrt__441, Execution Time: 0.000077 seconds
Node: bert/encoder/layer_9/output/LayerNorm/batchnorm/mul, Execution Time: 0.000058 seconds
Node: bert/encoder/layer_9/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000046 seconds
Node: bert/encoder/layer_9/output/LayerNorm/batchnorm/sub, Execution Time: 0.000047 seconds
Node: bert/encoder/layer_9/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000488 seconds
Add Node: bert/encoder/layer_9/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000522 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_10/attention/self/value/MatMul. Executing regular MatMul.
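Each LayerNorm block in the log runs the same sequence of nodes:

- moments/mean, moments/SquaredDifference, moments/variance
- batchnorm/add, batchnorm/Rsqrt
- batchnorm/mul, batchnorm/mul_2, batchnorm/sub, batchnorm/mul_1, batchnorm/add_1

This looks like layer normalization written with TensorFlow's batch-normalization arithmetic:

y = x · (γ · rsqrt(var + ε)) + (β − mean · γ · rsqrt(var + ε))

The sketch below follows that order for a single row. Two assumptions apply. First, ε is taken to be 1e-12; the real constant should be read from the graph. Second, the extra `__NNN` nodes (for example SquaredDifference__439 and Rsqrt__441) are not modeled.

```python
import math

def layer_norm_batchnorm_style(x, gamma, beta, eps=1e-12):
    """LayerNorm over one row, in the order of the logged ops.

    eps=1e-12 is an assumption; read the actual constant from the graph.
    """
    n = len(x)
    mean = sum(x) / n                                    # moments/mean
    sq_diff = [(xi - mean) ** 2 for xi in x]             # moments/SquaredDifference
    variance = sum(sq_diff) / n                          # moments/variance
    var_eps = variance + eps                             # batchnorm/add
    inv_std = 1.0 / math.sqrt(var_eps)                   # batchnorm/Rsqrt
    scale = [g * inv_std for g in gamma]                 # batchnorm/mul
    mean_scaled = [mean * s for s in scale]              # batchnorm/mul_2
    shift = [b - m for b, m in zip(beta, mean_scaled)]   # batchnorm/sub
    x_scaled = [xi * s for xi, s in zip(x, scale)]       # batchnorm/mul_1
    return [a + b for a, b in zip(x_scaled, shift)]      # batchnorm/add_1

row = [1.0, 2.0, 3.0, 4.0]
out = layer_norm_batchnorm_style(row, gamma=[1.0] * 4, beta=[0.0] * 4)
print([round(v, 4) for v in out])
```

With γ = 1 and β = 0 the output has zero mean and unit variance. Only the mean and variance reductions shrink the 768-wide rows, which fits those nodes taking well under 0.1 ms in the log. The full-size element-wise steps (SquaredDifference, mul_1, add_1) take around 0.5 ms each.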
MatMul Node: bert/encoder/layer_10/attention/self/value/MatMul, Execution Time: 0.002145 seconds
Add Node: bert/encoder/layer_10/attention/self/value/BiasAdd, Execution Time: 0.000565 seconds
Node: bert/encoder/layer_10/attention/self/Reshape_2, Execution Time: 0.000023 seconds
Node: bert/encoder/layer_10/attention/self/transpose_2, Execution Time: 0.000578 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_10/attention/self/query/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_10/attention/self/query/MatMul, Execution Time: 0.000732 seconds
Add Node: bert/encoder/layer_10/attention/self/query/BiasAdd, Execution Time: 0.000525 seconds
Node: bert/encoder/layer_10/attention/self/Reshape, Execution Time: 0.000022 seconds
Node: bert/encoder/layer_10/attention/self/transpose, Execution Time: 0.000506 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_10/attention/self/key/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_10/attention/self/key/MatMul, Execution Time: 0.000711 seconds
Add Node: bert/encoder/layer_10/attention/self/key/BiasAdd, Execution Time: 0.000510 seconds
Node: bert/encoder/layer_10/attention/self/Reshape_1, Execution Time: 0.000021 seconds
Node: bert/encoder/layer_10/attention/self/MatMul__446, Execution Time: 0.000484 seconds
Input size: (12, 256, 64, 256)
No Add node related to MatMul output: bert/encoder/layer_10/attention/self/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_10/attention/self/MatMul, Execution Time: 0.000691 seconds
Node: bert/encoder/layer_10/attention/self/Mul, Execution Time: 0.001509 seconds
Add Node: bert/encoder/layer_10/attention/self/add, Execution Time: 0.001477 seconds
Node: bert/encoder/layer_10/attention/self/Softmax, Execution Time: 0.001505 seconds
Input size: (12, 256, 256, 64)
No Add node related to MatMul output: bert/encoder/layer_10/attention/self/MatMul_1. Executing regular MatMul.
MatMul Node: bert/encoder/layer_10/attention/self/MatMul_1, Execution Time: 0.000802 seconds
Node: bert/encoder/layer_10/attention/self/transpose_3, Execution Time: 0.000508 seconds
Node: bert/encoder/layer_10/attention/self/Reshape_3, Execution Time: 0.000071 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_10/attention/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_10/attention/output/dense/MatMul, Execution Time: 0.001301 seconds
Add Node: bert/encoder/layer_10/attention/output/dense/BiasAdd, Execution Time: 0.000725 seconds
Add Node: bert/encoder/layer_10/attention/output/add, Execution Time: 0.000648 seconds
Node: bert/encoder/layer_10/attention/output/LayerNorm/moments/mean, Execution Time: 0.000116 seconds
Node: bert/encoder/layer_10/attention/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000646 seconds
Node: bert/encoder/layer_10/attention/output/LayerNorm/moments/SquaredDifference__449, Execution Time: 0.000779 seconds
Node: bert/encoder/layer_10/attention/output/LayerNorm/moments/variance, Execution Time: 0.000078 seconds
Add Node: bert/encoder/layer_10/attention/output/LayerNorm/batchnorm/add, Execution Time: 0.000062 seconds
Node: bert/encoder/layer_10/attention/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000050 seconds
Node: bert/encoder/layer_10/attention/output/LayerNorm/batchnorm/Rsqrt__451, Execution Time: 0.000078 seconds
Node: bert/encoder/layer_10/attention/output/LayerNorm/batchnorm/mul, Execution Time: 0.000068 seconds
Node: bert/encoder/layer_10/attention/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000069 seconds
Node: bert/encoder/layer_10/attention/output/LayerNorm/batchnorm/sub, Execution Time: 0.000056 seconds
Node: bert/encoder/layer_10/attention/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000544 seconds
Add Node: bert/encoder/layer_10/attention/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000511 seconds
Input size: (None, 256, 768, 3072)
No Add node related to MatMul output: bert/encoder/layer_10/intermediate/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_10/intermediate/dense/MatMul, Execution Time: 0.000758 seconds
Add Node: bert/encoder/layer_10/intermediate/dense/BiasAdd, Execution Time: 0.001694 seconds
Node: bert/encoder/layer_10/intermediate/dense/Pow, Execution Time: 0.001672 seconds
Node: bert/encoder/layer_10/intermediate/dense/mul, Execution Time: 0.001566 seconds
Add Node: bert/encoder/layer_10/intermediate/dense/add, Execution Time: 0.001636 seconds
Node: bert/encoder/layer_10/intermediate/dense/mul_1, Execution Time: 0.001593 seconds
Node: bert/encoder/layer_10/intermediate/dense/Tanh, Execution Time: 0.001675 seconds
Add Node: bert/encoder/layer_10/intermediate/dense/add_1, Execution Time: 0.001609 seconds
Node: bert/encoder/layer_10/intermediate/dense/mul_2, Execution Time: 0.001731 seconds
Node: bert/encoder/layer_10/intermediate/dense/mul_3, Execution Time: 0.001667 seconds
Input size: (None, 256, 3072, 768)
No Add node related to MatMul output: bert/encoder/layer_10/output/dense/MatMul. Executing regular MatMul.
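Every timing entry in this log has the form `Node: <name>, Execution Time: <seconds> seconds`, sometimes prefixed with `MatMul` or `Add`. That makes it possible to recover per-operation totals directly from the raw text. The helper below is hypothetical and not part of the script that produced this log. It groups entries by the last path segment of the node name, with trailing numeric suffixes such as `_1` or `__432` stripped, so `mul_1` and `mul_2` count together. It scans with `finditer` rather than line by line, so it also handles entries that were run together on one line.

```python
import re
from collections import defaultdict

# Matches entries such as:
#   "MatMul Node: bert/encoder/layer_9/output/dense/MatMul, Execution Time: 0.000965 seconds"
#   "Node: bert/encoder/layer_9/intermediate/dense/Tanh, Execution Time: 0.001496 seconds"
ENTRY = re.compile(r"Node: (?P<name>[^,]+), Execution Time: (?P<secs>[0-9.]+) seconds")
SUFFIX = re.compile(r"_+\d+$")  # "mul_1" -> "mul", "MatMul__432" -> "MatMul"

def time_by_op(log_text: str) -> dict:
    """Sum execution time per op kind (last path segment, numeric suffix removed)."""
    totals = defaultdict(float)
    for m in ENTRY.finditer(log_text):
        op = SUFFIX.sub("", m.group("name").rsplit("/", 1)[-1])
        totals[op] += float(m.group("secs"))
    return dict(totals)

sample = """\
Node: bert/encoder/layer_10/intermediate/dense/mul_2, Execution Time: 0.001731 seconds
Node: bert/encoder/layer_10/intermediate/dense/mul_3, Execution Time: 0.001667 seconds
MatMul Node: bert/encoder/layer_10/output/dense/MatMul, Execution Time: 0.001178 seconds
"""
for op, secs in sorted(time_by_op(sample).items(), key=lambda kv: -kv[1]):
    print(f"{op:10s} {secs * 1e3:8.3f} ms")
```

Summary lines such as `Total Execution Time: ...` do not contain `Node: `, so they are not double-counted.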
MatMul Node: bert/encoder/layer_10/output/dense/MatMul, Execution Time: 0.001178 seconds
Add Node: bert/encoder/layer_10/output/dense/BiasAdd, Execution Time: 0.000525 seconds
Add Node: bert/encoder/layer_10/output/add, Execution Time: 0.000566 seconds
Node: bert/encoder/layer_10/output/LayerNorm/moments/mean, Execution Time: 0.000088 seconds
Node: bert/encoder/layer_10/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000522 seconds
Node: bert/encoder/layer_10/output/LayerNorm/moments/SquaredDifference__453, Execution Time: 0.000492 seconds
Node: bert/encoder/layer_10/output/LayerNorm/moments/variance, Execution Time: 0.000065 seconds
Add Node: bert/encoder/layer_10/output/LayerNorm/batchnorm/add, Execution Time: 0.000057 seconds
Node: bert/encoder/layer_10/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000059 seconds
Node: bert/encoder/layer_10/output/LayerNorm/batchnorm/Rsqrt__455, Execution Time: 0.000077 seconds
Node: bert/encoder/layer_10/output/LayerNorm/batchnorm/mul, Execution Time: 0.000053 seconds
Node: bert/encoder/layer_10/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000046 seconds
Node: bert/encoder/layer_10/output/LayerNorm/batchnorm/sub, Execution Time: 0.000058 seconds
Node: bert/encoder/layer_10/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000527 seconds
Add Node: bert/encoder/layer_10/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000467 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_11/attention/self/value/MatMul. Executing regular MatMul.
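Each projection MatMul above is preceded by the message `No Add node related to MatMul output: ... Executing regular MatMul.` Yet the very next entry is an Add (`.../BiasAdd`) that presumably consumes that MatMul's result. The profiling script's matching logic is not part of this log.

One common way to find such a pair in an ONNX graph is to look for an Add whose inputs contain one of the MatMul's output *tensor* names. The sketch below does that on a hypothetical, simplified node list. Its dict keys loosely mirror the `name`, `op_type`, `input` and `output` fields of an ONNX `NodeProto`, and the shortened node names are invented for the example.

One detail may be worth checking. The name printed in the message carries no `:0` suffix, while this model's output names do (for example `unstack:0`). If the script compares a node name against tensor names, it would never find a match.

```python
from typing import Optional

# Hypothetical, simplified graph. Tensor names carry a ":0" suffix, as the
# model's output names in this log do; node names do not.
nodes = [
    {"name": "layer_9/value/MatMul", "op_type": "MatMul",
     "inputs": ["hidden:0", "value/kernel:0"], "outputs": ["layer_9/value/MatMul:0"]},
    {"name": "layer_9/value/BiasAdd", "op_type": "Add",
     "inputs": ["layer_9/value/MatMul:0", "value/bias:0"], "outputs": ["layer_9/value/BiasAdd:0"]},
]

def find_bias_add(matmul: dict, graph: list) -> Optional[dict]:
    """Return the first Add node that consumes one of the MatMul's output tensors."""
    produced = set(matmul["outputs"])
    for node in graph:
        if node["op_type"] == "Add" and produced.intersection(node["inputs"]):
            return node
    return None

matmul = nodes[0]
print(find_bias_add(matmul, nodes)["name"])  # found via the output tensor name
# Searching for the node *name* instead of the tensor name finds nothing:
print(any(matmul["name"] in n["inputs"] for n in nodes if n["op_type"] == "Add"))
```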
MatMul Node: bert/encoder/layer_11/attention/self/value/MatMul, Execution Time: 0.000770 seconds
Add Node: bert/encoder/layer_11/attention/self/value/BiasAdd, Execution Time: 0.000548 seconds
Node: bert/encoder/layer_11/attention/self/Reshape_2, Execution Time: 0.000024 seconds
Node: bert/encoder/layer_11/attention/self/transpose_2, Execution Time: 0.000496 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_11/attention/self/query/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_11/attention/self/query/MatMul, Execution Time: 0.000994 seconds
Add Node: bert/encoder/layer_11/attention/self/query/BiasAdd, Execution Time: 0.000512 seconds
Node: bert/encoder/layer_11/attention/self/Reshape, Execution Time: 0.000021 seconds
Node: bert/encoder/layer_11/attention/self/transpose, Execution Time: 0.000501 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_11/attention/self/key/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_11/attention/self/key/MatMul, Execution Time: 0.000724 seconds
Add Node: bert/encoder/layer_11/attention/self/key/BiasAdd, Execution Time: 0.000537 seconds
Node: bert/encoder/layer_11/attention/self/Reshape_1, Execution Time: 0.000020 seconds
Node: bert/encoder/layer_11/attention/self/MatMul__460, Execution Time: 0.000478 seconds
Input size: (12, 256, 64, 256)
No Add node related to MatMul output: bert/encoder/layer_11/attention/self/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_11/attention/self/MatMul, Execution Time: 0.000700 seconds
Node: bert/encoder/layer_11/attention/self/Mul, Execution Time: 0.001564 seconds
Add Node: bert/encoder/layer_11/attention/self/add, Execution Time: 0.001570 seconds
Node: bert/encoder/layer_11/attention/self/Softmax, Execution Time: 0.001483 seconds
Input size: (12, 256, 256, 64)
No Add node related to MatMul output: bert/encoder/layer_11/attention/self/MatMul_1. Executing regular MatMul.
MatMul Node: bert/encoder/layer_11/attention/self/MatMul_1, Execution Time: 0.000719 seconds
Node: bert/encoder/layer_11/attention/self/transpose_3, Execution Time: 0.000530 seconds
Node: bert/encoder/layer_11/attention/self/Reshape_3, Execution Time: 0.000068 seconds
Input size: (None, 256, 768, 768)
No Add node related to MatMul output: bert/encoder/layer_11/attention/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_11/attention/output/dense/MatMul, Execution Time: 0.000749 seconds
Add Node: bert/encoder/layer_11/attention/output/dense/BiasAdd, Execution Time: 0.000514 seconds
Add Node: bert/encoder/layer_11/attention/output/add, Execution Time: 0.000556 seconds
Node: bert/encoder/layer_11/attention/output/LayerNorm/moments/mean, Execution Time: 0.000100 seconds
Node: bert/encoder/layer_11/attention/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000525 seconds
Node: bert/encoder/layer_11/attention/output/LayerNorm/moments/SquaredDifference__463, Execution Time: 0.000520 seconds
Node: bert/encoder/layer_11/attention/output/LayerNorm/moments/variance, Execution Time: 0.000067 seconds
Add Node: bert/encoder/layer_11/attention/output/LayerNorm/batchnorm/add, Execution Time: 0.000055 seconds
Node: bert/encoder/layer_11/attention/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000048 seconds
Node: bert/encoder/layer_11/attention/output/LayerNorm/batchnorm/Rsqrt__465, Execution Time: 0.000086 seconds
Node: bert/encoder/layer_11/attention/output/LayerNorm/batchnorm/mul, Execution Time: 0.000049 seconds
Node: bert/encoder/layer_11/attention/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000046 seconds
Node: bert/encoder/layer_11/attention/output/LayerNorm/batchnorm/sub, Execution Time: 0.000064 seconds
Node: bert/encoder/layer_11/attention/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000474 seconds
Add Node: bert/encoder/layer_11/attention/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000592 seconds
Input size: (None, 256, 768, 3072)
No Add node related to MatMul output: bert/encoder/layer_11/intermediate/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_11/intermediate/dense/MatMul, Execution Time: 0.000806 seconds
Add Node: bert/encoder/layer_11/intermediate/dense/BiasAdd, Execution Time: 0.001625 seconds
Node: bert/encoder/layer_11/intermediate/dense/Pow, Execution Time: 0.001478 seconds
Node: bert/encoder/layer_11/intermediate/dense/mul, Execution Time: 0.001571 seconds
Add Node: bert/encoder/layer_11/intermediate/dense/add, Execution Time: 0.001557 seconds
Node: bert/encoder/layer_11/intermediate/dense/mul_1, Execution Time: 0.001958 seconds
Node: bert/encoder/layer_11/intermediate/dense/Tanh, Execution Time: 0.002749 seconds
Add Node: bert/encoder/layer_11/intermediate/dense/add_1, Execution Time: 0.001997 seconds
Node: bert/encoder/layer_11/intermediate/dense/mul_2, Execution Time: 0.001461 seconds
Node: bert/encoder/layer_11/intermediate/dense/mul_3, Execution Time: 0.001569 seconds
Input size: (None, 256, 3072, 768)
No Add node related to MatMul output: bert/encoder/layer_11/output/dense/MatMul. Executing regular MatMul.
MatMul Node: bert/encoder/layer_11/output/dense/MatMul, Execution Time: 0.000994 seconds
Add Node: bert/encoder/layer_11/output/dense/BiasAdd, Execution Time: 0.000538 seconds
Add Node: bert/encoder/layer_11/output/add, Execution Time: 0.000495 seconds
Node: bert/encoder/layer_11/output/LayerNorm/moments/mean, Execution Time: 0.000099 seconds
Node: bert/encoder/layer_11/output/LayerNorm/moments/SquaredDifference, Execution Time: 0.000514 seconds
Node: bert/encoder/layer_11/output/LayerNorm/moments/SquaredDifference__467, Execution Time: 0.000520 seconds
Node: bert/encoder/layer_11/output/LayerNorm/moments/variance, Execution Time: 0.000106 seconds
Add Node: bert/encoder/layer_11/output/LayerNorm/batchnorm/add, Execution Time: 0.000049 seconds
Node: bert/encoder/layer_11/output/LayerNorm/batchnorm/Rsqrt, Execution Time: 0.000053 seconds
Node: bert/encoder/layer_11/output/LayerNorm/batchnorm/Rsqrt__469, Execution Time: 0.000093 seconds
Node: bert/encoder/layer_11/output/LayerNorm/batchnorm/mul, Execution Time: 0.000062 seconds
Node: bert/encoder/layer_11/output/LayerNorm/batchnorm/mul_2, Execution Time: 0.000056 seconds
Node: bert/encoder/layer_11/output/LayerNorm/batchnorm/sub, Execution Time: 0.000059 seconds
Node: bert/encoder/layer_11/output/LayerNorm/batchnorm/mul_1, Execution Time: 0.000480 seconds
Add Node: bert/encoder/layer_11/output/LayerNorm/batchnorm/add_1, Execution Time: 0.000476 seconds
Input size: (None, 256, 768, 2)
No Add node related to MatMul output: MatMul. Executing regular MatMul.
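The last MatMul (input size (None, 256, 768, 2)) maps each token's 768-dimensional hidden state to two scores. Reshape_1, transpose and unstack then split these into the two [1, 256] arrays printed in the model outputs as `unstack:0` and `unstack:1`. Assume, as in the TensorFlow BERT SQuAD head, that `unstack:0` holds start logits and `unstack:1` holds end logits. An answer span is then usually chosen by maximizing start + end over valid (start, end) pairs.

The sketch below is a simplified form of that search. Two rules are assumptions for illustration:

- `max_answer_len=30` caps the span length.
- Only context positions (segment id 1) are considered.

```python
def best_span(start_logits, end_logits, segment_ids, max_answer_len=30):
    """Pick (start, end) maximizing start_logits[s] + end_logits[e].

    Only context positions (segment id 1) are allowed, e must not precede s,
    and the span covers at most max_answer_len tokens. Returns None when no
    valid pair exists.
    """
    best, best_score = None, float("-inf")
    n = len(start_logits)
    for s in range(n):
        if segment_ids[s] != 1:
            continue
        for e in range(s, min(n, s + max_answer_len)):
            if segment_ids[e] != 1:
                continue
            score = start_logits[s] + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best

# Toy example: positions 0-3 are question tokens, 4-7 are context tokens.
start = [5.0, 0.1, 0.2, 0.3, 0.1, 3.0, 0.2, 0.1]
end = [5.0, 0.1, 0.2, 0.3, 0.1, 0.2, 2.5, 0.3]
seg = [0, 0, 0, 0, 1, 1, 1, 1]
print(best_span(start, end, seg))
```

Position 0 has the highest raw scores but lies in the question segment, so it is skipped. The best context span is (5, 6).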
MatMul Node: MatMul, Execution Time: 0.002046 seconds
Add Node: BiasAdd, Execution Time: 0.000067 seconds
Node: Reshape_1, Execution Time: 0.000024 seconds
Node: transpose, Execution Time: 0.000048 seconds
Node: unstack, Execution Time: 0.000057 seconds
Node: unstack__490, Execution Time: 0.000021 seconds
Node: unstack__488, Execution Time: 0.000011 seconds
Node Execution Times:
Total Execution Time: 0.519998 seconds
Total Matmul + Add Execution Time: 0.233186 seconds
Execution complete.
Model outputs: {'unstack:1': array([[-4.9148726, -4.6251225, -4.132886 , -4.1499195, -4.7828836, -4.250844 , -4.77094 , -4.348463 , -2.7006364, -4.424177 , -4.510866 , -4.39433 , -4.773833 , -4.480716 , -4.7714205, -4.6485815, -3.1330094, -4.7139587, -4.7148943, -4.7223635, -4.7008233, -4.6960616, -4.7121487, -4.708615 , -4.703374 , -4.7024655, -4.687359 , -4.693113 , -4.698162 , -4.692563 , -4.711712 , -4.7003703, -4.7027717, -4.7279253, -4.709934 , -4.715551 , -4.7324576, -4.7294855, -4.7329216, -4.7218866, -4.7014203, -4.694692 , -4.6925716, -4.700892 , -4.7044754, -4.68252 , -4.679993 , -4.6824126, -4.6833754, -4.690988 , -4.695919 , -4.6797957, -4.683871 , -4.6834297, -4.680781 , -4.686977 , -4.681429 , -4.680897 , -4.694978 , -4.685382 , -4.70324 , -4.7010674, -4.693331 , -4.7089696, -4.71908 , -4.7188516, -4.70435 , -4.685466 , -4.6962924, -4.6972375, -4.691828 , -4.688009 , -4.691449 , -4.693622 , -4.6890097, -4.6876435, -4.684474 , -4.7056074, -4.6984677, -4.7068577, -4.689911 , -4.687499 , -4.6927333, -4.693831 , -4.6965637, -4.693646 , -4.693519 , -4.71067 , -4.722037 , -4.718479 , -4.729904 , -4.721483 , -4.739112 , -4.7325935, -4.7295456, -4.712435 , -4.712704 , -4.7114053, -4.712399 , -4.704262 , -4.6972833, -4.6926665, -4.717176 , -4.6937675, -4.694539 , -4.711683 , -4.685275 , -4.6935816, -4.701117 , -4.6866083, -4.6843753, -4.6876745, -4.684178 , -4.694061 , -4.6890798, -4.6861553, -4.7003927, -4.7103863, -4.710601 , -4.7194986, -4.7016277, -4.718649 , -4.743214 , 
-4.7109504, -4.711556 , -4.7007613, -4.7009783, -4.6995244, -4.7007017, -4.7026825, -4.706376 , -4.7061615, -4.7284904, -4.724841 , -4.7082043, -4.7080393, -4.7098503, -4.7207146, -4.733838 , -4.7125974, -4.7276387, -4.721991 , -4.7300687, -4.7229652, -4.7133346, -4.7109923, -4.71963 , -4.7312083, -4.733224 , -4.7362647, -4.739877 , -4.74243 , -4.727128 , -4.737834 , -4.74598 , -4.738839 , -4.744508 , -4.728359 , -4.726734 , -4.7255516, -4.7363386, -4.73214 , -4.7196693, -4.721826 , -4.7047076, -4.7190104, -4.7156587, -4.706273 , -4.7116737, -4.701518 , -4.6943965, -4.6903934, -4.6890545, -4.6862764, -4.6875463, -4.684304 , -4.688264 , -4.691186 , -4.7027955, -4.6910152, -4.6985803, -4.7152886, -4.723945 , -4.7293673, -4.7427354, -4.73977 , -4.7290154, -4.7378254, -4.7355986, -4.731869 , -4.724579 , -4.7262163, -4.71887 , -4.7058587, -4.7122684, -4.7009015, -4.696829 , -4.7094407, -4.703914 , -4.703702 , -4.7195215, -4.7118044, -4.709847 , -4.721358 , -4.723019 , -4.71298 , -4.7218485, -4.724691 , -4.725982 , -4.726673 , -4.7187834, -4.709004 , -4.7109466, -4.737439 , -4.7246385, -4.73252 , -4.7404885, -4.7261868, -4.734698 , -4.732445 , -4.736647 , -4.724646 , -4.73208 , -4.7321663, -4.7037077, -4.718028 , -4.726786 , -4.7345347, -4.7328334, -4.7220054, -4.7327023, -4.7200413, -4.7459936, -4.728972 , -4.7290406, -4.7259574, -4.730495 , -4.723769 , -4.7380366, -4.7268267, -4.692981 , -4.718449 , -4.6935935, -4.6961823, -4.713647 , -4.6950507, -4.700345 , -4.7232556, -4.708386 , -4.737004 , -4.7273254, -4.716681 , -4.7106347, -4.714922 , -4.7030454, -4.7468524]], dtype=float32), 'unstack:0': array([[-5.339778 , -4.878685 , -4.312428 , -4.3309417, -5.125337 , -4.442749 , -5.1271124, -4.5656004, -4.683339 , -4.6350813, -4.8042274, -4.6028423, -5.1304255, -4.7185884, -5.0999007, -4.9003377, -5.1724668, -5.1058035, -5.1073008, -5.1120396, -5.0958624, -5.092071 , -5.104314 , -5.1013465, -5.0973773, -5.0955014, -5.086265 , -5.089708 , -5.093198 , -5.089909 , -5.1028776, 
-5.0938663, -5.0976443, -5.1154556, -5.102868 , -5.1068664, -5.1185074, -5.1169963, -5.118672 , -5.1110716, -5.0957775, -5.0914636, -5.089892 , -5.096351 , -5.099577 , -5.084194 , -5.082636 , -5.0841656, -5.0848293, -5.089616 , -5.0918293, -5.083179 , -5.084272 , -5.0856056, -5.0826926, -5.087329 , -5.0841713, -5.0831146, -5.092702 , -5.084974 , -5.0978565, -5.0952926, -5.090936 , -5.102818 , -5.110067 , -5.1097775, -5.0976253, -5.0851665, -5.0931044, -5.093152 , -5.089941 , -5.0872903, -5.0898356, -5.0923924, -5.0875926, -5.086853 , -5.085301 , -5.100186 , -5.094749 , -5.099969 , -5.0874996, -5.0855126, -5.0895004, -5.09137 , -5.0918326, -5.0898056, -5.090782 , -5.1034665, -5.112412 , -5.109096 , -5.1174197, -5.1111536, -5.1241746, -5.1188 , -5.116848 , -5.1029363, -5.1041894, -5.103745 , -5.105212 , -5.098095 , -5.093282 , -5.090341 , -5.1087084, -5.0905395, -5.0906925, -5.1039257, -5.084995 , -5.090868 , -5.0939407, -5.0842586, -5.0840406, -5.0855136, -5.08409 , -5.089621 , -5.0858765, -5.0852404, -5.09481 , -5.1036887, -5.1036325, -5.1107006, -5.0964427, -5.109834 , -5.128194 , -5.104343 , -5.10455 , -5.0965843, -5.0981956, -5.0968714, -5.0971923, -5.096769 , -5.1019425, -5.1022315, -5.119105 , -5.116201 , -5.102627 , -5.102922 , -5.1034007, -5.111492 , -5.121706 , -5.1049304, -5.116994 , -5.111964 , -5.1179514, -5.1140733, -5.1069007, -5.1045523, -5.1113954, -5.119346 , -5.1202354, -5.1230803, -5.1247115, -5.125494 , -5.1167865, -5.1235557, -5.127506 , -5.1223035, -5.124693 , -5.116798 , -5.1166444, -5.1148844, -5.1223955, -5.1191473, -5.111838 , -5.112754 , -5.1008034, -5.1111383, -5.1085505, -5.100999 , -5.1052284, -5.0974274, -5.0922704, -5.0895066, -5.089077 , -5.086511 , -5.0866723, -5.0855794, -5.0879817, -5.0893273, -5.0967927, -5.08802 , -5.093814 , -5.1059337, -5.112577 , -5.1154685, -5.121607 , -5.12036 , -5.114813 , -5.1212907, -5.1178846, -5.117335 , -5.1129055, -5.1143084, -5.109348 , -5.100045 , -5.1053514, -5.0964003, -5.0934987, -5.102238 , 
-5.0983605, -5.0989766, -5.1099577, -5.10423 , -5.1023245, -5.1104093, -5.111489 , -5.1045485, -5.110909 , -5.112187 , -5.1123652, -5.113932 , -5.10867 , -5.0995913, -5.101586 , -5.1216726, -5.111117 , -5.116669 , -5.12195 , -5.112778 , -5.1199346, -5.117032 , -5.120798 , -5.11272 , -5.117168 , -5.1175523, -5.09827 , -5.1082807, -5.1146145, -5.1200075, -5.1190424, -5.112625 , -5.1200185, -5.1110024, -5.126168 , -5.1168666, -5.11615 , -5.113571 , -5.118028 , -5.1132293, -5.122775 , -5.1154203, -5.091564 , -5.1100745, -5.0914884, -5.0932784, -5.105365 , -5.092105 , -5.0959387, -5.1119223, -5.101221 , -5.1215677, -5.114091 , -5.10658 , -5.101732 , -5.105737 , -5.0961223, -5.1260395]], dtype=float32), 'unique_ids:0': array([0])}
Question: What is the capital of France?
Context: The capital of France is Paris.
Answer:
Generating '/tmp/nsys-report-dbd3.qdstrm'
[1/8] [========================100%] nsys-report-a359.nsys-rep
[2/8] [========================100%] nsys-report-4332.sqlite
[3/8] Executing 'nvtx_sum' stats report
[4/8] Executing 'osrt_sum' stats report

Time (%) Total Time (ns) Num Calls      Avg (ns)      Med (ns)    Min (ns)    Max (ns)  StdDev (ns) Name
-------- --------------- --------- ------------- ------------- ----------- ----------- ------------ ----------------------
    53.0   5,379,923,819        68  79,116,526.8 100,140,946.5       1,120 195,318,656 44,171,310.2 poll
    44.3   4,501,072,548         9 500,119,172.0 500,089,739.0 500,083,210 500,365,743     92,611.1 pthread_cond_timedwait
     1.7     169,966,424     5,645      30,109.2         790.0         290 156,348,110  2,080,925.3 read
     0.7      75,547,694     3,053      24,745.4       7,400.0         210  13,567,883    347,835.0 ioctl
     0.1       9,689,995     3,189       3,038.6       2,760.0       1,100      47,310      1,529.4 open64
     0.0       5,062,449         1   5,062,449.0   5,062,449.0   5,062,449   5,062,449          0.0 nanosleep
     0.0       3,655,800   135,467          27.0          20.0          20       6,820         46.9 pthread_cond_signal
     0.0       3,051,371       139      21,952.3       5,090.0       1,990   1,588,811    135,272.1 mmap64
     0.0         970,652        10      97,065.2      55,206.0      16,790     336,714    113,411.1 sem_timedwait
     0.0         896,861        13      68,989.3      60,501.0      54,070     102,672     14,607.2 sleep
     0.0         527,226       583         904.3          50.0          20      69,801      5,637.4 fgets
     0.0         379,756         8      47,469.5      34,285.5      27,340      90,271     23,307.7 pthread_create
     0.0         334,766        27      12,398.7       6,731.0       1,890      79,391     16,644.2 mmap
     0.0         306,232        31       9,878.5       6,580.0         590      51,641     13,059.2 write
     0.0         298,423        12      24,868.6       9,260.0       2,420      73,561     28,255.3 munmap
     0.0         221,827        44       5,041.5       2,970.5         960      24,511      5,539.6 fopen
     0.0         129,122       133         970.8         800.0         491       3,360        520.2 pread64
     0.0         126,131         1     126,131.0     126,131.0     126,131     126,131          0.0 pthread_cond_wait
     0.0          92,441         1      92,441.0      92,441.0      92,441      92,441          0.0 waitpid
     0.0          58,821        41       1,434.7       1,120.0         620       4,630        883.9 fclose
     0.0          55,951        15       3,730.1       3,190.0       1,820       6,870      1,786.7 open
     0.0          55,646     1,622          34.3          30.0          20       5,050        150.7 pthread_cond_broadcast
     0.0          35,250         2      17,625.0      17,625.0       9,240      26,010     11,858.2 connect
     0.0          30,919       133         232.5         269.0          20       1,020        125.4 sigaction
     0.0          30,130     1,211          24.9          20.0          20         230          8.2 flockfile
     0.0          29,160         6       4,860.0       4,095.0       2,020      10,640      3,356.3 pipe2
     0.0          27,791         4       6,947.8       6,830.0       3,010      11,121      4,054.6 socket
     0.0          22,113        68         325.2         295.5         180       1,191        168.8 fcntl
     0.0          19,880         6       3,313.3       2,584.5       1,211       7,190      2,139.7 fopen64
     0.0          17,775       192          92.6         110.0          20         430         49.7 pthread_mutex_trylock
     0.0          15,640         3       5,213.3       5,310.0       1,670       8,660      3,496.0 fread
     0.0           6,840         2       3,420.0       3,420.0       1,580       5,260      2,602.2 bind
     0.0           3,360         2       1,680.0       1,680.0       1,030       2,330        919.2 fwrite
     0.0           2,670        30          89.0          30.0          20         860        174.5 fflush
     0.0           2,641        10         264.1         260.0         200         340         53.7 dup
     0.0           1,440         2         720.0         720.0         450         990        381.8 dup2
     0.0             900         1         900.0         900.0         900         900          0.0 getc
     0.0             750         1         750.0         750.0         750         750          0.0 listen

[5/8] Executing 'cuda_api_sum' stats report

Time (%) Total Time (ns) Num Calls  Avg (ns)  Med (ns) Min (ns)   Max (ns) StdDev (ns) Name
-------- --------------- --------- --------- --------- -------- ---------- ----------- ---------------------------------
    69.5     554,863,571     1,998 277,709.5  60,631.0    2,210  2,639,775   418,746.2 cudaMemcpyAsync
    15.4     123,069,139     1,998  61,596.2  11,050.5      650    266,844    76,253.9 cudaStreamSynchronize
     9.6      76,880,061       804  95,622.0   7,510.0    2,640 16,544,823   873,475.2 cudaLaunchKernel
     1.4      10,995,963     3,012   3,650.7   2,935.0      490    130,372     3,635.0 cudaDeviceSynchronize
     1.2       9,791,616        98  99,914.4  86,646.0    3,490    325,235    87,991.0 cuCtxSynchronize
     1.0       7,914,589     3,012   2,627.7   1,610.0    1,180     17,040     2,147.4 cudaEventRecord
     0.8       6,255,114        25 250,204.6     900.0      280  6,234,124 1,246,649.9 cudaStreamIsCapturing_v10000
     0.4       2,854,366        22 129,743.9 138,422.0   74,141    180,112    27,463.6 cudaMalloc
     0.3       2,139,262     3,012     710.2     610.0      250     12,520       550.6 cudaEventCreateWithFlags
     0.2       1,402,750        98  14,313.8  13,155.0    8,100     53,211     5,475.2 cuLaunchKernel
     0.1       1,129,074     3,012     374.9     320.0      170      4,860       210.7 cudaEventDestroy
     0.0         289,084         4  72,271.0  73,451.0   55,660     86,522    13,790.2 cuModuleLoadData
     0.0         277,234        50   5,544.7   5,391.0    3,000     11,160     2,037.8 cudaMemsetAsync
     0.0         271,102     1,149     235.9     200.0       50      5,130       255.2 cuGetProcAddress_v2
     0.0         161,892         1 161,892.0 161,892.0  161,892    161,892         0.0 cudaGetDeviceProperties_v2_v12000
     0.0           3,320         1   3,320.0   3,320.0    3,320      3,320         0.0 cuMemFree_v2
     0.0           3,320         3   1,106.7   1,340.0      480      1,500       548.6 cuInit
     0.0             770         1     770.0     770.0      770        770         0.0 cuCtxSetCurrent
     0.0             670         3     223.3     250.0       60        360       151.8 cuModuleGetLoadingMode

[6/8] Executing 'cuda_gpu_kern_sum' stats report

Time (%) Total Time (ns) Instances Avg (ns) Med (ns) Min (ns) Max (ns) StdDev (ns) Name
-------- --------------- --------- -------- -------- -------- -------- ----------- ----------------------------------------------------------------------------------------------------
    82.8       9,405,772        97 96,966.7 83,200.0   11,008  319,904    88,359.2 cutlass_tensorop_s1688tf32gemm_256x128_16x3_tt_align4
     3.8         427,746       148  2,890.2  2,399.5    1,568    4,993     1,038.5 void at::native::elementwise_kernel<(int)128, (int)2, void at::native::gpu_kernel_impl<at::native::…
     3.1         349,890       125  2,799.1  2,368.0    1,312    7,937     1,488.7 void at::native::elementwise_kernel<(int)128, (int)2, void at::native::gpu_kernel_impl<at::native::…
     2.9         328,035       196  1,673.6  1,280.0      768    3,104       708.1 void at::native::vectorized_elementwise_kernel<(int)4, at::native::FillFunctor<float>, at::detail::…
     1.6         178,464        50  3,569.3  3,520.0    3,488    3,968       108.2 void at::native::reduce_kernel<(int)512, (int)1, at::native::ReduceOp<float, at::native::MeanOps<fl…
     1.3         144,578        88  1,642.9    960.0      863    4,353     1,127.0 void at::native::vectorized_elementwise_kernel<(int)4, at::native::CUDAFunctor_add<float>, at::deta…
     1.2         131,327        48  2,736.0  2,368.0    2,304    3,968       652.0 void at::native::elementwise_kernel<(int)128, (int)2, void at::native::gpu_kernel_impl<at::native::…
     0.9         103,808        12  8,650.7  8,640.0    8,608    8,673        21.0 void at::native::elementwise_kernel<(int)128, (int)2, void at::native::gpu_kernel_impl<at::native::…
     0.8          96,448        37  2,606.7  1,824.0    1,760    4,384     1,175.8 void at::native::vectorized_elementwise_kernel<(int)4, at::native::BinaryFunctor<float, float, floa…
     0.6          71,105        12  5,925.4  5,936.0    5,856    6,016        66.5 void <unnamed>::softmax_warp_forward<float, float, float, (int)8, (bool)0, (bool)0>(T2 *, const T1 …
     0.4          45,790        12  3,815.8  3,808.0    3,712    3,936        65.5 void at::native::vectorized_elementwise_kernel<(int)4, at::native::tanh_kernel_cuda(at::TensorItera…
     0.2          25,120        25  1,004.8    992.0      991    1,024        16.0 void at::native::vectorized_elementwise_kernel<(int)4, at::native::sqrt_kernel_cuda(at::TensorItera…
     0.2          24,896        25    995.8    992.0      960    1,056        26.6 void at::native::vectorized_elementwise_kernel<(int)4, at::native::reciprocal_kernel_cuda(at::Tenso…
     0.2          22,528        25    901.1    896.0      864      928        20.0 void at::native::vectorized_elementwise_kernel<(int)4, at::native::AUnaryFunctor<float, float, floa…
     0.1           5,728         1  5,728.0  5,728.0    5,728    5,728         0.0 cutlass_tensorop_s1688tf32gemm_256x128_16x3_tt_align2
     0.0           1,600         1  1,600.0  1,600.0    1,600    1,600         0.0 void at::native::<unnamed>::CatArrayBatchedCopy_aligned16_contig<int, unsigned int, (int)1, (int)12…

[7/8] Executing 'cuda_gpu_mem_time_sum' stats report

Time (%) Total Time (ns) Count  Avg (ns)  Med (ns) Min (ns)  Max (ns) StdDev (ns) Operation
-------- --------------- ----- --------- --------- -------- --------- ----------- ----------------------------
    55.0     205,251,787 1,254 163,677.7 119,681.0      287 2,364,355   253,334.2 [CUDA memcpy Host-to-Device]
    45.0     167,735,526   744 225,451.0 117,216.0      960 1,134,081   287,579.0 [CUDA memcpy Device-to-Host]
     0.0          24,832    50     496.6     320.0      287     1,088       261.5 [CUDA memset]

[8/8] Executing 'cuda_gpu_mem_size_sum' stats report

Total (MB) Count Avg (MB) Med (MB) Min (MB) Max (MB) StdDev (MB) Operation
---------- ----- -------- -------- -------- -------- ----------- ----------------------------
 1,328.322 1,254    1.059    0.786    0.000    9.437       1.597 [CUDA memcpy Host-to-Device]
   811.321   744    1.090    0.786    0.000    3.146       1.160 [CUDA memcpy Device-to-Host]
     0.000    50    0.000    0.000    0.000    0.000       0.000 [CUDA memset]

Generated:
    /tmp/nsys-report-a359.nsys-rep
    /tmp/nsys-report-4332.sqlite
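The GPU-side reports can be read together with a little arithmetic. The sketch below is not part of the original run: it hard-codes figures copied by hand from the cuda_gpu_kern_sum, cuda_gpu_mem_time_sum and cuda_gpu_mem_size_sum tables and derives effective copy bandwidth and the ratio of copy time to kernel time. It assumes nsys reports "MB" as 10^6 bytes; under that reading the 0.786 MB median transfer is consistent with 786,432 bytes, i.e. a float32 tensor of shape (256, 768), if the model uses a BERT-base hidden size of 768 (an assumption; only the sequence length of 256 appears in the input shapes). The names `CopyStats` and `summarize` are illustrative.

```python
"""Back-of-the-envelope reading of the nsys GPU summary tables.

All figures are copied by hand from the cuda_gpu_kern_sum,
cuda_gpu_mem_time_sum and cuda_gpu_mem_size_sum reports; nothing is
re-measured. Treating nsys "MB" as 10**6 bytes is an assumption.
"""
from dataclasses import dataclass

NS_PER_S = 1e9
BYTES_PER_MB = 1e6  # assumption: decimal megabytes


@dataclass(frozen=True)
class CopyStats:
    """One memcpy direction, joining the time and size reports by operation."""
    operation: str
    count: int
    total_time_ns: int
    total_mb: float

    def bandwidth_gb_s(self) -> float:
        """Bytes moved divided by the GPU time the copies occupied, in GB/s."""
        seconds = self.total_time_ns / NS_PER_S
        return self.total_mb * BYTES_PER_MB / seconds / 1e9


# Rows from cuda_gpu_mem_time_sum / cuda_gpu_mem_size_sum.
H2D = CopyStats("Host-to-Device", 1_254, 205_251_787, 1_328.322)
D2H = CopyStats("Device-to-Host", 744, 167_735_526, 811.321)

# "Total Time (ns)" column of cuda_gpu_kern_sum, top to bottom.
KERNEL_TOTALS_NS = [
    9_405_772, 427_746, 349_890, 328_035, 178_464, 144_578, 131_327,
    103_808, 96_448, 71_105, 45_790, 25_120, 24_896, 22_528, 5_728, 1_600,
]


def summarize() -> dict:
    """Derive a few aggregate figures from the hard-coded table rows."""
    kernel_ns = sum(KERNEL_TOTALS_NS)
    copy_ns = H2D.total_time_ns + D2H.total_time_ns
    return {
        "kernel_ms": kernel_ns / 1e6,
        # First row is the TF32 CUTLASS GEMM (align4 variant).
        "gemm_share": KERNEL_TOTALS_NS[0] / kernel_ns,
        "copy_ms": copy_ns / 1e6,
        "copy_to_kernel_ratio": copy_ns / kernel_ns,
        "h2d_gb_s": H2D.bandwidth_gb_s(),
        "d2h_gb_s": D2H.bandwidth_gb_s(),
        # cuda_api_sum lists 1,998 cudaMemcpyAsync calls.
        "copies_match_api_calls": H2D.count + D2H.count == 1_998,
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        if isinstance(value, float):
            print(f"{key:>22}: {value:.3f}")
        else:
            print(f"{key:>22}: {value}")
```

Run as a script, it reports roughly 11.4 ms of kernel time against roughly 373 ms of copy time (about 33 times as much), with effective bandwidths of about 6.5 GB/s host-to-device and 4.8 GB/s device-to-host. These are averages over many transfers of about 1 MB each, so they describe this capture rather than the peak rate of the link.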