cutlass_warmup_separete_profiling_results
Running for input size: (1, 3, 416, 416)
Average total time over 100 runs for each input size: 337.9050 ms
Node 'Mul': 0.1551 ms
Node 'convolution': 0.5733 ms
Node 'activation': 0.5342 ms
Node 'pooling': 0.3352 ms
Node 'convolution1': 0.3765 ms
Node 'activation1': 0.2801 ms
Node 'pooling1': 0.1993 ms
Node 'convolution2': 0.2840 ms
Node 'activation2': 0.1576 ms
Node 'pooling2': 0.1192 ms
Node 'convolution3': 0.2320 ms
Node 'activation3': 0.1064 ms
Node 'pooling3': 0.0812 ms
Node 'convolution4': 0.3795 ms
Node 'activation4': 0.0735 ms
Node 'pooling4': 0.0595 ms
Node 'convolution5': 0.3166 ms
Node 'activation5': 0.0532 ms
Node 'pooling5': 0.0636 ms
Node 'convolution6': 0.6744 ms
Node 'activation6': 0.0763 ms
Node 'convolution7': 1.1853 ms
Node 'activation7': 0.0815 ms
Node 'convolution8': 0.2243 ms
====================================================
Running for input size: (1, 3, 832, 832)
Average total time over 100 runs for each input size: 1075.5562 ms
Node 'Mul': 0.5912 ms
Node 'convolution': 2.5961 ms
Node 'activation': 2.7474 ms
Node 'pooling': 1.5055 ms
Node 'convolution1': 1.4064 ms
Node 'activation1': 1.2594 ms
Node 'pooling1': 0.8148 ms
Node 'convolution2': 0.8713 ms
Node 'activation2': 0.6582 ms
Node 'pooling2': 0.4622 ms
Node 'convolution3': 0.6529 ms
Node 'activation3': 0.3833 ms
Node 'pooling3': 0.2822 ms
Node 'convolution4': 0.7199 ms
Node 'activation4': 0.2361 ms
Node 'pooling4': 0.1837 ms
Node 'convolution5': 0.6882 ms
Node 'activation5': 0.1620 ms
Node 'pooling5': 0.1901 ms
Node 'convolution6': 1.4530 ms
Node 'activation6': 0.2600 ms
Node 'convolution7': 2.5117 ms
Node 'activation7': 0.2507 ms
Node 'convolution8': 0.5070 ms
====================================================
Running for input size: (1, 3, 1664, 1664)
Average total time over 100 runs for each input size: 3628.5349 ms
Node 'Mul': 2.1358 ms
Node 'convolution': 10.6580 ms
Node 'activation': 11.9717 ms
Node 'pooling': 6.5254 ms
Node 'convolution1': 5.7377 ms
Node 'activation1': 5.8658 ms
Node 'pooling1': 3.1068 ms
Node 'convolution2': 2.8133 ms
Node 'activation2': 2.7624 ms
Node 'pooling2': 1.6426 ms
Node 'convolution3': 1.6196 ms
Node 'activation3': 1.3315 ms
Node 'pooling3': 0.9019 ms
Node 'convolution4': 1.3671 ms
Node 'activation4': 0.7316 ms
Node 'pooling4': 0.5176 ms
Node 'convolution5': 1.2895 ms
Node 'activation5': 0.4364 ms
Node 'pooling5': 0.4828 ms
Node 'convolution6': 2.6588 ms
Node 'activation6': 0.7528 ms
Node 'convolution7': 4.5647 ms
Node 'activation7': 0.7528 ms
Node 'convolution8': 0.9994 ms
====================================================
Running for input size: (5, 3, 416, 416)
Average total time over 100 runs for each input size: 4476.7466 ms
Node 'Mul': 2.6608 ms
Node 'convolution': 12.5797 ms
Node 'activation': 14.3026 ms
Node 'pooling': 8.0154 ms
Node 'convolution1': 6.9119 ms
Node 'activation1': 7.0723 ms
Node 'pooling1': 3.9030 ms
Node 'convolution2': 3.5400 ms
Node 'activation2': 3.4052 ms
Node 'pooling2': 2.0742 ms
Node 'convolution3': 2.0819 ms
Node 'activation3': 1.6766 ms
Node 'pooling3': 1.1479 ms
Node 'convolution4': 1.7612 ms
Node 'activation4': 0.9313 ms
Node 'pooling4': 0.6613 ms
Node 'convolution5': 1.6853 ms
Node 'activation5': 0.5571 ms
Node 'pooling5': 0.6227 ms
Node 'convolution6': 3.4477 ms
Node 'activation6': 0.9554 ms
Node 'convolution7': 5.9221 ms
Node 'activation7': 0.9534 ms
Node 'convolution8': 1.3097 ms
====================================================
Running for input size: (10, 3, 416, 416)
Average total time over 100 runs for each input size: 6064.1758 ms
Node 'Mul': 3.5691 ms
Node 'convolution': 17.3962 ms
Node 'activation': 19.8791 ms
Node 'pooling': 10.8230 ms
Node 'convolution1': 9.4174 ms
Node 'activation1': 9.7821 ms
Node 'pooling1': 5.3681 ms
Node 'convolution2': 4.6853 ms
Node 'activation2': 4.6296 ms
Node 'pooling2': 2.8781 ms
Node 'convolution3': 2.7989 ms
Node 'activation3': 2.3176 ms
Node 'pooling3': 1.5734 ms
Node 'convolution4': 2.2970 ms
Node 'activation4': 1.2913 ms
Node 'pooling4': 0.9080 ms
Node 'convolution5': 2.1625 ms
Node 'activation5': 0.7547 ms
Node 'pooling5': 0.8421 ms
Node 'convolution6': 4.4564 ms
Node 'activation6': 1.2981 ms
Node 'convolution7': 7.5987 ms
Node 'activation7': 1.2834 ms
Node 'convolution8': 1.7039 ms
====================================================
Running for input size: (15, 3, 416, 416)
Average total time over 100 runs for each input size: 8429.7783 ms
Node 'Mul': 4.9464 ms
Node 'convolution': 24.6652 ms
Node 'activation': 28.3661 ms
Node 'pooling': 15.5010 ms
Node 'convolution1': 13.3157 ms
Node 'activation1': 13.9849 ms
Node 'pooling1': 7.5245 ms
Node 'convolution2': 6.8280 ms
Node 'activation2': 6.5654 ms
Node 'pooling2': 3.9953 ms
Node 'convolution3': 3.7297 ms
Node 'activation3': 3.2070 ms
Node 'pooling3': 2.1619 ms
Node 'convolution4': 2.9553 ms
Node 'activation4': 1.7590 ms
Node 'pooling4': 1.2257 ms
Node 'convolution5': 2.7248 ms
Node 'activation5': 1.0218 ms
Node 'pooling5': 1.1219 ms
Node 'convolution6': 5.6751 ms
Node 'activation6': 1.7566 ms
Node 'convolution7': 9.5799 ms
Node 'activation7': 1.7722 ms
Node 'convolution8': 2.1868 ms
====================================================
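
The log above follows a regular pattern (an input-size header, an average total time, then one entry per node, with blocks separated by a row of `=`), so it can be loaded for further analysis. The following is a minimal Python sketch, assuming the log text is available as a single string. The function name `parse_profile` and the regex names are illustrative and are not part of the original profiling script.

```python
import re

# Patterns mirror the wording of the log above. They tolerate arbitrary
# whitespace (including newlines) between tokens, so they work whether or
# not the log has line breaks.
SIZE_RE = re.compile(r"Running for input size:\s*\(([^)]*)\)")
TOTAL_RE = re.compile(
    r"Average total time over (\d+) runs for each input size:\s*([\d.]+)\s*ms"
)
NODE_RE = re.compile(r"Node '([^']+)':\s*([\d.]+)\s*ms")


def parse_profile(text):
    """Return one dict per 'Running for input size' block in the log."""
    results = []
    # Blocks are delimited by a run of '=' characters.
    for block in re.split(r"={10,}", text):
        size = SIZE_RE.search(block)
        total = TOTAL_RE.search(block)
        if not size or not total:
            continue  # skip empty or unrelated fragments
        results.append({
            "input_size": tuple(int(d) for d in size.group(1).split(",")),
            "runs": int(total.group(1)),
            "total_ms": float(total.group(2)),
            # Node name -> reported time in milliseconds.
            "nodes": {name: float(ms) for name, ms in NODE_RE.findall(block)},
        })
    return results
```

A possible use is to rank nodes within each block, for example `max(entry["nodes"], key=entry["nodes"].get)` to find the slowest node for a given input size, or `sum(entry["nodes"].values())` to compare the per-node times against the reported average total.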