(mlc-prebuilt) cfruan@catalyst-fleet:/ssd1/cfruan/mlc-llm$ python3 build.py --model=WizardMath-7B-V1.0 --quantization=q4f32_1 --target=webgpu --use-cache=0
Using path "dist/models/WizardMath-7B-V1.0" for model "WizardMath-7B-V1.0"
Target configured: webgpu -keys=webgpu,gpu -max_num_threads=256
Automatically using target for weight quantization: cuda -keys=cuda,gpu -arch=sm_89 -max_num_threads=1024 -max_shared_memory_per_block=49152 -max_threads_per_block=1024 -registers_per_block=65536 -thread_warp_size=32
Start computing and quantizing weights... This may take a while.
Finish computing and quantizing weights.
Total param size: 3.9250688552856445 GB
Start storing to cache dist/WizardMath-7B-V1.0-q4f32_1/params
[0327/0327] saving param_326
All finished, 132 total shards committed, record saved to dist/WizardMath-7B-V1.0-q4f32_1/params/ndarray-cache.json
Finish exporting chat config to dist/WizardMath-7B-V1.0-q4f32_1/params/mlc-chat-config.json
[10:00:57] /workspace/tvm/include/tvm/topi/transform.h:1076: Warning: Fast mode segfaults when there are out-of-bounds indices. Make sure input indices are in bound
[10:00:58] /workspace/tvm/include/tvm/topi/transform.h:1076: Warning: Fast mode segfaults when there are out-of-bounds indices. Make sure input indices are in bound
Save a cached module to dist/WizardMath-7B-V1.0-q4f32_1/mod_cache_before_build.pkl.
[10:01:07] /workspace/tvm/src/target/llvm/codegen_llvm.cc:185: Warning: Set native vector bits to be 128 for wasm32
Finish exporting to dist/WizardMath-7B-V1.0-q4f32_1/WizardMath-7B-V1.0-q4f32_1-webgpu.wasm