(mlc-prebuilt) cfruan@catalyst-fleet:/ssd1/cfruan/mlc-llm$ python3 build.py --model=WizardMath-7B-V1.0 --quantization=q4f32_1 --target=webgpu --use-cache=0
Using path "dist/models/WizardMath-7B-V1.0" for model "WizardMath-7B-V1.0"
Target configured: webgpu -keys=webgpu,gpu -max_num_threads=256
Automatically using target for weight quantization: cuda -keys=cuda,gpu -arch=sm_89 -max_num_threads=1024 -max_shared_memory_per_block=49152 -max_threads_per_block=1024 -registers_per_block=65536 -thread_warp_size=32
Start computing and quantizing weights... This may take a while.
Finish computing and quantizing weights.
Total param size: 3.9250688552856445 GB
Start storing to cache dist/WizardMath-7B-V1.0-q4f32_1/params
[0327/0327] saving param_326
All finished, 132 total shards committed, record saved to dist/WizardMath-7B-V1.0-q4f32_1/params/ndarray-cache.json
Finish exporting chat config to dist/WizardMath-7B-V1.0-q4f32_1/params/mlc-chat-config.json
[10:00:57] /workspace/tvm/include/tvm/topi/transform.h:1076: Warning: Fast mode segfaults when there are out-of-bounds indices. Make sure input indices are in bound
[10:00:58] /workspace/tvm/include/tvm/topi/transform.h:1076: Warning: Fast mode segfaults when there are out-of-bounds indices. Make sure input indices are in bound
Save a cached module to dist/WizardMath-7B-V1.0-q4f32_1/mod_cache_before_build.pkl.
[10:01:07] /workspace/tvm/src/target/llvm/codegen_llvm.cc:185: Warning: Set native vector bits to be 128 for wasm32
Finish exporting to dist/WizardMath-7B-V1.0-q4f32_1/WizardMath-7B-V1.0-q4f32_1-webgpu.wasm