
[Blackhole] Llama3-8B end-to-end testing #18135

Open · 1 of 6 tasks
mtairum opened this issue Feb 21, 2025 · 3 comments
mtairum commented Feb 21, 2025

Describe the bug
Now that all llama3 ops are supported in BH (see #16013), test the full Llama3 model on Blackhole.

Mainly testing Llama3-8B, but also running the 1B and 3B variants, since they should work as well.

Plan: Make sure the individual modules work before running the full model.

Reproduce tests

The commands below download the HF weights, if not already present, and run the test.

export HF_MODEL=meta-llama/Llama-3.1-8B-Instruct
pytest models/demos/llama3/tests/<test_name>
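
To run the 1B or 3B variants, only HF_MODEL changes. For example (the model name below is an assumption based on the usual meta-llama naming on Hugging Face, where the 1B and 3B checkpoints are Llama 3.2 releases; the test filter comes from the module list below):

export HF_MODEL=meta-llama/Llama-3.2-1B-Instruct
pytest models/demos/llama3/tests/test_llama_mlp.py -k 32-1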

Issues

The current Llama3 sharding specs do not work on BH. The Bounds-Errors below suggest the specs request more cores than the Blackhole logical grid provides (see the sketch after the module list).

Modules status

  • test_llama_mlp.py -k 32-1
    • [8B]: Bounds-Error -- Logical_core=(x=8,y=0) is outside of logical_grid_size=(x=8,y=1)
    • [1B]: tt-metal/ttnn/cpp/ttnn/operations/data_movement/sharded/interleaved_to_sharded/device/interleaved_to_sharded_op.cpp:21: this->output_mem_config.buffer_type == BufferType::L1
  • test_llama_attention.py -k paged
    • [1B] when creating wqkv weights: interleaved_to_sharded_op.cpp:21: this->output_mem_config.buffer_type == BufferType::L1
  • test_llama_rms_norm.py
  • test_llama_decoder.py
    • Needs the other modules to work first!
  • test_lm_head.py
    • [8B]: Bounds-Error -- Logical_core=(x=8,y=0) is outside of logical_grid_size=(x=8,y=1)
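
On the Bounds-Error: a logical grid of size (x=8, y=1) only has valid x coordinates 0..7, so a sharding spec that places anything on logical core (x=8, y=0) is one column out of range. A minimal sketch of validating a requested core grid against the device grid before building a sharded config (the helper is hypothetical, and it assumes the relevant grid is the device's compute-with-storage grid; the failing specs may target a different grid):

```python
import ttnn

def shard_grid_fits(device, core_grid: ttnn.CoreGrid) -> bool:
    # Logical worker grid of the device, e.g. (x=8, y=1) in the failing case above.
    grid = device.compute_with_storage_grid_size()
    return core_grid.x <= grid.x and core_grid.y <= grid.y

device = ttnn.open_device(device_id=0)
# An 8x1 request fits an 8x1 grid (cores x=0..7); an 8x2 request does not.
print(shard_grid_fits(device, ttnn.CoreGrid(x=8, y=1)))
print(shard_grid_fits(device, ttnn.CoreGrid(x=8, y=2)))
ttnn.close_device(device)
```
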
mtairum added the "bug: Something isn't working" label Feb 21, 2025
mtairum self-assigned this Feb 21, 2025
mtairum added the P0, blackhole, and llama3 labels and removed the "bug: Something isn't working" label Feb 21, 2025
mtairum removed their assignment Feb 21, 2025
mtairum changed the title from "[Blackhole] Llama3-8B bringup" to "[Blackhole] Llama3-8B end-to-end testing" Feb 21, 2025
mtairum commented Feb 21, 2025

For the MLP sharding, there's a good proxy unit test that's currently failing on BH:

pytest tests/tt_eager/python_api_testing/unit_testing/misc/test_sharded.py::test_llama_mlp_width_sharded_to_interleaved_pcc_err

_________________________________________________________________________ test_llama_mlp_width_sharded_to_interleaved_pcc_err[32] __________________________________________________________________________
tests/tt_eager/python_api_testing/unit_testing/misc/test_sharded.py:2468: in test_llama_mlp_width_sharded_to_interleaved_pcc_err
    tt_w1 = as_sharded_tensor(w1.t(), ttnn.bfloat8_b, dim=-1, mem_config=w1_w3_mem_config)
tests/tt_eager/python_api_testing/unit_testing/misc/test_sharded.py:2460: in <lambda>
    as_sharded_tensor = lambda w, type, dim, mem_config: ttnn.as_tensor(
ttnn/ttnn/decorators.py:333: in __call__
    return self.function(*function_args, **function_kwargs)
ttnn/ttnn/operations/core.py:600: in as_tensor
    return torch_to_ttnn(tensor, dtype, layout, device, memory_config, mesh_mapper)
ttnn/ttnn/operations/core.py:589: in torch_to_ttnn
    tensor = ttnn.from_torch(
ttnn/ttnn/decorators.py:333: in __call__
    return self.function(*function_args, **function_kwargs)
ttnn/ttnn/operations/core.py:237: in from_torch
    tensor = ttnn.to_device(tensor, device, memory_config=memory_config, cq_id=cq_id)
ttnn/ttnn/decorators.py:333: in __call__
    return self.function(*function_args, **function_kwargs)
E   RuntimeError: TT_FATAL @ /localdev/mtairum/tt-metal/ttnn/cpp/ttnn/operations/data_movement/sharded/interleaved_to_sharded/device/interleaved_to_sharded_op.cpp:21: this->output_mem_config.buffer_type == BufferType::L1
E   info:
E   Error
E   backtrace:
E    --- /localdev/mtairum/tt-metal/ttnn/ttnn/_ttnn.so(+0xe96b09) [0x7ff698547b09]
E    --- ttnn::operations::data_movement::InterleavedToShardedDeviceOperation::validate(std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>> const&) const
E    --- /localdev/mtairum/tt-metal/ttnn/ttnn/_ttnn.so(_ZN4ttnn16device_operation6detail23launch_on_worker_threadIN2tt8tt_metal9operation23OldInfraDeviceOperationINSt3__16vectorINS4_6TensorENS7_9allocatorIS9_EEEEEENS3_3stl10StrongTypeIhNS_10QueueIdTagEEElNS5_15DeviceOperationISC_EENSD_13tensor_args_tESC_PNS4_2v07IDeviceEEEvT0_T1_RKT2_RKT3_RT4_RT5_+0x398) [0x7ff69a7a51e8]
E    --- /localdev/mtairum/tt-metal/ttnn/ttnn/_ttnn.so(_ZN4ttnn16device_operation6detail23launch_on_single_deviceIN2tt8tt_metal9operation23OldInfraDeviceOperationINSt3__16vectorINS4_6TensorENS7_9allocatorIS9_EEEEEEEENT_21tensor_return_value_tENS3_3stl10StrongTypeIhNS_10QueueIdTagEEERKNSE_22operation_attributes_tERKNSE_13tensor_args_tE+0xe4) [0x7ff69a7a4d24]
E    --- /localdev/mtairum/tt-metal/ttnn/ttnn/_ttnn.so(+0x30f3b9f) [0x7ff69a7a4b9f]
E    --- /localdev/mtairum/tt-metal/ttnn/ttnn/_ttnn.so(_ZN4ttnn16device_operation6detail6invokeIN2tt8tt_metal9operation23OldInfraDeviceOperationINSt3__16vectorINS4_6TensorENS7_9allocatorIS9_EEEEEEEENT_21tensor_return_value_tENS3_3stl10StrongTypeIhNS_10QueueIdTagEEERKNSE_22operation_attributes_tERKNSE_13tensor_args_tE+0x1f6) [0x7ff69a7a4496]
E    --- /localdev/mtairum/tt-metal/ttnn/ttnn/_ttnn.so(+0x30f2d6f) [0x7ff69a7a3d6f]
E    --- /localdev/mtairum/tt-metal/ttnn/ttnn/_ttnn.so(+0x30f2985) [0x7ff69a7a3985]
E    --- std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>> tt::tt_metal::operation::run<std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>>>(tt::tt_metal::operation::DeviceOperation<std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>>>&&, std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>> const&, std::__1::vector<std::__1::optional<tt::tt_metal::Tensor const>, std::__1::allocator<std::__1::optional<tt::tt_metal::Tensor const>>> const&, std::__1::vector<std::__1::optional<tt::tt_metal::Tensor>, std::__1::allocator<std::__1::optional<tt::tt_metal::Tensor>>> const&, tt::stl::StrongType<unsigned char, ttnn::QueueIdTag>)
E    --- /localdev/mtairum/tt-metal/ttnn/ttnn/_ttnn.so(+0xf1a7e3) [0x7ff6985cb7e3]
E    --- ttnn::operations::data_movement::InterleavedToShardedOperation::invoke(tt::stl::StrongType<unsigned char, ttnn::QueueIdTag>, tt::tt_metal::Tensor const&, tt::tt_metal::MemoryConfig const&, std::__1::optional<tt::tt_metal::DataType> const&, std::__1::optional<bool> const&)
E    --- /localdev/mtairum/tt-metal/ttnn/ttnn/_ttnn.so(+0xf77124) [0x7ff698628124]
E    --- /localdev/mtairum/tt-metal/ttnn/ttnn/_ttnn.so(+0xf76eeb) [0x7ff698627eeb]
E    --- /localdev/mtairum/tt-metal/ttnn/ttnn/_ttnn.so(+0xf76e42) [0x7ff698627e42]
E    --- /localdev/mtairum/tt-metal/ttnn/ttnn/_ttnn.so(+0x3101b75) [0x7ff69a7b2b75]
E    --- /localdev/mtairum/tt-metal/ttnn/ttnn/_ttnn.so(+0x31022c7) [0x7ff69a7b32c7]
E    --- /localdev/mtairum/tt-metal/build_Release_tracy/lib/libtt_metal.so(+0x19c02f) [0x7ff696d1f02f]
E    --- void tt::tt_metal::operation::launch_op_func<std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>>>(std::__1::function<std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>> (std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>> const&, std::__1::vector<std::__1::optional<tt::tt_metal::Tensor const>, std::__1::allocator<std::__1::optional<tt::tt_metal::Tensor const>>> const&, std::__1::vector<std::__1::optional<tt::tt_metal::Tensor>, std::__1::allocator<std::__1::optional<tt::tt_metal::Tensor>>> const&)> const&, std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>>, std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>>&, std::__1::vector<std::__1::optional<tt::tt_metal::Tensor const>, std::__1::allocator<std::__1::optional<tt::tt_metal::Tensor const>>>, std::__1::vector<std::__1::optional<tt::tt_metal::Tensor>, std::__1::allocator<std::__1::optional<tt::tt_metal::Tensor>>>, bool)
E    --- /localdev/mtairum/tt-metal/ttnn/ttnn/_ttnn.so(+0xf7693a) [0x7ff69862793a]
E    --- /localdev/mtairum/tt-metal/ttnn/ttnn/_ttnn.so(+0xf76277) [0x7ff698627277]
E    --- /localdev/mtairum/tt-metal/ttnn/ttnn/_ttnn.so(+0xf75cf8) [0x7ff698626cf8]
E    --- ttnn::operations::core::to_device(tt::tt_metal::Tensor const&, tt::tt_metal::v0::IDevice*, std::__1::optional<tt::tt_metal::MemoryConfig> const&, tt::stl::StrongType<unsigned char, ttnn::QueueIdTag>)    
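
For reference, the failing call chain reduces to ttnn.from_torch placing a host tensor on device with a sharded memory config, at which point interleaved_to_sharded validates that the target buffer lives in L1. A standalone sketch of that path (shapes, grid, and dtype are illustrative assumptions, not values from the test):

```python
import torch
import ttnn

device = ttnn.open_device(device_id=0)

# Stand-in weight; the real test shards the Llama MLP w1/w3 weights.
torch_weight = torch.randn(4096, 4096, dtype=torch.bfloat16)

# Width-sharded config over an 8x1 core grid (illustrative values).
mem_config = ttnn.create_sharded_memory_config(
    shape=torch_weight.shape,
    core_grid=ttnn.CoreGrid(x=8, y=1),
    strategy=ttnn.ShardStrategy.WIDTH,
    orientation=ttnn.ShardOrientation.ROW_MAJOR,
)

# to_device dispatches interleaved_to_sharded, whose validate() asserts
# output_mem_config.buffer_type == BufferType::L1 -- the TT_FATAL above.
tt_weight = ttnn.from_torch(
    torch_weight,
    dtype=ttnn.bfloat8_b,
    layout=ttnn.TILE_LAYOUT,
    device=device,
    memory_config=mem_config,
)

ttnn.close_device(device)
```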

ejouretTT commented

Initial testing shows ops issue in Matrix Multiply with DRAM sharding.

@mtairum to open issue and assign @bbradelTT.

bbradelTT (Contributor) commented


> Initial testing shows ops issue in Matrix Multiply with DRAM sharding.
>
> @mtairum to open issue and assign @bbradelTT.

The error is in i2s (interleaved_to_sharded), not matmul. I re-assigned the issue to @llongTT.
