
DRAM width sharded tensors failing on blackhole #18144

Open
mtairum opened this issue Feb 21, 2025 · 3 comments
Labels: blackhole, bug, llama3, P0

Comments

mtairum (Contributor) commented Feb 21, 2025

Part of the Llama3 end-to-end testing on blackhole: #18135

We rely on DRAM-sharded matmuls for our 3 MLP matmuls.

Our codebase parametrizes most of our memory configurations; the DRAM-sharded configs expect 12 DRAM cores, but I believe BH only has 8.
After changing this, I see the following error when creating a tensor with the sharded specs:

E RuntimeError: TT_FATAL @ /localdev/mtairum/tt-metal/ttnn/cpp/ttnn/operations/data_movement/sharded/interleaved_to_sharded/device/interleaved_to_sharded_op.cpp:21: this->output_mem_config.buffer_type == BufferType::L1
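
As an aside, the DRAM core count can be queried from the device at runtime instead of being hardcoded. A minimal sketch, assuming a live ttnn device handle named device (variable names are illustrative):

    # Query the DRAM grid instead of hardcoding 12 (WH) vs. 8 (BH).
    dram_grid = device.dram_grid_size()          # a ttnn.CoreCoord
    num_dram_cores = dram_grid.x * dram_grid.y   # expected: 12 on WH, 8 on BH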

How to reproduce

I found we have a unit test that's a good proxy for the issue above:

pytest tests/tt_eager/python_api_testing/unit_testing/misc/test_sharded.py::test_llama_mlp_width_sharded_to_interleaved_pcc_err

The dram_core_range_set is set to 12 cores, so some changes will be needed here. Below is a modified version of the test above, with the core range reduced to 8 cores and the shard specs adjusted to shard over those 8 cores.

    # Span all DRAM cores on the target device (8 on BH) rather than the
    # previously hardcoded 12 (ttnn.CoreCoord(11, 0)).
    dram_core_range_set = ttnn.CoreRangeSet(
        {
            ttnn.CoreRange(
                ttnn.CoreCoord(0, 0),
                ttnn.CoreCoord(device.dram_grid_size().x - 1, device.dram_grid_size().y - 1),
            ),
        }
    )
    # Shard specs sized for 8 DRAM cores.
    w1_w3_mem_config = ttnn.MemoryConfig(
        ttnn.TensorMemoryLayout.WIDTH_SHARDED,
        ttnn.BufferType.DRAM,
        ttnn.ShardSpec(dram_core_range_set, (4096, 448), ttnn.ShardOrientation.ROW_MAJOR),
    )
    w2_mem_config = ttnn.MemoryConfig(
        ttnn.TensorMemoryLayout.WIDTH_SHARDED,
        ttnn.BufferType.DRAM,
        ttnn.ShardSpec(dram_core_range_set, (3584, 512), ttnn.ShardOrientation.ROW_MAJOR),
    )
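
Note on the shard shapes: with 8 DRAM cores, these per-core widths imply total widths of 8 * 448 = 3584 and 8 * 512 = 4096, which presumably match the widths of the transposed w1/w3 and w2 test weights respectively.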

Once the above passes, you can test the full MLP block used in Llama.
You will need to change conftest to accept blackhole; see the example in this commit: main...mtairum/llama3-blackhole

export HF_MODEL=meta-llama/Llama-3.1-8B-Instruct
#export HF_MODEL=meta-llama/Llama-3.2-1B-Instruct  # Also try 1B/3B, as they have different shapes and should also be supported.

# Run the decode test
pytest models/demos/llama3/tests/test_llama_mlp.py -k 32-1
# Run the prefill test
pytest models/demos/llama3/tests/test_llama_mlp.py -k 32768-1

Full error traceback

_________________________________________________________________________ test_llama_mlp_width_sharded_to_interleaved_pcc_err[32] __________________________________________________________________________
tests/tt_eager/python_api_testing/unit_testing/misc/test_sharded.py:2468: in test_llama_mlp_width_sharded_to_interleaved_pcc_err
    tt_w1 = as_sharded_tensor(w1.t(), ttnn.bfloat8_b, dim=-1, mem_config=w1_w3_mem_config)
tests/tt_eager/python_api_testing/unit_testing/misc/test_sharded.py:2460: in <lambda>
    as_sharded_tensor = lambda w, type, dim, mem_config: ttnn.as_tensor(
ttnn/ttnn/decorators.py:333: in __call__
    return self.function(*function_args, **function_kwargs)
ttnn/ttnn/operations/core.py:600: in as_tensor
    return torch_to_ttnn(tensor, dtype, layout, device, memory_config, mesh_mapper)
ttnn/ttnn/operations/core.py:589: in torch_to_ttnn
    tensor = ttnn.from_torch(
ttnn/ttnn/decorators.py:333: in __call__
    return self.function(*function_args, **function_kwargs)
ttnn/ttnn/operations/core.py:237: in from_torch
    tensor = ttnn.to_device(tensor, device, memory_config=memory_config, cq_id=cq_id)
ttnn/ttnn/decorators.py:333: in __call__
    return self.function(*function_args, **function_kwargs)
E   RuntimeError: TT_FATAL @ /localdev/mtairum/tt-metal/ttnn/cpp/ttnn/operations/data_movement/sharded/interleaved_to_sharded/device/interleaved_to_sharded_op.cpp:21: this->output_mem_config.buffer_type == BufferType::L1
E   info:
E   Error
E   backtrace:
E    --- /localdev/mtairum/tt-metal/ttnn/ttnn/_ttnn.so(+0xe96b09) [0x7ff698547b09]
E    --- ttnn::operations::data_movement::InterleavedToShardedDeviceOperation::validate(std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>> const&) const
E    --- /localdev/mtairum/tt-metal/ttnn/ttnn/_ttnn.so(_ZN4ttnn16device_operation6detail23launch_on_worker_threadIN2tt8tt_metal9operation23OldInfraDeviceOperationINSt3__16vectorINS4_6TensorENS7_9allocatorIS9_EEEEEENS3_3stl10StrongTypeIhNS_10QueueIdTagEEElNS5_15DeviceOperationISC_EENSD_13tensor_args_tESC_PNS4_2v07IDeviceEEEvT0_T1_RKT2_RKT3_RT4_RT5_+0x398) [0x7ff69a7a51e8]
E    --- /localdev/mtairum/tt-metal/ttnn/ttnn/_ttnn.so(_ZN4ttnn16device_operation6detail23launch_on_single_deviceIN2tt8tt_metal9operation23OldInfraDeviceOperationINSt3__16vectorINS4_6TensorENS7_9allocatorIS9_EEEEEEEENT_21tensor_return_value_tENS3_3stl10StrongTypeIhNS_10QueueIdTagEEERKNSE_22operation_attributes_tERKNSE_13tensor_args_tE+0xe4) [0x7ff69a7a4d24]
E    --- /localdev/mtairum/tt-metal/ttnn/ttnn/_ttnn.so(+0x30f3b9f) [0x7ff69a7a4b9f]
E    --- /localdev/mtairum/tt-metal/ttnn/ttnn/_ttnn.so(_ZN4ttnn16device_operation6detail6invokeIN2tt8tt_metal9operation23OldInfraDeviceOperationINSt3__16vectorINS4_6TensorENS7_9allocatorIS9_EEEEEEEENT_21tensor_return_value_tENS3_3stl10StrongTypeIhNS_10QueueIdTagEEERKNSE_22operation_attributes_tERKNSE_13tensor_args_tE+0x1f6) [0x7ff69a7a4496]
E    --- /localdev/mtairum/tt-metal/ttnn/ttnn/_ttnn.so(+0x30f2d6f) [0x7ff69a7a3d6f]
E    --- /localdev/mtairum/tt-metal/ttnn/ttnn/_ttnn.so(+0x30f2985) [0x7ff69a7a3985]
E    --- std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>> tt::tt_metal::operation::run<std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>>>(tt::tt_metal::operation::DeviceOperation<std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>>>&&, std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>> const&, std::__1::vector<std::__1::optional<tt::tt_metal::Tensor const>, std::__1::allocator<std::__1::optional<tt::tt_metal::Tensor const>>> const&, std::__1::vector<std::__1::optional<tt::tt_metal::Tensor>, std::__1::allocator<std::__1::optional<tt::tt_metal::Tensor>>> const&, tt::stl::StrongType<unsigned char, ttnn::QueueIdTag>)
E    --- /localdev/mtairum/tt-metal/ttnn/ttnn/_ttnn.so(+0xf1a7e3) [0x7ff6985cb7e3]
E    --- ttnn::operations::data_movement::InterleavedToShardedOperation::invoke(tt::stl::StrongType<unsigned char, ttnn::QueueIdTag>, tt::tt_metal::Tensor const&, tt::tt_metal::MemoryConfig const&, std::__1::optional<tt::tt_metal::DataType> const&, std::__1::optional<bool> const&)
E    --- /localdev/mtairum/tt-metal/ttnn/ttnn/_ttnn.so(+0xf77124) [0x7ff698628124]
E    --- /localdev/mtairum/tt-metal/ttnn/ttnn/_ttnn.so(+0xf76eeb) [0x7ff698627eeb]
E    --- /localdev/mtairum/tt-metal/ttnn/ttnn/_ttnn.so(+0xf76e42) [0x7ff698627e42]
E    --- /localdev/mtairum/tt-metal/ttnn/ttnn/_ttnn.so(+0x3101b75) [0x7ff69a7b2b75]
E    --- /localdev/mtairum/tt-metal/ttnn/ttnn/_ttnn.so(+0x31022c7) [0x7ff69a7b32c7]
E    --- /localdev/mtairum/tt-metal/build_Release_tracy/lib/libtt_metal.so(+0x19c02f) [0x7ff696d1f02f]
E    --- void tt::tt_metal::operation::launch_op_func<std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>>>(std::__1::function<std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>> (std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>> const&, std::__1::vector<std::__1::optional<tt::tt_metal::Tensor const>, std::__1::allocator<std::__1::optional<tt::tt_metal::Tensor const>>> const&, std::__1::vector<std::__1::optional<tt::tt_metal::Tensor>, std::__1::allocator<std::__1::optional<tt::tt_metal::Tensor>>> const&)> const&, std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>>, std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>>&, std::__1::vector<std::__1::optional<tt::tt_metal::Tensor const>, std::__1::allocator<std::__1::optional<tt::tt_metal::Tensor const>>>, std::__1::vector<std::__1::optional<tt::tt_metal::Tensor>, std::__1::allocator<std::__1::optional<tt::tt_metal::Tensor>>>, bool)
E    --- /localdev/mtairum/tt-metal/ttnn/ttnn/_ttnn.so(+0xf7693a) [0x7ff69862793a]
E    --- /localdev/mtairum/tt-metal/ttnn/ttnn/_ttnn.so(+0xf76277) [0x7ff698627277]
E    --- /localdev/mtairum/tt-metal/ttnn/ttnn/_ttnn.so(+0xf75cf8) [0x7ff698626cf8]
E    --- ttnn::operations::core::to_device(tt::tt_metal::Tensor const&, tt::tt_metal::v0::IDevice*, std::__1::optional<tt::tt_metal::MemoryConfig> const&, tt::stl::StrongType<unsigned char, ttnn::QueueIdTag>)    
bbradelTT (Contributor) commented Feb 21, 2025

The failure is in i2s (interleaved_to_sharded). Reassigning to @llongTT on @ntarafdar's team.

mtairum changed the title from "DRAM-sharded matmul failing on blackhole" to "Creating sharded tensor failing on blackhole" on Feb 21, 2025
llongTT (Contributor) commented Feb 21, 2025

Looking into it.

mtairum changed the title from "Creating sharded tensor failing on blackhole" to "DRAM width sharded tensors failing on blackhole" on Feb 21, 2025
mtairum (Contributor, Author) commented Feb 21, 2025

Added changes to tests/tt_eager/python_api_testing/unit_testing/misc/test_sharded.py::test_llama_mlp_width_sharded_to_interleaved_pcc_err to use however many DRAM cores the target architecture has, and updated the shard specs to account for 8 DRAM cores instead of 12.

When this test is passing, we will need to update it to derive the shard spec from the number of DRAM cores, along the lines of the sketch below.
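
A minimal sketch of deriving the spec from the device, assuming ttnn is imported, device is a live device handle, and 32-wide tiles; the helper name and TILE constant are hypothetical, not part of the test:

    import math

    TILE = 32  # assumption: shard widths must be tile-aligned

    def dram_width_shard_spec(height, width, device):
        # Width-shard a (height, width) tensor across every DRAM core,
        # padding the per-core shard width up to a whole number of tiles.
        grid = device.dram_grid_size()
        num_cores = grid.x * grid.y
        shard_width = math.ceil(width / (num_cores * TILE)) * TILE
        dram_core_range_set = ttnn.CoreRangeSet(
            {
                ttnn.CoreRange(
                    ttnn.CoreCoord(0, 0),
                    ttnn.CoreCoord(grid.x - 1, grid.y - 1),
                ),
            }
        )
        return ttnn.ShardSpec(
            dram_core_range_set, (height, shard_width), ttnn.ShardOrientation.ROW_MAJOR
        )

On 8 cores this reproduces the shapes above: ceil(3584 / (8 * 32)) * 32 = 448 for w1/w3 and ceil(4096 / (8 * 32)) * 32 = 512 for w2.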
