
[Blackhole] Llama3-8B end-to-end testing #18135

Open · 1 of 6 tasks
mtairum opened this issue Feb 21, 2025 · 3 comments
mtairum commented Feb 21, 2025

Describe the bug
Now that all llama3 ops are supported in BH (see #16013), test the full Llama3 model on Blackhole.

Mainly testing Llama3-8B, but also running the 1B and 3B variants, since they should work as well.

Plan: Make sure the individual modules work before running the full model.

Reproduce tests

The commands below download the HF weights, if not already present, and run the test.

export HF_MODEL=meta-llama/Llama-3.1-8B-Instruct
pytest models/demos/llama3/tests/<test_name>
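
To run the 1B or 3B variants, only HF_MODEL changes. For example (the model name below is an assumption based on the usual meta-llama naming on Hugging Face, where the 1B and 3B checkpoints are Llama 3.2 releases; the test filter comes from the module list below):

export HF_MODEL=meta-llama/Llama-3.2-1B-Instruct
pytest models/demos/llama3/tests/test_llama_mlp.py -k 32-1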

Issues

The current Llama3 sharding specs do not work on BH. The Bounds-Errors below suggest the specs request more cores than the Blackhole logical grid provides (see the sketch after the module list).

Modules status

  • test_llama_mlp.py -k 32-1
    • [8B]: Bounds-Error -- Logical_core=(x=8,y=0) is outside of logical_grid_size=(x=8,y=1)
    • [1B]: tt-metal/ttnn/cpp/ttnn/operations/data_movement/sharded/interleaved_to_sharded/device/interleaved_to_sharded_op.cpp:21: this->output_mem_config.buffer_type == BufferType::L1
  • test_llama_attention.py -k paged
    • [1B] when creating wqkv weights: interleaved_to_sharded_op.cpp:21: this->output_mem_config.buffer_type == BufferType::L1
  • test_llama_rms_norm.py
  • test_llama_decoder.py
    • Needs the other modules to work first!
  • test_lm_head.py
    • [8B]: Bounds-Error -- Logical_core=(x=8,y=0) is outside of logical_grid_size=(x=8,y=1)
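
On the Bounds-Error: a logical grid of size (x=8, y=1) only has valid x coordinates 0..7, so a sharding spec that places anything on logical core (x=8, y=0) is one column out of range. A minimal sketch of validating a requested core grid against the device grid before building a sharded config (the helper is hypothetical, and it assumes the relevant grid is the device's compute-with-storage grid; the failing specs may target a different grid):

```python
import ttnn

def shard_grid_fits(device, core_grid: ttnn.CoreGrid) -> bool:
    # Logical worker grid of the device, e.g. (x=8, y=1) in the failing case above.
    grid = device.compute_with_storage_grid_size()
    return core_grid.x <= grid.x and core_grid.y <= grid.y

device = ttnn.open_device(device_id=0)
# An 8x1 request fits an 8x1 grid (cores x=0..7); an 8x2 request does not.
print(shard_grid_fits(device, ttnn.CoreGrid(x=8, y=1)))
print(shard_grid_fits(device, ttnn.CoreGrid(x=8, y=2)))
ttnn.close_device(device)
```
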
mtairum added the "bug: Something isn't working" label Feb 21, 2025
mtairum self-assigned this Feb 21, 2025
mtairum added the P0, blackhole, and llama3 labels and removed the "bug: Something isn't working" label Feb 21, 2025
mtairum removed their assignment Feb 21, 2025
mtairum changed the title from "[Blackhole] Llama3-8B bringup" to "[Blackhole] Llama3-8B end-to-end testing" Feb 21, 2025
mtairum commented Feb 21, 2025

For the MLP sharding, there's a good proxy unit test that's currently failing on BH:

pytest tests/tt_eager/python_api_testing/unit_testing/misc/test_sharded.py::test_llama_mlp_width_sharded_to_interleaved_pcc_err

_________________________________________________________________________ test_llama_mlp_width_sharded_to_interleaved_pcc_err[32] __________________________________________________________________________
tests/tt_eager/python_api_testing/unit_testing/misc/test_sharded.py:2468: in test_llama_mlp_width_sharded_to_interleaved_pcc_err
    tt_w1 = as_sharded_tensor(w1.t(), ttnn.bfloat8_b, dim=-1, mem_config=w1_w3_mem_config)
tests/tt_eager/python_api_testing/unit_testing/misc/test_sharded.py:2460: in <lambda>
    as_sharded_tensor = lambda w, type, dim, mem_config: ttnn.as_tensor(
ttnn/ttnn/decorators.py:333: in __call__
    return self.function(*function_args, **function_kwargs)
ttnn/ttnn/operations/core.py:600: in as_tensor
    return torch_to_ttnn(tensor, dtype, layout, device, memory_config, mesh_mapper)
ttnn/ttnn/operations/core.py:589: in torch_to_ttnn
    tensor = ttnn.from_torch(
ttnn/ttnn/decorators.py:333: in __call__
    return self.function(*function_args, **function_kwargs)
ttnn/ttnn/operations/core.py:237: in from_torch
    tensor = ttnn.to_device(tensor, device, memory_config=memory_config, cq_id=cq_id)
ttnn/ttnn/decorators.py:333: in __call__
    return self.function(*function_args, **function_kwargs)
E   RuntimeError: TT_FATAL @ /localdev/mtairum/tt-metal/ttnn/cpp/ttnn/operations/data_movement/sharded/interleaved_to_sharded/device/interleaved_to_sharded_op.cpp:21: this->output_mem_config.buffer_type == BufferType::L1
E   info:
E   Error
E   backtrace:
E    --- /localdev/mtairum/tt-metal/ttnn/ttnn/_ttnn.so(+0xe96b09) [0x7ff698547b09]
E    --- ttnn::operations::data_movement::InterleavedToShardedDeviceOperation::validate(std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>> const&) const
E    --- /localdev/mtairum/tt-metal/ttnn/ttnn/_ttnn.so(_ZN4ttnn16device_operation6detail23launch_on_worker_threadIN2tt8tt_metal9operation23OldInfraDeviceOperationINSt3__16vectorINS4_6TensorENS7_9allocatorIS9_EEEEEENS3_3stl10StrongTypeIhNS_10QueueIdTagEEElNS5_15DeviceOperationISC_EENSD_13tensor_args_tESC_PNS4_2v07IDeviceEEEvT0_T1_RKT2_RKT3_RT4_RT5_+0x398) [0x7ff69a7a51e8]
E    --- /localdev/mtairum/tt-metal/ttnn/ttnn/_ttnn.so(_ZN4ttnn16device_operation6detail23launch_on_single_deviceIN2tt8tt_metal9operation23OldInfraDeviceOperationINSt3__16vectorINS4_6TensorENS7_9allocatorIS9_EEEEEEEENT_21tensor_return_value_tENS3_3stl10StrongTypeIhNS_10QueueIdTagEEERKNSE_22operation_attributes_tERKNSE_13tensor_args_tE+0xe4) [0x7ff69a7a4d24]
E    --- /localdev/mtairum/tt-metal/ttnn/ttnn/_ttnn.so(+0x30f3b9f) [0x7ff69a7a4b9f]
E    --- /localdev/mtairum/tt-metal/ttnn/ttnn/_ttnn.so(_ZN4ttnn16device_operation6detail6invokeIN2tt8tt_metal9operation23OldInfraDeviceOperationINSt3__16vectorINS4_6TensorENS7_9allocatorIS9_EEEEEEEENT_21tensor_return_value_tENS3_3stl10StrongTypeIhNS_10QueueIdTagEEERKNSE_22operation_attributes_tERKNSE_13tensor_args_tE+0x1f6) [0x7ff69a7a4496]
E    --- /localdev/mtairum/tt-metal/ttnn/ttnn/_ttnn.so(+0x30f2d6f) [0x7ff69a7a3d6f]
E    --- /localdev/mtairum/tt-metal/ttnn/ttnn/_ttnn.so(+0x30f2985) [0x7ff69a7a3985]
E    --- std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>> tt::tt_metal::operation::run<std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>>>(tt::tt_metal::operation::DeviceOperation<std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>>>&&, std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>> const&, std::__1::vector<std::__1::optional<tt::tt_metal::Tensor const>, std::__1::allocator<std::__1::optional<tt::tt_metal::Tensor const>>> const&, std::__1::vector<std::__1::optional<tt::tt_metal::Tensor>, std::__1::allocator<std::__1::optional<tt::tt_metal::Tensor>>> const&, tt::stl::StrongType<unsigned char, ttnn::QueueIdTag>)
E    --- /localdev/mtairum/tt-metal/ttnn/ttnn/_ttnn.so(+0xf1a7e3) [0x7ff6985cb7e3]
E    --- ttnn::operations::data_movement::InterleavedToShardedOperation::invoke(tt::stl::StrongType<unsigned char, ttnn::QueueIdTag>, tt::tt_metal::Tensor const&, tt::tt_metal::MemoryConfig const&, std::__1::optional<tt::tt_metal::DataType> const&, std::__1::optional<bool> const&)
E    --- /localdev/mtairum/tt-metal/ttnn/ttnn/_ttnn.so(+0xf77124) [0x7ff698628124]
E    --- /localdev/mtairum/tt-metal/ttnn/ttnn/_ttnn.so(+0xf76eeb) [0x7ff698627eeb]
E    --- /localdev/mtairum/tt-metal/ttnn/ttnn/_ttnn.so(+0xf76e42) [0x7ff698627e42]
E    --- /localdev/mtairum/tt-metal/ttnn/ttnn/_ttnn.so(+0x3101b75) [0x7ff69a7b2b75]
E    --- /localdev/mtairum/tt-metal/ttnn/ttnn/_ttnn.so(+0x31022c7) [0x7ff69a7b32c7]
E    --- /localdev/mtairum/tt-metal/build_Release_tracy/lib/libtt_metal.so(+0x19c02f) [0x7ff696d1f02f]
E    --- void tt::tt_metal::operation::launch_op_func<std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>>>(std::__1::function<std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>> (std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>> const&, std::__1::vector<std::__1::optional<tt::tt_metal::Tensor const>, std::__1::allocator<std::__1::optional<tt::tt_metal::Tensor const>>> const&, std::__1::vector<std::__1::optional<tt::tt_metal::Tensor>, std::__1::allocator<std::__1::optional<tt::tt_metal::Tensor>>> const&)> const&, std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>>, std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>>&, std::__1::vector<std::__1::optional<tt::tt_metal::Tensor const>, std::__1::allocator<std::__1::optional<tt::tt_metal::Tensor const>>>, std::__1::vector<std::__1::optional<tt::tt_metal::Tensor>, std::__1::allocator<std::__1::optional<tt::tt_metal::Tensor>>>, bool)
E    --- /localdev/mtairum/tt-metal/ttnn/ttnn/_ttnn.so(+0xf7693a) [0x7ff69862793a]
E    --- /localdev/mtairum/tt-metal/ttnn/ttnn/_ttnn.so(+0xf76277) [0x7ff698627277]
E    --- /localdev/mtairum/tt-metal/ttnn/ttnn/_ttnn.so(+0xf75cf8) [0x7ff698626cf8]
E    --- ttnn::operations::core::to_device(tt::tt_metal::Tensor const&, tt::tt_metal::v0::IDevice*, std::__1::optional<tt::tt_metal::MemoryConfig> const&, tt::stl::StrongType<unsigned char, ttnn::QueueIdTag>)    
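
For reference, the failing call chain reduces to ttnn.from_torch placing a host tensor on device with a sharded memory config, at which point interleaved_to_sharded validates that the target buffer lives in L1. A standalone sketch of that path (shapes, grid, and dtype are illustrative assumptions, not values from the test):

```python
import torch
import ttnn

device = ttnn.open_device(device_id=0)

# Stand-in weight; the real test shards the Llama MLP w1/w3 weights.
torch_weight = torch.randn(4096, 4096, dtype=torch.bfloat16)

# Width-sharded config over an 8x1 core grid (illustrative values).
mem_config = ttnn.create_sharded_memory_config(
    shape=torch_weight.shape,
    core_grid=ttnn.CoreGrid(x=8, y=1),
    strategy=ttnn.ShardStrategy.WIDTH,
    orientation=ttnn.ShardOrientation.ROW_MAJOR,
)

# to_device dispatches interleaved_to_sharded, whose validate() asserts
# output_mem_config.buffer_type == BufferType::L1 -- the TT_FATAL above.
tt_weight = ttnn.from_torch(
    torch_weight,
    dtype=ttnn.bfloat8_b,
    layout=ttnn.TILE_LAYOUT,
    device=device,
    memory_config=mem_config,
)

ttnn.close_device(device)
```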

ejouretTT commented

Initial testing shows ops issue in Matrix Multiply with DRAM sharding.

@mtairum to open issue and assign @bbradelTT.

bbradelTT (Contributor) commented


> Initial testing shows ops issue in Matrix Multiply with DRAM sharding.
>
> @mtairum to open issue and assign @bbradelTT.

The error is in i2s (interleaved_to_sharded), not matmul. I re-assigned the issue to @llongTT.
