
ollama: add reasoning model support (e.g. deepseek) #29689

Open

BobMerkus wants to merge 1 commit into master from feat/ollama-reasoning-support

Conversation

BobMerkus (Contributor)

Description

This PR adds reasoning model support for langchain-ollama by extracting reasoning token blocks, like those used in deepseek. It was inspired by ollama-deep-researcher, specifically the parsing of thinking blocks:

  # TODO: This is a hack to remove the <think> tags w/ Deepseek models 
  # It appears very challenging to prompt them out of the responses 
  while "<think>" in running_summary and "</think>" in running_summary:
      start = running_summary.find("<think>")
      end = running_summary.find("</think>") + len("</think>")
      running_summary = running_summary[:start] + running_summary[end:]

That comment notes that it is very hard to prompt the reasoning block out of the response, yet we actually do want the model to reason, since that improves performance. This implementation therefore extracts the thinking block, so the client can still expect a proper message to be returned by ChatOllama (and use the reasoning content separately when desired).

This implementation takes the same approach as ChatDeepseek, which adds the reasoning content to chunk.additional_kwargs["reasoning_content"]:

  if hasattr(response.choices[0].message, "reasoning_content"):  # type: ignore
      rtn.generations[0].message.additional_kwargs["reasoning_content"] = (
          response.choices[0].message.reasoning_content  # type: ignore
      )
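
For illustration, here is a minimal consumer-side sketch of what this enables (the model tag and prompt are placeholders; it assumes a local deepseek-r1 model served by ollama and the additional_kwargs behavior added in this PR):

  from langchain_ollama import ChatOllama

  llm = ChatOllama(model="deepseek-r1:8b")  # placeholder model tag
  msg = llm.invoke("Why is the sky blue?")
  print(msg.content)  # clean answer, <think> block stripped
  print(msg.additional_kwargs.get("reasoning_content"))  # extracted reasoning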

This should probably be handled upstream in ollama + ollama-python, but this seems like a reasonably effective solution in the meantime. Here is a standalone example of what is happening:

from collections.abc import AsyncIterator
from typing import Any

from langchain_core.language_models import BaseChatModel
from langchain_core.messages import BaseMessage, BaseMessageChunk
from langchain_core.runnables import RunnableConfig


async def deepseek_message_astream(
    llm: BaseChatModel,
    messages: list[BaseMessage],
    config: RunnableConfig | None = None,
    *,
    model_target: str = "deepseek-r1",
    **kwargs: Any,
) -> AsyncIterator[BaseMessageChunk]:
    """Stream responses from Deepseek models, filtering out <think> tags.

    Args:
        llm: The language model to stream from
        messages: The messages to send to the model
        config: Optional config forwarded to ``astream``
        model_target: Substring that marks a model as deepseek-based

    Yields:
        Filtered chunks from the model response
    """
    # Pass straight through unless the model looks deepseek-based.
    model_name = getattr(llm, "model", None) or llm.name or ""
    if model_target not in model_name:
        async for chunk in llm.astream(messages, config=config, **kwargs):
            yield chunk
        return

    # Buffer the stream; while inside a <think>...</think> block, move chunk
    # content into additional_kwargs["reasoning_content"]. This assumes the
    # tags arrive whole, as their own chunks.
    buffer = ""
    async for chunk in llm.astream(messages, config=config, **kwargs):
        buffer += chunk.content

        if "<think>" in buffer:
            if getattr(chunk, "tool_calls", None):
                raise NotImplementedError(
                    "tool calls during reasoning should be removed?"
                )
            # Upon block completion, reset the buffer so that later chunks
            # stream as normal content again.
            if "</think>" in buffer:
                buffer = ""
            # Drop the chunks that carry the tags themselves.
            if "<think>" in chunk.content or "</think>" in chunk.content:
                continue
            # Reasoning tokens: expose them without polluting the content.
            chunk.additional_kwargs["reasoning_content"] = chunk.content
            chunk.content = ""
        yield chunk
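
And a minimal driver for the helper above (assuming a local deepseek-r1 model is available through ollama; the model tag and prompt are placeholders):

import asyncio

from langchain_core.messages import HumanMessage
from langchain_ollama import ChatOllama


async def main() -> None:
    llm = ChatOllama(model="deepseek-r1:8b")  # placeholder model tag
    messages = [HumanMessage("Why is the sky blue?")]
    async for chunk in deepseek_message_astream(llm, messages):
        # Answer tokens arrive in content; reasoning tokens were moved
        # to additional_kwargs["reasoning_content"].
        print(chunk.content, end="", flush=True)


asyncio.run(main())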

Issue

Integrating reasoning models (e.g. deepseek-r1) into existing LangChain-based workflows is hard because of the thinking blocks included in the message content. To avoid this, we can match the ChatOllama integration with ChatDeepseek and return the reasoning content inside message.additional_kwargs["reasoning_content"] instead.

Dependencies

None
