Quadratic complexity bug in SQLite query makes @Codebase unusably slow on large repos #4255

JubilantJerry · 2025-02-20T09:07:50Z

Before submitting your bug report

I believe this is a bug. I'll try to join the Continue Discord for questions
I'm not able to find an open issue that reports the same bug
I've seen the troubleshooting guide on the Continue Docs

Relevant environment info

- OS: Ubuntu 24.04
- Continue version: 0.9.269
- IDE version: VSCode 1.97.2
- Model: Qwen 2.5 7b Instruct
- config:
  
{
  "models": [
    {
      "title": "Qwen 2.5 72b Instruct",
      "model": "qwen2.5:72b-instruct-q8_0",
      "provider": "ollama",
      "apiBase": "https://api.ollama.companyname.com",
      "contextLength": 16384
    },
    {
      "title": "Qwen 2.5 7b Instruct",
      "model": "qwen2.5:7b-instruct-q8_0",
      "provider": "ollama",
      "apiBase": "https://api.ollama.companyname.com",
      "contextLength": 16384
    }
  ],
  "tabAutocompleteModel": {
    "title": "Qwen2.5-Coder 1.5B",
    "provider": "ollama",
    "model": "qwen2.5-coder:1.5b-base",
    "apiBase": "https://api.ollama.companyname.com"
  },
  "embeddingsProvider": {
    "title": "Nomic Embed Text",
    "provider": "ollama",
    "model": "nomic-embed-text",
    "apiBase": "https://api.ollama.companyname.com"
  },
  "reranker": {
    "name": "huggingface-tei",
    "params": {
      "apiBase": "http://192.168.187.57:11435",
      "truncate": true,
      "truncation_direction": "Right"
    }
  },
  "contextProviders": [
    {
      "name": "code",
      "params": {}
    },
    {
      "name": "docs",
      "params": {}
    },
    {
      "name": "diff",
      "params": {}
    },
    {
      "name": "terminal",
      "params": {}
    },
    {
      "name": "problems",
      "params": {}
    },
    {
      "name": "folder",
      "params": {}
    },
    {
      "name": "codebase",
      "params": {}
    }
  ],
  "slashCommands": [
    {
      "name": "share",
      "description": "Export the current chat session to markdown"
    },
    {
      "name": "cmd",
      "description": "Generate a shell command"
    },
    {
      "name": "commit",
      "description": "Generate a git commit message"
    }
  ]
}

Description

If a large repo, such as huggingface/transformers, is indexed, then all @Codebase queries will be very, very slow. For almost the full duration, there are no LLM calls, no embeddings being created, and no reranking being done. There will only be a single core running at 100% at the client end.

This happens as long as any large repo has been indexed in the past from any directory. Even if I open a small project, all @Codebase queries on the new project will still be slow.

I've identified the problem as a nested loop join in the following SQL query used for FTS, used in FullTextSearchCodebaseIndex.ts inside retrieve/buildRetrieveQuery:

SELECT fts_metadata.chunkId, fts_metadata.path, fts.content, rank
FROM fts
JOIN fts_metadata ON fts.rowid = fts_metadata.id
JOIN chunk_tags ON fts_metadata.chunkId = chunk_tags.chunkId
WHERE fts MATCH ?
AND chunk_tags.tag IN (?)
ORDER BY bm25(fts, 10)
LIMIT ?;

The query plan appears as:

SCAN fts VIRTUAL TABLE INDEX 0:M2
SEARCH fts_metadata USING INTEGER PRIMARY KEY (rowid=?)
SCAN chunk_tags
USE TEMP B-TREE FOR ORDER BY

This means for every row of fts, the database engine will loop over all rows of chunk_tags. Both tables contain a row for every chunk across all projects indexed, so the time complexity of the query is quadratic in the number of chunks.

I was able to fix the problem by marking the chunkId column as UNIQUE in the chunk_tags table. However, I'm not sure if it's possible for the same chunkId to appear more than once in the table. Marking the tag and chunkId columns as UNIQUE when put together would also solve the quadratic time complexity issue, making the @Codebase queries fast again.

To reproduce

You may want to set up a local embedding model, or disable embedding somehow, to avoid incurring a high API fee.

First, clone https://github.com/huggingface/transformers
Then, open it in VSCode and wait for the whole repository to be indexed (this took about 30 minutes for me).

Now, try any query that uses @Codebase in the chat mode. It will take about 20 minutes to finally get an answer. This long computation happens every single time @Codebase is used.

Log output

No logs appear relevant to this issue in particular.

The text was updated successfully, but these errors were encountered:

sestinj assigned Patrick-Erichsen Feb 20, 2025

github-actions bot added the "needs-triage" label Feb 20, 2025

dosubot bot added area:indexing Relates to embedding and indexing ide:vscode Relates specifically to VS Code extension kind:bug Indicates an unexpected problem or unintended behavior priority:high Indicates high priority labels Feb 20, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Quadratic complexity bug in SQLite query makes @Codebase unusably slow on large repos #4255

Quadratic complexity bug in SQLite query makes @Codebase unusably slow on large repos #4255

JubilantJerry commented Feb 20, 2025 •

edited

Loading

Quadratic complexity bug in SQLite query makes @Codebase unusably slow on large repos #4255

Quadratic complexity bug in SQLite query makes @Codebase unusably slow on large repos #4255

Comments

JubilantJerry commented Feb 20, 2025 • edited Loading

Before submitting your bug report

Relevant environment info

Description

To reproduce

Log output

JubilantJerry commented Feb 20, 2025 •

edited

Loading