Troubleshooting Elasticsearch migrations

Tier: Premium, Ultimate
Offering: GitLab Self-Managed, GitLab Dedicated

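When working with Elasticsearch migrations, you might encounter the following issues.

If `elasticsearch.log` contains errors and retrying the failed migrations does not work, contact GitLab Support. For more information, see advanced search migrations.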

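Error: `Elasticsearch::Transport::Transport::Errors::BadRequest`

If you see an exception like this, make sure you are running a supported Elasticsearch version and meet the system requirements. You can also check the version automatically by running `sudo gitlab-rake gitlab:check`.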

Error: `Faraday::TimeoutError (execution expired)`

When you use a proxy, set a custom `gitlab_rails['env']` environment variable named `no_proxy` with the IP address of your Elasticsearch host.
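As a rough sketch, the setting could look like the following in `gitlab.rb`. The file path assumes a Linux package installation, and the IP address is a placeholder. Because this assignment replaces the whole `gitlab_rails['env']` hash, keep any proxy variables you already define there.

```ruby
# /etc/gitlab/gitlab.rb (assumed path for Linux package installations)
# 192.0.2.10 is a placeholder; use the IP address of your Elasticsearch host.
gitlab_rails['env'] = {
  "no_proxy" => "192.0.2.10"
}
```

After you edit the file, run `sudo gitlab-ctl reconfigure` for the change to take effect.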

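Single-node Elasticsearch cluster status never goes from yellow to green

For a single-node Elasticsearch cluster, the functional cluster health status is yellow (never green). The primary shards are allocated, but the replicas cannot be, because there is no other node to which Elasticsearch can assign a replica. The same applies if you use the Amazon OpenSearch service.

Setting the number of replicas to `0` is not recommended (the GitLab Elasticsearch integration menu does not allow it). If you plan to add more Elasticsearch nodes (for a total of more than one), the number of replicas must be set to an integer greater than `0`. Otherwise, you have no redundancy (losing one node corrupts the index).

If you want a green status for your single-node Elasticsearch cluster, understand the risks and run the following query to set the number of replicas to `0`. The cluster then no longer tries to create any shard replicas.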
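To confirm that a yellow status comes from unassigned replicas on a single node, you can inspect the cluster health API. The sketch below is illustrative only: the helper names are hypothetical, and the default URL assumes an unauthenticated node on `localhost:9200`. It relies on the `status`, `number_of_nodes`, and `unassigned_shards` fields of the `_cluster/health` response.

```ruby
require "json"
require "net/http"
require "uri"

# Fetches the cluster health document. The default URL is an assumption
# (an unauthenticated local node); adjust it for your cluster.
def fetch_cluster_health(base_url = "http://localhost:9200")
  JSON.parse(Net::HTTP.get(URI.join(base_url, "/_cluster/health")))
end

# Hypothetical helper: turns a parsed health document into a one-line explanation.
def explain_cluster_health(health)
  status = health["status"]
  nodes = health["number_of_nodes"]
  unassigned = health["unassigned_shards"]

  if status == "yellow" && nodes == 1 && unassigned.to_i > 0
    "yellow: #{unassigned} replica shard(s) cannot be assigned on a single node"
  else
    "#{status}: #{nodes} node(s), #{unassigned} unassigned shard(s)"
  end
end
```

To print the explanation for a running cluster, call `puts explain_cluster_health(fetch_cluster_health)`.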

curl --request PUT localhost:9200/gitlab-production/_settings --header 'Content-Type: application/json' \
     --data '{
       "index" : {
         "number_of_replicas" : 0
       }
     }'

Error: `health check timeout: no Elasticsearch node available`

If you get a `health check timeout: no Elasticsearch node available` error in Sidekiq during indexing:

Gitlab::Elastic::Indexer::Error: time="2020-01-23T09:13:00Z" level=fatal msg="health check timeout: no Elasticsearch node available"

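You probably have not included `http://` or `https://` in the value of the "URL" field in the Elasticsearch integration menu. Make sure the value starts with `http://` or `https://`, because the Go Elasticsearch client that GitLab uses accepts a URL as valid only when it has this prefix.

After you correct the URL format, delete the index and reindex the content of your instance.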
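The prefix requirement can be checked before saving the setting. The following is a minimal sketch with a hypothetical helper name and example URLs. It tests only for the `http://` or `https://` prefix and does not reproduce the validation done by the Go client.

```ruby
# Hypothetical helper: true when the URL starts with an explicit
# http:// or https:// prefix.
def url_has_http_prefix?(url)
  url.strip.downcase.start_with?("http://", "https://")
end

# Example values only; replace them with the contents of your URL field.
["http://localhost:9200", "localhost:9200"].each do |url|
  verdict = url_has_http_prefix?(url) ? "ok" : "missing http:// or https://"
  puts "#{url}: #{verdict}"
end
```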

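Elasticsearch is incompatible with some third-party plugins

Some third-party plugins might introduce bugs in your cluster or be incompatible with the integration.

If your Elasticsearch cluster has third-party plugins and the integration is not working, try disabling the plugins.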

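Elasticsearch workers overload Sidekiq

In some cases, GitLab can no longer connect to Elasticsearch because:

The Elasticsearch password was updated on one side only (`Unauthorized [401] ... unable to authenticate user` errors).
A firewall or network issue impairs connectivity (`Failed to open TCP connection to <ip>:9200` errors).

These errors are logged in `gitlab-rails/elasticsearch.log`. To retrieve the errors, use `jq`: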

$ jq --raw-output 'select(.severity == "ERROR") | [.error_class, .error_message] | @tsv' \
    gitlab-rails/elasticsearch.log |
  sort | uniq -c

Elastic workers and Sidekiq jobs might also appear much more often, because reindexing is attempted frequently when a previous job fails. You can use `fast-stats` or `jq` to count the workers in the Sidekiq logs:

$ fast-stats --print-fields=count,score sidekiq/current
WORKER                            COUNT   SCORE
ElasticIndexBulkCronWorker          234  123456
ElasticIndexInitialBulkCronWorker   345   12345
Some::OtherWorker                    12     123
...

$ jq '.class' sidekiq/current | sort | uniq -c | sort -nr
 345 "ElasticIndexInitialBulkCronWorker"
 234 "ElasticIndexBulkCronWorker"
  12 "Some::OtherWorker"
...

In this case, `free -m` on the overloaded GitLab node also shows unexpectedly high `buff/cache` usage.
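If neither `fast-stats` nor `jq` is available, the worker counts can be approximated with a few lines of Ruby. This is only a sketch: the helper name is hypothetical, and it assumes the Sidekiq log contains one JSON object per line with a `class` field, as in the `jq` example. Unlike the `jq` command, it skips entries that have no `class` field.

```ruby
require "json"

# Counts Sidekiq log entries per worker class, skipping lines that are not
# JSON objects or that have no "class" field.
# Returns [class, count] pairs, highest count first.
def count_workers(lines)
  counts = Hash.new(0)
  lines.each do |line|
    begin
      entry = JSON.parse(line)
    rescue JSON::ParserError
      next
    end
    next unless entry.is_a?(Hash) && entry["class"]

    counts[entry["class"]] += 1
  end
  counts.sort_by { |_worker, count| -count }
end
```

For example, `count_workers(File.foreach("sidekiq/current")).each { |worker, count| puts format("%5d %s", count, worker) }` prints one line per worker class.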

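Error: `Couldn't load task status`

When you reindex, you might get a `Couldn't load task status` error. A `sliceId must be greater than 0 but was [-1]` error might also appear on the Elasticsearch host. As a workaround, consider reindexing from scratch or upgrading to GitLab 16.3.

For more information, see issue 422938.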

Error: `migration has failed with NoMethodError:undefined method`

In GitLab 15.11, the `BackfillProjectPermissionsInBlobs` migration might fail with the following error message in `elasticsearch.log`:

migration has failed with NoMethodError:undefined method `<<' for nil:NilClass, no retries left

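If `BackfillProjectPermissionsInBlobs` is the only failed migration, you can upgrade to the latest patch version of GitLab 16.0, which includes a fix. Otherwise, you can ignore the error because it does not affect the functionality of advanced search.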

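`ElasticIndexInitialBulkCronWorker` and `ElasticIndexBulkCronWorker` jobs stuck in deduplication

In GitLab 16.5 and earlier, `ElasticIndexInitialBulkCronWorker` and `ElasticIndexBulkCronWorker` jobs might get stuck in deduplication. This issue might prevent advanced search from indexing documents correctly, even after you create a new index. In GitLab 16.6, `idempotent!` was removed for the bulk cron workers that perform indexing.

The Sidekiq log might contain entries like the following: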

{"severity":"INFO","time":"2023-10-31T10:33:06.998Z","retry":0,"queue":"default","version":0,"queue_namespace":"cronjob","args":[],"class":"ElasticIndexInitialBulkCronWorker",
...
"idempotency_key":"resque:gitlab:duplicate:default:<value>","duplicate-of":"91e8673347d4dc84fbad5319","job_size_bytes":2,"pid":12047,"job_status":"deduplicated","message":"ElasticIndexInitialBulkCronWorker JID-5e1af9180d6e8f991fc773c6: deduplicated: until executing","deduplication.type":"until executing"}

To resolve this issue:

In a Rails console session, run the following commands:

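# Value of the "idempotency_key" field from the deduplicated log entry.
idempotency_key = "<idempotency_key_from_log_entry>"
duplicate_key = "resque:gitlab:#{idempotency_key}:cookie:v2"
# Delete the deduplication key from Redis.
Gitlab::Redis::Queues.with { |c| c.del(duplicate_key) }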

Replace `<idempotency_key_from_log_entry>` with the actual entry in your log.
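To find the `idempotency_key` values to use, you can scan the Sidekiq log for deduplicated entries. The helper below is a hypothetical sketch and not part of GitLab. It assumes one JSON object per log line with the `class`, `job_status`, and `idempotency_key` fields shown in the log entry above.

```ruby
require "json"

# Hypothetical helper: collects the distinct idempotency_key values of
# deduplicated entries for one worker class from Sidekiq JSON log lines.
def deduplicated_keys(lines, worker_class)
  lines.filter_map do |line|
    entry = begin
      JSON.parse(line)
    rescue JSON::ParserError
      nil
    end
    next unless entry.is_a?(Hash)
    next unless entry["class"] == worker_class && entry["job_status"] == "deduplicated"

    entry["idempotency_key"]
  end.uniq
end
```

For example, `puts deduplicated_keys(File.foreach("sidekiq/current"), "ElasticIndexInitialBulkCronWorker")` lists the keys for one worker. The log path is the same relative path used in the earlier `jq` example.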
