
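Import/Export development documentation

To reduce the risk of introducing bugs and performance issues, newly added relations should be put behind a feature flag.

General development guidelines and tips for the Import/Export feature.

This document is originally based on the Import/Export 201 presentation available on YouTube.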

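Security

The Import/Export feature is constantly updated (new things are added to the export). However, the code has not been refactored in a long time. We should perform a code audit to make sure its dynamic nature does not increase the number of security concerns. GitLab team members can view more information in this confidential issue: https://gitlab.com/gitlab-org/gitlab/-/issues/20720.

Security in the code

The following classes provide a layer of security for Import/Export.

The AttributeCleaner removes any prohibited keys: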

# AttributeCleaner
# Removes all `_ids` and other prohibited keys
    class AttributeCleaner
      ALLOWED_REFERENCES = RelationFactory::PROJECT_REFERENCES + RelationFactory::USER_REFERENCES + ['group_id']

      def clean
        @relation_hash.reject do |key, _value|
          prohibited_key?(key) || !@relation_class.attribute_method?(key) || excluded_key?(key)
        end.except('id')
      end

      ...
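The filtering idea behind the excerpt above can be sketched in plain Ruby. This is a minimal illustration, not the GitLab implementation: the `ALLOWED_REFERENCES` values, the suffix-based `prohibited_key?` rule, and the `known_attributes` argument (standing in for `attribute_method?`) are all assumptions made for the example.

```ruby
# Hypothetical stand-in for the real cleaner; all names and rules here are assumptions.
ALLOWED_REFERENCES = %w[project_id author_id group_id].freeze

# Simplified rule: reference-like keys are prohibited unless explicitly allowed.
def prohibited_key?(key)
  key.end_with?('_id', '_ids') && !ALLOWED_REFERENCES.include?(key)
end

# Drops prohibited keys, keys the model does not know about, and the primary key.
def clean(relation_hash, known_attributes)
  relation_hash
    .reject { |key, _value| prohibited_key?(key) || !known_attributes.include?(key) }
    .reject { |key, _value| key == 'id' }
end

raw = {
  'id' => 42,
  'title' => 'Bug',
  'author_id' => 7,
  'secret_ids' => [1, 2],
  'unknown_column' => 'x'
}

cleaned = clean(raw, %w[id title author_id secret_ids])
puts cleaned.keys.join(', ')
```

In this sketch only `title` and `author_id` survive: `secret_ids` is a reference that is not allowed, `unknown_column` is not a known attribute, and `id` is always dropped.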

The AttributeConfigurationSpec checks and confirms the addition of new columns:

# AttributeConfigurationSpec
<<-MSG
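  It looks like #{relation_class}, which is exported using the project Import/Export, has new attributes:

  Please add the attribute(s) to SAFE_MODEL_ATTRIBUTES if they can be exported.

  Please denylist the attribute(s) in IMPORT_EXPORT_CONFIG by adding it to its corresponding model in the +excluded_attributes+ section.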

  SAFE_MODEL_ATTRIBUTES: #{File.expand_path(safe_attributes_file)}
  IMPORT_EXPORT_CONFIG: #{Gitlab::ImportExport.config_file}
MSG

The ModelConfigurationSpec checks and confirms the addition of new models:

# ModelConfigurationSpec
<<-MSG
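  New model(s) <#{new_models.join(',')}> have been added, related to #{parent_model_name}, which is exported by the Import/Export feature.

  If you think this model should be included in the export, please add it to `#{Gitlab::ImportExport.config_file}`.

  Definitely add it to `#{File.expand_path(ce_models_yml)}`
  to signal that you've handled this error and to prevent it from showing up in the future.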
MSG

The ExportFileSpec detects encrypted or sensitive columns:

# ExportFileSpec
<<-MSG
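  Found a new sensitive word <#{key_found}>, which is part of the hash #{parent.inspect}
  If you think this information shouldn't get exported, please exclude the model or attribute in IMPORT_EXPORT_CONFIG.

  Otherwise, please add the exception to +safe_list+ in CURRENT_SPEC using #{sensitive_word} as the key and the corresponding hash or model as the value.

  Also, if the attribute is a generated unique token, please add it to RelationFactory::TOKEN_RESET_MODELS if it needs to be reset (to prevent duplicate column problems while importing to the same instance).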

  IMPORT_EXPORT_CONFIG: #{Gitlab::ImportExport.config_file}
  CURRENT_SPEC: #{__FILE__}
MSG

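Versioning

Import/Export does not use strict SemVer, because it changes frequently within a single GitLab release. The version does need to be updated when there is a breaking change.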

# ImportExport
module Gitlab
  module ImportExport
    extend self

    # For every version update, the history in import_export.md has to be kept up to date.
    VERSION = '0.2.4'

Compatibility

Check for compatibility when importing and exporting projects.
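As a rough illustration of what a compatibility check could look like, the sketch below compares an archive's version with the version an instance understands. The rule used here (reject archives newer than the current version) and the names `CURRENT_VERSION` and `importable?` are assumptions for the example, not something this document specifies. `Gem::Version` is used only as a convenient version comparator.

```ruby
require 'rubygems'

# Assumption: mirrors the VERSION constant shown earlier in this document.
CURRENT_VERSION = '0.2.4'

# Assumed rule: an archive is importable only if its version is not newer
# than the version the importing instance understands.
def importable?(archive_version, current_version = CURRENT_VERSION)
  Gem::Version.new(archive_version) <= Gem::Version.new(current_version)
end

%w[0.2.3 0.2.4 0.3.0].each do |version|
  puts "#{version}: #{importable?(version) ? 'importable' : 'rejected'}"
end
```

Comparing parsed versions rather than raw strings avoids surprises such as `'0.10.0' < '0.9.0'` being true for plain string comparison.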

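When to bump the version up

If we rename models or columns, or make any format change in the JSON structure or in the file structure of the archive file, we need to bump the version.

We do not need to bump the version up in any of the following cases:

Add a new column or a model
Remove a column or model (unless there is a DB constraint)
Export new things (such as a new type of upload)

Every time we bump the version, the integration specs fail. They can be fixed with: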

bundle exec rake gitlab:import_export:bump_version

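A quick dive into the code

Import/Export configuration (`import_export.yml`)

The main configuration file, import_export.yml, defines which models can be exported and imported.

Model relationships to be included in the project import/export: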

project_tree:
  - labels:
    - :priorities
  - milestones:
    - events:
      - :push_event_payload
  - issues:
    - events:
    # ...

Only include the following attributes for the models specified:

included_attributes:
  user:
    - :id
    - :public_email
  # ...

Do not include the following attributes for the models specified:

excluded_attributes:
  project:
    - :name
    - :path
    - ...
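To make the difference between the two sections concrete, here is a minimal sketch of how an allowlist and a denylist of attributes could be applied to a record. The constants mirror the YAML fragments above as Ruby hashes; the `exportable_attributes` helper and the sample records are hypothetical and are not part of the GitLab code.

```ruby
# Ruby-hash mirrors of the YAML fragments above (only the entries shown there).
INCLUDED_ATTRIBUTES = { user: %i[id public_email] }.freeze
EXCLUDED_ATTRIBUTES = { project: %i[name path] }.freeze

# Hypothetical helper: when a model has included attributes, only those are kept;
# excluded attributes are then removed from whatever remains.
def exportable_attributes(model, attributes)
  included = INCLUDED_ATTRIBUTES[model]
  excluded = EXCLUDED_ATTRIBUTES.fetch(model, [])
  kept = included ? attributes.slice(*included) : attributes
  kept.reject { |key, _value| excluded.include?(key) }
end

user = { id: 1, public_email: 'a@example.com', encrypted_password: 'secret' }
project = { name: 'demo', path: 'demo', description: 'A demo project' }

puts exportable_attributes(:user, user).keys.join(', ')
puts exportable_attributes(:project, project).keys.join(', ')
```

In this sketch the user keeps only `id` and `public_email`, while the project keeps everything except `name` and `path`.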

Extra methods to be called by the export:

# Methods
methods:
  labels:
    - :type
  label:
    - :type

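Customize the export order of the model relationships:

# Specify a custom export reordering for a given relationship.
# For example, for issues we use a custom export reordering by relative_position, so that on import we can reset
# the relative position values but still keep the issues in the same order as in the exported project.
# By default, relations are ordered by PK.
# column - specifies the column to reorder by, defaults to the relation's PK
# direction - specifies the ordering direction, :asc or :desc, defaults to :asc
# nulls_position - specifies where null values are placed. Because a custom ordering column can contain nulls, we
#                  also need to specify where the nulls go. Can be :nulls_last or :nulls_first, defaults
#                  to :nulls_last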

export_reorders:
  project:
    issues:
      column: :relative_position
      direction: :asc
      nulls_position: :nulls_last
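The three options can be illustrated with an in-memory sort. This is only a sketch of the ordering semantics: the `RankedIssue` struct and the `reorder` helper are hypothetical, and the sketch sorts Ruby objects rather than building a database query.

```ruby
# Hypothetical record type used only for this illustration.
RankedIssue = Struct.new(:id, :relative_position)

# Orders records by `column`, honoring `direction` and `nulls_position`
# with the same defaults as described in the comments above.
def reorder(records, column:, direction: :asc, nulls_position: :nulls_last)
  with_value, without_value = records.partition { |record| !record.public_send(column).nil? }
  sorted = with_value.sort_by { |record| record.public_send(column) }
  sorted.reverse! if direction == :desc
  nulls_position == :nulls_first ? without_value + sorted : sorted + without_value
end

issues = [RankedIssue.new(1, 300), RankedIssue.new(2, nil), RankedIssue.new(3, 100)]

default_order = reorder(issues, column: :relative_position).map(&:id)
reversed_order = reorder(issues, column: :relative_position,
                                 direction: :desc, nulls_position: :nulls_first).map(&:id)

puts default_order.join(', ')
puts reversed_order.join(', ')
```

Records without a position are separated out before sorting, so `nil` values are never compared with integers.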

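Conditional export

When associated resources come from outside the project, you might need to validate that the user exporting the project or group can access these associations. include_if_exportable accepts an array of associations for a resource. During export, the exportable_association? method on the resource is called with the association's name and the user, to validate whether the associated resource can be included in the export.

For example: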

include_if_exportable:
  project:
    issues:
      - epic_issue

This definition:

Calls the issue's exportable_association?(:epic_issue, current_user: current_user) method.
If the method returns true, the issue's epic_issue association is included.
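The flow described above can be sketched as follows. Only the `exportable_association?` method name and its arguments come from the text; the structs, the `readable_epic_ids` permission model, and the `exportable_associations` helper are hypothetical stand-ins chosen to keep the example self-contained.

```ruby
# Hypothetical stand-ins; the permission model is an assumption.
ExportUser = Struct.new(:name, :readable_epic_ids)

ExportableIssue = Struct.new(:title, :epic_id) do
  # Returns true only when the user may read the epic this issue is linked to.
  def exportable_association?(association, current_user:)
    return false unless association == :epic_issue

    current_user.readable_epic_ids.include?(epic_id)
  end
end

# Ruby-hash mirror of the YAML definition above.
INCLUDE_IF_EXPORTABLE = { project: { issues: [:epic_issue] } }.freeze

# Hypothetical helper: keeps only the associations the user is allowed to export.
def exportable_associations(issue, current_user)
  INCLUDE_IF_EXPORTABLE.dig(:project, :issues).select do |association|
    issue.exportable_association?(association, current_user: current_user)
  end
end

issue = ExportableIssue.new('Fix login', 10)
member = ExportUser.new('member', [10])
outsider = ExportUser.new('outsider', [])

puts exportable_associations(issue, member).inspect
puts exportable_associations(issue, outsider).inspect
```

The same issue yields a different set of exported associations depending on who runs the export, which is the point of the conditional check.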

Import

The import job status moves from none to either finished or failed, passing through the following states: import_status: none -> scheduled -> started -> finished/failed
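A minimal sketch of that sequence as a transition table is shown below. It encodes only the transitions listed above; the `ImportStateSketch` class is hypothetical, and the real state machine may allow transitions that this document does not mention.

```ruby
# Hypothetical class; encodes only the transitions stated in the text above.
class ImportStateSketch
  TRANSITIONS = {
    'none' => %w[scheduled],
    'scheduled' => %w[started],
    'started' => %w[finished failed]
  }.freeze

  attr_reader :status

  def initialize
    @status = 'none'
  end

  # Moves to new_status, or raises if the transition is not in the table.
  def transition_to(new_status)
    allowed = TRANSITIONS.fetch(@status, [])
    raise ArgumentError, "cannot move from #{@status} to #{new_status}" unless allowed.include?(new_status)

    @status = new_status
  end
end

state = ImportStateSketch.new
%w[scheduled started finished].each { |status| state.transition_to(status) }
puts state.status
```

Terminal states (finished and failed) have no entry in the table, so any further transition from them raises.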

While the status is started, the Importer code processes each step required for the import.

# ImportExport::Importer
module Gitlab
  module ImportExport
    class Importer
      def execute
        if import_file && check_version! && restorers.all?(&:restore) && overwrite_project
          project
        else
          raise Projects::ImportService::Error.new(@shared.errors.join(', '))
        end
      rescue => e
        raise Projects::ImportService::Error.new(e.message)
      ensure
        remove_import_file
      end

      def restorers
        [repo_restorer, wiki_restorer, project_tree, avatar_restorer,
         uploads_restorer, lfs_restorer, statistics_restorer]
      end
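The `restorers.all?(&:restore)` call in the excerpt above relies on `all?` stopping at the first falsy result, so restorers after a failed one never run. The sketch below demonstrates that behavior with a hypothetical `StepSketch` class; it is not part of the Importer.

```ruby
# Hypothetical step that records when it runs and returns a preset result.
class StepSketch
  def initialize(name, succeeds, log)
    @name = name
    @succeeds = succeeds
    @log = log
  end

  def restore
    @log << @name
    @succeeds
  end
end

log = []
steps = [
  StepSketch.new(:repo, true, log),
  StepSketch.new(:wiki, false, log),
  StepSketch.new(:uploads, true, log)
]

# all? evaluates steps in order and stops at the first one returning false.
result = steps.all?(&:restore)

puts "success: #{result}"
puts "steps run: #{log.join(', ')}"
```

Here the uploads step is never executed because the wiki step fails first, which is why the order of the restorers array matters.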

The export service is similar to the Importer, but instead of restoring data, it saves it.

Export

# ImportExport::ExportService
module Projects
  module ImportExport
    class ExportService < BaseService

      def save_all!
        if save_services
          Gitlab::ImportExport::Saver.save(project: project, shared: @shared, user: user)
          notify_success
        else
          cleanup_and_notify_error!
        end
      end

      def save_services
        [version_saver, avatar_saver, project_tree_saver, uploads_saver, repo_saver,
           wiki_repo_saver, lfs_saver].all?(&:save)
      end

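Test fixtures

Fixtures used in Import/Export specs live in spec/fixtures/lib/gitlab/import_export. There are both Project and Group fixtures.

There are two versions of each of these fixtures:

A human-readable single JSON file with all objects, called either project.json or group.json.
A folder named tree containing a tree of files in ndjson format. Do not edit files under this folder manually unless strictly necessary.

The tools to generate the NDJSON tree from the human-readable JSON files live in the gitlab-org/cloud-connector-team/team-tools project.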
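The relationship between the two fixture formats can be illustrated with a small sketch. This is not the team-tools script: the `to_ndjson_tree` helper is hypothetical, it returns file contents in memory instead of writing to disk, and the assumed layout (scalar attributes in project.json, one .ndjson file per relation) is inferred from the tree shown further down.

```ruby
require 'json'

# Hypothetical converter: splits a single project hash into an in-memory
# map of relative file paths to file contents.
def to_ndjson_tree(project_hash)
  files = {}
  attributes = {}

  project_hash.each do |key, value|
    case value
    when Array
      # One JSON document per line, one line per record.
      files["project/#{key}.ndjson"] = value.map { |record| JSON.generate(record) }.join("\n")
    when Hash
      # A single nested object becomes a one-line file.
      files["project/#{key}.ndjson"] = JSON.generate(value)
    else
      attributes[key] = value
    end
  end

  files['project.json'] = JSON.generate(attributes)
  files
end

project = {
  'description' => 'demo',
  'labels' => [{ 'title' => 'bug' }, { 'title' => 'feature' }],
  'auto_devops' => { 'enabled' => true }
}

tree = to_ndjson_tree(project)
tree.each { |path, content| puts "#{path}: #{content.lines.count} line(s)" }
```

Because each line of an ndjson file is an independent JSON document, large relations can be read record by record instead of parsing one huge JSON file.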

Project

Use legacy-project-json-to-ndjson.sh to generate the NDJSON tree.

The NDJSON tree looks like:

tree
├── project
│   ├── auto_devops.ndjson
│   ├── boards.ndjson
│   ├── ci_cd_settings.ndjson
│   ├── ci_pipelines.ndjson
│   ├── container_expiration_policy.ndjson
│   ├── custom_attributes.ndjson
│   ├── error_tracking_setting.ndjson
│   ├── external_pull_requests.ndjson
│   ├── issues.ndjson
│   ├── labels.ndjson
│   ├── merge_requests.ndjson
│   ├── milestones.ndjson
│   ├── pipeline_schedules.ndjson
│   ├── project_badges.ndjson
│   ├── project_feature.ndjson
│   ├── project_members.ndjson
│   ├── protected_branches.ndjson
│   ├── protected_tags.ndjson
│   ├── releases.ndjson
│   ├── services.ndjson
│   ├── snippets.ndjson
│   └── triggers.ndjson
└── project.json

Group

Use legacy-group-json-to-ndjson.rb to generate the NDJSON tree.