@ianton-ru

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

DataLakeCatalog namespace filter

Documentation entry for user-facing changes

New setting namespaces for DataLakeCatalog: a comma-separated list of namespaces to expose.
Implemented for the rest, glue and unity catalog types.

namespaces='foo,bar'

The rest catalog type can have nested namespaces and supports the following rules:

  • foo - tables from namespace foo, but not from its nested namespaces
  • foo.bar - tables from the nested namespace, but not from the base namespace
  • foo.* - tables from all nested namespaces, but not from the base namespace
    When tables from both the base and the nested namespaces are required, list both: foo,foo.*
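
The matching rules above can be sketched as a small standalone function. This is only an illustration, not the code from this PR: `isNamespaceAllowed` here is a hypothetical free function, and it assumes that `foo.*` matches nested namespaces at any depth (`foo.bar`, `foo.bar.baz`) and that a bare `*` allows everything, as the default value in the diff suggests.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper, not the PR's actual class: decides whether tables from
// namespace `ns` are visible under a comma-separated `namespaces` filter.
// Assumption: `foo.*` matches nested namespaces at any depth.
bool isNamespaceAllowed(const std::string & filter, const std::string & ns)
{
    std::istringstream stream(filter);
    std::string rule;
    while (std::getline(stream, rule, ','))
    {
        // `*` allows everything; otherwise an exact match is required.
        if (rule == "*" || rule == ns)
            return true;

        // `foo.*`: keep the prefix `foo.` and require at least one more
        // character after it, so the base namespace `foo` does not match.
        if (rule.size() > 2 && rule.compare(rule.size() - 2, 2, ".*") == 0)
        {
            const std::string prefix = rule.substr(0, rule.size() - 1); // "foo."
            if (ns.size() > prefix.size() && ns.compare(0, prefix.size(), prefix) == 0)
                return true;
        }
    }
    return false;
}
```

With this sketch, `isNamespaceAllowed("foo,foo.*", ns)` accepts both `foo` and `foo.bar`, while `foo.*` alone rejects `foo`.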

@ianton-ru
Author

@codex review

@github-actions

github-actions bot commented Jan 23, 2026

Workflow [PR], commit [7891aa4]

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8b92a48e5b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 529 to +533
DB::Names RestCatalog::getTables(const std::string & base_namespace, size_t limit) const
{
    if (!allowed_namespaces.isNamespaceAllowed(base_namespace, /*nested*/ false))
        throw DB::Exception(DB::ErrorCodes::OUT_OF_SCOPE,
            "Namespace {} is filtered by `namespaces` database parameter", base_namespace);

P2: Skip filtered namespaces when probing emptiness

This new OUT_OF_SCOPE throw means RestCatalog::getTables() now fails for any namespace excluded by the namespaces filter. RestCatalog::empty() still iterates all namespaces returned by the catalog and calls getTables() in its stop condition without checking the filter, so any filtered namespace will now throw and bubble up. That makes DatabaseDataLake::empty() (used by DatabaseCatalog::detachDatabase when dropping/detaching DBs) fail whenever the catalog contains namespaces outside the filter (e.g., namespaces='dev' but prod exists). Consider skipping filtered namespaces in empty() or swallowing OUT_OF_SCOPE there.
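
A minimal sketch of the first option (skipping filtered namespaces in `empty()`), using a toy stand-in rather than the real `RestCatalog`. Every name below (`ToyCatalog`, its members, the use of `std::runtime_error`) is hypothetical and only models the behaviour described in this comment.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Toy stand-in for a catalog; all names here are hypothetical.
struct ToyCatalog
{
    std::map<std::string, std::vector<std::string>> tables_by_namespace;
    std::set<std::string> allowed_namespaces;

    bool isNamespaceAllowed(const std::string & ns) const
    {
        return allowed_namespaces.count("*") || allowed_namespaces.count(ns);
    }

    // Mirrors the new behaviour: a filtered namespace is an error.
    std::vector<std::string> getTables(const std::string & ns) const
    {
        if (!isNamespaceAllowed(ns))
            throw std::runtime_error("Namespace " + ns + " is filtered");
        return tables_by_namespace.at(ns);
    }

    // Checks the filter before calling getTables(), so namespaces outside
    // the filter are skipped instead of making the whole probe throw.
    bool empty() const
    {
        for (const auto & entry : tables_by_namespace)
        {
            if (!isNamespaceAllowed(entry.first))
                continue;
            if (!getTables(entry.first).empty())
                return false;
        }
        return true;
    }
};
```

In this model a catalog with an empty `dev` and a populated `prod` reports `empty()` as true under the filter `dev`, and never touches `prod`.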

@ianton-ru
Author

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. 🎉

DataLakeStorageSettingsPtr settings_,
std::optional<std::string> catalog_namespaces_ = std::nullopt)
: settings(settings_)
, catalog_namespaces(catalog_namespaces_ ? *catalog_namespaces_ : std::string("*")) {}
Collaborator

consider std::optional::value_or()
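
For illustration, a small sketch of what the suggestion amounts to. `resolveNamespaces` is a hypothetical free function that stands in for the member initializer under review.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical free function mirroring the initializer under review.
std::string resolveNamespaces(const std::optional<std::string> & catalog_namespaces)
{
    // Same result as:
    //   catalog_namespaces ? *catalog_namespaces : std::string("*")
    return catalog_namespaces.value_or("*");
}
```

`value_or` returns the contained string when one is present and otherwise constructs the fallback from its argument, so the explicit conditional is no longer needed.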

{
    return allowed_namespaces.contains("*") || allowed_namespaces.contains(namespace_);
}

Collaborator

Not a big deal, but maybe make this method virtual and move this implementation to ICatalog to avoid duplication?
Feel free to ignore this if you've considered the option and decided that it is better to keep it simple.

glue_client = std::make_unique<Aws::Glue::GlueClient>(chain, endpoint_provider, client_configuration);
}

boost::split(allowed_namespaces, settings.namespaces, [](char c){ return c == ','; });
Collaborator

Consider boost::is_any_of instead of lambda.

Collaborator

BTW, am I right that 'aaa, bbb' would not work because of the space after the comma?
Is that OK?
To fix it, one could use
...split ... is_any_of(", "), boost::token_compress_on)
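
To make the whitespace concern concrete, here is a standard-library-only sketch of a tolerant parser. It is a stand-in for the boost call, not the PR's code: `splitNamespaces` is a hypothetical helper, and it trims spaces and tabs around each entry rather than treating the space as a delimiter.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits a comma-separated list and trims spaces/tabs
// around each entry, so "aaa, bbb" yields {"aaa", "bbb"}.
std::vector<std::string> splitNamespaces(const std::string & list)
{
    std::vector<std::string> result;
    std::istringstream stream(list);
    std::string item;
    while (std::getline(stream, item, ','))
    {
        const auto first = item.find_first_not_of(" \t");
        if (first == std::string::npos)
            continue; // empty or whitespace-only entry, e.g. from "a,,b"
        const auto last = item.find_last_not_of(" \t");
        result.push_back(item.substr(first, last - first + 1));
    }
    return result;
}
```

Splitting on ',' alone would instead keep the second entry as " bbb" with its leading space, which would then fail an exact comparison against the namespace name `bbb`.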

M(754, UDF_EXECUTION_FAILED) \
M(755, TOO_LARGE_LIGHTWEIGHT_UPDATES) \
M(756, CANNOT_PARSE_PROMQL_QUERY) \
M(757, OUT_OF_SCOPE) \
Collaborator

Not sure, but maybe it is better to either use an existing error code (e.g. DATALAKE_DATABASE_ERROR) or make up a more specific name, e.g. CATALOG_NAMESPACE_DISABLED?

| `aws_access_key_id` | AWS access key ID for S3/Glue access (if not using vended credentials) |
| `aws_secret_access_key` | AWS secret access key for S3/Glue access (if not using vended credentials) |
| `region` | AWS region for the service (e.g., `us-east-1`) |
| `namespaces` | Comma-separated list of namespaces, supported types: `rest`, `glue` and `unity` |
Collaborator

Initially I thought that rest, glue and unity were the supported namespaces. The target audience of this feature probably would not have this problem, but maybe '... implemented for catalog types: ...'?

boost::split(list_of_nested_namespaces, ns, [](char c){ return c == '.'; });

size_t len = list_of_nested_namespaces.size();
if (!len)
Collaborator

I'm not sure I understand what this actually means.
That 'ns' is an empty string? Is that possible?

3 participants