
feat(databricks_zerobus sink): add Arrow IPC compression option#25586

Open
flaviofcruz wants to merge 2 commits into vectordotdev:master from flaviofcruz:add-compression-zerobus


@flaviofcruz (Contributor) commented Jun 5, 2026

Summary

Adds an optional Arrow IPC compression setting to the databricks_zerobus sink. Users can now compress Arrow Flight payloads before they are sent to the Zerobus ingestion service by setting stream_options.compression to lz4_frame or zstd. When the option is omitted, batches are sent uncompressed (the SDK default), so behavior is unchanged for existing configurations.

This wires through the ipc_compression option that the databricks-zerobus-ingest-sdk Arrow stream builder already exposes.

Vector configuration

  sinks:
    my_zerobus_sink:
      type: databricks_zerobus
      ingestion_endpoint: https://ingest.dev.databricks.com
      unity_catalog_endpoint: https://your-workspace.cloud.databricks.com
      table_name: main.default.vector_logs
      auth:
        strategy: oauth
        client_id: ${DATABRICKS_CLIENT_ID}
        client_secret: ${DATABRICKS_CLIENT_SECRET}
      stream_options:
        compression: zstd   # or lz4_frame; omit for no compression
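As a rough illustration of how the accepted `compression` values could map to a typed option, here is a minimal Rust sketch. The `IpcCompression` enum and `parse_compression` helper are hypothetical names chosen for this example, not the sink's actual types. The real sink presumably deserializes the value through Vector's serde-based configuration rather than a hand-written parser like this one.

```rust
use std::str::FromStr;

/// Hypothetical mirror of the values `stream_options.compression` accepts.
/// The sink's real type name and derive attributes may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IpcCompression {
    Lz4Frame,
    Zstd,
}

impl FromStr for IpcCompression {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "lz4_frame" => Ok(IpcCompression::Lz4Frame),
            "zstd" => Ok(IpcCompression::Zstd),
            other => Err(format!(
                "unknown compression codec `{other}`; expected `lz4_frame` or `zstd`"
            )),
        }
    }
}

/// Hypothetical helper: `None` (key omitted) stays `None`, meaning no
/// compression is requested; a present value must be a known codec.
pub fn parse_compression(value: Option<&str>) -> Result<Option<IpcCompression>, String> {
    value.map(IpcCompression::from_str).transpose()
}
```

Keeping the field as an `Option` keeps an omitted key distinct from an explicit codec, which is what lets existing configurations fall through to the SDK's uncompressed default.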

How did you test this PR?

Unit tests and a smoke test to ensure we can ingest compressed data.

Change Type

  • Bug fix
  • New feature
  • Dependencies
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

Notes

  • Please read our Vector contributor resources.
  • Do not hesitate to use @vectordotdev/vector to reach out to us regarding this PR.
  • Some CI checks run only after we manually approve them.
    • We recommend adding a pre-push hook; please see this template.
    • Alternatively, we recommend running the following locally before pushing to the remote branch:
      • make fmt
      • make check-clippy (if it reports failures, some of them may be fixable with make clippy-fix)
      • make test
  • After a review is requested, please avoid force pushes to help us review incrementally.
    • Feel free to push as many commits as you want. They will be squashed into one before merging.
    • For example, you can run git merge origin master and git push.
  • If this PR introduces changes to Vector dependencies (modifies Cargo.lock), please
    run make build-licenses to regenerate the license inventory and commit the changes (if any). More details are available in the dd-rust-license-tool documentation.

Expose an optional `stream_options.compression` setting on the
databricks_zerobus sink that enables Arrow IPC compression of Arrow
Flight payloads. Accepts `lz4_frame` or `zstd`; defaults to no
compression (the SDK default).

The codec maps to the SDK's `ipc_compression` Arrow stream builder
option, applied only when the user sets it so the uncompressed default
is preserved.

Co-authored-by: Isaac
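The commit message's point about applying the codec only when it is set can be sketched as follows. `ArrowStreamBuilder`, `Codec`, and `configure` are hypothetical stand-ins, not the databricks-zerobus-ingest-sdk types; only the `ipc_compression` option name comes from this PR, and the real builder's signature may differ.

```rust
/// Hypothetical codec type; NOT the SDK's actual type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Codec {
    Lz4Frame,
    Zstd,
}

/// Hypothetical stand-in for the SDK's Arrow stream builder.
/// `Default` leaves `ipc_compression` as `None`, i.e. uncompressed.
#[derive(Debug, Default)]
pub struct ArrowStreamBuilder {
    ipc_compression: Option<Codec>,
}

impl ArrowStreamBuilder {
    /// Setter modeled on the `ipc_compression` builder option.
    pub fn ipc_compression(mut self, codec: Codec) -> Self {
        self.ipc_compression = Some(codec);
        self
    }

    /// Inspect the configured codec (for illustration and testing).
    pub fn compression(&self) -> Option<Codec> {
        self.ipc_compression
    }
}

/// Call the setter only when the user configured a codec; otherwise
/// return the builder untouched so its default remains in effect.
pub fn configure(builder: ArrowStreamBuilder, compression: Option<Codec>) -> ArrowStreamBuilder {
    match compression {
        Some(codec) => builder.ipc_compression(codec),
        None => builder,
    }
}
```

Because the `None` branch never touches the builder, whatever default the SDK ships with (uncompressed, per this PR) stays in effect for configurations that omit the option.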
@github-actions github-actions Bot added the docs review on hold, domain: sinks, and domain: external docs labels and removed the docs review on hold label Jun 5, 2026
@flaviofcruz flaviofcruz marked this pull request as ready for review June 5, 2026 18:57
@flaviofcruz flaviofcruz requested review from a team as code owners June 5, 2026 18:57
@github-actions github-actions Bot added the docs review on hold label Jun 5, 2026

Labels

docs review on hold · domain: external docs · domain: sinks
