Unknown buffer disk limitation #19759

Description

@m1cha3lf

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

Hi,

We have multiple Vector instances that send their data to a central Vector instance, where I do some transformation. Basically, N small Vector instances send their requests to one big instance.

I have configured a large disk buffer to ensure no data is lost. Everything was working fine until 19:42, when everything suddenly stopped. The process itself was still running, but nothing was being processed anymore.

The Vector logs showed errors like this:

Jan 31 19:42:13 vector-s3writer-bm vector[88628]: 2024-01-31T19:42:13.252148Z ERROR source{component_kind="source" component_id=requests10 component_type=vector}:grpc-request{grpc_service="vector.Vector" grpc_method="PushEvents"}: vector_common::internal_event::component_events_dropped: Events dropped intentional=false count=2798 reason="Source send cancelled." internal_log_rate_limit=true

$ ls -lah /var/lib/vector/buffer/v2/requests-vector-internal/
total 1.3G
drwxr-xr-x 2 vector vector 4.0K Jan 31 19:41 .
drwxr-xr-x 9 vector vector 4.0K Jan 30 15:19 ..
-rw-r----- 1 vector vector 127M Jan 30 15:19 buffer-data-0.dat
-rw-r----- 1 vector vector 126M Jan 30 15:19 buffer-data-1.dat
-rw-r----- 1 vector vector 127M Jan 30 15:19 buffer-data-10.dat
-rw-r----- 1 vector vector 126M Jan 30 15:19 buffer-data-12.dat
-rw-r----- 1 vector vector 128M Jan 30 15:19 buffer-data-2.dat
-rw-r----- 1 vector vector 127M Jan 30 15:19 buffer-data-20.dat
-rw-r----- 1 vector vector 125M Jan 30 15:19 buffer-data-23.dat
-rw-r----- 1 vector vector 126M Jan 30 15:19 buffer-data-3.dat
-rw-r----- 1 vector vector 128M Jan 30 15:19 buffer-data-4.dat
-rw-r----- 1 vector vector 127M Jan 31 19:41 buffer-data-65534.dat
-rw-r----- 1 vector vector 24 Jan 31 19:52 buffer.db
-rw-r--r-- 1 vector vector 0 Jan 31 19:52 buffer.lock

Vector stopped working at exactly the moment it would have reached file number 65535, which sounds like it hit some kind of limit.

Restarting Vector didn't solve the issue. I had to delete the buffer files and start again to bring it back.
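To make the suspected limit concrete, here is a minimal sketch of how a 16-bit file ID behaves near its upper bound. It is not taken from Vector's source: the constant `MAX_FILE_ID`, the function `next_file_id`, and the modular wrap-around are all assumptions made for illustration.

```rust
/// Assumed modulus for data file IDs (hypothetical, not confirmed from
/// Vector's source). `u16::MAX` is 65535.
const MAX_FILE_ID: u16 = u16::MAX;

/// Hypothetical helper: returns the ID a writer would move to after `current`,
/// assuming IDs advance by one and wrap around at `MAX_FILE_ID`.
fn next_file_id(current: u16) -> u16 {
    // Widen to u32 so that `current + 1` cannot overflow before the modulo.
    ((u32::from(current) + 1) % u32::from(MAX_FILE_ID)) as u16
}
```

Under these assumptions, `next_file_id(65534)` is `0`, so the file after `buffer-data-65534.dat` would be `buffer-data-0.dat`, which still exists in the listing above. Whether that collision is what actually stalls the writer is not confirmed here.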

I hope that helps to find the root cause.

Regards,

Michael

Configuration

requests-vector-internal:
    type: vector
    compression: true
    healthcheck: false
    request:
      retry_attempts: 100
      timeout_secs: 60
      retry_max_duration_secs: 10
    batch:
      max_bytes: 10000000
      max_events: 30000
      timeout_secs: 10
    buffer:
      type: "disk"
      max_size: 159684354880
      when_full: "drop_newest"
    inputs:
      - requests10
    address: "127.0.0.1:9011"
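As a rough sanity check on the numbers, the sketch below estimates how many data files the configured `max_size` can hold at once. The per-file size of 128 MiB is an assumption read off the `ls` output above, and `full_files` is a hypothetical helper, not part of Vector.

```rust
/// `buffer.max_size` from the configuration above, in bytes.
const MAX_SIZE_BYTES: u64 = 159_684_354_880;

/// Assumption: each data file grows to roughly 128 MiB, as the `ls` output suggests.
const DATA_FILE_BYTES: u64 = 128 * 1024 * 1024;

/// Hypothetical helper: number of completely filled data files that fit
/// into `max_size` bytes.
fn full_files(max_size: u64, file_size: u64) -> u64 {
    max_size / file_size
}
```

With these values, `full_files(MAX_SIZE_BYTES, DATA_FILE_BYTES)` is 1189, far below 65535. That would mean the file number reflects how many files have been created over time rather than how many exist at once, so the stall is unlikely to be the `max_size` limit itself.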

Version

$ vector --version
vector 0.34.2 (x86_64-unknown-linux-gnu d685a16 2024-01-02 14:59:54.890517436)

Debug Output

No response

Example Data

No response

Additional Context

I assume the reason is this function signature, which takes the file ID as a u16:
pub fn get_data_file_path(&self, file_id: u16) -> PathBuf {
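The signature above is all that is quoted, so the following is only a sketch of what such a helper might look like. The free-function form, the `dir` parameter, and the body are assumptions; only the `u16` parameter type and the `buffer-data-<id>.dat` naming (from the directory listing) come from this report.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the quoted method: builds the path of a data
/// file from its 16-bit ID. The real method takes `&self`; a directory
/// argument is used here so the sketch is self-contained.
pub fn get_data_file_path(dir: &Path, file_id: u16) -> PathBuf {
    // File name pattern taken from the `ls` output in this report.
    dir.join(format!("buffer-data-{file_id}.dat"))
}

/// Hypothetical helper: checks whether a wider counter still fits the
/// `u16` parameter type, i.e. is at most 65535.
pub fn fits_file_id(counter: u32) -> bool {
    u16::try_from(counter).is_ok()
}
```

Because the parameter is a `u16`, no ID above 65535 can be expressed at all: `fits_file_id(65_535)` holds, while `fits_file_id(65_536)` does not. That matches the observation that processing stopped right around file number 65535.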

References

No response

Labels

domain: buffers (Anything related to Vector's memory/disk buffers)