Make task and asset store row size limits configurable #68133

Merged
amoghrajesh merged 7 commits into apache:main from astronomer:aip-103-backlog-max-length-cap
Jun 15, 2026
@amoghrajesh amoghrajesh commented Jun 6, 2026

Contributor

Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)

What's being solved?

The task and asset store had a hardcoded 64 KB size limit in the core API datamodels that could not be adjusted without code changes. Users with legitimate large-payload use cases had no way to raise it, and the limit was not enforced consistently either: the core API validated it, but the execution API (worker path) did not.

Current behaviour

  • Core API rejected values over 64 KB with a hardcoded limit.
  • Execution API (worker writes) had no size check at all.
  • No config key existed to adjust the limit.

Proposed change

Adds [state_store] max_value_storage_bytes (default 65535) to control the limit:

  • Core API (task and asset store): rejects oversized values with HTTP 400. Set to 0 to disable the limit entirely. DB column limits then apply (~1 GB on PostgreSQL, 16 MB on MySQL).
  • Task SDK (task_store.set() and asset_store.set()): logs a warning in the task logs when the serialized value exceeds the limit, but lets the write through so that execution is not interrupted mid-run. The warning suggests configuring a custom [state_store] backend for large payloads.
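
To make the two behaviours concrete, here is a minimal sketch of how the checks could look. It is not the code from this PR: the function names, the exception class, and the choice to measure size on the UTF-8 encoded JSON form are all assumptions made for illustration, as is the idea that 0 also silences the SDK warning.

```python
import json
import logging

log = logging.getLogger("state_store_sketch")

# Default of [state_store] max_value_storage_bytes, as stated in this PR.
DEFAULT_MAX_VALUE_STORAGE_BYTES = 65535


class ValueTooLargeError(ValueError):
    """Hypothetical stand-in for the HTTP 400 returned by the Core API."""


def serialized_size(value) -> int:
    # Assumption: size is measured on the UTF-8 encoded JSON form of the value.
    return len(json.dumps(value).encode("utf-8"))


def validate_core_api_value(value, max_bytes=DEFAULT_MAX_VALUE_STORAGE_BYTES) -> None:
    """Core API path (hypothetical helper): reject oversized values."""
    if max_bytes == 0:
        # 0 disables the limit; only the DB column limit would apply.
        return
    size = serialized_size(value)
    if size > max_bytes:
        raise ValueTooLargeError(f"value is {size} bytes, limit is {max_bytes} bytes")


def warn_if_oversized(key, value, max_bytes=DEFAULT_MAX_VALUE_STORAGE_BYTES) -> bool:
    """Task SDK path (hypothetical helper): warn, but never block the write.

    Returns True when a warning was logged.
    """
    if max_bytes == 0:
        # Assumption: 0 also disables the warning on the SDK side.
        return False
    size = serialized_size(value)
    if size > max_bytes:
        log.warning(
            "Value for key %r is %d bytes, above the %d byte limit; "
            "consider a custom [state_store] backend for large payloads.",
            key, size, max_bytes,
        )
        return True
    return False
```

The split mirrors the design choice described above: an API caller can retry after a rejection, whereas a running task cannot easily recover from a failed write, so the worker path only warns.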

Testing

Running example:

Task Store:

    @task
    def my_task(**context: Context):
        task_state = context["task_store"]
        value = "x" * 700000
        task_state.set("str_value", value)
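
For scale, the value written in the task above is roughly ten times the default limit. The arithmetic below assumes the value is serialized as JSON and encoded as UTF-8 before being measured; the actual serialization used by the store may differ.

```python
import json

DEFAULT_LIMIT = 65535  # default max_value_storage_bytes from this PR

value = "x" * 700000
# Assumption: JSON encoding adds two bytes for the surrounding quotes.
encoded_size = len(json.dumps(value).encode("utf-8"))
over_by = encoded_size - DEFAULT_LIMIT

print(encoded_size)  # 700002
print(over_by)       # 634467
```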

Asset store:

    with DAG(
        dag_id="aip103_asset_producer",
        start_date=pendulum.datetime(2026, 1, 1, tz="UTC"),
        schedule=None,
        catchup=False,
        tags=["aip-103", "asset-state-test"],
    ):

        @task(outlets=[aip103_test_asset])
        def produce(**context):
            print(f"Producer running for {aip103_test_asset.uri!r}")
            w = "w" * 700000
            context["asset_store"].set("my_asset_value_watermark", w)

        produce()
  • Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
  • For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
  • When adding dependency, check compliance with the ASF 3rd Party License Policy.
  • For significant user-facing changes create newsfragment: {pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.

@amoghrajesh amoghrajesh requested a review from kaxil June 7, 2026 06:36
@amoghrajesh amoghrajesh self-assigned this Jun 8, 2026
@amoghrajesh

Contributor Author

@kaxil, I would appreciate another round of review from you here. WDYT?

@amoghrajesh amoghrajesh requested a review from jroachgolf84 June 10, 2026 05:58
@amoghrajesh amoghrajesh force-pushed the aip-103-backlog-max-length-cap branch from df0fadf to b0692bc Compare June 10, 2026 11:35
Comment thread airflow-core/src/airflow/config_templates/config.yml Outdated
Comment thread airflow-core/src/airflow/api_fastapi/core_api/datamodels/task_store.py Outdated
Comment thread task-sdk/src/airflow/sdk/execution_time/context.py Outdated
Comment thread airflow-core/tests/unit/api_fastapi/core_api/routes/public/test_task_store.py Outdated
Comment thread airflow-core/src/airflow/config_templates/config.yml
@amoghrajesh amoghrajesh requested a review from kaxil June 12, 2026 04:10

@jason810496 jason810496 left a comment

Member

Thanks, LGTM overall.

Comment thread task-sdk/src/airflow/sdk/execution_time/context.py Outdated
Comment thread task-sdk/src/airflow/sdk/execution_time/context.py Outdated
Co-authored-by: Jason(Zhe-You) Liu <68415893+jason810496@users.noreply.github.com>
@amoghrajesh amoghrajesh merged commit 0e72905 into apache:main Jun 15, 2026
@amoghrajesh amoghrajesh deleted the aip-103-backlog-max-length-cap branch June 15, 2026 08:57
@amoghrajesh amoghrajesh added this to the Airflow 3.3.0 milestone Jun 15, 2026
pgagnon pushed a commit to pgagnon/airflow that referenced this pull request Jun 15, 2026
imrichardwu pushed a commit to imrichardwu/airflow that referenced this pull request Jun 16, 2026
dingo4dev pushed a commit to dingo4dev/airflow that referenced this pull request Jun 16, 2026
RulerChen pushed a commit to RulerChen/airflow that referenced this pull request Jun 16, 2026
5 participants