NVIDIA cuDF: GPU-Accelerated Structured Data Processing

NVIDIA cuDF is an open source NVIDIA CUDA-X™ data processing toolkit for structured data that delivers massive speedups and cost savings for data engines and libraries. Built on highly optimized NVIDIA® CUDA® primitives, cuDF taps into GPU parallelism and memory bandwidth to accelerate data processing and analytics workflows.

Documentation

How cuDF Works

cuDF provides components to GPU-accelerate query engines, including I/O and SQL operations such as joins, aggregations, sorting, and shuffles. Built on Apache Arrow’s columnar memory format, cuDF libraries dispatch highly parallel kernels across thousands of GPU cores simultaneously. Memory management tools reduce the cost of memory transfers between CPU and GPU.

See which data engines use cuDF today.

Figure 1: Query execution flow from a data engine into the core components of cuDF. Operations run on a primary GPU path or are routed through a CPU fallback mechanism.

User Experience With cuDF

When cuDF accelerates a data engine, users keep working through that engine’s interface while supported operations run on the GPU. The data engine routes operations without GPU support to the CPU, so the user’s workflow is not interrupted.

Figure 2: Example decision logic, where a query passing through the query engine API and execution engine layer is evaluated to run on the GPU if the operation and datatype are supported, or otherwise defaults to a CPU fallback before returning the final result.
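
The decision logic in Figure 2 can be sketched in a few lines of Python. This is an illustrative sketch only: the GPU_SUPPORTED table, the Operation class, and the choose_backend and plan functions are hypothetical names invented for this example, and the listed operation and datatype combinations are placeholders, not cuDF's actual support matrix.

```python
from dataclasses import dataclass

# Hypothetical capability table: operation name -> datatypes the GPU path accepts.
# These entries are placeholders for illustration, not cuDF's real support matrix.
GPU_SUPPORTED = {
    "join": {"int64", "float64", "string"},
    "groupby": {"int64", "float64"},
    "sort": {"int64", "float64", "string"},
}


@dataclass(frozen=True)
class Operation:
    """One step of a query: an operation name and the datatype it acts on."""
    name: str
    dtype: str


def choose_backend(op: Operation) -> str:
    """Return "gpu" only if both the operation and its datatype are supported."""
    supported_dtypes = GPU_SUPPORTED.get(op.name)
    if supported_dtypes is not None and op.dtype in supported_dtypes:
        return "gpu"
    # Anything else falls back to the CPU, so the query still completes.
    return "cpu"


def plan(query):
    """Assign a backend to every operation in a query, preserving order."""
    return [(op.name, choose_backend(op)) for op in query]


query = [
    Operation("join", "int64"),      # supported operation and datatype
    Operation("groupby", "string"),  # supported operation, unsupported datatype
    Operation("window", "int64"),    # unsupported operation
]
for name, backend in plan(query):
    print(f"{name}: {backend}")
```

How each engine makes this decision is engine-specific. The sketch only mirrors the two conditions named in the figure: the operation must be supported, and so must the datatype.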

Key Features

Maximize Performance With NVIDIA GPUs

cuDF maximizes performance of gigabyte- to petabyte-scale workloads by optimizing core SQL and DataFrame operations with low-level CUDA primitives that fully leverage the parallelism and memory bandwidth of NVIDIA GPUs.

Built for Latency-Sensitive Workloads

With faster time to result, cuDF unblocks latency-sensitive workloads like interactive analytics and agentic AI querying to enable the next generation of data analytics.

Reduce Infrastructure Costs

By reducing the runtime of data operations, cuDF lets workloads process the same data volumes on far fewer nodes, cutting infrastructure costs and shrinking the data center footprint.

Minimize Data Movement

Built on the Apache Arrow format, cuDF utilizes highly efficient columnar data structures and zero-copy interfaces with other accelerated libraries, minimizing data movement overhead.
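
To make the columnar idea concrete, here is a minimal host-memory sketch of an Arrow-style column: a contiguous values buffer plus a validity bitmap in which a set bit marks a non-null entry (least-significant bit first, as in the Arrow format). The to_column, is_valid, and column_sum helpers are hypothetical names for this example. The code runs on the CPU with the Python standard library and does not reflect how cuDF itself stores or shares data on the GPU.

```python
from array import array


def to_column(values):
    """Pack optional ints into a contiguous values buffer and a validity bitmap."""
    # Null slots still occupy a value slot; their content is ignored.
    data = array("q", (0 if v is None else v for v in values))
    bitmap = bytearray((len(values) + 7) // 8)
    for i, v in enumerate(values):
        if v is not None:
            bitmap[i // 8] |= 1 << (i % 8)  # set bit i: entry i is valid
    return data, bitmap


def is_valid(bitmap, i):
    """Check bit i of the validity bitmap."""
    return bool(bitmap[i // 8] >> (i % 8) & 1)


def column_sum(data, bitmap):
    """Sum only the valid entries of the column."""
    return sum(x for i, x in enumerate(data) if is_valid(bitmap, i))


data, bitmap = to_column([1, None, 3])
print(column_sum(data, bitmap))

# A memoryview exposes the same buffer without copying it, which is the
# idea behind zero-copy hand-off between libraries that share a layout.
view = memoryview(data)
view[0] = 10
print(data[0])
```

Because every value of a column sits next to the others in one buffer, a consumer that understands the layout can read it in place instead of converting row by row.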

Out-of-Core Scalability

With NVIDIA’s memory tools and primitives, cuDF helps accelerated engines process datasets and memory-intensive operations like joins and groupbys that exceed GPU memory.
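
One general way to aggregate data that does not fit in memory at once is to process it in chunks and merge partial results. The sketch below shows that technique for a group-by sum in plain Python. It is an assumption-laden illustration: the source does not describe how cuDF implements out-of-core execution, and the chunks and groupby_sum_chunked helpers are hypothetical names.

```python
from collections import defaultdict


def chunks(rows, size):
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def groupby_sum_chunked(rows, chunk_size):
    """Group-by sum that only ever aggregates one chunk at a time."""
    totals = defaultdict(int)
    for chunk in chunks(rows, chunk_size):
        # Aggregate the chunk on its own (the part that must fit in memory).
        partial = defaultdict(int)
        for key, value in chunk:
            partial[key] += value
        # Merge the small partial result into the running totals.
        for key, subtotal in partial.items():
            totals[key] += subtotal
    return dict(totals)


rows = [("a", 1), ("b", 2), ("a", 3), ("b", 4), ("c", 5)]
print(groupby_sum_chunked(rows, chunk_size=2))
```

The approach works for sums because partial sums can be combined in any order. Aggregations without that property need a different merge step.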

Install and Deploy in Your Environment

To get a sense of what comes with cuDF, download the package using your preferred install method.
The package includes Python and C++ interfaces as well as zero-code-change accelerators.

Quick Install

Deployment Guides

Integrate cuDF directly into your environment. Follow these steps to get started.

Install cuDF

Quick Install With conda

1. If conda is not already installed, download and run the install script, which installs the latest Miniforge:

wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash Miniforge3-$(uname)-$(uname -m).sh

2. Then install with:

conda create -n rapids-26.04 -c rapidsai -c conda-forge  \
    cudf=26.04 python=3.14 'cuda-version>=13.0,<=13.1'
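
Once the environment is created and activated (conda activate rapids-26.04), a short Python check can confirm that the package is visible to the interpreter. The same check applies after a pip install. This is a minimal sketch using only the standard library. The is_importable helper is a hypothetical name, and the check only looks for the module without importing it, so it does not verify that a GPU or driver is present.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if is_importable("cudf"):
    print("cudf is visible to this Python interpreter")
else:
    print("cudf was not found; check that the right environment is active")
```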

Quick Install With pip

pip install \
    "cudf-cu13==26.4.*"

Deploy Locally

Use this guide to install and build with conda, pip, Docker, or WSL2 on your local machine.

Read the Local Deployment Guide

Deploy on Platforms

Deploy CUDA-X Data Science libraries on your platform of choice, including Kubernetes, Databricks, and Google Colab.

Read the Platforms Guide

Deploy in the Cloud

Run CUDA-X Data Science libraries in AWS, Azure, GCP, and more.

Read the Cloud Deployment Guide

See the complete install selector for Docker, WSL2, and individual libraries.

Install Selector

Try Accelerated Data Engines and Tools

Tool	Ecosystem Plug-Ins	Get Started
Velox	Velox on GPU (experimental)	GitHub
Apache Spark	cuDF plugin for Apache Spark (drop-in extension: spark.conf.set('spark.rapids.sql.enabled','true'))	NVIDIA Docs
Presto	Presto-GPU	Benchmark Presto With GPU
Polars	Polars GPU Engine (drop-in extension: .collect(engine="gpu"))	Polars Docs
DuckDB	SiriusDB (drop-in extension: LOAD 'sirius.duckdb_extension';)	GitHub
pandas	cudf.pandas (drop-in extension: %load_ext cudf.pandas, followed by import pandas as pd ...)	Pandas Docs

Starter Kits

Starter Kit: Build Data Engines With cuDF

Learn about cuDF’s application in large-scale data processing workloads.

On-Demand GTC Session: The Era of GPU Data Processing: From SQL to Search and Back Again
On-Demand GTC Session: Shatter the Memory Wall: Composable Building Blocks for Massive Analytics
GitHub: GPU Query Execution Blueprint

Starter Kit: Build Data Engines With Velox on GPUs up to 6x Faster

Learn how running Velox, a C++ execution engine, on GPUs accelerates data engine execution for terabyte-sized workloads.

Conference Talk: Bringing GPU-Acceleration to Presto With Velox
Blog: Bringing GPU Execution to Velox
GitHub: Velox on GPU Backend (Experimental)

Starter Kit: Run Apache Spark Workloads 5x Faster With 10x Cost Savings

Learn how GPUs accelerate enterprise-scale Apache Spark workflows to drive cost savings.

On-Demand GTC Workshop: Transform Enterprise and Edge Infra With NVIDIA RTX PRO™ 4500 Blackwell Server Edition
On-Demand GTC Workshop: Automate and Simplify Apache Spark Workload Migration From CPU to GPU
On-Demand GTC Session: Accelerate Big Data Analytics on GPUs With the NVIDIA RAPIDS™ Accelerator for Apache Spark (01:27:34)
Blog: Predicting Performance on Apache Spark With GPUs
User Guide: RAPIDS Accelerator for Apache Spark

Starter Kit: Run Presto on GPUs up to 6x Faster With 5x Cost Savings

This kit describes how Presto, an interactive analytics engine, uses Velox execution on cuDF to accelerate analytics.

Video: IBM Reinvents Data Processing With NVIDIA
On-Demand GTC Session: Unlock Fast, Cost-Effective Interactive Analytics on Massive Lakehouses
Blog: Presto's Performance With Velox-cuDF
GitHub: Presto Benchmarking in Velox Testing

Starter Kit: Run Polars GPU Engine for up to 10x Speedup

This kit demonstrates how the Polars GPU engine can process 100 million rows in under two seconds.

Video: Processing 100M Rows of Data in Under Two Seconds With the Polars GPU Engine (00:28)
Notebook: Intro to the Polars GPU Engine

Starter Kit: Use SiriusDB to Accelerate DuckDB by up to 8x

Learn how SiriusDB, #1 on ClickBench, cost-effectively accelerates DuckDB workloads.

On-Demand GTC Session: Achieving 8x Lower Cost Analytics With GPU-Accelerated DuckDB
Blog: Read About SiriusDB's ClickBench Record
GitHub: SiriusDB
Academic Paper: Rethinking Analytical Processing in the GPU Era

Starter Kit: Accelerate Pandas by up to 50x

This kit gets you started with accelerating pandas on Google Colab.

Blog: NVIDIA cuDF accelerates pandas up to 50x on Google Colab
Notebook: 10 Minutes to cudf.pandas

Join the Community

Join NVIDIA CUDA-X Data Science Libraries Slack Community

Join Our Community on Slack

Sign Up for the Data Science Newsletter

Ethical AI

NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When downloading or using this software in accordance with our terms of service, developers should work with their supporting team to ensure their application meets the requirements for the relevant industry and use case and addresses unforeseen product misuse.

Please report security vulnerabilities or NVIDIA AI Concerns here.

Get started with NVIDIA cuDF today.
