Tags: mudler/parakeet.cpp
ci(release): ship the C-API shared library and header (#25) (#29)

The release bundles only carried parakeet-cli, so integrating via ctypes/dlopen meant compiling from source. Add a separate shared-library bundle per (platform, backend, arch) alongside each CLI bundle.

The lib is built in its own build dir with PARAKEET_SHARED=ON and PARAKEET_BUILD_CLI=OFF, so the CLI bundles stay single self-contained binaries. Each lib bundle carries libparakeet.{so,dylib,dll}, the parakeet_capi.h header, LICENSE and README (plus the same cudart/cublas on CUDA). The release job already globs dist/*, so they attach with no further change.

Two things that would have silently broken it:

- BUILD_SHARED_LIBS=OFF builds ggml's static objects without -fPIC, so folding them into a shared lib fails to link on Linux. The shared-lib configure steps set CMAKE_POSITION_INDEPENDENT_CODE=ON.
- The C-API header marks symbols with plain extern "C", no dllexport, so an MSVC DLL would export nothing. Set WINDOWS_EXPORT_ALL_SYMBOLS on the parakeet target when PARAKEET_SHARED is on. No effect on Linux/macOS.

Verified on Linux CPU: a self-contained libparakeet.so depending only on standard system libs, with the parakeet_capi_* symbols exported.

Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
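The ctypes route mentioned in the commit above could start from something like the following sketch. It only locates and loads the library; the file names follow the libparakeet.{so,dylib,dll} pattern from the commit message, while `bundle_dir`, `lib_filename` and `load_parakeet` are hypothetical names used for illustration. No C-API function is called here because the commit does not list the individual parakeet_capi_* signatures.

```python
import ctypes
import os
import platform

# File names follow the libparakeet.{so,dylib,dll} pattern named in the commit.
_LIB_NAMES = {
    "Linux": "libparakeet.so",
    "Darwin": "libparakeet.dylib",
    "Windows": "libparakeet.dll",
}


def lib_filename(system=None):
    """Return the shared-library file name for the given platform.system() value."""
    system = system or platform.system()
    if system not in _LIB_NAMES:
        raise OSError(f"no libparakeet bundle name known for {system!r}")
    return _LIB_NAMES[system]


def load_parakeet(bundle_dir):
    """Load the library from an unpacked lib bundle.

    bundle_dir is an assumption: wherever the release bundle was extracted.
    """
    path = os.path.join(bundle_dir, lib_filename())
    if not os.path.exists(path):
        raise FileNotFoundError(path)
    # CDLL uses the platform loader (dlopen on POSIX, LoadLibrary on Windows).
    return ctypes.CDLL(path)
```

After loading, each function declared in parakeet_capi.h would still need its `argtypes` and `restype` set from the header before being called.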
fix(server): build the model fetcher on Windows (MSVC) (#28)

model_fetch.cpp included POSIX-only <sys/wait.h>/<unistd.h> and used fork/execvp/waitpid, so the Windows release build failed with 'Cannot open include file: sys/wait.h'.

Add a Windows path using _spawnvp (_P_WAIT), which runs curl/wget directly without a shell, keeping the no-shell-injection property of the POSIX exec path. have_tool now uses 'where' on Windows and 'command -v' on POSIX.

Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
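The no-shell property described in the commit above can be illustrated with a small Python analogue. This is not the code in model_fetch.cpp: `fetch_argv` and `fetch` are hypothetical helpers, the curl/wget flags are common choices rather than the ones the fetcher is known to use, and `shutil.which` stands in for the 'where' / 'command -v' probe.

```python
import shutil
import subprocess


def have_tool(name):
    """Portable stand-in for 'where' (Windows) / 'command -v' (POSIX)."""
    return shutil.which(name) is not None


def fetch_argv(tool, url, dest):
    """Build an argument vector; the URL is always exactly one argv element."""
    if tool == "curl":
        return ["curl", "-fL", "-o", dest, url]
    if tool == "wget":
        return ["wget", "-O", dest, url]
    raise ValueError(f"unsupported tool: {tool}")


def fetch(url, dest):
    """Run the first available downloader directly, without a shell."""
    for tool in ("curl", "wget"):
        if have_tool(tool):
            # A list argv with the default shell=False means no shell ever
            # parses the URL, so metacharacters in it cannot start commands.
            return subprocess.run(fetch_argv(tool, url, dest)).returncode
    raise RuntimeError("neither curl nor wget found on PATH")
```

Passing an argument vector is the same idea as execvp or _spawnvp: the program receives the URL verbatim instead of a command line that a shell would re-tokenize.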
OpenAI-compatible HTTP server + docker image (#8)

* feat(server): OpenAI-compatible transcription server example

Squashed rebase of the worktree-openai-server branch (PR #8) onto current master. Adds examples/server (httplib-based POST /v1/audio/transcriptions), model resolver/fetcher, OpenAI response formatter, and unit + e2e tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(docker): publish a dedicated parakeet-server image

Split the Dockerfile into a shared build stage plus two runtime targets: runtime (cli, default, unchanged) and runtime-server (entrypoint parakeet-server --host 0.0.0.0, EXPOSE 8080, curl for alias fetch). The docker workflow now builds and publishes both ghcr images, cli and server, for each (variant, arch); the server build reuses the cli build stage so ggml compiles once per job.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs: document the OpenAI server and docker images; point to LocalAI for production

Add an OpenAI-compatible server section to the main README (build, curl and OpenAI-client usage) and extend the Docker section to cover the new parakeet.cpp-server image alongside the cli. Note LocalAI as the production path in both the main README and examples/server/README.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* ci: add server-e2e job driving the real parakeet-server over HTTP

Builds parakeet-server and runs tests/server_e2e.sh (PARAKEET_SERVER_E2E=1): fetches the ~125 MB tdt_ctc-110m-q4_k model by alias, starts the server, and hits POST /v1/audio/transcriptions with a real WAV in json/text/verbose_json (plus word timestamps), checking the transcription and the 400 paths. Runs on pull_request and workflow_dispatch, like closed-loop; no NeMo/Python venv.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
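A client for the POST /v1/audio/transcriptions endpoint described above might look like the following standard-library sketch. Several things are assumptions rather than facts from the commit: the form field names (`file`, `model`, `response_format`) follow the OpenAI transcription API convention, the base URL uses port 8080 because the image EXPOSEs it, and the model alias is the one the e2e job fetches. `build_multipart` and `transcribe` are hypothetical helper names.

```python
import urllib.request
import uuid


def build_multipart(fields, filename, file_bytes, boundary=None):
    """Encode text fields plus one file part as multipart/form-data."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: audio/wav\r\n\r\n".encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(wav_path, base_url="http://localhost:8080",
               model="tdt_ctc-110m-q4_k", response_format="json"):
    """POST a WAV file and return the raw response body as text."""
    with open(wav_path, "rb") as f:
        audio = f.read()
    body, content_type = build_multipart(
        {"model": model, "response_format": response_format}, "audio.wav", audio
    )
    req = urllib.request.Request(
        base_url + "/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

With a server running, `transcribe("speech.wav", response_format="text")` would return whatever the server sends back; a 400 response surfaces as `urllib.error.HTTPError`.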
ci: pre-built release binaries for linux, macos and windows (#22)

* ci: pre-built release binaries for linux, macos and windows (#21)

Adds a release workflow that builds self-contained parakeet-cli bundles for every v* tag: linux x64 (cpu, vulkan, cuda) and arm64 (cpu), macos arm64 (metal) and x64 (cpu), windows x64 (cpu, vulkan, cuda) plus a separate cudart runtime zip. Assets attach to the GitHub release for the tag, creating a draft release when none exists yet.

Fixes #21

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* docs: point the README at the pre-built release bundles

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* ci: capture the usage banner before grepping in the smoke tests

parakeet-cli exits 2 when invoked bare; under the runner's bash -e -o pipefail that exit code fails the pipeline even though grep matched.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* ci: drop the temporary branch trigger used for matrix validation

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* ci: let ggml pick the CUDA architectures, like llama.cpp releases

Dropping the hand-rolled CMAKE_CUDA_ARCHITECTURES lists lets ggml's curated non-native default apply: PTX for the datacenter generations (75, 80, 90), real code for the common consumer cards (86, 89, 120a), and 121a-real for GB10 on CUDA 13. Smaller binaries, faster builds, and the list stays current with submodule bumps.

Temporarily re-adds the branch trigger to validate the CUDA builds.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

---------

Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
ci: build and publish the parakeet-cli container image to ghcr (#7)

* ci: build and publish the parakeet-cli container image to ghcr

Add a multi-stage Dockerfile and a docker workflow that builds the parakeet-cli image and pushes it to ghcr.io/<owner>/parakeet.cpp-cli.

- Dockerfile: fat build stage compiles parakeet-cli plus the ggml backends, a slim runtime stage carries only the binary and the ggml .so files. One Dockerfile, CPU and CUDA variants selected via BUILD_BASE / RUNTIME_BASE / CMAKE_EXTRA_ARGS build args. GGML_NATIVE=OFF so the image is portable across x86-64 hosts. The ggml submodule is re-inited as a throwaway git repo in the build stage so the CMake-driven patch step (git apply) works regardless of how the submodule arrived in the context.
- docker.yml: matrix over cpu/cuda, builds on every push/PR (build-only gate on PRs), pushes to ghcr on master + tags + dispatch. Tags via metadata-action: latest / sha / vX.Y.Z, with a -cuda suffix for the CUDA variant. Uses GITHUB_TOKEN, gha build cache.
- .dockerignore keeps the context small (excludes .git, build dirs, models, benchmark media) while keeping the ggml source.
- README: Docker section with CPU and CUDA run examples.

Verified the CPU image end to end: builds at 127 MB, parakeet-cli runs, and transcribing tests/fixtures/speech.wav with a mounted q5_k 110m model yields the exact NeMo reference transcript.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* ci(docker): build multi-arch (amd64 + arm64) images for both variants

Publish each variant (cpu, cuda) as a multi-arch manifest covering linux/amd64 and linux/arm64. The arm64 CUDA image runs natively on Grace / GB10-class hosts.

Every arch is built natively, no QEMU: amd64 on ubuntu-24.04, arm64 on the ubuntu-24.04-arm hosted runner (free for public repos). Emulated nvcc builds would be far too slow. The per-arch images are pushed by digest and a merge job stitches them into one manifest per variant, tagged via metadata-action.

Verified the arm64 CPU image builds and runs (aarch64) under emulation locally, and confirmed the ubuntu and nvidia/cuda base images all ship arm64.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* ci(docker): build CUDA images on CUDA 13 for Blackwell / GB10 (DGX Spark)

CUDA 12.6 tops out at sm_90, so the CUDA images would not run on GB10 / Grace-Blackwell. The vendored ggml's CUDA CMake adds 120a-real at CUDA >= 12.8 and 121a-real (GB10 / DGX Spark / Thor) at CUDA >= 12.9, all under our GGML_NATIVE=OFF default. Bumping both arches to nvidia/cuda:13.0.1 therefore compiles Turing through Blackwell with no manual arch list: amd64 picks up Hopper / Ada / RTX 50, arm64 picks up GH200 (sm_90 PTX) and GB10 (sm_121).

Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* ci(docker): fix CUDA link (GGML_CUDA_NO_VMM), trim arm64 archs, CPU-only PR gate

The CUDA builds failed at link: libggml-cuda.so had undefined references to the CUDA driver API (cuMemCreate, cuMemMap, cuDeviceGet, ...). Those come from ggml's VMM memory pool, which links libcuda -- a lib a GPU-less build container does not have. Build with -DGGML_CUDA_NO_VMM=ON: every cuMem* call is under #if defined(GGML_USE_VMM), which this flag disables, so the symbols and the libcuda link dependency both go away.

Verified locally: the amd64 CUDA image now links clean, ships libggml-cuda.so, and resolves libcudart / libcublas from the CUDA 13 runtime base.

Also cut build time, which had blown out to 43 min on the arm64 CUDA job:

- arm64 CUDA targets only Grace GPUs now (CUDA_ARCHS=90;121-real -> GH200 + GB10/Spark) instead of ggml's full 7-arch list. Added a dedicated quoted CUDA_ARCHS build-arg so the ';' list separator survives the shell (the unquoted CMAKE_EXTRA_ARGS would split it as a command separator).
- pull_request now builds the CPU variant only (fast Dockerfile gate) via a dynamic matrix from a setup job. CUDA builds only on push / tag / dispatch, which also publish. Use workflow_dispatch to exercise CUDA before merging.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
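The ';' problem noted in the last commit above can be reproduced approximately with Python's `shlex`, which tokenizes roughly the way a POSIX shell does when `punctuation_chars=True`. This is only an illustration of the quoting issue, not the workflow's code; the cmake command lines and the `shell_tokens` helper are made up for the example.

```python
import shlex


def shell_tokens(command_line):
    """Split a command line roughly as a POSIX shell would tokenize it."""
    lexer = shlex.shlex(command_line, posix=True, punctuation_chars=True)
    return list(lexer)


# Unquoted: ';' is treated as a control operator, so the arch list is cut in
# two and '121-real' would be run as a separate command.
unquoted = shell_tokens("cmake -DCMAKE_CUDA_ARCHITECTURES=90;121-real")

# Quoted: the whole -D argument stays one word and CMake sees the full list.
quoted = shell_tokens('cmake "-DCMAKE_CUDA_ARCHITECTURES=90;121-real"')

if __name__ == "__main__":
    print(unquoted)
    print(quoted)
```

Passing the list through its own quoted build argument, as the commit does with CUDA_ARCHS, keeps the separator inside a single word at every shell boundary.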
parakeet.cpp: C++/ggml port of NVIDIA NeMo Parakeet ASR

Self-contained snapshot. FastConformer TDT / CTC / RNNT / hybrid models with a log-mel front-end, CPU and GPU (CUDA / HIP / Vulkan / Metal) ggml graphs, quantization (f16, q8_0, q6_k, q5_k, q4_k), a CLI, and a flat C-API (include/parakeet_capi.h) consumed by the LocalAI parakeet-cpp backend. Includes the NeMo parity suite and HF publishing tooling (scripts/publish_hf.py -> mudler/parakeet-cpp-gguf).

Assisted-by: Claude:claude-opus-4-8 [Claude Code]