Hebrew ML Datasets Navigator
Navigate the fragmented landscape of Hebrew and Yiddish ML datasets and models. Covers ivrit.ai (22K+ hours of Hebrew audio, whisper-large-v3 ASR variants, Yiddish models), Dicta (DictaLM 3.0 LLM family, DictaBERT variants, HeQ reading comprehension), the Israeli National NLP Program / NNLP-IL (HebrewSentiment, HebNLI), AlephBERT, and Knesset Plenums. Helps researchers and ML engineers pick the right dataset for a task by use case, license (commercial vs research), Hebrew register coverage, and model-dataset pairing. Use when choosing training data for a Hebrew NLP or ASR project, verifying license compatibility for a commercial product, finding a baseline model for a Hebrew downstream task, or exploring Yiddish ML resources. Do NOT use for Arabic NLP, general HuggingFace dataset discovery, or Hebrew OCR dataset selection (use hebrew-ocr-forms).
Trust score 91/100 (Verified) · 7+ installs · 3 GitHub contributors · MIT license
The Israeli ML community punches above its weight, but its datasets and models are scattered. ivrit.ai publishes world-class Hebrew speech corpora on one HuggingFace org, Dicta publishes Hebrew LLMs and BERT variants on another, and the Israeli National NLP Program maintains benchmarks under HebArabNlpProject. Licenses range from fully commercial-friendly to research-only. A researcher who wants the right combination for a concrete job, such as fine-tuning a Hebrew sentiment classifier on customer support chat for a commercial product, has to hunt across five orgs and read every dataset card.
npx skills-il add skills-il/developer-tools --skill hebrew-ml-datasets-navigator -a claude-code
Install on Claude.ai, Claude Desktop, ChatGPT, Manus, or other platforms
1. Click "Download ZIP" to download the skill files.
2. Open Claude Desktop and go to Customize > Skills.
3. Click "+" and select "Upload a skill", then upload the ZIP file.
4. Start a new conversation. The skill will activate automatically when relevant.
When to Apply
- When choosing training data for a Hebrew NLP or ASR project
- When verifying license compatibility for commercial use of a dataset
- When looking for a baseline model for a specific Hebrew task
- When building a Hebrew transcription stack and need to know what ivrit.ai offers
- When researching or building something in Yiddish and need to find resources
Try These Prompts
I want to train a sentiment classifier on Hebrew customer support chat for a commercial SaaS product. Which dataset should I use, which starting model, and what does the license say about attribution?
I am building a Hebrew podcast transcription product. What does ivrit.ai offer, which ASR model should I use in production with low latency, and how do I handle multiple speakers?
I need a Hebrew LLM that runs on consumer hardware (16GB VRAM max) for a Hebrew product. What does Dicta offer, what are the size differences, and what are the upstream licenses?
I am researching Yiddish and looking for datasets and models for speech recognition and text processing. What is available in 2026 and what are the licenses?