Navigate the fragmented landscape of Hebrew and Yiddish ML datasets and models. Covers ivrit.ai (22K+ hours of Hebrew audio, whisper-large-v3 ASR variants, Yiddish models), Dicta (DictaLM 3.0 LLM family, DictaBERT variants, HeQ reading comprehension), the Israeli National NLP Program / NNLP-IL (HebrewSentiment, HebNLI), AlephBERT, and Knesset Plenums. Helps researchers and ML engineers pick the right dataset for a task by use case, license (commercial vs research), Hebrew register coverage, and model-dataset pairing. Use when choosing training data for a Hebrew NLP or ASR project, verifying license compatibility for a commercial product, finding a baseline model for a Hebrew downstream task, or exploring Yiddish ML resources. Do NOT use for Arabic NLP, general HuggingFace dataset discovery, or Hebrew OCR dataset selection (use hebrew-ocr-forms).
Trust score 88/100 (Trusted) · 7+ installs · 3 GitHub contributors · MIT license
The Israeli ML community punches above its weight, but the datasets and models are scattered. ivrit.ai publishes world-class Hebrew speech corpora on one HuggingFace org, Dicta publishes Hebrew LLMs and BERT variants on another, and the Israeli National NLP Program maintains benchmarks under HebArabNlpProject. Licenses range from fully commercial-friendly to research-only. A researcher trying to pick the right combination for fine-tuning a Hebrew sentiment classifier on customer support chat for a commercial product has to hunt across five orgs and read every dataset card.
npx skills-il add skills-il/developer-tools --skill hebrew-ml-datasets-navigator -a claude-code

I want to train a sentiment classifier on Hebrew customer support chat for a commercial SaaS product. Which dataset should I use, which starting model, and what does the license say about attribution?
I am building a Hebrew podcast transcription product. What does ivrit.ai offer, which ASR model should I use in production with low latency, and how do I handle multiple speakers?
I need a Hebrew LLM that runs on consumer hardware (16GB VRAM max) for a Hebrew product. What does Dicta offer, what are the size differences, and what are the upstream licenses?
I am researching Yiddish and looking for datasets and models for speech recognition and text processing. What is available in 2026 and what are the licenses?
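Questions like the ones above boil down to filtering a small catalog of resources by task before checking each card's license by hand. A minimal sketch of that lookup, using only dataset names mentioned on this page (the task labels are my own shorthand, and license status is deliberately left out because it must be verified on each dataset card):

```python
# Illustrative catalog of Hebrew resources named on this page.
# Task tags are informal shorthand, not official metadata; licenses
# vary per resource and must be checked on each HuggingFace card.
CATALOG = [
    {"name": "HebrewSentiment", "task": "sentiment"},
    {"name": "HebNLI", "task": "nli"},
    {"name": "HeQ", "task": "reading-comprehension"},
    {"name": "ivrit.ai audio corpora", "task": "asr"},
    {"name": "Knesset Plenums", "task": "asr"},
]

def candidates(task: str) -> list[str]:
    """Return dataset names matching a task tag.

    This only narrows the search; license compatibility for a
    commercial product still requires reading each dataset card.
    """
    return [d["name"] for d in CATALOG if d["task"] == task]

print(candidates("asr"))  # → ['ivrit.ai audio corpora', 'Knesset Plenums']
```

In practice the skill answers these questions with richer metadata (register coverage, model pairings); the sketch just shows the shape of the task-first, license-second workflow.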