Velvet-14B
A balanced foundational LLM supporting six languages, suited to a wide range of applications, with a strong focus on reasoning and text comprehension.
The essence of Velvet-14B
Velvet-14B is an instruct model fine-tuned from Velvet-14B-base using a combination of open-source instruction datasets with permissive licenses and internally collected synthetic datasets tailored to long-context problems.
Languages
Italian, English, Spanish, Brazilian Portuguese, German, French.
Efforts were specifically made to balance all the languages, with particular emphasis on Italian, which represents approximately 23% of the training data.
Architecture
Auto-regressive language model with a causal, decoder-only transformer design comprising 50 transformer layers.
Training Dataset
Data curation began with over 10 trillion raw tokens, of which more than 4 trillion were retained for training.
Context Window
128K Tokens.
This enables the model to process extensive documents, even those exceeding 400 pages.
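As a rough sanity check on that figure, a commonly assumed density of about 300 tokens per printed page of prose (an assumption, not a property of the model) puts a 128K-token window comfortably above 400 pages:

```python
# Rough page-capacity estimate for a 128K-token context window.
# TOKENS_PER_PAGE is an assumed average for a printed page of prose,
# not anything specified by the model itself.
TOKENS_PER_PAGE = 300

def pages_in_context(context_tokens: int, tokens_per_page: int = TOKENS_PER_PAGE) -> int:
    """Return how many average-length pages fit in the given context window."""
    return context_tokens // tokens_per_page

print(pages_in_context(128_000))  # → 426
```

Actual capacity depends on the tokenizer and the density of the text, so the real page count will vary around this estimate.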
Parameters
14 Billion parameters.
Vocabulary
127K Tokens.
Safety
50K Instructions.
Specialization
2B Examples.
Coding
Over 400 billion tokens from more than 100 programming languages, supporting more structured inference.
Data Freshness
The pretraining data has a cutoff between August and October 2024.
License
Open weight with Apache 2.0 license.
Training Infrastructure
Built from scratch on a dense architecture, it was trained on Italy’s Leonardo supercomputer, hosted by CINECA.
Capabilities
Natural Language Inference
Information Extraction
Multistep Reasoning
Common Sense Reasoning
Function Calling
Machine Translation
Textual Entailment
Text Classification
Question Answering
Multiturn Conversation
Text Completion
Summarization
Paraphrasing
RAG
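Function calling generally works by having the model emit a structured (typically JSON) call that the host application parses and executes before returning the result to the conversation. The tool schema, function name, and response format below are purely illustrative assumptions for the sketch, not Velvet's actual interface:

```python
import json

# Hypothetical tool schema advertised to the model; names and fields
# are illustrative only, and Velvet's real function-calling format may differ.
WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Return current weather for a city.",
    "parameters": {"city": {"type": "string"}},
}

def get_weather(city: str) -> str:
    # Stub standing in for a real weather API call.
    return f"Sunny in {city}"

def dispatch(model_output: str, tools: dict) -> str:
    """Parse a JSON function call emitted by the model and invoke the matching tool."""
    call = json.loads(model_output)
    fn = tools[call["name"]]
    return fn(**call["arguments"])

# A hardcoded model reply requesting a tool invocation, for demonstration.
reply = '{"name": "get_weather", "arguments": {"city": "Rome"}}'
print(dispatch(reply, {"get_weather": get_weather}))  # → Sunny in Rome
```

In practice the tool result would be fed back to the model for a final natural-language answer; the BFCL score reported below measures how reliably a model produces such well-formed calls.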
Performance Evaluation
An Independent Evaluation Board compared Velvet with other models under 30B parameters built from scratch, using several metrics to assess the model’s logical reasoning, problem-solving capabilities, and ability to go beyond statistical correlations.
Italian language
| Category | Benchmark | Velvet-14B |
|---|---|---|
| General | MMLU (5-shot) | 58.6 |
| Commonsense | Hellaswag (0-shot) | 72.7 |
| | WinoGrande ITA-bench (0-shot) | 73.2 |
| | PIQA ITA-bench (0-shot) | 71.7 |
| | SciQ ITA-bench (0-shot) with p. | 91.9 |
| Reasoning | ARC-Challenge (0-shot) | 55.2 |
EU languages
| Category | Benchmark | Velvet-14B |
|---|---|---|
| General | MMLU (5-shot) | 56.4 |
| Instruction Following | IFEval (0-shot) - en | 65.4 |
| Commonsense | Hellaswag (10-shot) | 72.8 |
| | WinoGrande (0-shot) - en | 72.5 |
| Reasoning | ARC-Challenge (25-shot) | 57.3 |
| | MUSR (0-shot) - en | 12.3 |
| Function Calling | BFCL (AST summary) - en | 67.5 |
These metrics evaluate the model's scientific reasoning, its capacity to generate plausible, contextually relevant responses grounded in common sense, and its overall understanding across multiple subjects, with a focus on accurate and well-informed answers.
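The shot counts in the tables above (0-shot, 5-shot, 25-shot) indicate how many solved examples are prepended to each prompt before the question being scored. A minimal sketch of assembling an n-shot prompt, assuming a simple Q/A format rather than the exact template of any particular evaluation harness:

```python
def build_nshot_prompt(examples: list[tuple[str, str]], question: str, n: int) -> str:
    """Prepend n solved (question, answer) examples to the target question.

    With n=0 this reduces to the bare question, i.e. a 0-shot prompt.
    """
    shots = [f"Q: {q}\nA: {a}" for q, a in examples[:n]]
    return "\n\n".join(shots + [f"Q: {question}\nA:"])

demos = [("2 + 2?", "4"), ("Capital of Italy?", "Rome")]
print(build_nshot_prompt(demos, "3 + 5?", n=2))
```

Higher shot counts give the model more in-context demonstrations, which is why the same benchmark can yield different scores at different shot settings.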