Excited to share that we have released RuBLiMP (Russian Benchmark of Linguistic Minimal Pairs), a novel benchmark for evaluating Russian language models (LMs).

RuBLiMP consists of 45,000 minimal pairs and includes 12 grammatical phenomena well-represented in Russian linguistics, covering morphology, syntax, and semantics. A minimal pair consists of a grammatical and an ungrammatical sentence (e.g., The cat is on the mat / *The cat are on the mat), and an LM is expected to prefer the grammatical one based on the scoring function.
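The evaluation protocol described above can be sketched in a few lines. This is a minimal illustration, not the benchmark's own code: the add-one smoothed bigram scorer, the toy corpus, and the names `train_bigram_scorer` and `minimal_pair_accuracy` are all assumptions made for the example. In practice the scoring function would come from the LM under evaluation (for instance, the sum of its token log-probabilities).

```python
import math
from collections import Counter


def train_bigram_scorer(corpus):
    """Return a toy scoring function: add-one smoothed bigram log-probability.

    This stands in for a real LM scoring function and is purely illustrative.
    """
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    vocab_size = len(unigrams)

    def score(sentence):
        tokens = ["<s>"] + sentence.lower().split()
        return sum(
            math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
            for a, b in zip(tokens, tokens[1:])
        )

    return score


def minimal_pair_accuracy(pairs, score):
    """Fraction of pairs where the grammatical sentence gets the higher score."""
    correct = sum(score(good) > score(bad) for good, bad in pairs)
    return correct / len(pairs)


# Hypothetical toy data, using the English pair from the post.
toy_corpus = [
    "the cat is on the mat",
    "the cats are on the mat",
    "the dog is in the yard",
]
score = train_bigram_scorer(toy_corpus)
pairs = [("The cat is on the mat", "The cat are on the mat")]
accuracy = minimal_pair_accuracy(pairs, score)
print(f"accuracy = {accuracy:.2f}")
```

Accuracy over all pairs of a phenomenon then measures how sensitive the model is to that phenomenon; swapping in a different `score` function changes the model being evaluated without touching the rest.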

Our approach makes it possible to:
🔸generate minimal pairs at scale from any text domain
🔸estimate whether a grammatical sentence appears in the LM's pretraining corpus
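As an illustration of the second point, one published technique for pretraining data detection is Min-K% Prob, which averages the lowest-scoring fraction of a sentence's token log-probabilities. Whether this matches the exact procedure used for RuBLiMP should be checked against the pre-print; the sketch below is an assumption-laden example, and the function name `min_k_percent_score`, the value of `k`, and the log-probability values are all hypothetical.

```python
def min_k_percent_score(token_logprobs, k=0.4):
    """Average of the k fraction of lowest token log-probabilities.

    A higher (less negative) score suggests the sentence is more likely
    to have been seen during pretraining. Any decision threshold must be
    calibrated separately; none is assumed here.
    """
    if not token_logprobs:
        raise ValueError("token_logprobs must not be empty")
    n = max(1, int(len(token_logprobs) * k))
    lowest = sorted(token_logprobs)[:n]
    return sum(lowest) / n


# Hypothetical per-token log-probabilities from some LM (made-up values).
likely_seen = [-0.1, -0.3, -0.2, -0.4, -0.1]
likely_unseen = [-0.1, -5.2, -0.2, -6.0, -0.1]

seen_score = min_k_percent_score(likely_seen)
unseen_score = min_k_percent_score(likely_unseen)
print(seen_score > unseen_score)
```

The intuition is that a sentence the model has memorized rarely contains very surprising tokens, so even its least likely tokens keep a relatively high log-probability.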

💡RuBLiMP can be used for evaluating the sensitivity of LMs to grammatical phenomena in Russian and for developing ranking and grammatical error detection methods.

🔸 Read more in our pre-print: https://arxiv.org/abs/2406.19232
🔸 HuggingFace: https://huggingface.co/datasets/RussianNLP/rublimp
🔸 GitHub: https://github.com/RussianNLP/RuBLiMP



tg-me.com/nlp_seminar/130
BY исследовано



