---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- aa
- ab
- ace
- acu
- ada
- ady
- af
- agr
- aii
- ajg
- als
- alt
- am
- amc
- ame
- ami
- amr
- ar
- arl
- arn
- ast
- auc
- ay
- az
- az-Cyrl
- az-Latn
- ban
- bax
- bba
- bci
- be
- bem
- bfa
- bg
- bho
- bi
- bik
- bin
- blt
- bm
- bn
- bo
- boa
- br
- bs
- bs-Cyrl
- bs-Latn
- buc
- bug
- bum
- ca
- cab
- cak
- cbi
- cbr
- cbs
- cbt
- cbu
- ccp
- ceb
- cfm
- ch
- chj
- chk
- chr
- cic
- cjk
- cjs
- cjy
- ckb
- ckb-Latn
- cnh
- cni
- cnr
- co
- cof
- cot
- cpu
- crh
- cri
- crs
- cs
- csa
- csw
- ctd
- cy
- da
- dag
- ddn
- de
- de-1901
- de-1996
- dga
- dip
- duu
- dv
- dyo
- dyu
- dz
- ee
- el
- el-monoton
- el-polyton
- en
- eo
- es
- ese
- et
- eu
- eve
- evn
- fa
- fa-AF
- fat
- fi
- fj
- fkv
- fo
- fon
- fr
- fuf
- fuf-Adlm
- fur
- fuv
- fvr
- fy
- ga
- gaa
- gag
- gan
- gd
- gjn
- gkp
- gl
- gld
- gn
- gsw
- gu
- guc
- guu
- gv
- gyr
- ha
- hak
- ha-NE
- ha-NG
- haw
- he
- hi
- hil
- hlt
- hmn
- hms
- hna
- hni
- hnj
- hns
- hr
- hsb
- hsn
- ht
- hu
- hus
- huu
- hy
- ia
- ibb
- id
- idu
- ig
- ii
- ijs
- ilo
- io
- is
- it
- iu
- ja
- jiv
- jv
- jv-Java
- ka
- kaa
- kbd
- kbp
- kde
- kdh
- kea
- kek
- kg
- kg-AO
- kha
- kjh
- kk
- kkh
- kkh-Lana
- kl
- km
- kmb
- kn
- ko
- koi
- koo
- kqn
- kqs
- kr
- kri
- krl
- ktu
- ku
- kwi
- ky
- la
- lad
- lah
- lb
- lg
- lia
- lij
- lld
- ln
- lns
- lo
- lob
- lot
- loz
- lt
- lua
- lue
- lun
- lus
- lv
- mad
- mag
- mai
- mam
- man
- maz
- mcd
- mcf
- men
- mfq
- mg
- mh
- mi
- mic
- min
- miq
- mk
- ml
- mn
- mn-Cyrl
- mnw
- mor
- mos
- mr
- mt
- mto
- mxi
- mxv
- my
- mzi
- nan
- nb
- nba
- nds
- ne
- ng
- nhn
- nio
- niu
- niv
- njo
- nku
- nl
- nn
- not
- nr
- nso
- nv
- ny
- nym
- nyn
- nzi
- oaa
- oc
- ojb
- oki
- om
- orh
- os
- ote
- pa
- pam
- pap
- pau
- pbb
- pcd
- pcm
- pis
- piu
- pl
- pon
- pov
- ppl
- prq
- ps
- pt
- pt-BR
- pt-PT
- qu
- quc
- qug
- quh
- quy
- qva
- qvc
- qvh
- qvm
- qvn
- qwh
- qxn
- qxu
- rar
- rgn
- rm
- rmn
- rm-puter
- rm-rumgr
- rm-surmiran
- rm-sursilv
- rm-sutsilv
- rm-vallader
- rn
- ro
- ru
- rup
- rw
- sa
- sa-Gran
- sah
- sc
- sco
- se
- sey
- sg
- shk
- shn
- shp
- si
- sk
- skr
- sl
- slr
- sm
- sn
- snk
- snn
- so
- sr
- sr-Cyrl
- sr-Latn
- srr
- ss
- st
- su
- suk
- sus
- sv
- sw
- swb
- ta
- taj
- ta-LK
- tbz
- tca
- tdt
- te
- tem
- tet
- tg
- th
- ti
- tiv
- tk
- tk-Cyrl
- tk-Latn
- tl
- tly
- tn
- to
- tob
- toi
- toj
- top
- tpi
- tr
- ts
- tsz
- tt
- tw-akuapem
- tw-asante
- ty
- tyv
- tzh
- tzm
- tzo
- udu
- ug
- ug-Arab
- ug-Latn
- uk
- umb
- und
- ur
- ura
- uz
- uz-Cyrl
- uz-Latn
- vai
- ve
- vec
- vep
- vi
- vi-Hani
- vmw
- wa
- war
- wo
- wuu
- wwa
- xh
- xsm
- yad
- yao
- yap
- yi
- ykg
- yo
- yrk
- yua
- yue
- za
- zam
- zdj
- zgh
- zh
- zh-Hant
- zlm
- zlm-Arab
- zlm-Latn
- zro
- ztu
- zu
license:
- unknown
multilinguality:
- multilingual
size_categories:
- n<1K
source_datasets:
- original
task_categories:
- translation
task_ids: []
paperswithcode_id: null
pretty_name: The Universal Declaration of Human Rights (UDHR)
---

# Dataset Card for The Universal Declaration of Human Rights (UDHR)

## Table of Contents
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Homepage:** https://www.ohchr.org/en/universal-declaration-of-human-rights, https://unicode.org/udhr/index.html
- **Repository:** https://github.com/unicode-org/udhr
- **Paper:**
- **Leaderboard:**
- **Point of Contact:**

### Dataset Summary

The Universal Declaration of Human Rights (UDHR) is a milestone document in the history of human rights. Drafted by
representatives with different legal and cultural backgrounds from all regions of the world, it set out, for the
first time, fundamental human rights to be universally protected. The Declaration was adopted by the UN General
Assembly in Paris on 10 December 1948 during its 183rd plenary meeting.

© 1996 – 2009 The Office of the High Commissioner for Human Rights

This plain text version was prepared by the "UDHR in Unicode" project, https://www.unicode.org/udhr.

### Supported Tasks and Leaderboards

[More Information Needed]

### Languages

The dataset includes translations of the document in over 400 languages and dialects. The list of languages can be found
[here](https://unicode.org/udhr/translations.html).

## Dataset Structure

### Data Instances

Each instance corresponds to a different language and includes information about the language and the full document
text.

### Data Fields

- `text`: The full document text with each line of text delimited by a newline (`\n`).
- `lang_key`: The unique identifier of a given translation.
- `lang_name`: The textual description of the language or dialect.
- `iso639-3`: The [iso639-3](https://iso639-3.sil.org/) language identifier.
- `iso15924`: The [iso15924](https://unicode.org/iso15924/iso15924-codes.html) script identifier.
- `bcp47`: The [BCP 47](https://www.rfc-editor.org/info/bcp47) language identifier.
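
As a minimal sketch of how these fields might be used, the snippet below works on a hand-written record shaped like the field list above. The record values and the helper name `split_lines` are illustrative assumptions, not actual rows or utilities from the dataset.

```python
# Hypothetical record mirroring the documented fields; all values are illustrative.
record = {
    "text": "Universal Declaration of Human Rights\nPreamble\n\nArticle 1",
    "lang_key": "eng",
    "lang_name": "English",
    "iso639-3": "eng",
    "iso15924": "Latn",
    "bcp47": "en",
}


def split_lines(text):
    """Split the newline-delimited document into non-empty, stripped lines."""
    return [line.strip() for line in text.split("\n") if line.strip()]


lines = split_lines(record["text"])
print(record["bcp47"], record["iso15924"], len(lines))
```

Dropping empty lines is a design choice made for this sketch; keep them if blank lines carry structure you need.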

### Data Splits

Only a `train` split is included, which contains the full document in all languages.

|                    | train |
|--------------------|------:|
| Number of examples |   488 |

## Dataset Creation

### Curation Rationale

In addition to its social significance, the document set a world record in 1999 for being the most translated
document in the world and, as such, can be useful for settings requiring paired text between many languages.
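
One way to obtain such paired text is to combine every two translations at the document level. The sketch below assumes records shaped like the fields described in this card; the function name `document_pairs` and the miniature records are hypothetical, and the pairing is whole-document only, since this card does not state that translations are aligned line by line.

```python
from itertools import combinations


def document_pairs(records):
    """Yield (key_a, key_b, text_a, text_b) for every unordered pair of translations."""
    # Assumes `lang_key` is unique per record, as the field description states.
    by_key = {r["lang_key"]: r["text"] for r in records}
    for a, b in combinations(sorted(by_key), 2):
        yield a, b, by_key[a], by_key[b]


# Hypothetical miniature records; real rows hold the full document text.
records = [
    {"lang_key": "eng", "text": "All human beings are born free and equal in dignity and rights."},
    {"lang_key": "fra", "text": "Tous les êtres humains naissent libres et égaux en dignité et en droits."},
    {"lang_key": "deu", "text": "Alle Menschen sind frei und gleich an Würde und Rechten geboren."},
]

pairs = list(document_pairs(records))
print(len(pairs))
```

The number of pairs grows quadratically with the number of translations, so the generator form avoids holding every pair in memory when all records are used.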

### Source Data

#### Initial Data Collection and Normalization

[More Information Needed]

#### Who are the source language producers?

[More Information Needed]

### Annotations

#### Annotation process

[More Information Needed]

#### Who are the annotators?

[More Information Needed]

### Personal and Sensitive Information

[More Information Needed]

## Considerations for Using the Data

### Social Impact of Dataset

In addition to the social and political significance of the United Nations' Universal Declaration of Human Rights,
the document set a world record in 1999 for being the most translated document in the world. As such, it can be useful
for settings requiring paired text between many languages, including those that are low-resource and significantly
underrepresented in NLP research.

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

Although the document is translated into a very large number of languages, the text is very short and therefore may
have limited usefulness for most types of modeling and evaluation.

## Additional Information

### Dataset Curators

The txt/xml data files used here were compiled by the Unicode Consortium and are available from the
[UDHR in Unicode project page](https://unicode.org/udhr/index.html). The original texts can be found on the
[United Nations website](https://www.ohchr.org/EN/UDHR/Pages/UDHRIndex.aspx).

### Licensing Information

Source text © 1996 – 2022 The Office of the High Commissioner for Human Rights

The [Unicode license](https://www.unicode.org/license.txt) applies to these translations.


### Citation Information

United Nations. (1998). The Universal Declaration of Human Rights, 1948-1998. New York: United Nations Dept. of Public Information.

### Contributions

Thanks to [@joeddav](https://github.com/joeddav) for adding this dataset. Updated in May 2022 by [@leondz](https://github.com/leondz).
