The Turkish Data Depository (TDD) is a platform for sharing data resources and software for Turkish Natural Language Processing and Computational Linguistics, such as datasets, language models, corpora, and natural language processing tools. TDD is a non-profit platform driven by the research community. TDD's current efforts focus on three main projects:
Data Depository is a collection of Turkish datasets presented with detailed metadata, including annotation, size, data type, source, license type, and other information. The TDD team checks each dataset before it is uploaded to the platform to ensure the dataset’s usability and quality. Moreover, if a citable DOI is not available for a dataset, we provide a DOI for it, associated with a well-documented dataset card.
Mukayese is a benchmarking platform for various Turkish NLP tools and tasks, from spell checking to Natural Language Understanding (NLU) tasks. Each benchmark comes with a leaderboard, one or more datasets, and two or more baseline models.
Corpus interface is a user-friendly corpus exploration tool intended for linguists and non-technical researchers. It lets users explore several corpora composed of data from various domains and annotated with different attributes.
Public and Private Sector Institutions