
100 Most Famous AI Data Sources

Computer Vision Datasets

  1. ImageNet: Large-scale image dataset for object recognition.

  2. COCO (Common Objects in Context): Image dataset for object detection, segmentation, and captioning.

  3. MNIST: Handwritten digits dataset.

  4. CIFAR-10 / CIFAR-100: Image classification datasets with 10 and 100 classes, respectively.

  5. CelebA: Large-scale face attributes dataset.

  6. Pascal VOC: Object detection and segmentation dataset.

  7. Fashion-MNIST: Fashion product images for classification tasks.

  8. Stanford Cars: Fine-grained recognition dataset of car images.

  9. LFW (Labeled Faces in the Wild): Face recognition and verification dataset.

  10. Open Images Dataset: Large image dataset with object detection, segmentation, and visual relationship annotations.

  11. SUN (Scene Understanding Database): Scene recognition dataset.

  12. Cityscapes: Dataset for urban scene understanding.

  13. ADE20K: Semantic segmentation dataset.

  14. DeepFashion: Fashion image recognition and retrieval dataset.

  15. Oxford 102 Flowers: Image classification dataset covering 102 flower categories.

  16. iNaturalist: Large-scale species classification dataset.

  17. Caltech 256: Object recognition dataset with 256 object categories.

  18. YouTube-8M: Large-scale labeled video dataset.

  19. Kinetics-700: Large-scale action recognition dataset for videos.

  20. PlantVillage: Dataset for plant disease classification.

Natural Language Processing (NLP) Datasets

  1. Wikipedia Text Dataset: Text corpus extracted from Wikipedia.

  2. GloVe (Global Vectors for Word Representation): Pre-trained word vectors.

  3. WordNet: Lexical database of English.

  4. SQuAD (Stanford Question Answering Dataset): Reading comprehension dataset.

  5. IMDB Reviews Dataset: Sentiment analysis dataset of movie reviews.

  6. 20 Newsgroups: Text dataset of newsgroup documents.

  7. Common Crawl: Large web crawl dataset.

  8. OpenWebText: Open-source reproduction of the WebText dataset.

  9. BookCorpus: Dataset of books for training language models.

  10. Enron Email Dataset: Dataset of emails for NLP tasks.

  11. Reuters-21578: Newswire dataset for text classification.

  12. Penn Treebank: Annotated corpus for part-of-speech tagging and syntactic parsing.

  13. AG News: News classification dataset.

  14. TREC Question Dataset: Dataset for question classification tasks.

  15. Quora Question Pairs: Dataset for identifying duplicate questions.

  16. SNLI (Stanford Natural Language Inference Corpus): Dataset for sentence-level natural language inference.

  17. MultiNLI: Multi-genre natural language inference corpus.

  18. OntoNotes: Dataset for coreference resolution, named entity recognition, and other NLP tasks.

  19. WikiText: Language modeling dataset drawn from Wikipedia articles.

  20. CoNLL-2003: Named entity recognition (NER) dataset.

Speech Recognition Datasets

  1. LibriSpeech: Large-scale dataset for automatic speech recognition.

  2. TIMIT: Acoustic-phonetic speech corpus for phoneme and word recognition.

  3. Mozilla Common Voice: Open-source speech dataset.

  4. VoxCeleb: Dataset for speaker identification and verification.

  5. TED-LIUM: Dataset for automatic speech recognition from TED talks.

Time Series and Tabular Datasets

  1. UCI Machine Learning Repository: Collection of datasets for various machine learning tasks.

  2. Kaggle Datasets: Large collection of public datasets.

  3. MIMIC-III: De-identified electronic health records of critical care patients.

  4. Google Trends: Time series data on the relative popularity of Google search queries.

  5. Yahoo Finance Dataset: Financial market data.

  6. NOAA Global Temperature Data: Historical weather and temperature data.

  7. Rossmann Store Sales: Time series dataset of store sales data.

  8. GEFCom2014: Energy forecasting dataset from the Global Energy Forecasting Competition 2014, including electric load forecasting.

Healthcare Datasets

  1. ChestX-ray14: Chest X-ray dataset labeled with 14 thoracic disease categories, including pneumonia.

  2. COVID-19 Open Research Dataset (CORD-19): Scholarly articles related to COVID-19 research.

  3. LUNA16: Dataset for lung nodule analysis in CT scans.

  4. Diabetic Retinopathy Dataset: Fundus images for detecting diabetic retinopathy.

  5. PhysioNet: Repository of physiological and clinical datasets.

  6. UK Biobank: Biomedical database containing health information.

  7. Breast Cancer Wisconsin Dataset: Data on breast cancer diagnosis.

Robotics and Autonomous Driving Datasets

  1. KITTI Vision Benchmark Suite: Dataset for autonomous driving.

  2. nuScenes: Large-scale dataset for autonomous driving.

  3. Waymo Open Dataset: Dataset for autonomous vehicle research.

  4. Baxter Robot Dataset: Robotics grasping dataset.

  5. D4RL: Datasets for Deep Data-Driven Reinforcement Learning (offline reinforcement learning benchmarks).

Other AI Datasets

  1. Google Open Images Dataset: Image recognition and segmentation.

  2. Google AudioSet: Audio dataset for sound event recognition.

  3. YouTube-8M: Large-scale video dataset.

  4. OpenAI Gym: Toolkit for developing and comparing reinforcement learning algorithms.

  5. Omniglot: One-shot learning dataset for character recognition.

Graph and Networks Datasets

  1. Reddit Graph Dataset: Graph dataset of hyperlinks between subreddits.

  2. Facebook Social Circles: Dataset of Facebook social circles.

  3. PPI (Protein-Protein Interactions): Graph dataset of protein-protein interactions.

  4. OGB (Open Graph Benchmark): Collection of benchmark datasets for graph machine learning.

Other Notable Datasets

  1. Amazon Product Reviews Dataset: Large-scale dataset of Amazon reviews.

  2. MovieLens: Movie ratings dataset for recommendation systems.

  3. Flickr8k: Dataset for image captioning.

  4. TED Talks Dataset: TED Talks transcripts for NLP research.

  5. Olist Brazilian E-Commerce Dataset: Dataset for e-commerce analytics.

Energy and Climate Datasets

  1. Global Power Plant Database: Energy generation dataset.

  2. Weather Dataset: Historical hourly weather data for various cities.

Cybersecurity Datasets

  1. UNSW-NB15 Dataset: Network intrusion dataset.

  2. CICIDS2017 Dataset: Dataset for network intrusion detection.

Education and Learning Datasets

  1. EdNet: Large-scale dataset for online education systems.

  2. PIAAC Dataset: OECD survey data on adult competencies.

Synthetic Datasets

  1. Scikit-Learn Datasets: Toy datasets and synthetic data generators for machine learning.

  2. Blender Synthetic Datasets: Datasets generated with Blender for computer vision.

Astronomy and Physics Datasets

  1. LSST (Large Synoptic Survey Telescope, now the Vera C. Rubin Observatory): Astronomical survey data for deep-sky analysis.

  2. Cosmology Simulation Dataset: Simulated data for cosmology research.

  3. NASA Exoplanet Archive: Catalog data on exoplanets and their host stars.

Geospatial Datasets

  1. OpenStreetMap: Collaborative geospatial dataset.

  2. Global Land Cover Dataset: Global land cover maps and analysis.

Government Datasets

  1. US Census Data: US demographic data.

  2. World Bank Open Data: Global development data.

  3. European Union Open Data Portal: Collection of open data from the European Union.

  4. UNICEF Data: Global dataset on child welfare.

Other Popular Datasets

  1. Million Song Dataset: Audio features and metadata for one million music tracks, used in recommendation research.

  2. YouCook2: Dataset for instructional video analysis.

  3. Instacart Market Basket Analysis: Grocery transaction dataset.

  4. Twitch Users and Communities Dataset: Dataset for social network analysis.

 
 
 
