Data cleaning is a hugely important part of data science, but it can be hard to find "good" messy datasets to practice your cleaning skills. This site includes datasets that need cleaning, organizing, or reformatting to be most useful, along with a brief overview of what needs to be fixed in each dataset.
YouTube-8M is a large-scale labeled video dataset that consists of millions of YouTube video IDs, with high-quality machine-generated annotations from a diverse vocabulary of 3,800+ visual entities. The YouTube-8M Segments dataset is an extension of the YouTube-8M dataset with human-verified segment annotations.