Tech Companies Under Fire for Using Swiped YouTube Videos to Train AI Models

The use of generative artificial intelligence (AI) has been rising, and tech companies are constantly seeking training data to improve their models. However, a recent investigation by Proof News has revealed that some companies, including Apple, Nvidia, and Anthropic, have trained AI models on material taken from YouTube videos without the creators' permission.

The investigation found that these companies used a dataset called YouTube Subtitles, which contains transcripts of more than 173,000 YouTube videos from a wide range of channels, from educational content and news outlets to popular creators such as MrBeast and Marques Brownlee. YouTube's rules prohibit harvesting content from the platform without permission, yet the companies still used the data to train their AI models.

Marques Brownlee, a popular tech YouTuber, addressed the issue on social media, noting that Apple had sourced AI training data from several companies, one of which had scraped data and transcripts from YouTube videos, including his own. Although Apple may not have done the scraping itself, the revelation raises concerns about the ethics of training AI on data collected without authorization.

Proof News also built a tool that lets creators search the dataset to see whether their videos were included without permission. The dataset does not include imagery from the videos, but it does contain their subtitles, often with translations into multiple languages.

YouTube Subtitles is part of a larger dataset called the Pile, created by EleutherAI, a non-profit AI research lab focused on promoting open science norms. The Pile also includes material from sources such as the European Parliament and English Wikipedia, and it was released under a permissive license for academic and research purposes.

This investigation highlights the ongoing challenges surrounding data privacy and ethics in the AI industry. Companies must be held accountable for their data practices and ensure that they are obtaining data ethically and with proper permissions. As the use of AI continues to grow, it is crucial for tech companies to prioritize transparency and ethical data usage to build trust with users and creators.

