An AI Ethical Dilemma: What’s the Best Data Diet for LLMs?
There’s a burning controversy at the heart of large language model (LLM) development: the training data. While AI giants claim fair use after scraping the surface web and eating up copious amounts of public data, they likely haven’t bothered to check where it came from or who it belongs to. Researchers from MIT, Cornell University…