Like many of us, I’ve looked at the class action against OpenAI brought by a number of plaintiffs represented by US-based Clarkson Law Firm, filed on 28 June 2023. At its heart, the claim takes aim at certain web scraping practices that it alleges form part of OpenAI’s product. This suit (together with other lawsuits being filed against OpenAI) raises fascinating questions that need to be resolved as soon as possible to provide certainty for all participants in the AI race, their consumers, and (unwitting) data subjects.
I am following the suit as someone keen to see as many people as possible engaged in satisfying and meaningful professional work, and I wonder how these new technologies and regulatory regimes will shape the future of work for us all.
Numerous plaintiffs claim their skills, expertise, and artistry have been taken and “incorporated into products that could someday result in [their] professional obsolescence”. This raises the concern that expertise individuals have cultivated over decades of grind and professional development is being misappropriated, creating a world where the ultimate payoff goes to fresh-faced organizations offering free lunch to staff in black turtlenecks, organizations that aggregate such data but bear no responsibility for its development. Over time, the professions that accumulated this knowledge may become obsolete.
Yet those who fail to adopt this technology will almost certainly find their skillsets, and corporations their performance, becoming less productive and ultimately obsolete. Herein lies our catch-22. By refusing to participate, we forgo the learnings we can only access when our own insights are aggregated with those of millions of others. The real value to be unlocked by these technologies lies in aggregation.
For most of us, the question as to whether the generative AI payoff is worth the sacrifice of our intellectual property and privacy is answered in the affirmative (even if begrudgingly).
Take digital artist Lapine, who realized in September last year that her private medical photographs had been scraped into Common Crawl (a massive collection of web pages derived from large-scale scraping), a tool which the class action alleges is also utilized by OpenAI. Despite noting on her Twitter account at the time that her private image had ended up in the LAION dataset (which is derived from Common Crawl and available for machine learning tools to use), Lapine continued to use and promote tools such as GPT and DALL-E to her Twitter followers.
Similarly, the Italian privacy regulator expressed various privacy concerns about OpenAI earlier this year, including that there was no legal basis justifying its massive collection and storage of the personal data used to train its software, and that the system could generate false information. While OpenAI ultimately made some changes to its sign-up process following discussions with the regulator, it’s difficult to see how it allayed the regulator’s concerns regarding mass data collection and false information. In fact, in its response OpenAI said that while it had introduced mechanisms enabling data subjects to erase inaccurate information, rectifying inaccuracies is technically impossible. And yet Italy has since agreed to allow ChatGPT to continue offering services to its citizens and residents. If it didn’t, it would surely lose the AI arms race, resulting in a blow to its economic prosperity.
Sam Altman, OpenAI’s CEO, has predicted that the future deployment of generative AI products may eliminate so many jobs that such companies would need to fund a universal basic income (UBI) for their entire country to ensure the fair distribution of profit and the maintenance of living standards. But what this class action highlights is that, to date, wealth has flowed in the opposite direction: OpenAI has utilized individuals’ data without any compensation to create groundbreaking products and has retained the rewards. While the concept of UBI is for another article, the irony is not lost on me as I read about this action, and particularly the unjust enrichment claim.
This court action will come as no surprise to OpenAI. Legal concerns about the underlying dataset hidden in the ‘black box’ of generative AI have been highly publicized since ChatGPT spread like wildfire late last year. What’s more, there is a very specific precedent on privacy law’s response to this type of conduct. In recent years, Clearview AI scraped billions of publicly available photos from websites and social platforms without consent in order to train its AI program, and did so without registering as a data broker under California or Vermont law. Clearview faced various lawsuits and regulatory fines (although whether the fines were paid, and jurisdiction accepted, is another matter) and has since registered as a data broker. OpenAI has not registered as a data broker to date.
In summary, it is inevitable that groundbreaking technology will not fit neatly into existing categories of law. Very often, legal and social change is precipitated by bold technologies willing to operate in a gray zone, forcing in their wake a regulatory overhaul that may never have happened without them. This approach is a feature, not a bug.
That in itself is not my concern; my preference is simply that regulatory certainty be attained sooner rather than later. My focus is on understanding what this means for the professional work of us humans: the future of work and the evolution of the professions and expertise that have informed these technologies. Instead of resisting progress, we should track the evolution of work and embrace it by evolving our own ambitions and skillsets accordingly.