This question was posed in an article today in the WSJ: AI-Generated Art for a Comic Book. Human Artists Are Having a Fit. Reading the article, there seem to be two angles. The first is this: to be copyrighted, the material created by the AI model needs to “show a modicum of creativity”. And the powers that be will have to determine whether AI can create something new or novel. On the face of it, the art in question looks new or novel.
Creativity is as Creativity Does
But early in the article another angle is noted, one that bears on the main question. This more interesting angle is not explored further in the article and concerns the data used to train the model. The data set used to train the AI engine may have included copyrighted material, and that material may have been used without the copyright holders’ permission. In other words, the training of the machine learning engine used novel data. Just recently we have read a lot about how much data ML models can synthesize.
This is where it gets delectable. Could the AI-created art be deemed novel if novel data had not been used to train the model? What if the training data set had included only standard, non-novel data? Does that make a difference? Could AI models create anything deemed creative if they cannot use unique data sets?
Value From the Sausage, or the Machine?
Are there not restrictions in place to limit the use of copyrighted data to train AI-based models? I’d have thought so, but I am not a lawyer. If this is not standard, there could be a market here. If you license data today, such as weather data, there are restrictions on its use. Many organizations share data today. Does this mean any organization that shares data should prohibit use of that data in AI-based software tools without additional payment to the copyright owner?
If I buy a copyrighted manuscript, there are limitations on its use. I cannot reprint it in any form without permission. What about data? Since data has unique properties compared to physical assets, maybe we should all adopt standard clauses limiting its use in derivative software tools such as AI. What about AI-generated code, produced by a model trained on copyrighted code?
It will be interesting to see what happens in this case. It will be even more interesting to see how that decision impacts standard data sharing practices. If the AI-generated comic is:
- Successfully defended as copyrightable, the fees and controls for use of copyrighted data fed into AI-based models can be expected to rise significantly.
- Determined not to be copyrightable, owners of data used to train such creations will resist inclusion of their data in that training, since their IP will be less monetizable. As a result, additional controls and regulation will come to the fore.