OpenAI has informed the world that it is impossible to train large language models (LLMs) like ChatGPT without copyrighted material. Thus, the taking of copyrighted material without recompense to its creators should be allowed, because the most important thing is that the owners of OpenAI be allowed to get rich. They try to dress the argument up, but that is the gist of it. They claim that fair use must cover training material because otherwise their businesses would never be profitable and thus could not exist. This is a nonsense argument on several levels and, frankly, as an engineer, it offends my sensibilities.
First, training data should not be considered fair use. Fair use is, like many things in America, basically whatever the most expensive lawyer can convince the most pliable judge benefits the richest person. In other, less inflammatory words, fair use is highly contested because of its amorphous definition. However, fair use generally takes into account whether a use is commercial in nature, the damage to the copyright holder's own use of the material, and whether the use is transformative in some socially meaningful way, such as research, criticism, or parody. Training data fails all of those tests.
It is clearly one hundred percent commercial in nature, at least as OpenAI uses it. It clearly adversely impacts the owners of the copyright — why pay for writing or art by a specific person when you can get a knock-off (in theory — these systems still aren't all that great at producing quality work) from an imitative AI system? It is clearly not transformative — the ease with which these systems regurgitate entire copyrighted works shows that they are simply repeating material, not meaningfully transforming it. These systems aren't learning in any meaningful sense — they are just predicting which piece of copyrighted material should come next, based on what came next in their training data.
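The "predict what comes next" objective described above can be illustrated with a deliberately simplified sketch. This toy bigram model is not how an LLM is actually built — real systems use neural networks over enormous corpora — but the training objective is the same in spirit: given some text, guess the continuation that most often followed it in the training data.

```python
from collections import Counter, defaultdict

# Toy next-token predictor: for each word, count which words followed it
# in the training text, then predict the most frequent successor.
# (Illustrative only; the training_text here is made up.)
training_text = "the cat sat on the mat and the cat slept"

tokens = training_text.split()
follows = defaultdict(Counter)
for current, nxt in zip(tokens, tokens[1:]):
    follows[current][nxt] += 1

def predict_next(word):
    """Return the most frequent successor of `word` in the training data."""
    counts = follows.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # "cat" followed "the" twice, "mat" once -> "cat"
```

The point of the sketch: the model's output is entirely determined by what appeared in its training data. If the training data is copyrighted text, the "predictions" are statistical echoes of that text.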
But just as important, to me, is the sheer incompetent arrogance of the supposedly engineering-driven companies behind these systems. I am asked all the time to create systems and solutions that must fit within very specific parameters of cost and effort and still produce very specific results. When my initial work cannot meet those requirements, I don't argue that we have to triple my budget or live without features, at least not until we have exhausted all other options. And sometimes, if we just cannot do what's asked in the allocated budget and time, we don't do it. But we always try to adhere to the constraints, because they generally did not come out of nowhere. OpenAI's refusal to be better engineers is really irritating to me.
If your system really requires free use of other people's work to be viable, then the problem is not that other people don't want to give you their work for free. We live in a capitalist society, and unless I missed it somewhere, no one at OpenAI has advocated for so much as a jobs guarantee or universal basic income, much less technological space communism. In that world, taking the things people use to make a living is, and ought to be, frowned upon. If you cannot make a product without that kind of theft, the problem is not the people keeping you from stealing. The problem is that your system isn't viable. Any self-respecting engineer would recognize this basic fact and act on it. If they want a commercially viable product, they should build a system that doesn't require all of humanity's copyrighted works in order to function. If they can't, that failure is on them to correct, not on the rest of us.
I think this is one of the things that irritates me so much about imitative AI. It is not ready for prime time, in the sense that it clearly cannot be a commercial product without doing massive harm to the artists and writers who allow it to exist in the first place. In any sane economy, it would have been shitcanned as a commercial product. Instead of wasting their time on imitative AI systems, the clever buggers behind the algorithms could be spending their time on more useful tasks.
Self-driving is likely a fool's errand, except under the most constrained circumstances, but assisted driving saves lives. They should be working on those problems. Or they could work in medical research, where machine learning has proven extremely helpful in saving lives. Or they could be working on electrical storage and transmission, the largest non-political barriers to getting off fossil fuels. They should not be blaming artists and writers for their own inability to create a system that can function without massive theft.
There is so much useful work that machine learning and what we call AI could do to help, rather than hurt, people. It offends my engineering soul that we waste time arguing that we should protect shitty, non-viable systems at the cost of creative human beings instead of working to solve real problems. OpenAI should stop whining and learn how to not suck at their jobs. Failing that, the rest of us in no way owe them a living on our backs.
I asked GPT-4 for the name of a song in a movie and gave it the time at which the song plays. ChatGPT is too stupid to watch a movie; it gave me five or six different wrong answers. Then I Shazamed it. If it can't handle things like that, then it is trash; then it has a programmed learning disability. It seems it is meant to be stupid because of monetary interests, like the ones copyright protects. So indeed, it seems to be the greediness of copyright law that makes GPT AI dumb.