The answer to "What does this all mean?" is rather more complex than one might think. I'm a writer (predominantly fiction) but also a software developer with some interest in the field. While I'm far from an expert, maybe I can offer some useful thoughts.
Although AI has taken several forms over many decades, most current AI systems are neural networks, which are loosely modeled on the brain. They are far from duplicates of it, but they do simulate certain aspects of how neurons are connected and how they fire, and they have an eerie ability to mimic the brain's output, up to a point. But they have plenty of limitations. Bear in mind that the human brain is the most complex structure in the known universe; even the largest neural networks don't come close to it yet.
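To make that a bit more concrete, here's a toy sketch (my own illustration in Python, not code from any real system) of a single artificial "neuron": it weighs its inputs, adds them up, and "fires" more or less strongly depending on the total.

```python
import math

def neuron(inputs, weights, bias):
    # Weighted sum of the inputs, plus a bias term
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    # A sigmoid squashes the result into the range 0..1, a crude
    # stand-in for how strongly the "neuron" fires
    return 1 / (1 + math.exp(-total))

print(neuron([0.5, 0.8], [0.9, -0.4], 0.1))  # roughly 0.56
```

A real network chains millions or billions of these together in layers, but the principle is the same.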
These AI systems are "trained" to produce output by running sample data through them, matching their output to the expected output, and then adjusting the "weights" used in the calculations that determine when the nodes ("neurons") fire. That's a complicated process, but basically the idea is to get the system to come closer and closer to output we would expect a human to generate. The accuracy of the system depends heavily on the training data. This is why such systems can end up biased if trainers are not careful.
Another issue is that there is no such thing as "general intelligence" in these systems. They are trained for specific purposes. ChatGPT, for example, is trained to produce fluent, plausible text. It can pull data from publicly available sources to incorporate into that text, but it has little or no capacity for evaluating the veracity of those sources. That's why such systems can "hallucinate" (and why it's a terrible idea to count on them blindly for research).
In the case of the systems you tested for your article, they were trained to distinguish AI writing from human writing. Their accuracy varies because they were configured differently, trained on different data, and overseen by different trainers. What they actually do is assign a probability based on that training, which is why I'm surprised even one of them reached a "100% human" conclusion. I would have expected them all to produce a number less than 100%; there should, in principle, always be at least a little room for uncertainty.
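If it helps, the verdict a detector reports is essentially just a learned probability run through a threshold. A toy illustration (the scores below are invented, not taken from any actual detector):

```python
def verdict(ai_probability, threshold=0.5):
    # The "decision" is nothing more than a cutoff applied to a probability
    label = "AI-generated" if ai_probability >= threshold else "human"
    return f"{label} ({ai_probability:.0%} estimated AI probability)"

# Three hypothetical detectors scoring the same passage differently,
# because they were built and trained differently
for score in (0.12, 0.48, 0.63):
    print(verdict(score))
```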
Be that as it may, I find your reactions (assuming they were serious and not played up for humor) to be one of the key takeaways. People generally don't understand what these systems are and what they can--and can't--do, and yet AI has become the flavor of the month. Everyone is rushing to put it into use without knowing the implications and likely outcomes. I read somewhere about an Intel study on AI use that found it was lowering productivity and increasing costs. Intel's conclusion was that people just need better training in its use, but the author of the piece argued that the real problem was a lack of thought about how to deploy AI effectively. It undoubtedly has some good uses, but it probably isn't good for everything.
If publishers are now routinely using AI detection systems without understanding what they can and cannot do, that's a problem. But then, that sort of thing always seems to happen with new tech. We love shiny new stuff, are hypnotized by it, and rush to acquire it, often checking our brains at the door in the process. I suppose eventually we'll get it sorted out, but for now it is, in my view, rather a mess.