In recent years, copyright holders have overloaded Google with DMCA takedown notices targeting links to pirated content.
These requests have increased dramatically over the years. In 2008, the search engine received only a few dozen takedown notices during the entire year, but today it processes two million per day on average.
Copyright holders have used this increase to call for tougher anti-piracy actions from search engines and other intermediaries, claiming that the current system is broken. For its part, Google is concerned that the continued increase may lead to more mistakes.
This week, researchers from Columbia University’s American Assembly and Berkeley published an in-depth review of the current takedown regime, with one study zooming in on the millions of takedown requests Google receives every week.
Using data Google provides to the Lumen database, the researchers reviewed the accuracy of more than 108 million takedown requests. The vast majority of these, 99.8%, targeted Google’s web search.
According to the researchers, their review shows that more than 28% of all requests are “questionable.” This includes the 4.2% of notices in which the supposedly infringing material is not listed on the reported URL.
“Nearly a third of takedown requests (28.4%) had characteristics that raised clear questions about their validity, based solely on the facial review and comparisons we were able to conduct. Some had multiple potential issues,” the researchers write.
Among the “questionable” takedown requests are those that target websites that were shut down more than a year earlier. As shown in the figure below, rightsholders such as NBC Universal continued to target websites such as Megaupload.com and BTJunkie.org long after they were gone.
“A few senders—generally targeting unauthorized file-sharing sites—continued to send requests targeting links that led to long-defunct sites, calling into question the checks they do to keep their automated algorithms accurate,” the researchers write.
[Figure: Reporting dead sites]
Other questionable notices were improperly formatted, included a subject matter inappropriate for DMCA takedown, or had potential fair use issues, among other things.
Joe Karaganis, co-author of the report and vice president of Columbia University’s American Assembly, tells TorrentFreak that the often automated notices are problematic because the increase in volume makes human review impractical.
“The problem with automation isn’t that it gets stuff wrong. Human senders turn out to be even worse on average. It’s that automation scales the process up in ways that has made meaningful human review difficult or impossible,” Karaganis says.
“With notice sending robots talking to notice receiving robots, the step of actually looking at the targeted content often drops out of the equation. The main contribution of our study is to go back in to look at the targeted content and make those human judgments,” he adds.
The result of the high number of “questionable” takedown notices is that Google likely removes more content than it should. The company currently acts in response to 97.5% of the takedown requests, which means that the vast majority of the questionable notices are honored.
“At a minimum, Google takes a very conservative approach to these issues and yes, probably over removes content,” Karaganis says.
“They are not special in this regard. Given the risk of high statutory penalties if a service rejects a valid notice, most if not all of them err on the side of takedown. Some just categorically take down 100% of the requests they receive.”
The researchers include several policy recommendations on how the current takedown process can be improved. Among other things, they suggest making it more difficult for senders to issue questionable notices without risk.
In addition, they warn against the “notice and stay down” and automated filtering mechanisms copyright holders frequently call for, as these may increase the potential for abuse while hurting due process.
The report, first highlighted by the Washington Post, is very much in line with the position Google has taken thus far.
In that regard, it is worth highlighting that the research is funded in part by Google, which will undoubtedly deploy it in future lobbying efforts, much like the copyright industries do with the research they fund.
Google won’t have to wait long before it can put the study to use, as the U.S. Government is currently running a public consultation to evaluate the effectiveness of the DMCA’s Safe Harbor provisions. The consultation covers issues around automated takedown requests and potential abuse, and the deadline for comments is tomorrow.