As the quality of visual recognition software continues to improve, privacy concerns have grown concomitantly. Because we now document our lives with so many pictures posted to social media—Facebook hosts over 250 billion photos, with 350 million new photos added every day—photographs are becoming hugely important to the big data movement. Indeed, some say Facebook stores over 4 percent of all the pictures ever taken in history. What truths may lurk behind all those images—and who wants to know?
Cutting-edge visual recognition software now makes it possible not only to identify a person in a photo on Facebook or elsewhere, but also to determine what that person is doing in the photo.
There’s already image recognition software, used by the fashion industry, that lets a shopper take a picture on his or her smartphone of a piece of clothing and then match that piece by color, pattern, and shape to the offerings of 170 retailers that sell something similar. That’s a benign use of this technology. But more ominous applications are already emerging.
My sense is that this concern is helping to fuel the growth of ephemeral social media services such as Snapchat, where, at least in theory, photos do not sit in perpetuity waiting to be exploited by data miners; they last all of 10 seconds.
After all, imagine all your online photos being processed into a data profile by advertisers or law enforcement, showing where you live, where you’ve been, with whom you hang out and what activities you’ve participated in. If a single picture is worth a thousand words, what are 250 billion photos worth?
“Web scraping” or “web harvesting”—the practice of extracting large amounts of data from publicly available websites using automated “bots” or “spiders”—accounted for 18% of site visitors and 23% of all Internet traffic in 2013. Websites targeted by scrapers may incur damages resulting from, among other things, increased bandwidth usage, network crashes, the need to employ anti-spam and filtering technology, user complaints, reputational damage and costs of mitigation that may be incurred when scrapers spam users, or worse, steal their personal data.
Though sometimes difficult to combat, scraping is quite easy to perform. A simple online search will return a large number of scraping programs, both proprietary and open source, as well as D.I.Y. tutorials.
Of course, scraping can be beneficial in some cases. Companies with limited resources may use scraping to access large amounts of data, spurring innovation and allowing such companies to identify and fill areas of consumer demand. For example, Mint.com reportedly used screen scraping to aggregate information from bank websites, which allowed users to track their spending and finances.
Unfortunately, not all scrapers use their powers for good. In one case on which we previously reported, the operators of the website Jerk.com allegedly scraped personal information from Facebook to create profiles labeling people “Jerk” or “not a Jerk.” According to the Federal Trade Commission (FTC), over 73 million victims, including children, were falsely told they could revise their profiles by paying $30 to the website.
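To give a sense of how little code basic scraping requires, here is a minimal sketch using only Python's standard library. It pulls link targets and link text out of a page's HTML. The sample page, the `/profiles/...` paths, and the names `LinkCollector` and `extract_links` are hypothetical and chosen for illustration; the sketch parses an inline string rather than fetching a live site, and it does not represent the method used by any scraper mentioned above.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (href, text) pairs from <a> tags that carry an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the anchor currently being read, if any
        self._text = []     # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an anchor with an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return a list of (href, link text) tuples found in the HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page content standing in for a publicly available profile list.
SAMPLE_PAGE = """
<html><body>
  <a href="/profiles/alice">Alice Example</a>
  <a href="/profiles/bob">Bob Example</a>
  <p>No link here</p>
</body></html>
"""

if __name__ == "__main__":
    for href, text in extract_links(SAMPLE_PAGE):
        print(href, text)
```

A real scraper would add a step that downloads pages and repeats this extraction across many URLs, which is where the bandwidth and data-protection concerns described above arise.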