About a year ago Google posted interesting research on technology enabling your computer to listen to the ambient sounds emitted from your TV, automatically determine what is being watched (recognizing the theme song to “Seinfeld” or the sounds of a football game), and deliver “relevant content” (advertising) to your web browser. Google has now filed a patent application based on this concept: Social and Interactive Applications for Mass Media.
At the time of the original post, I commented on the privacy concerns with such a system, such as ensuring the ambient sound recorded is really from the television and not a conversation among real people in the room. The patent application offers no new answers to this problem and presents the same solution as the original paper: compressed and irreversible statistical summaries of ambient audio snippets will be computed on the client and then sent to the network. No one will be able to reverse the statistical summary to extract the actual audio.
This sounds like a good solution, but I’m still uncomfortable due to some general uncertainties of how such a system would be implemented:
- I wonder if the user’s machine stores (either overtly or in a cache) the original audio snippets? If so, for how long? Can they be accessed (by nosy spouses or law enforcement)?
- I also wonder what kind of categories these statistical summaries will be placed into. Is it something like “love scene” or “oral argument”? Imagine if there were a “domestic disturbance” category of ambient sound, and Google had a record of such an audio event from my household at a particular date & time when there was no television program with a similar audio event. Is that evidence of something that occurred in my living room?
- Even though the system promises to compress and encrypt these audio samples in such a way that they can’t be reversed, will that always be the case? Will the system have the flexibility to simply record and transmit unchanged audio? In real time? (I’m thinking about how law enforcement has used OnStar systems in cars to eavesdrop on conversations.)
Along with these uncertainties, a new concern arises from the patent application. Google proposes using video camera surveillance to determine the number of people watching a particular program:
[A]n image capture device (e.g., digital camera, video recorder, etc.) can be used to measure how many viewers are watching or listening to a broadcast. For example, various known pattern-matching algorithms can be applied to an image or a sequence of images to determine the number of viewers present in a broadcast environment during a particular broadcast.
Nielsen has announced a similar desire to measure the actual people in the room, but, of course, Google brings the ability to cross-reference the “pattern-matching algorithms” applied to the faces in your living room with the facial recognition features of Google Image Search…<shudder>