LSE Safari-gate meeting – transcript

Under the Scalpel

A roundtable series exploring accountability in the Information Age

EVENT: THE GOOGLE SAFARI-GATE CASE: ISSUES OF GOVERNANCE AND ACCOUNTABILITY

September 17th 2012, London School of Economics

 

 

 Partial transcript of keynote section

*** Please note: This is a draft transcript and should not be relied on as a precise record of events. Please refer to the event video to confirm content.

 

00:30 – Simon Davies – “Good afternoon, ladies and gentlemen, welcome to the London School of Economics. Today we are here to discuss what we, at the London School of Economics, believe is an important issue relating to governance and accountability of organisations, and it is what’s become known as the Safarigate case, involving Google and Doubleclick. What we are attempting to do today is unravel the background to this case – a case which has resulted in the Federal Trade Commission levying a record fine of 22.5 million dollars against Google. But more importantly, to ask key questions about what this means in terms of law and of consumer rights. What this means in terms of governance, accountability and oversight of organisations and – importantly – of product innovation and development. What can be learnt from the episode which, I believe, is going to be ongoing and will eventually, in the not too distant future, envelop Europe.”

01:40 – SD – “The 4 people around me here, in order of their appearance on this programme – Jonathan Mayer is a PhD student at Stanford who, much to the joy of Google, actually is the person who discovered the Safari loophole and caused this tidal wave. And Jonathan’s going to give us a beginners’ 101 guide to the technology but then explain how the case became unravelled, his role in it, which believe me has some very interesting sharp twists to it, and of course the emergence then of the FTC enquiry and the outcome there. Dan Tench is a partner with Olswang. Dan has an extremely broad and deep history in litigation and law and he’s going to be giving us an analysis, again informal, of what the legal implications are of this case and where it might take us. Chrisanthi Avgerou is professor of information systems in the department of management here and head of the ISIG group. Dr Edgar Whitley is in information systems and an old-timer in many of these issues. They will act as our co-moderators, presenters, discussants and they will lead the rest of the programme and put everything into context. So what effectively you’re going to hear is marrying several disciplines, so I can think of no better pair to deal with that cohesion than Edgar and Chrisanthi. Can we start with you, Jonathan, and tell us how all this began?”

03:30 – Jonathan Mayer – “Sure, so thanks for the invitation and the warm introduction. I’m going to briefly touch on three topics. First, a little overview of third party web tracking. Then a little explanation of the research we’ve been doing in the security lab at Stanford. And lastly, the research we’ve been doing on Safari’s cookie blocking feature and how some companies were working around it, particularly Google.

So to begin, third party web tracking is an issue that I think has really been raised in profile over the past couple of years. But it’s not an entirely new issue; in fact the history goes back about a decade and a half now, when web browsers started to integrate more and more functionality for building sophisticated applications. So when the web was first developed it was a collection of documents that linked to each other, very static and that was it, and of course that didn’t last long. By the mid 1990s, web browsers could run code from websites, could store information from websites, and there was a recognition at the time that with this sophisticated functionality came the ability to learn an awful lot about what users were doing on the web. There were calls for web browsers to do something about this, for companies to be fairly restrained in how they took advantage of these new technologies. These calls ultimately did not amount to too much. Web browsers dabbled a little bit in technical countermeasures and ultimately chose not to deploy countermeasures.”

05:03 – JM – “Meanwhile, companies started to be founded, based on the notion of “We could follow individual consumers on the web, learn what their interests are and make some predictions about what could be relevant to them and then sell that information for targeted advertising.” So this issue dates back to the mid to late 1990s. Over the following decade, third party web tracking as a business practice has blossomed, and the industry around it has fragmented. There’s now a bunch of different business models that include third party web tracking. So behavioural advertisers tend to get discussed the most, but they’re far from the only ones in this space. There are companies that collect analytics about what users do on websites, figuring out where they tend to go after they visit a particular website or what they do on a particular website. Social networks are now very large – the Facebook “like” button is only a few years old, but it is one of the most prevalent third party tracking elements on the web now – a business model that didn’t even really exist a few years ago, so a lot of turnover.”

06:14 – JM – “I want to take a step back and narrowly focus in on what I mean by third party web tracking here. So when I say third parties I mean websites that the user isn’t interacting with. So if you go to read some newspaper’s website, in the jargon of the space, that newspaper is the first party, the entity that the user is saying “I want some information from”. Third parties are the websites that the user doesn’t really expect to interact with, things like an advertising network, or a social network and so on. In some cases, content from these companies isn’t even visible on the website. By tracking, I mean the collection of users’ browser history, so the stuff you see when you go into your history, in the menu of your browser – that’s the stuff that gets collected. There are these different business models around how that information is used. My focus tends to be on the very collection of the information. We can talk a little about that later – what the privacy issues surrounding that collection might be and what some of the privacy issues surrounding some of the uses might be.”

07:19 – JM – “So, there’s an awful lot of tracking going on now, an awful lot of companies who are getting their hands on this browser history information and there’s a lot of reasons that you might be concerned about it. The information could be sensitive, it can be identified or identifiable, users don’t have a lot of awareness of what’s going on in the space and there’s not, in general, a lot of consumer control. And responses to third party web tracking have taken a variety of forms. There have been some calls for regulation and depending on your interpretation, there may even be some regulation already in place in various jurisdictions. From a technical perspective there are three chief responses. One is, some browsers and browser extensions offer an ability to block tracking in various ways and I’m going to talk about how Safari does that a little bit. There is an industry self-regulatory programme that offers opt-out cookies and I’m going to talk about that a little bit too. And then lastly, there is this initiative called Do Not Track, which is an attempt to build a universal choice mechanism that’s in browsers, and that’s respected by companies across the third party tracking landscape and gives users a dependable control and brings some trust to the ecosystem. I’m not planning on talking about Do Not Track now, but I’m happy to talk about it later.”

08:40 – JM – “OK, so that’s the overview of third party web tracking I wanted to give. So we’ve done, we being the security lab at Stanford, an awful lot of research on policy and technology issues surrounding third party web tracking. Particularly we’ve tried to do empirical work – the motivation, a little bit, has been along the lines of “lies, damn lies and statistics”. It’s a space that inherently can be measured and there are wide open policy questions that I think can be significantly informed by empirical measurements, so that’s what we’ve tried to do. One of our studies looked at what these online advertising opt-out cookies actually do, and as it turns out, these opt-out cookies don’t in themselves prevent the collection of users’ browsing history; at a technical level they are nothing more than a signal to the companies that provide them. Some companies do take measures to prevent collection of their users’ browsing history but a little over the majority of participants in these programmes don’t, and I think that led to some howls from folks who thought they had been a little misled about what these programmes were doing. They thought that the notion of an opt-out cookie was that a consumer could go to one of these websites that offers opt-out cookies, install it on their browser and they would no longer have their web history collected. As it turns out, these opt-outs are about not having this information used for behavioural targeting, so the same information might get collected and might get used, just not for showing you more relevant advertising. So that was one of the studies we did; another one was a look at supercookie technologies. So there’s a great jargon in this space; there’s a bunch of ways, it turns out, that you can track a web user.
You can use ordinary cookies, you can use plain old HTTP cookies, that’s what tends to get the most attention because that’s the most common tracking technology – the rough physical world analogy, might be slapping a barcode on a user’s web browser and then scanning it at every website they go to.”
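Illustrative sketch (not part of the spoken remarks): the point that an opt-out cookie is only a signal, and that the receiving company decides what to do with it, can be modelled in a few lines of Python. Everything here is hypothetical, including the cookie name `optout`, the `Tracker` class and its flag; it does not describe any real company's server.

```python
from dataclasses import dataclass, field

OPT_OUT_COOKIE = "optout"  # hypothetical cookie name, for illustration only


@dataclass
class Tracker:
    """Hypothetical third-party server that receives page views."""
    stops_collection_on_opt_out: bool
    history_log: list = field(default_factory=list)

    def handle_request(self, cookies: dict, page_url: str) -> str:
        opted_out = cookies.get(OPT_OUT_COOKIE) == "1"
        # The cookie is only a signal; the server chooses whether to keep logging.
        if not (opted_out and self.stops_collection_on_opt_out):
            self.history_log.append(page_url)
        # In both designs an opted-out user is shown an untargeted ad.
        return "generic ad" if opted_out else "targeted ad"


def demo_opt_out():
    """Send the same opted-out request to two differently behaved trackers."""
    cookies = {OPT_OUT_COOKIE: "1"}
    targeting_only = Tracker(stops_collection_on_opt_out=False)
    collection_too = Tracker(stops_collection_on_opt_out=True)
    for tracker in (targeting_only, collection_too):
        tracker.handle_request(cookies, "https://news.example/article")
    return len(targeting_only.history_log), len(collection_too.history_log)
```

Both trackers show the opted-out user a generic ad, but only the second one stops recording the page visit, which is the distinction the study drew between limiting targeting and limiting collection.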

10:44 – JM – “There are a bunch of other technologies that can slap a barcode on a different part of the browser; those tend to get called supercookies. There are also technologies where you can ask questions about a browser, about various features of the browser, that independently might not be unique but taken together actually form a fairly unique fingerprint that can follow your browser around; that’s fingerprinting. And if you use a supercookie or fingerprint to resurrect a cookie that was deleted, then that becomes a zombie cookie. So, we did some looking at supercookie technology, and we identified a supercookie that Microsoft was using in conjunction with its advertising delivery. We also looked at a technology that a company called Epic Marketplace was using. We found they were using this means of extracting parts of a user’s browsing history from older browsers. This was a – bug, I think is probably the right term for it, in old browsers, where a website could create invisible links and ask the browser over and over again things like, “is it blue or is it purple?” ****** – You may recall in the old days of the web, when it wasn’t all pretty, that if you had clicked a link it was purple, and if you hadn’t it was blue, so you could ask it over and over again, and that’s how you could figure out where a user had been. So we found these guys had been doing this for, I think it was over fifteen thousand URLs, including stuff from the National Institutes of Health’s website and the Mayo Clinic and so on. So that was another project we worked on, trying to understand what tracking technologies are used. We also looked at the prevalence of the Advertising Choice icon, which is this little blue triangle you might have seen on some ads – the notion is that the advertising industry can put this in enough ads that it will bring sufficient transparency, that the industry will have made a sufficiently strong case that they shouldn’t be regulated or shouldn’t be viewed with so much scrutiny.
We actually measured how often this thing shows up and it turns out, it doesn’t show up nearly so much. We did some work on understanding the extent to which, these browser histories that get collected are anonymous. Anonymous is a very strong technical word, it’s tossed around a lot in this space. There are many, many ways in which a user’s identity can be associated with this information, in some cases already is associated. The way in which we focused on identifying a user’s browser history was looking at ways in which a first party website might pass that information onto third parties. In particular, for example, shoving a username or a user ID into a URL and then when that URL gets passed onto a third party, that information could be associated with the browser history. It only actually takes a little bit of this information leakage, identifying information leakage, to make web tracking identifiable. We found that it was going on all over the place. The claims about anonymity, which we already thought were pretty thin, seemed to be further undermined by this research. We’ve done some work on privacy preserving alternatives to ****** advertising, ways in which *********** – so ways in which, for example, you might show a user an ad based on sites they’ve been to before, without actually handing to any company, a list of websites the user has been to. That’s an ongoing line of research.”
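Illustrative sketch (not part of the spoken remarks): the leakage described above, where a first-party site puts a username or user ID into a URL that a third party then sees, can be shown with Python's standard URL parser. The parameter names in `SUSPICIOUS_KEYS` and the example URL are assumptions chosen for the illustration, not findings from the research.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical query parameter names that might carry identifying information.
SUSPICIOUS_KEYS = ("user", "username", "uid", "email")


def leaked_identifiers(referrer_url: str, suspicious_keys=SUSPICIOUS_KEYS) -> dict:
    """Return identifying query parameters visible in a URL passed to a third party."""
    query = parse_qs(urlsplit(referrer_url).query)
    return {
        key: values[0]
        for key, values in query.items()
        if key.lower() in suspicious_keys
    }


def link_identity(tracking_profiles: dict, cookie_id: str, referrer_url: str) -> dict:
    """Attach any leaked identifiers to the pseudonymous profile for cookie_id."""
    profile = tracking_profiles.setdefault(cookie_id, {"identity": {}})
    profile["identity"].update(leaked_identifiers(referrer_url))
    return tracking_profiles
```

A single request such as `link_identity({}, "abc123", "https://social.example/profile?username=alice&tab=photos")` is enough to tie the otherwise pseudonymous cookie ID `abc123` to the name `alice`, which is why only a little leakage makes the whole history identifiable.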

14:22 – JM – “And then a little sneak preview, we’ve been doing this work, that’s going to be coming out in the next couple of weeks, looking at how users actually configure their browsers. So we’ve been looking at how many users actually use opt-out cookies and Do Not Track. It turns out that almost no-one uses opt-out cookies and an awful lot of people, a surprising number of people, actually use Do Not Track. We’ve also looked at comparing internationally how much tracking is going on and it turns out that tracking in the UK and tracking in other European countries actually looks an awful lot like tracking in the US. One of the arguments that has come out of the online advertising industry has been that these problems are primarily US problems and it’s starting to look more like it’s the entire web’s set of problems.

OK, so this is the body of work that the Google research we did evolved out of. So one of the projects in this line of work was trying to understand which companies were placing cookies in Apple’s Safari web browser, and the way in which we went about measuring that was using this approach of trying to figure out how users configured their web browsers. So we actually bought advertising of our own and included code in the ads we bought that measured what cookies seemed to be in place in users’ browsers, and we bought a bunch of ads targeted only to users of the Safari browser on iOS and looked at which advertising companies had tracking cookies in place. And for the majority of companies, very few cookies were in place – 1% territory (of users who had this cookie blocking mechanism enabled – this privacy feature that is turned on by default). We found a couple of companies that had an inordinate number of cookies, one of which, of course, was Google. In our data set, which was certainly not representative, roughly 85% of the browsers, if I recall, with this privacy mechanism in place had a DoubleClick cookie, which of course raised some red flags, so we then followed it up with some manual reverse engineering and identified three companies that were working around the Safari cookie-blocking mechanism.”

16:42 – JM – “So let me pause here, and say a little about what this mechanism is. So since, I believe it’s 2004, since Safari 1.0, there’s been a privacy feature built in, on by default, in the Safari web browser. And this is a feature that puts a restriction on third party cookies. So, when I’ve used the terms first party and third party so far I’ve been talking in terms of users’ expectations. What Apple’s restriction does is place some limits on cookies for third parties, based on domain names. So to be very concrete, suppose you’re on the New York Times website – nytimes.com – and the New York Times tries to set a cookie, Apple’s mechanism won’t put any restrictions on that cookie, that little bit of information that the New York Times wants to save on the user’s web browser. But if a different domain tries to set a cookie, to be concrete, suppose Yahoo is providing an ad on a website and so it is Yahoo who tries to set a yahoo.com cookie on the New York Times website, that cookie is blocked from getting set and so the browser will just silently toss it away. And there are a couple of ways in which Apple has loosened the third party cookie blocking in Safari, so there are design reasons for what they did – I think there was significant debate over the extent to which it should have been built in the way it was. One of the ways in which Safari cookie blocking was a little bit different from, for example, the way Firefox or Chrome implements their cookie blocking feature is, if a cookie has been set for a website then that website can set any cookies in any context. So, let’s suppose you log into Facebook, when you go to facebook.com, and then you go to the New York Times website, and there’s some Facebook content on the New York Times website; that Facebook content can read cookies and set cookies as it likes.
So, the design rationale there is roughly: the user is interacting with Facebook, they trust Facebook, they have Facebook cookies, so we’ll let Facebook do things as a third party. There’s another limitation in the way in which the Safari cookie blocking feature worked, and the one that received a lot of attention in coverage of this line of research and of Google’s cookies in Safari browsers was the ability to submit a form and set cookies. So the notion here was, if the user filled out a little bit of information on a website and hit submit, and that information got submitted to a third party website, you would want to allow the third party to set some cookies. There are some very legitimate reasons for this. For one, you want to make sure the user doesn’t submit multiple times so you keep track of the fact that the user has submitted. Or there’s a little bit of an issue in that if the user’s explicitly interacting with some third party content, we might want that content to be ****** and remember some things about the user, ’cause the user maybe trusts it a little more, they’re sending some information to these guys.”
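Illustrative sketch (not part of the spoken remarks): the policy just described, with its two exceptions, can be written as a small decision function. This is a simplified model under stated assumptions: domains are compared by exact string match (a real browser compares registrable domains), `ads.example` is a made-up domain, and none of this is Safari's actual code.

```python
def safari_accepts_cookie(page_domain: str, cookie_domain: str,
                          domains_with_cookies: set,
                          from_form_submission: bool = False) -> bool:
    """Simplified model of the third-party cookie policy described in the talk."""
    if cookie_domain == page_domain:
        return True   # first party: no restriction
    if cookie_domain in domains_with_cookies:
        return True   # exception 1: the domain already has cookies in this browser
    if from_form_submission:
        return True   # exception 2: cookie set in response to a form submission
    return False      # otherwise the third-party cookie is silently dropped


def demo_policy() -> dict:
    """Walk through the scenarios from the talk with a fresh cookie store."""
    stored = set()
    outcomes = {}
    outcomes["ad cookie on news site"] = safari_accepts_cookie(
        "nytimes.com", "yahoo.com", stored)
    stored.add("facebook.com")  # the user logged in at facebook.com earlier
    outcomes["widget from visited site"] = safari_accepts_cookie(
        "nytimes.com", "facebook.com", stored)
    outcomes["cookie via form submission"] = safari_accepts_cookie(
        "nytimes.com", "ads.example", stored, from_form_submission=True)
    stored.add("ads.example")   # once that cookie exists, later ones pass too
    outcomes["later cookie, same domain"] = safari_accepts_cookie(
        "nytimes.com", "ads.example", stored)
    return outcomes
```

The last two steps show how the exceptions combine in this model: one cookie admitted through a form submission puts the domain in the store, after which any further cookie from that domain is accepted without a form.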

20:05 – JM – “But it turns out that a form on a website can be submitted not only when a user clicks a submit button, but also when a little bit of code on a website just says let’s submit this form. And so it was a known issue that a website could create an invisible form, use a little bit of code to submit it and then set cookies in response to that form, such that the user never sees anything, but third party cookies start getting set from the website that uses this form submission workaround. And so we identified, based on reverse engineering, three companies that were using the form submission workaround – Google, Vibrant and The Media Innovation Group – and then some crawling results from the Wall Street Journal led to the identification of a fourth company. It was very clear to us from the get go that these companies were deliberately circumventing the Safari cookie blocking mechanism. In Google’s case, we tested a bunch of different web browser user agents – so this is the string that a web browser sends to a website and says “Hey, I’m Safari, this version” or “Chrome, this version”. We tested Google’s URLs that hosted this code with a bunch of different user agents and found that the circumvention code only appeared if you looked like a Safari web browser. It was further clear to us what Google was doing because we had actually looked into the particular feature that they were circumventing for, some time before, a few months before. This was a feature that allowed linking a user’s google.com ID to their doubleclick.net tracking and advertising network. So Google acquired Doubleclick a few years beforehand and had this historical divide of content between google.com services and doubleclick.net advertising services, and we found that they were building links between user identity on google.com and advertising services on doubleclick.net, and this was a means of establishing a linkage for Safari users.
So it was not only clear to us that Google was doing this for Safari users, but it was pretty clear to us why they were doing it. To work around this third party cookie-blocking mechanism so they could have this doubleclick.net identity linkage.”
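Illustrative sketch (not part of the spoken remarks): the user-agent test described above amounts to requesting the same URL while presenting different browser identities and checking which responses contain the form-submitting code. The sketch below assumes an injected `fetch(url, user_agent)` callable so it runs without network access; the abbreviated user-agent strings, the `fake_fetch` stand-in server and the `<form` marker are all hypothetical and are not the researchers' actual tooling.

```python
from typing import Callable, Dict, List

# Abbreviated, illustrative user-agent strings (not exact historical values).
USER_AGENTS: Dict[str, str] = {
    "safari": "Mozilla/5.0 (Macintosh) AppleWebKit/534 (KHTML, like Gecko) Version/5.1 Safari/534",
    "chrome": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/535 (KHTML, like Gecko) Chrome/17.0 Safari/535",
    "firefox": "Mozilla/5.0 (Windows NT 6.1; rv:10.0) Gecko/20100101 Firefox/10.0",
}


def agents_receiving(marker: str, url: str,
                     fetch: Callable[[str, str], str],
                     user_agents: Dict[str, str] = USER_AGENTS) -> List[str]:
    """Return the browser labels whose response body contains the marker."""
    return sorted(
        name for name, ua in user_agents.items()
        if marker in fetch(url, ua)
    )


def fake_fetch(url: str, user_agent: str) -> str:
    """Hypothetical server that serves hidden-form code only to Safari."""
    # Chrome's user-agent string also contains "Safari", so exclude it explicitly.
    looks_like_safari = "Safari" in user_agent and "Chrome" not in user_agent
    if looks_like_safari:
        return "<form id='hidden' method='post'></form><script>/* submit */</script>"
    return "<img src='ad.png'>"
```

Calling `agents_receiving("<form", "https://ads.example/frame", fake_fetch)` against this stand-in server singles out the Safari identity. In a real measurement, `fetch` would be replaced by an HTTP client that sets the `User-Agent` request header.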

22:28 – JM – “It wasn’t strictly necessary to design the system in that way but it was clear that that was what the system was intended to do. Vibrant – it was a pretty clear giveaway when the circumvention code was hosted off a URL that included Safari in it, so that was kind of clear. And then MIG and PointRoll, their code was copied and pasted; I should say it seemed very clear to us that it was copied and pasted from the example code that was just floating around on the web. In the case of PointRoll, the reason why I and my colleagues at Stanford didn’t initially identify it in our own crawling work was that there was actually a bug in their code. It didn’t do exactly what they intended it to. It wound up still circumventing the Safari feature. The bug was the exact same bug that was in this example code that was floating around the web, so it was pretty clear where this came from. In the cases of Vibrant, MIG and PointRoll, it was very clear that the circumvention was for the purpose of placing an advertising tracking cookie. I don’t believe that any of those companies argued that that wasn’t why they were doing it. In Google’s case, it was clear that they were intending to have a social syncing feature enabled. It wasn’t so clear whether they also intended to allow their ordinary advertising cookie to get set. So, sometime earlier in my ramblings, you may recall I mentioned that once a cookie is set for a website, the Safari cookie-blocking mechanism will allow additional cookies to get set, and so Google’s story, which seems plausible to me, is that they intended to set this social syncing cookie and they didn’t intend to set their ordinary ad-tracking cookie, but by setting the social syncing cookie, that then allowed in the ad tracking cookie.”

24:21 – JM – “OK, so that was a little bit of a technological detail here. Google had a unique potential liability issue associated with what they were doing, in so far as they had a website that they put up that explained this opt-out extension that they offered for certain browsers and said that if you’re a Safari user, we don’t offer this advertising opt-out extension, but don’t worry because your browser’s configured by default to block third party cookies and that effectively accomplishes the same thing as our opt-out. Google is one of the companies that, I think responsibly, has an opt-out that does actually limit the information they collect. They don’t take advantage of the ability to keep on collecting users’ browser histories. So they made a representation about what the default Safari configuration would do, and clearly they didn’t live up to that representation. We brought our findings to the Wall Street Journal, who contacted Google on the Monday morning, and the story broke in the Journal on the following Friday. We didn’t really hear from Google after that. My understanding is that I don’t get to talk to Google policy staff about this issue unless they have lawyers in the room, so there’s not really a lot of conversation about the thing. Google responded by pushing out this – it wasn’t even a press release, it was a statement that was circulated to the press – we only learnt of this statement because there was a really lazy tech blogger who copied and pasted the thing in its entirety. So, we read it and thought “This seems really misleading to us”. I think it was actually a remarkable masterpiece of a press statement. Like, if one of my classmates at Stanford Law had written this thing, they’d earn every penny of their bonus – well it wouldn’t be measured in pennies. Every sentence is literally true, but totally suggestive of something inaccurate.
So if you’ve got a chance to skim this piece here, you might have noticed that one of the claims in there was about how the cookies don’t collect personal information. Now, this is like a masterstroke of ****** – cookies don’t collect anything. Cookies are just a little piece of information that gets saved in a browser. It’s the website that collects things. And so sure, cookies don’t collect personal information but because of their cookies, Google collected personal information. That seems ******  the response, it at least seemed to me, and seemed to many colleagues to be so misleading and so deliberately misleading. There is a possibility, based on how quickly some awfully friendly bloggers responded to the issue that there’s some responses seeded throughout the blogosphere. I’m no expert in PR – it would certainly make a lot of sense to me – but I don’t have any confirmation of it. I think Google’s response was surprisingly effective, a lot of the coverage in response to the Wall Street Journal piece was that this was some Wall Street Journal hyperbole or that this was Apple’s fault, and we’re going to throw our hands up and hey – WSJ says this and Google says this and make of it what you will. I thought that was very frustrating.”

27:24 – JM – “We wrote a piece, trying to clear up exactly what had happened, and published it the following week. I wish I could say that was very effective but I’m not entirely sure. The FTC was investigating this issue, and so the FTC moved towards fining Google, and in the process of the investigations and the negotiations of the possible penalty, it appears Google began to leak details of the negotiations. So for very understandable reasons, to blunt the impact of a potentially quite large fine, there were really some remarkable and precise details that were leaked, about amounts and what exactly the penalties were going to be for, and some journalists made it very clear that the information was coming from Google. Then ultimately the settlement was announced, 22.5 million dollars – I’m sure we’ll talk about some of the details of that in the following conversation. There have been a number of private lawsuits, class actions, filed against Google. Some state Attorneys General have been open about their investigations. From Google ****** they don’t place tracking cookies in Safari any more, as I understand it, so at a technical level the issue has certainly been resolved very quickly. So in wrapping up, I guess I want to frame the following discussion with some of the concerns I’m left with. First being, how exactly did this happen? How did some Google engineering, so many Google engineers, decide it was OK to circumvent a privacy feature in one of the world’s most popular web browsers? Where was the oversight over this whole code? Where was the training and internal ethical compass about it, making sure that this sort of thing didn’t happen? Where was the monitoring, so that if something like this did happen it didn’t actually get so far as going to the users, and even if it did go all the way to the users – how does Google make sure that it wasn’t public for very long and that they caught the problem and fixed it?
It seems like this went on for roughly half a year before it was called to their attention. So there are a lot of open questions about how exactly this happened. I think Google’s response also raises a lot of troubling questions. I think Google should have accepted responsibility for what happened. I think it should have been very transparent about what exactly happened and why, and I think it should have committed to internal improvements so this didn’t happen again. Google has said that its mission is to organise the world’s information, including all of ours, and I think if Google wants to organise the world’s information it’s going to need a lot of trust and I think we should hold it to that high standard I articulated. Instead, what I think this episode showed is a company looking to minimise the adverse impact, however possible, denying fault and significantly misleading on the underlying facts. I often get asked whether I think Google lived up to its “Don’t be evil” promise in this episode. I don’t know if all this counts as evil. I’m not going to get into the metaphysics of the whole thing, but it sure sounds like a typical company to me. And Google’s made an awful lot of how it’s not a typical company.”