Berkman Center researcher publishes 1700 students’ Facebook data: “We did not consult w/ privacy experts on how to do this, but we did think long and hard ….”

Oct 8, 2008

—

I think I’ll let others tell the story for me …

September 25:

In collaboration with Harvard sociology graduate students Kevin Lewis and Marco Gonzalez, and with UCLA professor Andreas Wimmer and Harvard professor Nicholas Christakis, Berkman Fellow Jason Kaufman has made available a first wave of Facebook.com data through the Dataverse Network Project.

The dataset comprises machine-readable files of virtually all the information posted on approximately 1,700 FB profiles by an entire cohort of students at an anonymous, northeastern American university.

— Tastes, Ties, and Time: Facebook data release, Berkman Center for Internet and Society, Harvard University

September 29:

The “non-identifiability” of such a dataset is up for debate…. According to the authors, the collection of the dataset was approved by the IRB, Facebook and the individual college. The dissemination of the dataset appears to be approved by the IRB.

— Facebook Datasets and Private Chrome, Fred Stutzman, Unit Structures

September 30:

Of course, this sounds like an AOL-search-data-release-style privacy disaster waiting to happen.

— On the “Anonymity” of the Facebook Dataset, Michael Zimmer, michaelzimmer.org

October 2:

I think it’s hard to imagine that some of this anonymity wouldn’t be breached with some of the participants in the sample. For one thing, some nationalities are only represented by one person.

— Eszter Hargittai, in a comment on Unit Structures

We did not consult w/ privacy experts on how to do this, but we did think long and hard about what and how this should be done.

— Jason Kaufman, as a comment on michaelzimmer.org

OK, OK, I’ve held my tongue long enough. The arrogant attitude of “we’re smart and we thought about it so we didn’t bother to ask the experts” is a well-known recipe for disaster in privacy (or security or software engineering or …). People like Cynthia Dwork of Microsoft Research and Latanya Sweeney of Carnegie Mellon University have been studying data anonymization and reidentification for years; this stuff is hard. How can the Berkman Center not know that? And how can Facebook and Harvard be so cavalier as to share data with a research team with an attitude like this?
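To make Eszter Hargittai's point about nationalities concrete, here is a minimal sketch of the kind of uniqueness check that re-identification work starts from. Everything in it is an assumption for illustration: the records are made up, the column choice is hypothetical, and none of it reflects the actual layout of the released dataset. The idea is simply that any row whose combination of attributes is shared by nobody else is a candidate for being picked out, names or no names.

```python
from collections import Counter

# Hypothetical, made-up records: (nationality, home_state, major).
# These are NOT taken from the released dataset.
records = [
    ("US", "MA", "Economics"),
    ("US", "MA", "Economics"),
    ("US", "NY", "Biology"),
    ("US", "NY", "Biology"),
    ("Hungary", "MA", "Sociology"),
]


def unique_rows(rows, columns):
    """Return indices of rows whose values in `columns` are shared by no other row."""
    keys = [tuple(row[c] for c in columns) for row in rows]
    counts = Counter(keys)
    return [i for i, key in enumerate(keys) if counts[key] == 1]


# Checking nationality alone (column 0) already singles out one row.
singled_out = unique_rows(records, columns=[0])
print(singled_out)
```

In this toy sample, home state on its own singles out nobody, but nationality on its own does. Real datasets have many more columns, and combinations of them tend to make far more rows unique than any single column does.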

October 3:

Well, I’m pretty sure this “anonymous, northeastern American university” is Harvard College. And I didn’t even have to download the dataset to figure it out. Here’s how.

— More On the “Anonymity” of the Facebook Dataset – It’s Harvard College, Michael Zimmer, michaelzimmer.org

See, I told you this stuff is hard.

October 7:

In the comments, Jason Kaufman implies that the data really isn’t that private, asking what could go wrong, and why would someone post it to Facebook expecting it to remain private.

I have just one question on all of this. If the data isn’t private, why did they attempt to anonymize it?

I believe they attempted to anonymize it because it’s fairly obvious that the data is private, and releasing it with names obviously attached would be pretty shocking.

— Researchers Two-Faced on the Facebook Data Release, Adam Shostack, Emergent Chaos

Yeah, really.

The original research mission (to collect and analyze a set with proper safeguards) was within bounds; the follow-up distribution is the element that clearly poses risk.

— Facebook Dataset Identified, Fred Stutzman, Unit Structures

Well, except it turns out that the original research mission also clearly posed risk: for example, the proper safeguards might not have been in place. Did the IRB (Institutional Review Board) look at this? Did Facebook and Harvard?

Fred goes on to make the excellent point that the researchers should have convened a panel to discuss before releasing the information, and suggests as a potential takeaway “Research that pushes the boundaries of technology and privacy provide IRB’s with unique challenges.” True enough, and his post and the comments, along with all the other ones I’ve linked to, are well worth reading.

But it seems to me that this is letting the Berkman Center, Facebook, and Harvard off the hook a little too easily. They just put information about 1700 students, at least some of whom (and probably most) are likely to be identifiable, up on the internet … without even asking their permission.

It’s late at night and so maybe I’m feeling irritable but I find myself asking questions like: In what universe is this supposed to be okay?

The Berkman Center’s mission is to explore and understand cyberspace; to study its development, dynamics, norms, and standards; and to assess the need or lack thereof for laws and sanctions.

— the Berkman Center’s mission statement

The Berkman Center recently hosted a conference and gala on The Future of the Internet. People look to them as authorities. Is this the future they want to create?

As far as I know, none of the Berkman Center faculty have weighed in on this yet. It’ll be interesting to hear what Yochai Benkler, William Fisher, Charles Nesson, John Palfrey, Jonathan Zittrain, John Deighton, Jack Goldsmith, Alexander Keyssar, Charles Ogletree and Stuart Shieber have to say about what this episode says about the “need or lack thereof for laws and sanctions.”

And in terms of understanding, given the potential for gender-, race- and culture-based differences in attitudes towards privacy, I’m also looking forward to what they, and others, think about how events might have been influenced by the Berkman Center’s, and the research team’s, diversity. Or lack thereof.

jon

Facebook graphic from AJC1’s flickr site, licensed under Creative Commons

Comments

5 responses to “Berkman Center researcher publishes 1700 students’ Facebook data: “We did not consult w/ privacy experts on how to do this, but we did think long and hard ….””

Michael Zimmer

October 9, 2008

To be fair, while the Berkman Center appears to be the current institutional “home” for this research project, my understanding is that they had little (if anything) to do with the research design or the structure of the data release. As recently as June 2008, the PI for the project (Jason Kaufman) gave a presentation about the research at Berkman, apparently hoping that Berkman might agree to take him into their fold.

Yet, as you suggest, the folks at Berkman, assuming they were properly informed, should have seen some red flags pop up.

I know many at the Berkman Center well, I have the utmost respect for that institution, and I’m confident they are trying to find ways to deal with the issues we have raised. Perhaps we’ll hear something from them soon….

Liminal states » Petitions are soooooo 20th century: Obama supporters AGAINST Larry Summers (DRAFT)

November 8, 2008

[…] very sympathetic with, such as Facebook’s creepy and Orwellian vibe and horrible privacy practices. For that matter, a lot of people just plain prefer email. So petitions are a valuable […]

Liminal states » Open for Questions at change.gov: What about privacy?

December 14, 2008

[…] giving everybody involved the benefit of the doubt that this information can’t be disaggregated and used to identify individuals, I don’t much care for our government giving Google control over which third parties it […]

Liminal states » Facebook: all your content are belong to us. FOREVER! Protests ensue.

February 16, 2009

[…] notification or discussion and ignore feedback fits in with their overall pattern (1, 2, 3, 4, 5, 6 …). Presumably other commercial social networks are taking notice of the opportunities […]

jon

July 12, 2011

The Chronicle of Higher Education looks at the incident in Harvard Researchers Accused of Breaching Students’ Privacy, and has some great quotes from Jason Kaufman, including

“We faced a dilemma as researchers,” Mr. Kaufman said on tape. “What happens if a student has a privacy setting that says, ‘You can’t see me unless you’re my friend,’ and our undergraduate research assistant who is downloading the data is a friend of that person? Then can we include them in our data?”

He left that question unanswered at the time. But Mr. Kaufman talks openly about another controversial piece of his data gathering: Students were not informed of it. He discussed this with the institutional review board. Alerting students risked “frightening people unnecessarily,” he says.

“We all agreed that it was not necessary, either legally or ethically,” Mr. Kaufman says.

Kaufman, by the way, is still a Berkman Fellow. Looks like his hopes that Berkman would take him into their fold were realized. And the Berkman Center faculty is still all-male.
