The availability and analysis of big data open up enormous opportunities for research but are not without serious dangers, faculty concluded at "The Amazing Potential and the Dark Side of Big Data," a university-wide seminar organized as part of CEU's Intellectual Themes Initiative. The initiative brings research, teaching and outreach projects under four interdisciplinary themes, one of which is Networks.
The "data deluge" is a "gold mine for science," according to Professor Janos Kertesz of CEU's Center for Network Science (CNS), who also directs CEU's PhD program in network science. Kertesz said big data makes it possible to map human behavior better than ever before, because the samples are far larger than those of traditional research methods such as questionnaires, which are by nature limited in size and carry the risk of subjective answers.
At CNS, records of millions of mobile phone calls, with metadata such as gender and age, are just one set of big data that allowed Kertesz and colleagues to map relationships among people over time and identify patterns of behavior among men and women in different generations.
Collection, analysis, storage of, and access to records and personal data present opportunities indeed, but also raise ethical questions regarding privacy, according to Dean of Students Chrys Margaritidis, who is also a philosopher and teaches at CEU Summer University.
"The big ethical question is that there is a set of technological possibilities, a set of what society or an organization wants, what is allowed by law, and what we want ethically. These are misaligned," Margaritidis said in his presentation, "Big Data, Big Brother, Big Challenges." "Perhaps one day they will be aligned."
The use of past data to predict future behavior or events can also be problematic from an ethical perspective, Margaritidis said. Predictive policing, for example, applies artificial intelligence to large sets of crime data to find patterns that weren't recognizable before.
"The goal is to predict the location of future crimes," Margaritidis said. "The caveats are, that we assume the future will follow past patterns. Second, the sources of data – are they accurate? Third, it's costly to maintain. Fourth, it may become a self-fulfilling prophecy."
Margaritidis left the audience with the thought that big data involves issues of the utmost importance – knowledge, identity, privacy, responsibility, justice, equality, democracy – and must be examined from an ethical perspective.
Referring to Wired editor Chris Anderson's analysis, Roberta Sinatra, assistant professor at the Center for Network Science and at the Department of Mathematics and its Applications, said the idea that the data deluge makes theory and modeling obsolete is not true.
"Google is the best example of a data collector today, but they got the prediction of the spread of flu a couple of years ago very wrong," Sinatra said. "The U.S. would not have been prepared [for the outbreak] if decisions had been based on that prediction. It was not done in a mechanistic way but in a black box model. For a good prediction, we need to see how people move, how they are connected."
In this way, Sinatra warned against thinking that big data can make predictions all by itself. Weather phenomena such as a hurricane's trajectory can indeed be predicted with data, but the prediction comes from understanding the laws of physics, of how nature works, she said.
"Big data by itself cannot give us an answer," she said. "We need to couple this treasure trove of information with our understanding of the systems. We need the right models. Not just statistical, but how social systems work. We need to find the mechanisms driving these social phenomena."
The presentations were followed by a discussion among the presenters, joined by Kate Coyer, director of the Civil Society and Technology Project for the Center for Media, Data and Society at the School of Public Policy, who is currently a fellow at Harvard's Berkman Klein Center for Internet & Society. Gabor Kezdi, professor in the Department of Economics, also joined.
"Who's collecting data, for what purpose, and who has access?" Coyer asked. "We have the private sector as collectors and aggregators, and what is their relationship with governments? Data becomes a benign term... it's about our identity. Who is at risk? All of us."
"Who tells whom what's a fact?" Kezdi asked. "This is a fundamentally difficult question. The academic community has an enormous responsibility to address this issue. The level of literacy in the basics of data analysis, scientific inquiry, and the basics of data security for our own digital safety is so low that it's a miracle things are still kept together."
The issue has huge implications for CEU, according to President and Rector Michael Ignatieff.
"On the educational side, we need to educate our students to some of these issues," Ignatieff said. "We need courses that introduce students to the excitement of big data but also to the ethical questions. We have direct questions – we are collecting data and need some policies. There are implications for our house."
The seminar was the second of the 2016-17 academic year, following one last month on energy and security. Three are planned for the next semester: one on witchcraft, one on political thought, and a third on religious studies.