This course teaches the principles and practices of big data for improving
the reliability and the security of computing systems. Big data is a
technology that is changing the way we do business and the way we play.
As there is a tremendous growth in data being collected about every
process and product in operation, there is value to be mined from this
abundance of data. This means that big data is being applied in areas
where there is great commercial advantage to be had, and consequently,
attacks and failures have become a serious concern.
This course asks and answers three questions: Can big data be used to
improve the reliability and the security of the processes that we rely
on in our work and personal lives? Do big data techniques introduce new
vulnerabilities that we should be aware of as we adopt big data
practices in so many aspects of our lives? And what kinds of mitigations
can be designed and deployed against such vulnerabilities?
The course is taught by a leading researcher in reliability and security and an award-winning teacher.
The course has a practical bent and introduces only the necessary
theory, always in the context of its application to today’s industrial
big data settings. The principles are exemplified through popular big data
frameworks, such as Apache Spark and Spark Streaming, Flink, Mesos, and
containers.
The course first lays out the problem landscape in the context of
the upsurge of data and its implications for reliability and security.
Then it describes how we can measure the relevant attributes. Next, it
looks at the application-driven requirements and constraints for
applying big data for reliability and security. Then it presents how the
data can be processed for improving the resilience of the processes
that depend on the data. It delves into a set of methods for
defending big data techniques against natural failures (the reliability
aspect) and against malicious attacks (the security aspect).
The different aspects are tied together through a set of challenge
programming projects that are based on novel datasets that we have
collected and curated.