What's stacking: an advanced Hadoop deployment with BigTop
Presenter: Konstantin Boudnik, PhD
Random vs. Sequential
Presenter: Ulrich Rückert, Datameer
Producing software stacks is an important part of a healthy software ecosystem. This talk covers the most advanced open technology for creating and validating software stacks, provided by Apache BigTop (incubating). I will discuss the advantages of the project, the challenges the project and its community are facing, and future plans.
Hadoop helps make big data tasks feasible by providing two important services: HDFS introduces controlled redundancy to prevent data loss, while the Map/Reduce framework encourages algorithm designers to read and write data sequentially and thus optimize throughput and resource utilization. In this talk we dive into the details of how sequential access affects performance. In the first part of the talk, we show that sequential access is important not only for hard drives, but for all storage components used in today's computers. Based on this observation, we then discuss statistical techniques to improve the performance of common analytical tasks. In particular, we show how randomness can be used strategically to improve speed and possibly accuracy.