Nick is a cofounder of Graphflow, a big data and machine learning company focused on recommendations and customer intelligence. Nick has a background in financial markets, machine learning and software development. He has worked at Goldman Sachs and as a research scientist at the London-based online ad-targeting startup Cognitive Match, and he led the Data Science and Analytics team at Mxit, Africa’s largest social network. Nick is a committer on the Apache Spark project. He is passionate about combining a commercial focus with machine learning and cutting-edge technology to build intelligent systems that learn from data and add value to the bottom line.

Accepted Talks:

Large Scale Data Processing with Python and Apache Spark

Apache Spark is a fast and general engine for large-scale, distributed data processing. It offers high-level APIs in Java, Scala and Python, as well as a rich set of libraries for stream processing, machine learning and graph analytics. Spark is currently one of the most exciting and fastest-growing Apache open source projects.

This talk will give an overview of the Apache Spark project and introduce the basics of PySpark, the Python API for Spark. It will then dive a little deeper into PySpark internals and finally show some examples and a live demo covering PySpark, Spark's SQL engine, and machine learning using both Spark's built-in libraries and other Python libraries.
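To illustrate the programming model in plain terms, here is a minimal sketch in ordinary Python (not PySpark) of the classic word count. It models, in a single process, the partition, map, shuffle and reduce stages that a PySpark `flatMap`/`map`/`reduceByKey` pipeline roughly goes through. Assumptions: the function names (`split_partitions`, `map_partition`, `shuffle`, `reduce_bucket`, `word_count`) and the round-robin partitioning are hypothetical and for illustration only; real Spark runs these stages across executor processes and serializes data between them.

```python
from collections import defaultdict
from operator import add
from typing import Callable, Dict, List, Tuple

# Toy, single-process model of the stages behind a PySpark word count such as
#   sc.parallelize(lines).flatMap(str.split).map(lambda w: (w, 1)).reduceByKey(add)
# This is NOT PySpark; all function names here are hypothetical illustrations.


def split_partitions(data: List[str], num_partitions: int) -> List[List[str]]:
    """Deal the input records round-robin into num_partitions slices."""
    return [data[i::num_partitions] for i in range(num_partitions)]


def map_partition(lines: List[str]) -> Dict[str, int]:
    """flatMap(split) + map(word -> (word, 1)), then combine counts locally
    within this partition (a map-side combine) before anything is shuffled."""
    local: Dict[str, int] = defaultdict(int)
    for line in lines:
        for word in line.split():
            local[word] += 1
    return dict(local)


def shuffle(partials: List[Dict[str, int]], num_reducers: int) -> List[List[Tuple[str, int]]]:
    """Send each (word, partial_count) pair to the reducer chosen by hashing the
    key, so every occurrence of a given word ends up in the same bucket."""
    buckets: List[List[Tuple[str, int]]] = [[] for _ in range(num_reducers)]
    for partial in partials:
        for key, value in partial.items():
            buckets[hash(key) % num_reducers].append((key, value))
    return buckets


def reduce_bucket(bucket: List[Tuple[str, int]], func: Callable[[int, int], int]) -> Dict[str, int]:
    """Fold the partial counts for each key in one bucket with func."""
    out: Dict[str, int] = {}
    for key, value in bucket:
        out[key] = func(out[key], value) if key in out else value
    return out


def word_count(lines: List[str], num_partitions: int = 2, num_reducers: int = 2) -> Dict[str, int]:
    """Run the map, shuffle and reduce stages and collect the results."""
    partials = [map_partition(p) for p in split_partitions(lines, num_partitions)]
    result: Dict[str, int] = {}
    for bucket in shuffle(partials, num_reducers):
        # Each key lands in exactly one bucket, so merging cannot collide.
        result.update(reduce_bucket(bucket, add))
    return result


if __name__ == "__main__":
    text = ["spark is fast", "spark is general", "python and spark"]
    print(sorted(word_count(text).items()))
    # → [('and', 1), ('fast', 1), ('general', 1), ('is', 2), ('python', 1), ('spark', 3)]
```

The local pre-aggregation in `map_partition` mirrors Spark's map-side combine, which is why `reduceByKey` generally moves far less data through the shuffle than a `groupByKey` followed by a sum.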



PyConZA brought to you by Praekelt Foundation