Coding for Data Science

Module: Python (20 hours)

Instructor: Nicolò Cesa-Bianchi

Goals

  • Focus on combining tools as opposed to programming
  • Understanding the geometry of data
  • Basics of data analytics, machine learning, and visualization

Syllabus

  • A short tour of Python
  • Linear algebra with NumPy
  • Pandas and Matplotlib
  • Data analysis with Scikit-learn

Course web page: http://cesa-bianchi.di.unimi.it/CDS/

Why Python

  • A full-fledged, object-oriented language with a syntax similar to other popular programming languages
  • Python's data science libraries are quickly becoming an industry standard
  • The dominant language in big data, deep learning, natural language processing, collaborative filtering, and more

Jupyter Notebook

  • A web-based interactive computational environment
  • In-browser editing for code, with automatic syntax highlighting, indentation, and tab completion
  • The ability to execute code from the browser, with the results of computations attached to the code which generated them
  • Displaying the results of computations using rich media representations, such as HTML, LaTeX, PNG, SVG, etc.
  • In-browser editing of rich text in the Markdown markup language, so that commentary on the code is not limited to plain text
  • The ability to easily include mathematical notation within Markdown cells using LaTeX, rendered natively by MathJax
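
As a small illustration of the last point, a Markdown cell might contain a line like the one below. The formula (the Euclidean distance between two points in d dimensions) is only an example chosen here, not something prescribed by the course; when the cell is run, MathJax renders it as typeset mathematics.

```latex
$$\|x - y\|_2 = \sqrt{\sum_{i=1}^{d} (x_i - y_i)^2}$$
```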

Basic Jupyter elements

  • Notebook server: started from the command line with the command jupyter notebook
  • Dashboard: for managing notebooks
  • Notebooks: documents that contain the inputs and outputs of an interactive session, as well as additional text that accompanies the code but is not meant for execution
  • Kernels: interfaces between notebooks and programming languages. When a code cell is executed, the code it contains is sent to the kernel associated with the notebook. The results returned from this computation are then displayed in the notebook as the cell’s output
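
The cell-to-kernel round trip can be imitated with a toy sketch in plain Python. This is not how a real Jupyter kernel is implemented (real kernels run as separate processes and communicate with the notebook through a messaging protocol); the function name run_cell and the dictionary session are hypothetical names chosen for this illustration. The sketch only shows the idea that cells share one namespace and that the value of a cell's last expression becomes its output.

```python
import ast


def run_cell(source, namespace):
    """Toy stand-in for a kernel (hypothetical helper, not a Jupyter API).

    Executes the source of one "cell" in a shared namespace and returns
    the value of the last expression, or None if the cell ends with a
    statement such as an assignment.
    """
    tree = ast.parse(source)
    last_value = None
    if tree.body and isinstance(tree.body[-1], ast.Expr):
        # Split off the final expression so its value can be returned.
        last_expr = ast.Expression(body=tree.body.pop().value)
        exec(compile(tree, "<cell>", "exec"), namespace)
        last_value = eval(compile(last_expr, "<cell>", "eval"), namespace)
    else:
        exec(compile(tree, "<cell>", "exec"), namespace)
    return last_value


# One namespace plays the role of the kernel's state across cells.
session = {}
first_output = run_cell("x = 20", session)    # assignment: no output
second_output = run_cell("x + 22", session)   # expression: has a value
print(first_output, second_output)
```

Because both calls share the same namespace, the variable defined in the first "cell" is still available in the second, which mirrors how state persists between cells of one notebook.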

Installing Python

  • The Anaconda Distribution (https://www.anaconda.com/downloads) includes Python, the Jupyter Notebook, and other commonly used packages for scientific computing and data science
  • Install the Python 3 version!
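
After installation, one quick way to confirm that the interpreter in use is Python 3 is to inspect sys.version_info from the standard library. The helper name is_python3 below is made up for this sketch; the snippet can be run in a notebook cell or in a plain Python session.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True if the given version tuple has major version 3.

    is_python3 is a hypothetical helper for this check; by default it
    inspects the interpreter that is running the code.
    """
    return version_info[0] == 3


# Show the full version string and the result of the check.
print(sys.version)
print("Running Python 3:", is_python3())
```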