Wed, Nov 27, 2013 -- pandas, SciPy, and matplotlib for data analysis, statistics, and plotting
##############################################################################################

Installing pandas on EC2
========================

pandas is not installed by default on EC2, so we need to install it ourselves. The commands in this section install software system-wide, so run them as root (or prefix them with ``sudo``). First, update your EC2 node::

    # update the apt-get package index
    apt-get update

    # prerequisites
    apt-get install build-essential gfortran gcc g++ curl wget python-dev

    # make sure you have the latest setuptools
    wget https://bitbucket.org/pypa/setuptools/raw/bootstrap/ez_setup.py -O - | python2.7

    # get pip
    curl --show-error --retry 5 https://raw.github.com/pypa/pip/master/contrib/get-pip.py | python2.7

pandas is built on top of NumPy, so next we need to upgrade NumPy to the latest version (1.8.0 at the time of writing). Unfortunately, the automatic upgrade tools don't seem to work properly on EC2, so we need to update it manually::

    # download NumPy 1.8.0
    wget --no-check-certificate https://pypi.python.org/packages/source/n/numpy/numpy-1.8.0.zip#md5=6c918bb91c0cfa055b16b13850cfcd6e

    # unzip the NumPy archive and move into its directory
    unzip numpy-1.8.0.zip
    cd numpy-1.8.0/

    # build NumPy
    python setup.py build

    # install NumPy
    python setup.py install

    # move back to the home directory and clean up
    cd ~/
    rm -r numpy-1.8.0/
    rm numpy-1.8.0.zip

Finally, we can install pandas::

    pip install pandas

You should now be able to use pandas on your EC2 nodes. Try it out by downloading some test data::

    wget https://raw.github.com/rhiever/ipython-notebook-workshop/master/parasite_data.csv

Then, in Python::

    # import pandas and read the test data
    import pandas as pd
    test_data = pd.read_csv("parasite_data.csv")

    # print the whole data set, then just the "Virulence" column
    print(test_data)
    print(test_data["Virulence"])

pandas, SciPy, and matplotlib
=============================

We will go over some tutorials on pandas, SciPy, and matplotlib.


- pandas & SciPy tutorial: http://www.randalolson.com/2012/08/06/statistical-analysis-made-easy-in-python/
- pandas video tutorial: http://vimeo.com/59324550
- matplotlib tutorial: http://matplotlib.org/users/pyplot_tutorial.html
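As a rough idea of the kind of analysis the pandas & SciPy tutorial covers, here is a minimal sketch, assuming pandas and SciPy are installed. The data frame is a hypothetical stand-in for ``parasite_data.csv``: only the ``Virulence`` column name comes from this page, while the ``Diversity`` column and all of the values are invented for illustration. The two-sample t-test (``scipy.stats.ttest_ind``) is just one example of a SciPy statistical test, not necessarily the one the tutorial uses.

```python
import pandas as pd
from scipy import stats

# Hypothetical stand-in for parasite_data.csv: "Virulence" is the column name
# used above, but the "Diversity" column and every value are made up.
data = pd.DataFrame({
    "Virulence": [0.5, 0.5, 0.5, 0.5, 0.8, 0.8, 0.8, 0.8],
    "Diversity": [1.2, 1.4, 1.1, 1.3, 2.0, 2.2, 1.9, 2.1],
})

# Summary statistics (count, mean, std, min, quartiles, max) for each column
summary = data.describe()

# Mean diversity for each virulence level
group_means = data.groupby("Virulence")["Diversity"].mean()

# Split the diversity values into one group per virulence level
low = data.loc[data["Virulence"] == 0.5, "Diversity"]
high = data.loc[data["Virulence"] == 0.8, "Diversity"]

# Two-sample t-test: do the two groups differ in mean diversity?
t_stat, p_value = stats.ttest_ind(low, high)

print(summary)
print(group_means)
print("t = %.3f, p = %.4g" % (t_stat, p_value))
```

Working with the real file should only require replacing the hypothetical data frame with ``pd.read_csv("parasite_data.csv")`` and adjusting the column names and group values to match that file.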