PySpark with Spark 2.1.0 - Python cannot be version 3.6

To get started using Spark with Python, you can:

1. Install Anaconda Python, which includes all the packages you need.

2. Download Spark and unzip it into a folder.

3. Once everything is set up, create and activate a Python 3.5 environment (Spark 2.1.0 does not support Python 3.6):

conda create -n py35 python=3.5 anaconda

activate py35

4. Go to your Spark installation folder, go to "bin", and run "pyspark".
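If you would rather not change into the Spark folder every time, you can put Spark's "bin" on your PATH and tell Spark which Python to use. This is a sketch for Windows (the j:\ path below suggests a Windows box); the install path is an assumption, so adjust it to wherever you unzipped Spark:

```shell
:: Assumed install location - change to match your unzipped Spark folder
set SPARK_HOME=C:\spark-2.1.0-bin-hadoop2.7
set PATH=%SPARK_HOME%\bin;%PATH%

:: Make Spark use the active conda environment's interpreter
set PYSPARK_PYTHON=python

:: Now pyspark can be launched from any folder
pyspark
```

These `set` commands only last for the current console session; for a permanent setup you would add them as system environment variables.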

5. You will probably see some warnings or exceptions, but you should still be able to run the following script:

from pyspark import SparkContext

# Reuse the shell's existing context if there is one, otherwise create a new one
sc = SparkContext.getOrCreate()
tf = sc.textFile("j:\\tmp\\data.txt")

Please make sure the path to "data.txt" points at a file that actually exists.
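Once the file loads, a couple of actions such as count() and filter() confirm the RDD works. Since running PySpark needs the full Spark setup above, here is the same line-counting logic sketched in plain Python, just to show what those RDD calls compute; the file name and contents here are made-up stand-ins for data.txt:

```python
import os
import tempfile

# Write a tiny sample file standing in for data.txt (contents are an assumption)
sample = "spark makes big data simple\nspark runs on the jvm\n"
path = os.path.join(tempfile.gettempdir(), "data.txt")
with open(path, "w") as f:
    f.write(sample)

# sc.textFile(path) conceptually yields one element per line of the file
with open(path) as f:
    lines = f.read().splitlines()

# Equivalent of tf.count()
total = len(lines)

# Equivalent of tf.filter(lambda l: "spark" in l).count()
spark_lines = len([l for l in lines if "spark" in l])

print(total)        # 2
print(spark_lines)  # 2
```

In the real pyspark shell the equivalent calls would be `tf.count()` and `tf.filter(lambda l: "spark" in l).count()`.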

This setup is harder than it looks. I spent a lot of time today getting it up and running.

