Spark: reading Excel files with PySpark
A common mistake is loading the wrong jar when reading Excel files: the spark-excel artifact must match the Scala version your Spark build was compiled against. Spark 2.x pre-built distributions use Scala 2.11, so you have to use the `_2.11` artifact and not `_2.12`. :) Start pyspark with the package on the command line:

```shell
pyspark --packages com.crealytics:spark-excel_2.11:0.11.1
```

Then use the following code to load an Excel file from a `data` folder. If you have not created this folder, please create it and place an Excel file in it. Note that a line like `from com.crealytics.spark.excel import *` does not work: that is a Java/Scala package path, not a Python module. In PySpark the connector is selected purely by the format string passed to `spark.read.format(...)`.

```python
## Using spark-submit to execute the script from the command line
## (the full Maven coordinate, including the com.crealytics group id, is required):
## spark-submit --packages com.crealytics:spark-excel_2.11:0.11.1 excel_email_datapipeline.py
## Or interactively:
## pyspark --packages com.crealytics:spark-excel_2.11:0.11.1
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("excel-email-pipeline").getOrCreate()

df = (
    spark.read.format("com.crealytics.spark.excel")
    .option("useHeader", "true")    # first row holds the column names
    .option("inferSchema", "true")  # infer column types from the cell values
    .load("data/sample.xlsx")       # example file name; point this at your own file
)
```
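To see why the `_2.11` vs `_2.12` distinction matters, it helps to look at how the Maven coordinate is assembled: the suffix after the artifact name is the Scala major.minor version the jar was built for, and it must match your Spark distribution's Scala version. Here is a small, hypothetical helper (not part of spark-excel; `spark_excel_coordinate` is a name invented for this sketch) that builds the coordinate from a Scala version string:

```python
def spark_excel_coordinate(scala_version: str, package_version: str = "0.11.1") -> str:
    """Build the Maven coordinate for spark-excel matching a given Scala version.

    The artifact name encodes only the Scala major.minor version (e.g. 2.11),
    so a full version like "2.11.12" is trimmed to its first two components.
    """
    major_minor = ".".join(scala_version.split(".")[:2])
    return f"com.crealytics:spark-excel_{major_minor}:{package_version}"

# Spark 2.x pre-built distributions ship with Scala 2.11:
print(spark_excel_coordinate("2.11.12"))  # com.crealytics:spark-excel_2.11:0.11.1
```

Passing the result to `--packages` guarantees the group id, artifact, and version are all present; the shortened form without the `com.crealytics:` group id will not resolve.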