Spark - PySpark reading from Excel files
A common mistake is loading the wrong jar when reading Excel files: the Scala version in the spark-excel artifact name must match the Scala version your Spark build uses. For a Spark build on Scala 2.11, use the _2.11 artifact, not _2.12.
You can start pyspark with the package on the classpath:
pyspark --packages com.crealytics:spark-excel_2.11:0.11.1
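If you are unsure which Scala version your Spark build uses (and therefore which artifact suffix to pick), Spark prints it in its version banner. A quick check, assuming spark-submit is on your PATH:

```shell
# Prints a banner including a line like:
#   Using Scala version 2.11.12, ...
# Pick the spark-excel artifact whose suffix (_2.11 or _2.12) matches.
spark-submit --version
```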
Then use the following code to load an Excel file from a data folder. If that folder does not exist yet, create it and place an Excel file inside.
## run the script with spark-submit:
##   spark-submit --packages com.crealytics:spark-excel_2.11:0.11.1 excel_email_datapipeline.py
## or start an interactive shell:
##   pyspark --packages com.crealytics:spark-excel_2.11:0.11.1
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("excel-email-pipeline").getOrCreate()

# the data source is selected by its fully qualified format name;
# no Python import of the spark-excel package is needed (or possible)
df = (spark.read.format("com.crealytics.spark.excel")
      .option("useHeader", "true")      # first row holds the column names
      .option("inferSchema", "true")    # infer column types from the data
      .load("data/excel.xlsx"))
df.show()
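The same data source can also write a DataFrame back out as an Excel file. A minimal sketch, assuming the spark-excel 0.11.x jar is on the classpath and that this version accepts the sheetName and useHeader write options (check the spark-excel README for the option names your version supports); the output path is a hypothetical example:

```python
# requires launching with --packages com.crealytics:spark-excel_2.11:0.11.1
(df.write.format("com.crealytics.spark.excel")
   .option("sheetName", "Sheet1")   # assumed option name for the target sheet
   .option("useHeader", "true")     # write column names as the first row
   .mode("overwrite")               # replace the file if it already exists
   .save("data/excel_out.xlsx"))    # hypothetical output path
```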