spark - pyspark reading from excel files


A common mistake is loading the wrong jar when reading Excel files: the Scala version in the package name must match the Scala version your Spark build uses. For a Spark build on Scala 2.11, use the `_2.11` artifact, not `_2.12`.



You can start pyspark with the package from the command line:


pyspark --packages com.crealytics:spark-excel_2.11:0.11.1



Then use the following code to load an Excel file from a `data` folder. If that folder does not exist yet, create it and place an Excel file named `excel.xlsx` in it.


## No Python import is needed for spark-excel; it is a JVM package,
## and passing "com.crealytics.spark.excel" to spark.read.format is enough.

## To run this script with spark-submit instead of the pyspark shell:
## spark-submit --packages com.crealytics:spark-excel_2.11:0.11.1 excel_email_datapipeline.py

## Or start an interactive shell with the package loaded:
## pyspark --packages com.crealytics:spark-excel_2.11:0.11.1

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("excel-email-pipeline").getOrCreate()

df = (
    spark.read.format("com.crealytics.spark.excel")
    .option("useHeader", "true")    # treat the first row as column names
    .option("inferSchema", "true")  # infer column types from the data
    .load("data/excel.xlsx")
)

df.show()





