GCP: running serverless Dataproc with a hello-world Python script
PySpark Pi calculation

To get things started, we need some PySpark code. Below is a simple `pi.py` (the classic Monte Carlo Pi example that ships with Apache Spark); save it locally and then upload it to your Google Cloud Storage bucket.

```python
import sys
from random import random
from operator import add

from pyspark.sql import SparkSession

if __name__ == "__main__":
    """
    Usage: pi [partitions]
    """
    spark = SparkSession \
        .builder \
        .appName("PythonPi") \
        .getOrCreate()

    partitions = int(sys.argv[1]) if len(sys.argv) > 1 else 2
    n = 100000 * partitions

    def f(_: int) -> float:
        # Sample a point in the 2x2 square; count it if it lands inside the unit circle.
        x = random() * 2 - 1
        y = random() * 2 - 1
        return 1 if x ** 2 + y ** 2 <= 1 else 0

    count = spark.sparkContext.parallelize(range(1, n + 1), partitions) \
        .map(f) \
        .reduce(add)
    print("Pi is roughly %f" % (4.0 * count / n))

    spark.stop()
```
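The `f` function above is a Monte Carlo estimator: the fraction of random points in the 2x2 square that fall inside the unit circle approaches π/4. A minimal local sketch of the same idea, without Spark (the seed and sample count here are just illustrative choices):

```python
from random import random, seed

def inside_unit_circle(_: int) -> int:
    # Sample a point uniformly in [-1, 1] x [-1, 1];
    # return 1 if it falls inside the unit circle.
    x = random() * 2 - 1
    y = random() * 2 - 1
    return 1 if x * x + y * y <= 1 else 0

seed(42)           # fixed seed so the run is reproducible
n = 100_000
count = sum(inside_unit_circle(i) for i in range(n))
pi_estimate = 4.0 * count / n
print(pi_estimate)  # close to 3.14 for large n
```

Spark simply distributes the same sampling across partitions and sums the hits with `reduce(add)`.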
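To actually upload the script and run it serverlessly, the flow might look like the following. The bucket name `gs://your-bucket` and the region `us-central1` are placeholders, and this assumes your project has the Dataproc API enabled:

```shell
# Upload the script to your bucket (bucket name is a placeholder).
gsutil cp pi.py gs://your-bucket/pi.py

# Submit it as a Dataproc Serverless batch (region is an example).
# Arguments after "--" are passed to the script itself,
# here the optional [partitions] argument.
gcloud dataproc batches submit pyspark gs://your-bucket/pi.py \
    --region=us-central1 \
    -- 10
```

When the batch finishes, the "Pi is roughly ..." line appears in the batch's driver output in the Cloud Console.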