Using bq to create a partitioned sales order table


Generate Test Sales Data

To quickly generate test data for my BigQuery table, partitioned by sales_date, I exported one day of rows from a public dataset to CSV with the following command:

bq query \
  --use_legacy_sql=false \
  --format=csv \
  'SELECT
     order_id, product_id, user_id, inventory_item_id,
     sale_price, status, DATE(created_at) AS sales_date
   FROM `bigquery-public-data.thelook_ecommerce.order_items`
   WHERE DATE(created_at) = "2025-09-23"' \
  > sales-20250923.csv

There are some references here if you would like to know more about partitioned tables.

Creating the partitioned table

Then I created the partitioned table with the bq utility:

bq mk \
  --table \
  --time_partitioning_field=sales_date \
  thelooker.sales_order_items \
  order_id:INT64,product_id:INT64,user_id:INT64,inventory_item_id:INT64,sale_price:FLOAT,status:STRING,sales_date:DATE

Using SQL 


CREATE TABLE `secondary-471605.thelooker.sales_order_items`
(
  order_id INT64,
  product_id INT64,
  user_id INT64,
  inventory_item_id INT64,
  sale_price FLOAT64,
  status STRING,
  sales_date DATE
)
PARTITION BY sales_date
OPTIONS (
  description = "Partitioned sales order items table (daily partition by sales_date)"
);
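As an optional extra (not part of the original setup), bq mk can also set a partition expiration so old daily partitions age out automatically; the table name below is hypothetical:

```shell
# Same schema, but each partition is auto-deleted 90 days after its date
# (7776000 seconds = 90 * 86400)
bq mk \
  --table \
  --time_partitioning_field=sales_date \
  --time_partitioning_expiration=7776000 \
  thelooker.sales_order_items_ttl \
  order_id:INT64,product_id:INT64,user_id:INT64,inventory_item_id:INT64,sale_price:FLOAT,status:STRING,sales_date:DATE
```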

Creating the staging table

Using the bq utility

bq mk --table thelooker.staging_sales_order_items \
  order_id:INT64,product_id:INT64,user_id:INT64,inventory_item_id:INT64,sale_price:FLOAT,status:STRING,sales_date:DATE

Using SQL

CREATE TABLE `secondary-471605.thelooker.staging_sales_order_items`
(
  order_id INT64,
  product_id INT64,
  user_id INT64,
  inventory_item_id INT64,
  sale_price FLOAT64,
  status STRING,
  sales_date DATE
)
OPTIONS (
  description = "Temporary staging table for daily sales data load"
);
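Since the staging table is throwaway, one option (an assumption on my part, not from the original setup) is to give it an expiration so BigQuery drops it automatically; the `_tmp` name is hypothetical:

```sql
-- Hypothetical staging table that BigQuery deletes automatically after 1 day
CREATE TABLE `secondary-471605.thelooker.staging_sales_order_items_tmp`
(
  order_id INT64,
  product_id INT64,
  user_id INT64,
  inventory_item_id INT64,
  sale_price FLOAT64,
  status STRING,
  sales_date DATE
)
OPTIONS (
  expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
);
```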


Truncate the staging table first, just in case it still holds data from a previous load:

bq query --use_legacy_sql=false \
'TRUNCATE TABLE `thelooker.staging_sales_order_items`'


Next, load the data from Google Cloud Storage using the following command:

bq load \
  --source_format=CSV \
  --autodetect \
  --replace \
  thelooker.staging_sales_order_items \
  gs://bucket/thelooker/sales-data/20250922/sales-20250922.csv


Then move the data from staging to production. The MERGE below is an upsert keyed on order_id within the partition: matching rows are updated in place and new rows are inserted. It does not lock or rewrite other partitions, but note that it also does not delete rows that exist in production and are missing from staging.


MERGE `secondary-471605.thelooker.sales_order_items` T
USING `secondary-471605.thelooker.staging_sales_order_items` S
ON T.sales_date = S.sales_date AND T.order_id = S.order_id
WHEN MATCHED THEN
  UPDATE SET
    product_id = S.product_id,
    user_id = S.user_id,
    inventory_item_id = S.inventory_item_id,
    sale_price = S.sale_price,
    status = S.status
WHEN NOT MATCHED THEN
  INSERT (order_id, product_id, user_id, inventory_item_id, sale_price, status, sales_date)
  VALUES (order_id, product_id, user_id, inventory_item_id, sale_price, status, sales_date);


Alternatively, to replace the whole partition atomically (delete the existing rows for that date and insert the new ones from staging in a single statement), you can use:

MERGE `secondary-471605.thelooker.sales_order_items` T
USING (
  SELECT * FROM `secondary-471605.thelooker.staging_sales_order_items`
  WHERE sales_date = '2024-06-01'
) S
ON FALSE  -- Never match → always go to WHEN NOT MATCHED
WHEN NOT MATCHED THEN
  INSERT ROW
WHEN NOT MATCHED BY SOURCE AND T.sales_date = '2024-06-01' THEN
  DELETE;
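If the daily file is already in Cloud Storage, you can also skip the staging table entirely: loading into the partition decorator (table$YYYYMMDD) with --replace truncates and reloads just that one partition, leaving the others untouched. A sketch, assuming the same bucket layout as above:

```shell
# Replace only the 2025-09-22 partition; quoting the target stops the
# shell from expanding the $ as a variable. The header row is skipped
# because the schema comes from the existing table, not autodetect.
bq load \
  --source_format=CSV \
  --skip_leading_rows=1 \
  --replace \
  'thelooker.sales_order_items$20250922' \
  gs://bucket/thelooker/sales-data/20250922/sales-20250922.csv
```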

In Azure Synapse, you can do something like:

-- Azure Synapse / SQL Server style (conceptual)
ALTER TABLE staging_sales_order_items
SWITCH PARTITION $PARTITION.pf_date('2024-06-01')
TO sales_order_items PARTITION $PARTITION.pf_date('2024-06-01');

This is a metadata-only, instantaneous, zero-copy operation — it just swaps pointers under the hood. Super fast, no data movement, no locks.






To remove a partition
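bq rm accepts the same partition decorator, so a single day can be dropped without touching the rest of the table (the date below is just an example):

```shell
# Delete only the 2025-09-23 partition
bq rm --table 'thelooker.sales_order_items$20250923'

# Or drop the entire partitioned table
bq rm --table thelooker.sales_order_items
```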

To show table partitions

You can use the following query to list the partitions in your table:


SELECT
  partition_id,
  total_rows,
  last_modified_time
FROM `projectid.thelooker.INFORMATION_SCHEMA.PARTITIONS`
WHERE table_name = 'sales_order_items'
ORDER BY partition_id;
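To confirm that queries actually prune down to a single partition, a dry run reports the bytes a query would scan without running it; a sketch, assuming the table above:

```shell
# --dry_run prints the bytes the query would process; with a filter on
# sales_date it should be roughly one day's worth, not the whole table.
bq query \
  --use_legacy_sql=false \
  --dry_run \
  'SELECT COUNT(*)
   FROM `thelooker.sales_order_items`
   WHERE sales_date = "2025-09-23"'
```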



