SQL and Spark examples

Common syntax used in Azure Synapse

Azure Synapse supports multiple execution models. The examples below show common SQL and Spark patterns used for querying, loading, and transforming data.

Serverless SQL query over files

SELECT TOP 100 *
FROM OPENROWSET(
    BULK 'https://storageaccount.dfs.core.windows.net/raw/sales/*.parquet',
    FORMAT = 'PARQUET'
) AS sales;

Create an external table

CREATE EXTERNAL TABLE dbo.SalesExternal
WITH (
    LOCATION = 'curated/sales/',
    DATA_SOURCE = DataLakeSource,
    FILE_FORMAT = ParquetFormat
)
AS SELECT * FROM dbo.SalesStaging;

Dedicated SQL pool table

CREATE TABLE dbo.FactSales
(
    SaleId bigint NOT NULL,
    CustomerId int NOT NULL,
    SaleDate date NOT NULL,
    Amount decimal(18,2) NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(CustomerId),
    CLUSTERED COLUMNSTORE INDEX
);

Spark notebook transformation

df = spark.read.parquet('/mnt/raw/sales')

clean = df.dropDuplicates(['SaleId']) \
          .filter('Amount > 0')

clean.write.mode('overwrite') \
     .parquet('/mnt/curated/sales')

Syntax note

Exact syntax can vary depending on whether you use serverless SQL pools, dedicated SQL pools, Spark notebooks, or pipeline activities. Always match the command to the compute engine and data format in your Synapse workspace.