Untitled
unknown
plain_text
a year ago
6.0 kB
2
Indexable
Azure Synapse Analytics is an integrated analytics service that brings together a wide range of commonly used technologies for processing and analyzing data at scale. One of the most prevalent technologies used in data solutions is SQL - an industry standard language for querying and manipulating data. we'll focus on serverless SQL pool, which provides a pay-per-query endpoint to query the data in your data lake. The benefits of using serverless SQL pool include: 1.A familiar Transact-SQL syntax to query data in place without the need to copy or load data into a specialized store. 2.Integrated connectivity from a wide range of business intelligence and ad-hoc querying tools, including the most popular drivers. 3.Distributed query processing that is built for large-scale data, and computational functions - resulting in fast query performance. 4.Built-in query execution fault-tolerance, resulting in high reliability and success rates even for long-running queries involving large data sets. 5.No infrastructure to setup or clusters to maintain. A built-in endpoint for this service is provided within every Azure Synapse workspace, so you can start querying data as soon as the workspace is created. 6.No charge for resources reserved, you're only charged for the data processed by queries you run. Use cases of Serverless Sql Pools Data exploration: Data exploration involves browsing the data lake to get initial insights about the data, and is easily achievable with Azure Synapse Studio. You can browse through the files in your linked data lake storage, and use the built-in serverless SQL pool to automatically generate a SQL script to select TOP 100 rows from a file or folder just as you would do with a table in SQL Server. From there, you can apply projections, filtering, grouping, and most of the operation over the data as if the data were in a regular SQL Server table. Data transformation: While Azure Synapse Analytics provides great data transformations capabilities with Synapse Spark, some data engineers might find data transformation easier to achieve using SQL. Serverless SQL pool enables you to perform SQL-based data transformations; either interactively or as part of an automated data pipeline. Logical data warehouse: After your initial exploration of the data in the data lake, you can define external objects such as tables and views in a serverless SQL database. The data remains stored in the data lake files, but are abstracted by a relational schema that can be used by client applications and analytical tools to query the data as they would in a relational database hosted in SQL Server. CSV files as we are already familiar with, consists of delimiter "," that separates values in a row.(Row Storage JSON Files are text-based data format, it's syntax is somewhat like a javascript object literal as key-value pairs but can be used independently from Javascript. Parquet File are hybrid storage layout to store data.It stores the metadata of the file as well.It uses this data to increase read speed. parquet files->Row groups->columns->data page(Raw data stored here) Query Files in data lake using serverless pool We use OPENROWSET function It is a T-SQL function used to reading data from many sources. It employs the BULK rowset provider to read data from a file. The FORMAT provider helps to specify the format of file being queired. External Database Objects are used to reference or create connection between data sources and hide the details such as credentials or security protocols. They also help in controlling access of stored data to other parties. We have External data Source - To encapsulate the connection of stored location to a table. External File format to specify the files being imported and define the file format. External table are used as views to reference the data stored in data lake. Transformation of Data In Sql Serverless Pool A simple way to use SQL to transform data in a file and persist the results in another file is to use a CREATE EXTERNAL TABLE AS SELECT (CETAS) statement. This statement creates a table based on the requests of a query, but the data for the table is stored as files in a data lake. The transformed data can then be queried through the external table, or accessed directly in the file system Lake Database In comparison In a data lake, there is no fixed schema. Data is stored in files, which may be structured, semi-structured, or unstructured. A lake database provides a relational metadata layer over one or more files in a data lake. You can create a lake database that includes definitions for tables, including column names and data types as well as relationships between primary and foreign key columns. The tables reference files in the data lake, enabling you to apply relational semantics to working with the data and querying it using SQL. However, the storage of the data files is decoupled from the database schema; enabling more flexibility than a relational database system typically offers. Creation of Lake Database You can create a lake database using the lake database designer in Azure Synapse Studio. Start by adding a new lake database on the Data page, selecting a template from the gallery or starting with a blank lake database; and then add and customize tables using the visual database designer interface. As you create each table, you can specify the type and location of the files you want to use to store the underlying data, or you can create a table from existing files that are already in the data lake. In most cases, it's advisable to store all of the database files in a consistent format within the same root folder in the data lake. Database designer The database designer interface in Azure Synapse Studio provides a drag-and-drop surface on which you can edit the tables in your database and the relationships between them.
Editor is loading...
Leave a Comment