Data Storage in Azure
By Rajeev Sharma | May 31, 2021 5 minute read
Structured Data
In relational database systems like Microsoft SQL Server, Azure SQL Database, & Azure SQL Data Warehouse, the data structure is defined at design time, in the form of tables. This design is done before any information is loaded into the system. The data structure also includes the relational model, table structure, column width, & data types.
Relational systems react slowly to changes in data requirements because the database structure needs to change every time a data requirement changes. When new columns are added, you might need to bulk-update all existing records to populate the new column throughout the table.
Relational systems typically use a querying language such as Transact-SQL (T-SQL).
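The schema-first workflow above can be sketched in a few lines. This is a minimal illustration, assuming Python's built-in sqlite3 module as a stand-in for a relational engine (SQLite's dialect is not T-SQL), with a hypothetical patients table & column names:

```python
import sqlite3

# In-memory database; SQLite stands in for a relational engine here (assumption).
conn = sqlite3.connect(":memory:")

# Design time: table structure and data types are fixed before any data is loaded.
conn.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany("INSERT INTO patients (name) VALUES (?)", [("Ana",), ("Ben",)])

# A new data requirement means changing the structure...
conn.execute("ALTER TABLE patients ADD COLUMN country TEXT")

# ...and bulk-updating every existing record to populate the new column.
conn.execute("UPDATE patients SET country = 'unknown' WHERE country IS NULL")

rows = conn.execute("SELECT id, name, country FROM patients ORDER BY id").fetchall()
print(rows)
```

On a large table, the UPDATE step is the expensive part, which is why schema changes in relational systems tend to be slow.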
Non-structured / Unstructured Data
Non-structured data includes binary, audio, & image files. Non-structured data is stored in nonrelational systems, commonly called unstructured or NoSQL systems.
In nonrelational systems, the data structure isn't defined at design time, & data is typically loaded in its raw format. The data structure is defined only when the data is read.
The difference in the definition point gives you flexibility to use the same source data for different outputs. Nonrelational systems can also support semistructured data such as JSON file formats.
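Defining the structure only at read time can be sketched as follows. This is an illustrative example, not a specific Azure API: the JSON records, field names, & reader functions are all hypothetical.

```python
import json

# Raw data is stored as-is; no structure is imposed when it is written.
raw_lines = [
    '{"device": "a1", "temp_c": 21.5, "ts": "2021-05-31T10:00:00"}',
    '{"device": "b2", "temp_c": 19.0, "ts": "2021-05-31T10:05:00", "battery": 0.8}',
]

def read_temperatures(lines):
    """One reader: project each record to (device, temperature)."""
    return [(r["device"], r["temp_c"]) for r in map(json.loads, lines)]

def read_battery_levels(lines):
    """Another reader over the same source; missing fields become None."""
    return {r["device"]: r.get("battery") for r in map(json.loads, lines)}

temperatures = read_temperatures(raw_lines)
batteries = read_battery_levels(raw_lines)
print(temperatures)
print(batteries)
```

Both readers use the same source data but produce different outputs, & a record with an extra field needs no change to the stored data.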
There are 4 types of NoSQL databases:
- Key-value store
- Document database
- Graph database
- Column database
Key-value store :- Stores key-value pairs of data in a table structure.
Document database :- Stores documents that are tagged with metadata to aid document searches.
Graph database :- Finds relationships between data points by using a structure that's composed of vertices & edges.
Column database :- Stores data based on columns rather than rows. Columns can be defined at the query's runtime, which gives flexibility in the data that's returned while keeping queries performant.
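To make the four models concrete, here is a rough sketch that represents similar data in each shape using plain Python structures. The data, names, & helper functions are hypothetical; real NoSQL engines add indexing, distribution, & query languages on top of these ideas.

```python
# Key-value store: an opaque value looked up by key.
key_value = {"user:42": '{"name": "Ana", "city": "Pune"}'}

# Document database: documents tagged with metadata that aids searches.
documents = [
    {"id": 42, "tags": ["customer", "india"], "body": {"name": "Ana", "city": "Pune"}},
    {"id": 43, "tags": ["customer", "uk"], "body": {"name": "Ben", "city": "Leeds"}},
]

# Graph database: vertices plus edges that describe relationships.
vertices = {"ana", "ben", "contoso"}
edges = [("ana", "works_at", "contoso"), ("ben", "works_at", "contoso")]

# Column database: values stored per column rather than per row.
columns = {"name": ["Ana", "Ben"], "city": ["Pune", "Leeds"]}

def find_by_tag(tag):
    """Document search driven by metadata tags."""
    return [d["id"] for d in documents if tag in d["tags"]]

def colleagues_of(person):
    """Follow works_at edges to find people who share an employer."""
    employers = {dst for src, rel, dst in edges if src == person and rel == "works_at"}
    return sorted(
        src for src, rel, dst in edges
        if rel == "works_at" and dst in employers and src != person
    )

print(key_value["user:42"])
print(find_by_tag("india"))
print(colleagues_of("ana"))
print(columns["city"])  # reads one column without touching the others
```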
Understand Data Storage in Azure
Azure Storage accounts are the base storage type within Azure. Azure Storage offers a very scalable object store for data objects & a file system service in the cloud. It can also provide a messaging store for reliable messaging, or it can act as a NoSQL store.
Azure Storage offers 4 configuration options:
- Azure Blob
- Azure Files
- Azure Queue
- Azure Table
Azure Blob :- A scalable object store for text & binary data.
Azure Files :- Managed file shares for cloud or on-premises deployments.
Azure Queue :- A messaging store for reliable messaging between applications.
Azure Table :- A NoSQL store for schemaless storage of structured data.
You can use Azure Storage as the storage basis when you're provisioning a data platform technology such as Azure Data Lake Storage & HDInsight. But you can also provision Azure Storage for standalone use.
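Each of the four services is exposed under its own default public endpoint on the storage account. The sketch below builds those endpoint URLs; the account name examplestore is a hypothetical placeholder, & accounts that use custom domains or sovereign clouds will have different hosts.

```python
# Service name -> subdomain used in the default public endpoint.
SERVICES = {
    "Azure Blob": "blob",
    "Azure Files": "file",
    "Azure Queue": "queue",
    "Azure Table": "table",
}

def default_endpoints(account_name):
    """Build the default public endpoint URL for each storage service."""
    return {
        service: f"https://{account_name}.{subdomain}.core.windows.net"
        for service, subdomain in SERVICES.items()
    }

endpoints = default_endpoints("examplestore")  # hypothetical account name
for service, url in endpoints.items():
    print(f"{service}: {url}")
```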
When to use Azure Blob Storage
If you need to provision a data store that will store but not query data, your cheapest option is to set up a storage account as a Blob store. Blob storage works well with images & unstructured data, & it's the cheapest way to store data in Azure.
Key features
- Azure Storage accounts are scalable, secure, durable, & highly available. Azure handles your hardware maintenance, updates, & critical issues.
- Azure Storage provides REST APIs & SDKs in various languages.
- Supported languages include .NET, Java, Node.js, Python, PHP, Ruby, & Go.
- Azure Storage also supports scripting in Azure PowerShell & the Azure CLI.
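As a rough illustration of the REST API mentioned above, the sketch below builds (but does not send) a request that would upload a block blob. It uses only the Python standard library rather than the Azure SDK. The account, container, blob name, & SAS token are hypothetical placeholders, & the helper name is an assumption made for this example.

```python
import urllib.parse
import urllib.request

def build_put_blob_request(account, container, blob_name, data, sas_token):
    """Build an HTTP PUT request for uploading a block blob (not sent here)."""
    path = urllib.parse.quote(f"{container}/{blob_name}")
    url = f"https://{account}.blob.core.windows.net/{path}?{sas_token}"
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        # The blob type header tells the service to create a block blob.
        headers={"x-ms-blob-type": "BlockBlob"},
    )

req = build_put_blob_request(
    account="examplestore",                      # hypothetical account
    container="images",                          # hypothetical container
    blob_name="logo.png",
    data=b"fake image bytes",
    sas_token="sv=PLACEHOLDER&sig=PLACEHOLDER",  # not a real token
)
print(req.get_method(), req.full_url)
```

In practice, the language SDKs wrap these calls & handle authentication, retries, & chunking, so they are usually the better choice for application code.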
Data ingestion
- To ingest data into your system, use any of the following:
  - Azure Data Factory
  - Storage Explorer
  - The AzCopy tool
  - PowerShell
  - Visual Studio
- To import files larger than 2 GB, use PowerShell or Visual Studio rather than the File Upload feature.
- AzCopy supports a maximum file size of 1 TB & automatically splits data files that exceed 200 GB.
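The size thresholds above can be captured in a small helper that reports which notes apply to a given file. This is only a sketch of the rules as stated in this post: the function name is hypothetical, & treating GB & TB as binary units is an assumption.

```python
# Size limits quoted in the text above; binary units are an assumption.
GB = 1024 ** 3
TB = 1024 ** 4

def ingestion_notes(file_size_bytes):
    """Return the notes from the text that apply to a file of this size."""
    notes = []
    if file_size_bytes > 2 * GB:
        notes.append("Larger than 2 GB: use PowerShell or Visual Studio.")
    if file_size_bytes > 1 * TB:
        notes.append("Larger than 1 TB: exceeds the AzCopy maximum file size.")
    elif file_size_bytes > 200 * GB:
        notes.append("Larger than 200 GB: AzCopy splits the file automatically.")
    return notes

for size in (500 * 1024 ** 2, 5 * GB, 300 * GB):
    print(size, ingestion_notes(size))
```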
Queries
- If you create a storage account as a Blob store, you can't query the data directly.
- To query it, either move the data to a store that supports queries or set up the Azure Storage account as a Data Lake Storage account.
Data Security
- Azure Storage encrypts all data that's written to it.
- Azure Storage also provides you with fine-grained control over who has access to your data.
- You secure the data by using keys or shared access signatures.
- Azure Resource Manager provides a permissions model that uses role-based access control (RBAC). Use this functionality to set permissions & assign roles to users, groups, or applications.
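Shared access signatures rest on a keyed hash: a string describing the granted access is signed with HMAC-SHA256 using the storage account key, & the service recomputes the signature to verify it. The sketch below shows only that signing step. The key is fake, & the string-to-sign is a simplified placeholder, not the real field layout, which depends on the service version.

```python
import base64
import hashlib
import hmac

def sign(account_key_b64, string_to_sign):
    """HMAC-SHA256 over the string-to-sign, keyed by the base64-decoded account key."""
    key = base64.b64decode(account_key_b64)
    digest = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")

# Fake key for illustration only; real keys come from the storage account.
fake_key = base64.b64encode(b"not-a-real-account-key").decode("ascii")

# Simplified placeholder; the real string-to-sign has more fields in a fixed order.
string_to_sign = "r\n2021-06-30T00:00:00Z\n/blob/examplestore/images/logo.png"

signature = sign(fake_key, string_to_sign)
tampered = sign(fake_key, string_to_sign.replace("r\n", "rw\n", 1))
print(signature)
print(hmac.compare_digest(signature, tampered))  # False: changing permissions changes the signature
```

Because the signature covers the permissions & expiry, a client can't widen its own access without invalidating the token. In real code, the SDK helpers that generate SAS tokens are the safer route.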
Data Storage in Azure Data Lake
- Azure Data Lake is a Hadoop-compatible data repository that can store any size or type of data. This storage service is available as Generation 1 (Gen1) or Generation 2 (Gen2).
- Data Lake Storage Gen1 users don't have to upgrade to Gen2, but they forgo some benefits.
- Data Lake Storage Gen2 users take advantage of Azure Blob Storage, a hierarchical file system, & performance tuning that helps them process big-data analytics solutions.
- In Gen2, developers can access data through either the Blob API or the Data Lake file API.
- Gen2 can also act as a storage layer for a wide range of compute platforms, including:
  - Azure Databricks
  - Hadoop
  - Azure HDInsight
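The two access paths in Gen2 show up as two endpoints on the same account: the Blob endpoint & the Data Lake (dfs) endpoint. Hadoop-compatible platforms typically address the same data with an abfss:// URI. The sketch below builds all three forms for one file; the account, file system, & path are hypothetical.

```python
def gen2_addresses(account, filesystem, path):
    """Return the Blob URL, Data Lake URL, and ABFS URI for the same file."""
    return {
        "blob_api": f"https://{account}.blob.core.windows.net/{filesystem}/{path}",
        "data_lake_api": f"https://{account}.dfs.core.windows.net/{filesystem}/{path}",
        "abfs_uri": f"abfss://{filesystem}@{account}.dfs.core.windows.net/{path}",
    }

# Hypothetical account, file system (container), and path.
addresses = gen2_addresses("examplelake", "genomics", "raw/2021/sample.csv")
for name, value in addresses.items():
    print(f"{name}: {value}")
```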
Where to use Data Lake Storage Gen2
- Data Lake Storage is designed to store massive amounts of data for big-data analytics. For example, Contoso Life Sciences is a cancer research center that analyzes petabytes of genetic data, patient data, & records of related sample data.
- Data Lake Storage Gen2 reduces computation times, making the research faster & less expensive.
Key features of Data Lake Storage
- Unlimited scalability
- Hadoop compatibility
- Security support for both access control lists (ACLs) & POSIX compliance
- An optimized Azure Blob File System (ABFS) driver that's designed for big-data analytics
- Zone-redundant storage
- Geo-redundant storage
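The ACLs in Data Lake Storage Gen2 follow the POSIX style, where each entry grants read, write, & execute permissions to a scope such as the owning user, the owning group, or everyone else. The sketch below parses a short ACL string of that form. It is deliberately simplified: it ignores default entries & masks, & the function name is hypothetical.

```python
def parse_acl(acl_spec):
    """Parse entries like 'user::rwx' into dictionaries (simplified sketch)."""
    entries = []
    for item in acl_spec.split(","):
        # Each entry is scope:qualifier:permissions; the qualifier may be empty.
        scope, qualifier, perms = item.split(":")
        entries.append({
            "scope": scope,
            "qualifier": qualifier or None,
            "read": perms[0] == "r",
            "write": perms[1] == "w",
            "execute": perms[2] == "x",
        })
    return entries

acl = parse_acl("user::rwx,group::r-x,other::---")
for entry in acl:
    print(entry)
```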