Updates from: 07/18/2022 01:07:17
Service Microsoft Docs article Related commit history on GitHub Change details
api-management Api Management Api Import Restrictions https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/api-management/api-management-api-import-restrictions.md
You can create [SOAP pass-through](import-soap-api.md) and [SOAP-to-REST](restif
* For an open-source tool to resolve and merge `wsdl:import`, `xsd:import`, and `xsd:include` dependencies in a WSDL file, see this [GitHub repo](https://github.com/Azure-Samples/api-management-schema-import).
+### WS-* specifications
+
+WSDL files incorporating WS-* specifications are not supported.
+### Messages with multiple parts
+
+This message type is not supported.
automation Private Link Security https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/automation/how-to/private-link-security.md
For more information, see [Key Benefits of Private Link](../../private-link/pri
## Limitations

-- In the current implementation of Private Link, Automation account cloud jobs cannot access Azure resources that are secured using private endpoint. For example, Azure Key Vault, Azure SQL, Azure Storage account, etc. To work around this, use a [Hybrid Runbook Worker](../automation-hybrid-runbook-worker.md) instead.
+- In the current implementation of Private Link, Automation account cloud jobs cannot access Azure resources that are secured using a private endpoint (for example, Azure Key Vault, Azure SQL, or Azure Storage accounts). To work around this, use a [Hybrid Runbook Worker](../automation-hybrid-runbook-worker.md) instead. On-premises VMs are supported to run Hybrid Runbook Workers against an Automation account with Private Link enabled.
- You need to use the latest version of the [Log Analytics agent](../../azure-monitor/agents/log-analytics-agent.md) for Windows or Linux.
- The [Log Analytics Gateway](../../azure-monitor/agents/gateway.md) does not support Private Link.
azure-monitor Metrics Supported https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/azure-monitor/essentials/metrics-supported.md
This latest update adds a new column and reorders the metrics to be alphabetical
|Metric|Exportable via Diagnostic Settings?|Metric Display Name|Unit|Aggregation Type|Description|Dimensions|
|---|---|---|---|---|---|---|
-|PEBytesIn|Yes|Bytes In|Count|Total|Total number of Bytes Out|No Dimensions|
+|PEBytesIn|Yes|Bytes In|Count|Total|Total number of Bytes In|No Dimensions|
|PEBytesOut|Yes|Bytes Out|Count|Total|Total number of Bytes Out|No Dimensions|
azure-monitor Dashboard Upgrade https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/azure-monitor/logs/dashboard-upgrade.md
As dashboards may contain multiple visualizations from multiple queries, the tim
### Query data values - 25 values and other grouping
-Dashboards can be visually dense and complex. In order to reduce cognitive load when viewing a dashboard, we optimize the visualizations by limiting the display to 25 different data types. When there are more than 25, Log Analytics optimizes the data. It individually shows the 25 types with most data as separate and then groups the remaining values into an "other" value. The following chart shows such a case.
+Dashboards can be visually dense and complex. To reduce cognitive load when viewing a dashboard, we optimize the visualizations by limiting the display to 25 different data types. When there are more than 25, Log Analytics optimizes the data. It individually shows the 25 types with the most data and groups the remaining values into an "other" value. The following chart shows such a case.
![Screenshot that shows a dashboard with 25 different data types.](media/dashboard-upgrade/values-25-limit.png)
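The grouping rule described above can be sketched as follows (a hypothetical Python illustration of the documented behavior, not Log Analytics' actual implementation):

```python
from collections import Counter

def group_top_n(series_counts, n=25):
    """Keep the n series with the most data; fold the rest into 'other'."""
    ranked = Counter(series_counts).most_common()
    top = dict(ranked[:n])
    rest = sum(count for _, count in ranked[n:])
    if rest:
        top["other"] = rest
    return top

# 30 series: the 25 largest stay separate, the 5 smallest become "other".
data = {f"type{i}": i for i in range(1, 31)}  # counts 1..30
grouped = group_top_n(data)
```

With 30 input series, the result holds the 25 largest plus a single "other" entry summing the remainder.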
+### Query results limit
+
+A query underlying a Log Analytics dashboard can return up to 2000 records.
+### Dashboard refresh on load
+
+Dashboards are refreshed upon load. All queries related to dashboard-pinned Log Analytics visualizations are executed and the dashboard is refreshed once it loads. If the dashboard page remains open, the data in the dashboard is refreshed every 60 minutes.
azure-monitor Profiler Azure Functions https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/azure-monitor/profiler/profiler-azure-functions.md
Title: Profile Azure Functions app with Application Insights Profiler
description: Enable Application Insights Profiler for Azure Functions app.
ms.contributor: charles.weininger
Previously updated : 05/03/2022
Last updated : 07/15/2022
In this article, you'll use the Azure portal to:
:::image type="content" source="./media/profiler-azure-functions/choose-plan.png" alt-text="Screenshot of where to select App Service plan from drop-down in Functions app creation.":::

- Linked to [an Application Insights resource](../app/create-new-resource.md). Make note of the instrumentation key.

## App settings for enabling Profiler
The app settings now show up in the table:
:::image type="content" source="./media/profiler-azure-functions/app-settings-table.png" alt-text="Screenshot showing the two new app settings in the table on the configuration blade.":::
-## View the Profiler data for your Azure Functions app
-
-1. Under **Settings**, select **Application Insights (preview)** from the left menu.
-
- :::image type="content" source="./media/profiler-azure-functions/app-insights-menu.png" alt-text="Screenshot showing application insights from the left menu of the Functions app.":::
-
-1. Select **View Application Insights data**.
-
- :::image type="content" source="./media/profiler-azure-functions/view-app-insights-data.png" alt-text="Screenshot showing the button for viewing application insights data for the Functions app.":::
-
-1. On the App Insights page for your Functions app, select **Performance** from the left menu.
-
- :::image type="content" source="./media/profiler-azure-functions/performance-menu.png" alt-text="Screenshot showing the performance link in the left menu of the app insights blade of the functions app.":::
-
-1. Select **Profiler** from the top menu of the Performance blade.
-
- :::image type="content" source="./media/profiler-azure-functions/profiler-function-app.png" alt-text="Screenshot showing link to profiler for functions app.":::
+> [!NOTE]
+> You can also enable Profiler using:
+> - [Azure Resource Manager Templates](../app/azure-web-apps-net-core.md#app-service-application-settings-with-azure-resource-manager)
+> - [Azure PowerShell](/powershell/module/az.websites/set-azwebapp)
+> - [Azure CLI](/cli/azure/webapp/config/appsettings)
## Next Steps

-- Set these values using [Azure Resource Manager Templates](../app/azure-web-apps-net-core.md#app-service-application-settings-with-azure-resource-manager), [Azure PowerShell](/powershell/module/az.websites/set-azwebapp), or the [Azure CLI](/cli/azure/webapp/config/appsettings).
-- Learn more about [Profiler settings](profiler-settings.md).
+Learn how to...
+> [!div class="nextstepaction"]
+> [Generate load and view Profiler traces](./profiler-data.md)
azure-monitor Profiler Cloudservice https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/azure-monitor/profiler/profiler-cloudservice.md
Title: Enable Profiler for Azure Cloud Services | Microsoft Docs
description: Profile live Azure Cloud Services with Application Insights Profiler.
Previously updated : 05/25/2022
Last updated : 07/15/2022

# Enable Profiler for Azure Cloud Services
Add the following `SinksConfig` section as a child element of `WadCfg`:
Deploy your service with the new Diagnostics configuration. Application Insights Profiler is now configured to run on your Cloud Service.
-## Generate traffic to your service
-
-Now that your Azure Cloud Service is deployed with Profiler, you can generate traffic to view Profiler traces.
-
-Generate traffic to your application by setting up an [availability test](../app/monitor-web-app-availability.md). Wait 10 to 15 minutes for traces to be sent to the Application Insights instance.
-
-Navigate to your Azure Cloud Service's Application Insights resource. In the left side menu, select **Performance**.
--
-Select the **Profiler** for your Cloud Service.
--
-Select **Profile now** to start a profiling session. This process will take a few minutes.
--
-For more instructions on profiling sessions, see the [Profiler overview](./profiler-overview.md#start-a-profiler-on-demand-session).
-
-
## Next steps

-- Learn more about [configuring Profiler](./profiler-settings.md).
-- [Troubleshoot Profiler issues](./profiler-troubleshooting.md).
+Learn how to...
+> [!div class="nextstepaction"]
+> [Generate load and view Profiler traces](./profiler-data.md)
+
azure-monitor Profiler Containers https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/azure-monitor/profiler/profiler-containers.md
Title: Profile Azure Containers with Application Insights Profiler
description: Enable Application Insights Profiler for Azure Containers.
ms.contributor: charles.weininger
Previously updated : 06/16/2022
Last updated : 07/15/2022
In this article, you'll learn the various ways you can:
dotnet new mvc -n EnableServiceProfilerForContainerApp
```
- Note that we've added delay in the `Controllers/WeatherForecastController.cs` project to simulate the bottleneck.
+ We've added a delay in the `Controllers/WeatherForecastController.cs` project to simulate a bottleneck.
```CSharp
[HttpGet(Name = "GetWeatherForecast")]
In this article, you'll learn the various ways you can:
1. Via your Application Insights resource in the Azure portal, take note of your Application Insights instrumentation key.
- :::image type="content" source="./media/profiler-containerinstances/application-insights-key.png" alt-text="Find instrumentation key in Azure portal":::
+ :::image type="content" source="./media/profiler-containerinstances/application-insights-key.png" alt-text="Screenshot of finding instrumentation key in Azure portal.":::
1. Open `appsettings.json` and add your Application Insights instrumentation key to this code section:
Service Profiler session finished. # A profiling session is completed
1. Wait for 2-5 minutes so the events can be aggregated to Application Insights.
1. Open the **Performance** blade in your Application Insights resource.
-1. Once the trace process is complete, you will see the Profiler Traces button like it below:
+1. Once the trace process is complete, you'll see the **Profiler Traces** button, like the one below:
- :::image type="content" source="./media/profiler-containerinstances/profiler-traces.png" alt-text="Profile traces in the performance blade":::
+ :::image type="content" source="./media/profiler-containerinstances/profiler-traces.png" alt-text="Screenshot of Profile traces in the performance blade.":::
docker rm -f testapp
```

## Next Steps

-- Learn more about [Application Insights Profiler](./profiler-overview.md).
-- Learn how to enable Profiler in your [ASP.NET Core applications run on Linux](./profiler-aspnetcore-linux.md).
+Learn how to...
+> [!div class="nextstepaction"]
+> [Generate load and view Profiler traces](./profiler-data.md)
azure-monitor Profiler Data https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/azure-monitor/profiler/profiler-data.md
+
+ Title: Generate load and view Application Insights Profiler data
+description: Generate load on your Azure service to view the Profiler data.
+ms.contributor: charles.weininger
+ Last updated : 07/15/2022+++
+# View Application Insights Profiler data
+
+Let's say you're running a web performance test. You'll need traces to understand how your web app is running under load. In this article, you'll:
+
+> [!div class="checklist"]
+> - Generate traffic to your web app by starting a web performance test or starting a Profiler on-demand session.
+> - View the Profiler traces after your load test or Profiler session.
+> - Learn how to read the Profiler performance data and call stack.
+
+## Generate traffic to your Azure service
+
+For Profiler to upload traces, your service must be actively handling requests.
+
+If you've newly enabled Profiler, run a short [load test](/vsts/load-test/app-service-web-app-performance-test).
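As a rough illustration of what a short load test does, here is a minimal Python load generator. It targets a throwaway local server here; your real Azure service URL would go in its place:

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

class OkHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")
    def log_message(self, *args):  # keep the test server quiet
        pass

# Stand-in target; in practice this would be your deployed service's URL.
server = HTTPServer(("127.0.0.1", 0), OkHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

def hit(_):
    with urlopen(url) as resp:
        return resp.status

# Fire 50 concurrent requests so the service is actively handling traffic.
with ThreadPoolExecutor(max_workers=8) as pool:
    statuses = list(pool.map(hit, range(50)))

server.shutdown()
```

Sustained concurrent requests like these give Profiler live traffic to sample while a session is running.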
+
+If your Azure service already has incoming traffic or if you just want to manually generate traffic, skip the load test and start a **Profiler on-demand session**:
+
+1. From the Application Insights overview page for your Azure service, select **Performance** from the left menu.
+1. On the **Performance** pane, select **Profiler** from the top menu for Profiler settings.
+
+ :::image type="content" source="./media/profiler-overview/profiler-button-inline.png" alt-text="Screenshot of the Profiler button from the Performance blade." lightbox="media/profiler-settings/profiler-button.png":::
+
+1. Once the Profiler settings page loads, select **Profile Now**.
+
+ :::image type="content" source="./media/profiler-settings/configure-blade-inline.png" alt-text="Screenshot of Profiler page features and settings." lightbox="media/profiler-settings/configure-blade.png":::
+
+## View traces
+
+1. After the Profiler sessions finish running, return to the **Performance** pane.
+1. Under **Drill into...**, select **Profiler traces** to view the traces.
+
+ :::image type="content" source="./media/profiler-overview/trace-explorer-inline.png" alt-text="Screenshot of trace explorer page." lightbox="media/profiler-overview/trace-explorer.png":::
+
+The trace explorer displays the following information:
+
+| Filter | Description |
+| | -- |
+| Profile tree v. Flame graph | View the traces as either a tree or in graph form. |
+| Hot path | Select to open the biggest leaf node. In most cases, this node is near a performance bottleneck. |
+| Framework dependencies | Select to view each of the traced framework dependencies associated with the traces. |
+| Hide events | Type in strings to hide from the trace view. Select *Suggested events* for suggestions. |
+| Event | Event or function name. The tree displays a mix of code and events that occurred, such as SQL and HTTP events. The top event represents the overall request duration. |
+| Module | The module where the traced event or function occurred. |
+| Thread time | The time interval between the start of the operation and the end of the operation. |
+| Timeline | The time when the function or event was running in relation to other functions. |
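The **Hot path** behavior, descending into the heaviest child at each level until reaching the biggest leaf, can be sketched as follows (a hypothetical illustration, not the trace explorer's actual algorithm; the node shape and names are made up):

```python
def hot_path(node):
    """Follow the heaviest child at each level down to the biggest leaf.

    node: {"name": str, "time": float, "children": [node, ...]}
    Returns the list of names from the root to that leaf.
    """
    path = [node["name"]]
    while node["children"]:
        node = max(node["children"], key=lambda c: c["time"])
        path.append(node["name"])
    return path

# Toy trace: most of the request's time sits under the SQL call.
trace = {
    "name": "GET /orders", "time": 100.0, "children": [
        {"name": "Controller.Index", "time": 80.0, "children": [
            {"name": "SqlCommand.Execute", "time": 70.0, "children": []},
            {"name": "Render", "time": 10.0, "children": []},
        ]},
        {"name": "Middleware", "time": 20.0, "children": []},
    ],
}
path = hot_path(trace)
```

In this toy trace the hot path ends at the SQL call, which is exactly the kind of leaf node that tends to sit near the bottleneck.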
+
+## How to read performance data
+
+Profiler uses a combination of sampling methods and instrumentation to analyze your application's performance. While performing detailed collection, the Profiler:
+
+- Samples the instruction pointer of each machine CPU every millisecond.
+ - Each sample captures the complete call stack of the thread, giving detailed information at both high and low levels of abstraction.
+- Collects events to track activity correlation and causality, including:
+ - Context switching events
+ - Task Parallel Library (TPL) events
+ - Thread pool events
+
+The call stack displayed in the timeline view is the result of the sampling and instrumentation. Because each sample captures the complete call stack of the thread, it includes code from Microsoft .NET Framework, and any other frameworks that you reference.
+
+### Object allocation (clr!JIT\_New or clr!JIT\_Newarr1)
+
+**clr!JIT\_New** and **clr!JIT\_Newarr1** are helper functions in .NET Framework that allocate memory from a managed heap.
+- **clr!JIT\_New** is invoked when an object is allocated.
+- **clr!JIT\_Newarr1** is invoked when an object array is allocated.
+
+These two functions usually work quickly. If **clr!JIT\_New** or **clr!JIT\_Newarr1** take up time in your timeline, the code might be allocating many objects and consuming significant amounts of memory.
+
+### Loading code (clr!ThePreStub)
+
+**clr!ThePreStub** is a helper function in .NET Framework that prepares the code for initial execution, which usually includes just-in-time (JIT) compilation. For each C# method, **clr!ThePreStub** should be invoked, at most, once during a process.
+
+If **clr!ThePreStub** takes extra time for a request, it's the first request to execute that method. The .NET Framework runtime takes a significant amount of time to load the first method. Consider:
+- Using a warmup process that executes that portion of the code before your users access it.
+- Running Native Image Generator (ngen.exe) on your assemblies.
+
+### Lock contention (clr!JITutil\_MonContention or clr!JITutil\_MonEnterWorker)
+
+**clr!JITutil\_MonContention** or **clr!JITutil\_MonEnterWorker** indicate that the current thread is waiting for a lock to be released. This text is often displayed when you:
+- Execute a C# **LOCK** statement,
+- Invoke the **Monitor.Enter** method, or
+- Invoke a method with the **MethodImplOptions.Synchronized** attribute.
+
+Lock contention usually occurs when thread _A_ acquires a lock and thread _B_ tries to acquire the same lock before thread _A_ releases it.
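Conceptually (sketched here in Python rather than C#, as an analogy only), contention is one thread holding a lock while another blocks waiting for it; the blocked interval is what shows up as contention time:

```python
import threading
import time

lock = threading.Lock()
blocked = {}

def thread_a():
    with lock:           # thread A acquires the lock first...
        time.sleep(0.2)  # ...and holds it, simulating work

def thread_b():
    start = time.monotonic()
    with lock:           # thread B contends and must wait here
        blocked["wait"] = time.monotonic() - start

a = threading.Thread(target=thread_a)
b = threading.Thread(target=thread_b)
a.start()
time.sleep(0.05)         # ensure A wins the race for the lock
b.start()
a.join()
b.join()
```

Thread B's measured wait (roughly 0.15 s here) is the analogue of the time attributed to the contention helpers in a trace.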
+
+### Loading code ([COLD])
+
+If the .NET Framework runtime is executing [unoptimized code](/cpp/build/profile-guided-optimizations) for the first time, the method name will contain **[COLD]**:
+
+`mscorlib.ni![COLD]System.Reflection.CustomAttribute.IsDefined`
+
+For each method, it should be displayed at most once during the process.
+
+If loading code takes a substantial amount of time for a request, it's the first request to execute the unoptimized portion of the method. Consider using a warmup process that executes that portion of the code before your users access it.
+
+### Send HTTP request
+
+Methods such as **HttpClient.Send** indicate that the code is waiting for an HTTP request to be completed.
+
+### Database operation
+
+Methods such as **SqlCommand.Execute** indicate that the code is waiting for a database operation to finish.
+
+### Waiting (AWAIT\_TIME)
+
+**AWAIT\_TIME** indicates that the code is waiting for another task to finish. This delay occurs with the C# **AWAIT** statement. When the code does a C# **AWAIT**:
+- The thread unwinds and returns control to the thread pool.
+- There's no blocked thread waiting for the **AWAIT** to finish.
+
+However, logically, the thread that did the **AWAIT** is "blocked", waiting for the operation to finish. The **AWAIT\_TIME** statement indicates the blocked time, waiting for the task to finish.
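The same idea can be illustrated with Python's asyncio (an analogy only; Profiler traces .NET's async/await): awaiting releases the thread, so two logical waits overlap instead of blocking one another:

```python
import asyncio
import time

async def io_task(delay):
    # While awaited, this coroutine yields control back to the event
    # loop: no thread sits blocked, mirroring AWAIT_TIME semantics.
    await asyncio.sleep(delay)
    return delay

async def main():
    # Two logical 0.2 s waits overlap on one thread, so total wall
    # time is roughly max(0.2, 0.2), not the 0.4 s sum.
    return await asyncio.gather(io_task(0.2), io_task(0.2))

start = time.monotonic()
results = asyncio.run(main())
elapsed = time.monotonic() - start
```

Each coroutine is logically "blocked" for 0.2 s, which is what AWAIT_TIME would report, yet no OS thread is held for that duration.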
+
+### Blocked time
+
+**BLOCKED_TIME** indicates that the code is waiting for another resource to be available. For example, it might be waiting for:
+- A synchronization object
+- A thread to be available
+- A request to finish
+
+### Unmanaged Async
+
+In order for async calls to be tracked across threads, .NET Framework emits ETW events and passes activity IDs between threads. Since unmanaged (native) code and some older styles of asynchronous code lack these events and activity IDs, the Profiler can't track the thread and functions running on the thread. This item is labeled **Unmanaged Async** in the call stack. Download the ETW file to use [PerfView](https://github.com/Microsoft/perfview/blob/master/documentation/Downloading.md) for more insight.
+
+### CPU time
+
+The CPU is busy executing the instructions.
+
+### Disk time
+
+The application is performing disk operations.
+
+### Network time
+
+The application is performing network operations.
+
+### When column
+
+The **When** column is a visualization of the variety of _inclusive_ samples collected for a node over time. The total range of the request is divided into 32 time buckets, where the node's inclusive samples accumulate. Each bucket is represented as a bar. The height of the bar represents a scaled value. For the following nodes, the bar represents the consumption of one of the resources during the bucket:
+- Nodes marked **CPU_TIME** or **BLOCKED_TIME**.
+- Nodes with an obvious relationship to consuming a resource (for example, a CPU, disk, or thread).
+
+For these metrics, you can get a value of greater than 100% by consuming multiple resources. For example, if you use two CPUs during an interval on average, you get 200%.
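The bucketing described above can be sketched in Python (a hypothetical illustration of the documented 32-bucket scheme, not Profiler's actual code):

```python
def when_buckets(samples_ms, request_ms, n_buckets=32):
    """Accumulate inclusive samples into 32 equal time buckets.

    samples_ms: timestamps (ms since request start) of 1-ms CPU samples.
    Returns per-bucket utilization as a percentage of the bucket width;
    values above 100% mean multiple resources (e.g., CPUs) were busy.
    """
    width = request_ms / n_buckets
    buckets = [0.0] * n_buckets
    for t in samples_ms:
        i = min(int(t / width), n_buckets - 1)
        buckets[i] += 1.0  # each sample represents ~1 ms of resource use
    return [100.0 * b / width for b in buckets]

# 320 ms request, so 10 ms per bucket; two CPUs sampled every ms during
# the first 10 ms yields 200% in bucket 0 and 0% elsewhere.
samples = [t for t in range(10)] * 2
util = when_buckets(samples, request_ms=320)
```

The 200% value in the first bucket shows how consuming two CPUs at once exceeds 100%, as the text describes.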
+
+## Next steps
+Learn how to...
+> [!div class="nextstepaction"]
+> [Configure Profiler settings](./profiler-settings.md)
+
+[performance-blade]: ./media/profiler-overview/performance-blade-v2-examples.png
+[trace-explorer]: ./media/profiler-overview/trace-explorer.png
azure-monitor Profiler Overview https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/azure-monitor/profiler/profiler-overview.md
Title: Profile production apps in Azure with Application Insights Profiler
description: Identify the hot path in your web server code with a low-footprint profiler.
ms.contributor: charles.weininger
Previously updated : 05/26/2022
Last updated : 07/15/2022
-# Profile production applications in Azure with Application Insights
+# Profile production applications in Azure with Application Insights Profiler
-Azure Application Insights Profiler provides performance traces for applications running in production in Azure. Profiler:
-- Captures the data automatically at scale without negatively affecting your users.
+Diagnosing performance issues can prove difficult, especially when your application is running in a production environment in the cloud. The cloud is dynamic, with machines coming and going, user input and other conditions constantly changing, and the potential for high scale. Slow responses in your application could be caused by infrastructure, framework, or application code handling the request in the pipeline.
+
+With Application Insights Profiler, you can capture and view performance traces for your application in all these dynamic situations, automatically at-scale, without negatively affecting your end users. The Profiler captures the following information so you can easily identify performance issues while your app is running in Azure:
+
+- The median, fastest, and slowest response times for each web request made by your customers.
- Helps you identify the "hot" code path spending the most time handling a particular web request.
-## Enable Application Insights Profiler for your application
+Enable the Profiler on all of your Azure applications to catch issues early and prevent your customers from being widely impacted. When you enable the Profiler, it will gather data with these triggers:
+
+- **Sampling Trigger**: starts the Profiler randomly about once an hour for 2 minutes.
+- **CPU Trigger**: starts the Profiler when the CPU usage percentage is over 80%.
+- **Memory Trigger**: starts the Profiler when memory usage is above 80%.
-### Supported in Profiler
+Each of these triggers can be configured, enabled, or disabled on the [Configure Profiler page](./profiler-settings.md#trigger-settings).
+
+## Overhead and sampling algorithm
+
+Profiler randomly runs for two minutes per hour on each virtual machine that hosts an application with Profiler enabled, capturing traces. While Profiler is running, it adds 5-15% CPU overhead to the server.
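Putting the documented numbers together gives a rough sense of the average cost:

```python
# Back-of-envelope duty-cycle math for the documented numbers:
# Profiler runs ~2 minutes out of every hour, adding 5-15% CPU while on.
duty_cycle = 2 / 60  # fraction of each hour that Profiler is running
avg_overhead = [duty_cycle * o for o in (0.05, 0.15)]
# => roughly 0.17% to 0.5% average CPU overhead across the hour
```

Averaged over the hour, the overhead is well under one percent, which is why the sampling trigger can stay on in production.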
+
+## Supported in Profiler
Profiler works with .NET applications deployed on the following Azure services. View specific instructions for enabling Profiler for each service type in the links below.
Profiler works with .NET applications deployed on the following Azure services.
| [Azure Container Instances for Windows](profiler-containers.md) | No | Yes | No |
| [Azure Container Instances for Linux](profiler-containers.md) | No | Yes | No |
| Kubernetes | No | Yes | No |
-| Azure Functions | Yes | Yes | No |
+| [Azure Functions](./profiler-azure-functions.md) | Yes | Yes | No |
| Azure Spring Cloud | N/A | No | No |
| [Azure Service Fabric](profiler-servicefabric.md) | Yes | Yes | No |
-If you've enabled Profiler but aren't seeing traces, check our [Troubleshooting guide](profiler-troubleshooting.md?toc=/azure/azure-monitor/toc.json).
-
-## How to generate load to view Profiler data
-
-For Profiler to upload traces, your application must be actively handling requests. You can trigger Profiler manually with a single click.
-
-Suppose you're running a web performance test. You'll need traces to help you understand how your web app is running under load. By controlling when traces are captured, you'll know when the load test will be running, while the random sampling interval might miss it.
-
-### Generate traffic to your web app by starting a web performance test
-
-If you've newly enabled Profiler, you can run a short [load test](/vsts/load-test/app-service-web-app-performance-test). If your web app already has incoming traffic or if you just want to manually generate traffic, skip the load test and start a Profiler on-demand session.
-
-### Start a Profiler on-demand session
-1. From the Application Insights overview page, select **Performance** from the left menu.
-1. On the **Performance** pane, select **Profiler** from the top menu for Profiler settings.
-
- :::image type="content" source="./media/profiler-overview/profiler-button-inline.png" alt-text="Screenshot of the Profiler button from the Performance blade" lightbox="media/profiler-settings/profiler-button.png":::
-
-1. Once the Profiler settings page loads, select **Profile Now**.
-
- :::image type="content" source="./media/profiler-settings/configure-blade-inline.png" alt-text="Profiler page features and settings" lightbox="media/profiler-settings/configure-blade.png":::
-
-### View traces
-1. After the Profiler sessions finish running, return to the **Performance** pane.
-1. Under **Drill into...**, select **Profiler traces** to view the traces.
-
- :::image type="content" source="./media/profiler-overview/trace-explorer-inline.png" alt-text="Screenshot of trace explorer page" lightbox="media/profiler-overview/trace-explorer.png":::
-
-The trace explorer displays the following information:
-
-| Filter | Description |
-| | -- |
-| Profile tree v. Flame graph | View the traces as either a tree or in graph form. |
-| Hot path | Select to open the biggest leaf node. In most cases, this node is near a performance bottleneck. |
-| Framework dependencies | Select to view each of the traced framework dependencies associated with the traces. |
-| Hide events | Type in strings to hide from the trace view. Select *Suggested events* for suggestions. |
-| Event | Event or function name. The tree displays a mix of code and events that occurred, such as SQL and HTTP events. The top event represents the overall request duration. |
-| Module | The module where the traced event or function occurred. |
-| Thread time | The time interval between the start of the operation and the end of the operation. |
-| Timeline | The time when the function or event was running in relation to other functions. |
-
-## How to read performance data
-
-The Microsoft service profiler uses a combination of sampling methods and instrumentation to analyze the performance of your application. When detailed collection is in progress, the service profiler samples the instruction pointer of each machine CPU every millisecond. Each sample captures the complete call stack of the thread that's currently executing. It gives detailed information about what that thread was doing, at both a high level and a low level of abstraction. The service profiler also collects other events to track activity correlation and causality, including context switching events, Task Parallel Library (TPL) events, and thread pool events.
-
-The call stack displayed in the timeline view is the result of the sampling and instrumentation. Because each sample captures the complete call stack of the thread, it includes code from Microsoft .NET Framework and other frameworks that you reference.
-
-### <a id="jitnewobj"></a>Object allocation (clr!JIT\_New or clr!JIT\_Newarr1)
-
-**clr!JIT\_New** and **clr!JIT\_Newarr1** are helper functions in .NET Framework that allocate memory from a managed heap.
-- **clr!JIT\_New** is invoked when an object is allocated.
-- **clr!JIT\_Newarr1** is invoked when an object array is allocated.
-
-These two functions usually work quickly. If **clr!JIT\_New** or **clr!JIT\_Newarr1** take up time in your timeline, the code might be allocating many objects and consuming significant amounts of memory.
-
-### <a id="theprestub"></a>Loading code (clr!ThePreStub)
-
-**clr!ThePreStub** is a helper function in .NET Framework that prepares the code for initial execution, which usually includes just-in-time (JIT) compilation. For each C# method, **clr!ThePreStub** should be invoked, at most, once during a process.
-
-If **clr!ThePreStub** takes extra time for a request, it's the first request to execute that method. The .NET Framework runtime takes a significant amount of time to load the first method. Consider:
-- Using a warmup process that executes that portion of the code before your users access it.
-- Running Native Image Generator (ngen.exe) on your assemblies.
-
-### <a id="lockcontention"></a>Lock contention (clr!JITutil\_MonContention or clr!JITutil\_MonEnterWorker)
-
-**clr!JITutil\_MonContention** or **clr!JITutil\_MonEnterWorker** indicate that the current thread is waiting for a lock to be released. This text is often displayed when you:
-- Execute a C# **LOCK** statement,
-- Invoke the **Monitor.Enter** method, or
-- Invoke a method with the **MethodImplOptions.Synchronized** attribute.
-
-Lock contention usually occurs when thread _A_ acquires a lock and thread _B_ tries to acquire the same lock before thread _A_ releases it.
-
-### <a id="ngencold"></a>Loading code ([COLD])
-
-If the .NET Framework runtime is executing [unoptimized code](/cpp/build/profile-guided-optimizations) for the first time, the method name will contain **[COLD]**:
-
-`mscorlib.ni![COLD]System.Reflection.CustomAttribute.IsDefined`
-
-For each method, it should be displayed once during the process, at most.
-
-If loading code takes a substantial amount of time for a request, it's the request's initiate execute of the unoptimized portion of the method. Consider using a warmup process that executes that portion of the code before your users access it.
-
-### <a id="httpclientsend"></a>Send HTTP request
-
-Methods such as **HttpClient.Send** indicate that the code is waiting for an HTTP request to be completed.
-
-### <a id="sqlcommand"></a>Database operation
-
-Methods such as **SqlCommand.Execute** indicate that the code is waiting for a database operation to finish.
-
-### <a id="await"></a>Waiting (AWAIT\_TIME)
-
-**AWAIT\_TIME** indicates that the code is waiting for another task to finish. This delay occurs with the C# **AWAIT** statement. When the code does a C# **AWAIT**:
-- The thread unwinds and returns control to the thread pool.
-- There's no blocked thread waiting for the **AWAIT** to finish.
-
-However, logically, the thread that did the **AWAIT** is "blocked", waiting for the operation to finish. The **AWAIT\_TIME** statement indicates the blocked time, waiting for the task to finish.
-
-### <a id="block"></a>Blocked time
-
-**BLOCKED_TIME** indicates that the code is waiting for another resource to be available. For example, it might be waiting for:
-- A synchronization object
-- A thread to be available
-- A request to finish
-
-### Unmanaged Async
-
-In order for async calls to be tracked across threads, .NET Framework emits ETW events and passes activity ids between threads. Since unmanaged (native) code and some older styles of asynchronous code lack these events and activity ids, the Profiler can't track the thread and functions running on the thread. This is labeled **Unmanaged Async** in the call stack. Download the ETW file to use [PerfView](https://github.com/Microsoft/perfview/blob/master/documentation/Downloading.md) for more insight.
-
-### <a id="cpu"></a>CPU time
-
-The CPU is busy executing the instructions.
-
-### <a id="disk"></a>Disk time
-
-The application is performing disk operations.
-
-### <a id="network"></a>Network time
-
-The application is performing network operations.
-
-### <a id="when"></a>When column
-
-The **When** column is a visualization of the variety of _inclusive_ samples collected for a node over time. The total range of the request is divided into 32 time buckets, where the node's inclusive samples accumulate. Each bucket is represented as a bar. The height of the bar represents a scaled value. For the following nodes, the bar represents the consumption of one of the resources during the bucket:
-- Nodes marked **CPU_TIME** or **BLOCKED_TIME**.
-- Nodes with an obvious relationship to consuming a resource (for example, a CPU, disk, or thread).
-
-For these metrics, you can get a value of greater than 100% by consuming multiple resources. For example, if you use two CPUs during an interval on average, you get 200%.
+If you've enabled Profiler but aren't seeing traces, check our [Troubleshooting guide](profiler-troubleshooting.md).
## Limitations
-The default data retention period is five days.
-
-There are no charges for using the Profiler service. To use it, your web app must be hosted in the basic tier of the Web Apps feature of Azure App Service, at minimum.
-
-## Overhead and sampling algorithm
-
-Profiler randomly runs for two minutes each hour on every virtual machine hosting the application with Profiler enabled, capturing traces. While Profiler is running, it adds 5% to 15% CPU overhead to the server.
+- **Data retention**: The default data retention period is five days.
+- **Profiling web apps**: While you can use the Profiler at no extra cost, your web app must be hosted in the basic tier of the Web Apps feature of Azure App Service, at minimum.
## Next steps
-Enable Application Insights Profiler for your Azure application. Also see:
-* [App Services](profiler.md?toc=/azure/azure-monitor/toc.json)
-* [Azure Cloud Services](profiler-cloudservice.md?toc=/azure/azure-monitor/toc.json)
-* [Azure Service Fabric](profiler-servicefabric.md?toc=/azure/azure-monitor/toc.json)
-* [Azure Virtual Machines and virtual machine scale sets](profiler-vm.md?toc=/azure/azure-monitor/toc.json)
--
-[performance-blade]: ./media/profiler-overview/performance-blade-v2-examples.png
-[trace-explorer]: ./media/profiler-overview/trace-explorer.png
+Learn how to enable Profiler on your Azure service:
+- [Azure App Service](./profiler.md)
+- [Azure Functions app](./profiler-azure-functions.md)
+- [Cloud Service](./profiler-cloudservice.md)
+- [Service Fabric app](./profiler-servicefabric.md)
+- [Azure Virtual Machine](./profiler-vm.md)
+- [ASP.NET Core application hosted in Linux on Azure App Service](./profiler-aspnetcore-linux.md)
+- [ASP.NET Core application running in containers](./profiler-containers.md)
azure-monitor Profiler Servicefabric https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/azure-monitor/profiler/profiler-servicefabric.md
Title: Enable Profiler for Azure Service Fabric applications
description: Profile live Azure Service Fabric apps with Application Insights
Previously updated : 06/23/2022
Last updated : 07/15/2022

# Enable Profiler for Azure Service Fabric applications
Redeploy your application once you've enabled Application Insights.
## Next steps
-For help with troubleshooting Profiler issues, see [Profiler troubleshooting](./profiler-troubleshooting.md).
+Learn how to...
+> [!div class="nextstepaction"]
+> [Generate load and view Profiler traces](./profiler-data.md)
+ [!INCLUDE [azure-monitor-log-analytics-rebrand](../../../includes/azure-monitor-instrumentation-key-deprecation.md)]
azure-monitor Profiler https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/azure-monitor/profiler/profiler.md
Title: Enable Profiler for Azure App Service apps | Microsoft Docs
description: Profile live apps on Azure App Service with Application Insights Profiler.
Previously updated : 05/11/2022
Last updated : 07/15/2022
To enable Profiler on Linux, walk through the [ASP.NET Core Azure Linux web apps
## Enable Application Insights and Profiler
+### For Application Insights and App Service in the same subscription
+
+If your Application Insights resource is in the same subscription as your App Service:
+ 1. Under **Settings** in the left side menu, select **Application Insights**. :::image type="content" source="./media/profiler/app-insights-menu.png" alt-text="Screenshot of selecting Application Insights from the left side menu.":::
To enable Profiler on Linux, walk through the [ASP.NET Core Azure Linux web apps
:::image type="content" source="./media/profiler/enable-profiler.png" alt-text="Screenshot of enabling Profiler on your app.":::
-## Enable Profiler using app settings
+### For Application Insights and App Service in different subscriptions
-If your Application Insights resource is in a different subscription from your App Service, you'll need to enable Profiler manually by creating app settings for your Azure App Service. You can automate the creation of these settings using a template or other means. The settings needed to enable the profiler:
+If your Application Insights resource is in a different subscription from your App Service, you'll need to enable Profiler manually by creating app settings for your Azure App Service. You can automate the creation of these settings using a template or other means. The settings needed to enable the Profiler:
|App Setting | Value |
|---|---|
Set these values using:
- [Azure PowerShell](/powershell/module/az.websites/set-azwebapp) - [Azure CLI](/cli/azure/webapp/config/appsettings)
-## Enable Profiler for other clouds
+## Enable Profiler for regional clouds
Currently the only regions that require endpoint modifications are [Azure Government](../../azure-government/compare-azure-government-global-azure.md#application-insights) and [Azure China](/azure/china/resources-developer-guide).
Currently the only regions that require endpoint modifications are [Azure Govern
Application Insights Profiler supports Azure AD authentication for profiles ingestion. For all profiles of your application to be ingested, your application must be authenticated and provide the required application settings to the Profiler agent.
-Profiler only supports Azure AD authentication when you reference and configure Azure AD using the Application Insights SDK in your application.
+Profiler only supports Azure AD authentication when you reference and configure Azure AD using the [Application Insights SDK](../app/asp-net-core.md#configure-the-application-insights-sdk) in your application.
To enable Azure AD for profiles ingestion:
To enable Azure AD for profiles ingestion:
For System-Assigned Identity:
- |App Setting | Value |
- ||-|
- |APPLICATIONINSIGHTS_AUTHENTICATION_STRING | Authorization=AAD |
+ | App Setting | Value |
+ | -- | |
+ | APPLICATIONINSIGHTS_AUTHENTICATION_STRING | `Authorization=AAD` |
For User-Assigned Identity:
- |App Setting | Value |
- ||-|
- |APPLICATIONINSIGHTS_AUTHENTICATION_STRING | Authorization=AAD;ClientId={Client id of the User-Assigned Identity} |
+ | App Setting | Value |
+ | - | -- |
+ | APPLICATIONINSIGHTS_AUTHENTICATION_STRING | `Authorization=AAD;ClientId={Client id of the User-Assigned Identity}` |
## Disable Profiler
We recommend that you have Profiler enabled on all your apps to discover any per
Profiler's files can be deleted when using WebDeploy to deploy changes to your web application. You can prevent the deletion by excluding the App_Data folder from being deleted during deployment.

## Next steps
-
-* [Working with Application Insights in Visual Studio](../app/visual-studio.md)
+Learn how to...
+> [!div class="nextstepaction"]
+> [Generate load and view Profiler traces](./profiler-data.md)
azure-portal Azure Portal Safelist Urls https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/azure-portal/azure-portal-safelist-urls.md
aka.ms (Microsoft short URL)
ad.azure.com (Azure AD) api.aadrm.com (Azure AD) api.loganalytics.io (Log Analytics Service)
-applicationinsights.azure.com (Application Insights Service)
+*.applicationinsights.azure.com (Application Insights Service)
appservice.azure.com (Azure App Services) asazure.windows.net (Analysis Services) bastion.azure.com (Azure Bastion Service)
login.live.com
> [!NOTE]
-> Traffic to these endpoints uses standard TCP ports for HTTP (80) and HTTPS (443).
+> Traffic to these endpoints uses standard TCP ports for HTTP (80) and HTTPS (443).
azure-vmware Deploy Disaster Recovery Using Jetstream https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/azure-vmware/deploy-disaster-recovery-using-jetstream.md
Title: Deploy disaster recovery using JetStream DR
description: Learn how to implement JetStream DR for your Azure VMware Solution private cloud and on-premises VMware workloads.
Previously updated : 04/11/2022
Last updated : 07/15/2022
For full details, refer to the article: [Disaster Recovery with Azure NetApp Fil
For more on-premises JetStream DR prerequisites, see the [JetStream Pre-Installation Guide](https://www.jetstreamsoft.com/portal/jetstream-knowledge-base/pre-installation-guidelines/).

## Install JetStream DR on Azure VMware Solution

You can follow these steps for both supported scenarios.
cloud-shell Quickstart Powershell https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/cloud-shell/quickstart-powershell.md
Run regular PowerShell commands in the Cloud Shell, such as:
```azurepowershell-interactive
PS Azure:\> Get-Date
+```
-# Expected Output
+```output
+# You will see output similar to the following:
Friday, July 27, 2018 7:08:48 AM
+```
+```azurepowershell-interactive
PS Azure:\> Get-AzVM -Status
+```
-# Expected Output
+```output
+# You will see output similar to the following:
ResourceGroupName Name Location VmSize          OsType  ProvisioningState PowerState
----------------- ---- -------- ------          ------  ----------------- ----------
MyResourceGroup2  Demo westus   Standard_DS1_v2 Windows Succeeded         running
You can find all your virtual machines under the current subscription via `Virtu
```azurepowershell-interactive
PS Azure:\MySubscriptionName\VirtualMachines> dir
+```
+```output
+# You will see output similar to the following:
Directory: Azure:\MySubscriptionName\VirtualMachines
TestVm10 MyResourceGroup2 eastus Standard_DS1_v2 Windows mytest
```azurepowershell-interactive
PS Azure:\> cd MySubscriptionName\ResourceGroups\MyResourceGroup\Microsoft.Compute\virtualMachines
PS Azure:\MySubscriptionName\ResourceGroups\MyResourceGroup\Microsoft.Compute\virtualMachines> Get-Item MyVM1 | Invoke-AzVMCommand -Scriptblock {Get-ComputerInfo} -Credential (Get-Credential)
+ ```
+ ```output
# You will see output similar to the following:
PSComputerName : 65.52.28.207
By entering into the `WebApps` directory, you can easily navigate your web apps
```azurepowershell-interactive
PS Azure:\MySubscriptionName> dir .\WebApps\
+```
+```output
+# You will see output similar to the following:
Directory: Azure:\MySubscriptionName\WebApps

Name      State   ResourceGroup    EnabledHostNames                 Location
Name State ResourceGroup EnabledHostNames Lo
mywebapp1 Stopped MyResourceGroup1 {mywebapp1.azurewebsites.net... West US
mywebapp2 Running MyResourceGroup2 {mywebapp2.azurewebsites.net... West Europe
mywebapp3 Running MyResourceGroup3 {mywebapp3.azurewebsites.net... South Central US
+```
+```azurepowershell-interactive
# You can use Azure cmdlets to Start/Stop your web apps
PS Azure:\MySubscriptionName\WebApps> Start-AzWebApp -Name mywebapp1 -ResourceGroupName MyResourceGroup1
+```
+```output
+# You will see output similar to the following:
Name      State   ResourceGroup    EnabledHostNames                 Location
----      -----   -------------    ----------------                 --------
mywebapp1 Running MyResourceGroup1 {mywebapp1.azurewebsites.net ... West US
+```
+```azurepowershell-interactive
# Refresh the current state with -Force
PS Azure:\MySubscriptionName\WebApps> dir -Force
+```
+```output
+# You will see output similar to the following:
Directory: Azure:\MySubscriptionName\WebApps

Name      State   ResourceGroup    EnabledHostNames                 Location
confidential-ledger Authentication Azure Ad https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/confidential-ledger/authentication-azure-ad.md
+
+ Title: Azure Active Directory authentication with Azure confidential ledger
+description: Azure Active Directory authentication with Azure confidential ledger
++++ Last updated : 07/12/2022+++
+# Azure confidential ledger authentication with Azure Active Directory (Azure AD)
+
+The recommended way to access Azure confidential ledger is by authenticating to the **Azure Active Directory (Azure AD)** service; doing so guarantees that Azure confidential ledger never gets the accessing principal's directory credentials.
+
+To do so, the client performs a two-step process:
+
+1. In the first step, the client:
+ 1. Communicates with the Azure AD service.
+ 1. Authenticates to the Azure AD service.
+ 1. Requests an access token issued specifically for Azure confidential ledger.
+1. In the second step, the client issues requests to Azure confidential ledger, providing the access token acquired in the first step as a proof of identity to Azure confidential ledger.
+
+Azure confidential ledger then executes the request on behalf of the security principal for which Azure AD issued the access token. All authorization checks are performed using this identity.
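+The second step of the flow above amounts to presenting the access token as a bearer token on every ledger request. A minimal stdlib sketch, with the token value and the request path as placeholders (the real SDKs do this for you):

```python
from urllib.request import Request

# Step 2 sketch: attach the Azure AD access token from step 1 as a bearer
# token on each request to the ledger endpoint. The token value and the
# URL path/api-version are placeholders, not real values.
access_token = "<token-from-step-1>"
req = Request(
    "https://myACL.confidential-ledger.azure.com/app/transactions?api-version=<api-version>",
    headers={"Authorization": f"Bearer {access_token}"},
)
print(req.get_header("Authorization"))  # → Bearer <token-from-step-1>
```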
+
+In most cases, the recommendation is to use one of Azure confidential ledger SDKs to access the service programmatically, as they remove much of the hassle of implementing the
+flow above (and much more). See, for example, the [Python client library](https://pypi.org/project/azure-confidentialledger/) and [.NET client library](/dotnet/api/overview/azure/storage.confidentialledger-readme-pre).
+
+The main authenticating scenarios are:
+
+- **A client application authenticating a signed-in user**: In this scenario, an interactive (client) application triggers an Azure AD prompt to the user for credentials (such as username and password). See [user authentication](#user-authentication).
+
+- **A "headless" application**: In this scenario, an application is running with no user present to provide credentials. Instead the application authenticates as "itself" to Azure AD using some credentials it has been configured with. See [application authentication](#application-authentication).
+
+- **On-behalf-of authentication**. In this scenario, sometimes called the "web service" or "web app" scenario, the application gets an Azure AD access token from another application, and then "converts" it to another Azure AD access token that can be used with Azure confidential ledger. In other words, the application acts as a mediator between the user or application that provided credentials and the Azure confidential ledger service. See [on-behalf-of authentication](#on-behalf-of-authentication).
+
+## Azure AD parameters
+
+### Azure AD resource for Azure confidential ledger
+
+When acquiring an access token from Azure AD, the client must indicate which *Azure AD resource* the token should be issued to. The Azure AD resource of an Azure confidential ledger endpoint is the URI of the endpoint, barring the port information and the path.
+
+For example, if you had an Azure confidential ledger called "myACL", the URI would be:
+
+```txt
+https://myACL.confidential-ledger.azure.com
+```
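+The rule above (keep the scheme and host, drop any port and path) can be sketched with a short Python snippet; the helper name is illustrative, not part of any SDK:

```python
from urllib.parse import urlparse

def aad_resource_for(endpoint: str) -> str:
    """Derive the Azure AD resource for a confidential ledger endpoint:
    keep the scheme and host, bar the port information and the path."""
    parts = urlparse(endpoint)
    return f"{parts.scheme}://{parts.hostname}"

# An endpoint with an explicit port and path still maps to the bare URI
# (urlparse normalizes the host to lowercase).
print(aad_resource_for("https://myACL.confidential-ledger.azure.com:443/app/transactions"))
# → https://myacl.confidential-ledger.azure.com
```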
+
+### Azure AD tenant ID
+
+Azure AD is a multi-tenant service, and every organization can create an object called **directory** in Azure AD. The directory object holds security-related objects such as user accounts, applications, and groups. Azure AD often refers to the directory as a **tenant**. Azure AD tenants are identified by a GUID (**tenant ID**). In many cases, Azure AD tenants can also be identified by the domain name of the organization.
+
+For example, an organization called "Contoso" might have the tenant ID `4da81d62-e0a8-4899-adad-4349ca6bfe24` and the domain name `contoso.com`.
+
+### Azure AD authority endpoint
+
+Azure AD has many endpoints for authentication:
+
+- When the tenant hosting the principal being authenticated is known (in other words, when one knows which Azure AD directory the user or application are in), the Azure AD endpoint is `https://login.microsoftonline.com/{tenantId}`. Here, `{tenantId}` is either the organization's tenant ID in Azure AD, or its domain name (for example, `contoso.com`).
+- When the tenant hosting the principal being authenticated isn't known, the "common" endpoint can be used by replacing the `{tenantId}` above with the value `common`.
+
+The Azure AD service endpoint used for authentication is also called *Azure AD authority URL* or simply **Azure AD authority**.
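+Choosing the authority then reduces to a simple string rule; a sketch (the function name is ours, not an SDK API):

```python
from typing import Optional

def aad_authority(tenant: Optional[str] = None) -> str:
    """Return the Azure AD authority URL for a known tenant (GUID or
    domain name), falling back to the 'common' endpoint when unknown."""
    return f"https://login.microsoftonline.com/{tenant or 'common'}"

print(aad_authority("contoso.com"))                           # tenant known by domain
print(aad_authority("4da81d62-e0a8-4899-adad-4349ca6bfe24"))  # tenant known by ID
print(aad_authority())                                        # tenant unknown
```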
+
+> [!NOTE]
+> The Azure AD service endpoint changes in national clouds. When working with an Azure confidential ledger service deployed in a national cloud, please set the corresponding national cloud Azure AD service endpoint. To change the endpoint, set an environment variable `AadAuthorityUri` to the required URI.
+
+## User authentication
+
+The easiest way to access Azure confidential ledger with user authentication is to use the Azure confidential ledger SDK and set the `Federated Authentication` property of the Azure confidential ledger connection string to `true`. The first time the SDK is used to send a request to the service, the user is presented with a sign-in form to enter the Azure AD credentials. Following a successful authentication, the request is sent to Azure confidential ledger.
+
+Applications that don't use the Azure confidential ledger SDK can still use the [Microsoft Authentication Library (MSAL)](../active-directory/develop/msal-overview.md) instead of implementing the Azure AD service security protocol client. See [Enable your Web Apps to sign-in users and call APIs with the Microsoft identity platform for developers](https://github.com/Azure-Samples/active-directory-aspnetcore-webapp-openidconnect-v2).
+
+If your application is intended to serve as front-end and authenticate users for an Azure confidential ledger cluster, the application must be granted delegated permissions on Azure confidential ledger.
+
+## Application authentication
+
+Applications that use Azure confidential ledger authenticate by using a token from Azure Active Directory. The owner of the application must first register it in Azure Active Directory. Registration also creates a second application object that identifies the app across all tenants.
+
+For detailed steps on registering an Azure confidential ledger application with Azure Active Directory, review these articles:
+
+- [How to register an Azure confidential ledger application with Azure AD](register-application.md)
+- [Use portal to create an Azure AD application and service principal that can access resources](../active-directory/develop/howto-create-service-principal-portal.md)
+- [Create an Azure service principal with the Azure CLI](/cli/azure/create-an-azure-service-principal-azure-cli).
+
+At the end of registration, the application owner gets the following values:
+
+- An **Application ID** (also known as the AAD Client ID or appID)
+- An **authentication key** (also known as the shared secret).
+
+The application must present both these values to Azure Active Directory to get a token.
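+As an illustration of what the SDKs do under the hood, the sketch below assembles the form body of an OAuth2 client-credentials token request. The tenant ID reuses the Contoso example above; the other values are placeholders, and in practice you should let the Azure Identity library handle this flow:

```python
from urllib.parse import urlencode

# Placeholder values from the app registration (never hard-code real secrets).
tenant_id = "4da81d62-e0a8-4899-adad-4349ca6bfe24"
client_id = "<application-id>"
client_secret = "<authentication-key>"
ledger_resource = "https://myACL.confidential-ledger.azure.com"

token_endpoint = f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"
body = urlencode({
    "grant_type": "client_credentials",  # the app authenticates as itself
    "client_id": client_id,              # Application ID
    "client_secret": client_secret,      # authentication key
    "resource": ledger_resource,         # token audience: the ledger endpoint
})
# POSTing `body` to `token_endpoint` returns a JSON payload whose
# access_token is then presented to Azure confidential ledger.
```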
+
+The Azure confidential ledger SDKs use the Azure Identity client library, which allows seamless authentication to Azure confidential ledger across environments with the same code.
+
+| .NET | Python | Java | JavaScript |
+|--|--|--|--|
+|[Azure Identity SDK .NET](/dotnet/api/overview/azure/identity-readme)|[Azure Identity SDK Python](/python/api/overview/azure/identity-readme)|[Azure Identity SDK Java](/java/api/overview/azure/identity-readme)|[Azure Identity SDK JavaScript](/javascript/api/overview/azure/identity-readme)|
+
+## On-behalf-of authentication
+
+In this scenario, an application was sent an Azure AD access token for some arbitrary resource managed by the application, and it uses that token to acquire a new Azure AD access token for the Azure confidential ledger resource so that the application could access the confidential ledger on behalf of the principal indicated by the original Azure AD access token.
+
+This flow is called the [OAuth2 token exchange flow](https://tools.ietf.org/html/draft-ietf-oauth-token-exchange-04). It generally requires multiple configuration steps with Azure AD, and in some cases (depending on the Azure AD tenant configuration) might require special consent from the administrator of the Azure AD tenant.
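+A sketch of the parameters such a token exchange sends to Azure AD, using the jwt-bearer grant of the on-behalf-of flow; all values are placeholders, and real applications should use MSAL rather than hand-rolling this:

```python
from urllib.parse import urlencode

incoming_token = "<access-token-received-by-the-middle-tier>"  # placeholder

exchange_body = urlencode({
    "grant_type": "urn:ietf:params:oauth:grant-type:jwt-bearer",
    "assertion": incoming_token,                  # the original access token
    "client_id": "<middle-tier-application-id>",  # placeholder
    "client_secret": "<middle-tier-secret>",      # placeholder
    "resource": "https://myACL.confidential-ledger.azure.com",
    "requested_token_use": "on_behalf_of",        # marks the OBO exchange
})
# POSTing this body to the tenant's token endpoint yields a new access token
# scoped to the confidential ledger, issued for the original principal.
```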
+
+## Next steps
+
+- [How to register an Azure confidential ledger application with Azure AD](register-application.md)
+- [Overview of Microsoft Azure confidential ledger](overview.md)
+- [Integrating applications with Azure Active Directory](../active-directory/develop/quickstart-register-app.md)
+- [Use portal to create an Azure AD application and service principal that can access resources](../active-directory/develop/howto-create-service-principal-portal.md)
+- [Create an Azure service principal with the Azure CLI](/cli/azure/create-an-azure-service-principal-azure-cli).
+- [Authenticating Azure confidential ledger nodes](authenticate-ledger-nodes.md)
confidential-ledger Register Application https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/confidential-ledger/register-application.md
+
+ Title: How to register an Azure confidential ledger application with Azure AD
+description: In this how to, you learn how to register an Azure confidential ledger application with Azure AD
++++ Last updated : 07/15/2022+
#Customer intent: As a developer, I want to know how to register my Azure confidential ledger application with the Microsoft identity platform so that the security token service can issue ID and/or access tokens to client applications that request them.
++
+# How to register an Azure confidential ledger application with Azure AD
+
+In this article, you'll learn how to integrate your Azure confidential ledger application with Azure AD by registering it with the Microsoft identity platform.
+
+The Microsoft identity platform performs identity and access management (IAM) only for registered applications. Whether it's a client application like a web or mobile app, or it's a web API that backs a client app, registering it establishes a trust relationship between your application and the identity provider, the Microsoft identity platform. [Learn more about the Microsoft identity platform](../active-directory/develop/v2-overview.md).
+
+## Prerequisites
+
+- An Azure account with an active subscription and permission to manage applications in Azure Active Directory (Azure AD). [Create an account for free](https://azure.microsoft.com/free/?WT.mc_id=A261C142F).
+- An Azure AD tenant. [Learn how to set up a tenant](../active-directory/develop/quickstart-create-new-tenant.md).
+- An application that calls Azure confidential ledger.
+
+## Register an application
+
+Registering your Azure confidential ledger application establishes a trust relationship between your app and the Microsoft identity platform. The trust is unidirectional: your app trusts the Microsoft identity platform, and not the other way around.
+
+Follow these steps to create the app registration:
+
+1. Sign in to the <a href="https://portal.azure.com/" target="_blank">Azure portal</a>.
+1. If you have access to multiple tenants, use the **Directories + subscriptions** filter :::image type="icon" source="../active-directory/develop/media/common/portal-directory-subscription-filter.png" border="false"::: in the top menu to switch to the tenant in which you want to register the application.
+1. Search for and select **Azure Active Directory**.
+1. Under **Manage**, select **App registrations** > **New registration**.
+1. Enter a display **Name** for your application. Users of your application might see the display name when they use the app, for example during sign-in.
+ You can change the display name at any time and multiple app registrations can share the same name. The app registration's automatically generated Application (client) ID, not its display name, uniquely identifies your app within the identity platform.
+1. Specify who can use the application, sometimes called its _sign-in audience_.
+
+ | Supported account types | Description |
+ | - | - |
+ | **Accounts in this organizational directory only** | Select this option if you're building an application for use only by users (or guests) in _your_ tenant.<br><br>Often called a _line-of-business_ (LOB) application, this app is a _single-tenant_ application in the Microsoft identity platform. |
+ | **Accounts in any organizational directory** | Select this option if you want users in _any_ Azure Active Directory (Azure AD) tenant to be able to use your application. This option is appropriate if, for example, you're building a software-as-a-service (SaaS) application that you intend to provide to multiple organizations.<br><br>This type of app is known as a _multitenant_ application in the Microsoft identity platform. |
+ | **Accounts in any organizational directory and personal Microsoft accounts** | Select this option to target the widest set of customers.<br><br>By selecting this option, you're registering a _multitenant_ application that can also support users who have personal _Microsoft accounts_. |
+ | **Personal Microsoft accounts** | Select this option if you're building an application only for users who have personal Microsoft accounts. Personal Microsoft accounts include Skype, Xbox, Live, and Hotmail accounts. |
+1. Don't enter anything for **Redirect URI (optional)**. You'll configure a redirect URI in the next section.
+1. Select **Register** to complete the initial app registration.
+
+ :::image type="content" source="../active-directory/develop/media/quickstart-register-app/portal-02-app-reg-01.png" alt-text="Screenshot of the Azure portal in a web browser, showing the Register an application pane.":::
+
+When registration finishes, the Azure portal displays the app registration's **Overview** pane. You see the **Application (client) ID**. Also called the _client ID_, this value uniquely identifies your application in the Microsoft identity platform.
+
+> [!IMPORTANT]
+> New app registrations are hidden to users by default. When you are ready for users to see the app on their [My Apps page](https://support.microsoft.com/account-billing/sign-in-and-start-apps-from-the-my-apps-portal-2f3b1bae-0e5a-4a86-a33e-876fbd2a4510) you can enable it. To enable the app, in the Azure portal navigate to **Azure Active Directory** > **Enterprise applications** and select the app. Then on the **Properties** page toggle **Visible to users?** to Yes.
+
+Your application's code, or more typically an authentication library used in your application, also uses the client ID. The ID is used as part of validating the security tokens it receives from the identity platform.
++
+## Add a redirect URI
+
+A _redirect URI_ is the location where the Microsoft identity platform redirects a user's client and sends security tokens after authentication.
+
+In a production web application, for example, the redirect URI is often a public endpoint where your app is running, like `https://contoso.com/auth-response`. During development, it's common to also add the endpoint where you run your app locally, like `https://127.0.0.1/auth-response` or `http://localhost/auth-response`.
+
+You add and modify redirect URIs for your registered applications by configuring their [platform settings](#configure-platform-settings).
+
+### Configure platform settings
+
+Settings for each application type, including redirect URIs, are configured in **Platform configurations** in the Azure portal. Some platforms, like **Web** and **Single-page applications**, require you to manually specify a redirect URI. For other platforms, like mobile and desktop, you can select from redirect URIs generated for you when you configure their other settings.
+
+To configure application settings based on the platform or device you're targeting, follow these steps:
+
+1. In the Azure portal, in **App registrations**, select your application.
+1. Under **Manage**, select **Authentication**.
+1. Under **Platform configurations**, select **Add a platform**.
+1. Under **Configure platforms**, select the tile for your application type (platform) to configure its settings.
+
+ :::image type="content" source="../active-directory/develop/media/quickstart-register-app/portal-04-app-reg-03-platform-config.png" alt-text="Screenshot of the platform configuration pane in the Azure portal." border="false":::
+
+ | Platform | Configuration settings |
+ | -- | -- |
+ | **Web** | Enter a **Redirect URI** for your app. This URI is the location where the Microsoft identity platform redirects a user's client and sends security tokens after authentication.<br/><br/>Select this platform for standard web applications that run on a server. |
+ | **Single-page application** | Enter a **Redirect URI** for your app. This URI is the location where the Microsoft identity platform redirects a user's client and sends security tokens after authentication.<br/><br/>Select this platform if you're building a client-side web app by using JavaScript or a framework like Angular, Vue.js, React.js, or Blazor WebAssembly. |
+ | **iOS / macOS** | Enter the app **Bundle ID**. Find it in **Build Settings** or in Xcode in _Info.plist_.<br/><br/>A redirect URI is generated for you when you specify a **Bundle ID**. |
+ | **Android** | Enter the app **Package name**. Find it in the _AndroidManifest.xml_ file. Also generate and enter the **Signature hash**.<br/><br/>A redirect URI is generated for you when you specify these settings. |
+ | **Mobile and desktop applications** | Select one of the **Suggested redirect URIs**. Or specify a **Custom redirect URI**.<br/><br/>For desktop applications using embedded browser, we recommend<br/>`https://login.microsoftonline.com/common/oauth2/nativeclient`<br/><br/>For desktop applications using system browser, we recommend<br/>`http://localhost`<br/><br/>Select this platform for mobile applications that aren't using the latest Microsoft Authentication Library (MSAL) or aren't using a broker. Also select this platform for desktop applications. |
+
+1. Select **Configure** to complete the platform configuration.
+
+### Redirect URI restrictions
+
+There are some restrictions on the format of the redirect URIs you add to an app registration. For details about these restrictions, see [Redirect URI (reply URL) restrictions and limitations](../active-directory/develop/reply-url.md).
+
+## Add credentials
+
+Credentials are used by [confidential client applications](../active-directory/develop/msal-client-applications.md) that access a web API. Examples of confidential clients are web apps, other web APIs, or service-type and daemon-type applications. Credentials allow your application to authenticate as itself, requiring no interaction from a user at runtime.
+
+You can add both certificates and client secrets (a string) as credentials to your confidential client app registration.
++
+### Add a certificate
+
+Sometimes called a _public key_, a certificate is the recommended credential type because they're considered more secure than client secrets. For more information about using a certificate as an authentication method in your application, see [Microsoft identity platform application authentication certificate credentials](../active-directory/develop/active-directory-certificate-credentials.md).
+
+1. In the Azure portal, in **App registrations**, select your application.
+1. Select **Certificates & secrets** > **Certificates** > **Upload certificate**.
+1. Select the file you want to upload. It must be one of the following file types: _.cer_, _.pem_, _.crt_.
+1. Select **Add**.
+
+### Add a client secret
+
+Sometimes called an _application password_, a client secret is a string value your app can use in place of a certificate to identify itself.
+
+Client secrets are considered less secure than certificate credentials. Application developers sometimes use client secrets during local app development because of their ease of use. However, you should use certificate credentials for any of your applications that are running in production.
+
+1. In the Azure portal, in **App registrations**, select your application.
+1. Select **Certificates & secrets** > **Client secrets** > **New client secret**.
+1. Add a description for your client secret.
+1. Select an expiration for the secret or specify a custom lifetime.
+ - Client secret lifetime is limited to two years (24 months) or less. You can't specify a custom lifetime longer than 24 months.
+ - Microsoft recommends that you set an expiration value of less than 12 months.
+1. Select **Add**.
+1. _Record the secret's value_ for use in your client application code. This secret value is _never displayed again_ after you leave this page.
+
+For application security recommendations, see [Microsoft identity platform best practices and recommendations](../active-directory/develop/identity-platform-integration-checklist.md#security).
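As a sketch of how a confidential client later uses these credentials, the OAuth 2.0 client-credentials grant POSTs the client ID and secret to the tenant's token endpoint. The tenant ID, client ID, and scope below are hypothetical placeholders, not values from this article:

```python
# Sketch: build the client-credentials token request a confidential client
# would POST to the Microsoft identity platform v2.0 token endpoint.
tenant_id = "00000000-0000-0000-0000-000000000000"  # hypothetical tenant ID
client_id = "11111111-1111-1111-1111-111111111111"  # hypothetical app (client) ID

token_endpoint = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"

# Form body for the client_credentials grant, authenticating with a client secret.
# In a real app, load the secret from a secure store; never hard-code it.
token_request_body = {
    "grant_type": "client_credentials",
    "client_id": client_id,
    "client_secret": "<secret value recorded at creation>",
    "scope": "https://management.azure.com/.default",  # assumed resource scope
}
```

A real application would send `token_request_body` as form data to `token_endpoint` and use the returned access token as a bearer credential.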
+
+## Next steps
+
+- [Azure confidential ledger authentication with Azure Active Directory (Azure AD)](authentication-azure-ad.md)
+- [Overview of Microsoft Azure confidential ledger](overview.md)
+- [Integrating applications with Azure Active Directory](../active-directory/develop/quickstart-register-app.md)
+- [Use portal to create an Azure AD application and service principal that can access resources](../active-directory/develop/howto-create-service-principal-portal.md)
+- [Create an Azure service principal with the Azure CLI](/cli/azure/create-an-azure-service-principal-azure-cli)
+- [Authenticating Azure confidential ledger nodes](authenticate-ledger-nodes.md)
cost-management-billing Automate Budget Creation https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/cost-management-billing/automate/automate-budget-creation.md
+
+ Title: Automate budget creation
+ description: This article helps you create budgets with the Budget API and a budget template.
+ Last updated: 07/15/2022
+# Automate budget creation
+
+You can automate budget creation using the [Budgets API](/rest/api/consumption/budgets). You can also create a budget with a [budget template](../costs/quick-create-budget-template.md). Templates are an easy way for you to standardize Azure deployments while ensuring cost control is properly configured and enforced.
+
+## Common Budgets API configurations
+
+There are many ways to configure a budget in your Azure environment. Consider your scenario first and then identify the configuration options that enable it. Review the following options:
+
+- **Time Grain** - Represents the recurring period your budget uses to accrue and evaluate costs. The most common options are Monthly, Quarterly, and Annual.
+- **Time Period** - Represents how long your budget is valid. The budget actively monitors and alerts you only while it remains valid.
+- **Notifications**
+ - Contact Emails - The specified email addresses receive alerts when the budget accrues costs and exceeds defined thresholds.
+ - Contact Roles - All users who have a matching Azure role on the given scope receive email alerts with this option. For example, Subscription Owners could receive an alert for a budget created at the subscription scope.
+ - Contact Groups - The budget calls the configured action groups when an alert threshold is exceeded.
+- **Cost dimension filters** - The same filtering you can do in Cost Analysis or the Query API can also be done on your budget. Use this filter to reduce the range of costs that you're monitoring with the budget.
+
+After you've identified the budget creation options that meet your needs, create the budget using the API. The example below helps get you started with a common budget configuration.
+
+### Create a budget filtered to multiple resources and tags
+
+Request URL: `PUT https://management.azure.com/subscriptions/{SubscriptionId}/providers/Microsoft.Consumption/budgets/{BudgetName}/?api-version=2019-10-01`
+
+```json
+{
+ "eTag": "\"1d34d016a593709\"",
+ "properties": {
+ "category": "Cost",
+ "amount": 100.65,
+ "timeGrain": "Monthly",
+ "timePeriod": {
+ "startDate": "2017-10-01T00:00:00Z",
+ "endDate": "2018-10-31T00:00:00Z"
+ },
+ "filter": {
+ "and": [
+ {
+ "dimensions": {
+ "name": "ResourceId",
+ "operator": "In",
+ "values": [
+ "/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.Compute/virtualMachines/{vmName1}",
+ "/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.Compute/virtualMachines/{vmName2}"
+ ]
+ }
+ },
+ {
+ "tags": {
+ "name": "category",
+ "operator": "In",
+ "values": [
+ "Dev",
+ "Prod"
+ ]
+ }
+ },
+ {
+ "tags": {
+ "name": "department",
+ "operator": "In",
+ "values": [
+ "engineering",
+ "sales"
+ ]
+ }
+ }
+ ]
+ },
+ "notifications": {
+ "Actual_GreaterThan_80_Percent": {
+ "enabled": true,
+ "operator": "GreaterThan",
+ "threshold": 80,
+ "contactEmails": [
+ "user1@contoso.com",
+ "user2@contoso.com"
+ ],
+ "contactRoles": [
+ "Contributor",
+ "Reader"
+ ],
+ "contactGroups": [
+ "/subscriptions/{subscriptionID}/resourceGroups/{resourceGroupName}/providers/microsoft.insights/actionGroups/{actionGroupName}"
+ ],
+ "thresholdType": "Actual"
+ }
+ }
+ }
+}
+```
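As a minimal sketch, the request URL above can be assembled programmatically before issuing the PUT. The subscription ID and budget name below are hypothetical placeholders:

```python
# Sketch: assemble the Budgets API PUT URL shown above.
subscription_id = "aaaa1111-bbbb-2222-cccc-333333333333"  # hypothetical subscription ID
budget_name = "MonthlyEngineeringBudget"                  # hypothetical budget name
api_version = "2019-10-01"

budget_url = (
    "https://management.azure.com"
    f"/subscriptions/{subscription_id}"
    f"/providers/Microsoft.Consumption/budgets/{budget_name}"
    f"?api-version={api_version}"
)
```

The JSON body from the example would then be sent as the PUT payload with a bearer token in the `Authorization` header.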
+
+## Supported locales for budget alert emails
+
+With budgets, you're alerted when costs cross a set threshold. You can set up to five email recipients per budget. Recipients receive the email alerts within 24 hours of crossing the budget threshold. However, your recipient might need to receive an email in a different language. You can use the following language culture codes with the Budgets API. Set the culture code with the `locale` parameter similar to the following example.
+
+```json
+{
+ "eTag": "\"1d681a8fc67f77a\"",
+ "properties": {
+ "timePeriod": {
+ "startDate": "2020-07-24T00:00:00Z",
+ "endDate": "2022-07-23T00:00:00Z"
+ },
+ "timeGrain": "BillingMonth",
+ "amount": 1,
+ "currentSpend": {
+ "amount": 0,
+ "unit": "USD"
+ },
+ "category": "Cost",
+ "notifications": {
+ "actual_GreaterThan_20_Percent": {
+ "enabled": true,
+ "operator": "GreaterThan",
+ "threshold": 20,
+ "locale": "en-us",
+ "contactEmails": [
+ "user@contoso.com"
+ ],
+ "contactRoles": [],
+ "contactGroups": [],
+ "thresholdType": "Actual"
+ }
+ }
+ }
+}
+```
+
+Languages supported by a culture code:
+
+| Culture code| Language |
+| | |
+| en-us | English (United States) |
+| ja-jp | Japanese (Japan) |
+| zh-cn | Chinese (Simplified, China) |
+| de-de | German (Germany) |
+| es-es | Spanish (Spain, International) |
+| fr-fr | French (France) |
+| it-it | Italian (Italy) |
+| ko-kr | Korean (Korea) |
+| pt-br | Portuguese (Brazil) |
+| ru-ru | Russian (Russia) |
+| zh-tw | Chinese (Traditional, Taiwan) |
+| cs-cz | Czech (Czech Republic) |
+| pl-pl | Polish (Poland) |
+| tr-tr | Turkish (Turkey) |
+| da-dk | Danish (Denmark) |
+| en-gb | English (United Kingdom) |
+| hu-hu | Hungarian (Hungary) |
+| nb-no | Norwegian Bokmål (Norway) |
+| nl-nl | Dutch (Netherlands) |
+| pt-pt | Portuguese (Portugal) |
+| sv-se | Swedish (Sweden) |
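Because an unrecognized culture code won't localize alert emails, a caller might validate the `locale` value against this table before submitting the budget. A minimal sketch; the fallback choice of `en-us` is an assumption, not API behavior:

```python
# Culture codes supported for budget alert emails, per the table above.
SUPPORTED_LOCALES = {
    "en-us", "ja-jp", "zh-cn", "de-de", "es-es", "fr-fr", "it-it",
    "ko-kr", "pt-br", "ru-ru", "zh-tw", "cs-cz", "pl-pl", "tr-tr",
    "da-dk", "en-gb", "hu-hu", "nb-no", "nl-nl", "pt-pt", "sv-se",
}

def normalize_locale(locale: str, default: str = "en-us") -> str:
    """Return a supported culture code, falling back to a default (assumed here)."""
    candidate = locale.strip().lower()
    return candidate if candidate in SUPPORTED_LOCALES else default
```

For example, `normalize_locale("FR-fr")` returns `fr-fr`, while an unsupported code falls back to the default.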
+
+## Configure cost-based orchestration for budget alerts
+
+You can configure budgets to start automated actions using Azure Action Groups. To learn more about automating actions using budgets, see [Automation with Azure Budgets](../manage/cost-management-budget-scenario.md).
+
+## Next steps
+
+- Learn more about Cost Management + Billing automation at [Cost Management automation overview](automation-overview.md).
+- [Assign permissions to Cost Management APIs](cost-management-api-permissions.md).
cost-management-billing Automation Faq https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/cost-management-billing/automate/automation-faq.md
+
+ Title: Microsoft Cost Management automation FAQ
+ description: This FAQ is a list of frequently asked questions and answers about Cost Management automation.
+ Last updated: 07/15/2022
+# Cost Management automation FAQ
+
+The following sections cover the most commonly asked questions and answers about Cost Management automation.
+
+### What are Cost Details versus Usage Details?
+
+Both are names for the same dataset. Usage Details was the original name, from when only Azure consumption resource usage records were present. Over time, more types of cost have been added to the dataset, including marketplace usage, Azure purchases (such as reservations), marketplace purchases, and even Microsoft 365 costs. Cost Details is the name used for the dataset moving forward.
+
+### Why do I get Usage Details API timeouts?
+
+Cost datasets available from the Usage Details API can often be overly large (multiple GBs or more). The larger the size of the dataset that you request, the longer the service takes to compile the data before sending it to you. Because of the delay, synchronous API solutions like the paginated [JSON Usage Details API](/rest/api/consumption/usage-details/list) might time out before your data is provided. If you encounter timeouts or have processes that frequently need to pull a large amount of cost data, see [Retrieve large cost datasets recurringly with Exports](../costs/tutorial-export-acm-data.md).
+
+### What is the difference between legacy and modern usage details?
+
+A legacy versus modern usage details record is identified by the `kind` field in the [Usage Details API](/rest/api/consumption/usage-details/list). The field is used to distinguish between data that's returned for different customer types. The call patterns to obtain legacy and modern usage details are essentially the same, and the granularity of the data is the same. The main difference is the fields available in the usage details records themselves. If you're an EA customer, you always get legacy usage details records. If you're a Microsoft Customer Agreement customer, you always get modern usage details records.
+
+### How do I see my recurring charges?
+
+Recurring charges are available in the [Cost Details](/rest/api/cost-management/generate-cost-details-report) report when viewing Actual Cost.
+
+### Where can I see tax information in Cost Details?
+
+Cost details data is all pre-tax. Tax-related charges are only available on your invoice.
+
+### Why is PAYGPrice zero in my cost details file?
+
+If you're an EA customer, we don't currently support showing pay-as-you-go prices directly in the usage details data. To see the pricing, use the [Retail Prices API](/rest/api/cost-management/retail-prices/azure-retail-prices).
+
+### Does Cost Details have Reservation charges?
+
+Yes, it does. You can see those charges according to when the actual charges occurred (Actual Cost). Or, you can see the charges spread across the resources that consumed the reservation (Amortized Cost). For more information, see [Get Azure consumption and reservation usage data using API](../reservations/understand-reserved-instance-usage-ea.md#get-azure-consumption-and-reservation-usage-data-using-api).
+
+### Am I charged for using the Cost Details API?
+
+No, the Cost Details API is free. However, make sure to abide by the rate-limiting policies.
+
+<!-- For more information, see [Data latency and rate limits](api-latency-rate-limits.md). -->
+
+### What's the difference between the Invoice API, the Transaction API, and the Cost Details API?
+
+These APIs provide different views of the same data:
+
+- The [Invoice API](/rest/api/billing/2020-05-01/invoices) provides an aggregated view of your monthly charges.
+- The [Transactions API](/rest/api/billing/2020-05-01/transactions/list-by-invoice) provides a view of your monthly charges aggregated at product/service family level.
+- The [Cost Details](/rest/api/cost-management/generate-cost-details-report) report provides a granular view of the usage and cost records for each day. Both Enterprise and Microsoft Customer Agreement customers can use it. If you're a legacy pay-as-you-go customer, see [Get Cost Details as a legacy customer](get-usage-details-legacy-customer.md).
+
+### I recently migrated from an EA to an MCA agreement. How do I migrate my API workloads?
+
+See [Migrate from EA to MCA APIs](../costs/migrate-cost-management-api.md).
+
+### When will the [Enterprise Reporting APIs](../manage/enterprise-api.md) get turned off?
+
+The Enterprise Reporting APIs are deprecated. The date that the API will be turned off is still being determined. We recommend that you migrate away from the APIs as soon as possible. For more information, see [Migrate from Enterprise Reporting to Azure Resource Manager APIs](../costs/migrate-from-enterprise-reporting-to-azure-resource-manager-apis.md).
+
+### When will the [Consumption Usage Details API](/rest/api/consumption/usage-details/list) get turned off?
+
+The Consumption Usage Details API is deprecated. The date that the API will be turned off is still being determined. We recommend that you migrate away from the API as soon as possible. For more information, see [Migrate from Consumption Usage Details API](migrate-consumption-usage-details-api.md).
+
+### When will the [Consumption Marketplaces API](/rest/api/consumption/marketplaces/list) get turned off?
+
+The Marketplaces API is deprecated. The date that the API will be turned off is still being determined. Data from the API is available in the [Cost Details](/rest/api/cost-management/generate-cost-details-report) report. We recommend that you migrate to it as soon as possible. For more information, see [Migrate from Consumption Marketplaces API](migrate-consumption-marketplaces-api.md).
+
+### When will the [Consumption Forecasts API](/rest/api/consumption/forecasts/list) get turned off?
+
+The Forecasts API is deprecated. The date that the API will be turned off is still being determined. Data from the API is available in the [Cost Management Forecast API](/rest/api/cost-management/forecast). We recommend that you migrate to it as soon as possible.
+
+## Next steps
+
+- Learn more about Cost Management + Billing automation at [Cost Management automation overview](automation-overview.md).
cost-management-billing Automation Ingest Usage Details Overview https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/cost-management-billing/automate/automation-ingest-usage-details-overview.md
+
+ Title: Ingest cost details data
+ description: This article explains how to use cost details records to correlate meter-based charges with the specific resources responsible for the charges so that you can properly reconcile your bill.
+ Last updated: 07/15/2022
+# Ingest cost details data
+
+Cost details (formerly referred to as usage details) are the most granular cost records that are available across Microsoft. Cost details records allow you to correlate Azure meter-based charges with the specific resources responsible for the charges so that you can properly reconcile your bill. The data also includes charges associated with New Commerce products like Microsoft 365 and Dynamics 365 that are invoiced along with Azure. Currently, only Partners can purchase New Commerce non-Azure products. To learn more, see [Understand cost management data](../costs/understand-cost-mgt-data.md).
+
+This document outlines the main solutions available to you as you work with cost details data. You might need to download your cost data to merge it with other datasets. Or you might need to integrate cost data into your own systems. There are different options available depending on the amount of data involved.
+
+In all cases, you must have Cost Management permissions at the appropriate scope to use the APIs and tools. For more information, see [Assign access to data](../costs/assign-access-acm-data.md) and [Assign permissions to Cost Management APIs](cost-management-api-permissions.md).
+
+## How to get cost details
+
+You can use [exports](../costs/tutorial-export-acm-data.md) or the [Cost Details](/rest/api/cost-management/generate-cost-details-report) report to get cost details programmatically. To learn more about which solutions are best for your scenarios, see [Choose a cost details solution](usage-details-best-practices.md).
+
+For Azure portal download instructions, see [How to get your Azure billing invoice and daily usage data](../manage/download-azure-invoice-daily-usage-date.md). If you have a small cost details dataset that you maintain from one month to another, you can open your CSV file in Microsoft Excel or another spreadsheet application.
+
+## Cost details data format
+
+The Azure billing system uses cost details records at the end of the month to generate your bill. Your bill is based on the net charges that were accrued by meter. Cost records contain daily rated usage based on negotiated rates, purchases (for example, reservations, Marketplace fees), and refunds for the specified period. Fees don't include credits, taxes, or other charges or discounts.
+
+The following table shows the charges that are included in your cost details dataset for each account type.
+
+| **Account type** | **Azure usage** | **Marketplace usage** | **Purchases** | **Refunds** |
+| | | | | |
+| Enterprise Agreement (EA) | ✔ | ✔ | ✔ | ✘ |
+| Microsoft Customer Agreement (MCA) | ✔ | ✔ | ✔ | ✔ |
+| Pay-as-you-go (PAYG) | ✔ | ✔ | ✘ | ✘ |
+
+A single Azure resource often has multiple meters emitting charges. For example, a VM may have both Compute and Networking related meters.
+
+To understand the fields that are available in cost details records, see [Understand cost details fields](understand-usage-details-fields.md).
+
+To learn more about Marketplace orders (also known as external services), see [Understand your Azure external service charges](../understand/understand-azure-marketplace-charges.md).
+
+### A single resource might have multiple records per day
+
+Azure resource providers emit usage and charges to the billing system and populate the Additional Info field of the usage records. Occasionally, resource providers might emit usage for a given day and stamp the records with different datacenters in the Additional Info field of the cost records. This can cause multiple records for a meter or resource to be present in your cost file for a single day. In that situation, you aren't overcharged. The multiple records together represent the full cost of the meter for the resource on that day.
+
+### Pricing behavior in cost details
+
+The cost details file exposes multiple price points today. These are outlined below.
+
+- **PAYGPrice:** This is the list price for a given product or service that is determined based on the customer agreement. For customers who have an Enterprise Agreement, the pay-as-you-go price represents the EA baseline price.
+
+- **UnitPrice:** This is the price for a given product or service inclusive of any negotiated discounts on top of the pay-as-you-go price.
+
+- **EffectivePrice:** This is the price for a given product or service that represents the actual rate that you end up paying per unit. It's the price that should be used with the Quantity to do Price \* Quantity calculations to reconcile charges. The price takes into account the following scenarios:
+ - *Tiered pricing:* For example: $10 for the first 100 units, $8 for the next 100 units.
+ - *Included quantity:* For example: The first 100 units are free and then $10 for each unit.
+ - *Reservations:* For example, a VM that got a reservation benefit on a given day. In amortized data for reservations, the effective price is the prorated hourly reservation cost. The cost is the total cost of reservation usage by the resource on that day.
+ - *Rounding that occurs during calculation:* Rounding takes into account the consumed quantity, tiered/included quantity pricing, and the scaled unit price.
+
+- **Quantity:** This is the number of units used by the given product or service for a given day and is aligned to the unit of measure used in actual resource usage.
+
+If you want to reconcile costs with your price sheet or invoice, note the following information about unit of measure.
+
+**Price Sheet unit of measure behavior** - The prices shown on the price sheet are the prices that you receive from Azure. They're scaled to a specific unit of measure.
+
+**Cost details unit of measure behavior** - The unit of measure associated with the usage quantities and pricing seen in cost details aligns with actual resource usage.
+
+#### Example pricing scenarios seen in cost details for a resource
+
+| **MeterId** | **Quantity** | **PAYGPrice** | **UnitPrice** | **EffectivePrice** | **UnitOfMeasure** | **Notes** |
+| | | | | | | |
+| 00000000-0000-0000-0000-000000000000 | 24 | 1 | 0.8 | 0.76 | 1 hour | Manual calculation of the actual charge: Quantity \* EffectivePrice = 24 \* 0.76 = 18.24. |
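The reconciliation arithmetic in the table can be sketched directly. The tiered-pricing helper below illustrates the "first 100 units at \$10, next units at \$8" example from the EffectivePrice description; it's an illustration of the concept, not the billing system's actual algorithm:

```python
def charge(quantity: float, effective_price: float) -> float:
    """Reconcile a cost record: Quantity * EffectivePrice."""
    return quantity * effective_price

def tiered_cost(units: float, tiers: list[tuple[float, float]]) -> float:
    """Cost under tiered pricing. tiers is a list of (units_in_tier, price_per_unit);
    use float('inf') for the final, unbounded tier."""
    total, remaining = 0.0, units
    for tier_units, price in tiers:
        used = min(remaining, tier_units)
        total += used * price
        remaining -= used
        if remaining <= 0:
            break
    return total

# The table's example row: 24 hours at an effective price of 0.76 per hour.
row_charge = charge(24, 0.76)
# Tiered example: $10 for the first 100 units, $8 for the rest; 150 units total.
tiered = tiered_cost(150, [(100, 10.0), (float("inf"), 8.0)])
```

Here `row_charge` is 18.24 and `tiered` is 1400.0 (100 × 10 + 50 × 8).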
+
+## Unexpected charges
+
+If you have charges that you don't recognize, there are several things you can do to help understand why:
+
+- Review the invoice that has charges for the resource
+- Review your invoiced charges in Cost analysis
+- Find people responsible for the resource and engage with them
+- Analyze the audit logs
+- Analyze user permissions to the resource's parent scope
+- Create an [Azure support request](https://go.microsoft.com/fwlink/?linkid=2083458) to help identify the charges
+
+For more information, see [Analyze unexpected charges](../understand/analyze-unexpected-charges.md).
+
+Azure doesn't log most user actions. Instead, Azure logs resource usage for billing. If you notice a usage spike in the past and you didn't have logging enabled, Azure can't pinpoint the cause. Enable logging for the service that you want to view the increased usage for so that the appropriate technical team can assist you with the issue.
+
+## Next steps
+
+- Learn more about [Choose a cost details solution](usage-details-best-practices.md).
+- [Create and manage exported data](../costs/tutorial-export-acm-data.md) in the Azure portal with Exports.
+- [Automate Export creation](../costs/ingest-azure-usage-at-scale.md) and ingestion at scale using the API.
+- [Understand cost details fields](understand-usage-details-fields.md).
+- Learn how to [Get small cost datasets on demand](get-small-usage-datasets-on-demand.md).
cost-management-billing Automation Overview https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/cost-management-billing/automate/automation-overview.md
+
+ Title: Cost Management automation overview
+ description: This article covers common scenarios for Cost Management automation and options available based on your situation.
+ Last updated: 07/15/2022
+# Cost Management automation overview
+
+You can use Cost Management automation and reporting to build a custom set of solutions to retrieve and manage cost data. This article covers what APIs are available for use and common scenarios for Cost Management automation.
+
+## Available APIs
+
+There are many different APIs that you can use to interact with Cost Management data. The available APIs and what they do are summarized below. You might need to use multiple APIs to achieve a specific scenario. Review the common scenarios outlined later to learn more.
+
+For contractual information about how to call each API, review the API specification articles.
+
+### Cost Details APIs
+The APIs below provide you with cost details data (formerly referred to as usage details). Cost Details are the most granular usage and cost records that are available to you within the Azure ecosystem. All Cost Management experiences in the Azure portal and the APIs are built upon the raw dataset. To learn more, see [cost details overview](automation-ingest-usage-details-overview.md).
+
+- [Exports API](/rest/api/cost-management/exports/create-or-update) - Configure a recurring task to export your cost details data to Azure storage on a daily, weekly, or monthly basis. Exported data is in CSV format. It's our recommended solution for ingesting cost data and is the most scalable for large enterprises. To learn more, see [Retrieve large cost datasets with exports](../costs/ingest-azure-usage-at-scale.md).
+
+- [Generate Cost Details](/rest/api/cost-management/generate-cost-details-report) - Download a cost details CSV file on demand. It's useful for smaller, date range based datasets. For larger workloads, we strongly recommend that you use Exports. To learn more about using this API, see [Get small cost datasets on demand](get-small-usage-datasets-on-demand.md).
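Report generation through the Generate Cost Details API is asynchronous: the initial call returns a status URL that you poll until the report is ready. The generic polling pattern can be sketched as follows, with the status check injected as a callable so the control flow is clear; in a real client this would be an authenticated HTTP GET, and the field names here are illustrative, not the API's exact response schema:

```python
import time

def poll_until_ready(check_status, max_attempts: int = 10, delay_seconds: float = 0.0):
    """Poll an async operation until it reports completion.

    check_status: callable returning a dict like {"status": ..., "result": ...}.
    Returns the result payload, or raises if the operation never completes.
    """
    for _ in range(max_attempts):
        report = check_status()
        if report["status"] == "Completed":
            return report["result"]
        time.sleep(delay_seconds)  # a real client would honor the Retry-After header
    raise TimeoutError("report generation did not complete in time")

# Stand-in for the status endpoint: ready on the third poll.
responses = iter([
    {"status": "InProgress", "result": None},
    {"status": "InProgress", "result": None},
    {"status": "Completed", "result": {"downloadUrl": "https://example.invalid/report.csv"}},
])
result = poll_until_ready(lambda: next(responses))
```

Once `result` is returned, the client downloads the CSV from the reported URL.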
+
+### Pricing APIs
+
+- [Azure Retail Prices](/rest/api/cost-management/retail-prices/azure-retail-prices) - Get meter rates with pay-as-you-go pricing. You can use the returned information with your resource usage information to manually calculate the expected bill.
+
+- [Price Sheet API](/rest/api/consumption/pricesheet) - Get custom pricing for all meters. Enterprises can use this data in combination with usage details and marketplace usage information to manually calculate costs by using usage and marketplace data.
+
+### Budgets and Alerts APIs
+
+- [Budgets API](/rest/api/consumption/budgets) - Create cost budgets for resources, resource groups, or billing meters. After you've created budgets, you can configure alerts to notify you when you exceed defined budget thresholds. You can also configure actions to occur when you reach budget amounts. For more information, see [Automate budget creation](automate-budget-creation.md) and [Configure budget based actions](../manage/cost-management-budget-scenario.md).
+
+- [Alerts API](/rest/api/cost-management/alerts) - Manage all of the alerts that have been created by budgets and other Azure alerting systems.
+
+### Invoicing APIs
+
+- [Invoices API](/rest/api/billing/2020-05-01/invoices) - Get a list of invoices. The API returns a summary of your invoices, including the total amount, payment status, and a link to download a PDF copy of each invoice.
+
+- [Transactions API](/rest/api/billing/2020-05-01/transactions/list-by-invoice) - Get invoice line items for an invoice. You can use the API to get all purchases, refunds, and credits that are included in your invoice. The API is only available for customers with Microsoft Customer Agreement or Microsoft Partner Agreement billing accounts.
+
+### Reservation APIs
+
+- [Reservation Details API](/rest/api/cost-management/generate-reservation-details-report) - Get the detailed resource consumption associated with your reservation purchases.
+
+- [Reservation Transactions API](/rest/api/consumption/reservation-transactions) - Get reservation related purchase and management transactions.
+
+- [Reservation Recommendations API](/rest/api/consumption/reservation-recommendations) - Get recommendations for reservation purchases to make in the future along with expected savings information.
+
+- [Reservation Recommendation Details API](/rest/api/consumption/reservation-recommendation-details) - Get detailed information for specific reservation purchases to perform a what-if analysis.
+
+## Common API scenarios
+
+You can use the billing and cost management APIs in many scenarios to answer cost-related and usage-related questions. Common scenarios and how to use the different APIs to achieve those scenarios are outlined below.
+
+### Invoice reconciliation
+
+This scenario is used to address the following questions:
+
+- Did Microsoft charge me the right amount on my invoice?
+- What's my bill, and can I calculate it myself using the raw data?
+
+To answer these questions, follow the steps below.
+
+1. Call the [Invoices API](/rest/api/billing/2020-05-01/invoices) to get the info needed to download an invoice. If you're a Microsoft Customer Agreement customer and just wish to get the specific line items seen on your invoice automatically, you can also utilize the [Transactions API](/rest/api/billing/2020-05-01/transactions/list-by-invoice) to get those line items in an API-readable format.
+
+2. Use either [Exports](/rest/api/cost-management/exports) or the [Cost Details](/rest/api/cost-management/generate-cost-details-report) API to download the raw usage file.
+
+3. Analyze the data in the raw usage file to compare it against the costs that are present on the invoice. For Azure consumption, the data in your invoice is rolled up based on the meter associated with your usage.
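The roll-up in step 3 can be sketched as summing raw usage records by meter and comparing the totals to the invoice lines. The record fields below are illustrative placeholders, not the exact CSV column names:

```python
from collections import defaultdict

def rollup_by_meter(records):
    """Sum cost-details records by meter, mirroring how the invoice aggregates them."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec["meterId"]] += rec["cost"]
    return dict(totals)

# Hypothetical usage records and invoice lines for the comparison.
usage = [
    {"meterId": "m1", "cost": 10.0},
    {"meterId": "m1", "cost": 5.5},
    {"meterId": "m2", "cost": 2.0},
]
invoice_lines = {"m1": 15.5, "m2": 2.0}

# Flag meters whose rolled-up usage cost differs from the invoiced amount.
mismatches = {
    meter: (cost, invoice_lines.get(meter))
    for meter, cost in rollup_by_meter(usage).items()
    if abs(cost - invoice_lines.get(meter, 0.0)) > 0.01
}
```

An empty `mismatches` dict means the raw usage reconciles with the invoice at the meter level.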
+
+### Cross-charging
+
+Once there's a good understanding of spending for a given month, organizations next need to determine which teams or divisions pay for the various charges accrued. Follow the steps below.
+
+1. Use either [Exports](/rest/api/cost-management/exports) or the [Cost Details](/rest/api/cost-management/generate-cost-details-report) API to download the raw usage file.
+
+2. Analyze the data in the raw usage file and allocate it based on the organizational hierarchy that you have in place. Allocation could be based on resource groups, subscriptions, cost allocation rules, tags or other Azure organization hierarchies.
+ - To learn more about best practices to consider when configuring your Azure environments, see [Cost management best practices](../costs/cost-mgt-best-practices.md).
+ - To learn more about the scopes and the organizational structures available to you, see [Understand and work with scopes](../costs/understand-work-scopes.md).
+ - To set up allocation directly in Azure, see [Allocate costs](../costs/allocate-costs.md).
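The tag-based allocation in step 2 can be sketched as grouping raw cost records by a tag value. The tag key and records below are hypothetical:

```python
from collections import defaultdict

def allocate_by_tag(records, tag_key, unallocated="(untagged)"):
    """Split total cost across teams using a resource tag on each record."""
    buckets = defaultdict(float)
    for rec in records:
        team = rec.get("tags", {}).get(tag_key, unallocated)
        buckets[team] += rec["cost"]
    return dict(buckets)

# Hypothetical cost records tagged by department.
records = [
    {"cost": 40.0, "tags": {"department": "engineering"}},
    {"cost": 25.0, "tags": {"department": "sales"}},
    {"cost": 5.0, "tags": {}},  # untagged spend surfaces in its own bucket
]
allocation = allocate_by_tag(records, "department")
```

Surfacing an explicit "(untagged)" bucket helps drive tagging compliance, since unallocated spend is visible rather than silently dropped.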
+
+### Azure spending prior to invoice closure
+
+It's important to keep tabs on how costs are accruing throughout the month. Proactive analysis before the invoice is closed can provide opportunities to change spending patterns and get an invoice's projected costs down. To ingest all of the raw data that has accrued month-to-date, use [Exports API](/rest/api/cost-management/exports).
+
+Configuring automatic alerting can also ensure that spending doesn't unexpectedly get out of hand, and it removes the need for manual cost monitoring throughout the month. To be notified when your costs breach, or are forecasted to breach, defined thresholds, use the [Budgets API](/rest/api/consumption/budgets).
+
+### Cost trend reporting
+
+Often it's useful to understand how much an organization is spending over time. Understanding cost over time helps identify trends and areas for cost optimization improvement. Follow the steps below to set up a cost dataset that can be used for reporting cost over time at scale.
+
+1. Extract the historical costs for prior months. See [Seed a historical cost dataset with the Exports API](tutorial-seed-historical-cost-dataset-exports-api.md) to learn more.
+2. Ingest your historical data from the Azure storage account associated with your Exports into a queryable store. We recommend SQL or Azure Synapse.
+3. Configure a month-to-date Export to storage at a scope with the costs that need to be analyzed. Export to storage is done in the Azure portal. See [Export costs](../costs/tutorial-export-acm-data.md). The month-to-date Export will be used to properly extract costs moving forward.
+4. Configure a data pipeline to ingest cost data for the open month into your queryable store. This pipeline should be used with the month-to-date Export that you've configured. Azure Data Factory provides good solutions for this kind of ingestion scenario.
+5. Perform reporting as needed using reports built with your queryable store. Power BI can be a good fit for this scenario. If you're looking for a more out-of-the-box solution, see our [Power BI Template App](../costs/analyze-cost-data-azure-cost-management-power-bi-template-app.md).
+
+### Reservation related investigations
+
+For more information about reservation-specific automation scenarios, see [APIs for Azure reservation automation](../reservations/reservation-apis.md).
+
+## Next steps
+
+- To learn more about how to assign the proper permissions to call our APIs programmatically, see [Assign permissions to Cost Management APIs](cost-management-api-permissions.md).
+- To learn more about working with cost details, see [Ingest usage details data](automation-ingest-usage-details-overview.md).
+
+- To learn more about budget automation, see [Automate budget creation](automate-budget-creation.md).
+- For information about using REST APIs to retrieve prices for all Azure services, see [Azure Retail Prices overview](/rest/api/cost-management/retail-prices/azure-retail-prices).
+
+- To compare your invoice with the detailed daily usage file and the cost management reports in the Azure portal, see [Understand your bill for Microsoft Azure](../understand/review-individual-bill.md).
+- If you have questions or need help, [create a support request](https://go.microsoft.com/fwlink/?linkid=2083458).
cost-management-billing Cost Management Api Permissions https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/cost-management-billing/automate/cost-management-api-permissions.md
+
+ Title: Assign permissions to Cost Management APIs
+
+description: This article describes what you need to know to successfully assign permissions to an Azure service principal.
+Last updated: 07/15/2022
+# Assign permissions to Cost Management APIs
+
+Before using the Azure Cost Management APIs, you need to properly assign permissions to an Azure service principal. From there you can use the service principal identity to call the APIs.
+
+## Permissions configuration checklist
+
+- Get familiar with the [Azure Resource Manager REST APIs](/rest/api/azure).
+- Determine which Cost Management APIs you want to use. For more information about available APIs, see [Cost Management automation overview](automation-overview.md).
+- Configure service authorization and authentication for the Azure Resource Manager APIs.
+ - If you're not already using Azure Resource Manager APIs, [register your client app with Azure AD](/rest/api/azure/#register-your-client-application-with-azure-ad). Registration creates a service principal for you to use to call the APIs.
+ - Assign the service principal access to the scopes needed, as outlined below.
+ - Update any programming code to use [Azure AD authentication](/rest/api/azure/#create-the-request) with your service principal.
+
+## Assign service principal access to Azure Resource Manager APIs
+
+After you create a service principal to programmatically call the Azure Resource Manager APIs, you need to assign it the proper permissions so it can authenticate and execute requests against Azure Resource Manager. There are two permission frameworks for different scenarios.
+
+### Azure billing hierarchy access
+
+If you have an Azure Enterprise Agreement or a Microsoft Customer Agreement, you can configure service principal access to Cost Management data in your billing account. To learn more about the billing hierarchies available and what permissions are needed to call each API in Azure Cost Management, see [Understand and work with scopes](../costs/understand-work-scopes.md).
+
+- Enterprise Agreements - To assign service principal permissions to your enterprise billing account, departments, or enrollment account scopes, see [Assign roles to Azure Enterprise Agreement service principal names](../manage/assign-roles-azure-service-principals.md).
+
+- Microsoft Customer Agreements - To assign service principal permissions to your Microsoft Customer Agreement billing account, billing profile, invoice section or customer scopes, see [Manage billing roles in the Azure portal](../manage/understand-mca-roles.md#manage-billing-roles-in-the-azure-portal). Configure the permission to your service principal in the portal as you would a normal user. If you want to automate permissions assignment, see the [Billing Role Assignments API](/rest/api/billing/2020-05-01/billing-role-assignments).
+
+### Azure role-based access control
+
+Service principal support extends to Azure-specific scopes, like management groups, subscriptions, and resource groups. You can assign service principal permissions to these scopes directly [in the Azure portal](../../active-directory/develop/howto-create-service-principal-portal.md#assign-a-role-to-the-application) or by using [Azure PowerShell](../../active-directory/develop/howto-authenticate-service-principal-powershell.md#assign-the-application-to-a-role).
+
+## Next steps
+
+- Learn more about Cost Management automation at [Cost Management automation overview](automation-overview.md).
cost-management-billing Get Small Usage Datasets On Demand https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/cost-management-billing/automate/get-small-usage-datasets-on-demand.md
+
+ Title: Get small cost datasets on demand
+
+description: The article explains how you can use the Cost Details API to get raw, unaggregated cost data that corresponds to your Azure bill.
+Last updated: 07/15/2022
+# Get small cost datasets on demand
+
+Use the [Cost Details](/rest/api/cost-management/generate-cost-details-report) API to get raw, unaggregated cost data that corresponds to your Azure bill. The API is useful when your organization needs a programmatic data retrieval solution. Consider using the API if you want to analyze smaller cost datasets of 2 GB (2 million rows) or less. However, you should use Exports for ongoing data ingestion workloads and for the download of larger datasets.
+
+If you want to get large amounts of exported data regularly, see [Retrieve large cost datasets recurringly with exports](../costs/ingest-azure-usage-at-scale.md).
+
+To learn more about the data in cost details (formerly referred to as *usage details*), see [Ingest cost details data](automation-ingest-usage-details-overview.md).
+
+The [Cost Details](/rest/api/cost-management/generate-cost-details-report) report is only available for customers with an Enterprise Agreement or Microsoft Customer Agreement. If you're an MSDN, pay-as-you-go, or Visual Studio customer, see [Get cost details as a legacy customer](get-usage-details-legacy-customer.md).
+
+## Cost Details API best practices
+
+Microsoft recommends the following best practices as you use the Cost Details API.
+
+### Request schedule
+
+If you want to get the latest cost data, we recommend you query at most once per day. Reports are refreshed every four hours. If you call more frequently, you'll receive identical data. Once you download your cost data for historical invoices, the charges won't change unless you're explicitly notified. We recommend caching your cost data in a queryable store on your side to prevent repeated calls for identical data.
+
+### Chunk your requests
+
+Chunk your calls into small date ranges to get more manageable files that you can download over the network. For example, we recommend chunking by day or by week if you have a large Azure cost file month-to-month. If you have scopes with a large amount of cost data (for example a Billing Account), consider placing multiple calls to child scopes so you get more manageable files that you can download. For more information about Cost Management scopes, see [Understand and work with scopes](../costs/understand-work-scopes.md). After you download the data, use Excel to analyze data further with filters and pivot tables.
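
As an illustration of the guidance above, the following sketch splits a large month into week-sized date ranges, one Cost Details request per chunk. Weekly chunking is one reasonable choice; switch to daily ranges for very large scopes.

```python
from datetime import date, timedelta

def weekly_chunks(start, end):
    """Split the inclusive range [start, end] into consecutive date
    ranges of at most 7 days, suitable for one request per chunk."""
    chunks = []
    cursor = start
    while cursor <= end:
        chunk_end = min(cursor + timedelta(days=6), end)
        chunks.append((cursor, chunk_end))
        cursor = chunk_end + timedelta(days=1)
    return chunks
```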
+
+If your dataset is more than 2 GB (or roughly 2 million rows) month-to-month, consider using [Exports](../costs/tutorial-export-acm-data.md) as a more scalable solution.
+
+### Latency and rate limits
+
+On demand calls to the API are rate limited. The time it takes to generate your cost details file is directly correlated with the amount of data in the file. To understand the expected amount of time before your file becomes available for download, you can use the `retry-after` header in the API response.
+
+<!-- For more information, see [Cost Management API latency and rate limits](api-latency-rate-limits.md). -->
+
+### Supported dataset time ranges
+
+The Cost Details API supports a maximum dataset time range of one month per report. Historical data can be retrieved for up to 13 months back from the current date. To seed a 13-month historical dataset, we recommend placing 13 calls, one per one-month dataset, going back 13 months.
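
The 13-call pattern above can be sketched as follows, assuming you want calendar-month windows ending with the current month:

```python
from datetime import date, timedelta

def monthly_windows(today, months=13):
    """Return (start, end) pairs for the last `months` calendar months,
    newest first: one Cost Details request per pair."""
    year, month = today.year, today.month
    windows = []
    for _ in range(months):
        start = date(year, month, 1)
        # the last day of a month is the day before the 1st of the next
        next_first = date(year + (month == 12), month % 12 + 1, 1)
        windows.append((start, next_first - timedelta(days=1)))
        year, month = (year - 1, 12) if month == 1 else (year, month - 1)
    return windows
```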
+
+## Example Cost Details API requests
+
+The following example requests are used by Microsoft customers to address common scenarios. The data that's returned by the request corresponds to the date when the cost was received by the billing system. It might include costs from multiple invoices. It's an asynchronous API. As such, you place an initial call to request your report and receive a polling link in the response header. From there, you can poll the link provided until the report is available for you.
+
+Use the `retry-after` header in the API response to dictate when to poll the API next. The header provides an estimated minimum time that your report will take to generate.
+
+To learn more about the API contract, see [Cost Details](/rest/api/cost-management/generate-cost-details-report) API.
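
The request-and-poll flow can be sketched with two small helpers. The URL shape and header names follow the contract above; everything else (the example scope, the 60-second fallback) is illustrative.

```python
def report_request_url(scope, api_version="2022-05-01"):
    """Build the initial POST URL; `scope` is a value such as
    'subscriptions/00000000-0000-0000-0000-000000000000'."""
    return (f"https://management.azure.com/{scope}/providers"
            f"/Microsoft.CostManagement/generateCostDetailsReport"
            f"?api-version={api_version}")

def poll_plan(headers):
    """From a 202 response's headers, return the polling URL and the
    number of seconds to wait before the first GET against it."""
    return headers["Location"], int(headers.get("Retry-After", 60))
```

Once a GET on the polling URL returns 200, the response body contains the manifest with the report's download links.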
+
+### Actual cost versus amortized cost
+
+To control whether you would like to see an actual cost or amortized cost report, change the value used for the metric field in the initial request body. The available metric values are `ActualCost` or `AmortizedCost`.
+
+Amortized cost breaks down your reservation purchases into daily chunks and spreads them over the duration of the reservation term. For example, instead of seeing a $365 purchase on January 1, you'll see a $1.00 purchase every day from January 1 to December 31. In addition to basic amortization, the costs are also reallocated and associated by using the specific resources that used the reservation. For example, if the $1.00 daily charge was split between two virtual machines, you'd see two $0.50 charges for the day. If part of the reservation isn't utilized for the day, you'd see one $0.50 charge associated with the applicable virtual machine and another $0.50 charge with a charge type of `UnusedReservation`. Unused reservation costs are seen only when viewing amortized cost.
+
+Because of the change in how costs are represented, it's important to note that actual cost and amortized cost views will show different total numbers. In general, the total cost of months over time for a reservation purchase will decrease when viewing amortized costs. The months following a reservation purchase will increase. Amortization is available only for reservation purchases and doesn't currently apply to Azure Marketplace purchases.
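
The amortization arithmetic above can be made concrete. The $365 purchase over a one-year term and the 50/50 virtual machine split mirror this section's example:

```python
def amortize(purchase_cost, term_days):
    """Daily cost of a reservation purchase spread evenly over its term."""
    return purchase_cost / term_days

def split_daily_charge(daily_cost, used_fraction):
    """Allocate one day's amortized cost between the resources that used
    the reservation and an UnusedReservation charge."""
    used = daily_cost * used_fraction
    return {"resources": used, "UnusedReservation": daily_cost - used}
```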
+
+### Initial request to create report
+
+```http
+POST https://management.azure.com/{scope}/providers/Microsoft.CostManagement/generateCostDetailsReport?api-version=2022-05-01
+```
+
+**Request body:**
+
+An example request for an ActualCost dataset for a specified date range is provided below.
+
+```json
+{
+ "metric": "ActualCost",
+ "timePeriod": {
+ "start": "2020-03-01",
+ "end": "2020-03-15"
+ }
+}
+```
+
+The available fields you can provide in the report request body are summarized below.
+
+- **metric** - The type of report requested. It can be either ActualCost or AmortizedCost. Not required. If the field isn't specified, the API will default to an ActualCost report.
+- **timePeriod** - The requested date range for your data. Not required. This parameter can't be used alongside either the invoiceId or billingPeriod parameters. If a timePeriod, invoiceId or billingPeriod parameter isn't provided in the request body the API will return the current month's cost.
+- **invoiceId** - The requested invoice for your data. This parameter can only be used by Microsoft Customer Agreement customers. Additionally, it can only be used at the Billing Profile or Customer scope. This parameter can't be used alongside either the billingPeriod or timePeriod parameters. If a timePeriod, invoiceId or billingPeriod parameter isn't provided in the request body the API will return the current month's cost.
+- **billingPeriod** - The requested billing period for your data. This parameter can be used only by Enterprise Agreement customers. Use the YearMonth format. For example, 202008. This parameter can't be used alongside either the invoiceId or timePeriod parameters. If a timePeriod, invoiceId or billingPeriod parameter isn't provided in the request body the API will return the current month's cost.
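
A small helper can enforce the mutual-exclusion rules above before any request is sent. The field names match the contract described here; the client-side validation itself is just an illustrative fail-fast check.

```python
def build_report_request(metric="ActualCost", time_period=None,
                         invoice_id=None, billing_period=None):
    """Build the Cost Details request body. timePeriod, invoiceId, and
    billingPeriod are mutually exclusive; omitting all three requests
    the current month's cost."""
    if sum(p is not None for p in (time_period, invoice_id, billing_period)) > 1:
        raise ValueError("timePeriod, invoiceId, and billingPeriod are mutually exclusive")
    body = {"metric": metric}
    if time_period:
        body["timePeriod"] = {"start": time_period[0], "end": time_period[1]}
    if invoice_id:
        body["invoiceId"] = invoice_id
    if billing_period:
        body["billingPeriod"] = billing_period
    return body
```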
+
+**API response:**
+
+`Response Status: 202 - Accepted`: Indicates that the request will be processed. Use the `Location` header to check the status.
+
+Response headers:
+
+| Name | Type | Format | Description |
+| | | | |
+| Location | String | | The URL to check the result of the asynchronous operation. |
+| Retry-After | Integer | Int32 | The expected time for your report to be generated. Wait for this duration before polling again. |
+
+### Report polling and download
+
+Once you've requested to create a Cost Details report, poll for the report using the endpoint provided in the `location` header of the API response. An example polling request is below.
+
+Report polling request:
+
+```http
+GET https://management.azure.com/{scope}/providers/Microsoft.CostManagement/costDetailsOperationStatus/{operationId}?api-version=2022-05-01
+```
+
+`Response Status 200 - Succeeded`: Indicates that the request has succeeded.
+
+```JSON
+{
+ "id": "subscriptions/00000000-0000-0000-0000-000000000000/providers/Microsoft.CostManagement/operationResults/00000000-0000-0000-0000-000000000000",
+ "name": "00000000-0000-0000-0000-000000000000",
+ "status": "Completed",
+ "manifest": {
+ "manifestVersion": "2022-05-01",
+ "dataFormat": "Csv",
+ "blobCount": 1,
+ "byteCount": 160769,
+ "compressData": false,
+ "requestContext": {
+ "requestScope": "subscriptions/00000000-0000-0000-0000-000000000000",
+ "requestBody": {
+ "metric": "ActualCost",
+ "timePeriod": {
+ "start": "2020-03-01",
+ "end": "2020-03-15"
+ }
+ }
+ },
+ "blobs": [
+ {
+ "blobLink": "{downloadLink}",
+ "byteCount": 32741
+ }
+ ]
+ },
+ "validTill": "2022-05-10T08:08:46.1973252Z"
+}
+```
+
+A summary of the key fields in the API response is below:
+
+- **manifestVersion** - The version of the manifest contract that is used in the response. At this time, the manifest version will remain the same for a given API version.
+- **dataFormat** - CSV is the only supported file format provided by the API at this time.
+- **blobCount** - The number of individual data blobs in the report dataset. The API may provide a partitioned dataset of more than one file in the response, so design your data pipelines to handle partitioned files. Partitioning allows larger datasets to be ingested more quickly.
+- **byteCount** - The total byte count of the report dataset across all partitions.
+- **compressData** - Compression is always set to false for the first release. The API will support compression in the future, however.
+- **requestContext** - The initial configuration requested for the report.
+- **blobs** - A list of n blob files that together comprise the full report.
+ - **blobLink** - The download URL of an individual blob partition.
+ - **byteCount** - The byte count of the individual blob partition.
+- **validTill** - The date at which the report will no longer be accessible.
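
When processing a completed response like the one above, a pipeline mainly needs the partition download links. The sketch below collects them, with an illustrative guard against assuming a single file:

```python
def blob_links(manifest):
    """Return the download URL of every partition in a report manifest.

    A partitioned report can contain any number of blobs, so never
    assume one file; blobCount is cross-checked as a sanity guard.
    """
    blobs = manifest["blobs"]
    if len(blobs) != manifest["blobCount"]:
        raise ValueError("manifest blobCount does not match the blob list")
    return [b["blobLink"] for b in blobs]
```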
+
+## Next steps
+
+- Read the [Ingest cost details data](automation-ingest-usage-details-overview.md) article.
+- Learn more about [Choose a cost details solution](usage-details-best-practices.md).
+- [Understand cost details fields](understand-usage-details-fields.md).
+- [Create and manage exported data](../costs/tutorial-export-acm-data.md) in the Azure portal with exports.
+- [Automate Export creation](../costs/ingest-azure-usage-at-scale.md) and ingestion at scale using the API.
cost-management-billing Get Usage Data Azure Cli https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/cost-management-billing/automate/get-usage-data-azure-cli.md
+
+ Title: Get usage data with the Azure CLI
+
+description: This article explains how you get usage data with the Azure CLI.
+Last updated: 07/15/2022
+# Get usage data with the Azure CLI
+
+This article explains how you get cost and usage data with the Azure CLI. If you want to get usage data using the Azure portal, see [View and download your Azure usage and charges](../understand/download-azure-daily-usage.md).
+
+## Set up the Azure CLI
+
+Start by preparing your environment for the Azure CLI.
+## Configure an export job to export cost data to Azure storage
+
+After you sign in, use the [export](/cli/azure/costmanagement/export) commands to export usage data to an Azure storage account. You can download the data from there.
+
+1. Create a resource group or use an existing resource group. To create a resource group, run the [group create](/cli/azure/group#az_group_create) command:
+
+ ```azurecli
+ az group create --name TreyNetwork --location "East US"
+ ```
+1. Create a storage account to receive the exports or use an existing storage account. To create an account, use the [storage account create](/cli/azure/storage/account#az_storage_account_create) command:
+
+ ```azurecli
+ az storage account create --resource-group TreyNetwork --name cmdemo
+ ```
+
+1. Run the [export create](/cli/azure/costmanagement/export#az_costmanagement_export_create) command to create the export:
+
+ ```azurecli
+    az costmanagement export create --name DemoExport --type Usage \
+      --scope "subscriptions/00000000-0000-0000-0000-000000000000" --storage-account-id cmdemo \
+      --storage-container democontainer --timeframe MonthToDate --storage-directory demodirectory
+ ```
+
+## Next steps
+
+- Read the [Ingest usage details data](automation-ingest-usage-details-overview.md) article.
+- Learn how to [Get small cost datasets on demand](get-small-usage-datasets-on-demand.md).
+- [Understand usage details fields](understand-usage-details-fields.md).
+- [Create and manage exported data](../costs/tutorial-export-acm-data.md) in the Azure portal with exports.
+- [Automate Export creation](../costs/ingest-azure-usage-at-scale.md) and ingestion at scale using the API.
cost-management-billing Get Usage Details Legacy Customer https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/cost-management-billing/automate/get-usage-details-legacy-customer.md
+
+ Title: Get Azure cost details as a legacy customer
+
+description: This article explains how you get cost data if you're a legacy customer.
+Last updated: 07/15/2022
+# Get cost details as a legacy customer
+
+If you have an MSDN, pay-as-you-go, or Visual Studio Azure subscription, we recommend that you use [Exports](../costs/tutorial-export-acm-data.md) or the [Exports API](../costs/ingest-azure-usage-at-scale.md) to get cost details data (formerly known as usage details). The [Cost Details](/rest/api/cost-management/generate-cost-details-report) API report isn't supported for your subscription type yet.
+
+If you need to download small datasets and you don't want to use Azure Storage, you can also use the Consumption Usage Details API. Instructions about how to use the API are below.
+
+> [!NOTE]
+> The API is deprecated for all customers except those with MSDN, pay-as-you-go, and Visual Studio subscriptions. If you're an EA or MCA customer, don't use this API.
+
+The date that the API will be turned off is still being determined. The [Cost Details](/rest/api/cost-management/generate-cost-details-report) API will be updated to support MSDN, pay-as-you-go, and Visual Studio subscriptions before the Consumption Usage Details API is deprecated.
+
+## Example Consumption Usage Details API requests
+
+The following example requests are used by Microsoft customers to address common scenarios.
+
+### Get usage details for a scope during a specific date range
+
+The data that's returned by the request corresponds to the date when the data was received by the billing system. It might include costs from multiple invoices. The call to use varies by your subscription type.
+
+For legacy customers, use the following call.
+
+```http
+GET https://management.azure.com/{scope}/providers/Microsoft.Consumption/usageDetails?$filter=properties%2FusageStart%20ge%20'2020-02-01'%20and%20properties%2FusageEnd%20le%20'2020-02-29'&$top=1000&api-version=2019-10-01
+```
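
The filter expression in the URL above can be built programmatically. A sketch, assuming the `2019-10-01` contract shown here:

```python
from urllib.parse import quote

def usage_details_url(scope, start, end, top=1000, api_version="2019-10-01"):
    """Build the Consumption Usage Details GET URL with an OData $filter
    on usageStart/usageEnd, percent-encoding the filter expression."""
    flt = (f"properties/usageStart ge '{start}' "
           f"and properties/usageEnd le '{end}'")
    encoded = quote(flt, safe="'")  # keep quotes literal; encode '/' and spaces
    return (f"https://management.azure.com/{scope}"
            f"/providers/Microsoft.Consumption/usageDetails"
            f"?$filter={encoded}&$top={top}&api-version={api_version}")
```
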
+### Get amortized cost details
+
+If you need actual costs to show purchases as they're accrued, change the `metric` to `ActualCost` in the following request. To use amortized and actual costs, you must use version `2019-04-01-preview` or later.
+
+```http
+GET https://management.azure.com/{scope}/providers/Microsoft.Consumption/usageDetails?metric=AmortizedCost&$filter=properties/usageStart+ge+'2019-04-01'+AND+properties/usageEnd+le+'2019-04-30'&api-version=2019-04-01-preview
+```
+## Next steps
+
+- Read the [Ingest cost details data](automation-ingest-usage-details-overview.md) article.
+- Learn how to [Get small cost datasets on demand](get-small-usage-datasets-on-demand.md).
+- [Understand cost details fields](understand-usage-details-fields.md).
+- [Create and manage exported data](../costs/tutorial-export-acm-data.md) in the Azure portal with exports.
+- [Automate Export creation](../costs/ingest-azure-usage-at-scale.md) and ingestion at scale using the API.
cost-management-billing Migrate Consumption Marketplaces Api https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/cost-management-billing/automate/migrate-consumption-marketplaces-api.md
+
+ Title: Migrate from Consumption Marketplaces API
+
+description: This article has information to help you migrate from the Consumption Marketplaces API.
+Last updated: 07/15/2022
+# Migrate from Consumption Marketplaces API
+
+This article discusses migration away from the [Consumption Marketplaces API](/rest/api/consumption/marketplaces/list). The Consumption Marketplaces API is deprecated. The date that the API will be turned off is still being determined. We recommend that you migrate away from the API as soon as possible.
+
+This article only applies to customers with an Enterprise Agreement or an MSDN, pay-as-you-go, or Visual Studio subscription.
+
+## Migration destinations
+
+We've merged Azure Marketplace and Azure usage records into a single usage details dataset. Read the [Choose a cost details solution](usage-details-best-practices.md) article before you choose the solution that's right for your workload. Generally, we recommend using [Exports](../costs/tutorial-export-acm-data.md) if you have ongoing data ingestion needs or a large monthly usage details dataset. For more information, see [Ingest usage details data](automation-ingest-usage-details-overview.md).
+
+If you have a smaller usage details dataset or a scenario that isn't met by Exports, consider using the [Cost Details](/rest/api/cost-management/generate-cost-details-report) report instead. For more information, see [Get small cost datasets on demand](get-small-usage-datasets-on-demand.md).
+
+> [!NOTE]
+> The [Cost Details](/rest/api/cost-management/generate-cost-details-report) report is only available for customers with an Enterprise Agreement or Microsoft Customer Agreement. If you have an MSDN, pay-as-you-go, or Visual Studio subscription, you can migrate to Exports or continue using the Consumption Usage Details API.
+
+## Migration benefits
+
+New solutions provide many benefits over the Consumption Usage Details API. Here's a summary:
+
+- **Single dataset for all usage details** - Azure and Azure Marketplace usage details were merged into one dataset. It reduces the number of APIs that you need to call to see all your charges.
+- **Scalability** - The Marketplaces API is deprecated because it promotes a call pattern that can't scale as your Azure usage increases. The usage details dataset can get exceedingly large as you deploy more resources into the cloud. The Marketplaces API is a paginated synchronous API, so it isn't optimized to transfer large volumes of data over a network efficiently and reliably. Exports and the [Cost Details](/rest/api/cost-management/generate-cost-details-report) report are asynchronous. They provide you with a CSV file that can be directly downloaded over the network.
+- **API improvements** - Exports and the Cost Details API are the solutions that Azure supports moving forward. All new features are being integrated into them.
+- **Schema consistency** - The [Cost Details](/rest/api/cost-management/generate-cost-details-report) API and [Exports](../costs/tutorial-export-acm-data.md) process provide files with matching fields so you can move from one solution to the other, based on your scenario.
+- **Cost Allocation integration** - Enterprise Agreement and Microsoft Customer Agreement customers using Exports or the Cost Details API can view charges in relation to the cost allocation rules that they've configured. For more information about cost allocation, see [Allocate costs](../costs/allocate-costs.md).
+
+## Field differences
+
+The following table summarizes the field mapping needed to transition from the data provided by the Marketplaces API to Exports and the Cost Details API. Both of the solutions provide a CSV file download as opposed to the paginated JSON response that's provided by the Consumption API.
+
+Usage records can be identified as marketplace records in the combined dataset through the `PublisherType` field. Also, there are many new fields in the newer solutions that might be useful to you. For more information about available fields, see [Understand usage details fields](understand-usage-details-fields.md).
+
+| **Old Property** | **New Property** | **Notes** |
+| | | |
+| | PublisherType | Used to identify a marketplace usage record |
+| accountName | AccountName | |
+| additionalProperties | AdditionalInfo | |
+| costCenter | CostCenter | |
+| departmentName | BillingProfileName | |
+| billingPeriodId | | Use BillingPeriodStartDate / BillingPeriodEndDate |
+| usageStart | | Use Date |
+| usageEnd | | Use Date |
+| instanceName | ResourceName | |
+| instanceId | ResourceId | |
+| currency | BillingCurrencyCode | |
+| consumedQuantity | Quantity | |
+| pretaxCost | CostInBillingCurrency | |
+| isEstimated | | Not available |
+| meterId | MeterId | |
+| offerName | OfferId | |
+| resourceGroup | ResourceGroup | |
+| orderNumber | | Not available |
+| publisherName | PublisherName | |
+| planName | PlanName | |
+| resourceRate | EffectivePrice | |
+| subscriptionGuid | SubscriptionId | |
+| subscriptionName | SubscriptionName | |
+| unitOfMeasure | UnitOfMeasure | |
+| isRecurringCharge | | Where applicable, use the Frequency and Term fields moving forward. |
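
The one-to-one rows of the table above can be captured as a simple rename map. A sketch follows, dropping fields with no direct replacement (such as `isEstimated` and `orderNumber`):

```python
# One-to-one rows from the Marketplaces-to-Cost-Details field table.
MARKETPLACE_TO_COST_DETAILS = {
    "accountName": "AccountName",
    "additionalProperties": "AdditionalInfo",
    "costCenter": "CostCenter",
    "departmentName": "BillingProfileName",
    "instanceName": "ResourceName",
    "instanceId": "ResourceId",
    "currency": "BillingCurrencyCode",
    "consumedQuantity": "Quantity",
    "pretaxCost": "CostInBillingCurrency",
    "meterId": "MeterId",
    "offerName": "OfferId",
    "resourceGroup": "ResourceGroup",
    "publisherName": "PublisherName",
    "planName": "PlanName",
    "resourceRate": "EffectivePrice",
    "subscriptionGuid": "SubscriptionId",
    "subscriptionName": "SubscriptionName",
    "unitOfMeasure": "UnitOfMeasure",
}

def rename_record(old_record):
    """Rename a Marketplaces API record's fields to the new schema,
    dropping fields with no one-to-one replacement."""
    return {MARKETPLACE_TO_COST_DETAILS[k]: v
            for k, v in old_record.items()
            if k in MARKETPLACE_TO_COST_DETAILS}
```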
+
+## Next steps
+
+- Learn more about Cost Management automation at [Cost Management automation overview](automation-overview.md).
cost-management-billing Migrate Consumption Usage Details Api https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/cost-management-billing/automate/migrate-consumption-usage-details-api.md
+
+ Title: Migrate from Consumption Usage Details API
+
+description: This article has information to help you migrate from the Consumption Usage Details API.
+Last updated: 07/15/2022
+# Migrate from Consumption Usage Details API
+
+This article discusses migration away from the [Consumption Usage Details API](/rest/api/consumption/usage-details/list). The Consumption Usage Details API is deprecated. The date that the API will be turned off is still being determined. We recommend that you migrate away from the API as soon as possible.
+
+## Migration destinations
+
+Read the [Choose a cost details solution](usage-details-best-practices.md) article before you choose which solution is right for your workload. Generally, we recommend [Exports](../costs/tutorial-export-acm-data.md) if you have ongoing data ingestion needs or a large monthly usage details dataset. For more information, see [Ingest usage details data](automation-ingest-usage-details-overview.md).
+
+If you have a smaller usage details dataset or a scenario that isn't met by Exports, consider using the [Cost Details](/rest/api/cost-management/generate-cost-details-report) report instead. For more information, see [Get small cost datasets on demand](get-small-usage-datasets-on-demand.md).
+
+> [!NOTE]
+> The [Cost Details](/rest/api/cost-management/generate-cost-details-report) report is only available for customers with an Enterprise Agreement or Microsoft Customer Agreement. If you have an MSDN, pay-as-you-go, or Visual Studio subscription, you can migrate to Exports or continue using the Consumption Usage Details API.
+
+## Migration benefits
+
+New solutions provide many benefits over the Consumption Usage Details API. Here's a summary:
+
+- **Single dataset for all usage details** - Azure and Azure Marketplace usage details were merged into one dataset. It reduces the number of APIs that you need to call to see all your charges.
+- **Scalability** - The Marketplaces API is deprecated because it promotes a call pattern that can't scale as your Azure usage increases. The usage details dataset can get extremely large as you deploy more resources into the cloud. The Marketplaces API is a paginated synchronous API, so it isn't optimized to transfer large volumes of data over a network efficiently and reliably. Exports and the [Cost Details](/rest/api/cost-management/generate-cost-details-report) API are asynchronous. They provide you with a CSV file that can be directly downloaded over the network.
+- **API improvements** - Exports and the Cost Details API are the solutions that Azure supports moving forward. All new features are being integrated into them.
+- **Schema consistency** - The [Cost Details](/rest/api/cost-management/generate-cost-details-report) report and [Exports](../costs/tutorial-export-acm-data.md) provide files with matching fields so you can move from one solution to the other, based on your scenario.
+- **Cost Allocation integration** - Enterprise Agreement and Microsoft Customer Agreement customers using Exports or the Cost Details API can view charges in relation to the cost allocation rules that they have configured. For more information about cost allocation, see [Allocate costs](../costs/allocate-costs.md).
+
+## Field differences
+
+The following sections summarize the field differences between the Consumption Usage Details API and the Exports/Cost Details API solutions. Exports and the Cost Details API provide a CSV file download instead of the paginated JSON response that's provided by the Consumption API.
+
+## Enterprise Agreement field mapping
+
+Enterprise Agreement customers who are using the Consumption Usage Details API have usage details records of the kind `legacy`. A legacy usage details record is shown below. All Enterprise Agreement customers have records of this kind due to the underlying billing system that's used for them.
+
+```json
+{
+  "value": [
+    {
+      "id": "{id}",
+      "name": "{name}",
+      "type": "Microsoft.Consumption/usageDetails",
+      "kind": "legacy",
+      "tags": {
+        "env": "newcrp",
+        "dev": "tools"
+      },
+      "properties": {
+        …
+      }
+    }
+  ]
+}
+```
+
+A full example legacy Usage Details record is shown at [Usage Details - List - REST API (Azure Consumption)](/rest/api/consumption/usage-details/list#billingaccountusagedetailslist-legacy).
+
+The following table provides a mapping between the old and new fields. New properties are available in the CSV files produced by Exports and the Cost Details API. To learn more about the fields, see [Understand usage details fields](understand-usage-details-fields.md).
+
+| **Old Property** | **New Property** |
+| | |
+| accountName | AccountName |
+| | AccountOwnerId |
+| additionalInfo | AdditionalInfo |
+| | AvailabilityZone |
+| billingAccountId | BillingAccountId |
+| billingAccountName | BillingAccountName |
+| billingCurrency | BillingCurrencyCode |
+| billingPeriodEndDate | BillingPeriodEndDate |
+| billingPeriodStartDate | BillingPeriodStartDate |
+| billingProfileId | BillingProfileId |
+| billingProfileName | BillingProfileName |
+| chargeType | ChargeType |
+| consumedService | ConsumedService |
+| cost | CostInBillingCurrency |
+| costCenter | CostCenter |
+| date | Date |
+| effectivePrice | EffectivePrice |
+| frequency | Frequency |
+| invoiceSection | InvoiceSectionName |
+| | InvoiceSectionId |
+| isAzureCreditEligible | IsAzureCreditEligible |
+| meterCategory | MeterCategory |
+| meterId | MeterId |
+| meterName | MeterName |
+| | MeterRegion |
+| meterSubCategory | MeterSubCategory |
+| offerId | OfferId |
+| partNumber | PartNumber |
+| | PayGPrice |
+| | PlanName |
+| | PricingModel |
+| product | ProductName |
+| | ProductOrderId |
+| | ProductOrderName |
+| | PublisherName |
+| | PublisherType |
+| quantity | Quantity |
+| | ReservationId |
+| | ReservationName |
+| resourceGroup | ResourceGroup |
+| resourceId | ResourceId |
+| resourceLocation | ResourceLocation |
+| resourceName | ResourceName |
+| serviceFamily | ServiceFamily |
+| | ServiceInfo1 |
+| | ServiceInfo2 |
+| subscriptionId | SubscriptionId |
+| subscriptionName | SubscriptionName |
+| | Tags |
+| | Term |
+| unitOfMeasure | UnitOfMeasure |
+| unitPrice | UnitPrice |
+| | CostAllocationRuleName |
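Most of the renames in the table above are a simple change to Pascal casing, but a few properties change names outright. As an illustrative sketch of translating old records to the new column names (the `LEGACY_TO_CSV` dictionary and `rename_legacy_record` helper are hypothetical, and cover only a subset of the rows above):

```python
# Subset of the mapping table above: Consumption Usage Details API (legacy)
# property names -> column names in the CSV produced by Exports/Cost Details.
LEGACY_TO_CSV = {
    "accountName": "AccountName",
    "additionalInfo": "AdditionalInfo",
    "billingAccountId": "BillingAccountId",
    "billingCurrency": "BillingCurrencyCode",
    "cost": "CostInBillingCurrency",
    "invoiceSection": "InvoiceSectionName",
    "product": "ProductName",
    "resourceGroup": "ResourceGroup",
    "unitPrice": "UnitPrice",
}

def rename_legacy_record(props: dict) -> dict:
    """Rename legacy API properties to CSV column names; keys that are not
    in the mapping (or already match) pass through unchanged."""
    return {LEGACY_TO_CSV.get(key, key): value for key, value in props.items()}
```

This kind of lookup table keeps downstream reports working against either data source while you migrate.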
+
+## Microsoft Customer Agreement field mapping
+
+Microsoft Customer Agreement customers that use the Consumption Usage Details API have usage details records of the kind `modern`. A modern usage details record is shown below. All Microsoft Customer Agreement customers have records of this kind due to the underlying billing system that is used for them.
+
+```json
+{
+  "value": [
+    {
+      "id": "{id}",
+      "name": "{name}",
+      "type": "Microsoft.Consumption/usageDetails",
+      "kind": "modern",
+      "tags": {
+        "env": "newcrp",
+        "dev": "tools"
+      },
+      "properties": {
+        ...
+      }
+    }
+  ]
+}
+```
+
+A full example modern Usage Details record is shown at [Usage Details - List - REST API (Azure Consumption)](/rest/api/consumption/usage-details/list#billingaccountusagedetailslist-modern).
+
+A mapping between the old and new fields is shown in the following table. New properties are available in the CSV files produced by Exports and the Cost Details API. Fields that need a mapping due to differences across the solutions are shown in **bold text**.
+
+For more information, see [Understand usage details fields](understand-usage-details-fields.md).
+
+| **Old property** | **New property** |
+| --- | --- |
+| invoiceId | invoiceId |
+| previousInvoiceId | previousInvoiceId |
+| billingAccountId | billingAccountId |
+| billingAccountName | billingAccountName |
+| billingProfileId | billingProfileId |
+| billingProfileName | billingProfileName |
+| invoiceSectionId | invoiceSectionId |
+| invoiceSectionName | invoiceSectionName |
+| partnerTenantId | partnerTenantId |
+| partnerName | partnerName |
+| resellerName | resellerName |
+| resellerMpnId | resellerMpnId |
+| customerTenantId | customerTenantId |
+| customerName | customerName |
+| costCenter | costCenter |
+| billingPeriodEndDate | billingPeriodEndDate |
+| billingPeriodStartDate | billingPeriodStartDate |
+| servicePeriodEndDate | servicePeriodEndDate |
+| servicePeriodStartDate | servicePeriodStartDate |
+| date | date |
+| serviceFamily | serviceFamily |
+| productOrderId | productOrderId |
+| productOrderName | productOrderName |
+| consumedService | consumedService |
+| meterId | meterId |
+| meterName| meterName |
+| meterCategory | meterCategory |
+| meterSubCategory | meterSubCategory |
+| meterRegion | meterRegion |
+| **productIdentifier** | **ProductId** |
+| **product** | **ProductName** |
+| **subscriptionGuid** | **SubscriptionId** |
+| subscriptionName | subscriptionName |
+| publisherType | publisherType |
+| publisherId | publisherId |
+| publisherName | publisherName |
+| **resourceGroup** | **resourceGroupName** |
+| instanceName | ResourceId |
+| **resourceLocationNormalized** | **location** |
+| **resourceLocation** | **location** |
+| effectivePrice | effectivePrice |
+| quantity | quantity |
+| unitOfMeasure | unitOfMeasure |
+| chargeType | chargeType |
+| **billingCurrencyCode** | **billingCurrency** |
+| **pricingCurrencyCode** | **pricingCurrency** |
+| costInBillingCurrency | costInBillingCurrency |
+| costInPricingCurrency | costInPricingCurrency |
+| costInUsd | costInUsd |
+| paygCostInBillingCurrency | paygCostInBillingCurrency |
+| paygCostInUSD | paygCostInUsd |
+| exchangeRatePricingToBilling | exchangeRatePricingToBilling |
+| exchangeRateDate | exchangeRateDate |
+| isAzureCreditEligible | isAzureCreditEligible |
+| serviceInfo1 | serviceInfo1 |
+| serviceInfo2 | serviceInfo2 |
+| additionalInfo | additionalInfo |
+| tags | tags |
+| partnerEarnedCreditRate | partnerEarnedCreditRate |
+| partnerEarnedCreditApplied | partnerEarnedCreditApplied |
+| **marketPrice** | **PayGPrice** |
+| frequency | frequency |
+| term | term |
+| reservationId | reservationId |
+| reservationName | reservationName |
+| pricingModel | pricingModel |
+| unitPrice | unitPrice |
+| exchangeRatePricingToBilling | exchangeRatePricingToBilling |
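Because most modern property names carry over unchanged, a translation layer only needs to handle the rows in bold (plus `instanceName`, which becomes `ResourceId`). A minimal sketch, with hypothetical names, covering only those differing fields:

```python
# Fields whose names differ between the modern Consumption Usage Details API
# and the Exports/Cost Details CSV output (per the table above).
MODERN_RENAMES = {
    "productIdentifier": "ProductId",
    "product": "ProductName",
    "subscriptionGuid": "SubscriptionId",
    "resourceGroup": "resourceGroupName",
    "instanceName": "ResourceId",
    "resourceLocationNormalized": "location",
    "billingCurrencyCode": "billingCurrency",
    "pricingCurrencyCode": "pricingCurrency",
    "marketPrice": "PayGPrice",
}

def rename_modern_record(props: dict) -> dict:
    """Rename only the differing modern properties; all other keys
    already match between the two solutions and pass through."""
    return {MODERN_RENAMES.get(key, key): value for key, value in props.items()}
```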
+
+## Next steps
+
+- Learn more about Cost Management + Billing automation at [Cost Management automation overview](automation-overview.md).
cost-management-billing Migrate Ea Balance Summary Api https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/cost-management-billing/automate/migrate-ea-balance-summary-api.md
+
+ Title: Migrate from EA Balance Summary API
+
+description: This article has information to help you migrate from the EA Balance Summary API.
+ Last updated: 07/15/2022
+# Migrate from EA Balance Summary API
+
+EA customers who were previously using the Enterprise Reporting consumption.azure.com API to get their balance summary need to migrate to a replacement Azure Resource Manager API. The following sections describe how to migrate, along with the contract differences between the old API and the new API.
+
+## Assign permissions to an SPN to call the API
+
+Before calling the API, you need to configure a Service Principal with the correct permission. You use the service principal to call the API. For more information, see [Assign permissions to ACM APIs](cost-management-api-permissions.md).
+
+### Call the Balance Summary API
+
+Use the following request URIs when calling the new Balance Summary API. Your enrollment number should be used as the `billingAccountId`.
+
+#### Supported requests
+
+[Get for Enrollment](/rest/api/consumption/balances/getbybillingaccount)
+```http
+https://management.azure.com/providers/Microsoft.Billing/billingAccounts/{billingAccountId}/providers/Microsoft.Consumption/balances?api-version=2019-10-01
+```
+
+### Response body changes
+
+Old response body:
+
+```json
+{
+ "id": "enrollments/100/billingperiods/201507/balancesummaries",
+ "billingPeriodId": 201507,
+ "currencyCode": "USD",
+ "beginningBalance": 0,
+ "endingBalance": 1.1,
+ "newPurchases": 1,
+ "adjustments": 1.1,
+ "utilized": 1.1,
+ "serviceOverage": 1,
+ "chargesBilledSeparately": 1,
+ "totalOverage": 1,
+ "totalUsage": 1.1,
+ "azureMarketplaceServiceCharges": 1,
+ "newPurchasesDetails": [
+ {
+ "name": "",
+ "value": 1
+ }
+ ],
+ "adjustmentDetails": [
+ {
+ "name": "Promo Credit",
+ "value": 1.1
+ },
+ {
+ "name": "SIE Credit",
+ "value": 1
+ }
+ ]
+}
+```
+
+New response body:
+
+The same data is now available in the properties field of the new API response. There might be minor spelling changes in some of the field names.
+
+```json
+{
+ "id": "/providers/Microsoft.Billing/billingAccounts/123456/providers/Microsoft.Billing/billingPeriods/201702/providers/Microsoft.Consumption/balances/balanceId1",
+ "name": "balanceId1",
+ "type": "Microsoft.Consumption/balances",
+ "properties": {
+ "currency": "USD ",
+ "beginningBalance": 3396469.19,
+ "endingBalance": 2922371.02,
+ "newPurchases": 0,
+ "adjustments": 0,
+ "utilized": 474098.17,
+ "serviceOverage": 0,
+ "chargesBilledSeparately": 0,
+ "totalOverage": 0,
+ "totalUsage": 474098.17,
+ "azureMarketplaceServiceCharges": 609.82,
+ "billingFrequency": "Month",
+ "priceHidden": false,
+ "newPurchasesDetails": [
+ {
+ "name": "Promo Purchase",
+ "value": 1
+ }
+ ],
+ "adjustmentDetails": [
+ {
+ "name": "Promo Credit",
+ "value": 1.1
+ },
+ {
+ "name": "SIE Credit",
+ "value": 1
+ }
+ ]
+ }
+}
+```
+
+## Next steps
+
+- Read the [Migrate from EA Reporting to ARM APIs overview](migrate-ea-reporting-arm-apis-overview.md) article.
cost-management-billing Migrate Ea Price Sheet Api https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/cost-management-billing/automate/migrate-ea-price-sheet-api.md
+
+ Title: Migrate from the EA Price Sheet API
+
+description: This article has information to help you migrate from the EA Price Sheet API.
+ Last updated: 07/15/2022
+# Migrate from EA Price Sheet API
+
+EA customers who were previously using the Enterprise Reporting consumption.azure.com API to get their price sheet need to migrate to a replacement Azure Resource Manager API. The following sections describe how to migrate, along with the contract differences between the old API and the new API.
+
+## Assign permissions to an SPN to call the API
+
+Before calling the API, you need to configure a Service Principal with the correct permission. You use the service principal to call the API. For more information, see [Assign permissions to ACM APIs](cost-management-api-permissions.md).
+
+### Call the Price Sheet API
+
+Use the following request URIs when calling the new Price Sheet API.
+
+#### Supported requests
+
+You can call the API using the following scopes:
+
+- Enrollment: `providers/Microsoft.Billing/billingAccounts/{billingAccountId}`
+- Subscription: `subscriptions/{subscriptionId}`
+
+[Get for current Billing Period](/rest/api/consumption/pricesheet/get)
+
+```http
+https://management.azure.com/{scope}/providers/Microsoft.Consumption/pricesheets/default?api-version=2019-10-01
+```
+
+[Get for specified Billing Period](/rest/api/consumption/pricesheet/getbybillingperiod)
+
+```http
+https://management.azure.com/{scope}/providers/Microsoft.Billing/billingPeriods/{billingPeriodName}/providers/Microsoft.Consumption/pricesheets/default?api-version=2019-10-01
+```
+
+#### Response body changes
+
+Old response:
+
+```json
+[
+ {
+ "id": "enrollments/57354989/billingperiods/201601/products/343/pricesheets",
+ "billingPeriodId": "201704",
+ "meterId": "dc210ecb-97e8-4522-8134-2385494233c0",
+ "meterName": "A1 VM",
+ "unitOfMeasure": "100 Hours",
+ "includedQuantity": 0,
+ "partNumber": "N7H-00015",
+ "unitPrice": 0.00,
+ "currencyCode": "USD"
+ },
+ {
+ "id": "enrollments/57354989/billingperiods/201601/products/2884/pricesheets",
+ "billingPeriodId": "201404",
+ "meterId": "dc210ecb-97e8-4522-8134-5385494233c0",
+ "meterName": "Locally Redundant Storage Premium Storage - Snapshots - AU East",
+ "unitOfMeasure": "100 GB",
+ "includedQuantity": 0,
+ "partNumber": "N9H-00402",
+ "unitPrice": 0.00,
+ "currencyCode": "USD"
+ },
+ ...
+ ]
+```
+
+New response:
+
+Old data is now in the `pricesheets` field of the new API response. Meter details information is also provided.
+
+```json
+{
+ "id": "/subscriptions/00000000-0000-0000-0000-000000000000/providers/Microsoft.Billing/billingPeriods/201702/providers/Microsoft.Consumption/pricesheets/default",
+ "name": "default",
+ "type": "Microsoft.Consumption/pricesheets",
+ "properties": {
+ "nextLink": "https://management.azure.com/subscriptions/00000000-0000-0000-0000-000000000000/providers/microsoft.consumption/pricesheets/default?api-version=2018-01-31&$skiptoken=AQAAAA%3D%3D&$expand=properties/pricesheets/meterDetails",
+ "pricesheets": [
+ {
+ "billingPeriodId": "/subscriptions/00000000-0000-0000-0000-000000000000/providers/Microsoft.Billing/billingPeriods/201702",
+ "meterId": "00000000-0000-0000-0000-000000000000",
+ "unitOfMeasure": "100 Hours",
+ "includedQuantity": 100,
+ "partNumber": "XX-11110",
+ "unitPrice": 0.00000,
+ "currencyCode": "EUR",
+ "offerId": "OfferId 1",
+ "meterDetails": {
+ "meterName": "Data Transfer Out (GB)",
+ "meterCategory": "Networking",
+ "unit": "GB",
+ "meterLocation": "Zone 2",
+ "totalIncludedQuantity": 0,
+ "pretaxStandardRate": 0.000
+ }
+ }
+ ]
+ }
+}
+```
+
+## Next steps
+
+- Read the [Migrate from EA Reporting to ARM APIs overview](migrate-ea-reporting-arm-apis-overview.md) article.
cost-management-billing Migrate Ea Reporting Arm Apis Overview https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/cost-management-billing/automate/migrate-ea-reporting-arm-apis-overview.md
+
+ Title: Migrate from EA Reporting to Azure Resource Manager APIs overview
+
+description: This article provides an overview about migrating from EA Reporting to Azure Resource Manager APIs.
+ Last updated: 07/15/2022
+# Migrate from EA Reporting to Azure Resource Manager APIs overview
+
+This article helps developers that have built custom solutions using the [Azure Reporting APIs for Enterprise Customers](../manage/enterprise-api.md) migrate to the Azure Resource Manager APIs for Cost Management, which support service principal authentication. The Azure Resource Manager APIs are in active development, while the older Azure Reporting APIs for Enterprise customers are being deprecated, so plan your migration now. This article helps you understand the differences between the Reporting APIs and the Azure Resource Manager APIs, what to expect when you migrate, and the new capabilities that are available with the new Azure Resource Manager APIs.
+
+## API differences
+
+The following information describes the differences between the older Reporting APIs for Enterprise Customers and the newer Azure Resource Manager APIs.
+
+| Use | Enterprise Agreement APIs | Azure Resource Manager APIs |
+| | | |
+| Authentication | API key provisioned in the Enterprise Agreement (EA) portal | Azure Active Directory (Azure AD) Authentication using user tokens or service principals. Service principals take the place of API keys. |
+| Scopes and permissions | All requests are at the enrollment scope. API Key permission assignments will determine whether data for the entire enrollment, a department, or a specific account is returned. No user authentication. | Users or service principals are assigned access to the enrollment, department, or account scope. |
+| URI Endpoint | [https://consumption.azure.com](https://consumption.azure.com/) | [https://management.azure.com](https://management.azure.com/) |
+| Development status | In maintenance mode. On the path to deprecation. | In active development |
+| Available APIs | Limited to what's currently available | Equivalent APIs are available to replace each EA API. Additional [Cost Management APIs](/rest/api/cost-management/) are also available, including: <br>- Budgets<br>- Alerts<br>- Exports |
+
+## Migration checklist
+
+- Familiarize yourself with the [Azure Resource Manager REST APIs](/rest/api/azure).
+- Determine which EA APIs you use and see which Azure Resource Manager APIs to move to at [EA API mapping to new Azure Resource Manager APIs](../costs/migrate-from-enterprise-reporting-to-azure-resource-manager-apis.md#ea-api-mapping-to-new-azure-resource-manager-apis).
+- Configure service authorization and authentication for the Azure Resource Manager APIs. For more information, see [Assign permission to ACM APIs](cost-management-api-permissions.md).
+- Test the APIs and then update any programming code to replace EA API calls with Azure Resource Manager API calls.
+- Update error handling to use new error codes. Some considerations include:
+ - Azure Resource Manager APIs have a timeout period of 60 seconds.
+ - Azure Resource Manager APIs have rate limiting in place. This results in a `429 throttling error` if rates are exceeded. Build your solutions so that you don't make too many API calls in a short time period.
+- Review the other Cost Management APIs available through Azure Resource Manager and assess for use later. For more information, see [Use additional Cost Management APIs](../costs/migrate-from-enterprise-reporting-to-azure-resource-manager-apis.md#use-additional-cost-management-apis).
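The rate-limiting point in the checklist can be handled with a simple exponential backoff around each call. A hedged sketch (not an official SDK pattern; `send` stands in for your HTTP call and returns a status code and parsed body):

```python
import time

def call_with_retry(send, url, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Retry a request when Azure Resource Manager throttles it with
    HTTP 429, doubling the delay on each attempt."""
    for attempt in range(max_attempts):
        status, body = send(url)
        if status != 429:
            return status, body
        sleep(base_delay * (2 ** attempt))  # back off before retrying
    return status, body  # give up after max_attempts, returning the last result
```

If the response carries a `Retry-After` header, honoring it instead of a fixed backoff is generally preferable.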
+
+## EA API mapping to new Azure Resource Manager APIs
+
+Use the following information to identify the EA APIs that you currently use and the replacement Azure Resource Manager API to use instead.
+
+| Scenario | EA APIs | Azure Resource Manager APIs |
+| | | |
+| [Migrate from EA Usage Details APIs](migrate-ea-usage-details-api.md) | [/usagedetails/download](/rest/api/billing/enterprise/billing-enterprise-api-usage-detail)<br>[/usagedetails/submit](/rest/api/billing/enterprise/billing-enterprise-api-usage-detail)<br>[/usagedetails](/rest/api/billing/enterprise/billing-enterprise-api-usage-detail)<br>[/usagedetailsbycustomdate](/rest/api/billing/enterprise/billing-enterprise-api-usage-detail) | Use [Microsoft.CostManagement/Exports](/rest/api/cost-management/exports/create-or-update) for all recurring data ingestion workloads. <br>Use the [Cost Details](/rest/api/cost-management/generate-cost-details-report) report for small on-demand datasets. |
+| [Migrate from EA Balance Summary APIs](migrate-ea-balance-summary-api.md) | [/balancesummary](/rest/api/billing/enterprise/billing-enterprise-api-balance-summary) | [Microsoft.Consumption/balances](/rest/api/consumption/balances/getbybillingaccount) |
+| [Migrate from EA Price Sheet APIs](migrate-ea-price-sheet-api.md) | [/pricesheet](/rest/api/billing/enterprise/billing-enterprise-api-pricesheet) | For negotiated prices, use [Microsoft.Consumption/pricesheets/default](/rest/api/consumption/pricesheet) <br> For retail prices, use [Retail Prices API](/rest/api/cost-management/retail-prices/azure-retail-prices) |
+| [Migrate from EA Reserved Instance Usage Details API](migrate-ea-reserved-instance-usage-details-api.md) | [/reservationdetails](/rest/api/billing/enterprise/billing-enterprise-api-reserved-instance-usage) | [Microsoft.CostManagement/generateReservationDetailsReport](/rest/api/cost-management/generatereservationdetailsreport) |
+| [Migrate from EA Reserved Instance Usage Summary APIs](migrate-ea-reserved-instance-usage-summary-api.md) | [/reservationsummaries](/rest/api/billing/enterprise/billing-enterprise-api-reserved-instance-usage) | [Microsoft.Consumption/reservationSummaries](/rest/api/consumption/reservationssummaries/list#reservationsummariesdailywithbillingaccountid) |
+| [Migrate from EA Reserved Instance Recommendations APIs](migrate-ea-reserved-instance-recommendations-api.md) | [/SharedReservationRecommendations](/rest/api/billing/enterprise/billing-enterprise-api-reserved-instance-recommendation)<br>[/SingleReservationRecommendations](/rest/api/billing/enterprise/billing-enterprise-api-reserved-instance-recommendation) | [Microsoft.Consumption/reservationRecommendations](/rest/api/consumption/reservationrecommendations/list) |
+| [Migrate from EA Reserved Instance Charges APIs](migrate-ea-reserved-instance-charges-api.md) | [/reservationcharges](/rest/api/billing/enterprise/billing-enterprise-api-reserved-instance-charges) | [Microsoft.Consumption/reservationTransactions](/rest/api/consumption/reservationtransactions/list) |
+
+## Use additional Cost Management APIs
+
+After you've migrated to Azure Resource Manager APIs for your existing reporting scenarios, you can use many other APIs, too. The APIs are also available through Azure Resource Manager and can be automated using service principal-based authentication. Here's a quick summary of the new capabilities that you can use.
+
+- [Budgets](/rest/api/consumption/budgets/createorupdate) - Use to set thresholds to proactively monitor your costs, alert relevant stakeholders, and automate actions in response to threshold breaches.
+- [Alerts](/rest/api/cost-management/alerts) - Use to view alert information including, but not limited to, budget alerts, invoice alerts, credit alerts, and quota alerts.
+- [Exports](/rest/api/cost-management/exports) - Use to schedule recurring data export of your charges to an Azure Storage account of your choice. It's the recommended solution for customers with a large Azure presence who want to analyze their data and use it in their own internal systems.
+
+## Next steps
+
+- Familiarize yourself with the [Azure Resource Manager REST APIs](/rest/api/azure).
+- If needed, determine which EA APIs you use and see which Azure Resource Manager APIs to move to at [EA API mapping to new Azure Resource Manager APIs](../costs/migrate-from-enterprise-reporting-to-azure-resource-manager-apis.md#ea-api-mapping-to-new-azure-resource-manager-apis).
+- If you're not already using Azure Resource Manager APIs, [register your client app with Azure AD](/rest/api/azure/#register-your-client-application-with-azure-ad).
+- If needed, update any of your programming code to use [Azure AD authentication](/rest/api/azure/#create-the-request) with your service principal.
cost-management-billing Migrate Ea Reserved Instance Charges Api https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/cost-management-billing/automate/migrate-ea-reserved-instance-charges-api.md
+
+ Title: Migrate from the EA Reserved Instance Charges API
+
+description: This article has information to help you migrate from the EA Reserved Instance Charges API.
+ Last updated: 07/15/2022
+# Migrate from EA Reserved Instance Charges API
+
+EA customers who were previously using the Enterprise Reporting consumption.azure.com API to obtain reserved instance charges need to migrate to a parity Azure Resource Manager API. The following sections describe how to migrate, along with the contract differences between the old API and the new API.
+
+## Assign permissions to an SPN to call the API
+
+Before calling the API, you need to configure a Service Principal with the correct permission. You use the service principal to call the API. For more information, see [Assign permissions to ACM APIs](cost-management-api-permissions.md).
+
+### Call the Reserved Instance Charges API
+
+Use the following request URIs to call the new Reserved Instance Charges API.
+
+#### Supported requests
+
+[Get Reservation Charges by Date Range](/rest/api/consumption/reservationtransactions/list)
+
+```http
+https://management.azure.com/providers/Microsoft.Billing/billingAccounts/{billingAccountId}/providers/Microsoft.Consumption/reservationTransactions?$filter=properties/eventDate+ge+2020-05-20+AND+properties/eventDate+le+2020-05-30&api-version=2019-10-01
+```
+
+#### Response body changes
+
+Old response:
+
+```json
+[
+ {
+ "purchasingEnrollment": "string",
+ "armSkuName": "Standard_F1s",
+ "term": "P1Y",
+ "region": "eastus",
+ "PurchasingsubscriptionGuid": "00000000-0000-0000-0000-000000000000",
+ "PurchasingsubscriptionName": "string",
+ "accountName": "string",
+ "accountOwnerEmail": "string",
+ "departmentName": "string",
+ "costCenter": "",
+ "currentEnrollment": "string",
+ "eventDate": "string",
+ "reservationOrderId": "00000000-0000-0000-0000-000000000000",
+ "description": "Standard_F1s eastus 1 Year",
+ "eventType": "Purchase",
+ "quantity": int,
+ "amount": double,
+ "currency": "string",
+ "reservationOrderName": "string"
+ }
+]
+```
+
+New response:
+
+```json
+{
+ "value": [
+ {
+ "id": "/billingAccounts/123456/providers/Microsoft.Consumption/reservationtransactions/201909091919",
+ "name": "201909091919",
+ "type": "Microsoft.Consumption/reservationTransactions",
+ "tags": {},
+ "properties": {
+ "eventDate": "2019-09-09T19:19:04Z",
+ "reservationOrderId": "00000000-0000-0000-0000-000000000000",
+ "description": "Standard_DS1_v2 westus 1 Year",
+ "eventType": "Cancel",
+ "quantity": 1,
+ "amount": -21,
+ "currency": "USD",
+ "reservationOrderName": "Transaction-DS1_v2",
+ "purchasingEnrollment": "123456",
+ "armSkuName": "Standard_DS1_v2",
+ "term": "P1Y",
+ "region": "westus",
+ "purchasingSubscriptionGuid": "11111111-1111-1111-1111-11111111111",
+ "purchasingSubscriptionName": "Infrastructure Subscription",
+ "accountName": "Microsoft Infrastructure",
+ "accountOwnerEmail": "admin@microsoft.com",
+ "departmentName": "Unassigned",
+ "costCenter": "",
+ "currentEnrollment": "123456",
+ "billingFrequency": "recurring"
+ }
+      }
+ ]
+}
+```
+
+## Next steps
+
+- Read the [Migrate from EA Reporting to ARM APIs overview](migrate-ea-reporting-arm-apis-overview.md) article.
cost-management-billing Migrate Ea Reserved Instance Recommendations Api https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/cost-management-billing/automate/migrate-ea-reserved-instance-recommendations-api.md
+
+ Title: Migrate from the EA Reserved Instance Recommendations API
+
+description: This article has information to help you migrate from the EA Reserved Instance Recommendations API.
+ Last updated: 07/15/2022
+# Migrate from EA Reserved Instance Recommendations API
+
+EA customers who were previously using the Enterprise Reporting consumption.azure.com API to obtain reserved instance recommendations need to migrate to a parity Azure Resource Manager API. The following sections describe how to migrate, along with the contract differences between the old API and the new API.
+
+## Assign permissions to an SPN to call the API
+
+Before calling the API, you need to configure a Service Principal with the correct permission. You use the service principal to call the API. For more information, see [Assign permissions to ACM APIs](cost-management-api-permissions.md).
+
+### Call the reserved instance recommendations API
+
+Use the following request URIs to call the new Reservation Recommendations API.
+
+#### Supported requests
+
+Call the API with the following scopes:
+
+- Enrollment: `providers/Microsoft.Billing/billingAccounts/{billingAccountId}`
+- Subscription: `subscriptions/{subscriptionId}`
+- Resource Groups: `subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}`
+
+[Get Recommendations](/rest/api/consumption/reservationrecommendations/list)
+
+Both the shared and the single scope recommendations are available through this API. You can also filter on the scope as an optional API parameter.
+
+```http
+https://management.azure.com/providers/Microsoft.Billing/billingAccounts/123456/providers/Microsoft.Consumption/reservationRecommendations?api-version=2019-10-01
+```
+
+#### Response body changes
+
+Recommendations for Shared and Single scopes are combined into one API.
+
+Old response:
+
+```json
+[{
+ "subscriptionId": "1111111-1111-1111-1111-111111111111",
+ "lookBackPeriod": "Last7Days",
+ "meterId": "2e3c2132-1398-43d2-ad45-1d77f6574933",
+ "skuName": "Standard_DS1_v2",
+ "term": "P1Y",
+ "region": "westus",
+ "costWithNoRI": 186.27634908960002,
+ "recommendedQuantity": 9,
+ "totalCostWithRI": 143.12931642978083,
+ "netSavings": 43.147032659819189,
+ "firstUsageDate": "2018-02-19T00:00:00"
+}
+]
+```
+
+New response:
+
+```json
+{
+ "value": [
+ {
+ "id": "billingAccount/123456/providers/Microsoft.Consumption/reservationRecommendations/00000000-0000-0000-0000-000000000000",
+ "name": "00000000-0000-0000-0000-000000000000",
+ "type": "Microsoft.Consumption/reservationRecommendations",
+ "location": "westus",
+ "sku": "Standard_DS1_v2",
+ "kind": "legacy",
+ "properties": {
+ "meterId": "00000000-0000-0000-0000-000000000000",
+ "term": "P1Y",
+ "costWithNoReservedInstances": 12.0785105,
+ "recommendedQuantity": 1,
+ "totalCostWithReservedInstances": 11.4899644807748,
+ "netSavings": 0.588546019225182,
+ "firstUsageDate": "2019-07-07T00:00:00-07:00",
+ "scope": "Shared",
+ "lookBackPeriod": "Last7Days",
+ "instanceFlexibilityRatio": 1,
+ "instanceFlexibilityGroup": "DSv2 Series",
+ "normalizedSize": "Standard_DS1_v2",
+ "recommendedQuantityNormalized": 1,
+ "skuProperties": [
+ {
+ "name": "Cores",
+ "value": "1"
+ },
+ {
+ "name": "Ram",
+ "value": "1"
+ }
+ ]
+ }
+    }
+ ]
+}
+```
+
+## Next steps
+
+- Read the [Migrate from EA Reporting to ARM APIs overview](migrate-ea-reporting-arm-apis-overview.md) article.
cost-management-billing Migrate Ea Reserved Instance Usage Details Api https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/cost-management-billing/automate/migrate-ea-reserved-instance-usage-details-api.md
+
+ Title: Migrate from the EA Reserved Instance Usage Details API
+
+description: This article has information to help you migrate from the EA Reserved Instance Usage Details API.
+ Last updated: 07/15/2022
+# Migrate from EA Reserved Instance Usage Details API
+
+EA customers who were previously using the Enterprise Reporting consumption.azure.com API to obtain reserved instance usage details need to migrate to a parity Azure Resource Manager API. The following sections describe how to migrate, along with the contract differences between the old API and the new API.
+
+## Assign permissions to an SPN to call the API
+
+Before calling the API, you need to configure a Service Principal with the correct permission. You use the service principal to call the API. For more information, see [Assign permissions to ACM APIs](cost-management-api-permissions.md).
+
+### Call the Reserved instance usage details API
+
+Microsoft isn't updating the older synchronous Reservation Details APIs. We recommend that you move to the newer SPN-supported asynchronous API call pattern as part of the migration. Asynchronous requests better handle large amounts of data and reduce timeout errors.
+
+#### Supported requests
+
+Use the following request URIs when calling the new Asynchronous Reservation Details API. Your enrollment number should be used as the billingAccountId. You can call the API with the following scopes:
+
+- Enrollment: `providers/Microsoft.Billing/billingAccounts/{billingAccountId}`
+
+#### Sample request to generate a reservation details report
+
+```http
+POST https://management.azure.com/providers/Microsoft.Billing/billingAccounts/{billingAccountId}/providers/Microsoft.CostManagement/generateReservationDetailsReport?startDate={startDate}&endDate={endDate}&api-version=2019-11-01
+```
+
+#### Sample request to poll report generation status
+
+```http
+GET https://management.azure.com/providers/Microsoft.Billing/billingAccounts/{billingAccountId}/providers/Microsoft.CostManagement/reservationDetailsOperationResults/{operationId}?api-version=2019-11-01
+```
+
+#### Sample poll response
+
+```json
+{
+ "status": "Completed",
+ "properties": {
+ "reportUrl": "https://storage.blob.core.windows.net/details/20200911/00000000-0000-0000-0000-000000000000?sv=2016-05-31&sr=b&sig=jep8HT2aphfUkyERRZa5LRfd9RPzjXbzB%2F9TNiQ",
+ "validUntil": "2020-09-12T02:56:55.5021869Z"
+ }
+}
+```
+
+#### Response body changes
+
+The response of the older synchronous Reservation Details API is shown below.
+
+Old response:
+
+```json
+{
+ "reservationOrderId": "00000000-0000-0000-0000-000000000000",
+ "reservationId": "00000000-0000-0000-0000-000000000000",
+ "usageDate": "2018-02-01T00:00:00",
+ "skuName": "Standard_F2s",
+ "instanceId": "/subscriptions/00000000-0000-0000-0000-000000000000/resourcegroups/resourvegroup1/providers/microsoft.compute/virtualmachines/VM1",
+ "totalReservedQuantity": 18.000000000000000,
+ "reservedHours": 432.000000000000000,
+ "usedHours": 400.000000000000000
+}
+```
+
+New response:
+
+The new API creates a CSV file for you. The following table maps the old response fields to the new CSV file fields.
+
+| Old property | New property | Notes |
+| | | |
+| | InstanceFlexibilityGroup | New property for instance flexibility. |
+| | InstanceFlexibilityRatio | New property for instance flexibility. |
+| instanceId | InstanceName | |
+| | Kind | It's a new property. Value is `None`, `Reservation`, or `IncludedQuantity`. |
+| reservationId | ReservationId | |
+| reservationOrderId | ReservationOrderId | |
+| reservedHours | ReservedHours | |
+| skuName | SkuName | |
+| totalReservedQuantity | TotalReservedQuantity | |
+| usageDate | UsageDate | |
+| usedHours | UsedHours | |
+
+## Next steps
+
+- Read the [Migrate from EA Reporting to ARM APIs overview](migrate-ea-reporting-arm-apis-overview.md) article.
cost-management-billing Migrate Ea Reserved Instance Usage Summary Api https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/cost-management-billing/automate/migrate-ea-reserved-instance-usage-summary-api.md
+
+ Title: Migrate from the EA Reserved Instance Usage Summary API
+
+description: This article has information to help you migrate from the EA Reserved Instance Usage Summary API.
+Last updated: 07/15/2022
+# Migrate from EA Reserved Instance Usage Summary API
+
+EA customers who were previously using the Enterprise Reporting consumption.azure.com API to obtain reserved instance usage summaries need to migrate to an equivalent Azure Resource Manager API. Migration instructions are outlined below, along with the contract differences between the old and new APIs.
+
+## Assign permissions to an SPN to call the API
+
+Before calling the API, you need to configure a Service Principal with the correct permission. You use the service principal to call the API. For more information, see [Assign permissions to ACM APIs](cost-management-api-permissions.md).
+
+### Call the Reserved Instance Usage Summary API
+
+Use the following request URIs to call the new Reservation Summaries API.
+
+#### Supported requests
+
+Call the API with the following scopes:
+
+- Enrollment: `providers/Microsoft.Billing/billingAccounts/{billingAccountId}`
+
+[Get Reservation Summary Daily](/rest/api/consumption/reservationssummaries/list#reservationsummariesdailywithbillingaccountid)
+
+```http
+https://management.azure.com/{scope}/Microsoft.Consumption/reservationSummaries?grain=daily&$filter=properties/usageDate ge 2017-10-01 AND properties/usageDate le 2017-11-20&api-version=2019-10-01
+```
+
+[Get Reservation Summary Monthly](/rest/api/consumption/reservationssummaries/list#reservationsummariesmonthlywithbillingaccountid)
+
+```http
+https://management.azure.com/{scope}/Microsoft.Consumption/reservationSummaries?grain=monthly&$filter=properties/usageDate ge 2017-10-01 AND properties/usageDate le 2017-11-20&api-version=2019-10-01
+```
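
The two requests above differ only in scope, grain, and date range, so they can be composed from parameters. A sketch (the OData filter is left unencoded for readability; a real client should URL-encode the query string, for example with `urllib.parse`):

```python
# Compose the Reservation Summaries request URL shown above from its parts.
def summaries_url(scope, grain, start, end):
    odata = f"properties/usageDate ge {start} AND properties/usageDate le {end}"
    return ("https://management.azure.com/"
            f"{scope}/Microsoft.Consumption/reservationSummaries"
            f"?grain={grain}&$filter={odata}&api-version=2019-10-01")

url = summaries_url("providers/Microsoft.Billing/billingAccounts/12345",
                    "daily", "2017-10-01", "2017-11-20")
print(url)
```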
+
+#### Response body changes
+
+Old response:
+
+```json
+[
+ {
+ "reservationOrderId": "00000000-0000-0000-0000-000000000000",
+ "reservationId": "00000000-0000-0000-0000-000000000000",
+ "skuName": "Standard_F1s",
+ "reservedHours": 24,
+ "usageDate": "2018-05-01T00:00:00",
+ "usedHours": 23,
+ "minUtilizationPercentage": 0,
+ "avgUtilizationPercentage": 95.83,
+ "maxUtilizationPercentage": 100
+ }
+]
+```
+
+New response:
+
+```json
+{
+ "value": [
+ {
+ "id": "/providers/Microsoft.Billing/billingAccounts/12345/providers/Microsoft.Consumption/reservationSummaries/reservationSummaries_Id1",
+ "name": "reservationSummaries_Id1",
+ "type": "Microsoft.Consumption/reservationSummaries",
+ "tags": null,
+ "properties": {
+ "reservationOrderId": "00000000-0000-0000-0000-000000000000",
+ "reservationId": "00000000-0000-0000-0000-000000000000",
+ "skuName": "Standard_B1s",
+ "reservedHours": 720,
+ "usageDate": "2018-09-01T00:00:00-07:00",
+ "usedHours": 0,
+ "minUtilizationPercentage": 0,
+ "avgUtilizationPercentage": 0,
+ "maxUtilizationPercentage": 0
+ }
+ }
+ ]
+}
+```
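
The new API wraps each record in an ARM envelope (`id`/`name`/`type`) and nests the old fields under `properties`. Code written against the old flat array can be adapted by unwrapping that envelope, as in this sketch:

```python
def flatten_summaries(new_response):
    """Extract the flat records the old API used to return directly."""
    return [item["properties"] for item in new_response.get("value", [])]

# Trimmed-down version of the new response shown above.
new = {"value": [{"id": "...", "properties": {"usedHours": 0, "avgUtilizationPercentage": 0}}]}
print(flatten_summaries(new)[0]["usedHours"])  # 0
```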
+
+## Next steps
+
+- Read the [Migrate from EA Reporting to ARM APIs overview](migrate-ea-reporting-arm-apis-overview.md) article.
cost-management-billing Migrate Ea Usage Details Api https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/cost-management-billing/automate/migrate-ea-usage-details-api.md
+
+ Title: Migrate from the EA Usage Details APIs
+
+description: This article has information to help you migrate from the EA Usage Details APIs.
+Last updated: 07/15/2022
+# Migrate from EA Usage Details APIs
+
+EA customers who were previously using the Enterprise Reporting APIs behind the *consumption.azure.com* endpoint to obtain usage details and marketplace charges need to migrate to new and improved solutions. Instructions are outlined below along with contract differences between the old API and the new solutions.
+
+The dataset is referred to as *cost details* instead of *usage details*.
+
+## New solutions generally available
+
+The following table provides a summary of the migration destinations that are available along with a summary of what to consider when choosing which solution is best for you.
+
+| Solution | Description | Considerations | Onboarding info |
+| --- | --- | --- | --- |
+| **Exports** | Recurring data dumps to storage on a schedule | - The most scalable solution for your workloads.<br> - Can be configured to use file partitioning for bigger datasets.<br> - Great for establishing and growing a cost dataset that can be integrated with your own queryable data stores.<br> - Requires access to a storage account that can hold the data. | - [Configure in Azure portal](../costs/tutorial-export-acm-data.md)<br> - [Automate Export creation with the API](../costs/ingest-azure-usage-at-scale.md)<br> - [Export API Reference](/rest/api/cost-management/exports/create-or-update) |
+| **Cost Details API** | On demand download | - Useful for small cost datasets.<br> - Useful for scenarios when Exports to Azure storage aren't feasible due to security or manageability concerns. | - [Get small cost datasets on demand](get-small-usage-datasets-on-demand.md)<br> - [Cost Details](/rest/api/cost-management/generate-cost-details-report) API |
+
+Generally we recommend using [Exports](../costs/tutorial-export-acm-data.md) if you have ongoing data ingestion needs and/or a large monthly cost details dataset. For more information, see [Ingest cost details data](automation-ingest-usage-details-overview.md). If you need additional information to help you make a decision for your workload, see [Choose a cost details solution](usage-details-best-practices.md).
+
+### Assign permissions to an SPN to call the APIs
+
+If you're looking to call either the Exports or Cost Details APIs programmatically, you'll need to configure a Service Principal with the correct permission. For more information, see [Assign permissions to ACM APIs](cost-management-api-permissions.md).
+
+### Avoid the Microsoft Consumption Usage Details API
+
+The [Consumption Usage Details API](/rest/api/consumption/usage-details/list) is another endpoint that currently supports EA customers. Don't migrate to this API. Migrate to either Exports or the Cost Details API, as outlined earlier in this document. The Consumption Usage Details API will be deprecated in the future and is located behind the endpoint below.
+
+```http
+GET https://management.azure.com/{scope}/providers/Microsoft.Consumption/usageDetails?api-version=2021-10-01
+```
+
+This API is a synchronous endpoint and won't scale as your spending and the size of your month-over-month cost dataset increase. If you're currently using the Consumption Usage Details API, we recommend migrating off of it to either Exports or the Cost Details API as soon as possible. A formal deprecation announcement will be made at a future date and a timeline for retirement will be provided. To learn more about migrating away from Consumption Usage Details, see [Migrate from Consumption Usage Details API](migrate-consumption-usage-details-api.md).
+
+## Migration benefits
+
+Our new solutions provide many benefits over the EA Reporting Usage Details APIs. Here's a summary:
+
+- **Security and stability** - New solutions require Service Principal and/or user tokens in order to access data. They're more secure than the API keys that are used for authenticating to the EA Reporting APIs. Keys in these legacy APIs are valid for six months and can expose sensitive financial data if leaked. Additionally, if keys aren't renewed and integrated into workloads before their six-month expiry, data access is revoked, which breaks customer workloads.
+- **Scalability** - The EA Reporting APIs aren't built to scale well as your Azure usage increases. The usage details dataset can get exceedingly large as you deploy more resources into the cloud. The new solutions are asynchronous and have extensive infrastructure enhancements behind them to ensure successful downloads for any size dataset.
+- **Single dataset for all usage details** - Azure and Azure Marketplace usage details have been merged into one dataset in the new solutions. The single dataset reduces the number of APIs that you need to call to see all your charges.
+- **Purchase amortization** - Customers who purchase Reservations can see an Amortized view of their costs using the new solutions.
+- **Schema consistency** - Each solution that is available provides files with matching fields. It allows you to easily move between solutions based on your scenario.
+- **Cost Allocation integration** - Enterprise Agreement and Microsoft Customer Agreement customers can use the new solution to view charges in relation to the cost allocation rules that they've configured. For more information about cost allocation, see [Allocate costs](../costs/allocate-costs.md).
+- **Ongoing improvements** - The new solutions are actively developed and will receive all new features as they're released.
+
+## Enterprise Usage APIs to migrate off
+
+The table below summarizes the different APIs that you may be using today to ingest cost details data. If you're using one of the APIs below, you'll need to migrate to one of the new solutions outlined above. All APIs below are behind the *https://consumption.azure.com* endpoint.
+
+| Endpoint | API Comments |
+| --- | --- |
+| `/v3/enrollments/{enrollmentNumber}/usagedetails/download?billingPeriod={billingPeriod}` | - API method: GET<br> - Synchronous (non polling)<br> - Data format: CSV |
+| `/v3/enrollments/{enrollmentNumber}/usagedetails/download?startTime=2017-01-01&endTime=2017-01-10` | - API method: GET <br> - Synchronous (non polling)<br> - Data format: CSV |
+| `/v3/enrollments/{enrollmentNumber}/usagedetails` | - API method: GET<br> - Synchronous (non polling)<br> - Data format: JSON |
+| `/v3/enrollments/{enrollmentNumber}/billingPeriods/{billingPeriod}/usagedetails` | - API method: GET<br> - Synchronous (non polling)<br> - Data format: JSON |
+| `/v3/enrollments/{enrollmentNumber}/usagedetailsbycustomdate?startTime=2017-01-01&endTime=2017-01-10` | - API method: GET<br> - Synchronous (non polling)<br> - Data format: JSON |
+| `/v3/enrollments/{enrollmentNumber}/usagedetails/submit?billingPeriod={billingPeriod}` | - API method: POST<br> - Asynchronous (polling based)<br> - Data format: CSV |
+| `/v3/enrollments/{enrollmentNumber}/usagedetails/submit?startTime=2017-04-01&endTime=2017-04-10` | - API method: POST<br> - Asynchronous (polling based)<br> - Data format: CSV |
+
+## Enterprise Marketplace Store Charge APIs to migrate off
+
+In addition to the usage details APIs outlined above, you'll need to migrate off the [Enterprise Marketplace Store Charge APIs](/rest/api/billing/enterprise/billing-enterprise-api-marketplace-storecharge). All Azure and Marketplace charges have been merged into a single file that is available through the new solutions. You can identify which charges are *Azure* versus *Marketplace* charges by using the `PublisherType` field that is available in the new dataset. The table below outlines the applicable APIs. All of the following APIs are behind the *https://consumption.azure.com* endpoint.
+
+| Endpoint | API Comments |
+| --- | --- |
+| `/v3/enrollments/{enrollmentNumber}/marketplacecharges` | - API method: GET<br> - Synchronous (non polling)<br> - Data format: JSON |
+| `/v3/enrollments/{enrollmentNumber}/billingPeriods/{billingPeriod}/marketplacecharges` | - API method: GET<br> - Synchronous (non polling)<br> - Data format: JSON |
+| `/v3/enrollments/{enrollmentNumber}/marketplacechargesbycustomdate?startTime=2017-01-01&endTime=2017-01-10` | - API method: GET<br> - Synchronous (non polling)<br> - Data format: JSON |
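
Since the new dataset merges Azure and Marketplace charges into one file, consumers that previously called separate APIs can split the rows back apart using `PublisherType`. A sketch with illustrative row contents:

```python
# Split merged cost details rows by the PublisherType field described above.
def split_by_publisher(rows):
    azure = [r for r in rows if r.get("PublisherType") == "Azure"]
    marketplace = [r for r in rows if r.get("PublisherType") == "Marketplace"]
    return azure, marketplace

rows = [
    {"MeterCategory": "Virtual Machines", "PublisherType": "Azure"},
    {"PublisherName": "Some ISV", "PublisherType": "Marketplace"},
]
azure_rows, marketplace_rows = split_by_publisher(rows)
print(len(azure_rows), len(marketplace_rows))  # 1 1
```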
+
+## Data field mapping
+
+The table below provides a summary of the old fields available in the solutions you're currently using along with the field to use in the new solutions.
+
+| Old field | New field | Comments |
+| --- | --- | --- |
+| serviceName | MeterCategory | |
+| serviceTier | MeterSubCategory | |
+| location | ResourceLocation | |
+| chargesBilledSeparately | isAzureCreditEligible | The properties are opposites. If isAzureCreditEligible is true, chargesBilledSeparately would be false. |
+| partNumber | PartNumber | |
+| resourceGuid | MeterId | |
+| offerId | OfferId | |
+| cost | CostInBillingCurrency | |
+| accountId | AccountId | |
+| resourceLocationId | | Not available. |
+| consumedServiceId | ConsumedService | |
+| departmentId | InvoiceSectionId | |
+| accountOwnerEmail | AccountOwnerId | |
+| accountName | AccountName | |
+| subscriptionId | SubscriptionId | |
+| subscriptionGuid | SubscriptionId | |
+| subscriptionName | SubscriptionName | |
+| date | Date | |
+| product | ProductName | |
+| meterId | MeterId | |
+| meterCategory | MeterCategory | |
+| meterSubCategory | MeterSubCategory | |
+| meterRegion | MeterRegion | |
+| meterName | MeterName | |
+| consumedQuantity | Quantity | |
+| resourceRate | EffectivePrice | |
+| resourceLocation | ResourceLocation | |
+| consumedService | ConsumedService | |
+| instanceId | ResourceId | |
+| serviceInfo1 | ServiceInfo1 | |
+| serviceInfo2 | ServiceInfo2 | |
+| additionalInfo | AdditionalInfo | |
+| tags | Tags | |
+| storeServiceIdentifier | | Not available. |
+| departmentName | InvoiceSectionName | |
+| costCenter | CostCenter | |
+| unitOfMeasure | UnitOfMeasure | |
+| resourceGroup | ResourceGroup | |
+| isRecurringCharge | | Where applicable, use the Frequency and Term fields moving forward. |
+| extendedCost | CostInBillingCurrency | |
+| planName | PlanName | |
+| publisherName | PublisherName | |
+| orderNumber | | Not available. |
+| usageStartDate | Date | |
+| usageEndDate | Date | |
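
Applying the mapping in code is mostly a rename, with the one exception called out above: `chargesBilledSeparately` is the logical opposite of the new `IsAzureCreditEligible` field. A sketch (only a subset of the table is included; this isn't an official migration tool):

```python
# Old EA usage field -> new field, per a subset of the mapping table above.
RENAME = {
    "serviceName": "MeterCategory",
    "serviceTier": "MeterSubCategory",
    "consumedQuantity": "Quantity",
    "resourceRate": "EffectivePrice",
    "instanceId": "ResourceId",
    "cost": "CostInBillingCurrency",
}

def adapt(old):
    """Rename old EA usage fields to the new names."""
    new = {RENAME.get(k, k): v for k, v in old.items()
           if k != "chargesBilledSeparately"}
    # chargesBilledSeparately and IsAzureCreditEligible are opposites,
    # so the value is negated rather than copied.
    if "chargesBilledSeparately" in old:
        new["IsAzureCreditEligible"] = not old["chargesBilledSeparately"]
    return new

rec = adapt({"serviceName": "Storage", "chargesBilledSeparately": True})
print(rec)  # {'MeterCategory': 'Storage', 'IsAzureCreditEligible': False}
```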
+
+## Next steps
+
+- Read the [Migrate from EA Reporting to Azure Resource Manager APIs overview](migrate-ea-reporting-arm-apis-overview.md) article.
cost-management-billing Partner Automation https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/cost-management-billing/automate/partner-automation.md
+
+ Title: Cost Management automation for partners
+
+description: This article explains how Microsoft partners and their customers can use Cost Management APIs for common tasks.
+Last updated: 07/15/2022
+# Automation for partners
+
+Azure Cost Management is natively available for direct partners who have onboarded their customers to a Microsoft Customer Agreement and have [purchased an Azure Plan](/partner-center/purchase-azure-plan). Partners and their customers can use Cost Management APIs for common tasks. For more information about non-automation scenarios, see [Cost Management for Partners](../costs/get-started-partners.md).
+
+## Azure Cost Management APIs - Direct and indirect providers
+
+Partners with access to billing scopes in a partner tenant can use the following APIs to view invoiced costs.
+
+APIs at the subscription scope can be called by a partner regardless of the cost policy, as long as they have access to the subscription. Other users with access to the subscription, like the customer or reseller, can call the APIs only after the partner enables the cost policy for the customer tenant.
+
+### To get a list of billing accounts
+
+```http
+GET https://management.azure.com/providers/Microsoft.Billing/billingAccounts?api-version=2019-10-01-preview
+```
+
+### To get a list of customers
+
+```http
+GET https://management.azure.com/providers/Microsoft.Billing/billingAccounts/{billingAccountName}/customers?api-version=2019-10-01-preview
+```
+
+### To get a list of subscriptions
+
+```http
+GET https://management.azure.com/providers/Microsoft.Billing/billingAccounts/{billingAccountName}/billingSubscriptions?api-version=2019-10-01-preview
+```
+
+### To get a list of invoices for a period of time
+
+```http
+GET https://management.azure.com/providers/Microsoft.Billing/billingAccounts/{billingAccountName}/invoices?api-version=2019-10-01-preview&periodStartDate={periodStartDate}&periodEndDate={periodEndDate}
+```
+The API call returns an array of invoices that has elements similar to the following JSON code.
+
+```json
+{
+  "id": "/providers/Microsoft.Billing/billingAccounts/{billingAccountID}/billingProfiles/{BillingProfileID}/invoices/{InvoiceID}",
+  "name": "{InvoiceID}",
+  "properties": {
+    "amountDue": {
+      "currency": "USD",
+      "value": x.xx
+    },
+    ...
+  }
+}
+```
+
+Use the preceding returned ID field value and replace it in the following example as the scope to query for usage details.
+
+```http
+GET https://management.azure.com/{id}/providers/Microsoft.Consumption/UsageDetails?api-version=2019-10-01
+```
+
+The example returns the usage records associated with the specific invoice.
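
The invoice-to-usage flow above is just string composition: the `id` returned for an invoice becomes the scope of the usage details query. A sketch (the placeholder IDs are kept as-is):

```python
# Build the usage details URL from an invoice "id", as described above.
def usage_details_url(invoice_id):
    return (f"https://management.azure.com{invoice_id}"
            "/providers/Microsoft.Consumption/UsageDetails"
            "?api-version=2019-10-01")

invoice_id = ("/providers/Microsoft.Billing/billingAccounts/{billingAccountID}"
              "/billingProfiles/{BillingProfileID}/invoices/{InvoiceID}")
print(usage_details_url(invoice_id))
```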
+
+### To get the policy for customers to view costs
+
+```http
+GET https://management.azure.com/providers/Microsoft.Billing/billingAccounts/{billingAccountName}/customers/{customerID}/policies/default?api-version=2019-10-01-preview
+```
+
+### To set the policy for customers to view costs
+
+```http
+PUT https://management.azure.com/providers/Microsoft.Billing/billingAccounts/{billingAccountName}/customers/{customerID}/policies/default?api-version=2019-10-01-preview
+```
+
+### To get Azure service usage for a billing account
+
+We recommend that you configure an Export for these scenarios. For more information, see [Retrieve large usage datasets with exports](../costs/ingest-azure-usage-at-scale.md).
+
+### To download a customer's Azure service usage
+
+We recommend that you configure an Export for this scenario as well. If you need to download the data on demand, however, you can use the [Cost Details](/rest/api/cost-management/generate-cost-details-report) API. For more information, see [Get small cost datasets on demand](get-small-usage-datasets-on-demand.md).
+
+### To get or download the price sheet for consumed Azure services
+
+First, use the following POST request.
+
+```http
+POST https://management.azure.com/providers/Microsoft.Billing/BillingAccounts/{billingAccountName}/billingProfiles/{billingProfileID}/pricesheet/default/download?api-version=2019-10-01-preview&format=csv
+```
+
+Then, call the asynchronous operation URL returned in the response. For example:
+
+```http
+GET https://management.azure.com/providers/Microsoft.Billing/billingAccounts/{billingAccountName}/billingProfiles/{billingProfileID}/pricesheetDownloadOperations/{operation}?sessiontoken=0:11186&api-version=2019-10-01-preview
+```
+
+The preceding GET call returns a download link for the price sheet.
+
+### To get aggregated costs
+
+```http
+POST https://management.azure.com/providers/microsoft.billing/billingAccounts/{billingAccountName}/providers/microsoft.costmanagement/query?api-version=2019-10-01
+```
+
+### Create a budget for a partner
+
+```http
+PUT https://management.azure.com/providers/Microsoft.Billing/billingAccounts/{billingAccountName}/providers/Microsoft.CostManagement/budgets/partnerworkshopbudget?api-version=2019-10-01
+```
+
+### Create a budget for a customer
+
+```http
+PUT https://management.azure.com/providers/Microsoft.Billing/billingAccounts/{billingAccountName}/customers/{customerID}/providers/Microsoft.Consumption/budgets/{budgetName}?api-version=2019-10-01
+```
+
+### Delete a budget
+
+```http
+DELETE https://management.azure.com/providers/Microsoft.Billing/billingAccounts/{billingAccountId}/providers/Microsoft.CostManagement/budgets/{budgetName}?api-version=2019-10-01
+```
+
+## Next steps
+
+- Learn more about Cost Management automation at [Cost Management automation overview](automation-overview.md).
+- [Get started with Azure Cost Management for partners](../costs/get-started-partners.md#cost-management-rest-apis).
+- [Retrieve large usage datasets with exports](../costs/ingest-azure-usage-at-scale.md).
+- [Understand usage details fields](understand-usage-details-fields.md).
cost-management-billing Tutorial Seed Historical Cost Dataset Exports Api https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/cost-management-billing/automate/tutorial-seed-historical-cost-dataset-exports-api.md
+
+ Title: Tutorial - Seed a historical cost dataset with the Exports API
+description: This tutorial helps you seed a historical cost dataset to visualize cost trends over time.
+Last updated: 07/15/2022
+# Tutorial: Seed a historical cost dataset with the Exports API
+
+Large organizations often need to analyze their historical costs going back a year or more. Creating the dataset might be needed for targeted one-time inquiries or to set up reporting dashboards that visualize cost trends over time. In either case, you need a way to get the data reliably so that you can load it into a data store that you can query. After your historical cost dataset is seeded, your data store can be updated as new costs come in so that your reporting stays current. Historical costs rarely change, and you'll be notified if they do, so we recommend refreshing your historical costs no more than once a month.
+
+In this tutorial, you learn how to:
+
+> [!div class="checklist"]
+> * Get a bearer token for your service principal
+> * Format the request
+> * Execute the requests in one-month chunks
+
+## Prerequisites
+
+You need proper permissions to successfully call the Exports API. We recommend using a Service Principal in automation scenarios.
+
+- To learn more, see [Assign permissions to Cost Management APIs](cost-management-api-permissions.md).
+- To learn more about the specific permissions needed for the Exports API, see [Understand and work with scopes](../costs/understand-work-scopes.md).
+
+Additionally, you'll need a way to query the API directly. For this tutorial, we recommend using [Postman](https://www.postman.com/).
+
+## Get a bearer token for your service principal
+
+To learn how to get a bearer token with a service principal, see [Acquire an Access token](/rest/api/azure/#acquire-an-access-token).
+
+## Format the request
+
+Use the following example request as a template for your own one-time data Export. It creates a one-month Actual Cost dataset in the specified Azure storage account. We recommend that you request no more than one month of data per export. If you have a large dataset every month, we recommend setting `partitionData = true` for your one-time export to split it into multiple files. For more information, see [File partitioning for large datasets](../costs/tutorial-export-acm-data.md#file-partitioning-for-large-datasets).
+
+```http
+PUT https://management.azure.com/providers/Microsoft.Billing/billingAccounts/{enrollmentId}/providers/Microsoft.CostManagement/exports/{ExportName}?api-version=2021-10-01
+```
+
+**Request Headers**
+
+```
+Authorization: <YOUR BEARER TOKEN>
+Accept: */*
+Content-Type: application/json
+```
+
+**Request Body**
+
+```json
+{
+ "properties": {
+ "definition": {
+ "dataset": {
+ "granularity": "Daily",
+ "grouping": []
+ },
+ "timePeriod": {
+ "from": "2021-09-01T00:00:00.000Z",
+ "to": "2021-09-30T00:00:00.000Z"
+ },
+ "timeframe": "Custom",
+ "type": "ActualCost"
+ },
+ "deliveryInfo": {
+ "destination": {
+ "container": "{containerName}",
+ "rootFolderPath": "{folderName}",
+ "resourceId": "/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.Storage/storageAccounts/{storageAccountName}"
+ }
+ },
+ "format": "Csv",
+ "partitionData": false
+ }
+}
+```
+
+## Create Exports in one-month chunks
+
+We recommend creating one-time data exports in one-month chunks. If you want to seed a one-year historical dataset, execute 12 Exports API requests, one for each month. After you've seeded your historical dataset, you can create a scheduled export to continue populating your cost data in Azure storage as your charges accrue over time.
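
The month boundaries for those 12 requests can be generated rather than typed by hand. A sketch that produces the `timePeriod` values for each month of a year:

```python
from datetime import date
import calendar

def month_windows(year):
    """Yield (from, to) ISO timestamps covering each month of a year,
    for use in the "timePeriod" of twelve one-time Export definitions."""
    for month in range(1, 13):
        last_day = calendar.monthrange(year, month)[1]
        start = date(year, month, 1)
        end = date(year, month, last_day)
        yield (f"{start.isoformat()}T00:00:00.000Z",
               f"{end.isoformat()}T00:00:00.000Z")

windows = list(month_windows(2021))
print(len(windows))  # 12
print(windows[8])    # September's (from, to) pair
```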
+
+## Run each Export
+
+Now that you've created an Export for each month, run each one manually by calling the [Execute API](/rest/api/cost-management/exports/execute). An example request is shown below.
+
+```http
+POST https://management.azure.com/{scope}/providers/Microsoft.CostManagement/exports/{exportName}/run?api-version=2021-10-01
+```
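
Triggering all twelve monthly exports is a loop over the run URL shown above. A sketch; the export naming scheme (`HistoricalCosts-YYYY-MM`) is a hypothetical convention, not part of the API:

```python
# Build the run URL for each previously created monthly export.
def run_url(scope, export_name):
    return (f"https://management.azure.com/{scope}"
            f"/providers/Microsoft.CostManagement/exports/{export_name}/run"
            "?api-version=2021-10-01")

scope = "providers/Microsoft.Billing/billingAccounts/{enrollmentId}"
urls = [run_url(scope, f"HistoricalCosts-2021-{m:02d}") for m in range(1, 13)]
print(len(urls))  # 12
print(urls[0])
```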
+
+## Next steps
+
+In this tutorial, you learned how to:
+
+> [!div class="checklist"]
+> * Get a bearer token for your service principal
+> * Format the request
+> * Execute the requests in one-month chunks
+
+To learn more about cost details, see [ingest cost details data](automation-ingest-usage-details-overview.md).
+
+To learn more about what data is available in the cost details dataset, see [Understand cost details data fields](understand-usage-details-fields.md).
cost-management-billing Understand Usage Details Fields https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/cost-management-billing/automate/understand-usage-details-fields.md
+
+ Title: Understand usage details fields
+
+description: This article describes the fields in the usage data files.
+Last updated: 07/15/2022
+# Understand cost details fields
+
+This document describes the cost details (formerly known as usage details) fields found in files from using [Azure portal download](../understand/download-azure-daily-usage.md), [Exports](../costs/tutorial-export-acm-data.md) from Cost Management, or the [Cost Details](/rest/api/cost-management/generate-cost-details-report) API. For more information about cost details best practices, see [Choose a cost details solution](usage-details-best-practices.md).
+
+## Migration to new cost details formats
+
+If you're using an older cost details solution and want to migrate to Exports or the Cost Details API, read the following articles.
+
+- [Migrate from Enterprise Usage Details APIs](migrate-ea-usage-details-api.md)
+- [Migrate from EA to MCA APIs](../costs/migrate-cost-management-api.md)
+- [Migrate from Consumption Usage Details API](migrate-consumption-usage-details-api.md)
+
+## List of fields and descriptions
+
+The following table describes the important terms used in the latest version of the cost details file. The list covers pay-as-you-go (also called Microsoft Online Services Program), Enterprise Agreement (EA), and Microsoft Customer Agreement (MCA) accounts. To identify your account type, see [supported Microsoft Azure offers](../costs/understand-cost-mgt-data.md#supported-microsoft-azure-offers).
+
+| Term | Account type | Description |
+| --- | --- | --- |
+| AccountName | EA, pay-as-you-go | Display name of the EA enrollment account or pay-as-you-go billing account. |
+| AccountOwnerId¹ | EA, pay-as-you-go | Unique identifier for the EA enrollment account or pay-as-you-go billing account. |
+| AdditionalInfo | All | Service-specific metadata. For example, an image type for a virtual machine. |
+| BillingAccountId¹ | All | Unique identifier for the root billing account. |
+| BillingAccountName | All | Name of the billing account. |
+| BillingCurrency | All | Currency associated with the billing account. |
+| BillingPeriod | EA, pay-as-you-go | The billing period of the charge. |
+| BillingPeriodEndDate | All | The end date of the billing period. |
+| BillingPeriodStartDate | All | The start date of the billing period. |
+| BillingProfileId¹ | All | Unique identifier of the EA enrollment, pay-as-you-go subscription, MCA billing profile, or AWS consolidated account. |
+| BillingProfileName | All | Name of the EA enrollment, pay-as-you-go subscription, MCA billing profile, or AWS consolidated account. |
+| ChargeType | All | Indicates whether the charge represents usage (**Usage**), a purchase (**Purchase**), or a refund (**Refund**). |
+| ConsumedService | All | Name of the service the charge is associated with. |
+| CostCenter¹ | EA, MCA | The cost center defined for the subscription for tracking costs (only available in open billing periods for MCA accounts). |
+| Cost | EA, pay-as-you-go | See CostInBillingCurrency. |
+| CostInBillingCurrency | MCA | Cost of the charge in the billing currency before credits or taxes. |
+| CostInPricingCurrency | MCA | Cost of the charge in the pricing currency before credits or taxes. |
+| Currency | EA, pay-as-you-go | See `BillingCurrency`. |
+| Date¹ | All | The usage or purchase date of the charge. |
+| EffectivePrice | All | Blended unit price for the period. Blended prices average out any fluctuations in the unit price, like graduated tiering, which lowers the price as quantity increases over time. |
+| ExchangeRateDate | MCA | Date the exchange rate was established. |
+| ExchangeRatePricingToBilling | MCA | Exchange rate used to convert the cost in the pricing currency to the billing currency. |
+| Frequency | All | Indicates whether a charge is expected to repeat. Charges can either happen once (**OneTime**), repeat on a monthly or yearly basis (**Recurring**), or be based on usage (**UsageBased**). |
+| InvoiceId | pay-as-you-go, MCA | The unique document ID listed on the invoice PDF. |
+| InvoiceSection | MCA | See `InvoiceSectionName`. |
+| InvoiceSectionId¹ | EA, MCA | Unique identifier for the EA department or MCA invoice section. |
+| InvoiceSectionName | EA, MCA | Name of the EA department or MCA invoice section. |
+| IsAzureCreditEligible | All | Indicates if the charge is eligible to be paid for using Azure credits (Values: `True` or `False`). |
+| Location | MCA | Datacenter location where the resource is running. |
+| MeterCategory | All | Name of the classification category for the meter. For example, _Cloud services_ and _Networking_. |
+| MeterId¹ | All | The unique identifier for the meter. |
+| MeterName | All | The name of the meter. |
+| MeterRegion | All | Name of the datacenter location for services priced based on location. See Location. |
+| MeterSubCategory | All | Name of the meter subclassification category. |
+| OfferId¹ | All | Name of the offer purchased. |
+| PayGPrice | All | Retail price for the resource. |
+| PartNumber¹ | EA, pay-as-you-go | Identifier used to get specific meter pricing. |
+| PlanName | EA, pay-as-you-go | Marketplace plan name. |
+| PreviousInvoiceId | MCA | Reference to an original invoice if the line item is a refund. |
+| PricingCurrency | MCA | Currency used when rating based on negotiated prices. |
+| PricingModel | All | Identifier that indicates how the meter is priced. (Values: `On Demand`, `Reservation`, and `Spot`) |
+| Product | All | Name of the product. |
+| ProductId¹ | MCA | Unique identifier for the product. |
+| ProductOrderId | All | Unique identifier for the product order. |
+| ProductOrderName | All | Unique name for the product order. |
+| Provider | All | Identifier for product category or Line of Business. For example, Azure, Microsoft 365, and AWS. |
+| PublisherName | All | Publisher for Marketplace services. |
+| PublisherType | All | Type of publisher (Values: **Azure**, **AWS**, **Marketplace**). |
+| Quantity | All | The number of units purchased or consumed. |
+| ReservationId | EA, MCA | Unique identifier for the purchased reservation instance. |
+| ReservationName | EA, MCA | Name of the purchased reservation instance. |
+| ResourceGroup | All | Name of the [resource group](../../azure-resource-manager/management/overview.md) the resource is in. Not all charges come from resources deployed to resource groups. Charges that don't have a resource group will be shown as null or empty, **Others**, or **Not applicable**. |
+| ResourceId¹ | All | Unique identifier of the [Azure Resource Manager](/rest/api/resources/resources) resource. |
+| ResourceLocation | All | Datacenter location where the resource is running. See `Location`. |
+| ResourceName | EA, pay-as-you-go | Name of the resource. Not all charges come from deployed resources. Charges that don't have a resource type will be shown as null/empty, **Others**, or **Not applicable**. |
+| ResourceType | MCA | Type of resource instance. Not all charges come from deployed resources. Charges that don't have a resource type will be shown as null/empty, **Others**, or **Not applicable**. |
+| ServiceFamily | MCA | Service family that the service belongs to. |
+| ServiceInfo¹ | All | Service-specific metadata. |
+| ServiceInfo2 | All | Legacy field with optional service-specific metadata. |
+| ServicePeriodEndDate | MCA | The end date of the rating period that defined and locked pricing for the consumed or purchased service. |
+| ServicePeriodStartDate | MCA | The start date of the rating period that defined and locked pricing for the consumed or purchased service. |
+| SubscriptionId¹ | All | Unique identifier for the Azure subscription. |
+| SubscriptionName | All | Name of the Azure subscription. |
+| Tags¹ | All | Tags assigned to the resource. Doesn't include resource group tags. Can be used to group or distribute costs for internal chargeback. For more information, see [Organize your Azure resources with tags](https://azure.microsoft.com/updates/organize-your-azure-resources-with-tags/). |
+| Term | All | Displays the validity term of the offer. For example, reserved instances display a term of 12 months. One-time or recurring purchases (SaaS, Marketplace Support) have a term of one month. Not applicable for Azure consumption. |
+| UnitOfMeasure | All | The unit of measure for billing for the service. For example, compute services are billed per hour. |
+| UnitPrice | EA, pay-as-you-go | The price per unit for the charge. |
+| CostAllocationRuleName | EA, MCA | Name of the Cost Allocation rule that's applicable to the record. |
+
+¹ Fields used to build a unique ID for a single cost record.
+
+Some fields might differ in casing and spacing between account types. Older versions of pay-as-you-go cost details files have separate sections for the statement and daily cost.
+
+### List of terms from older APIs
+
+The following table maps terms used in older APIs to the new terms. Refer to the preceding table for descriptions.
+
+| Old term | New term |
+| --- | --- |
+| ConsumedQuantity | Quantity |
+| IncludedQuantity | N/A |
+| InstanceId | ResourceId |
+| Rate | EffectivePrice |
+| Unit | UnitOfMeasure |
+| UsageDate | Date |
+| UsageEnd | Date |
+| UsageStart | Date |
+
+## Next steps
+
+- Get an overview of how to [ingest cost data](automation-ingest-usage-details-overview.md).
+- Learn more about how to [choose a cost details solution](usage-details-best-practices.md).
+- [Create and manage exported data](../costs/tutorial-export-acm-data.md) in the Azure portal with Exports.
+- [Automate Export creation](../costs/ingest-azure-usage-at-scale.md) and ingestion at scale using the API.
+- Learn how to [Get small cost datasets on demand](get-small-usage-datasets-on-demand.md).
cost-management-billing Usage Details Best Practices https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/cost-management-billing/automate/usage-details-best-practices.md
+
+ Title: Cost details best practices
+
+description: This article describes best practices recommended by Microsoft when you work with data in cost details files.
++ Last updated : 07/15/2022++++++
+# Choose a cost details solution
+
+There are multiple ways to work with the cost details dataset (formerly referred to as usage details). If your organization has a large Azure presence across many resources or subscriptions, you'll have a large amount of cost details data. Excel often can't load such large files. In this situation, we recommend the options below.
+
+## Exports
+
+Exports are recurring data dumps to storage that can be configured to run on a custom schedule. We recommend Exports as the solution to ingest cost details data. It's the most scalable option for large enterprises. Exports are [configured in the Azure portal](../costs/tutorial-export-acm-data.md) or using the [Exports API](/rest/api/cost-management/exports). Review the following considerations to determine whether this solution is best for your data ingestion workload.
+
+- Exports are the most scalable solution for your workloads.
+- Can be configured to use file partitioning for bigger datasets.
+- Great for establishing and growing a cost dataset that can be integrated with your own queryable data stores.
+- Requires access to a storage account that can hold the data.
+
+To learn more about how to properly call the API and ingest cost details at scale, see [Retrieve large datasets with exports](../costs/ingest-azure-usage-at-scale.md).
+
+## Cost Details API
+
+The [Cost Details](/rest/api/cost-management/generate-cost-details-report) API is the go-to solution for on-demand download of the cost details dataset. Review the following considerations to determine whether this solution is best for your data ingestion workload.
+
+- Useful for small cost datasets. Exports scale better than the API. The API may not be a good solution if you need to ingest many gigabytes worth of cost data month over month. A GB of cost details data is roughly 1 million rows of data.
+- Useful for scenarios when Exports to Azure storage aren't feasible due to security or manageability concerns.
+
+If the Cost Details API is your chosen solution, review the following best practices for calling the API.
+
+- If you want to get the latest cost data, we recommend that you query at most once per day. Reports are refreshed every four hours. If you call more frequently, you'll receive identical data.
+- Once you download your cost data for historical invoices, the charges won't change unless you're explicitly notified. We recommend caching your cost data in a queryable store to prevent repeated calls for identical data.
+- Chunk your calls into small date ranges to get more manageable files that you can download. For example, we recommend chunking by day or by week if you have large Azure usage files month-to-month.
+- If you have scopes with a large amount of usage data (for example a Billing Account), consider placing multiple calls to child scopes so you get more manageable files that you can download.
+- If you're bound by rate limits at a lower scope, consider calling a higher scope to download data.
+- If your dataset is more than 2 GB month-to-month, consider using [exports](../costs/tutorial-export-acm-data.md) as a more scalable solution.
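The chunking guidance above can be sketched as a small helper that splits a billing period into weekly ranges and builds one Cost Details request per chunk. This is an illustrative sketch, not an official client: the scope is a placeholder, and you should verify the `generateCostDetailsReport` endpoint shape and `api-version` against the Cost Details API reference before use.

```python
from datetime import date, timedelta

def weekly_chunks(start: date, end: date):
    """Split [start, end] into consecutive ranges of at most 7 days."""
    chunks = []
    cursor = start
    while cursor <= end:
        chunk_end = min(cursor + timedelta(days=6), end)
        chunks.append((cursor, chunk_end))
        cursor = chunk_end + timedelta(days=1)
    return chunks

def build_request(scope: str, start: date, end: date) -> dict:
    # Hypothetical request shape for the Cost Details API; verify before use.
    return {
        "url": f"https://management.azure.com/{scope}/providers/"
               f"Microsoft.CostManagement/generateCostDetailsReport"
               f"?api-version=2022-05-01",
        "body": {
            "metric": "ActualCost",
            "timePeriod": {"start": start.isoformat(), "end": end.isoformat()},
        },
    }

scope = "subscriptions/00000000-0000-0000-0000-000000000000"  # placeholder
requests_to_send = [
    build_request(scope, s, e)
    for s, e in weekly_chunks(date(2022, 7, 1), date(2022, 7, 31))
]
print(len(requests_to_send))  # 5 weekly chunks for July
```

Each chunked request can then be submitted and polled independently, which keeps every downloaded file at a manageable size.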
+
+To learn more about how to properly call the [Cost Details](/rest/api/cost-management/generate-cost-details-report) API, see [Get small usage data sets on demand](get-small-usage-datasets-on-demand.md).
+
+The Cost Details API is only available for customers with an Enterprise Agreement or Microsoft Customer Agreement. If you're an MSDN, pay-as-you-go or Visual Studio customer, see [Get usage details as a legacy customer](get-usage-details-legacy-customer.md).
+
+## Power BI
+
+Power BI is another solution that's used to work with cost details data. The following Power BI solutions are available:
+
+- Azure Cost Management Template App: If you're an Enterprise Agreement or Microsoft Customer Agreement customer, you can use the Power BI template app to analyze costs for your billing account. It includes predefined reports that are built on top of the cost details dataset, among others. For more information, see [Analyze Azure costs with the Power BI template app](../costs/analyze-cost-data-azure-cost-management-power-bi-template-app.md).
+- Azure Cost Management Connector: If you want to analyze your data daily, you can use the [Power BI data connector](/power-bi/connect-data/desktop-connect-azure-cost-management) to get data for detailed analysis. Any reports that you create are kept up to date by the connector as more costs accrue.
+
+## Azure portal download
+
+Only [download your usage from the Azure portal](../understand/download-azure-daily-usage.md) if you have a small cost details dataset that can be loaded in Excel. Cost files larger than 1 or 2 GB may take an exceedingly long time to generate on demand from the Azure portal. They'll also take longer to transfer over a network to your local computer. We recommend using one of the above solutions if you have a large monthly usage dataset.
+
+## Next steps
+
+- Get an overview of [how to ingest cost data](automation-ingest-usage-details-overview.md).
+- [Create and manage exported data](../costs/tutorial-export-acm-data.md) in the Azure portal with Exports.
+- [Automate Export creation](../costs/ingest-azure-usage-at-scale.md) and ingestion at scale using the API.
+- [Understand cost details fields](understand-usage-details-fields.md).
+- Learn how to [Get small cost datasets on demand](get-small-usage-datasets-on-demand.md).
cost-management-billing Cost Management Error Codes https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/cost-management-billing/costs/cost-management-error-codes.md
The error is caused by excessive use within a short timeframe. Wait five minutes
### More information
-For more information, see [Error code 429 - Call count has exceeded rate limits](manage-automation.md#error-code-429call-count-has-exceeded-rate-limits).
+For more information, see [Data latency and rate limits](manage-automation.md#data-latency-and-rate-limits).
## ServerTimeout
cost-management-billing Get Started Partners https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/cost-management-billing/costs/get-started-partners.md
To verify data in the export list, select the storage account name. On the stora
## Cost Management REST APIs
-Partners and customers can use Cost Management APIs described in the following sections for common tasks.
-
-### Cost Management APIs - Direct and indirect providers
-
-Partners with access to billing scopes in a partner tenant can use the following APIs to view invoiced costs.
-
-APIs at the subscription scope can be called by a partner regardless of the cost policy if they have access to the subscription. Other users with access to the subscription, like the customer or reseller, can call the APIs only after the partner enables the cost policy for the customer tenant.
--
-#### To get a list of billing accounts
-
-```
-GET https://management.azure.com/providers/Microsoft.Billing/billingAccounts?api-version=2019-10-01-preview
-```
-
-#### To get a list of customers
-
-```
-GET https://management.azure.com/providers/Microsoft.Billing/billingAccounts/{billingAccountName}/customers?api-version=2019-10-01-preview
-```
-
-#### To get a list of subscriptions
-
-```
-GET https://management.azure.com/providers/Microsoft.Billing/billingAccounts/{billingAccountName}/billingSubscriptions?api-version=2019-10-01-preview
-```
-
-#### To get a list of invoices for a period of time
-
-```
-GET https://management.azure.com/providers/Microsoft.Billing/billingAccounts/{billingAccountName}/invoices?api-version=2019-10-01-preview&periodStartDate={periodStartDate}&periodEndDate={periodEndDate}
-```
-
-The API call returns an array of invoices that has elements similar to the following JSON code.
-
-```
- {
- "id": "/providers/Microsoft.Billing/billingAccounts/{billingAccountID}/billingProfiles/{BillingProfileID}/invoices/{InvoiceID}",
- "name": "{InvoiceID}",
- "properties": {
- "amountDue": {
- "currency": "USD",
- "value": x.xx
- },
- ...
- }
-```
-
-Use the preceding returned ID field value and replace it in the following example as the scope to query for usage details.
-
-```
-GET https://management.azure.com/{id}/providers/Microsoft.Consumption/UsageDetails?api-version=2019-10-01
-```
-
-The example returns the usage records associated with the specific invoice.
--
-#### To get the policy for customers to view costs
-
-```
-GET https://management.azure.com/providers/Microsoft.Billing/billingAccounts/{billingAccountName}/customers/{customerID}/policies/default?api-version=2019-10-01-preview
-```
-
-#### To set the policy for customers to view costs
-
-```
-PUT https://management.azure.com/providers/Microsoft.Billing/billingAccounts/{billingAccountName}/customers/{customerID}/policies/default?api-version=2019-10-01-preview
-```
-
-#### To get Azure service usage for a billing account
-
-```
-GET https://management.azure.com/providers/Microsoft.Billing/BillingAccounts/{billingAccountName}/providers/Microsoft.Consumption/usageDetails?api-version=2019-10-01
-```
-
-#### To download a customer's Azure service usage
-
-The following get call is an asynchronous operation.
-
-```
-GET https://management.azure.com/Microsoft.Billing/billingAccounts/{billingAccountName}/customers/{customerID}/providers/Microsoft.Consumption/usageDetails/download?api-version=2019-10-01 -verbose
-```
-
-Call the `Location` URI returned in the response to check the operation status. When the status is *Completed*, the `downloadUrl` property contains a link that you can use to download the generated report.
--
-#### To get or download the price sheet for consumed Azure services
-
-First, use the following post.
-
-```
-POST https://management.azure.com/providers/Microsoft.Billing/BillingAccounts/{billingAccountName}/billingProfiles/{billingProfileID}/pricesheet/default/download?api-version=2019-10-01-preview&format=csv" -verbose
-```
-
-Then, call the asynchronous operation property value. For example:
-
-```
-GET https://management.azure.com/providers/Microsoft.Billing/billingAccounts/{billingAccountName}/billingProfiles/{billingProfileID}/pricesheetDownloadOperations/{operation}?sessiontoken=0:11186&api-version=2019-10-01-preview
-```
-The preceding get call returns the download link containing the price sheet.
--
-#### To get aggregated costs
-
-```
-POST https://management.azure.com/providers/microsoft.billing/billingAccounts/{billingAccountName}/providers/microsoft.costmanagement/query?api-version=2019-10-01
-```
-
-#### Create a budget for a partner
-
-```
-PUT https://management.azure.com/providers/Microsoft.Billing/billingAccounts/{billingAccountName}/providers/Microsoft.CostManagement/budgets/partnerworkshopbudget?api-version=2019-10-01
-```
-
-#### Create a budget for a customer
-
-```
-PUT https://management.azure.com/providers/Microsoft.Billing/billingAccounts/{billingAccountName}/customers/{customerID}/providers/Microsoft.Consumption/budgets/{budgetName}?api-version=2019-10-01
-```
-
-#### Delete a budget
-
-```
-DELETE
-https://management.azure.com/providers/Microsoft.Billing/billingAccounts/{billingAccountId}/providers/Microsoft.CostManagement/budgets/{budgetName}?api-version=2019-10-01
-```
-
+Partners and their customers can use Cost Management APIs for common tasks. For more information, see [Automation for partners](../automate/partner-automation.md).
## Next steps

- [Start analyzing costs](quick-acm-cost-analysis.md) in Cost Management
- [Create and manage budgets](tutorial-acm-create-budgets.md) in Cost Management
cost-management-billing Ingest Azure Usage At Scale https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/cost-management-billing/costs/ingest-azure-usage-at-scale.md
description: This article helps you regularly export large amounts of data with exports from Cost Management. Previously updated : 12/10/2021 Last updated : 07/15/2022
# Retrieve large cost datasets recurringly with exports
-This article helps you regularly export large amounts of data with exports from Cost Management. Exporting is the recommended way to retrieve unaggregated cost data. Especially when usage files are too large to reliably call and download using the Usage Details API. Exported data is placed in the Azure Storage account that you choose. From there, you can load it into your own systems and analyze it as needed. To configure exports in the Azure portal, see [Export data](tutorial-export-acm-data.md).
+This article helps you regularly export large amounts of data with exports from Cost Management. Exporting is the recommended way to retrieve unaggregated cost data, especially when usage files are too large to reliably call and download using the [Cost Details](/rest/api/cost-management/generate-cost-details-report) API. Exported data is placed in the Azure Storage account that you choose. From there, you can load it into your own systems and analyze it as needed. To configure exports in the Azure portal, see [Export data](tutorial-export-acm-data.md).
If you want to automate exports at various scopes, the sample API request in the next section is a good starting point. You can use the Exports API to create automatic exports as a part of your general environment configuration. Automatic exports help ensure that you have the data that you need, and you can use it in your own organization's systems as your Azure use expands.
Before you create your first export, consider your scenario and the configuratio
- ActualCost - Shows the total usage and costs for the period specified, as they're accrued and show on your bill. - AmortizedCost - Shows the total usage and costs for the period specified, with amortization applied to the reservation purchase costs that are applicable. - Usage - All exports created before July 20 2020 are of type Usage. Update all your scheduled exports as either ActualCost or AmortizedCost.-- **Columns** – Defines the data fields you want included in your export file. They correspond with the fields available in the Usage Details API. For more information, see [Usage Details API](/rest/api/consumption/usagedetails/list).-
-## Seed a historical cost dataset in Azure storage
-
-When setting up a data pipeline using exports, you might find it useful to seed your historical cost data. This historical data can then be loaded into the data store of your choice. We recommend creating one-time data exports in one month chunks. The following example explains how to create a one-time export using the Exports API. If you have a large dataset each month, we recommend setting `partitionData = true` for your one-time export to split it into multiple files. For more information, see [File partitioning for large datasets](tutorial-export-acm-data.md?tabs=azure-portal#file-partitioning-for-large-datasets).
-
-After you've seeded your historical dataset, you can then create a scheduled export to continue populating your cost data in Azure storage as your charges accrue moving forward. The next section has additional information.
-
-```http
-PUT https://management.azure.com/providers/Microsoft.Billing/billingAccounts/{enrollmentId}/providers/Microsoft.CostManagement/exports/{ExportName}?api-version=2021-10-01
-```
-
-Request body:
-
-```json
-{
- "properties": {
- "definition": {
- "dataset": {
- "granularity": "Daily",
- "grouping": []
- },
- "timePeriod": {
- "from": "2021-09-01T00:00:00.000Z",
- "to": "2021-09-30T00:00:00.000Z"
- },
- "timeframe": "Custom",
- "type": "ActualCost"
- },
- "deliveryInfo": {
- "destination": {
- "container": "{containerName}",
- "rootFolderPath": "{folderName}",
- "resourceId": "/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.Storage/storageAccounts/{storageAccountName}"
- }
- },
- "format": "Csv",
- "partitionData": false
- }
-}
-```
+- **Columns** – Defines the data fields you want included in your export file. They correspond with the fields available in the [Cost Details](/rest/api/cost-management/generate-cost-details-report) API.
+- **Partitioning** – Set the option to true if you have a large dataset that you'd like broken into multiple files. Partitioning makes data ingestion faster and easier. For more information about partitioning, see [File partitioning for large datasets](../costs/tutorial-export-acm-data.md#file-partitioning-for-large-datasets).
## Create a daily month-to-date export for a subscription
Azure blob storage supports high global transfer rates with its service-side syn
- See the [Microsoft Azure Storage Data Movement Library](https://github.com/Azure/azure-storage-net-data-movement) source. - [Transfer data with the Data Movement library](../../storage/common/storage-use-data-movement-library.md). - See the [AzureDmlBackup sample application](https://github.com/markjbrown/AzureDmlBackup) source sample.-- Read [High-Throughput with Azure Blob Storage](https://azure.microsoft.com/blog/high-throughput-with-azure-blob-storage).
+- Read [High-Throughput with Azure Blob Storage](https://azure.microsoft.com/blog/high-throughput-with-azure-blob-storage).
cost-management-billing Manage Automation https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/cost-management-billing/costs/manage-automation.md
You can configure budgets to start automated actions using Azure Action Groups.
## Data latency and rate limits
-We recommend that you call the APIs no more than once per day. Cost Management data is refreshed every four hours as new usage data is received from Azure resource providers. Calling more frequently doesn't provide more data. Instead, it creates increased load. To learn more about how often data changes and how data latency is handled, see [Understand cost management data](understand-cost-mgt-data.md).
+We recommend that you call the APIs no more than once per day. Cost Management data is refreshed every four hours as new usage data is received from Azure resource providers. Calling more frequently doesn't provide more data. Instead, it creates increased load.
-### Error code 429 - Call count has exceeded rate limits
-
-To enable a consistent experience for all Cost Management subscribers, Cost Management APIs are rate limited. When you reach the limit, you receive the HTTP status code `429: Too many requests`. The current throughput limits for our APIs are as follows:
--- 15 calls per minute - It's done per scope, per user, or application.-- 100 calls per minute - It's done per tenant, per user, or application.
+<!-- For more information, see [Cost Management API latency and rate limits](../automate/api-latency-rate-limits.md) -->
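When a caller does exceed the limits, Cost Management APIs return HTTP `429`. A common client-side pattern is a bounded retry loop that honors the `Retry-After` header. The sketch below is a hypothetical helper, not part of any SDK; the `send` callable stands in for your real HTTP client.

```python
import time

def call_with_retry(send, max_retries=3, default_wait=60):
    """Retry a call on HTTP 429, honoring the Retry-After header.

    `send` is any zero-argument callable returning (status, headers, body);
    it stands in for your real HTTP client.
    """
    for attempt in range(max_retries + 1):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt == max_retries:
            break
        wait = int(headers.get("Retry-After", default_wait))
        time.sleep(wait)
    raise RuntimeError("rate limited after retries")

# Simulated client: throttled twice, then succeeds.
responses = iter([
    (429, {"Retry-After": "0"}, None),
    (429, {"Retry-After": "0"}, None),
    (200, {}, {"rows": []}),
])
status, body = call_with_retry(lambda: next(responses))
print(status)  # 200
```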
## Next steps
cost-management-billing Tutorial Acm Create Budgets https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/cost-management-billing/costs/tutorial-acm-create-budgets.md
To toggle between configuring an Actual vs Forecasted cost alert, use the `Type`
If you want to receive emails, add azure-noreply@microsoft.com to your approved senders list so that emails don't go to your junk email folder. For more information about notifications, see [Use cost alerts](./cost-mgt-alerts-monitor-usage-spending.md).
-In the following example, an email alert gets generated when 90% of the budget is reached. If you create a budget with the Budgets API, you can also assign roles to people to receive alerts. Assigning roles to people isn't supported in the Azure portal. For more about the Azure budgets API, see [Budgets API](/rest/api/consumption/budgets). If you want to have an email alert sent in a different language, see [Supported locales for budget alert emails](manage-automation.md#supported-locales-for-budget-alert-emails).
+In the following example, an email alert gets generated when 90% of the budget is reached. If you create a budget with the Budgets API, you can also assign roles to people to receive alerts. Assigning roles to people isn't supported in the Azure portal. For more about the Azure budgets API, see [Budgets API](/rest/api/consumption/budgets). If you want to have an email alert sent in a different language, see [Supported locales for budget alert emails](../automate/automate-budget-creation.md#supported-locales-for-budget-alert-emails).
Alert limits support a range of 0.01% to 1000% of the budget threshold that you've provided.
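As a rough sketch of the Budgets API call described above, the hypothetical request below creates a monthly budget with an email notification at the 90% threshold. The subscription scope, budget name, amount, dates, email address, and `api-version` are placeholders; verify the exact schema against the [Budgets API](/rest/api/consumption/budgets) reference.

```
PUT https://management.azure.com/subscriptions/{subscriptionId}/providers/Microsoft.Consumption/budgets/{budgetName}?api-version=2021-10-01

{
  "properties": {
    "category": "Cost",
    "amount": 1000,
    "timeGrain": "Monthly",
    "timePeriod": {
      "startDate": "2022-08-01T00:00:00Z",
      "endDate": "2023-07-31T00:00:00Z"
    },
    "notifications": {
      "Actual_GreaterThan_90_Percent": {
        "enabled": true,
        "operator": "GreaterThan",
        "threshold": 90,
        "contactEmails": [ "user@contoso.com" ]
      }
    }
  }
}
```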
cost-management-billing Download Azure Daily Usage https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/cost-management-billing/understand/download-azure-daily-usage.md
Title: View and download Azure usage and charges
-description: Learn how to download or view your Azure daily usage and charges, and see additional available resources.
+description: Learn how to download or view your Azure daily usage and charges, and see other available resources.
keywords: billing usage, usage charges, usage download, view usage, azure invoice, azure usage
Last updated 11/18/2021
# View and download your Azure usage and charges
-You can download a daily breakdown of your Azure usage and charges in the Azure portal. You can also get your usage data using Azure CLI. Only certain roles have permission to get Azure usage information, like the Account Administrator or Enterprise Administrator. To learn more about getting access to billing information, see [Manage access to Azure billing using roles](../manage/manage-billing-access.md).
+You can download a daily breakdown of your Azure usage and charges in the Azure portal. Only certain roles have permission to get Azure usage information, like the Account Administrator or Enterprise Administrator. To learn more about getting access to billing information, see [Manage access to Azure billing using roles](../manage/manage-billing-access.md).
-If you have a Microsoft Customer Agreement (MCA), you must be a billing profile Owner, Contributor, Reader, or Invoice manager to view your Azure usage and charges. If you have a Microsoft Partner Agreement (MPA), only the Global Admin and Admin Agent role in the partner organization Microsoft can view and download Azure usage and charges.
+If you have a Microsoft Customer Agreement (MCA), you must be a billing profile Owner, Contributor, Reader, or Invoice manager to view your Azure usage and charges. If you have a Microsoft Partner Agreement (MPA), only the Global Admin and Admin Agent role in the partner organization Microsoft can view and download Azure usage and charges.
Based on the type of subscription that you use, options to download your usage and charges vary.
+If you want to get cost and usage data using the Azure CLI, see [Get usage data with the Azure CLI](../automate/get-usage-data-azure-cli.md).
+ ## Download usage from the Azure portal (.csv) 1. Sign in to the [Azure portal](https://portal.azure.com).
To view and download usage data for a billing profile, you must be a billing pro
### Download usage for open charges
-You can also download month-to-date usage for the current billing period, meaning the charges have not been billed yet.
+You can also download month-to-date usage for the current billing period, meaning the charges haven't been billed yet.
1. Search for **Cost Management + Billing**. 2. Select a billing profile.
-3. In the **Overview** blade, select **Download Azure usage and charges**.
+3. In the **Overview** area, select **Download Azure usage and charges**.
### Download usage for pending charges
-If you have a Microsoft Customer Agreement, you can download month-to-date usage for the current billing period. These usage charges that have not been billed yet.
+If you have a Microsoft Customer Agreement, you can download month-to-date usage for the current billing period. These charges haven't been billed yet.
1. Sign in to the [Azure portal](https://portal.azure.com). 2. Search for *Cost Management + Billing*.
data-factory Better Understand Different Integration Runtime Charges https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/data-factory/better-understand-different-integration-runtime-charges.md
+
+ Title: Examples for better understanding pricing model under different integration runtime types
+description: Learn about pricing model under different integration runtime types from some examples.
+++++ Last updated : 07/17/2022++
+# Examples for better understanding pricing model under different integration runtime types
++
+In this article, we'll illustrate the pricing model of the different integration runtime types through some concrete examples. These examples focus only on the copy activities, pipeline activities, and external activities that run on the integration runtime. They don't cover charges for Data Factory Pipeline Orchestration or Data Factory Operations. For all pricing details, see [Data Pipeline Pricing and FAQ](pricing-concepts.md).
+
+The integration runtime, which is serverless in Azure and self-hosted in hybrid scenarios, provides the compute resources used to execute the activities in a pipeline. Integration runtime charges are prorated by the minute and rounded up.
+
+> [!NOTE]
+> The prices used in the examples below are hypothetical and aren't intended to imply actual pricing.
+
+## Azure integration runtime
+
+**Example 1: 6 copy activities run sequentially. The first 2 copy activities use 4 DIUs each, and each runs for 40 seconds. The other 4 use 8 DIUs each, and each runs for 1 minute and 20 seconds.**
+
+In this example, the execution time of each of the first 2 activities is rounded up to 1 minute. The execution time of each of the other 4 is rounded up to 2 minutes.
+++
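As a worked version of the example above, the following sketch applies the round-up rule and a hypothetical DIU-hour price (per the note earlier, real prices differ):

```python
import math

PRICE_PER_DIU_HOUR = 0.25  # hypothetical price, not an actual rate

def billed_minutes(seconds: int) -> int:
    """Execution time is prorated by the minute and rounded up."""
    return math.ceil(seconds / 60)

def copy_charge(diu: int, seconds: int) -> float:
    # Charge for one copy activity: DIUs x billed hours x price.
    return diu * (billed_minutes(seconds) / 60) * PRICE_PER_DIU_HOUR

# 2 activities: 4 DIUs, 40 s each (billed as 1 minute);
# 4 activities: 8 DIUs, 80 s each (billed as 2 minutes).
total = 2 * copy_charge(4, 40) + 4 * copy_charge(8, 80)
print(round(total, 2))  # 0.3
```

The six activities run on separate compute, so each is rounded and charged independently before summing.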
+**Example 2: 50 copy activities are triggered by a ForEach activity. Each copy activity uses 4 DIUs and runs for 40 seconds. The ForEach parallelism is set to 50.**
+
+In this example, the execution time of each copy activity is rounded up to 1 minute. Although the 50 copy activities run in parallel, they use different compute resources, so they're charged independently.
+++
+**Example 3: 6 HDInsight activities are triggered by a ForEach activity. Each runs for 9 minutes and 40 seconds. The ForEach parallelism is set to 50.**
+
+In this example, the execution time of each HDInsight activity is rounded up to 10 minutes. Although the 6 HDInsight activities run in parallel, they use different compute resources, so they're charged independently.
+++
+## Azure integration runtime with managed virtual network enabled
+
+**Example 1: 6 copy activities run sequentially. Each copy activity uses 4 DIUs and runs for 40 seconds. Assume a queue time of 1 minute and no TTL enabled.**
+
+In this example, the combined execution time and queue time of each activity is rounded up to 2 minutes. Because no TTL is enabled, every copy activity has a queue time. The price for copy activity using Azure integration runtime with managed virtual network enabled is $0.25/DIU-hour.
++
+**Example 2: 6 copy activities run sequentially. 4 DIUs are reserved for copy activities, and each activity runs for 40 seconds. Assume a queue time of 1 minute and a TTL of 5 minutes.**
+
+Because the compute is reserved, the 6 copy activities aren't rounded up independently; they're charged together. And because TTL is enabled, only the first copy activity has queue time.
++
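A rough sketch of the billing window for this example, with a hypothetical DIU-hour price: because the reserved compute is charged as one window, the queue time and the six executions are summed before rounding. Any idle time billed up to TTL expiry isn't modeled here.

```python
import math

PRICE_PER_DIU_HOUR = 0.25  # hypothetical price from the surrounding examples
RESERVED_DIU = 4

queue_seconds = 60          # only the first activity queues; TTL keeps compute warm
execution_seconds = 6 * 40  # six sequential 40-second copy activities

# One combined window, prorated by the minute and rounded up.
billed = math.ceil((queue_seconds + execution_seconds) / 60)
charge = RESERVED_DIU * (billed / 60) * PRICE_PER_DIU_HOUR
print(billed)  # 5 (minutes)
```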
+**Example 3: 6 HDInsight activities are triggered by a ForEach activity. Each runs for 9 minutes and 40 seconds. The ForEach parallelism is set to 50. The TTL is 30 minutes.**
+
+In this example, the execution time of each HDInsight activity is rounded up to 10 minutes. As the 6 HDInsight activities run in parallel and within the concurrency limitation (800), they're only charged once.
+++
+## Self-hosted integration runtime
+
+**Example 1: 6 copy activities run sequentially. Each runs for 40 seconds.**
+
+In this example, the execution time of each copy activity is rounded up to 1 minute each.
++
+**Example 2: 50 copy activities are triggered by a ForEach activity. Each runs for 40 seconds. The ForEach parallelism is set to 50.**
+
+In this example, the execution time of each copy activity is rounded up to 1 minute. Although the 50 copy activities run in parallel, they're charged independently.
++
+**Example 3: 6 HDInsight activities are triggered by a ForEach activity. Each runs for 9 minutes and 40 seconds. The ForEach parallelism is set to 50.**
+
+In this example, the execution time of each HDInsight activity is rounded up to 10 minutes. Although the 6 HDInsight activities run in parallel, they're charged independently.
+++
+## Next steps
+
+Now that you understand the pricing for Azure Data Factory, you can get started!
+
+- [Create a data factory by using the Azure Data Factory UI](quickstart-create-data-factory-portal.md)
data-factory Managed Virtual Network Private Endpoint https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/data-factory/managed-virtual-network-private-endpoint.md
Last updated 06/24/2022
This article explains managed virtual networks and managed private endpoints in Azure Data Factory. + ## Managed virtual network When you create an Azure integration runtime within a Data Factory managed virtual network, the integration runtime is provisioned with the managed virtual network. It uses private endpoints to securely connect to supported data stores.
Currently, the managed virtual network is only supported in the same region as t
:::image type="content" source="./media/managed-vnet/managed-vnet-architecture-diagram.png" alt-text="Diagram that shows Data Factory managed virtual network architecture.":::
-## Managed private endpoints
+There are two ways to enable managed virtual network in your data factory:
+1. Enable managed virtual network during the creation of the data factory.
-Managed private endpoints are private endpoints created in the Data Factory managed virtual network that establish a private link to Azure resources. Data Factory manages these private endpoints on your behalf.
+2. Enable managed virtual network in the integration runtime.
+++
+## Managed private endpoints
+
+Managed private endpoints are private endpoints created in the Data Factory managed virtual network that establish a private link to Azure resources. Data Factory manages these private endpoints on your behalf.
-Data Factory supports private links. You can use Azure Private Link to access Azure platform as a service (PaaS) services like Azure Storage, Azure Cosmos DB, and Azure Synapse Analytics.
+Data Factory supports private links. You can use Azure private link to access Azure platform as a service (PaaS) services like Azure Storage, Azure Cosmos DB, and Azure Synapse Analytics.
-When you use a private link, traffic between your data stores and managed virtual network traverses entirely over the Microsoft backbone network. Private Link protects against data exfiltration risks. You establish a private link to a resource by creating a private endpoint.
+When you use a private link, traffic between your data stores and managed virtual network traverses entirely over the Microsoft backbone network. Private link protects against data exfiltration risks. You establish a private link to a resource by creating a private endpoint.
A private endpoint uses a private IP address in the managed virtual network to effectively bring the service into it. Private endpoints are mapped to a specific resource in Azure and not the entire service. Customers can limit connectivity to a specific resource approved by their organization. For more information, see [Private links and private endpoints](../private-link/index.yml). > [!NOTE]
-> Create managed private endpoints to connect to all your Azure data sources.
+> The resource provider Microsoft.Network must be registered to your subscription.
-Make sure the resource provider Microsoft.Network is registered to your subscription.
+1. Make sure you enable managed virtual network in your data factory.
+2. Create a new managed private endpoint in **Manage Hub**.
-> [!WARNING]
-> If a PaaS data store like Azure Blob Storage, Azure Data Lake Storage Gen2, and Azure Synapse Analytics has a private endpoint already created against it, even if it allows access from all networks, Data Factory would only be able to access it by using a managed private endpoint. If a private endpoint doesn't already exist, you must create one in such scenarios.
-A private endpoint connection is created in a **Pending** state when you create a managed private endpoint in Data Factory. An approval workflow is initiated. The private link resource owner is responsible for approving or rejecting the connection.
+3. A private endpoint connection is created in a **Pending** state when you create a managed private endpoint in Data Factory. An approval workflow is initiated. The private link resource owner is responsible for approving or rejecting the connection.
:::image type="content" source="./media/tutorial-copy-data-portal-private/manage-private-endpoint.png" alt-text="Screenshot that shows the option Manage approvals in Azure portal.":::
-If the owner approves the connection, the private link is established. Otherwise, the private link won't be established. In either case, the managed private endpoint is updated with the status of the connection.
+4. If the owner approves the connection, the private link is established. Otherwise, the private link won't be established. In either case, the managed private endpoint is updated with the status of the connection.
:::image type="content" source="./media/tutorial-copy-data-portal-private/approve-private-endpoint.png" alt-text="Screenshot that shows approving a managed private endpoint."::: Only a managed private endpoint in an approved state can send traffic to a specific private link resource. + ## Interactive authoring Interactive authoring capabilities are used for functionalities like test connection, browse folder list and table list, get schema, and preview data. You can enable interactive authoring when creating or editing an Azure integration runtime, which is in Azure Data Factory managed virtual network. The backend service will pre-allocate compute for interactive authoring functionalities. Otherwise, the compute will be allocated every time any interactive operation is performed which will take more time. The time to live (TTL) for interactive authoring is 60 minutes by default, which means it will automatically become disabled after 60 minutes of the last interactive authoring operation. You can change the TTL value according to your actual needs. :::image type="content" source="./media/managed-vnet/interactive-authoring.png" alt-text="Screenshot that shows interactive authoring."::: + ## Time to live (preview) ### Copy activity
-By default, every copy activity spins up a new compute based upon the configuration in copy activity. With managed virtual network enabled, cold computes start-up time takes a few minutes and data movement can't start until it is complete. If your pipelines contain multiple sequential copy activities or you have a lot of copy activities in foreach loop and canΓÇÖt run them all in parallel, you can enable a time to live (TTL) value in the Azure integration runtime configuration. Specifying a time to live value and DIU numbers required for the copy activity keeps the corresponding computes alive for a certain period of time after its execution completes. If a new copy activity starts during the TTL time, it will reuse the existing computes and start-up time will be greatly reduced. After the second copy activity completes, the computes will again stay alive for the TTL time.
+By default, every copy activity spins up a new compute based upon the configuration in the copy activity. With managed virtual network enabled, cold compute start-up time takes a few minutes, and data movement can't start until it's complete. If your pipelines contain multiple sequential copy activities, or you have many copy activities in a ForEach loop and can't run them all in parallel, you can enable a time to live (TTL) value in the Azure integration runtime configuration. Specifying a time to live value and the DIU number required for the copy activity keeps the corresponding computes alive for a certain period of time after its execution completes. If a new copy activity starts during the TTL time, it will reuse the existing computes, and start-up time will be greatly reduced. After the second copy activity completes, the computes will again stay alive for the TTL time.
> [!NOTE] > Reconfiguring the DIU number will not affect the current copy activity execution.
+> [!NOTE]
+> The data integration unit (DIU) measure of 2 DIU isn't supported for the Copy activity in a managed virtual network.
+
+The DIU number that you select in the TTL configuration is used to run all copy activities. The DIU size isn't auto-scaled according to actual needs, so make sure you select enough DIUs.
+
+> [!WARNING]
+> Selecting too few DIUs to run many activities causes activities to wait in the queue, which seriously affects overall performance.
++ ### Pipeline and external activity Unlike copy activity, pipeline and external activity have a default time to live (TTL) of 60 minutes. You can change the default TTL in the Azure integration runtime configuration according to your actual needs, but it's not supported to disable the TTL.
Unlike copy activity, pipeline and external activity have a default time to live
:::image type="content" source="./media/managed-vnet/time-to-live-configuration.png" alt-text="Screenshot that shows the TTL configuration.":::
-> [!NOTE]
-> The data integration unit (DIU) measure of 2 DIU isn't supported for the Copy activity in a managed virtual network.
+
+### Comparison of different TTL
+The following table lists the differences between different types of TTL:
+
+| | Interactive authoring | Copy compute scale | Pipeline & External compute scale |
+| -- | -- | -- | -- |
+| When to take effect | Immediately after enablement | First activity execution | First activity execution |
+| Can be disabled | Y | Y | N |
+| Reserved compute is configurable | N | Y | N |
+ ## Create a managed virtual network via Azure PowerShell
New-AzResource -ApiVersion "${apiVersion}" -ResourceId "${integrationRuntimeReso
> [!Note] > You can get the **groupId** of other data sources from a [private link resource](../private-link/private-endpoint-overview.md#private-link-resource).
-## Limitations and known issues
-This section discusses limitations and known issues.
+## Outbound connection
### Supported data sources and services
The following services have native private endpoint support. They can be connect
For the support of data sources, you can refer to [connector overview](connector-overview.md). You can access all data sources that are supported by Data Factory through a public network. > [!NOTE]
-> Because SQL Managed Instance native private endpoint is in private preview, you can access it from a managed virtual network by using Private Link and Azure Load Balancer. For more information, see [Access SQL Managed Instance from a Data Factory managed virtual network using a private endpoint](tutorial-managed-virtual-network-sql-managed-instance.md).
+> Because SQL Managed Instance native private endpoint is in preview, you can access it from a managed virtual network by using Private Link and Azure Load Balancer. For more information, see [Access SQL Managed Instance from a Data Factory managed virtual network using a private endpoint](tutorial-managed-virtual-network-sql-managed-instance.md).
+ ### On-premises data sources To learn how to access on-premises data sources from a managed virtual network by using a private endpoint, see [Access on-premises SQL Server from a Data Factory managed virtual network using a private endpoint](tutorial-managed-virtual-network-on-premise-sql-server.md). + ### Outbound communications through public endpoint from a Data Factory managed virtual network All ports are opened for outbound communications. +
+## Limitations and known issues
+ ### Linked service creation for Key Vault When you create a linked service for Key Vault, there's no integration runtime reference. So, you can't create private endpoints during linked service creation of Key Vault. But when you create linked service for data stores that references Key Vault, and this linked service references an integration runtime with managed virtual network enabled, you can create a private endpoint for Key Vault during creation.
When you create a linked service for Key Vault, there's no integration runtime r
- **Test connection:** This operation for a linked service of Key Vault only validates the URL format but doesn't do any network operation. - **Using private endpoint:** This column is always shown as blank even if you create a private endpoint for Key Vault. + ### Linked service creation of Azure HDInsight The column **Using private endpoint** is always shown as blank even if you create a private endpoint for HDInsight by using a private link service and a load balancer with port forwarding. :::image type="content" source="./media/managed-vnet/akv-pe.png" alt-text="Screenshot that shows a private endpoint for Key Vault.":::
+### Access constraints in managed virtual network with private endpoints
+You're unable to access each PaaS resource when both sides are exposed to Private Link and a private endpoint. This issue is a known limitation of Private Link and private endpoints.
+
+For example, suppose you have a managed private endpoint for storage account A. You can also access storage account B through the public network in the same managed virtual network. But when storage account B has a private endpoint connection from another managed virtual network or a customer virtual network, you can't access storage account B in your managed virtual network through the public network.
+ ## Next steps See the following tutorials:
defender-for-cloud Release Notes https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/defender-for-cloud/release-notes.md
You can now also group your alerts by resource group to view all of your alerts
Until now, the integration with Microsoft Defender for Endpoint (MDE) included automatic installation of the new [MDE unified solution](/microsoft-365/security/defender-endpoint/configure-server-endpoints?view=o365-worldwide#new-windows-server-2012-r2-and-2016-functionality-in-the-modern-unified-solution&preserve-view=true) for machines (Azure subscriptions and multicloud connectors) with Defender for Servers Plan 1 enabled, and for multicloud connectors with Defender for Servers Plan 2 enabled. Plan 2 for Azure subscriptions enabled the unified solution for Linux machines and Windows 2019 and 2022 servers only. Windows servers 2012R2 and 2016 used the MDE legacy solution dependent on Log Analytics agent.
-Now, the new unified solution is available for all machines in both plans, for both Azure subscriptions and multi-cloud connectors. For Azure subscriptions with Servers plan 2 that enabled MDE integration *after* June 20th 2022, the unified solution is enabled by default for all machines Azure subscriptions with the Defender for Servers Plan 2 enabled with MDE integration *before* June 20th 2022 can now enable unified solution installation for Windows servers 2012R2 and 2016 through the dedicated button in the Integrations page:
+Now, the new unified solution is available for all machines in both plans, for both Azure subscriptions and multi-cloud connectors. For Azure subscriptions with Servers Plan 2 that enabled MDE integration *after* June 20th 2022, the unified solution is enabled by default for all machines. Azure subscriptions with the Defender for Servers Plan 2 enabled with MDE integration *before* June 20th 2022 can now enable unified solution installation for Windows servers 2012R2 and 2016 through the dedicated button in the Integrations page:
:::image type="content" source="media/integration-defender-for-endpoint/enable-unified-solution.png" alt-text="The integration between Microsoft Defender for Cloud and Microsoft's EDR solution, Microsoft Defender for Endpoint, is enabled." lightbox="media/integration-defender-for-endpoint/enable-unified-solution.png":::
While Defender for Servers Plan 2 continues to provide protections from threats
If you have been using Defender for Servers until now no action is required.
-In addition, Defender for Cloud also begins gradual support for the [Defender for Endpoint unified agent for Windows Server 2012 R2 and 2016](https://techcommunity.microsoft.com/t5/microsoft-defender-for-endpoint/defending-windows-server-2012-r2-and-2016/ba-p/2783292). Defender for Servers Plan 1 deploys the new unified agent to Windows Server 2012 R2 and 2016 workloads. Defender for Servers Plan 2 deploys the legacy agent to Windows Server 2012 R2 and 2016 workloads and will start deploying the unified agent soon.
+In addition, Defender for Cloud also begins gradual support for the [Defender for Endpoint unified agent for Windows Server 2012 R2 and 2016](https://techcommunity.microsoft.com/t5/microsoft-defender-for-endpoint/defending-windows-server-2012-r2-and-2016/ba-p/2783292). Defender for Servers Plan 1 deploys the new unified agent to Windows Server 2012 R2 and 2016 workloads.
### Relocation of custom recommendations
expressroute Expressroute Howto Linkvnet Portal Resource Manager https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/expressroute/expressroute-howto-linkvnet-portal-resource-manager.md
Title: 'Tutorial: Link a VNet to an ExpressRoute circuit - Azure portal'
description: This tutorial shows you how to create a connection to link a virtual network to an Azure ExpressRoute circuit using the Azure portal. - Previously updated : 08/10/2021 Last updated : 07/15/2022 --+ # Tutorial: Connect a virtual network to an ExpressRoute circuit using the portal
In this tutorial, you learn how to:
* Review guidance for [connectivity between virtual networks over ExpressRoute](virtual-network-connectivity-guidance.md).
-* You can [view a video](https://azure.microsoft.com/documentation/videos/azure-expressroute-how-to-create-a-connection-between-your-vpn-gateway-and-expressroute-circuit) before beginning to better understand the steps.
- ## Connect a VNet to a circuit - same subscription > [!NOTE]
You can delete a connection and unlink your VNet to an ExpressRoute circuit by s
## Next steps
-In this tutorial, you learned how to connect a virtual network to a circuit in the same subscription and a different subscription. For more information about the ExpressRoute gateway, see:
+In this tutorial, you learned how to connect a virtual network to a circuit in the same subscription and a different subscription. For more information about ExpressRoute gateways, see [ExpressRoute virtual network gateways](expressroute-about-virtual-network-gateways.md).
+
+To learn how to configure route filters for Microsoft peering using the Azure portal, advance to the next tutorial.
> [!div class="nextstepaction"]
-> [About ExpressRoute virtual network gateways](expressroute-about-virtual-network-gateways.md)
+> [Configure route filters for Microsoft peering](how-to-routefilter-portal.md)
healthcare-apis Get Started With Iot https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/healthcare-apis/iot/get-started-with-iot.md
# Get started with MedTech service in Azure Health Data Services
-This article outlines the basic steps to get started with MedTech service in [Azure Health Data Services](../healthcare-apis-overview.md). MedTech service first processes data that has been sent to an event hub from a medical device, and then saves the data to the Fast Healthcare Interoperability Resources (FHIR&#174;) service as Observation resources. This procedure makes it possible to link the FHIR service Observation to patient and device resources.
+This article outlines the basic steps to get started with Azure MedTech service in [Azure Health Data Services](../healthcare-apis-overview.md). MedTech service ingests health data from a medical device by using the Azure Event Hubs service. It then persists the data to the Azure Fast Healthcare Interoperability Resources (FHIR&#174;) service as Observation resources. This data processing procedure makes it possible to link FHIR service Observations to patient and device resources.
-The following diagram shows the four development steps of the data flow needed to get MedTech service to receive data from a device and send it to FHIR service.
+The following diagram shows the four-step data flow that enables MedTech service to receive data from a device and send it to FHIR service.
- Step 1 introduces the subscription and permissions prerequisites needed.
iot-hub Iot Hub Devguide Direct Methods https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/iot-hub/iot-hub-devguide-direct-methods.md
Previously updated : 07/17/2018 Last updated : 07/15/2022
The payload for method requests and responses is a JSON document up to 128 KB.
## Invoke a direct method from a back-end app
-Now, invoke a direct method from a back-end app.
+To invoke a direct method from a back-end app, use the [Invoke device method](/rest/api/iothub/service/devices/invoke-method) REST API or its equivalent in one of the [IoT Hub service SDKs](iot-hub-devguide-sdks.md#azure-iot-hub-service-sdks).
### Method invocation Direct method invocations on a device are HTTPS calls that are made up of the following items:
-* The *request URI* specific to the device along with the [API version](/rest/api/iothub/service/devices/invokemethod):
+* The *request URI* specific to the device along with the API version:
```http https://fully-qualified-iothubname.azure-devices.net/twins/{deviceId}/methods?api-version=2021-04-12
Direct method invocations on a device are HTTPS calls that are made up of the fo
* The POST *method*
-* *Headers* that contain the authorization, request ID, content type, and content encoding.
+* *Headers* that contain the authorization, content type, and content encoding.
* A transparent JSON *body* in the following format: ```json {
+ "connectTimeoutInSeconds": 200,
"methodName": "reboot", "responseTimeoutInSeconds": 200, "payload": {
The value provided as `connectTimeoutInSeconds` in the request is the amount of
#### Example
-This example will allow you to securely initiate a request to invoke a Direct Method on an IoT device registered to an Azure IoT Hub.
+This example will allow you to securely initiate a request to invoke a direct method on an IoT device registered to an Azure IoT hub.
To begin, use the [Microsoft Azure IoT extension for Azure CLI](https://github.com/Azure/azure-iot-cli-extension) to create a SharedAccessSignature.
curl -X POST \
}' ```
-Execute the modified command to invoke the specified Direct Method. Successful requests will return an HTTP 200 status code.
+Execute the modified command to invoke the specified direct method. Successful requests will return an HTTP 200 status code.
> [!NOTE]
-> The above example demonstrates invoking a Direct Method on a device. If you wish to invoke a Direct Method in an IoT Edge Module, you would need to modify the url request as shown below:
+> The example above demonstrates invoking a direct method on a device. If you want to invoke a direct method in an IoT Edge Module, you would need to modify the url request as shown below:
+>
+> ```bash
+> https://<iothubName>.azure-devices.net/twins/<deviceId>/modules/<moduleName>/methods?api-version=2021-04-12
+> ```
-```bash
-https://<iothubName>.azure-devices.net/twins/<deviceId>/modules/<moduleName>/methods?api-version=2021-04-12
-```
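The same request that the curl example sends can be sketched in Python. This is an illustrative helper, not an SDK API: the hub name, device ID, and SAS token are placeholders, and the URI and body format come from the REST description above.

```python
import json

def build_invoke_request(hub_name, device_id, method_name, payload,
                         response_timeout=200, connect_timeout=200):
    """Build the URI and JSON body for the Invoke device method REST call."""
    url = (f"https://{hub_name}.azure-devices.net/twins/"
           f"{device_id}/methods?api-version=2021-04-12")
    body = {
        "connectTimeoutInSeconds": connect_timeout,
        "methodName": method_name,
        "responseTimeoutInSeconds": response_timeout,
        "payload": payload,
    }
    return url, json.dumps(body)

url, body = build_invoke_request("myhub", "mydevice", "reboot", {"input1": "value"})
print(url)
# POST `body` to `url` with headers:
#   Authorization: <SAS token>   (generate with `az iot hub generate-sas-token`)
#   Content-Type: application/json
```

To target an IoT Edge module instead of a device, insert `/modules/<moduleName>` into the URI path as described in the note above.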
### Response The back-end app receives a response that is made up of the following items:
The back-end app receives a response that is made up of the following items:
* 404 indicates that either device ID is invalid, or that the device was not online upon invocation of a direct method and for `connectTimeoutInSeconds` thereafter (use accompanied error message to understand the root cause); * 504 indicates gateway timeout caused by device not responding to a direct method call within `responseTimeoutInSeconds`.
-* *Headers* that contain the ETag, request ID, content type, and content encoding.
+* *Headers* that contain the request ID, content type, and content encoding.
* A JSON *body* in the following format:
The back-end app receives a response that is made up of the following items:
} ```
- Both `status` and `body` are provided by the device and used to respond with the device's own status code and/or description.
+ Both `status` and `payload` are provided by the device and used to respond with the device's own status code and the method response.
### Method invocation for IoT Edge modules
-Invoking direct methods using a module ID is supported in the [IoT Service Client C# SDK](https://www.nuget.org/packages/Microsoft.Azure.Devices/).
+Invoking direct methods on a module is supported by the [Invoke module method](/rest/api/iothub/service/modules/invoke-method) REST API or its equivalent in one of the IoT Hub service SDKs.
-For this purpose, use the `ServiceClient.InvokeDeviceMethodAsync()` method and pass in the `deviceId` and `moduleId` as parameters.
+The `moduleId` is passed along with the `deviceId` in the request URI when using the REST API, or as a parameter when using a service SDK. For example, `https://<iothubName>.azure-devices.net/twins/<deviceId>/modules/<moduleName>/methods?api-version=2021-04-12`. The request body and response are similar to those of direct methods invoked on the device.
## Handle a direct method on a device
-Let's look at how to handle a direct method on an IoT device.
+On an IoT device, direct methods can be received over MQTT, AMQP, or either of these protocols over WebSockets. The [IoT Hub device SDKs](iot-hub-devguide-sdks.md#azure-iot-hub-device-sdks) help you receive and respond to direct methods on devices without having to worry about the underlying protocol details.
### MQTT
-The following section is for the MQTT protocol.
+The following section is for the MQTT protocol. To learn more about using the MQTT protocol directly with IoT Hub, see [MQTT protocol support](iot-hub-mqtt-support.md).
#### Method invocation
-Devices receive direct method requests on the MQTT topic: `$iothub/methods/POST/{method name}/?$rid={request id}`. The number of subscriptions per device is limited to 5. It is therefore recommended not to subscribe to each direct method individually. Instead consider subscribing to `$iothub/methods/POST/#` and then filter the delivered messages based on your desired method names.
+Devices receive direct method requests on the MQTT topic: `$iothub/methods/POST/{method name}/?$rid={request id}`. However, the `request id` is generated by IoT Hub and cannot be known ahead of time, so subscribe to `$iothub/methods/POST/#` and then filter the delivered messages based on method names supported by your device. (You'll use the `request id` to respond.)
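The topic filtering described above can be sketched as plain string handling; a device would run logic like this in its MQTT message callback. The request and response topic formats are those documented for IoT Hub's MQTT support; the function names are illustrative.

```python
def parse_method_topic(topic):
    """Extract (method name, request id) from a direct-method request topic,
    or return None if the topic isn't a direct-method request."""
    prefix = "$iothub/methods/POST/"
    if not topic.startswith(prefix):
        return None
    rest = topic[len(prefix):]
    method_name, _, query = rest.partition("/?")
    params = dict(p.split("=", 1) for p in query.split("&") if "=" in p)
    return method_name, params.get("$rid")

def response_topic(status, rid):
    # The device publishes its response here, echoing the request id
    # and supplying its own status code.
    return f"$iothub/methods/res/{status}/?$rid={rid}"

name, rid = parse_method_topic("$iothub/methods/POST/reboot/?$rid=42")
print(name, rid)                 # reboot 42
print(response_topic(200, rid))  # $iothub/methods/res/200/?$rid=42
```

Filtering on the parsed method name, rather than subscribing to each method's topic individually, keeps the device within the subscription limit while still rejecting unsupported methods.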
The body that the device receives is in the following format:
The body is set by the device and can be any status.
### AMQP
-The following section is for the AMQP protocol.
+The following section is for the AMQP protocol. To learn more about using the AMQP protocol directly with IoT Hub, see [AMQP protocol support](iot-hub-amqp-support.md).
#### Method invocation
load-balancer Load Balancer Multivip Overview https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/load-balancer/load-balancer-multivip-overview.md
For this scenario, every VM in the backend pool has three network interfaces:
* Frontend 1: a loopback interface within guest OS that is configured with IP address of Frontend 1 * Frontend 2: a loopback interface within guest OS that is configured with IP address of Frontend 2
-For each VM in the backend pool, run the following commands at a Windows Command Prompt.
-
-To get the list of interface names you have on your VM, type this command:
-
-```console
-netsh interface show interface
-```
-
-For the VM NIC (Azure managed), type this command:
-
-```console
-netsh interface ipv4 set interface "interfacename" weakhostreceive=enabled
-```
-
-(replace interfacename with the name of this interface)
-
-For each loopback interface you added, repeat these commands:
-
-```console
-netsh interface ipv4 set interface "interfacename" weakhostreceive=enabled
-```
-
-(replace interfacename with the name of this loopback interface)
-
-```console
-netsh interface ipv4 set interface "interfacename" weakhostsend=enabled
-```
-
-(replace interfacename with the name of this loopback interface)
-
-> [!IMPORTANT]
-> The configuration of the loopback interfaces is performed within the guest OS. This configuration is not performed or managed by Azure. Without this configuration, the rules will not function. Health probe definitions use the DIP of the VM rather than the loopback interface representing the DSR Frontend. Therefore, your service must provide probe responses on a DIP port that reflect the status of the service offered on the loopback interface representing the DSR Frontend.
-- Let's assume the same frontend configuration as in the previous scenario: | Frontend | IP address | protocol | port |
Notice that this example does not change the destination port. Even though this
The Floating IP rule type is the foundation of several load balancer configuration patterns. One example that is currently available is the [Configure one or more Always On availability group listeners](/azure/azure-sql/virtual-machines/windows/availability-group-listener-powershell-configure) configuration. Over time, we will document more of these scenarios.
+> [!NOTE]
+> For more detailed information on the specific Guest OS configurations required to enable Floating IP, please refer to [Azure Load Balancer Floating IP configuration](load-balancer-floating-ip.md).
+ ## Limitations * Multiple frontend configurations are only supported with IaaS VMs and virtual machine scale sets.
machine-learning Train Pytorch Model https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/machine-learning/component-reference/train-pytorch-model.md
Currently, **Train PyTorch Model** component supports both single node and distr
> In distributed training, to keep gradient descent stable, the actual learning rate is calculated by `lr * torch.distributed.get_world_size()` because batch size of the process group is world size times that of single process. > Polynomial learning rate decay is applied and can help result in a better performing model.
-8. For **Random seed**, optionally type an integer value to use as the seed. Using a seed is recommended if you want to ensure reproducibility of the experiment across runs.
+8. For **Random seed**, optionally type an integer value to use as the seed. Using a seed is recommended if you want to ensure reproducibility of the experiment across jobs.
9. For **Patience**, specify how many epochs to wait before stopping training early if the validation loss doesn't decrease consecutively. The default is 3.
Click on this component 'Metrics' tab and see training metric graphs, such as 'T
### How to enable distributed training
-To enable distributed training for **Train PyTorch Model** component, you can set in **Run settings** in the right pane of the component. Only **[AML Compute cluster](../how-to-create-attach-compute-cluster.md?tabs=python)** is supported for distributed training.
+To enable distributed training for **Train PyTorch Model** component, set it in the **Job settings** section in the right pane of the component. Only **[AML Compute cluster](../how-to-create-attach-compute-cluster.md?tabs=python)** is supported for distributed training.
> [!NOTE] > **Multiple GPUs** are required to activate distributed training because the NCCL backend that the Train PyTorch Model component uses needs CUDA.
-1. Select the component and open the right panel. Expand the **Run settings** section.
+1. Select the component and open the right panel. Expand the **Job settings** section.
[![Screenshot showing how to set distributed training in runsetting](./media/module/distributed-training-run-setting.png)](./media/module/distributed-training-run-setting.png#lightbox)
You can refer to [this article](designer-error-codes.md) for more details about
## Results
-After pipeline run is completed, to use the model for scoring, connect the [Train PyTorch Model](train-PyTorch-model.md) to [Score Image Model](score-image-model.md), to predict values for new input examples.
+After the pipeline job is completed, to use the model for scoring, connect [Train PyTorch Model](train-PyTorch-model.md) to [Score Image Model](score-image-model.md) to predict values for new input examples.
## Technical notes ### Expected inputs
machine-learning Train Svd Recommender https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/machine-learning/component-reference/train-svd-recommender.md
From this sample, you can see that a single user has rated several movies.
## Results
-After pipeline run is completed, to use the model for scoring, connect the [Train SVD Recommender](train-svd-recommender.md) to [Score SVD Recommender](score-svd-recommender.md), to predict values for new input examples.
+After pipeline job is completed, to use the model for scoring, connect the [Train SVD Recommender](train-svd-recommender.md) to [Score SVD Recommender](score-svd-recommender.md), to predict values for new input examples.
## Next steps
machine-learning Train Vowpal Wabbit Model https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/machine-learning/component-reference/train-vowpal-wabbit-model.md
The data can be read from two kinds of datasets, file dataset or tabular dataset
- **VW** represents the internal format used by Vowpal Wabbit . See the [Vowpal Wabbit wiki page](https://github.com/JohnLangford/vowpal_wabbit/wiki/Input-format) for details. - **SVMLight** is a format used by some other machine learning tools.
-6. **Output readable model file**: select the option if you want the component to save the readable model to the run records. This argument corresponds to the `--readable_model` parameter in the VW command line.
+6. **Output readable model file**: select the option if you want the component to save the readable model to the job records. This argument corresponds to the `--readable_model` parameter in the VW command line.
-7. **Output inverted hash file**: select the option if you want the component to save the inverted hashing function to one file in the run records. This argument corresponds to the `--invert_hash` parameter in the VW command line.
+7. **Output inverted hash file**: select the option if you want the component to save the inverted hashing function to one file in the job records. This argument corresponds to the `--invert_hash` parameter in the VW command line.
8. Submit the pipeline.
Vowpal Wabbit supports incremental training by adding new data to an existing mo
2. Connect the previously trained model to the **Pre-trained Vowpal Wabbit Model** input port of the component. 3. Connect the new training data to the **Training data** input port of the component. 4. In the parameters pane of **Train Vowpal Wabbit Model**, specify the format of the new training data, and also the training data file name if the input dataset is a directory.
-5. Select the **Output readable model file** and **Output inverted hash file** options if the corresponding files need to be saved in the run records.
+5. Select the **Output readable model file** and **Output inverted hash file** options if the corresponding files need to be saved in the job records.
6. Submit the pipeline. 7. Select the component and select **Register dataset** under **Outputs+logs** tab in the right pane, to preserve the updated model in your Azure Machine Learning workspace. If you don't specify a new name, the updated model overwrites the existing saved model.
machine-learning Train Wide And Deep Recommender https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/machine-learning/component-reference/train-wide-and-deep-recommender.md
In distributed training the workload to train a model is split up and shared amo
### How to enable distributed training
-To enable distributed training for **Train Wide & Deep Recommender** component, you can set in **Run settings** in the right pane of the component. Only **[AML Compute cluster](../how-to-create-attach-compute-cluster.md?tabs=python)** is supported for distributed training.
+To enable distributed training for the **Train Wide & Deep Recommender** component, set it in the **Job settings** section in the right pane of the component. Only an **[AML Compute cluster](../how-to-create-attach-compute-cluster.md?tabs=python)** is supported for distributed training.
-1. Select the component and open the right panel. Expand the **Run settings** section.
+1. Select the component and open the right panel. Expand the **Job settings** section.
- [![Screenshot showing how to set distributed training in run setting](./media/module/distributed-training-run-setting.png)](./media/module/distributed-training-run-setting.png#lightbox)
+ [![Screenshot showing how to set distributed training in job setting](./media/module/distributed-training-run-setting.png)](./media/module/distributed-training-run-setting.png#lightbox)
1. Make sure you have selected AML compute as the compute target.
machine-learning How To Manage Environments V2 https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/machine-learning/how-to-manage-environments-v2.md
Azure ML will start building the image from the build context when the environme
You can define an environment using a standard conda YAML configuration file that includes the dependencies for the conda environment. See [Creating an environment manually](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-file-manually) for information on this standard format.
-You must also specify a base Docker image for this environment. Azure ML will build the conda environment on top of the Docker image provided. If you install some Python dependencies in your Docker image, those packages will not exist in the execution environment thus causing runtime failures. By default, Azure ML will build a Conda environment with dependencies you specified, and will execute the run in that environment instead of using any Python libraries that you installed on the base image.
+You must also specify a base Docker image for this environment. Azure ML will build the conda environment on top of the Docker image provided. If you install some Python dependencies in your Docker image, those packages will not exist in the execution environment, causing runtime failures. By default, Azure ML will build a Conda environment with the dependencies you specified, and will execute the job in that environment instead of using any Python libraries that you installed on the base image.
The following example is a YAML specification file for an environment defined from a conda specification. Here the relative path to the conda file from the Azure ML environment YAML file is specified via the `conda_file` property. You can alternatively define the conda specification inline using the `conda_file` property, rather than defining it in a separate file.
machine-learning How To Manage Optimize Cost https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/machine-learning/how-to-manage-optimize-cost.md
Last updated 06/08/2021
+[//]: # (needs PM review; ParallelJobStep or ParallelRunStep?)
+ # Manage and optimize Azure Machine Learning costs Learn how to manage and optimize costs when training and deploying machine learning models to Azure Machine Learning.
Use the following tips to help you manage and optimize your compute resource cos
- Configure your training clusters for autoscaling - Set quotas on your subscription and workspaces-- Set termination policies on your training run
+- Set termination policies on your training job
- Use low-priority virtual machines (VM) - Schedule compute instances to shut down and start up automatically - Use an Azure Reserved VM Instance
Because these compute pools are inside of Azure's IaaS infrastructure, you can d
Autoscaling clusters based on the requirements of your workload helps reduce your costs so you only use what you need.
-AmlCompute clusters are designed to scale dynamically based on your workload. The cluster can be scaled up to the maximum number of nodes you configure. As each run completes, the cluster will release nodes and scale to your configured minimum node count.
+AmlCompute clusters are designed to scale dynamically based on your workload. The cluster can be scaled up to the maximum number of nodes you configure. As each job completes, the cluster will release nodes and scale to your configured minimum node count.
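The scale-out/scale-in behavior described above can be sketched as a simple decision rule. This is an illustrative sketch only; the function and its parameters are hypothetical, not the actual AmlCompute autoscaler.

```python
# Illustrative autoscaling decision: the cluster grows toward the
# configured maximum while jobs are queued, and releases idle nodes
# back to the configured minimum as jobs complete.
def target_node_count(queued_jobs, busy_nodes, min_nodes, max_nodes):
    wanted = busy_nodes + queued_jobs          # nodes needed right now
    return max(min_nodes, min(max_nodes, wanted))
```

With `min_nodes=0`, the cluster scales to zero when idle, so you pay nothing between jobs; a nonzero minimum keeps warm nodes available at a standing cost.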
[!INCLUDE [min-nodes-note](../../includes/machine-learning-min-nodes.md)]
Also configure [workspace level quota by VM family](how-to-manage-quotas.md#work
To set quotas at the workspace level, start in the [Azure portal](https://portal.azure.com). Select any workspace in your subscription, and select **Usages + quotas** in the left pane. Then select the **Configure quotas** tab to view the quotas. You need privileges at the subscription scope to set the quota, since it's a setting that affects multiple workspaces.
-## Set run autotermination policies
+## Set job autotermination policies
In some cases, you should configure your training runs to limit their duration or terminate them early. For example, when you are using Azure Machine Learning's built-in hyperparameter tuning or automated machine learning. Here are a few options that you have: * Define a parameter called `max_run_duration_seconds` in your RunConfiguration to control the maximum duration a run can extend to on the compute you choose (either local or remote cloud compute). * For [hyperparameter tuning](how-to-tune-hyperparameters.md#early-termination), define an early termination policy from a Bandit policy, a Median stopping policy, or a Truncation selection policy. To further control hyperparameter sweeps, use parameters such as `max_total_runs` or `max_duration_minutes`.
-* For [automated machine learning](how-to-configure-auto-train.md#exit), set similar termination policies using the `enable_early_stopping` flag. Also use properties such as `iteration_timeout_minutes` and `experiment_timeout_minutes` to control the maximum duration of a run or for the entire experiment.
+* For [automated machine learning](how-to-configure-auto-train.md#exit), set similar termination policies using the `enable_early_stopping` flag. Also use properties such as `iteration_timeout_minutes` and `experiment_timeout_minutes` to control the maximum duration of a job or for the entire experiment.
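To make the Bandit early-termination policy mentioned above concrete, here is a minimal sketch of its core check, assuming a primary metric where higher is better. The function name is hypothetical; the real policy lives in the HyperDrive `BanditPolicy` class.

```python
# Hypothetical sketch of a Bandit-style early-termination check:
# cancel a run whose best metric falls outside a slack factor of the
# best metric reported by any run so far.
def should_terminate(run_best, overall_best, slack_factor=0.1):
    """True if this run is no longer within slack of the best run."""
    return run_best < overall_best / (1 + slack_factor)
```

For example, if the best run so far reports 0.9 and the slack factor is 0.1, a run whose best metric is below 0.9 / 1.1 ≈ 0.818 would be cancelled.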
## <a id="low-pri-vm"></a> Use low-priority VMs
Azure Machine Learning Compute supports reserved instances inherently. If you pu
## Train locally
-When prototyping and running training jobs that are small enough to run on your local computer, consider training locally. Using the Python SDK, setting your compute target to `local` executes your script locally. For more information, see [Configure and submit training runs](how-to-set-up-training-targets.md#select-a-compute-target).
+When prototyping and running training jobs that are small enough to run on your local computer, consider training locally. Using the Python SDK, setting your compute target to `local` executes your script locally. For more information, see [Configure and submit training jobs](how-to-set-up-training-targets.md#select-a-compute-target).
Visual Studio Code provides a full-featured environment for developing your machine learning applications. Using the Azure Machine Learning visual Visual Studio Code extension and Docker, you can run and debug locally. For more information, see [interactive debugging with Visual Studio Code](how-to-debug-visual-studio-code.md).
machine-learning How To Manage Quotas https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/machine-learning/how-to-manage-quotas.md
The following table shows additional limits in the platform. Please reach out to
| Job lifetime on a low-priority node | 7 days<sup>2</sup> | | Parameter servers per node | 1 |
-<sup>1</sup> Maximum lifetime is the duration between when a run starts and when it finishes. Completed runs persist indefinitely. Data for runs not completed within the maximum lifetime is not accessible.
+<sup>1</sup> Maximum lifetime is the duration between when a job starts and when it finishes. Completed jobs persist indefinitely. Data for jobs not completed within the maximum lifetime is not accessible.
<sup>2</sup> Jobs on a low-priority node can be preempted whenever there's a capacity constraint. We recommend that you implement checkpoints in your job.
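The checkpointing recommended above for preemptible low-priority nodes can be as simple as persisting the training loop's state each epoch. This is an illustrative sketch with hypothetical helper names, not an Azure ML API.

```python
import json
import os

# Illustrative checkpointing for preemptible (low-priority) nodes:
# persist progress each epoch so a preempted job can resume where it
# left off instead of restarting from scratch.
def load_checkpoint(path):
    """Return saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"epoch": 0}

def save_checkpoint(path, state):
    with open(path, "w") as f:
        json.dump(state, f)
```

In a training script, you would call `load_checkpoint` at startup, skip to the saved epoch, and call `save_checkpoint` at the end of each epoch, writing to an output directory that survives preemption.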
machine-learning How To Manage Resources Vscode https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/machine-learning/how-to-manage-resources-vscode.md
To view your job in Azure Machine Learning studio:
Alternatively, use the `> Azure ML: View Experiment in Studio` command in the command palette.
-### Track run progress
+### Track job progress
-As you're running your job, you may want to see its progress. To track the progress of a run in Azure Machine Learning studio from the extension:
+As you're running your job, you may want to see its progress. To track the progress of a job in Azure Machine Learning studio from the extension:
1. Expand the subscription node that contains your workspace. 1. Expand the **Experiments** node inside your workspace. 1. Expand the job node you want to track progress for.
-1. Right-click the run and select **View Run in Studio**.
-1. A prompt appears asking you to open the run URL in Azure Machine Learning studio. Select **Open**.
+1. Right-click the job and select **View Job in Studio**.
+1. A prompt appears asking you to open the job URL in Azure Machine Learning studio. Select **Open**.
-### Download run logs & outputs
+### Download job logs & outputs
-Once a run is complete, you may want to download the logs and assets such as the model generated as part of a run.
+Once a job is complete, you may want to download the logs and assets such as the model generated as part of a job.
1. Expand the subscription node that contains your workspace. 1. Expand the **Experiments** node inside your workspace. 1. Expand the job node you want to download logs and outputs for.
-1. Right-click the run:
+1. Right-click the job:
- To download the outputs, select **Download outputs**. - To download the logs, select **Download logs**.
machine-learning How To Migrate From Estimators To Scriptrunconfig https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/machine-learning/how-to-migrate-from-estimators-to-scriptrunconfig.md
This article covers common considerations when migrating from Estimators to Scri
Azure Machine Learning documentation and samples have been updated to use [ScriptRunConfig](/python/api/azureml-core/azureml.core.script_run_config.scriptrunconfig) for job configuration and submission. For information on using ScriptRunConfig, refer to the following documentation:
-* [Configure and submit training runs](how-to-set-up-training-targets.md)
-* [Configuring PyTorch training runs](how-to-train-pytorch.md)
-* [Configuring TensorFlow training runs](how-to-train-tensorflow.md)
-* [Configuring scikit-learn training runs](how-to-train-scikit-learn.md)
+* [Configure and submit training jobs](how-to-set-up-training-targets.md)
+* [Configuring PyTorch training jobs](how-to-train-pytorch.md)
+* [Configuring TensorFlow training jobs](how-to-train-tensorflow.md)
+* [Configuring scikit-learn training jobs](how-to-train-scikit-learn.md)
In addition, refer to the following samples & tutorials: * [Azure/MachineLearningNotebooks](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/ml-frameworks)
src.run_config
## Next steps
-* [Configure and submit training runs](how-to-set-up-training-targets.md)
+* [Configure and submit training jobs](how-to-set-up-training-targets.md)
machine-learning How To Monitor Datasets https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/machine-learning/how-to-monitor-datasets.md
monitor = monitor.enable_schedule()
| Features | List of features that will be analyzed for data drift over time. | Set to a model's output feature(s) to measure concept drift. Don't include features that naturally drift over time (month, year, index, etc.). You can backfill an existing data drift monitor after adjusting the list of features. | Yes | | Compute target | Azure Machine Learning compute target to run the dataset monitor jobs. | | Yes | | Enable | Enable or disable the schedule on the dataset monitor pipeline | Disable the schedule to analyze historical data with the backfill setting. It can be enabled after the dataset monitor is created. | Yes |
- | Frequency | The frequency that will be used to schedule the pipeline job and analyze historical data if running a backfill. Options include daily, weekly, or monthly. | Each run compares data in the target dataset according to the frequency: <li>Daily: Compare most recent complete day in target dataset with baseline <li>Weekly: Compare most recent complete week (Monday - Sunday) in target dataset with baseline <li>Monthly: Compare most recent complete month in target dataset with baseline | No |
+ | Frequency | The frequency that will be used to schedule the pipeline job and analyze historical data if running a backfill. Options include daily, weekly, or monthly. | Each job compares data in the target dataset according to the frequency: <li>Daily: Compare most recent complete day in target dataset with baseline <li>Weekly: Compare most recent complete week (Monday - Sunday) in target dataset with baseline <li>Monthly: Compare most recent complete month in target dataset with baseline | No |
| Latency | Time, in hours, it takes for data to arrive in the dataset. For instance, if it takes three days for data to arrive in the SQL DB the dataset encapsulates, set the latency to 72. | Cannot be changed after the dataset monitor is created | No | | Email addresses | Email addresses for alerting based on breach of the data drift percentage threshold. | Emails are sent through Azure Monitor. | Yes | | Threshold | Data drift percentage threshold for email alerting. | Further alerts and events can be set on many other metrics in the workspace's associated Application Insights resource. | Yes |
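The "most recent complete" comparison windows described for each frequency can be computed as follows. This is a hypothetical helper to illustrate the window boundaries, not part of the dataset-monitor SDK.

```python
from datetime import date, timedelta

# Hypothetical helper computing the "most recent complete" comparison
# window for each monitor frequency, relative to the date the job runs.
def comparison_window(run_date, frequency):
    if frequency == "Daily":
        end = run_date - timedelta(days=1)      # most recent complete day
        return end, end
    if frequency == "Weekly":
        # most recent complete Monday-Sunday week before run_date
        last_sunday = run_date - timedelta(days=run_date.isoweekday())
        return last_sunday - timedelta(days=6), last_sunday
    if frequency == "Monthly":
        end = run_date.replace(day=1) - timedelta(days=1)  # last day of prior month
        return end.replace(day=1), end
    raise ValueError(f"unknown frequency: {frequency}")
```

For a job running on Wednesday 2021-06-09, the weekly window would be Monday 2021-05-31 through Sunday 2021-06-06, and the monthly window would be all of May 2021.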
This section contains feature-level insights into the change in the selected fea
The target dataset is also profiled over time. The statistical distance between the baseline distribution of each feature is compared with the target dataset's over time. Conceptually, this is similar to the data drift magnitude. However this statistical distance is for an individual feature rather than all features. Min, max, and mean are also available.
-In the Azure Machine Learning studio, click on a bar in the graph to see the feature-level details for that date. By default, you see the baseline dataset's distribution and the most recent run's distribution of the same feature.
+In the Azure Machine Learning studio, click on a bar in the graph to see the feature-level details for that date. By default, you see the baseline dataset's distribution and the most recent job's distribution of the same feature.
:::image type="content" source="media/how-to-monitor-datasets/drift-by-feature.gif" alt-text="Drift magnitude by features":::
Limitations and known issues for data drift monitors:
* The time range when analyzing historical data is limited to 31 intervals of the monitor's frequency setting. * Limitation of 200 features, unless a feature list is not specified (all features used). * Compute size must be large enough to handle the data.
-* Ensure your dataset has data within the start and end date for a given monitor run.
+* Ensure your dataset has data within the start and end date for a given monitor job.
* Dataset monitors will only work on datasets that contain 50 rows or more. * Columns, or features, in the dataset are classified as categorical or numeric based on the conditions in the following table. If the feature does not meet these conditions - for instance, a column of type string with >100 unique values - the feature is dropped from our data drift algorithm, but is still profiled.
Limitations and known issues for data drift monitors:
* When you have created a data drift monitor but cannot see data on the **Dataset monitors** page in Azure Machine Learning studio, try the following. 1. Check if you have selected the right date range at the top of the page.
- 1. On the **Dataset Monitors** tab, select the experiment link to check run status. This link is on the far right of the table.
- 1. If run completed successfully, check driver logs to see how many metrics has been generated or if there's any warning messages. Find driver logs in the **Output + logs** tab after you click on an experiment.
+ 1. On the **Dataset Monitors** tab, select the experiment link to check job status. This link is on the far right of the table.
+ 1. If the job completed successfully, check the driver logs to see how many metrics have been generated or whether there are any warning messages. Find driver logs in the **Output + logs** tab after you click on an experiment.
* If the SDK `backfill()` function does not generate the expected output, it may be due to an authentication issue. When you create the compute to pass into this function, do not use `Run.get_context().experiment.workspace.compute_targets`. Instead, use [ServicePrincipalAuthentication](/python/api/azureml-core/azureml.core.authentication.serviceprincipalauthentication) such as the following to create the compute that you pass into that `backfill()` function:
machine-learning How To Monitor Tensorboard https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/machine-learning/how-to-monitor-tensorboard.md
Title: Visualize experiments with TensorBoard
-description: Launch TensorBoard to visualize experiment run histories and identify potential areas for hyperparameter tuning and retraining.
+description: Launch TensorBoard to visualize experiment job histories and identify potential areas for hyperparameter tuning and retraining.
-# Visualize experiment runs and metrics with TensorBoard and Azure Machine Learning
+[//]: # (needs PM review; Do URL Links names change if it includes 'Run')
+
+# Visualize experiment jobs and metrics with TensorBoard and Azure Machine Learning
[!INCLUDE [sdk v1](../../includes/machine-learning-sdk-v1.md)]
-In this article, you learn how to view your experiment runs and metrics in TensorBoard using [the `tensorboard` package](/python/api/azureml-tensorboard/) in the main Azure Machine Learning SDK. Once you've inspected your experiment runs, you can better tune and retrain your machine learning models.
+In this article, you learn how to view your experiment jobs and metrics in TensorBoard using [the `tensorboard` package](/python/api/azureml-tensorboard/) in the main Azure Machine Learning SDK. Once you've inspected your experiment jobs, you can better tune and retrain your machine learning models.
[TensorBoard](/python/api/azureml-tensorboard/azureml.tensorboard.tensorboard) is a suite of web applications for inspecting and understanding your experiment structure and performance. How you launch TensorBoard with Azure Machine Learning experiments depends on the type of experiment:
-+ If your experiment natively outputs log files that are consumable by TensorBoard, such as PyTorch, Chainer and TensorFlow experiments, then you can [launch TensorBoard directly](#launch-tensorboard) from experiment's run history.
++ If your experiment natively outputs log files that are consumable by TensorBoard, such as PyTorch, Chainer, and TensorFlow experiments, then you can [launch TensorBoard directly](#launch-tensorboard) from the experiment's job history.
-+ For experiments that don't natively output TensorBoard consumable files, such as like Scikit-learn or Azure Machine Learning experiments, use [the `export_to_tensorboard()` method](#export) to export the run histories as TensorBoard logs and launch TensorBoard from there.
++ For experiments that don't natively output TensorBoard consumable files, such as Scikit-learn or Azure Machine Learning experiments, use [the `export_to_tensorboard()` method](#export) to export the job histories as TensorBoard logs and launch TensorBoard from there. > [!TIP]
-> The information in this document is primarily for data scientists and developers who want to monitor the model training process. If you are an administrator interested in monitoring resource usage and events from Azure Machine learning, such as quotas, completed training runs, or completed model deployments, see [Monitoring Azure Machine Learning](monitor-azure-machine-learning.md).
+> The information in this document is primarily for data scientists and developers who want to monitor the model training process. If you are an administrator interested in monitoring resource usage and events from Azure Machine learning, such as quotas, completed training jobs, or completed model deployments, see [Monitoring Azure Machine Learning](monitor-azure-machine-learning.md).
## Prerequisites
-* To launch TensorBoard and view your experiment run histories, your experiments need to have previously enabled logging to track its metrics and performance.
+* To launch TensorBoard and view your experiment job histories, your experiments need to have previously enabled logging to track its metrics and performance.
* The code in this document can be run in either of the following environments: * Azure Machine Learning compute instance - no downloads or installation necessary * Complete the [Quickstart: Get started with Azure Machine Learning](quickstart-create-resources.md) to create a dedicated notebook server pre-loaded with the SDK and the sample repository.
How you launch TensorBoard with Azure Machine Learning experiments depends on th
* [Create an Azure Machine Learning workspace](quickstart-create-resources.md). * [Create a workspace configuration file](how-to-configure-environment.md#workspace).
-## Option 1: Directly view run history in TensorBoard
+## Option 1: Directly view job history in TensorBoard
This option works for experiments that natively output log files consumable by TensorBoard, such as PyTorch, Chainer, and TensorFlow experiments. If that is not the case for your experiment, use [the `export_to_tensorboard()` method](#export) instead.
-The following example code uses the [MNIST demo experiment](https://raw.githubusercontent.com/tensorflow/tensorflow/r1.8/tensorflow/examples/tutorials/mnist/mnist_with_summaries.py) from TensorFlow's repository in a remote compute target, Azure Machine Learning Compute. Next, we will configure and start a run for training the TensorFlow model, and then
+The following example code uses the [MNIST demo experiment](https://raw.githubusercontent.com/tensorflow/tensorflow/r1.8/tensorflow/examples/tutorials/mnist/mnist_with_summaries.py) from TensorFlow's repository in a remote compute target, Azure Machine Learning Compute. Next, we will configure and start a job for training the TensorFlow model, and then
start TensorBoard against this TensorFlow experiment. ### Set experiment name and create project folder
tf_code = requests.get("https://raw.githubusercontent.com/tensorflow/tensorflow/
with open(os.path.join(exp_dir, "mnist_with_summaries.py"), "w") as file: file.write(tf_code.text) ```
-Throughout the MNIST code file, mnist_with_summaries.py, notice that there are lines that call `tf.summary.scalar()`, `tf.summary.histogram()`, `tf.summary.FileWriter()` etc. These methods group, log, and tag key metrics of your experiments into run history. The `tf.summary.FileWriter()` is especially important as it serializes the data from your logged experiment metrics, which allows for TensorBoard to generate visualizations off of them.
+Throughout the MNIST code file, mnist_with_summaries.py, notice that there are lines that call `tf.summary.scalar()`, `tf.summary.histogram()`, `tf.summary.FileWriter()` etc. These methods group, log, and tag key metrics of your experiments into job history. The `tf.summary.FileWriter()` is especially important as it serializes the data from your logged experiment metrics, which allows for TensorBoard to generate visualizations off of them.
### Configure experiment
-In the following, we configure our experiment and set up directories for logs and data. These logs will be uploaded to the run history, which TensorBoard accesses later.
+In the following, we configure our experiment and set up directories for logs and data. These logs will be uploaded to the job history, which TensorBoard accesses later.
> [!Note] > For this TensorFlow example, you will need to install TensorFlow on your local machine. Further, the TensorBoard module (that is, the one included with TensorFlow) must be accessible to this notebook's kernel, as the local machine is what runs TensorBoard.
if not path.exists(data_dir):
os.environ["TEST_TMPDIR"] = data_dir
-# Writing logs to ./logs results in their being uploaded to the run history,
+# Writing logs to ./logs results in their being uploaded to the job history,
# and thus, made accessible to our TensorBoard instance. args = ["--log_dir", logs_dir]
exp = Experiment(ws, experiment_name)
``` ### Create a cluster for your experiment
-We create an AmlCompute cluster for this experiment, however your experiments can be created in any environment and you are still able to launch TensorBoard against the experiment run history.
+We create an AmlCompute cluster for this experiment; however, your experiments can be created in any environment, and you can still launch TensorBoard against the experiment job history.
```Python from azureml.core.compute import ComputeTarget, AmlCompute
compute_target.wait_for_completion(show_output=True, min_node_count=None)
[!INCLUDE [low-pri-note](../../includes/machine-learning-low-pri-vm.md)]
-### Configure and submit training run
+### Configure and submit training job
Configure a training job by creating a ScriptRunConfig object.
run = exp.submit(src)
### Launch TensorBoard
-You can launch TensorBoard during your run or after it completes. In the following, we create a TensorBoard object instance, `tb`, that takes the experiment run history loaded in the `run`, and then launches TensorBoard with the `start()` method.
+You can launch TensorBoard during your job or after it completes. In the following, we create a TensorBoard object instance, `tb`, that takes the experiment job history loaded in `run`, and then launches TensorBoard with the `start()` method.
-The [TensorBoard constructor](/python/api/azureml-tensorboard/azureml.tensorboard.tensorboard) takes an array of runs, so be sure and pass it in as a single-element array.
+The [TensorBoard constructor](/python/api/azureml-tensorboard/azureml.tensorboard.tensorboard) takes an array of run objects, so be sure to pass it in as a single-element array.
```python from azureml.tensorboard import Tensorboard
-tb = Tensorboard([run])
+tb = Tensorboard([run])
# If successful, start() returns a string with the URI of the instance. tb.start()
tb.stop()
## Option 2: Export history as log to view in TensorBoard
-The following code sets up a sample experiment, begins the logging process using the Azure Machine Learning run history APIs, and exports the experiment run history into logs consumable by TensorBoard for visualization.
+The following code sets up a sample experiment, begins the logging process using the Azure Machine Learning job history APIs, and exports the experiment job history into logs consumable by TensorBoard for visualization.
### Set up experiment
-The following code sets up a new experiment and names the run directory `root_run`.
+The following code sets up a new experiment and names the job directory `root_run`.
```python from azureml.core import Workspace, Experiment import azureml.core
-# set experiment name and run name
+# set experiment name and job name
ws = Workspace.from_config() experiment_name = 'export-to-tensorboard' exp = Experiment(ws, experiment_name)
for alpha in tqdm(alphas):
root_run.log("mse", mse) ```
-### Export runs to TensorBoard
+### Export jobs to TensorBoard
-With the SDK's [export_to_tensorboard()](/python/api/azureml-tensorboard/azureml.tensorboard.export) method, we can export the run history of our Azure machine learning experiment into TensorBoard logs, so we can view them via TensorBoard.
+With the SDK's [export_to_tensorboard()](/python/api/azureml-tensorboard/azureml.tensorboard.export) method, we can export the job history of our Azure machine learning experiment into TensorBoard logs, so we can view them via TensorBoard.
-In the following code, we create the folder `logdir` in our current working directory. This folder is where we will export our experiment run history and logs from `root_run` and then mark that run as completed.
+In the following code, we create the folder `logdir` in our current working directory. This folder is where we will export our experiment job history and logs from `root_run` and then mark that job as completed.
```python
from azureml.tensorboard.export import export_to_tensorboard
except os.error:
    os.mkdir(log_path)
print(logdir)
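# A simpler, idempotent alternative to the try/except mkdir pattern above
# (a sketch; assumes only that `logdir` names the export folder):
import os
logdir = 'logdir'
log_path = os.path.join(os.getcwd(), logdir)
os.makedirs(log_path, exist_ok=True)  # no error if the folder already exists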
-# export run history for the project
+# export job history for the project
export_to_tensorboard(root_run, logdir)
root_run.complete()
> You can also export a particular run to TensorBoard by specifying the name of the run `export_to_tensorboard(run_name, logdir)`

### Start and stop TensorBoard
-Once our run history for this experiment is exported, we can launch TensorBoard with the [start()](/python/api/azureml-tensorboard/azureml.tensorboard.tensorboard#start-start-browser-false-) method.
+Once our job history for this experiment is exported, we can launch TensorBoard with the [start()](/python/api/azureml-tensorboard/azureml.tensorboard.tensorboard#start-start-browser-false-) method.
```python
from azureml.tensorboard import Tensorboard
-# The TensorBoard constructor takes an array of runs, so be sure and pass it in as a single-element array here
+# The TensorBoard constructor takes an array of jobs, so be sure and pass it in as a single-element array here
tb = Tensorboard([], local_root=logdir, port=6006)

# If successful, start() returns a string with the URI of the instance.
tb.start()
tb.stop()
## Next steps
-In this how-to you, created two experiments and learned how to launch TensorBoard against their run histories to identify areas for potential tuning and retraining.
+In this how-to, you created two experiments and learned how to launch TensorBoard against their job histories to identify areas for potential tuning and retraining.
* If you are satisfied with your model, head over to our [How to deploy a model](how-to-deploy-and-where.md) article.
* Learn more about [hyperparameter tuning](how-to-tune-hyperparameters.md).
machine-learning How To Move Data In Out Of Pipelines https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/machine-learning/how-to-move-data-in-out-of-pipelines.md
step1_output_ds = step1_output_data.register_on_complete(name='processed_data',
Azure does not automatically delete intermediate data written with `OutputFileDatasetConfig`. To avoid storage charges for large amounts of unneeded data, you should either:
-* Programmatically delete intermediate data at the end of a pipeline run, when it is no longer needed
+* Programmatically delete intermediate data at the end of a pipeline job, when it is no longer needed
* Use blob storage with a short-term storage policy for intermediate data (see [Optimize costs by automating Azure Blob Storage access tiers](../storage/blobs/lifecycle-management-overview.md?tabs=azure-portal))
* Regularly review and delete no-longer-needed data
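As an illustration of the short-term storage option, a lifecycle management policy rule that deletes blobs a set number of days after their last modification might look like the following (a sketch: `intermediate-data/` is a hypothetical prefix; substitute the container or path where your pipeline writes its intermediate output):

```json
{
  "rules": [
    {
      "name": "delete-intermediate-pipeline-data",
      "enabled": true,
      "type": "Lifecycle",
      "definition": {
        "filters": {
          "blobTypes": [ "blockBlob" ],
          "prefixMatch": [ "intermediate-data/" ]
        },
        "actions": {
          "baseBlob": {
            "delete": { "daysAfterModificationGreaterThan": 7 }
          }
        }
      }
    }
  ]
}
```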
machine-learning Samples Designer https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/machine-learning/samples-designer.md
Here's how to use a designer sample:
1. In the dialog that appears, select an existing compute target or create a new one. Select **Save**.
- 1. Select **Submit** at the top of the canvas to submit a pipeline run.
+ 1. Select **Submit** at the top of the canvas to submit a pipeline job.
- Depending on the sample pipeline and compute settings, runs may take some time to complete. The default compute settings have a minimum node size of 0, which means that the designer must allocate resources after being idle. Repeated pipeline runs will take less time since the compute resources are already allocated. Additionally, the designer uses cached results for each component to further improve efficiency.
+ Depending on the sample pipeline and compute settings, jobs may take some time to complete. The default compute settings have a minimum node size of 0, which means that the designer must allocate resources after being idle. Repeated pipeline jobs will take less time since the compute resources are already allocated. Additionally, the designer uses cached results for each component to further improve efficiency.
1. After the pipeline finishes running, you can review the pipeline and view the output for each component to learn more. Use the following steps to view component outputs:
machine-learning Tutorial 1St Experiment Bring Data https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/machine-learning/tutorial-1st-experiment-bring-data.md
Title: "Tutorial: Upload data and train a model"
-description: How to upload and use your own data in a remote training run. This is part 3 of a three-part getting-started series.
+description: How to upload and use your own data in a remote training job. This is part 3 of a three-part getting-started series.
This code will print a URL to the experiment in the Azure Machine Learning studi
### <a name="inspect-log"></a> Inspect the log file
-In the studio, go to the experiment run (by selecting the previous URL output) followed by **Outputs + logs**. Select the `std_log.txt` file. Scroll down through the log file until you see the following output:
+In the studio, go to the experiment job (by selecting the previous URL output) followed by **Outputs + logs**. Select the `std_log.txt` file. Scroll down through the log file until you see the following output:
```txt
Processing 'input'.
Processing dataset FileDataset
Mounting input to /tmp/tmp9kituvp3.
Mounted input to /tmp/tmp9kituvp3 as folder.
Exit __enter__ of DatasetContextManager
-Entering Run History Context Manager.
+Entering Job History Context Manager.
Current directory: /mnt/batch/tasks/shared/LS_root/jobs/dsvm-aml/azureml/tutorial-session-3_1600171983_763c5381/mounts/workspaceblobstore/azureml/tutorial-session-3_1600171983_763c5381
Preparing to call script [ train.py ] with arguments: ['--data_path', '$input', '--learning_rate', '0.003', '--momentum', '0.92']
After variable expansion, calling script [ train.py ] with arguments: ['--data_path', '/tmp/tmp9kituvp3', '--learning_rate', '0.003', '--momentum', '0.92']
Notice:
## Clean up resources
-If you plan to continue now to another tutorial, or to start your own training runs, skip to [Next steps](#next-steps).
+If you plan to continue now to another tutorial, or to start your own training jobs, skip to [Next steps](#next-steps).
### Stop compute instance
You can also keep the resource group but delete a single workspace. Display the
In this tutorial, we saw how to upload data to Azure by using `Datastore`. The datastore served as cloud storage for your workspace, giving you a persistent and flexible place to keep your data.
-You saw how to modify your training script to accept a data path via the command line. By using `Dataset`, you were able to mount a directory to the remote run.
+You saw how to modify your training script to accept a data path via the command line. By using `Dataset`, you were able to mount a directory to the remote job.
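The command-line pattern described above can be sketched with Python's `argparse` (a hedged sketch: the tutorial's real `train.py` differs, but the flag names mirror those shown in the job log earlier, `--data_path`, `--learning_rate`, and `--momentum`):

```python
import argparse

def parse_args(argv=None):
    # Flags mirror those Azure ML passed to train.py in the log output above.
    parser = argparse.ArgumentParser(description="Hypothetical training-script arguments")
    parser.add_argument("--data_path", type=str, help="path to the mounted dataset")
    parser.add_argument("--learning_rate", type=float, default=0.001)
    parser.add_argument("--momentum", type=float, default=0.9)
    return parser.parse_args(argv)

# Example: the argument list after variable expansion in the log
args = parse_args(["--data_path", "/tmp/tmp9kituvp3",
                   "--learning_rate", "0.003", "--momentum", "0.92"])
print(args.data_path)  # /tmp/tmp9kituvp3
```

Because `Dataset` mounts the data and passes the mount point as `--data_path`, the script itself stays ignorant of whether the data lives locally or in cloud storage.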
Now that you have a model, learn:
machine-learning Tutorial 1St Experiment Hello World https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/machine-learning/tutorial-1st-experiment-hello-world.md
Here's a description of how the control script works:
`experiment = Experiment( ... )`
:::column-end:::
:::column span="2":::
- [Experiment](/python/api/azureml-core/azureml.core.experiment.experiment) provides a simple way to organize multiple runs under a single name. Later you can see how experiments make it easy to compare metrics between dozens of runs.
+ [Experiment](/python/api/azureml-core/azureml.core.experiment.experiment) provides a simple way to organize multiple jobs under a single name. Later you can see how experiments make it easy to compare metrics between dozens of jobs.
:::column-end:::
:::row-end:::
:::row:::
Here's a description of how the control script works:
`run = experiment.submit(config)`
:::column-end:::
:::column span="2":::
- Submits your script. This submission is called a [run](/python/api/azureml-core/azureml.core.run%28class%29). A run encapsulates a single execution of your code. Use a run to monitor the script progress, capture the output, analyze the results, visualize metrics, and more.
+ Submits your script. This submission is called a [job](/python/api/azureml-core/azureml.core.run%28class%29). A job encapsulates a single execution of your code. Use a job to monitor the script progress, capture the output, analyze the results, visualize metrics, and more.
:::column-end:::
:::row-end:::
:::row:::
Here's a description of how the control script works:
1. In the terminal, you may be asked to sign in to authenticate. Copy the code and follow the link to complete this step.
-1. Once you're authenticated, you'll see a link in the terminal. Select the link to view the run.
+1. Once you're authenticated, you'll see a link in the terminal. Select the link to view the job.
[!INCLUDE [amlinclude-info](../../includes/machine-learning-py38-ignore.md)]

## View the output
-1. In the page that opens, you'll see the run status.
-1. When the status of the run is **Completed**, select **Output + logs** at the top of the page.
-1. Select **std_log.txt** to view the output of your run.
+1. In the page that opens, you'll see the job status.
+1. When the status of the job is **Completed**, select **Output + logs** at the top of the page.
+1. Select **std_log.txt** to view the output of your job.
## <a name="monitor"></a>Monitor your code in the cloud in the studio
Follow the link. At first, you'll see a status of **Queued** or **Preparing**.
* The compute cluster is resized from 0 to 1 node
* The docker image is downloaded to the compute.
-Subsequent runs are much quicker (~15 seconds) as the docker image is cached on the compute. You can test this by resubmitting the code below after the first run has completed.
+Subsequent jobs are much quicker (~15 seconds) as the docker image is cached on the compute. You can test this by resubmitting the code below after the first job has completed.
-Wait about 10 minutes. You'll see a message that the run has completed. Then use **Refresh** to see the status change to *Completed*. Once the job completes, go to the **Outputs + logs** tab. There you can see a `std_log.txt` file that looks like this:
+Wait about 10 minutes. You'll see a message that the job has completed. Then use **Refresh** to see the status change to *Completed*. Once the job completes, go to the **Outputs + logs** tab. There you can see a `std_log.txt` file that looks like this:
```txt
1: [2020-08-04T22:15:44.407305] Entering context manager injector.
2: [context_manager_injector.py] Command line Options: Namespace(inject=['ProjectPythonPath:context_managers.ProjectPythonPath', 'RunHistory:context_managers.RunHistory', 'TrackUserError:context_managers.TrackUserError', 'UserExceptions:context_managers.UserExceptions'], invocation=['hello.py'])
3: Starting the daemon thread to refresh tokens in background for process with pid = 31263
- 4: Entering Run History Context Manager.
+ 4: Entering Job History Context Manager.
5: Preparing to call script [ hello.py ] with arguments: []
6: After variable expansion, calling script [ hello.py ] with arguments: []
7:
Wait about 10 minutes. You'll see a message that the run has completed. Then us
9: Starting the daemon thread to refresh tokens in background for process with pid = 31263
10:
11:
-12: The experiment completed successfully. Finalizing run...
+12: The experiment completed successfully. Finalizing job...
13: Logging experiment finalizing status in history service.
14: [2020-08-04T22:15:46.541334] TimeoutHandler __init__
15: [2020-08-04T22:15:46.541396] TimeoutHandler __enter__
-16: Cleaning up all outstanding Run operations, waiting 300.0 seconds
+16: Cleaning up all outstanding Job operations, waiting 300.0 seconds
17: 1 items cleaning up...
18: Cleanup took 0.1812913417816162 seconds
19: [2020-08-04T22:15:47.040203] TimeoutHandler __exit__
Wait about 10 minutes. You'll see a message that the run has completed. Then us
On line 8, you see the "Hello world!" output.
-The `70_driver_log.txt` file contains the standard output from a run. This file can be useful when you're debugging remote runs in the cloud.
+The `70_driver_log.txt` file contains the standard output from a job. This file can be useful when you're debugging remote jobs in the cloud.
## Next steps
machine-learning Tutorial 1St Experiment Sdk Train https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/machine-learning/tutorial-1st-experiment-sdk-train.md
if __name__ == "__main__":
1. Select **Save and run script in terminal** to run the *run-pytorch.py* script.
-1. You'll see a link in the terminal window that opens. Select the link to view the run.
+1. You'll see a link in the terminal window that opens. Select the link to view the job.
[!INCLUDE [amlinclude-info](../../includes/machine-learning-py38-ignore.md)] ### View the output
-1. In the page that opens, you'll see the run status. The first time you run this script, Azure Machine Learning will build a new Docker image from your PyTorch environment. The whole run might around 10 minutes to complete. This image will be reused in future runs to make them run much quicker.
+1. In the page that opens, you'll see the job status. The first time you run this script, Azure Machine Learning will build a new Docker image from your PyTorch environment. The whole job might take around 10 minutes to complete. This image will be reused in future jobs to make them run much quicker.
1. You can view Docker build logs in the Azure Machine Learning studio. Select the **Outputs + logs** tab, and then select **20_image_build_log.txt**.
-1. When the status of the run is **Completed**, select **Output + logs**.
-1. Select **std_log.txt** to view the output of your run.
+1. When the status of the job is **Completed**, select **Output + logs**.
+1. Select **std_log.txt** to view the output of your job.
```txt
Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ../data/cifar-10-python.tar.gz
Select the **...** at the end of the folder, then select **Move** to move **data
Now that you have a model training in Azure Machine Learning, start tracking some performance metrics.
-The current training script prints metrics to the terminal. Azure Machine Learning provides a mechanism for logging metrics with more functionality. By adding a few lines of code, you gain the ability to visualize metrics in the studio and to compare metrics between multiple runs.
+The current training script prints metrics to the terminal. Azure Machine Learning provides a mechanism for logging metrics with more functionality. By adding a few lines of code, you gain the ability to visualize metrics in the studio and to compare metrics between multiple jobs.
### Modify *train.py* to include logging
machine-learning Tutorial Auto Train Models https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/machine-learning/tutorial-auto-train-models.md
automl_config = AutoMLConfig(task='regression',
### Train the automatic regression model
-Create an experiment object in your workspace. An experiment acts as a container for your individual runs. Pass the defined `automl_config` object to the experiment, and set the output to `True` to view progress during the run.
+Create an experiment object in your workspace. An experiment acts as a container for your individual jobs. Pass the defined `automl_config` object to the experiment, and set the output to `True` to view progress during the job.
After starting the experiment, the output shown updates live as the experiment runs. For each iteration, you see the model type, the run duration, and the training accuracy. The field `BEST` tracks the best running training score based on your metric type.
BEST: The best observed score thus far.
## Explore the results
-Explore the results of automatic training with a [Jupyter widget](/python/api/azureml-widgets/azureml.widgets). The widget allows you to see a graph and table of all individual run iterations, along with training accuracy metrics and metadata. Additionally, you can filter on different accuracy metrics than your primary metric with the dropdown selector.
+Explore the results of automatic training with a [Jupyter widget](/python/api/azureml-widgets/azureml.widgets). The widget allows you to see a graph and table of all individual job iterations, along with training accuracy metrics and metadata. Additionally, you can filter on different accuracy metrics than your primary metric with the dropdown selector.
```python
from azureml.widgets import RunDetails
machine-learning Tutorial Automated Ml Forecast https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/machine-learning/tutorial-automated-ml-forecast.md
For this tutorial, you create your automated ML experiment run in Azure Machine
1. In the left pane, select **Automated ML** under the **Author** section.
-1. Select **+New automated ML run**.
+1. Select **+New automated ML job**.
## Create and load dataset
Before you configure your experiment, upload your data file to your workspace in
1. Select **Next**.
-## Configure run
+## Configure job
After you load and configure your data, set up your remote compute target and select which column in your data you want to predict.
-1. Populate the **Configure run** form as follows:
+1. Populate the **Configure job** form as follows:
1. Enter an experiment name: `automl-bikeshare`
1. Select **cnt** as the target column, what you want to predict. This column indicates the number of total bike share rentals.
Complete the setup for your automated ML experiment by specifying the machine le
## Run experiment
-To run your experiment, select **Finish**. The **Run details** screen opens with the **Run status** at the top next to the run number. This status updates as the experiment progresses. Notifications also appear in the top right corner of the studio, to inform you of the status of your experiment.
+To run your experiment, select **Finish**. The **Job details** screen opens with the **Job status** at the top next to the job number. This status updates as the experiment progresses. Notifications also appear in the top right corner of the studio, to inform you of the status of your experiment.
>[!IMPORTANT]
-> Preparation takes **10-15 minutes** to prepare the experiment run.
+> It takes **10-15 minutes** to prepare the experiment job.
> Once running, it takes **2-3 minutes more for each iteration**.<br> <br>
> In production, you'd likely walk away for a bit as this process takes time. While you wait, we suggest you start exploring the tested algorithms on the **Models** tab as they complete.
Automated machine learning in Azure Machine Learning studio allows you to deploy
For this experiment, deployment to a web service means that the bike share company now has an iterative and scalable web solution for forecasting bike share rental demand.
-Once the run is complete, navigate back to parent run page by selecting **Run 1** at the top of your screen.
+Once the job is complete, navigate back to the parent job page by selecting **Job 1** at the top of your screen.
In the **Best model summary** section, the best model in the context of this experiment, is selected based on the **Normalized root mean squared error metric.**
We deploy this model, but be advised, deployment takes about 20 minutes to compl
1. Select **Deploy**.
- A green success message appears at the top of the **Run** screen stating that the deployment was started successfully. The progress of the deployment can be found in the **Model summary** pane under **Deploy status**.
+ A green success message appears at the top of the **Job** screen stating that the deployment was started successfully. The progress of the deployment can be found in the **Model summary** pane under **Deploy status**.
Once deployment succeeds, you have an operational web service to generate predictions.
machine-learning Tutorial Convert Ml Experiment To Production https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/machine-learning/tutorial-convert-ml-experiment-to-production.md
if __name__ == '__main__':
`train.py` can now be invoked from a terminal by running `python train.py`. The functions from `train.py` can also be called from other files.
-The `train_aml.py` file found in the `diabetes_regression/training` directory in the MLOpsPython repository calls the functions defined in `train.py` in the context of an Azure Machine Learning experiment run. The functions can also be called in unit tests, covered later in this guide.
+The `train_aml.py` file found in the `diabetes_regression/training` directory in the MLOpsPython repository calls the functions defined in `train.py` in the context of an Azure Machine Learning experiment job. The functions can also be called in unit tests, covered later in this guide.
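That unit-testing pattern can be sketched as follows (hypothetical: this `train_model` is a simplified stand-in with a different signature from the one in the MLOpsPython repo's `train.py`; the point is that a plain function can be exercised with ordinary assertions, no Azure ML job required):

```python
import numpy as np

def train_model(X, y, alpha=1.0):
    # Hypothetical stand-in for a train_model function in train.py:
    # closed-form ridge regression, returning the learned weight vector.
    n_features = X.shape[1]
    A = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

def test_train_model():
    # With alpha=0 and noise-free data, the model recovers the true weights.
    rng = np.random.default_rng(0)
    true_w = np.array([2.0, -1.0])
    X = rng.normal(size=(100, 2))
    y = X @ true_w
    w = train_model(X, y, alpha=0.0)
    assert np.allclose(w, true_w, atol=1e-6)

test_train_model()
```

Because the training logic lives in an importable function rather than notebook cells, the same code path is shared by the experiment job, the CI pipeline, and the tests.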
### Create Python file for the Diabetes Ridge Regression Scoring notebook
def test_train_model():
Now that you understand how to convert from an experiment to production code, see the following links for more information and next steps:

+ [MLOpsPython](https://github.com/microsoft/MLOpsPython/blob/master/docs/custom_model.md): Build a CI/CD pipeline to train, evaluate and deploy your own model using Azure Pipelines and Azure Machine Learning
-+ [Monitor Azure ML experiment runs and metrics](./how-to-log-view-metrics.md)
++ [Monitor Azure ML experiment jobs and metrics](./how-to-log-view-metrics.md)
+ [Monitor and collect data from ML web service endpoints](./how-to-enable-app-insights.md)
machine-learning Tutorial Designer Automobile Price Deploy https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/machine-learning/tutorial-designer-automobile-price-deploy.md
To deploy your pipeline, you must first convert the training pipeline into a rea
1. Select **Submit**, and use the same compute target and experiment that you used in part one.
- If this is the first run, it may take up to 20 minutes for your pipeline to finish running. The default compute settings have a minimum node size of 0, which means that the designer must allocate resources after being idle. Repeated pipeline runs will take less time since the compute resources are already allocated. Additionally, the designer uses cached results for each component to further improve efficiency.
+ If this is the first job, it may take up to 20 minutes for your pipeline to finish running. The default compute settings have a minimum node size of 0, which means that the designer must allocate resources after being idle. Repeated pipeline jobs will take less time since the compute resources are already allocated. Additionally, the designer uses cached results for each component to further improve efficiency.
1. Go to the real-time inference pipeline job detail by selecting **Job detail** link in the left pane.
machine-learning Tutorial Designer Automobile Price Train Score https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/machine-learning/tutorial-designer-automobile-price-train-score.md
You need an Azure Machine Learning workspace to use the designer. The workspace
## Set the default compute target
-A pipeline runs on a compute target, which is a compute resource that's attached to your workspace. After you create a compute target, you can reuse it for future runs.
+A pipeline runs on a compute target, which is a compute resource that's attached to your workspace. After you create a compute target, you can reuse it for future jobs.
> [!Important]
You can set a **Default compute target** for the entire pipeline, which will tel
1. Select **Create**.

> [!NOTE]
- > It takes approximately five minutes to create a compute resource. After the resource is created, you can reuse it and skip this wait time for future runs.
+ > It takes approximately five minutes to create a compute resource. After the resource is created, you can reuse it and skip this wait time for future jobs.
>
> The compute resource autoscales to zero nodes when it's idle to save cost. When you use it again after a delay, you might experience approximately five minutes of wait time while it scales back up.
Use the **Evaluate Model** component to evaluate how well your model scored the
## Submit the pipeline
-Now that your pipeline is all setup, you can submit a pipeline run to train your machine learning model. You can submit a valid pipeline run at any point, which can be used to review changes to your pipeline during development.
+Now that your pipeline is all set up, you can submit a pipeline job to train your machine learning model. You can submit a valid pipeline job at any point, which can be used to review changes to your pipeline during development.
1. At the top of the canvas, select **Submit**.
1. In the **Set up pipeline job** dialog box, select **Create new**.

> [!NOTE]
- > Experiments group similar pipeline runs together. If you run a pipeline multiple times, you can select the same experiment for successive runs.
+ > Experiments group similar pipeline jobs together. If you run a pipeline multiple times, you can select the same experiment for successive jobs.
1. For **New experiment Name**, enter **Tutorial-CarPrices**.
Now that your pipeline is all setup, you can submit a pipeline run to train your
:::image type="content" source="./media/how-to-run-batch-predictions-designer/submission-list.png" alt-text="Screenshot of the submitted jobs list with a success notification.":::
- If this is the first run, it may take up to 20 minutes for your pipeline to finish running. The default compute settings have a minimum node size of 0, which means that the designer must allocate resources after being idle. Repeated pipeline runs will take less time since the compute resources are already allocated. Additionally, the designer uses cached results for each component to further improve efficiency.
+ If this is the first job, it may take up to 20 minutes for your pipeline to finish running. The default compute settings have a minimum node size of 0, which means that the designer must allocate resources after being idle. Repeated pipeline jobs will take less time since the compute resources are already allocated. Additionally, the designer uses cached results for each component to further improve efficiency.
### View scored labels
In the job detail page, you can check the pipeline job status, results and logs.
:::image type="content" source="./media/tutorial-designer-automobile-price-train-score/score-result.png" alt-text="Screenshot showing the pipeline job detail page.":::
-After the run completes, you can view the results of the pipeline run. First, look at the predictions generated by the regression model.
+After the job completes, you can view the results of the pipeline job. First, look at the predictions generated by the regression model.
1. Right-click the **Score Model** component, and select **Preview data** > **Scored dataset** to view its output.
machine-learning Tutorial First Experiment Automated Ml https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/machine-learning/tutorial-first-experiment-automated-ml.md
You complete the following experiment set-up and run steps via the Azure Machin
![Get started page](./media/tutorial-first-experiment-automated-ml/get-started.png)
-1. Select **+New automated ML run**.
+1. Select **+New automated ML job**.
## Create and load dataset
Before you configure your experiment, upload your data file to your workspace in
1. Select **Next**.
-## Configure run
+## Configure job
After you load and configure your data, you can set up your experiment. This setup includes experiment design tasks such as selecting the size of your compute environment and specifying what column you want to predict.

1. Select the **Create new** radio button.
-1. Populate the **Configure Run** form as follows:
+1. Populate the **Configure Job** form as follows:
1. Enter this experiment name: `my-1st-automl-experiment`
1. Select **y** as the target column, what you want to predict. This column indicates whether the client subscribed to a term deposit or not.
After you load and configure your data, you can set up your experiment. This set
1. Select k-fold cross-validation as your **Validation type**.
1. Select 2 as your **Number of cross validations**.
-1. Select **Finish** to run the experiment. The **Run Detail** screen opens with the **Run status** at the top as the experiment preparation begins. This status updates as the experiment progresses. Notifications also appear in the top right corner of the studio to inform you of the status of your experiment.
+1. Select **Finish** to run the experiment. The **Job Detail** screen opens with the **Job status** at the top as the experiment preparation begins. This status updates as the experiment progresses. Notifications also appear in the top right corner of the studio to inform you of the status of your experiment.
>[!IMPORTANT]
> It takes **10-15 minutes** to prepare the experiment run.
These model explanations can be generated on demand, and are summarized in the m
To generate model explanations,
-1. Select **Run 1** at the top to navigate back to the **Models** screen.
+1. Select **Job 1** at the top to navigate back to the **Models** screen.
1. Select the **Models** tab.
1. For this tutorial, select the first **MaxAbsScaler, LightGBM** model.
1. Select the **Explain model** button at the top. On the right, the **Explain model** pane appears.
-1. Select the **automl-compute** that you created previously. This compute cluster initiates a child run to generate the model explanations.
+1. Select the **automl-compute** that you created previously. This compute cluster initiates a child job to generate the model explanations.
1. Select **Create** at the bottom. A green success message appears towards the top of your screen.

>[!NOTE]
- > The explainability run takes about 2-5 minutes to complete.
+ > The explainability job takes about 2-5 minutes to complete.
1. Select the **Explanations (preview)** button. This tab populates once the explainability run completes.
1. On the left-hand side, expand the pane and select the row that says **raw** under **Features**.
1. Select the **Aggregate feature importance** tab on the right. This chart shows which data features influenced the predictions of the selected model.
The automated machine learning interface allows you to deploy the best model as
For this experiment, deployment to a web service means that the financial institution now has an iterative and scalable web solution for identifying potential fixed term deposit customers.
-Check to see if your experiment run is complete. To do so, navigate back to the parent run page by selecting **Run 1** at the top of your screen. A **Completed** status is shown on the top left of the screen.
+Check to see if your experiment job is complete. To do so, navigate back to the parent job page by selecting **Job 1** at the top of your screen. A **Completed** status is shown on the top left of the screen.
Once the experiment run is complete, the **Details** page is populated with a **Best model summary** section. In this experiment context, **VotingEnsemble** is considered the best model, based on the **AUC_weighted** metric.
We deploy this model, but be advised, deployment takes about 20 minutes to compl
1. Select **Deploy**.
- A green success message appears at the top of the **Run** screen, and in the **Model summary** pane, a status message appears under **Deploy status**. Select **Refresh** periodically to check the deployment status.
+ A green success message appears at the top of the **Job** screen, and in the **Model summary** pane, a status message appears under **Deploy status**. Select **Refresh** periodically to check the deployment status.
Now you have an operational web service to generate predictions.
machine-learning Tutorial Train Deploy Image Classification Model Vscode https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/machine-learning/tutorial-train-deploy-image-classification-model-vscode.md
To submit the training job:
1. Open the *job.yml* file.
1. Right-click the file in the text editor and select **Azure ML: Execute YAML**.
-At this point, a request is sent to Azure to run your experiment on the selected compute target in your workspace. This process takes several minutes. The amount of time to run the training job is impacted by several factors like the compute type and training data size. To track the progress of your experiment, right-click the current run node and select **View Run in Azure portal**.
+At this point, a request is sent to Azure to run your experiment on the selected compute target in your workspace. This process takes several minutes. The amount of time to run the training job is impacted by several factors like the compute type and training data size. To track the progress of your experiment, right-click the current run node and select **View Job in Azure portal**.
When the dialog requesting to open an external website appears, select **Open**.
machine-learning Concept Automated Ml V1 https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/machine-learning/v1/concept-automated-ml-v1.md
These settings can be applied to the best model as a result of your automated ML
|**Enable/disable ONNX model compatibility**|✓||
|**Test the model** | ✓| ✓ (preview)|
-### Run control settings
+### Job control settings
-These settings allow you to review and control your experiment runs and its child runs.
+These settings allow you to review and control your experiment jobs and its child jobs.
| |The Python SDK|The studio web experience|
|-|:-:|:-:|
-|**Run summary table**| ✓|✓|
-|**Cancel runs & child runs**| ✓|✓|
+|**Job summary table**| ✓|✓|
+|**Cancel jobs & child jobs**| ✓|✓|
|**Get guardrails**| ✓|✓|
-|**Pause & resume runs**| ✓| |
+|**Pause & resume jobs**| ✓| |
## When to use AutoML: classification, regression, forecasting, computer vision & NLP
With this capability you can:
* Download or deploy the resulting model as a web service in Azure Machine Learning.
* Operationalize at scale, leveraging Azure Machine Learning [MLOps](concept-model-management-and-deployment.md) and [ML Pipelines (v1)](../concept-ml-pipelines.md) capabilities.
-Authoring AutoML models for vision tasks is supported via the Azure ML Python SDK. The resulting experimentation runs, models, and outputs can be accessed from the Azure Machine Learning studio UI.
+Authoring AutoML models for vision tasks is supported via the Azure ML Python SDK. The resulting experimentation jobs, models, and outputs can be accessed from the Azure Machine Learning studio UI.
Learn how to [set up AutoML training for computer vision models](../how-to-auto-train-image-models.md).
Instance segmentation | Tasks to identify objects in an image at the pixel level
[!INCLUDE [preview disclaimer](../../../includes/machine-learning-preview-generic-disclaimer.md)]
-Support for natural language processing (NLP) tasks in automated ML allows you to easily generate models trained on text data for text classification and named entity recognition scenarios. Authoring automated ML trained NLP models is supported via the Azure Machine Learning Python SDK. The resulting experimentation runs, models, and outputs can be accessed from the Azure Machine Learning studio UI.
+Support for natural language processing (NLP) tasks in automated ML allows you to easily generate models trained on text data for text classification and named entity recognition scenarios. Authoring automated ML trained NLP models is supported via the Azure Machine Learning Python SDK. The resulting experimentation jobs, models, and outputs can be accessed from the Azure Machine Learning studio UI.
The NLP capability supports:
Using **Azure Machine Learning**, you can design and run your automated ML train
1. **Configure the compute target for model training**, such as your [local computer, Azure Machine Learning Computes, remote VMs, or Azure Databricks with SDK v1](../how-to-set-up-training-targets.md).
1. **Configure the automated machine learning parameters** that determine how many iterations over different models, hyperparameter settings, advanced preprocessing/featurization, and what metrics to look at when determining the best model.
-1. **Submit the training run.**
+1. **Submit the training job.**
1. **Review the results**
The following diagram illustrates this process.
-You can also inspect the logged run information, which [contains metrics](../how-to-understand-automated-ml.md) gathered during the run. The training run produces a Python serialized object (`.pkl` file) that contains the model and data preprocessing.
+You can also inspect the logged job information, which [contains metrics](../how-to-understand-automated-ml.md) gathered during the job. The training job produces a Python serialized object (`.pkl` file) that contains the model and data preprocessing.
While model building is automated, you can also [learn how important or relevant features are](../how-to-configure-auto-train.md) to the generated models.
The web interface for automated ML always uses a remote [compute target](../conc
### Choose compute target

Consider these factors when choosing your compute target:
- * **Choose a local compute**: If your scenario is about initial explorations or demos using small data and short trains (i.e. seconds or a couple of minutes per child run), training on your local computer might be a better choice. There is no setup time, the infrastructure resources (your PC or VM) are directly available.
- * **Choose a remote ML compute cluster**: If you are training with larger datasets like in production training creating models which need longer trains, remote compute will provide much better end-to-end time performance because `AutoML` will parallelize trains across the cluster's nodes. On a remote compute, the start-up time for the internal infrastructure will add around 1.5 minutes per child run, plus additional minutes for the cluster infrastructure if the VMs are not yet up and running.
+ * **Choose a local compute**: If your scenario is about initial explorations or demos using small data and short training runs (that is, seconds or a couple of minutes per child job), training on your local computer might be a better choice. There is no setup time, and the infrastructure resources (your PC or VM) are directly available.
+ * **Choose a remote ML compute cluster**: If you are training with larger datasets, as in production scenarios that create models requiring longer training runs, remote compute provides much better end-to-end time performance because `AutoML` parallelizes training runs across the cluster's nodes. On a remote compute, the start-up time for the internal infrastructure adds around 1.5 minutes per child job, plus additional minutes for the cluster infrastructure if the VMs are not yet up and running.
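As a rough illustration of the trade-off above, here is a back-of-envelope comparison. Every number except the ~1.5-minute per-child start-up mentioned in the text (job count, per-child training time, cluster size, warm-up time) is an invented assumption, not a measured AutoML figure:

```python
# Back-of-envelope comparison: local sequential training vs. remote cluster.
# All workload numbers are illustrative assumptions; only the 1.5-minute
# per-child start-up comes from the documentation above.
child_jobs = 40                 # assumed number of child jobs
train_minutes_per_child = 4     # assumed training time per child job
startup_per_child = 1.5         # per-child infrastructure start-up (per docs)
cluster_nodes = 8               # assumed cluster size
cluster_warmup = 10             # assumed extra minutes if VMs are cold

local_minutes = child_jobs * train_minutes_per_child
remote_minutes = cluster_warmup + (child_jobs / cluster_nodes) * (
    train_minutes_per_child + startup_per_child)

print(f"local, sequential: {local_minutes} min")   # 160 min
print(f"remote, 8 nodes:   {remote_minutes} min")  # 37.5 min
```

Even with the per-child overhead, parallelizing across nodes dominates once the number of child jobs grows.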
### Pros and cons

Consider these pros and cons when choosing to use local vs. remote.

| | Pros (Advantages) |Cons (Handicaps) |
|-|-|-|
-|**Local compute target** | <li> No environment start-up time | <li> Subset of features<li> Can't parallelize runs <li> Worse for large data. <li>No data streaming while training <li> No DNN-based featurization <li> Python SDK only |
-|**Remote ML compute clusters**| <li> Full set of features <li> Parallelize child runs <li> Large data support<li> DNN-based featurization <li> Dynamic scalability of compute cluster on demand <li> No-code experience (web UI) also available | <li> Start-up time for cluster nodes <li> Start-up time for each child run |
+|**Local compute target** | <li> No environment start-up time | <li> Subset of features<li> Can't parallelize jobs <li> Worse for large data. <li>No data streaming while training <li> No DNN-based featurization <li> Python SDK only |
+|**Remote ML compute clusters**| <li> Full set of features <li> Parallelize child jobs <li> Large data support<li> DNN-based featurization <li> Dynamic scalability of compute cluster on demand <li> No-code experience (web UI) also available | <li> Start-up time for cluster nodes <li> Start-up time for each child job |
### Feature availability
More features are available when you use the remote compute, as shown in the tab
| Out-of-the-box GPU support (training and inference) | ✓ | |
| Image Classification and Labeling support | ✓ | |
| Auto-ARIMA, Prophet and ForecastTCN models for forecasting | ✓ | |
-| Multiple runs/iterations in parallel | ✓ | |
+| Multiple jobs/iterations in parallel | ✓ | |
| Create models with interpretability in AutoML studio web experience UI | ✓ | |
| Feature engineering customization in studio web experience UI| ✓ | |
| Azure ML hyperparameter tuning | ✓ | |
| Azure ML Pipeline workflow support | ✓ | |
-| Continue a run | ✓ | |
+| Continue a job | ✓ | |
| Forecasting | ✓ | ✓ |
| Create and run experiments in notebooks | ✓ | ✓ |
| Register and visualize experiment's info and metrics in UI | ✓ | ✓ |
To help confirm that such bias isn't applied to the final recommended model, aut
Learn how to [configure AutoML experiments to use test data (preview) with the SDK (v1)](../how-to-configure-cross-validation-data-splits.md#provide-test-data-preview) or with the [Azure Machine Learning studio](../how-to-use-automated-ml-for-ml-models.md#create-and-run-experiment).
-You can also [test any existing automated ML model (preview) (v1)](../how-to-configure-auto-train.md)), including models from child runs, by providing your own test data or by setting aside a portion of your training data.
+You can also [test any existing automated ML model (preview) (v1)](../how-to-configure-auto-train.md), including models from child jobs, by providing your own test data or by setting aside a portion of your training data.
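Setting aside a portion of training data, as mentioned above, amounts to a holdout split. A minimal stand-alone sketch in plain Python (an illustrative stand-in, not the AutoML API; the data and fraction are arbitrary):

```python
import random

def holdout_split(rows, test_fraction=0.2, seed=42):
    """Shuffle rows and set aside a fraction as a held-out test set."""
    rng = random.Random(seed)           # fixed seed for a reproducible split
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

data = list(range(100))                 # stand-in for training rows
train, test = holdout_split(data)
print(len(train), len(test))            # 80 20
```

The held-out rows never influence model selection, which is what makes the final test metric an unbiased estimate.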
## Feature engineering
Enable this setting with:
## <a name="ensemble"></a> Ensemble models
-Automated machine learning supports ensemble models, which are enabled by default. Ensemble learning improves machine learning results and predictive performance by combining multiple models as opposed to using single models. The ensemble iterations appear as the final iterations of your run. Automated machine learning uses both voting and stacking ensemble methods for combining models:
+Automated machine learning supports ensemble models, which are enabled by default. Ensemble learning improves machine learning results and predictive performance by combining multiple models as opposed to using single models. The ensemble iterations appear as the final iterations of your job. Automated machine learning uses both voting and stacking ensemble methods for combining models:
* **Voting**: predicts based on the weighted average of predicted class probabilities (for classification tasks) or predicted regression targets (for regression tasks).
* **Stacking**: stacking combines heterogeneous models and trains a meta-model based on the output from the individual models. The current default meta-models are LogisticRegression for classification tasks and ElasticNet for regression/forecasting tasks.
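The voting method can be sketched as a weighted average of class probabilities. This is an illustrative stand-in, not AutoML's actual implementation; the per-model probabilities and weights below are invented example values:

```python
# Illustrative sketch of "soft voting": a weighted average of each model's
# predicted class probabilities. Probabilities and weights are made up,
# not output of a real AutoML run.
def soft_vote(probabilities, weights):
    """Combine per-model class probabilities by weighted average."""
    total = sum(weights)
    n_classes = len(probabilities[0])
    return [
        sum(w * p[c] for p, w in zip(probabilities, weights)) / total
        for c in range(n_classes)
    ]

# Three models' predicted probabilities for two classes [A, B]
model_probs = [[0.9, 0.1], [0.6, 0.4], [0.8, 0.2]]
weights = [0.5, 0.25, 0.25]
print(soft_vote(model_probs, weights))  # class A dominates
```

Stacking differs in that the combination weights are themselves learned by a meta-model rather than fixed in advance.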
machine-learning How To Deploy Mlflow Models https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/machine-learning/v1/how-to-deploy-mlflow-models.md
The following diagram demonstrates that with the MLflow deploy API and Azure Mac
## Prerequisites

* A machine learning model. If you don't have a trained model, find the notebook example that best fits your compute scenario in [this repo](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/ml-frameworks/using-mlflow) and follow its instructions.
-* [Set up the MLflow Tracking URI to connect Azure Machine Learning](how-to-use-mlflow.md#track-local-runs).
+* [Set up the MLflow Tracking URI to connect Azure Machine Learning](how-to-use-mlflow.md#track-runs-from-your-local-machine-or-remote-compute).
* Install the `azureml-mlflow` package.
  * This package automatically brings in `azureml-core` of the [Azure Machine Learning Python SDK](/python/api/overview/azure/ml/install), which provides the connectivity for MLflow to access your workspace.
* See which [access permissions you need to perform your MLflow operations with your workspace](../how-to-assign-roles.md#mlflow-operations).
machine-learning How To Use Mlflow https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/machine-learning/v1/how-to-use-mlflow.md
> * [v1](how-to-use-mlflow.md) > * [v2 (current version)](../how-to-use-mlflow-cli-runs.md)
+In this article, learn how to enable [MLflow Tracking](https://mlflow.org/docs/latest/quickstart.html#using-the-tracking-api) to connect Azure Machine Learning as the backend of your MLflow experiments.
-In this article, learn how to enable MLflow's tracking URI and logging API, collectively known as [MLflow Tracking](https://mlflow.org/docs/latest/quickstart.html#using-the-tracking-api), to connect Azure Machine Learning as the backend of your MLflow experiments.
+[MLflow](https://www.mlflow.org) is an open-source library for managing the lifecycle of your machine learning experiments. MLflow Tracking is a component of MLflow that logs and tracks your training run metrics and model artifacts, no matter your experiment's environment--locally on your computer, on a remote compute target, a virtual machine, or an [Azure Databricks cluster](../how-to-use-mlflow-azure-databricks.md).
+See [MLflow and Azure Machine Learning](../concept-mlflow.md) for all supported MLflow and Azure Machine Learning functionality including MLflow Project support (preview) and model deployment.
+
> [!TIP]
-> For a more streamlined experience, see how to [Track experiments with the MLflow SDK or the Azure Machine Learning CLI (v2) (preview)](../how-to-use-mlflow-cli-runs.md)
+> The information in this document is primarily for data scientists and developers who want to monitor the model training process. If you are an administrator interested in monitoring resource usage and events from Azure Machine Learning, such as quotas, completed training runs, or completed model deployments, see [Monitoring Azure Machine Learning](../monitor-azure-machine-learning.md).
-Supported capabilities include:
+> [!NOTE]
+> You can use the [MLflow Skinny client](https://github.com/mlflow/mlflow/blob/master/README_SKINNY.rst) which is a lightweight MLflow package without SQL storage, server, UI, or data science dependencies. This is recommended for users who primarily need the tracking and logging capabilities without importing the full suite of MLflow features including deployments.
-+ Track and log experiment metrics and artifacts in your [Azure Machine Learning workspace](concept-azure-machine-learning-architecture.md#workspace). If you already use MLflow Tracking for your experiments, the workspace provides a centralized, secure, and scalable location to store training metrics and models.
+## Prerequisites
-+ [Submit training jobs with MLflow Projects with Azure Machine Learning backend support (preview)](../how-to-train-mlflow-projects.md). You can submit jobs locally with Azure Machine Learning tracking or migrate your runs to the cloud like via an [Azure Machine Learning Compute](../how-to-create-attach-compute-cluster.md).
+* Install the `azureml-mlflow` package.
+* [Create an Azure Machine Learning Workspace](../quickstart-create-resources.md).
+ * See which [access permissions you need to perform your MLflow operations with your workspace](../how-to-assign-roles.md#mlflow-operations).
-+ Track and manage models in MLflow and Azure Machine Learning model registry.
+* Install and [set up Azure Machine Learning CLI](reference-azure-machine-learning-cli.md) and make sure you install the ml extension.
+* Install and set up [Azure Machine Learning SDK for Python](introduction.md#sdk-v1).
-[MLflow](https://www.mlflow.org) is an open-source library for managing the life cycle of your machine learning experiments. MLflow Tracking is a component of MLflow that logs and tracks your training run metrics and model artifacts, no matter your experiment's environment--locally on your computer, on a remote compute target, a virtual machine, or an [Azure Databricks cluster](../how-to-use-mlflow-azure-databricks.md).
+## Track runs from your local machine or remote compute
-See [MLflow and Azure Machine Learning](concept-mlflow-v1.md) for additional MLflow and Azure Machine Learning functionality integrations.
+Tracking with MLflow in Azure Machine Learning lets you store the logged metrics and artifacts from runs that were executed on your local machine into your Azure Machine Learning workspace.
-The following diagram illustrates that with MLflow Tracking, you track an experiment's run metrics and store model artifacts in your Azure Machine Learning workspace.
+### Set up tracking environment
-![mlflow with azure machine learning diagram](./media/how-to-use-mlflow/mlflow-diagram-track.png)
+To track a run that is not running on Azure Machine Learning compute (from now on referred to as *"local compute"*), you need to point your local compute to the Azure Machine Learning MLflow Tracking URI.
-> [!TIP]
-> The information in this document is primarily for data scientists and developers who want to monitor the model training process. If you are an administrator interested in monitoring resource usage and events from Azure Machine Learning, such as quotas, completed training runs, or completed model deployments, see [Monitoring Azure Machine Learning](../monitor-azure-machine-learning.md).
+> [!NOTE]
+> When running on Azure Compute (Azure Notebooks, Jupyter Notebooks hosted on Azure Compute Instances or Compute Clusters), you don't have to configure the tracking URI. It's automatically configured for you.
-> [!NOTE]
-> You can use the [MLflow Skinny client](https://github.com/mlflow/mlflow/blob/master/README_SKINNY.rst) which is a lightweight MLflow package without SQL storage, server, UI, or data science dependencies. This is recommended for users who primarily need the tracking and logging capabilities without importing the full suite of MLflow features including deployments.
+# [Using the Azure ML SDK](#tab/azuremlsdk)
-## Prerequisites
-* Install the `azureml-mlflow` package.
- * This package automatically brings in `azureml-core` of the [The Azure Machine Learning Python SDK](/python/api/overview/azure/ml/install), which provides the connectivity for MLflow to access your workspace.
-* [Create an Azure Machine Learning Workspace](../quickstart-create-resources.md).
- * See which [access permissions you need to perform your MLflow operations with your workspace](../how-to-assign-roles.md#mlflow-operations).
+You can get the Azure ML MLflow tracking URI using the [Azure Machine Learning SDK v1 for Python](introduction.md#sdk-v1). Ensure you have the library `azureml-sdk` installed on the cluster you are using. The following sample gets the unique MLflow tracking URI associated with your workspace. Then the method [`set_tracking_uri()`](https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.set_tracking_uri) points MLflow to that tracking URI.
-## Track local runs
+1. Using the workspace configuration file:
-MLflow Tracking with Azure Machine Learning lets you store the logged metrics and artifacts runs that were executed on your local machine into your Azure Machine Learning workspace. For more information, see [How to log and view metrics (v2)](how-to-log-view-metrics.md).
+ ```Python
+ from azureml.core import Workspace
+ import mlflow
+
+ ws = Workspace.from_config()
+ mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())
+ ```
-### Set up tracking environment
+ > [!TIP]
+ > You can download the workspace configuration file by:
+ > 1. Navigate to [Azure ML studio](https://ml.azure.com)
+ > 2. Select the upper-right corner of the page -> Download config file.
+ > 3. Save the file `config.json` in the directory where you are working.
-To track a local run, you need to point your local machine to the Azure Machine Learning MLflow Tracking URI.
+1. Using the subscription ID, resource group name and workspace name:
-Import the `mlflow` and [`Workspace`](/python/api/azureml-core/azureml.core.workspace%28class%29) classes to access MLflow's tracking URI and configure your workspace.
+ ```Python
+ from azureml.core import Workspace
+ import mlflow
-In the following code, the `get_mlflow_tracking_uri()` method assigns a unique tracking URI address to the workspace, `ws`, and `set_tracking_uri()` points the MLflow tracking URI to that address.
+ #Enter details of your AzureML workspace
+ subscription_id = '<SUBSCRIPTION_ID>'
+ resource_group = '<RESOURCE_GROUP>'
+ workspace_name = '<AZUREML_WORKSPACE_NAME>'
-```Python
+ ws = Workspace.get(name=workspace_name,
+ subscription_id=subscription_id,
+ resource_group=resource_group)
+
+ mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())
+ ```
+
+# [Using an environment variable](#tab/environ)
++
+Another option is to set the MLflow environment variable [MLFLOW_TRACKING_URI](https://mlflow.org/docs/latest/tracking.html#logging-to-a-tracking-server) directly in your terminal.
+
+```Azure CLI
+export MLFLOW_TRACKING_URI=$(az ml workspace show --query mlflow_tracking_uri | sed 's/"//g')
+```
+
+>[!IMPORTANT]
+> Make sure you are logged in to your Azure account on your local machine; otherwise, the tracking URI returns an empty string. If you are using any Azure ML compute, the tracking environment and experiment name are already configured.
+
+# [Building the MLflow tracking URI](#tab/build)
+
+The Azure Machine Learning tracking URI can be constructed using the subscription ID, the region where the resource is deployed, the resource group name, and the workspace name. The following code sample shows how:
+
+```python
import mlflow
-from azureml.core import Workspace
-ws = Workspace.from_config()
+region = ""
+subscription_id = ""
+resource_group = ""
+workspace_name = ""
-mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())
+azureml_mlflow_uri = f"azureml://{region}.api.azureml.ms/mlflow/v1.0/subscriptions/{subscription_id}/resourceGroups/{resource_group}/providers/Microsoft.MachineLearningServices/workspaces/{workspace_name}"
+mlflow.set_tracking_uri(azureml_mlflow_uri)
```
-### Set experiment name
+> [!NOTE]
+> You can also get this URL by:
+> 1. Navigate to [Azure ML studio](https://ml.azure.com)
+> 2. Select the upper-right corner of the page -> View all properties in Azure Portal -> MLflow tracking URI.
+> 3. Copy the URI and use it with the method `mlflow.set_tracking_uri`.
-All MLflow runs are logged to the active experiment, which can be set with the MLflow SDK or Azure CLI.
+
-Set the MLflow experiment name with [`set_experiment()`](https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.set_experiment) command.
+### Set experiment name
+All MLflow runs are logged to the active experiment. By default, runs are logged to an experiment named `Default` that is automatically created for you. To configure the experiment you want to work on, use the MLflow command [`mlflow.set_experiment()`](https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.set_experiment).
+
```Python
experiment_name = 'experiment_with_mlflow'
mlflow.set_experiment(experiment_name)
```
+> [!TIP]
+> When submitting jobs using the Azure ML SDK, you can set the experiment name with the `experiment_name` property when you submit the job. You don't have to configure it in your training script.
+ ### Start training run After you set the MLflow experiment name, you can start your training run with `start_run()`. Then use `log_metric()` to activate the MLflow logging API and begin logging your training run metrics.
with mlflow.start_run() as mlflow_run:
    mlflow.log_artifact("helloworld.txt")
```
-## Track remote runs
+For details about how to log metrics, parameters, and artifacts in a run using MLflow, see [How to log and view metrics](../how-to-log-view-metrics.md).
-Remote runs let you train your models on more powerful computes, such as GPU enabled virtual machines, or Machine Learning Compute clusters. See [Use compute targets for model training](../how-to-set-up-training-targets.md) to learn about different compute options.
+## Track runs running on Azure Machine Learning
-MLflow Tracking with Azure Machine Learning lets you store the logged metrics and artifacts from your remote runs into your Azure Machine Learning workspace. Any run with MLflow Tracking code in it will have metrics logged automatically to the workspace.
+
+Remote runs (jobs) let you train your models in a more robust and repeatable way. They can also leverage more powerful computes, such as Machine Learning Compute clusters. See [Use compute targets for model training](../how-to-set-up-training-targets.md) to learn about different compute options.
+
+When submitting runs, Azure Machine Learning automatically configures MLflow to work with the workspace the run is running in. This means that there is no need to configure the MLflow tracking URI. On top of that, experiments are automatically named based on the details of the experiment submission.
+
+> [!IMPORTANT]
+> When submitting training jobs to Azure Machine Learning, you don't have to configure the MLflow tracking URI in your training logic, as it is already configured for you. You don't need to configure the experiment name in your training routine either.
+
+### Creating a training routine
First, you should create a `src` subdirectory and create a file with your training code in a `train.py` file in the `src` subdirectory. All your training code will go into the `src` subdirectory, including `train.py`.
if __name__ == "__main__":
main() ```
-Load training script to submit an experiment.
-
-```Python
-script_dir = "src"
-training_script = 'train.py'
-with open("{}/{}".format(script_dir,training_script), 'r') as f:
- print(f.read())
-```
+### Configuring the experiment
-In your script, configure your compute and training run environment with the [`Environment`](/python/api/azureml-core/azureml.core.environment.environment) class.
+You will need to use Python to submit the experiment to Azure Machine Learning. In a notebook or Python file, configure your compute and training run environment with the [`Environment`](/python/api/azureml-core/azureml.core.environment.environment) class.
```Python from azureml.core import Environment
The metrics and artifacts from MLflow logging are tracked in your workspace. To
Retrieve run metrics using MLflow [get_run()](https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.get_run).

```Python
-from mlflow.entities import ViewType
from mlflow.tracking import MlflowClient
-# Retrieve run ID for the last run experiement
-current_experiment=mlflow.get_experiment_by_name(experiment_name)
-runs = mlflow.search_runs(experiment_ids=current_experiment.experiment_id, run_view_type=ViewType.ALL)
-run_id = runs.tail(1)["run_id"].tolist()[0]
-
# Use MLflow to retrieve the run that was just completed
client = MlflowClient()
+run_id = mlflow_run.info.run_id
finished_mlflow_run = MlflowClient().get_run(run_id)

metrics = finished_mlflow_run.data.metrics
tags = finished_mlflow_run.data.tags
params = finished_mlflow_run.data.params

print(metrics, tags, params)
```
-### Retrieve artifacts with MLFLow
- To view the artifacts of a run, you can use [MlFlowClient.list_artifacts()](https://mlflow.org/docs/latest/python_api/mlflow.tracking.html#mlflow.tracking.MlflowClient.list_artifacts)

```Python
To download an artifact to the current directory, you can use [MLFlowClient.down
client.download_artifacts(run_id, "helloworld.txt", ".")
```
-### Compare and query
+For more details about how to retrieve information from experiments and runs in Azure Machine Learning using MLflow, see [Manage experiments and runs with MLflow](../how-to-track-experiments-mlflow.md).
+
+## Compare and query
Compare and query all MLflow runs in your Azure Machine Learning workspace with the following code. [Learn more about how to query runs with MLflow](https://mlflow.org/docs/latest/search-syntax.html#programmatically-searching-runs).
mlflow.autolog()
[Learn more about Automatic logging with MLflow](https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.autolog).
-## Manage models
+## Manage models
-Register and track your models with the [Azure Machine Learning model registry](concept-model-management-and-deployment.md#register-package-and-deploy-models-from-anywhere), which supports the MLflow model registry. Azure Machine Learning models are aligned with the MLflow model schema making it easy to export and import these models across different workflows. The MLflow related metadata such as, run ID is also tagged with the registered model for traceability. Users can submit training runs, register, and deploy models produced from MLflow runs.
+Register and track your models with the [Azure Machine Learning model registry](concept-model-management-and-deployment.md#register-package-and-deploy-models-from-anywhere), which supports the MLflow model registry. Azure Machine Learning models are aligned with the MLflow model schema making it easy to export and import these models across different workflows. The MLflow-related metadata, such as run ID, is also tracked with the registered model for traceability. Users can submit training runs, register, and deploy models produced from MLflow runs.
-If you want to deploy and register your production ready model in one step, see [Deploy and register MLflow models](how-to-deploy-mlflow-models.md).
+If you want to deploy and register your production ready model in one step, see [Deploy and register MLflow models](../how-to-deploy-mlflow-models.md).
To register and view a model from a run, use the following steps:
In the following example, the registered model `my-model` has MLflow tracking metadata tagged.
- ![register-mlflow-model](./media/how-to-use-mlflow/registered-mlflow-model.png)
+ ![register-mlflow-model](../media/how-to-use-mlflow-cli-runs/registered-mlflow-model.png)
1. Select the **Artifacts** tab to see all the model files that align with the MLflow model schema (conda.yaml, MLmodel, model.pkl).
- ![model-schema](./media/how-to-use-mlflow/mlflow-model-schema.png)
+ ![model-schema](../media/how-to-use-mlflow-cli-runs/mlflow-model-schema.png)
1. Select MLmodel to see the MLmodel file generated by the run.
- ![MLmodel-schema](./media/how-to-use-mlflow/mlmodel-view.png)
+ ![MLmodel-schema](../media/how-to-use-mlflow-cli-runs/mlmodel-view.png)
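For reference, the MLmodel file mentioned in the last step is a small YAML descriptor of the model's flavors. A typical file for a scikit-learn model looks roughly like the following; every field value here is an illustrative placeholder, not output of a real run:

```yaml
# Illustrative MLmodel descriptor; all values are placeholders.
artifact_path: model
flavors:
  python_function:
    env: conda.yaml
    loader_module: mlflow.sklearn
    model_path: model.pkl
    python_version: 3.8.10
  sklearn:
    pickled_model: model.pkl
    serialization_format: cloudpickle
    sklearn_version: 0.24.2
run_id: <run-id>
utc_time_created: '<utc-timestamp>'
```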
## Clean up resources
marketplace Marketplace Geo Availability Currencies https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/marketplace/marketplace-geo-availability-currencies.md
Previously updated : 12/03/2021 Last updated : 07/14/2022 # Geographic availability and currency support for the commercial marketplace
openshift Troubleshoot https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/openshift/troubleshoot.md
Title: Troubleshoot Azure Red Hat OpenShift description: Troubleshoot and resolve common issues with Azure Red Hat OpenShift--++ Last updated 05/08/2019
This article details some common issues encountered while creating or managing M
## Retrying the creation of a failed cluster

If creating an Azure Red Hat OpenShift cluster using the `az` CLI command fails, retrying the creation will continue to fail.
-Use `az openshift delete` to delete the failed cluster, then create an entirely new cluster.
+Use `az aro delete` to delete the failed cluster, then create an entirely new cluster.
## Hidden Azure Red Hat OpenShift cluster resource group
-Currently, the `Microsoft.ContainerService/openShiftManagedClusters` resource that's automatically created by the Azure CLI (`az openshift create` command) is hidden in the Azure portal. In the **Resource group** view, check **Show hidden types** to view the resource group.
+Currently, the `RedHatOpenShift/OpenShiftClusters` resource that's automatically created by the Azure CLI (`az aro create` command) is hidden in the Azure portal. In the **Resource group** view, check **Show hidden types** to view the resource group.
![Screenshot of the hidden type checkbox in the portal](./media/aro-portal-hidden-type.png)
If creating a cluster results in an error that `No registered resource provider
## Next steps -- Try the [Red Hat OpenShift Help Center](https://help.openshift.com/) for more on OpenShift troubleshooting.
+- Visit the [OpenShift documentation](https://docs.openshift.com/container-platform)
-- Find answers to [frequently asked questions about Azure Red Hat OpenShift](openshift-faq.yml).
+- Contact [Azure Support](https://azure.microsoft.com/support/) or [Red Hat Support](https://support.redhat.com/) to open a support case.
+
+- Find answers to [frequently asked questions about Azure Red Hat OpenShift](openshift-faq.yml).
orbital Prepare Network https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/orbital/prepare-network.md
Prerequisites:
- An entire subnet that can be dedicated to Orbital GSaaS in your virtual network in your resource group. Steps:
-1. Delegate a subnet to service named: Microsoft.Orbital/orbitalGateways. Follow instructions here: [Add or remove a subnet delegation in an Azure virtual network](/azure/virtual-network/manage-subnet-delegation).
+1. Delegate a subnet to the service named Microsoft.Orbital/orbitalGateways. Follow the instructions in [Add or remove a subnet delegation in an Azure virtual network](../virtual-network/manage-subnet-delegation.md).
> [!NOTE] > Address range needs to be at least /24 (example 10.0.0.0/23)
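The note above requires the delegated subnet's address range to be a /24 or larger (so a /23, which is larger, qualifies). A quick way to sanity-check a candidate range is Python's stdlib `ipaddress` module; the subnet values below are illustrative, not taken from the article:

```python
import ipaddress

def subnet_is_large_enough(cidr: str, max_prefix: int = 24) -> bool:
    """Return True if the CIDR block is a /24 or larger (prefix length <= 24)."""
    network = ipaddress.ip_network(cidr, strict=True)
    return network.prefixlen <= max_prefix

# A /23 (512 addresses) satisfies the "at least /24" requirement; a /25 does not.
print(subnet_is_large_enough("10.0.0.0/23"))  # True
print(subnet_is_large_enough("10.0.0.0/25"))  # False
```

`strict=True` also catches host addresses mistakenly supplied as network ranges, which raises a `ValueError` rather than silently accepting an invalid block.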
orbital Receive Real Time Telemetry https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/orbital/receive-real-time-telemetry.md
To verify that events are being received in your Event Hubs, you can check the g
### Verify content of telemetry data You can enable the Event Hubs Capture feature, which automatically delivers the telemetry data to an Azure Blob storage account of your choosing.
-Follow the [instructions to enable Capture](/azure/event-hubs/event-hubs-capture-enable-through-portal). Once enabled, you can check your container and view/download the data.
+Follow the [instructions to enable Capture](../event-hubs/event-hubs-capture-enable-through-portal.md). Once enabled, you can check your container and view/download the data.
## Event Hubs consumer Code: Event Hubs Consumer. Event Hubs documentation provides guidance on how to write simple consumer apps to receive events from your Event Hubs:-- [Python](/azure/event-hubs/event-hubs-python-get-started-send)-- [.NET](/azure/event-hubs/event-hubs-dotnet-standard-getstarted-send)-- [Java](/azure/event-hubs/event-hubs-java-get-started-send)-- [JavaScript](/azure/event-hubs/event-hubs-node-get-started-send)
+- [Python](../event-hubs/event-hubs-python-get-started-send.md)
+- [.NET](../event-hubs/event-hubs-dotnet-standard-getstarted-send.md)
+- [Java](../event-hubs/event-hubs-java-get-started-send.md)
+- [JavaScript](../event-hubs/event-hubs-node-get-started-send.md)
## Understanding telemetry points
The ground station provides telemetry using Avro as a schema. The schema is belo
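As noted above, the telemetry schema is expressed in Avro, and Avro schemas are plain JSON documents, so their structure can be inspected with nothing but the standard library. This sketch uses a hypothetical, minimal record schema; the real telemetry schema is the one published in the article, and these field names are examples only:

```python
import json

# A hypothetical minimal Avro record schema (illustrative fields only).
schema_json = """
{
  "type": "record",
  "name": "TelemetryEvent",
  "fields": [
    {"name": "timestamp", "type": "string"},
    {"name": "signalStrength", "type": "double"},
    {"name": "channelName", "type": "string"}
  ]
}
"""

schema = json.loads(schema_json)
field_names = [f["name"] for f in schema["fields"]]
print(field_names)  # ['timestamp', 'signalStrength', 'channelName']
```

Decoding actual Avro-encoded payloads requires an Avro library, but listing a schema's declared fields this way is often enough to validate what telemetry points a consumer should expect.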
## Next steps -- [Event Hubs using Python Getting Started](/azure/event-hubs/event-hubs-python-get-started-send)-- [Azure Event Hubs client library for Python code samples](/azure-sdk-for-python/tree/main/sdk/eventhub/azure-eventhub/samples/async_samples)-
+- [Event Hubs using Python Getting Started](../event-hubs/event-hubs-python-get-started-send.md)
+- [Azure Event Hubs client library for Python code samples](/azure-sdk-for-python/tree/main/sdk/eventhub/azure-eventhub/samples/async_samples)
purview Register Scan Power Bi Tenant https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/purview/register-scan-power-bi-tenant.md
Use any of the following deployment checklists during the setup or for troublesh
3. Under **Authentication**, **Allow public client flows** is enabled. 2. Review network configuration and validate if: 1. A [private endpoint for Power BI tenant](/power-bi/enterprise/service-security-private-links) is deployed. (Optional)
- 2. All required [private endpoints for Microsoft Purview](/azure/purview/catalog-private-link-end-to-end) are deployed.
+ 2. All required [private endpoints for Microsoft Purview](./catalog-private-link-end-to-end.md) are deployed.
3. Network connectivity from Self-hosted runtime to Power BI tenant is enabled. 3. Network connectivity from Self-hosted runtime to Microsoft services is enabled through private network.
Now that you've registered your source, follow the below guides to learn more ab
- [Data Estate Insights in Microsoft Purview](concept-insights.md) - [Lineage in Microsoft Purview](catalog-lineage-user-guide.md)-- [Search Data Catalog](how-to-search-catalog.md)
+- [Search Data Catalog](how-to-search-catalog.md)
remote-rendering Troubleshoot https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/remote-rendering/resources/troubleshoot.md
models
## HoloLens2 'Take a Picture' (MRC) does not show any local or remote content
-This problem usually occurs if a project is updated from WMR to OpenXR and the project accessed the [HolographicViewConfiguration Class (Windows.Graphics.Holographic)](https://docs.microsoft.com/uwp/api/windows.graphics.holographic.holographicviewconfiguration?view=winrt-22621) settings. This API is not supported in OpenXR and must not be accessed.
+This problem usually occurs if a project is updated from WMR to OpenXR and the project accessed the [HolographicViewConfiguration Class (Windows.Graphics.Holographic)](/uwp/api/windows.graphics.holographic.holographicviewconfiguration?view=winrt-22621) settings. This API is not supported in OpenXR and must not be accessed.
## Next steps
search Monitor Azure Cognitive Search https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/search/monitor-azure-cognitive-search.md
# Monitoring Azure Cognitive Search
-[Azure Monitor](../azure-monitor/overview.md) is enabled with every subscription to provide monitoring capabilities over all Azure resources, including Cognitive Search. When you sign up for search, Azure Monitor collects [**activity logs**](/azure/azure-monitor/agents/data-sources#azure-activity-log) and [**metrics**](/azure/azure-monitor/essentials/data-platform-metrics) as soon as you start using the service.
+[Azure Monitor](../azure-monitor/overview.md) is enabled with every subscription to provide monitoring capabilities over all Azure resources, including Cognitive Search. When you sign up for search, Azure Monitor collects [**activity logs**](../azure-monitor/agents/data-sources.md#azure-activity-log) and [**metrics**](../azure-monitor/essentials/data-platform-metrics.md) as soon as you start using the service.
Optionally, you can enable diagnostic settings to collect [**resource logs**](../azure-monitor/essentials/resource-logs.md). Resource logs contain detailed information about search service operations that's useful for deeper analysis and investigation.
For REST calls, use an [admin API key](search-security-api-keys.md) and [Postman
## Monitor activity logs
-In Azure Cognitive Search, [**activity logs**](/azure/azure-monitor/agents/data-sources#azure-activity-log) reflect control plane activity, such as service and capacity updates, or API key usage or management. Activity logs are collected [free of charge](/azure/azure-monitor/usage-estimated-costs#pricing-model), with no configuration required. Data retention is 90 days, but you can configure durable storage for longer retention.
+In Azure Cognitive Search, [**activity logs**](../azure-monitor/agents/data-sources.md#azure-activity-log) reflect control plane activity, such as service and capacity updates, or API key usage or management. Activity logs are collected [free of charge](../azure-monitor/usage-estimated-costs.md#pricing-model), with no configuration required. Data retention is 90 days, but you can configure durable storage for longer retention.
1. In the Azure portal, find your search service. From the menu on the left, select **Activity logs** to view the logs for your search service. 1. Entries will often include **Get Admin Key**, one entry for every call that [provided an admin API key](search-security-api-keys.md) on the request. There are no details about the call itself, just a notification that the admin key was used. For insights into content (or data plane) operations, you'll need to enable diagnostic settings and collect resource logs.
-1. See [Azure Monitor activity log](/azure/azure-monitor/essentials/activity-log) for general guidance on working with activity logs.
+1. See [Azure Monitor activity log](../azure-monitor/essentials/activity-log.md) for general guidance on working with activity logs.
1. See [Management REST API reference](/rest/api/searchmanagement/) for control plane activity that might appear in the log.
The following screenshot shows the activity log signals that can be configured i
## Monitor metrics
-In Azure Cognitive Search, [**platform metrics**](/azure/azure-monitor/essentials/data-platform-metrics) measure query performance, indexing volume, and skillset invocation.
+In Azure Cognitive Search, [**platform metrics**](../azure-monitor/essentials/data-platform-metrics.md) measure query performance, indexing volume, and skillset invocation.
-Metrics are collected [free of charge](/azure/azure-monitor/usage-estimated-costs#pricing-model), with no configuration required. Platform metrics are stored for 93 days. However, in the portal you can only query a maximum of 30 days' worth of metrics data on any single chart. This limitation doesn't apply to log-based metrics.
+Metrics are collected [free of charge](../azure-monitor/usage-estimated-costs.md#pricing-model), with no configuration required. Platform metrics are stored for 93 days. However, in the portal you can only query a maximum of 30 days' worth of metrics data on any single chart. This limitation doesn't apply to log-based metrics.
1. In the Azure portal, find your search service. From the menu on the left, under Monitoring, select **Metrics** to open metrics explorer.
-1. See [Tutorial: Analyze metrics for an Azure resource](/azure/azure-monitor/essentials/tutorial-metrics) for general guidance on using metrics explorer.
+1. See [Tutorial: Analyze metrics for an Azure resource](../azure-monitor/essentials/tutorial-metrics.md) for general guidance on using metrics explorer.
-1. See [Microsoft.Search/searchServices (Azure Monitor)](/azure/azure-monitor/essentials/metrics-supported#microsoftsearchsearchservices) for the platform metrics of Azure Cognitive Search.
+1. See [Microsoft.Search/searchServices (Azure Monitor)](../azure-monitor/essentials/metrics-supported.md#microsoftsearchsearchservices) for the platform metrics of Azure Cognitive Search.
1. See [Monitoring data reference](monitor-azure-cognitive-search-data-reference.md) for supplemental descriptions and dimensions.
Metrics are collected [free of charge](/azure/azure-monitor/usage-estimated-cost
## Set up alerts
-Alerts help you to identify and address issues before they become a problem for application users. You can set alerts on [metrics](../azure-monitor/alerts/alerts-metric-overview.md), [resource logs](../azure-monitor/alerts/alerts-unified-log.md), and [activity logs](../azure-monitor/alerts/activity-log-alerts.md). Alerts are billable (see the [Pricing model](/azure/azure-monitor/usage-estimated-costs#pricing-model) for details).
+Alerts help you to identify and address issues before they become a problem for application users. You can set alerts on [metrics](../azure-monitor/alerts/alerts-metric-overview.md), [resource logs](../azure-monitor/alerts/alerts-unified-log.md), and [activity logs](../azure-monitor/alerts/activity-log-alerts.md). Alerts are billable (see the [Pricing model](../azure-monitor/usage-estimated-costs.md#pricing-model) for details).
1. In the Azure portal, find your search service. From the menu on the left, under Monitoring, select **Alerts** to open the alerts page.
-1. See [Tutorial: Create a metric alert for an Azure resource](/azure/azure-monitor/alerts/tutorial-metric-alert) for general guidance on setting up alerts from metrics explorer.
+1. See [Tutorial: Create a metric alert for an Azure resource](../azure-monitor/alerts/tutorial-metric-alert.md) for general guidance on setting up alerts from metrics explorer.
The following table describes several rules. On a search service, throttling or query latency that exceeds a given threshold are the most commonly used alerts, but you might also want to be notified if a search service is deleted.
The monitoring framework for Azure Cognitive Search is provided by [Azure Monito
+ [Analyze performance in Azure Cognitive Search](search-performance-analysis.md) + [Monitor queries](search-monitor-queries.md)
-+ [Monitor indexer-based indexing](search-howto-monitor-indexers.md)
++ [Monitor indexer-based indexing](search-howto-monitor-indexers.md)
security Secure Deploy https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/security/develop/secure-deploy.md
The focus of the release phase is readying a project for public release. This in
### Check your application's performance before you launch
-Check your application's performance before you launch it or deploy updates to production. Use Azure Load Testing to run cloud-based [load tests](/azure/load-testing/) to find performance problems in your application, improve deployment quality, make sure that your application is always up or available, and that your application can handle traffic for your launch.
+Check your application's performance before you launch it or deploy updates to production. Use Azure Load Testing to run cloud-based [load tests](../../load-testing/index.yml) to find performance problems in your application, improve deployment quality, make sure that your application is always up or available, and that your application can handle traffic for your launch.
### Install a web application firewall
Defender for Cloud Standard helps you:
In the following articles, we recommend security controls and activities that can help you design and develop secure applications. - [Design secure applications](secure-design.md)-- [Develop secure applications](secure-develop.md)
+- [Develop secure applications](secure-develop.md)
security Operational Best Practices https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/security/fundamentals/operational-best-practices.md
You can use [Azure Resource Manager](../../azure-resource-manager/templates/synt
**Detail**: [Azure Pipelines](/azure/devops/pipelines/index) is a solution for automating multiple-stage deployment and managing the release process. Create managed continuous deployment pipelines to release quickly, easily, and often. With Azure Pipelines, you can automate your release process, and you can have predefined approval workflows. Deploy on-premises and to the cloud, extend, and customize as required. **Best practice**: Check your app's performance before you launch it or deploy updates to production.
-**Detail**: Run cloud-based [load tests](/azure/load-testing/) to:
+**Detail**: Run cloud-based [load tests](../../load-testing/index.yml) to:
- Find performance problems in your app. - Improve deployment quality.
See [Azure security best practices and patterns](best-practices-and-patterns.md)
The following resources are available to provide more general information about Azure security and related Microsoft * [Azure Security Team Blog](/archive/blogs/azuresecurity/) - for up to date information on the latest in Azure Security
-* [Microsoft Security Response Center](https://technet.microsoft.com/library/dn440717.aspx) - where Microsoft security vulnerabilities, including issues with Azure, can be reported or via email to secure@microsoft.com
+* [Microsoft Security Response Center](https://technet.microsoft.com/library/dn440717.aspx) - where Microsoft security vulnerabilities, including issues with Azure, can be reported or via email to secure@microsoft.com
sentinel Billing https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/sentinel/billing.md
Title: Plan costs for Microsoft Sentinel
-description: Learn how to estimate your costs and billing for Microsoft Sentinel by using the pricing calculator and other methods.
+ Title: Plan costs, understand Microsoft Sentinel pricing and billing
+description: Learn how to plan your Microsoft Sentinel costs, and understand pricing and billing using the pricing calculator and other methods.
Previously updated : 02/22/2022 Last updated : 07/14/2022
+#Customer intent: As a SOC manager, plan Microsoft Sentinel costs so I can understand and optimize the costs of my SIEM.
-# Plan costs for Microsoft Sentinel
+# Plan costs and understand Microsoft Sentinel pricing and billing
-Microsoft Sentinel provides intelligent security analytics across your enterprise. The data for this analysis is stored in an Azure Monitor Log Analytics workspace. Microsoft Sentinel is billed based on the volume of data for analysis in Microsoft Sentinel and storage in the Azure Monitor Log Analytics workspace. For more information, see the [Microsoft Sentinel Pricing Page](https://azure.microsoft.com/pricing/details/microsoft-sentinel/).
+As you plan your Microsoft Sentinel deployment, you typically want to understand the Microsoft Sentinel pricing and billing models, so you can optimize your costs. Microsoft Sentinel security analytics data is stored in an Azure Monitor Log Analytics workspace. Billing is based on the volume of that data in Microsoft Sentinel and the Azure Monitor Log Analytics workspace storage. Learn more about [Microsoft Sentinel pricing](https://azure.microsoft.com/pricing/details/microsoft-sentinel/).
-Before you add any resources for the Microsoft Sentinel, use the [Azure pricing calculator](https://azure.microsoft.com/pricing/calculator/) to help estimate your costs.
+Before you add any resources for Microsoft Sentinel, use the [Azure pricing calculator](https://azure.microsoft.com/pricing/calculator/) to help estimate your costs.
Costs for Microsoft Sentinel are only a portion of the monthly costs in your Azure bill. Although this article explains how to plan costs and understand the billing for Microsoft Sentinel, you're billed for all Azure services and resources your Azure subscription uses, including Partner services.
Usage beyond these limits will be charged per the pricing listed on the [Microso
During your free trial, find resources for cost management, training, and more on the **News & guides > Free trial** tab in Microsoft Sentinel. This tab also displays details about the dates of your free trial, and how many days you have left until it expires. -
-## Identify data sources
+## Identify data sources and plan costs accordingly
Identify the data sources you're ingesting or plan to ingest to your workspace in Microsoft Sentinel. Microsoft Sentinel allows you to bring in data from one or more data sources. Some of these data sources are free, and others incur charges. For more information, see [Free data sources](#free-data-sources).
-## Estimate costs before using Microsoft Sentinel
+## Estimate costs and billing before using Microsoft Sentinel
If you're not yet using Microsoft Sentinel, you can use the [Microsoft Sentinel pricing calculator](https://azure.microsoft.com/pricing/calculator/?service=azure-sentinel) to estimate potential costs. Enter *Microsoft Sentinel* in the Search box and select the resulting Microsoft Sentinel tile. The pricing calculator helps you estimate your likely costs based on your expected data ingestion and retention.
To see your Azure bill, select **Cost Analysis** in the left navigation of **Cos
The costs shown in the following image are for example purposes only. They're not intended to reflect actual costs.
-![Screenshot showing the Microsoft Sentinel section of a sample Azure bill.](media/billing/sample-bill.png)
Microsoft Sentinel and Log Analytics charges appear on your Azure bill as separate line items based on your selected pricing plan. If you exceed your workspace's Commitment Tier usage in a given month, the Azure bill shows one line item for the Commitment Tier with its associated fixed cost, and a separate line item for the ingestion beyond the Commitment Tier, billed at your same Commitment Tier rate.
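The two-line-item behavior described above (a fixed Commitment Tier charge, plus any ingestion beyond the tier billed at the same effective per-GB rate) can be sketched in a few lines. The tier size and rates below are placeholders for illustration, not real Microsoft Sentinel prices:

```python
def monthly_sentinel_line_items(ingested_gb: float,
                                commitment_gb: float,
                                commitment_fixed_cost: float,
                                per_gb_rate: float) -> dict:
    """Split a month's charge into the fixed Commitment Tier line item and
    the overage line item, billed at the same effective per-GB rate."""
    overage_gb = max(0.0, ingested_gb - commitment_gb)
    return {
        "commitment_tier": commitment_fixed_cost,
        "overage": overage_gb * per_gb_rate,
    }

# Placeholder numbers only: a 100 GB commitment at a flat 200.00,
# with 20 GB of overage billed at 2.00 per GB.
print(monthly_sentinel_line_items(120, 100, 200.00, 2.00))
# {'commitment_tier': 200.0, 'overage': 40.0}
```

Under the commitment, the overage line item is simply zero and only the fixed charge appears, matching the bill layout the tabs below describe.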
The following tabs show how Microsoft Sentinel and Log Analytics costs appear in
#### [Commitment tier](#tab/commitment-tier)
-If you're billed at the commitment tier rate, the following table shows how Microsoft Sentinel and Log Analytics costs appear in the **Service name** and **Meter** columns of your Azure bill.
+If you're billed at the commitment tier rate, this table shows how Microsoft Sentinel and Log Analytics costs appear in the **Service name** and **Meter** columns of your Azure bill.
| Cost description | Service name | Meter | |--|--|--|
If you're billed at the commitment tier rate, the following table shows how Micr
#### [Pay-As-You-Go](#tab/pay-as-you-go)
-If you're billed at Pay-As-You-Go rate, the following table shows how Microsoft Sentinel and Log Analytics costs appear in the **Service name** and **Meter** columns of your Azure bill.
+If you're billed at Pay-As-You-Go rate, this table shows how Microsoft Sentinel and Log Analytics costs appear in the **Service name** and **Meter** columns of your Azure bill.
Cost description | Service name | Meter | |--|--|--|
If you're billed at Pay-As-You-Go rate, the following table shows how Microsoft
#### [Free data meters](#tab/free-data-meters)
-The following table shows how Microsoft Sentinel and Log Analytics costs appear in the **Service name** and **Meter** columns of your Azure bill for free data services. For more information, see [View Data Allocation Benefits](../azure-monitor/usage-estimated-costs.md#view-data-allocation-benefits).
+This table shows how Microsoft Sentinel and Log Analytics costs appear in the **Service name** and **Meter** columns of your Azure bill for free data services. For more information, see [View Data Allocation Benefits](../azure-monitor/usage-estimated-costs.md#view-data-allocation-benefits).
Cost description | Service name | Meter | |--|--|--|
The following table shows how Microsoft Sentinel and Log Analytics costs appear
-For more information on viewing and downloading your Azure bill, see [Azure cost and billing information](../cost-management-billing/understand/download-azure-daily-usage.md).
+Learn how to [view and download your Azure bill](../cost-management-billing/understand/download-azure-daily-usage.md).
-## Costs for other services
+## Costs and pricing for other services
-Microsoft Sentinel integrates with many other Azure services to provide enhanced capabilities. These services include Azure Logic Apps, Azure Notebooks, and bring your own machine learning (BYOML) models. Some of these services may have extra charges. Some of Microsoft Sentinel's data connectors and solutions use Azure Functions for data ingestion, which also has a separate associated cost.
+Microsoft Sentinel integrates with many other Azure services, including Azure Logic Apps, Azure Notebooks, and bring your own machine learning (BYOML) models. Some of these services may have extra charges. Some of Microsoft Sentinel's data connectors and solutions use Azure Functions for data ingestion, which also has a separate associated cost.
-For pricing details for these services, see:
+Learn about pricing for these services:
- [Automation-Logic Apps pricing](https://azure.microsoft.com/pricing/details/logic-apps/) - [Notebooks pricing](https://azure.microsoft.com/pricing/details/machine-learning/)
Any other services you use could have associated costs.
## Data retention and archived logs costs
-After you enable Microsoft Sentinel on a Log Analytics workspace, you can retain all data ingested into the workspace at no charge for the first 90 days. Retention beyond 90 days is charged per the standard [Log Analytics retention prices](https://azure.microsoft.com/pricing/details/monitor/).
+After you enable Microsoft Sentinel on a Log Analytics workspace:
-You can specify different retention settings for individual data types. For more information, see [Retention by data type](../azure-monitor/logs/data-retention-archive.md#set-retention-and-archive-policy-by-table). You can also enable long-term retention for your data and have access to historical logs by enabling archived logs. Data archive is a low-cost retention layer for archival storage. It's charged based on the volume of data stored and scanned. For more information, see [Configure data retention and archive policies in Azure Monitor Logs](../azure-monitor/logs/data-retention-archive.md). Archived logs are in public preview.
+- You can retain all data ingested into the workspace at no charge for the first 90 days. Retention beyond 90 days is charged per the standard [Log Analytics retention prices](https://azure.microsoft.com/pricing/details/monitor/).
+- You can specify different retention settings for individual data types. Learn about [retention by data type](../azure-monitor/logs/data-retention-archive.md#set-retention-and-archive-policy-by-table).
+- You can also enable long-term retention for your data and have access to historical logs by enabling archived logs. Data archive is a low-cost retention layer for archival storage. It's charged based on the volume of data stored and scanned. Learn how to [configure data retention and archive policies in Azure Monitor Logs](../azure-monitor/logs/data-retention-archive.md). Archived logs are in public preview.
The 90 day retention doesn't apply to basic logs. If you want to extend data retention for basic logs beyond eight days, you can store that data in archived logs for up to seven years.
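The retention rules above reduce to a simple calculation: analytics data includes 90 days of retention at no charge, while basic logs include only eight days. This is a simplified sketch (it ignores the distinction between interactive retention and the archive tier, and any per-GB price is out of scope):

```python
def billable_retention_days(total_retention_days: int,
                            is_basic_log: bool = False) -> int:
    """Days of retention billed beyond the included window: 90 days for
    analytics tables, 8 days for basic logs (per the text above)."""
    included = 8 if is_basic_log else 90
    return max(0, total_retention_days - included)

print(billable_retention_days(180))                    # 90
print(billable_retention_days(30, is_basic_log=True))  # 22
```

Multiplying the billable days by data volume and the applicable retention or archive rate gives the rough monthly retention cost.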
The 90 day retention doesn't apply to basic logs. If you want to extend data ret
CEF is a supported Syslog events format in Microsoft Sentinel. You can use CEF to bring in valuable security information from various sources to your Microsoft Sentinel workspace. CEF logs land in the CommonSecurityLog table in Microsoft Sentinel, which includes all the standard up-to-date CEF fields.
-Many devices and data sources allow for logging fields beyond the standard CEF schema. These extra fields land in the AdditionalExtensions table. These fields could have higher ingestion volumes than the standard CEF fields, because the event content within these fields can be variable.
+Many devices and data sources support logging fields beyond the standard CEF schema. These extra fields land in the AdditionalExtensions table. These fields could have higher ingestion volumes than the standard CEF fields, because the event content within these fields can be variable.
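The split described above, standard CEF header fields versus vendor-specific key=value extensions, can be illustrated with a small parser. This is a simplified sketch of the CEF wire format (pipe-separated header, then key=value pairs), not the actual Microsoft Sentinel ingestion pipeline, and the sample log line is invented:

```python
def parse_cef(line: str) -> dict:
    """Parse a simplified CEF line into its standard header fields and
    key=value extensions. Real CEF allows escaped '|' and '=' characters
    and spaces inside extension values; this sketch ignores those cases."""
    _, _, body = line.partition("CEF:")
    parts = body.split("|")
    header_names = ["Version", "DeviceVendor", "DeviceProduct",
                    "DeviceVersion", "SignatureID", "Name", "Severity"]
    header = dict(zip(header_names, parts[:7]))
    extension_str = parts[7] if len(parts) > 7 else ""
    extensions = {}
    for token in extension_str.split():
        key, _, value = token.partition("=")
        extensions[key] = value
    return {"header": header, "extensions": extensions}

sample = "CEF:0|Acme|Firewall|1.0|100|Blocked|5|src=10.0.0.1 dst=10.0.0.2 customField=extra"
parsed = parse_cef(sample)
print(parsed["header"]["Name"])      # Blocked
print(sorted(parsed["extensions"]))  # ['customField', 'dst', 'src']
```

Fields like `customField` here stand in for the nonstandard extensions that land in AdditionalExtensions; variable-length values in such fields are what drives their higher ingestion volume.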
## Costs that might accrue after resource deletion
For data connectors that include both free and paid data types, you can select w
:::image type="content" source="media/billing/data-types.png" alt-text="Screenshot of the Data connector page for Defender for Cloud Apps, with the free security alerts selected and the paid M C A S Shadow I T Reporting not selected." lightbox="media/billing/data-types.png":::
-For more information about free and paid data sources and connectors, see [Connect data sources](connect-data-sources.md).
+Learn more about how to [connect data sources](connect-data-sources.md), including free and paid data sources.
Data connectors listed as public preview don't generate cost. Data connectors generate cost only once they become Generally Available (GA). ## Next steps - [Monitor costs for Microsoft Sentinel](billing-monitor-costs.md)
sentinel Connect Data Sources https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/sentinel/connect-data-sources.md
After you onboard Microsoft Sentinel into your workspace, you can use data connectors to start ingesting your data into Microsoft Sentinel. Microsoft Sentinel comes with many out of the box connectors for Microsoft services, which you can integrate in real time. For example, the Microsoft 365 Defender connector is a [service-to-service connector](#service-to-service-integration-for-data-connectors) that integrates data from Office 365, Azure Active Directory (Azure AD), Microsoft Defender for Identity, and Microsoft Defender for Cloud Apps.
-You can also enable out-of-the-box connectors to the broader security ecosystem for non-Microsoft products. For example, you can use [Syslog](#syslog), [Common Event Format (CEF)](#common-event-format-cef), or [REST APIs](#rest-api-integration-using-azure-functions) to connect your data sources with Microsoft Sentinel.
+You can also enable built-in connectors to the broader security ecosystem for non-Microsoft products. For example, you can use [Syslog](#syslog), [Common Event Format (CEF)](#common-event-format-cef), or [REST APIs](#rest-api-integration-for-data-connectors) to connect your data sources with Microsoft Sentinel.
Learn about [types of Microsoft Sentinel data connectors](data-connectors-reference.md) or learn about the [Microsoft Sentinel solutions catalog](sentinel-solutions-catalog.md).
sentinel Data Connectors Reference https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/sentinel/data-connectors-reference.md
Add http://localhost:8081/ under **Authorized redirect URIs** while creating [We
| **DCR support** | [Workspace transformation DCR](../azure-monitor/logs/tutorial-ingestion-time-transformations.md) | | **Kusto function alias:** | Morphisec | | **Kusto function URL** | https://github.com/Azure/Azure-Sentinel/blob/master/Solutions/Morphisec/Parsers/Morphisec/ |
-| **Supported by** | [Morphisec](https://support.morphisec.com/support/home) |
+| **Supported by** | [Morphisec](https://www.morphisec.com) |
Follow the instructions to obtain the credentials.
| **DCR support** | [Workspace transformation DCR](../azure-monitor/logs/tutorial-ingestion-time-transformations.md) | | **Kusto function alias:** | WatchGuardFirebox | | **Kusto function URL:** | https://aka.ms/Sentinel-watchguardfirebox-parser |
-| **Vendor documentation/<br>installation instructions** | [Microsoft Sentinel Integration Guide](https://www.watchguard.com/help/docs/help-center/en-US/Content/Integration-Guides/General/Microsoft%20Azure%20Sentinel.html) |
+| **Vendor documentation/<br>installation instructions** | [Microsoft Sentinel Integration Guide](https://www.watchguard.com/help/docs/help-center/en-us/Content/Integration-Guides/General/Microsoft_Azure_Sentinel.html) |
| **Supported by** | [WatchGuard Technologies](https://www.watchguard.com/wgrd-support/overview) |
sentinel Quickstart Onboard https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/sentinel/quickstart-onboard.md
Title: 'Quickstart: Onboard in Microsoft Sentinel'
-description: In this quickstart, learn how to on-board Microsoft Sentinel by first enabling it, and then connecting data sources.
+description: In this quickstart, you enable Microsoft Sentinel, and set up data connectors to monitor and protect your environment.
Previously updated : 11/09/2021 Last updated : 07/14/2022
-#Customer intent: As a security operator, connect all my data sources in one place so I can monitor and protect my environment.
+#Customer intent: As a security operator, set up data connectors in one place so I can monitor and protect my environment.
-# Quickstart: On-board Microsoft Sentinel
+# Quickstart: Onboard Microsoft Sentinel
+In this quickstart, you enable Microsoft Sentinel, and then set up data connectors to monitor and protect your environment. After you connect your data sources using data connectors, you choose from a gallery of expertly created workbooks that surface insights based on your data. These workbooks can be easily customized to your needs.
-In this quickstart, learn how to on-board Microsoft Sentinel. To on-board Microsoft Sentinel, you first need to enable Microsoft Sentinel, and then connect your data sources.
-
-Microsoft Sentinel comes with a number of connectors for Microsoft solutions, available out of the box and providing real-time integration, including Microsoft 365 Defender (formerly Microsoft Threat Protection) solutions, Microsoft 365 sources (including Office 365), Azure AD, Microsoft Defender for Identity (formerly Azure ATP), Microsoft Defender for Cloud Apps, security alerts from Microsoft Defender for Cloud, and more. In addition, there are built-in connectors to the broader security ecosystem for non-Microsoft solutions. You can also use Common Event Format (CEF), Syslog or REST-API to connect your data sources with Microsoft Sentinel.
-
-After you connect your data sources, choose from a gallery of expertly created workbooks that surface insights based on your data. These workbooks can be easily customized to your needs.
+Microsoft Sentinel comes with many connectors for Microsoft products, for example, the Microsoft 365 Defender service-to-service connector. You can also enable built-in connectors for non-Microsoft products, for example, Syslog or Common Event Format (CEF). [Learn more about data connectors](connect-data-sources.md).
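For orientation, a Common Event Format (CEF) event is a pipe-delimited header of seven fields followed by `key=value` extension pairs. The following minimal Python sketch is illustrative only — it is not part of any Sentinel tooling, the vendor/field values in the example are made up, and it deliberately ignores CEF's escaping rules:

```python
import re

# The seven pipe-delimited CEF header fields, in order.
CEF_HEADER = ["version", "device_vendor", "device_product", "device_version",
              "signature_id", "name", "severity"]

def parse_cef(line: str) -> dict:
    # Split off the seven header fields; anything after the 7th pipe
    # is the extension ("key=value" pairs separated by spaces).
    parts = line.split("|", 7)
    event = dict(zip(CEF_HEADER, parts[:7]))
    event["version"] = event["version"].removeprefix("CEF:")
    extension = parts[7] if len(parts) > 7 else ""
    # Naive split: real CEF values may contain escaped spaces or pipes.
    event["extension"] = dict(re.findall(r"(\w+)=(\S+)", extension))
    return event
```

For example, `parse_cef("CEF:0|Contoso|Firewall|1.0|100|Traffic denied|5|src=10.0.0.1 dst=10.0.0.2")` returns an event whose `device_vendor` is `"Contoso"` and whose `extension` is `{"src": "10.0.0.1", "dst": "10.0.0.2"}`.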
>[!IMPORTANT]
-> For information about the charges incurred when using Microsoft Sentinel, see [Microsoft Sentinel pricing](https://azure.microsoft.com/pricing/details/azure-sentinel/) and [Microsoft Sentinel costs and billing](billing.md).
+> Review the [Microsoft Sentinel pricing](https://azure.microsoft.com/pricing/details/azure-sentinel/) and [Microsoft Sentinel costs and billing](billing.md) information.
## Global prerequisites
After you connect your data sources, choose from a gallery of expertly created w
- **Log Analytics workspace**. Learn how to [create a Log Analytics workspace](../azure-monitor/logs/quick-create-workspace.md). For more information about Log Analytics workspaces, see [Designing your Azure Monitor Logs deployment](../azure-monitor/logs/workspace-design.md).
- By default, you may have a default of [30 days retention](../azure-monitor/logs/cost-logs.md#legacy-pricing-tiers) in the Log Analytics workspace used for Microsoft Sentinel. To make sure that you can use the full extent of Microsoft Sentinel functionality, raise this to 90 days. For more information, see [Configure data retention and archive policies in Azure Monitor Logs](../azure-monitor/logs/data-retention-archive.md).
+ You may have a default of [30 days retention](../azure-monitor/logs/cost-logs.md#legacy-pricing-tiers) in the Log Analytics workspace used for Microsoft Sentinel. To make sure that you can use all Microsoft Sentinel functionality and features, raise the retention to 90 days. [Configure data retention and archive policies in Azure Monitor Logs](../azure-monitor/logs/data-retention-archive.md).
- **Permissions**:
After you connect your data sources, choose from a gallery of expertly created w
- To use Microsoft Sentinel, you need either **contributor** or **reader** permissions on the resource group that the workspace belongs to.
- - Additional permissions may be needed to connect specific data sources.
+ - You might need other permissions to connect specific data sources.
-- **Microsoft Sentinel is a paid service**. For more information, see [About Microsoft Sentinel](https://go.microsoft.com/fwlink/?linkid=2104058) and the [Microsoft Sentinel pricing page](https://azure.microsoft.com/pricing/details/azure-sentinel/)
+- **Microsoft Sentinel is a paid service**. Review the [pricing options](https://go.microsoft.com/fwlink/?linkid=2104058) and the [Microsoft Sentinel pricing page](https://azure.microsoft.com/pricing/details/azure-sentinel/).
-For more information, see [Pre-deployment activities and prerequisites for deploying Microsoft Sentinel](prerequisites.md).
+- Review the full [pre-deployment activities and prerequisites for deploying Microsoft Sentinel](prerequisites.md).
### Geographical availability and data residency
For more information, see [Pre-deployment activities and prerequisites for deplo
1. Search for and select **Microsoft Sentinel**.
- ![Services search](./media/quickstart-onboard/search-product.png)
+ :::image type="content" source="media/quickstart-onboard/search-product.png" alt-text="Screenshot of searching for a service while enabling Microsoft Sentinel.":::
1. Select **Add**.
-1. Select the workspace you want to use or create a new one. You can run Microsoft Sentinel on more than one workspace, but the data is isolated to a single workspace.
-
- ![Choose a workspace](./media/quickstart-onboard/choose-workspace.png)
-
- >[!NOTE]
- > - Default workspaces created by Microsoft Defender for Cloud will not appear in the list; you can't install Microsoft Sentinel on them.
- >
+1. Select the workspace you want to use or create a new one. You can run Microsoft Sentinel on more than one workspace, but the data is isolated to a single workspace. Note that default workspaces created by Microsoft Defender for Cloud are not shown in the list. You can't install Microsoft Sentinel on these workspaces.
+ :::image type="content" source="media/quickstart-onboard/choose-workspace.png" alt-text="Screenshot of choosing a workspace while enabling Microsoft Sentinel.":::
+
>[!IMPORTANT]
>
> - Once deployed on a workspace, Microsoft Sentinel **does not currently support** the moving of that workspace to other resource groups or subscriptions.
For more information, see [Pre-deployment activities and prerequisites for deplo
1. Select **Add Microsoft Sentinel**.
-## Connect data sources
-
-Microsoft Sentinel ingests data from services and apps by connecting to the service and forwarding the events and logs to Microsoft Sentinel. For physical and virtual machines, you can install the Log Analytics agent that collects the logs and forwards them to Microsoft Sentinel. For Firewalls and proxies, Microsoft Sentinel installs the Log Analytics agent on a Linux Syslog server, from which the agent collects the log files and forwards them to Microsoft Sentinel.
-
-1. From the main menu, select **Data connectors**. This opens the data connectors gallery.
+## Set up data connectors
-1. The gallery is a list of all the data sources you can connect. Select a data source and then the **Open connector page** button.
+Microsoft Sentinel ingests data from services and apps by connecting to the service and forwarding the events and logs to Microsoft Sentinel.
-1. The connector page shows instructions for configuring the connector, and any additional instructions that may be necessary.
+- For physical and virtual machines, you can install the Log Analytics agent that collects the logs and forwards them to Microsoft Sentinel.
+- For firewalls and proxies, Microsoft Sentinel installs the Log Analytics agent on a Linux Syslog server, from which the agent collects the log files and forwards them to Microsoft Sentinel.
+
+1. From the main menu, select **Data connectors**. This opens the data connectors gallery.
+1. Select a data connector, and then select the **Open connector page** button.
+1. The connector page shows instructions for configuring the connector, and any other instructions that may be necessary.
- For example, if you select the **Azure Active Directory** data source, which lets you stream logs from Azure AD into Microsoft Sentinel, you can select what type of logs you want to get - sign-in logs and/or audit logs. <br> Follow the installation instructions or [refer to the relevant connection guide](data-connectors-reference.md) for more information. For information about data connectors, see [Microsoft Sentinel data connectors](connect-data-sources.md).
+ For example, if you select the **Azure Active Directory** data connector, which lets you stream logs from Azure AD into Microsoft Sentinel, you can select what type of logs you want to get - sign-in logs and/or audit logs. <br>Follow the installation instructions. To learn more, [read the relevant connection guide](data-connectors-reference.md) or learn about [Microsoft Sentinel data connectors](connect-data-sources.md).
1. The **Next steps** tab on the connector page shows relevant built-in workbooks, sample queries, and analytics rule templates that accompany the data connector. You can use these as-is or modify them - either way you can immediately get interesting insights across your data.
-After your data sources are connected, your data starts streaming into Microsoft Sentinel and is ready for you to start working with. You can view the logs in the [built-in workbooks](get-visibility.md) and start building queries in Log Analytics to [investigate the data](investigate-cases.md).
+After you set up your data connectors, your data starts streaming into Microsoft Sentinel and is ready for you to start working with. You can view the logs in the [built-in workbooks](get-visibility.md) and start building queries in Log Analytics to [investigate the data](investigate-cases.md).
-For more information, see [Data collection best practices](best-practices-data.md).
+Review the [data collection best practices](best-practices-data.md).
## Next steps
sentinel Roles https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/sentinel/roles.md
Title: Permissions in Microsoft Sentinel | Microsoft Docs
-description: This article explains how Microsoft Sentinel uses Azure role-based access control to assign permissions to users, and identifies the allowed actions for each role.
+ Title: Roles and permissions in Microsoft Sentinel
+description: Learn how Microsoft Sentinel assigns permissions to users using Azure role-based access control, and identify the allowed actions for each role.
Previously updated : 11/09/2021 Last updated : 07/14/2022
-# Permissions in Microsoft Sentinel
+# Roles and permissions in Microsoft Sentinel
+This article explains how Microsoft Sentinel assigns permissions to user roles and identifies the allowed actions for each role. Microsoft Sentinel uses [Azure role-based access control (Azure RBAC)](../role-based-access-control/role-assignments-portal.md) to provide [built-in roles](../role-based-access-control/built-in-roles.md) that can be assigned to users, groups, and services in Azure.
-Microsoft Sentinel uses [Azure role-based access control (Azure RBAC)](../role-based-access-control/role-assignments-portal.md) to provide [built-in roles](../role-based-access-control/built-in-roles.md) that can be assigned to users, groups, and services in Azure.
+Use Azure RBAC to create and assign roles within your security operations team to grant appropriate access to Microsoft Sentinel. The different roles give you fine-grained control over what Microsoft Sentinel users can see and do. Azure roles can be assigned in the Microsoft Sentinel workspace directly (see note below), or in a subscription or resource group that the workspace belongs to, which Microsoft Sentinel inherits.
-Use Azure RBAC to create and assign roles within your security operations team to grant appropriate access to Microsoft Sentinel. The different roles give you fine-grained control over what users of Microsoft Sentinel can see and do. Azure roles can be assigned in the Microsoft Sentinel workspace directly (see note below), or in a subscription or resource group that the workspace belongs to, which Microsoft Sentinel will inherit.
-
-## Roles for working in Microsoft Sentinel
+## Roles and permissions for working in Microsoft Sentinel
### Microsoft Sentinel-specific roles
Use Azure RBAC to create and assign roles within your security operations team t
- [Microsoft Sentinel Reader](../role-based-access-control/built-in-roles.md#microsoft-sentinel-reader) can view data, incidents, workbooks, and other Microsoft Sentinel resources.
-- [Microsoft Sentinel Responder](../role-based-access-control/built-in-roles.md#microsoft-sentinel-responder) can, in addition to the above, manage incidents (assign, dismiss, etc.)
+- [Microsoft Sentinel Responder](../role-based-access-control/built-in-roles.md#microsoft-sentinel-responder) can, in addition to the above, manage incidents (assign, dismiss, etc.).
- [Microsoft Sentinel Contributor](../role-based-access-control/built-in-roles.md#microsoft-sentinel-contributor) can, in addition to the above, create and edit workbooks, analytics rules, and other Microsoft Sentinel resources.
-- [Microsoft Sentinel Automation Contributor](../role-based-access-control/built-in-roles.md#microsoft-sentinel-automation-contributor) allows Microsoft Sentinel to add playbooks to automation rules. It is not meant for user accounts.
+- [Microsoft Sentinel Automation Contributor](../role-based-access-control/built-in-roles.md#microsoft-sentinel-automation-contributor) allows Microsoft Sentinel to add playbooks to automation rules. It isn't meant for user accounts.
> [!NOTE]
>
-> - For best results, these roles should be assigned on the **resource group** that contains the Microsoft Sentinel workspace. This way, the roles will apply to all the resources that are deployed to support Microsoft Sentinel, as those resources should also be placed in that same resource group.
+> - For best results, assign these roles to the **resource group** that contains the Microsoft Sentinel workspace. This way, the roles apply to all the resources that support Microsoft Sentinel, as those resources should also be placed in the same resource group.
>
-> - Another option is to assign the roles directly on the Microsoft Sentinel **workspace** itself. If you do this, you must also assign the same roles on the SecurityInsights **solution resource** in that workspace. You may need to assign them on other resources as well, and you will need to be constantly managing role assignments on resources.
+> - As another option, assign the roles directly to the Microsoft Sentinel **workspace** itself. If you do this, you must also assign the same roles to the SecurityInsights **solution resource** in that workspace. You may need to assign them to other resources as well, and you will need to constantly manage role assignments to resources.
-### Additional roles and permissions
+### Other roles and permissions
-Users with particular job requirements may need to be assigned additional roles or specific permissions in order to accomplish their tasks.
+Users with particular job requirements may need to be assigned other roles or specific permissions in order to accomplish their tasks.
- **Working with playbooks to automate responses to threats**
- Microsoft Sentinel uses **playbooks** for automated threat response. Playbooks are built on **Azure Logic Apps**, and are a separate Azure resource. You might want to assign to specific members of your security operations team the ability to use Logic Apps for Security Orchestration, Automation, and Response (SOAR) operations. You can use the [Logic App Contributor](../role-based-access-control/built-in-roles.md#logic-app-contributor) role to assign explicit permission for using playbooks.
+ Microsoft Sentinel uses **playbooks** for automated threat response. Playbooks are built on **Azure Logic Apps**, and are a separate Azure resource. For specific members of your security operations team, you might want to assign the ability to use Logic Apps for Security Orchestration, Automation, and Response (SOAR) operations. You can use the [Logic App Contributor](../role-based-access-control/built-in-roles.md#logic-app-contributor) role to assign explicit permission for using playbooks.
- **Giving Microsoft Sentinel permissions to run playbooks**

  Microsoft Sentinel uses a special service account to run incident-trigger playbooks manually or to call them from automation rules. The use of this account (as opposed to your user account) increases the security level of the service.
- In order for an automation rule to run a playbook, this account must be granted explicit permissions to the resource group where the playbook resides. At that point, any automation rule will be able to run any playbook in that resource group. To grant these permissions to this service account, your account must have **Owner** permissions on the resource groups containing the playbooks.
+ For an automation rule to run a playbook, this account must be granted explicit permissions to the resource group where the playbook resides. At that point, any automation rule can run any playbook in that resource group. To grant these permissions to this service account, your account must have **Owner** permissions to the resource groups containing the playbooks.
- **Connecting data sources to Microsoft Sentinel**
- For a user to add **data connectors**, you must assign the user write permissions on the Microsoft Sentinel workspace. Also, note the required additional permissions for each connector, as listed on the relevant connector page.
+ For a user to add **data connectors**, you must assign the user write permissions on the Microsoft Sentinel workspace. Note the required extra permissions for each connector, as listed on the relevant connector page.
- **Guest users assigning incidents**
- If a guest user needs to be able to assign incidents, then in addition to the Microsoft Sentinel Responder role, the user will also need to be assigned the role of [Directory Reader](../active-directory/roles/permissions-reference.md#directory-readers). Note that this role is *not* an Azure role but an **Azure Active Directory** role, and that regular (non-guest) users have this role assigned by default.
+ If a guest user needs to be able to assign incidents, you need to assign the [Directory Reader](../active-directory/roles/permissions-reference.md#directory-readers) role to the user, in addition to the Microsoft Sentinel Responder role. Note that the Directory Reader role is *not* an Azure role but an **Azure Active Directory** role, and that regular (non-guest) users have this role assigned by default.
- **Creating and deleting workbooks**
- To create and delete a Microsoft Sentinel workbook, the user requires either the Microsoft Sentinel Contributor role or a lesser Microsoft Sentinel role plus the Azure Monitor role of [Workbook Contributor](../role-based-access-control/built-in-roles.md#workbook-contributor). This role is not necessary for *using* workbooks, but only for creating and deleting.
+ To create and delete a Microsoft Sentinel workbook, the user needs either the Microsoft Sentinel Contributor role or a lesser Microsoft Sentinel role, together with the [Workbook Contributor](../role-based-access-control/built-in-roles.md#workbook-contributor) Azure Monitor role. This role isn't necessary for *using* workbooks, only for creating and deleting.
-### Other roles you might see assigned
+### Azure and Log Analytics roles you might see assigned
-In assigning Microsoft Sentinel-specific Azure roles, you may come across other Azure and Log Analytics Azure roles that may have been assigned to users for other purposes. You should be aware that these roles grant a wider set of permissions that includes access to your Microsoft Sentinel workspace and other resources:
+When you assign Microsoft Sentinel-specific Azure roles, you may come across other Azure and Log Analytics roles that may have been assigned to users for other purposes. Note that these roles grant a wider set of permissions that include access to your Microsoft Sentinel workspace and other resources:
- **Azure roles:** [Owner](../role-based-access-control/built-in-roles.md#owner), [Contributor](../role-based-access-control/built-in-roles.md#contributor), and [Reader](../role-based-access-control/built-in-roles.md#reader). Azure roles grant access across all your Azure resources, including Log Analytics workspaces and Microsoft Sentinel resources.
- **Log Analytics roles:** [Log Analytics Contributor](../role-based-access-control/built-in-roles.md#log-analytics-contributor) and [Log Analytics Reader](../role-based-access-control/built-in-roles.md#log-analytics-reader). Log Analytics roles grant access to your Log Analytics workspaces.
-For example, a user who is assigned the **Microsoft Sentinel Reader** role, but not the **Microsoft Sentinel Contributor** role, will still be able to edit items in Microsoft Sentinel if assigned the Azure-level **Contributor** role. Therefore, if you want to grant permissions to a user only in Microsoft Sentinel, you should carefully remove this user's prior permissions, making sure you do not break any needed access to another resource.
+For example, a user assigned the **Microsoft Sentinel Reader** role, but not the **Microsoft Sentinel Contributor** role, can still edit items in Microsoft Sentinel, if that user is also assigned the Azure-level **Contributor** role. Therefore, if you want to grant permissions to a user only in Microsoft Sentinel, carefully remove this user's prior permissions, making sure you do not break any needed access to another resource.
-## Microsoft Sentinel roles and allowed actions
+## Microsoft Sentinel roles, permissions, and allowed actions
-The following table summarizes the Microsoft Sentinel roles and their allowed actions in Microsoft Sentinel.
+This table summarizes the Microsoft Sentinel roles and their allowed actions in Microsoft Sentinel.
| Role | Create and run playbooks | Create and edit analytics rules, workbooks, and other Microsoft Sentinel resources | Manage incidents (dismiss, assign, etc.) | View data, incidents, workbooks, and other Microsoft Sentinel resources |
||||||
The following table summarizes the Microsoft Sentinel roles and their allowed ac
| Microsoft Sentinel Contributor + Logic App Contributor | &#10003; | &#10003; | &#10003; | &#10003; |
-<a name=workbooks></a>* Users with these roles can create and delete workbooks with the additional [Workbook Contributor](../role-based-access-control/built-in-roles.md#workbook-contributor) role. For more information, see [Additional roles and permissions](#additional-roles-and-permissions).
+<a name=workbooks></a>* Users with these roles can create and delete workbooks only when also assigned the [Workbook Contributor](../role-based-access-control/built-in-roles.md#workbook-contributor) role. Learn about [Other roles and permissions](#other-roles-and-permissions).
-Consult the [Role recommendations](#role-recommendations) section for best practices in which roles to assign to which users in your SOC.
+Review the [role recommendations](#role-and-permissions-recommendations) for which roles to assign to which users in your SOC.
## Custom roles and advanced Azure RBAC

-- **Custom roles**. In addition to, or instead of, using Azure built-in roles, you can create Azure custom roles for Microsoft Sentinel. Azure custom roles for Microsoft Sentinel are created the same way you create other [Azure custom roles](../role-based-access-control/custom-roles-rest.md#create-a-custom-role), based on [specific permissions to Microsoft Sentinel](../role-based-access-control/resource-provider-operations.md#microsoftsecurityinsights) and to [Azure Log Analytics resources](../role-based-access-control/resource-provider-operations.md#microsoftoperationalinsights).
+- **Custom roles**. In addition to, or instead of, using Azure built-in roles, you can create Azure custom roles for Microsoft Sentinel. You create Azure custom roles for Microsoft Sentinel in the same way as [Azure custom roles](../role-based-access-control/custom-roles-rest.md#create-a-custom-role), based on [specific permissions to Microsoft Sentinel](../role-based-access-control/resource-provider-operations.md#microsoftsecurityinsights) and to [Azure Log Analytics resources](../role-based-access-control/resource-provider-operations.md#microsoftoperationalinsights).
-- **Log Analytics RBAC**. You can use the Log Analytics advanced Azure role-based access control across the data in your Microsoft Sentinel workspace. This includes both data type-based Azure RBAC and resource-context Azure RBAC. For more information, see:
+- **Log Analytics RBAC**. You can use the Log Analytics advanced Azure RBAC across the data in your Microsoft Sentinel workspace. This includes both data type-based Azure RBAC and resource-context Azure RBAC. To learn more:
- [Manage log data and workspaces in Azure Monitor](../azure-monitor/logs/manage-access.md#azure-rbac)
- [Resource-context RBAC for Microsoft Sentinel](resource-context-rbac.md)
- [Table-level RBAC](https://techcommunity.microsoft.com/t5/azure-sentinel/table-level-rbac-in-azure-sentinel/ba-p/965043)
- Resource-context and table-level RBAC are two methods of providing access to specific data in your Microsoft Sentinel workspace without allowing access to the entire Microsoft Sentinel experience.
+ Resource-context and table-level RBAC are two ways to give access to specific data in your Microsoft Sentinel workspace, without allowing access to the entire Microsoft Sentinel experience.
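The custom-role option described in this section boils down to a role definition document. The following JSON is only an illustrative sketch — the role name, description, and the specific operation strings are assumptions; check the Microsoft.SecurityInsights and Microsoft.OperationalInsights operations references linked above for the exact operations your scenario needs:

```json
{
  "Name": "Sentinel Analytics Author (example)",
  "IsCustom": true,
  "Description": "Illustrative custom role: read Microsoft Sentinel and manage analytics rules only.",
  "Actions": [
    "Microsoft.SecurityInsights/*/read",
    "Microsoft.SecurityInsights/alertRules/*",
    "Microsoft.OperationalInsights/workspaces/read",
    "Microsoft.OperationalInsights/workspaces/query/read"
  ],
  "NotActions": [],
  "AssignableScopes": [
    "/subscriptions/<subscription-id>"
  ]
}
```

As with any custom role, scoping `AssignableScopes` to the resource group that contains the Microsoft Sentinel workspace keeps the role aligned with the assignment guidance earlier in this article.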
-## Role recommendations
+## Role and permissions recommendations
-After understanding how roles and permissions work in Microsoft Sentinel, you may want to use the following best practice guidance for applying roles to your users:
+After understanding how roles and permissions work in Microsoft Sentinel, you can review these best practices for applying roles to your users:
|User type |Role |Resource group |Description |
|||||
After understanding how roles and permissions work in Microsoft Sentinel, you ma
> [!TIP]
-> Additional roles may be required depending on the data you are ingesting or monitoring. For example, Azure AD roles may be required, such as the global admin or security admin roles, to set up data connectors for services in other Microsoft portals.
+> More roles may be required depending on the data you ingest or monitor. For example, Azure AD roles may be required, such as the global admin or security admin roles, to set up data connectors for services in other Microsoft portals.
>

## Next steps
-In this document, you learned how to work with roles for Microsoft Sentinel users and what each role enables users to do.
+In this article, you learned how to work with roles for Microsoft Sentinel users and what each role enables users to do.
-Find blog posts about Azure security and compliance at the [Microsoft Sentinel Blog](https://aka.ms/azuresentinelblog).
+Find blog posts about Azure security and compliance at the [Microsoft Sentinel Blog](https://aka.ms/azuresentinelblog).
service-fabric Service Fabric Cluster Capacity https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/service-fabric/service-fabric-cluster-capacity.md
The capacity needs of your cluster will be determined by your specific workload
By default, local SSD is configured to 64 GB. This can be configured in the MaxDiskQuotaInMB setting of the Diagnostics section of cluster settings.
-For instructions on how to adjust the cluster settings of a cluster hosted in Azure, see [Upgrade the configuration of a cluster in Azure](/azure/service-fabric/service-fabric-cluster-config-upgrade-azure#customize-cluster-settings-using-resource-manager-templates)
+For instructions on how to adjust the cluster settings of a cluster hosted in Azure, see [Upgrade the configuration of a cluster in Azure](./service-fabric-cluster-config-upgrade-azure.md#customize-cluster-settings-using-resource-manager-templates)
-For instructions on how to adjust the cluster settings of a standalone cluster hosted in Windows, see [Upgrade the configuration of a standalone cluster](/azure/service-fabric/service-fabric-cluster-config-upgrade-windows-server#customize-cluster-settings-in-the-clusterconfigjson-file)
+For instructions on how to adjust the cluster settings of a standalone cluster hosted in Windows, see [Upgrade the configuration of a standalone cluster](./service-fabric-cluster-config-upgrade-windows-server.md#customize-cluster-settings-in-the-clusterconfigjson-file)
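In an Azure Resource Manager template, the adjustment described above goes under the cluster resource's `fabricSettings` section. This fragment is a sketch that follows the documented `fabricSettings` shape; the `131072` value (128 GB) is only an illustrative choice:

```json
"fabricSettings": [
  {
    "name": "Diagnostics",
    "parameters": [
      {
        "name": "MaxDiskQuotaInMB",
        "value": "131072"
      }
    ]
  }
]
```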
When choosing other [VM sizes](../virtual-machines/sizes-general.md) for production workloads, keep in mind the following constraints:
service-health Resource Health Overview https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/service-health/resource-health-overview.md
Different resources have their own criteria for when they report that they are d
![Status of *Degraded* for a virtual machine](./media/resource-health-overview/degraded.png)
-For VMSS, visit [Resource health state is "Degraded" in Azure Virtual Machine Scale Set](https://docs.microsoft.com/troubleshoot/azure/virtual-machine-scale-sets/resource-health-degraded-state) page for more information.
+For VMSS, visit [Resource health state is "Degraded" in Azure Virtual Machine Scale Set](/troubleshoot/azure/virtual-machine-scale-sets/resource-health-degraded-state) page for more information.
## History information
You can also access Resource Health by selecting **All services** and typing **r
Check out these references to learn more about Resource Health:

- [Resource types and health checks in Azure Resource Health](resource-health-checks-resource-types.md)
-- [Frequently asked questions about Azure Resource Health](resource-health-faq.yml)
+- [Frequently asked questions about Azure Resource Health](resource-health-faq.yml)
site-recovery Migrate Tutorial Aws Azure https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/site-recovery/migrate-tutorial-aws-azure.md
This article describes options for migrating Amazon Web Services (AWS) instances to Azure.

> [!NOTE]
-> On Linux distributions, only the stock kernels that are part of the distribution minor version release/update are supported. [Learn more](https://docs.microsoft.com/azure/site-recovery/vmware-physical-azure-support-matrix#for-linux).
+> On Linux distributions, only the stock kernels that are part of the distribution minor version release/update are supported. [Learn more](./vmware-physical-azure-support-matrix.md#for-linux).
## Migrate with Azure Migrate
If you're already using Azure Site Recovery, and you want to continue using it f
## Next steps

> [!div class="nextstepaction"]
-> [Review common questions](../migrate/resources-faq.md) about Azure Migrate.
+> [Review common questions](../migrate/resources-faq.md) about Azure Migrate.
storage Storage Blob Javascript Get Started https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/storage/blobs/storage-blob-javascript-get-started.md
The [BlobServiceClient](/javascript/api/@azure/storage-blob/blobserviceclient) o
## Connect with Azure AD
-Azure Active Directory (Azure AD) provides the most secure connection by managing the connection identity ([**managed identity**](/azure/active-directory/managed-identities-azure-resources/overview)). This functionality allows you to develop code that doesn't require any secrets (keys or connection strings) stored in the code or environment. Managed identity requires [**setup**](assign-azure-role-data-access.md?tabs=portal) for any identities such as developer (personal) or cloud (hosting) environments. You need to complete the setup before using the code in this section.
+Azure Active Directory (Azure AD) provides the most secure connection by managing the connection identity ([**managed identity**](../../active-directory/managed-identities-azure-resources/overview.md)). This functionality allows you to develop code that doesn't require any secrets (keys or connection strings) stored in the code or environment. Managed identity requires [**setup**](assign-azure-role-data-access.md?tabs=portal) for any identities such as developer (personal) or cloud (hosting) environments. You need to complete the setup before using the code in this section.
After you complete the setup, your Storage resource needs to have one or more of the following roles assigned to the identity resource you plan to connect with:
The following guides show you how to use each of these clients to build your app
- [Samples](../common/storage-samples-javascript.md?toc=%2fazure%2fstorage%2fblobs%2ftoc.json#blob-samples)
- [API reference](/javascript/api/@azure/storage-blob/)
- [Library source code](https://github.com/Azure/azure-sdk-for-js/blob/main/sdk/storage/storage-blob)
-- [Give Feedback](https://github.com/Azure/azure-sdk-for-js/issues)
+- [Give Feedback](https://github.com/Azure/azure-sdk-for-js/issues)
synapse-analytics Implementation Success Evaluate Serverless Sql Pool Design https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/synapse-analytics/guidance/implementation-success-evaluate-serverless-sql-pool-design.md
Unlike traditional database engines, SQL serverless doesn't rely on its own opti
For reliability, evaluate the following points.

- **Availability:** Validate any availability requirements that were identified during the [assessment stage](implementation-success-assess-environment.md). While there aren't any specific SLAs for SQL serverless, there's a 30-minute timeout for query execution. Identify the longest running queries from your assessment and validate them against your serverless SQL design. A 30-minute timeout could break the expectations for your workload and appear as a service problem.
-- **Consistency:** SQL serverless is designed primarily for read workloads. So, validate whether all consistency checks have been performed during the data lake data provisioning and formation process. Keep abreast of new capabilities, like [Delta Lake](/azure/synapse-analytics/spark/apache-spark-what-is-delta-lake) open-source storage layer, which provides support for ACID (atomicity, consistency, isolation, and durability) guarantees for transactions. This capability allows you to implement effective [lambda or kappa architectures](/azure/architecture/data-guide/big-data/) to support both streaming and batch use cases. Be sure to evaluate your design for opportunities to apply new capabilities but not at the expense of your project's timeline or cost.
+- **Consistency:** SQL serverless is designed primarily for read workloads. So, validate whether all consistency checks have been performed during the data lake data provisioning and formation process. Keep abreast of new capabilities, like [Delta Lake](../spark/apache-spark-what-is-delta-lake.md) open-source storage layer, which provides support for ACID (atomicity, consistency, isolation, and durability) guarantees for transactions. This capability allows you to implement effective [lambda or kappa architectures](/azure/architecture/data-guide/big-data/) to support both streaming and batch use cases. Be sure to evaluate your design for opportunities to apply new capabilities but not at the expense of your project's timeline or cost.
- **Backup:** Review any disaster recovery requirements that were identified during the assessment. Validate them against your SQL serverless design for recovery. SQL serverless doesn't have its own storage layer, so you need to handle snapshots and backup copies of your data yourself. The data store accessed by serverless SQL is external (ADLS Gen2). Review the recovery design in your project for these datasets.

### Security
Review your design and check whether you have put in place [best practices and r
## Next steps
-In the [next article](implementation-success-evaluate-spark-pool-design.md) in the *Azure Synapse success by design* series, learn how to evaluate your Spark pool design to identify issues and validate that it meets guidelines and requirements.
+In the [next article](implementation-success-evaluate-spark-pool-design.md) in the *Azure Synapse success by design* series, learn how to evaluate your Spark pool design to identify issues and validate that it meets guidelines and requirements.
synapse-analytics Implementation Success Evaluate Solution Development Environment Design https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/synapse-analytics/guidance/implementation-success-evaluate-solution-development-environment-design.md
Promoting a workspace to another workspace is a two-part process:
Ensure that integration with Azure DevOps or GitHub is properly set up. Design a repeatable process that releases changes across development, Test/QA/UAT, and production environments.

>[!IMPORTANT]
-> We recommend that sensitive configuration data always be stored securely in [Azure Key Vault](/azure/key-vault/general/basic-concepts). Use Azure Key Vault to maintain a central, secure location for sensitive configuration data, like database connection strings. That way, appropriate services can access configuration data from within each environment.
+> We recommend that sensitive configuration data always be stored securely in [Azure Key Vault](../../key-vault/general/basic-concepts.md). Use Azure Key Vault to maintain a central, secure location for sensitive configuration data, like database connection strings. That way, appropriate services can access configuration data from within each environment.
## Next steps
synapse-analytics Proof Of Concept Playbook Spark Pool https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/synapse-analytics/guidance/proof-of-concept-playbook-spark-pool.md
Before you begin planning your Spark POC project:
> - Identify executive or business sponsors for a big data and advanced analytics platform project. Secure their support for migration to the cloud.
> - Identify availability of technical experts and business users to support you during the POC execution.
-Before you start preparing for the POC project, we recommend you first read the [Apache Spark documentation](/azure/hdinsight/spark/apache-spark-overview).
+Before you start preparing for the POC project, we recommend you first read the [Apache Spark documentation](../../hdinsight/spark/apache-spark-overview.md).
> [!TIP]
> If you're new to Spark pools, we recommend you work through the [Perform data engineering with Azure Synapse Apache Spark Pools](/learn/paths/perform-data-engineering-with-azure-synapse-apache-spark-pools/) learning path.
Here's an example of the needed level of specificity in planning:
- Plan an approach to migrate historical data.
- **Output C:** We will have tested and determined the data ingestion rate achievable in our environment and can determine whether our data ingestion rate is sufficient to migrate historical data during the available time window.
  - **Test C1:** Test different approaches of historical data migration. For more information, see [Transfer data to and from Azure](/azure/architecture/data-guide/scenarios/data-transfer).
- - **Test C2:** Identify allocated bandwidth of ExpressRoute and if there is any throttling setup by the infra team. For more information, see [What is Azure ExpressRoute? (Bandwidth options)](/azure/expressroute/expressroute-introduction#bandwidth-options.md).
- - **Test C3:** Test data transfer rate for both online and offline data migration. For more information, see [Copy activity performance and scalability guide](/azure/data-factory/copy-activity-performance#copy-performance-and-scalability-achievable-using-azure-data-factory-and-synapse-pipelines).
- - **Test C4:** Test data transfer from the data lake to the SQL pool by using either ADF, Polybase, or the COPY command. For more information, see [Data loading strategies for dedicated SQL pool in Azure Synapse Analytics](/azure/synapse-analytics/sql-data-warehouse/design-elt-data-loading).
+ - **Test C2:** Identify allocated bandwidth of ExpressRoute and if there is any throttling setup by the infra team. For more information, see [What is Azure ExpressRoute? (Bandwidth options)](../../expressroute/expressroute-introduction.md#bandwidth-options).
+ - **Test C3:** Test data transfer rate for both online and offline data migration. For more information, see [Copy activity performance and scalability guide](../../data-factory/copy-activity-performance.md#copy-performance-and-scalability-achievable-using-azure-data-factory-and-synapse-pipelines).
+ - **Test C4:** Test data transfer from the data lake to the SQL pool by using either ADF, Polybase, or the COPY command. For more information, see [Data loading strategies for dedicated SQL pool in Azure Synapse Analytics](../sql-data-warehouse/design-elt-data-loading.md).
- **Goal D:** We will have tested the data ingestion rate of incremental data loading and will have the data points to estimate the data ingestion and processing time window to the data lake and/or the dedicated SQL pool.
- **Output D:** We will have tested the data ingestion rate and can determine whether our data ingestion and processing requirements can be met with the identified approach.
  - **Test D1:** Test the daily update data ingestion and processing.
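To make Tests C2 and C3 concrete, a back-of-the-envelope estimate of the historical-data transfer window can be computed from data volume and link bandwidth. The sketch below is illustrative only; the 80% efficiency factor and the 50 TB / 10 Gbps figures are assumptions, not measured values, and should be replaced by the rates you actually observe during testing.

```python
def transfer_hours(data_tb: float, bandwidth_gbps: float, efficiency: float = 0.8) -> float:
    """Estimate hours to move data_tb terabytes over a bandwidth_gbps link,
    derated by an assumed efficiency factor (protocol overhead, throttling)."""
    bits = data_tb * 8 * 10**12                      # decimal TB -> bits
    seconds = bits / (bandwidth_gbps * efficiency * 10**9)
    return seconds / 3600

# e.g. 50 TB of history over an assumed 10 Gbps ExpressRoute circuit
print(round(transfer_hours(50, 10), 1))  # about 13.9 hours
```

If the estimated window exceeds the time available, that points toward an offline transfer option such as Azure Data Box for the bulk history, with online transfer reserved for incremental loads.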
Here are some examples of high-level tasks:
resources identified in the POC plan.
1. Load POC dataset:
   - Make data available in Azure by extracting from the source or by creating sample data in Azure. For more information, see:
- - [Transferring data to and from Azure](/azure/databox/data-box-overview#use-cases)
+ - [Transferring data to and from Azure](../../databox/data-box-overview.md#use-cases)
- [Azure Data Box](https://azure.microsoft.com/services/databox/)
- - [Copy activity performance and scalability guide](/azure/data-factory/copy-activity-performance#copy-performance-and-scalability-achievable-using-adf.md)
- - [Data loading strategies for dedicated SQL pool in Azure Synapse Analytics](/azure/synapse-analytics/sql-data-warehouse/design-elt-data-loading)
+ - [Copy activity performance and scalability guide](../../data-factory/copy-activity-performance.md)
+ - [Data loading strategies for dedicated SQL pool in Azure Synapse Analytics](../sql-data-warehouse/design-elt-data-loading.md)
- [Bulk load data using the COPY statement](../sql-data-warehouse/quickstart-bulk-load-copy-tsql.md?view=azure-sqldw-latest&preserve-view=true)
- Test the dedicated connector for the Spark pool and the dedicated SQL pool.
1. Migrate existing code to the Spark pool:
synapse-analytics 1 Design Performance Migration https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/synapse-analytics/migration-guides/netezza/1-design-performance-migration.md
Previously updated : 05/31/2022 Last updated : 07/12/2022

# Design and performance for Netezza migrations
-This article is part one of a seven part series that provides guidance on how to migrate from Netezza to Azure Synapse Analytics. This article provides best practices for design and performance.
+This article is part one of a seven-part series that provides guidance on how to migrate from Netezza to Azure Synapse Analytics. The focus of this article is best practices for design and performance.
## Overview
-Due to end of support from IBM, many existing users of Netezza data warehouse systems want to take advantage of the innovations provided by newer environments such as cloud, IaaS, and PaaS, and to delegate tasks like infrastructure maintenance and platform development to the cloud provider.
+Due to end of support from IBM, many existing users of Netezza data warehouse systems want to take advantage of the innovations provided by modern cloud environments. Infrastructure-as-a-service (IaaS) and platform-as-a-service (PaaS) cloud environments let you delegate tasks like infrastructure maintenance and platform development to the cloud provider.
-> [!TIP]
-> More than just a database&mdash;the Azure environment includes a comprehensive set of capabilities and tools.
+>[!TIP]
+>More than just a database&mdash;the Azure environment includes a comprehensive set of capabilities and tools.
-Although Netezza and Azure Synapse Analytics are both SQL databases designed to use massively parallel processing (MPP) techniques to achieve high query performance on exceptionally large data volumes, there are some basic differences in approach:
+Although Netezza and Azure Synapse Analytics are both SQL databases that use massively parallel processing (MPP) techniques to achieve high query performance on exceptionally large data volumes, there are some basic differences in approach:
- Legacy Netezza systems are often installed on-premises and use proprietary hardware, while Azure Synapse is cloud-based and uses Azure storage and compute resources.
-- Upgrading a Netezza configuration is a major task involving additional physical hardware and potentially lengthy database reconfiguration, or dump and reload. Since storage and compute resources are separate in the Azure environment, these resources can be scaled upwards or downwards independently, leveraging the elastic scaling capability.
+- Upgrading a Netezza configuration is a major task involving extra physical hardware and potentially lengthy database reconfiguration, or dump and reload. Because storage and compute resources are separate in the Azure environment and have elastic scaling capability, those resources can be scaled upwards or downwards independently.
-- Azure Synapse can be paused or resized as required to reduce resource utilization and cost.
+- You can pause or resize Azure Synapse as needed to reduce resource utilization and cost.
Microsoft Azure is a globally available, highly secure, scalable cloud environment that includes Azure Synapse and an ecosystem of supporting tools and capabilities. The next diagram summarizes the Azure Synapse ecosystem.

:::image type="content" source="../media/1-design-performance-migration/azure-synapse-ecosystem.png" border="true" alt-text="Chart showing the Azure Synapse ecosystem of supporting tools and capabilities.":::
-> [!TIP]
-> Azure Synapse gives best-of-breed performance and price-performance in independent benchmarks.
-
-Azure Synapse provides best-of-breed relational database performance by using techniques such as massively parallel processing (MPP) and multiple levels of automated caching for frequently used data. See the results of this approach in independent benchmarks such as the one run recently by [GigaOm](https://research.gigaom.com/report/data-warehouse-cloud-benchmark/), which compares Azure Synapse to other popular cloud data warehouse offerings. Customers who have migrated to this environment have seen many benefits including:
+Azure Synapse provides best-of-breed relational database performance by using techniques such as MPP and multiple levels of automated caching for frequently used data. You can see the results of these techniques in independent benchmarks such as the one run recently by [GigaOm](https://research.gigaom.com/report/data-warehouse-cloud-benchmark/), which compares Azure Synapse to other popular cloud data warehouse offerings. Customers who migrate to the Azure Synapse environment see many benefits, including:
- Improved performance and price/performance.
Azure Synapse provides best-of-breed relational database performance by using te
- Lower overall TCO, better cost control, and streamlined operational expenditure (OPEX).
-To maximize these benefits, migrate new or existing data and applications to the Azure Synapse platform. In many organizations, this will include migrating an existing data warehouse from legacy on-premises platforms such as Netezza. At a high level, the basic process includes these steps:
--
-This paper looks at schema migration with a goal of equivalent or better performance of your migrated Netezza data warehouse and data marts on Azure Synapse. This paper applies specifically to migrations from an existing Netezza environment.
+To maximize these benefits, migrate new or existing data and applications to the Azure Synapse platform. In many organizations, migration includes moving an existing data warehouse from a legacy on-premises platform, such as Netezza, to Azure Synapse. At a high level, the migration process includes these steps:
+
+ :::column span="":::
+ &#160;&#160;&#160; **Preparation** &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &#129094;
+
+ - Define scope&mdash;what is to be migrated.
+
+ - Build inventory of data and processes for migration.
+
+ - Define data model changes (if any).
+
+ - Define source data extract mechanism.
+
+ - Identify the appropriate Azure and third-party tools and features to be used.
+
+ - Train staff early on the new platform.
+
+ - Set up the Azure target platform.
+
+ :::column-end:::
+ :::column span="":::
+ &#160;&#160;&#160; **Migration** &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &#129094;
+
+ - Start small and simple.
+
+ - Automate wherever possible.
+
+ - Leverage Azure built-in tools and features to reduce migration effort.
+
+ - Migrate metadata for tables and views.
+
+ - Migrate historical data to be maintained.
+
+ - Migrate or refactor stored procedures and business processes.
+
+ - Migrate or refactor ETL/ELT incremental load processes.
+
+ :::column-end:::
+ :::column span="":::
+ &#160;&#160;&#160; **Post migration**
+
+ - Monitor and document all stages of the process.
+
+ - Use the experience gained to build a template for future migrations.
+
+ - Re-engineer the data model if required (using new platform performance and scalability).
+
+ - Test applications and query tools.
+
+ - Benchmark and optimize query performance.
+
+ :::column-end:::
+
+This article provides general information and guidelines for performance optimization when migrating a data warehouse from an existing Netezza environment to Azure Synapse. The goal of performance optimization is to achieve the same or better data warehouse performance in Azure Synapse after schema migration.
## Design considerations

### Migration scope
-> [!TIP]
-> Create an inventory of objects to be migrated and document the migration process.
-
-#### Preparation for migration
-
-When migrating from a Netezza environment, there are some specific topics to consider in addition to the more general subjects described in this article.
+When you're preparing to migrate from a Netezza environment, consider the following migration choices.
#### Choose the workload for the initial migration
-Legacy Netezza environments have typically evolved over time to encompass multiple subject areas and mixed workloads. When deciding where to start on an initial migration project, choose an area that can:
+Typically, legacy Netezza environments have evolved over time to encompass multiple subject areas and mixed workloads. When you're deciding where to start on a migration project, choose an area where you'll be able to:
- Prove the viability of migrating to Azure Synapse by quickly delivering the benefits of the new environment.
-- Allow the in-house technical staff to gain relevant experience of the processes and tools involved, which can be used in migrations to other areas.
+- Allow your in-house technical staff to gain relevant experience with the processes and tools that they'll use when they migrate other areas.
+
+- Create a template for further migrations that's specific to the source Netezza environment and the current tools and processes that are already in place.
-- Create a template for further migrations specific to the source Netezza environment and the current tools and processes that are already in place.
+A good candidate for an initial migration from a Netezza environment supports the preceding items, and:
-A good candidate for an initial migration from the Netezza environment that would enable the preceding items is typically one that implements a BI/Analytics workload, rather than an online transaction processing (OLTP) workload, with a data model that can be migrated with minimal modification, normally a star or snowflake schema.
+- Implements a BI/Analytics workload rather than an online transaction processing (OLTP) workload.
-The migration data volume for the initial exercise should be large enough to demonstrate the capabilities and benefits of the Azure Synapse environment while quickly demonstrating the value&mdash;typically in the 1-10 TB range.
+- Has a data model, such as a star or snowflake schema, that can be migrated with minimal modification.
-To minimize the risk and reduce implementation time for the initial migration project, confine the scope of the migration to just the data marts. However, this won't address the broader topics such as ETL migration and historical data migration as part of the initial migration project. Address these topics in later phases of the project, once the migrated data mart layer is backfilled with the data and processes required to build them.
+>[!TIP]
+>Create an inventory of objects that need to be migrated, and document the migration process.
-#### Lift and shift as-is versus a phased approach incorporating changes
+The volume of migrated data in an initial migration should be large enough to demonstrate the capabilities and benefits of the Azure Synapse environment but not too large to quickly demonstrate value. A size in the 1-10 terabyte range is typical.
-> [!TIP]
-> "Lift and shift" is a good starting point, even if subsequent phases will implement changes to the data model.
+For your initial migration project, minimize the risk, effort, and migration time so you can quickly see the benefits of the Azure cloud environment. Both the lift-and-shift and phased migration approaches limit the scope of the initial migration to just the data marts and don't address broader migration aspects such as ETL migration and historical data migration. However, you can address those aspects in later phases of the project once the migrated data mart layer is backfilled with data and the required build processes.
-Whatever the drive and scope of the intended migration, there are&mdash;broadly speaking&mdash;two types of migration:
+<a id="lift-and-shift-as-is-versus-a-phased-approach-incorporating-changes"></a>
+#### Lift and shift migration vs. phased approach
+
+In general, there are two types of migration regardless of the purpose and scope of the planned migration: lift and shift as-is and a phased approach that incorporates changes.
##### Lift and shift
-In this case, the existing data model&mdash;such as a star schema&mdash;is migrated unchanged to the new Azure Synapse platform. The emphasis is on minimizing risk and the migration time required by reducing the work needed to realize the benefits of moving to the Azure cloud environment.
+In a lift and shift migration, an existing data model, like a star schema, is migrated unchanged to the new Azure Synapse platform. This approach minimizes risk and migration time by reducing the work needed to realize the benefits of moving to the Azure cloud environment. Lift and shift migration is a good fit for these scenarios:
+
+- You have an existing Netezza environment with a single data mart to migrate, or
+- You have an existing Netezza environment with data that's already in a well-designed star or snowflake schema, or
+- You're under time and cost pressures to move to a modern cloud environment.
-This is a good fit for existing Netezza environments where a single data mart is being migrated, or where the data is already in a well-designed star or snowflake schema&mdash;or there are other pressures to move to a more modern cloud environment.
+>[!TIP]
+>Lift and shift is a good starting point, even if subsequent phases implement changes to the data model.
-##### Phased approach incorporating modifications
+<a id="phased-approach-incorporating-modifications"></a>
+##### Phased approach that incorporates changes
-In cases where a legacy warehouse has evolved over a long time, you might need to re-engineer to maintain the required performance levels or to support new data, such as Internet of Things (IoT) streams. Migrate to Azure Synapse to get the benefits of a scalable cloud environment as part of the re-engineering process. Migration could include a change in the underlying data model, such as a move from an Inmon model to a data vault.
+If a legacy data warehouse has evolved over a long period of time, you might need to re-engineer it to maintain the required performance levels. You might also have to re-engineer to support new data like Internet of Things (IoT) streams. As part of the re-engineering process, migrate to Azure Synapse to get the benefits of a scalable cloud environment. Migration can also include a change in the underlying data model, such as a move from an Inmon model to a data vault.
-Microsoft recommends moving the existing data model as-is to Azure and using the performance and flexibility of the Azure environment to apply the re-engineering changes, leveraging Azure's capabilities to make the changes without impacting the existing source system.
+Microsoft recommends moving your existing data model as-is to Azure and using the performance and flexibility of the Azure environment to apply the re-engineering changes. That way, you can use Azure's capabilities to make the changes without impacting the existing source system.
#### Use Azure Data Factory to implement a metadata-driven migration
-Automate and orchestrate the migration process by using the capabilities of the Azure environment. This approach minimizes the impact on the existing Netezza environment, which may already be running close to full capacity.
+You can automate and orchestrate the migration process by using the capabilities of the Azure environment. This approach minimizes the performance hit on the existing Netezza environment, which may already be running close to capacity.
-Azure Data Factory is a cloud-based data integration service that allows creation of data-driven workflows in the cloud for orchestrating and automating data movement and data transformation. Using Data Factory, you can create and schedule data-driven workflows&mdash;called pipelines&mdash;to ingest data from disparate data stores. Data Factory can process and transform data by using compute services such as Azure HDInsight Hadoop, Spark, Azure Data Lake Analytics, and Azure Machine Learning.
+[Azure Data Factory](../../../data-factory/introduction.md) is a cloud-based data integration service that supports creating data-driven workflows in the cloud that orchestrate and automate data movement and data transformation. You can use Data Factory to create and schedule data-driven workflows (pipelines) that ingest data from disparate data stores. Data Factory can process and transform data by using compute services such as [Azure HDInsight Hadoop](/azure/hdinsight/hadoop/apache-hadoop-introduction), Spark, Azure Data Lake Analytics, and Azure Machine Learning.
-By creating metadata to list the data tables to be migrated and their location, you can use the Data Factory facilities to manage the migration process.
+When you're planning to use Data Factory facilities to manage the migration process, create metadata that lists all the data tables to be migrated and their location.
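The control metadata described above can be as simple as a table with one row per source object; a Data Factory pipeline can then look up the pending rows to drive per-table copy activities. A minimal sketch of that idea (the table names, columns, and priorities here are hypothetical, not a prescribed schema):

```python
# Hypothetical migration control metadata: one entry per Netezza table to move.
migration_metadata = [
    {"source_db": "STAGING", "table": "SALES_RAW",  "priority": 1, "migrated": False},
    {"source_db": "EDW",     "table": "FACT_SALES", "priority": 1, "migrated": False},
    {"source_db": "MARTS",   "table": "DIM_DATE",   "priority": 2, "migrated": True},
]

def next_batch(metadata, priority):
    """Return fully qualified names of tables still to migrate at a given priority."""
    return [f"{m['source_db']}.{m['table']}"
            for m in metadata
            if m["priority"] == priority and not m["migrated"]]

print(next_batch(migration_metadata, 1))
```

In practice, the same list would live in a database table or a file in the data lake, and a Lookup-style activity would read it so that adding a table to the migration is a metadata change rather than a pipeline change.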
### Design differences between Netezza and Azure Synapse
-#### Multiple databases versus a single database and schemas
-
-> [!TIP]
-> Combine multiple databases into a single database in Azure Synapse and use schemas to logically separate the tables.
+As mentioned earlier, there are some basic differences in approach between Netezza and Azure Synapse Analytics databases, and these differences are discussed next.
-In a Netezza environment, there are often multiple separate databases for individual parts of the overall environment. For example, there may be a separate database for data ingestion and staging tables, a database for the core warehouse tables, and another database for data marts, sometimes called a semantic layer. Processing these as ETL/ELT pipelines may implement cross-database joins and will move data between these separate databases.
+<a id="multiple-databases-versus-a-single-database-and-schemas"></a>
+#### Multiple databases vs. a single database and schemas
-> [!TIP]
-> Replace Netezza-specific features with Azure Synapse features.
+The Netezza environment often contains multiple separate databases. For instance, there could be separate databases for: data ingestion and staging tables, core warehouse tables, and data marts (sometimes referred to as the semantic layer). ETL or ELT pipeline processes might implement cross-database joins and move data between the separate databases.
-Querying within the Azure Synapse environment is limited to a single database. Schemas are used to separate the tables into logically separate groups. Therefore, we recommend using a series of schemas within the target Azure Synapse database to mimic any separate databases migrated from the Netezza environment. If the Netezza environment already uses schemas, you may need to use a new naming convention to move the existing Netezza tables and views to the new environment&mdash;for example, concatenate the existing Netezza schema and table names into the new Azure Synapse table name and use schema names in the new environment to maintain the original separate database names. Schema consolidation naming can have dots&mdash;however, Azure Synapse Spark may have issues. You can use SQL views over the underlying tables to maintain the logical structures, but there are some potential downsides to this approach:
+In contrast, the Azure Synapse environment contains a single database and uses schemas to separate tables into logically separate groups. We recommend that you use a series of schemas within the target Azure Synapse database to mimic the separate databases migrated from the Netezza environment. If the Netezza environment already uses schemas, you may need to use a new naming convention when you move the existing Netezza tables and views to the new environment. For example, you could concatenate the existing Netezza schema and table names into the new Azure Synapse table name, and use schema names in the new environment to maintain the original separate database names. If schema consolidation naming has dots, Azure Synapse Spark might have issues. Although you can use SQL views on top of the underlying tables to maintain the logical structures, there are potential downsides to that approach:
- Views in Azure Synapse are read-only, so any updates to the data must take place on the underlying base tables.
-- There may already be one or more layers of views in existence, and adding an extra layer of views might impact performance and supportability as nested views are difficult to troubleshoot.
+- There may already be one or more layers of views in existence and adding an extra layer of views could affect performance and supportability because nested views are difficult to troubleshoot.
+
+>[!TIP]
+>Combine multiple databases into a single database within Azure Synapse and use schema names to logically separate the tables.
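One way to apply the naming convention described above can be sketched as follows. The mapping is illustrative only (the sample database, schema, and table names are hypothetical): the original Netezza database name becomes the Azure Synapse schema, and the original Netezza schema name is folded into the table name, so no dots are introduced into object names.

```python
def synapse_name(netezza_db: str, netezza_schema: str, table: str):
    """Map a Netezza (database, schema, table) to an Azure Synapse (schema, table).
    The target schema preserves the source database name; the source schema name
    is concatenated into the table name to avoid dotted names."""
    return netezza_db.lower(), f"{netezza_schema.lower()}_{table.lower()}"

print(synapse_name("EDW", "FINANCE", "GL_BALANCES"))
```

Whatever convention you choose, apply it mechanically across the whole inventory so that migrated objects remain predictable and scriptable.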
#### Table considerations
-> [!TIP]
-> Use existing indexes to indicate candidates for indexing in the migrated warehouse.
+When you migrate tables between different environments, typically only the raw data and the metadata that describes it physically migrate. Other database elements from the source system, such as indexes, usually aren't migrated because they might be unnecessary or implemented differently in the new environment.
-When migrating tables between different technologies, only the raw data and the metadata that describes it gets physically moved between the two environments. Other database elements from the source system&mdash;such as indexes&mdash;aren't migrated as these may not be needed or may be implemented differently within the new target environment.
+Performance optimizations in the source environment, such as indexes, indicate where you might add performance optimization in the new environment. For example, if queries in the source Netezza environment frequently use zone maps, that suggests that a non-clustered index should be created within Azure Synapse. Other native performance optimization techniques like table replication may be more applicable than straight like-for-like index creation.
-However, it's important to understand where performance optimizations such as indexes have been used in the source environment, as this can indicate where to add performance optimization in the new target environment. For example, if queries in the source Netezza environment frequently use zone maps, it may indicate that a non-clustered index should be created within the migrated Azure Synapse. Other native performance optimization techniques, such as table replication, may be more applicable than a straight "like-for-like" index creation.
+>[!TIP]
+>Existing indexes indicate candidates for indexing in the migrated warehouse.
#### Unsupported Netezza database object types
-> [!TIP]
-> Assess the impact of unsupported data types as part of the preparation phase
-
-Netezza implements some database objects that aren't directly supported in Azure Synapse, but there are methods to achieve the same functionality within the new environment:
+Netezza-specific features can often be replaced by Azure Synapse features. However, some Netezza database objects aren't directly supported in Azure Synapse. The following list of unsupported Netezza database objects describes how you can achieve an equivalent functionality in Azure Synapse.
-- Zone maps: in Netezza, zone maps are automatically created and maintained for some column types and are used at query time to restrict the amount of data to be scanned. Zone maps are created on the following column types:
+- **Zone maps**: in Netezza, zone maps are automatically created and maintained for the following column types and are used at query time to restrict the amount of data to be scanned:
- `INTEGER` columns of length 8 bytes or less.
- - Temporal columns. For instance, `DATE`, `TIME`, and `TIMESTAMP`.
- - `CHAR` columns, if these are part of a materialized view and mentioned in the `ORDER BY` clause.
+ - Temporal columns, such as `DATE`, `TIME`, and `TIMESTAMP`.
+ - `CHAR` columns if they're part of a materialized view and mentioned in the `ORDER BY` clause.
You can find out which columns have zone maps by using the `nz_zonemap` utility, which is part of the NZ Toolkit. Azure Synapse doesn't include zone maps, but you can achieve similar results by using other user-defined index types and/or partitioning.

-- Clustered base tables (CBT): in Netezza, CBTs are commonly used for fact tables, which can have billions of records. Scanning such a huge table requires a lot of processing time, since a full table scan might be needed to get relevant records. Organizing records on restrictive CBT allows Netezza to group records in same or nearby extents. This process also creates zone maps that improve the performance by reducing the amount of data to be scanned.
+- **Clustered base tables (CBT)**: in Netezza, CBTs are commonly used for fact tables, which can have billions of records. Scanning such a huge table requires considerable processing time because a full table scan might be needed to get the relevant records. Organizing records on restrictive CBTs allows Netezza to group records in same or nearby extents. This process also creates zone maps that improve the performance by reducing the amount of data that needs to be scanned.
- In Azure Synapse, you can achieve a similar effect by use of partitioning and/or use of other indexes.
+ In Azure Synapse, you can achieve a similar effect by partitioning and/or using other indexes.
-- Materialized views: Netezza supports materialized views and recommends creating one or more of these over large tables having many columns where only a few of those columns are regularly used in queries. The system automatically maintains materialized views when data in the base table is updated.
+- **Materialized views**: Netezza supports materialized views and recommends using one or more materialized views for large tables with many columns if only a few columns are regularly used in queries. Materialized views are automatically refreshed by the system when data in the base table is updated.
Azure Synapse supports materialized views, with the same functionality as Netezza.

#### Netezza data type mapping
-Most Netezza data types have a direct equivalent in Azure Synapse. This table shows these data types together with the recommended approach for handling them.
+Most Netezza data types have a direct equivalent in Azure Synapse. The following table shows the recommended approach for mapping Netezza data types to Azure Synapse.
| Netezza Data Type | Azure Synapse Data Type |
|--|-|
Most Netezza data types have a direct equivalent in Azure Synapse. This table sh
| TIME WITH TIME ZONE | DATETIMEOFFSET |
| TIMESTAMP | DATETIME |
-> [!TIP]
-> Assess the number and type of non-data objects to be migrated as part of the preparation phase.
+>[!TIP]
+>Assess the number and type of unsupported data types during the migration preparation phase.
-There are third-party vendors who offer tools and services to automate migration, including the mapping of data types. If a third-party ETL tool such as Informatica or Talend is already in use in the Netezza environment, those tools can implement any required data transformations.
+Third-party vendors offer tools and services to automate migration, including the mapping of data types. If a [third-party](../../partner/data-integration.md) ETL tool is already in use in the Netezza environment, use that tool to implement any required data transformations.
#### SQL DML syntax differences
-There are a few differences in SQL Data Manipulation Language (DML) syntax between Netezza SQL and Azure Synapse (T-SQL) that you should be aware of during migration:
+SQL DML syntax differences exist between Netezza SQL and Azure Synapse T-SQL. Those differences are discussed in detail in [Minimize SQL issues for Netezza migrations](5-minimize-sql-issues.md#sql-ddl-differences-between-netezza-and-azure-synapse).
-- `STRPOS`: in Netezza, the `STRPOS` function returns the position of a substring within a string. The equivalent function in Azure Synapse is `CHARINDEX`, with the order of the arguments reversed. For example, `SELECT STRPOS('abcdef','def')...` in Netezza is equivalent to `SELECT CHARINDEX('def','abcdef')...` in Azure Synapse.
+- `STRPOS`: in Netezza, the `STRPOS` function returns the position of a substring within a string. The equivalent function in Azure Synapse is `CHARINDEX` with the order of the arguments reversed. For example, `SELECT STRPOS('abcdef','def')...` in Netezza is equivalent to `SELECT CHARINDEX('def','abcdef')...` in Azure Synapse.
-- `AGE`: Netezza supports the `AGE` operator to give the interval between two temporal values, such as timestamps or dates. For example, `SELECT AGE('23-03-1956','01-01-2019') FROM...`. In Azure Synapse, `DATEDIFF` gives the interval. For example, `SELECT DATEDIFF(day, '1956-03-26','2019-01-01') FROM...`. Note the date representation sequence.
+- `AGE`: Netezza supports the `AGE` operator to give the interval between two temporal values, such as timestamps or dates, for example: `SELECT AGE('23-03-1956','01-01-2019') FROM...`. In Azure Synapse, use `DATEDIFF` to get the interval, for example: `SELECT DATEDIFF(day, '1956-03-26','2019-01-01') FROM...`. Note the date representation sequence.
- `NOW()`: in Netezza, `NOW()` is equivalent to `CURRENT_TIMESTAMP` in Azure Synapse.

#### Functions, stored procedures, and sequences
-> [!TIP]
-> Assess the number and type of non-data objects to be migrated as part of the preparation phase.
-
-When migrating from a mature legacy data warehouse environment such as Netezza, you must often migrate elements other than simple tables and views to the new target environment. Examples include functions, stored procedures, and sequences.
+When migrating a data warehouse from a mature environment like Netezza, you probably need to migrate elements other than simple tables and views. Check whether tools within the Azure environment can replace the functionality of functions, stored procedures, and sequences because it's usually more efficient to use built-in Azure tools than to recode those elements for Azure Synapse.
-As part of the preparation phase, create an inventory of these objects to be migrated, and define the method of handling them. Assign an appropriate allocation of resources in the project plan.
+As part of your preparation phase, create an inventory of objects that need to be migrated, define a method for handling them, and allocate appropriate resources in your migration plan.
-There may be facilities in the Azure environment that replace the functionality implemented as functions or stored procedures in the Netezza environment. In this case, it's more efficient to use the built-in Azure facilities rather than recoding the Netezza functions.
+[Data integration partners](../../partner/data-integration.md) offer tools and services that can automate the migration of functions, stored procedures, and sequences.
-[Data integration partners](../../partner/data-integration.md) offer tools and services that can automate the migration.
+The following sections further discuss the migration of functions, stored procedures, and sequences.
##### Functions
-As with most database products, Netezza supports system functions and user-defined functions within an SQL implementation. When migrating to another database platform such as Azure Synapse, common system functions are available and can be migrated without change. Some system functions may have slightly different syntax, but the required changes can be automated if so.
+As with most database products, Netezza supports system and user-defined functions within a SQL implementation. When you migrate a legacy database platform to Azure Synapse, common system functions can usually be migrated without change. Some system functions might have a slightly different syntax, but any required changes can be automated.
-For system functions where there's no equivalent, or for arbitrary user-defined functions, recode these using the language(s) available in the target environment. Netezza user-defined functions are coded in nzlua or C++ languages while Azure Synapse uses the popular Transact-SQL language to implement user-defined functions.
+For Netezza system functions or arbitrary user-defined functions that have no equivalent in Azure Synapse, recode those functions using a target environment language. Netezza user-defined functions are coded in nzlua or C++ languages. Azure Synapse uses the Transact-SQL language to implement user-defined functions.
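For illustration, a Netezza scalar user-defined function could be recoded as a T-SQL scalar function along the following lines. The function name and logic are hypothetical, not taken from the article:

```sql
-- Hypothetical scalar UDF recoded in T-SQL for Azure Synapse.
-- Removes all spaces from the input string.
CREATE FUNCTION dbo.udf_StripSpaces (@input NVARCHAR(4000))
RETURNS NVARCHAR(4000)
AS
BEGIN
    RETURN REPLACE(@input, ' ', '');
END;
```

The function can then be called inline, for example `SELECT dbo.udf_StripSpaces(ProductName) FROM dbo.DimProduct;`.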
##### Stored procedures
-Most modern database products allow for procedures to be stored within the database. Netezza provides the NZPLSQL language for this purpose. NZPLSQL is based on Postgres PL/pgSQL.
+Most modern database products support storing procedures within the database. Netezza provides the NZPLSQL language, which is based on Postgres PL/pgSQL, for this purpose. A stored procedure typically contains both SQL statements and procedural logic, and returns data or a status.
-A stored procedure typically contains SQL statements and some procedural logic, and may return data or a status.
-
-Azure Synapse Analytics also supports stored procedures using T-SQL. If you must migrate stored procedures, recode these procedures for their new environment.
+Azure Synapse supports stored procedures using T-SQL, so you need to recode any migrated stored procedures in that language.
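As a sketch of what such recoding looks like, an NZPLSQL procedure might become a T-SQL procedure like the following. The procedure name, table, and logic are hypothetical:

```sql
-- Hypothetical stored procedure recoded in T-SQL.
-- Deletes staging rows older than a cutoff date and returns a simple status.
CREATE PROCEDURE dbo.usp_PurgeStaging @cutoff DATE
AS
BEGIN
    DELETE FROM dbo.StagingSales WHERE LoadDate < @cutoff;
    SELECT @@ROWCOUNT AS RowsDeleted;  -- report how many rows were removed
END;
```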
##### Sequences
-In Netezza, a sequence is a named database object created via `CREATE SEQUENCE` that can provide the unique value via the `NEXT VALUE FOR` method. Use these to generate unique numbers for use as surrogate key values for primary key values.
+In Netezza, a sequence is a named database object created using `CREATE SEQUENCE`. A sequence provides unique numeric values via the `NEXT VALUE FOR` method. You can use the generated unique numbers as surrogate key values for primary keys.
-Within Azure Synapse, there's no `CREATE SEQUENCE`. Sequences are handled via use of [IDENTITY](/sql/t-sql/statements/create-table-transact-sql-identity-property?msclkid=8ab663accfd311ec87a587f5923eaa7b) columns or SQL code to create the next sequence number in a series.
+Azure Synapse doesn't implement `CREATE SEQUENCE`, but you can implement sequences using [IDENTITY](/sql/t-sql/statements/create-table-transact-sql-identity-property) columns or SQL code that generates the next sequence number in a series.
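For example, a surrogate key that Netezza would populate with `NEXT VALUE FOR` can be replaced by an `IDENTITY` column. The table and column names below are hypothetical, and note that `IDENTITY` values in a distributed table aren't guaranteed to be contiguous:

```sql
-- Hypothetical table using an IDENTITY column in place of a Netezza sequence.
CREATE TABLE dbo.FactOrders
(
    OrderKey   INT IDENTITY(1, 1) NOT NULL,  -- surrogate key generated automatically
    CustomerId INT NOT NULL,
    OrderDate  DATE NOT NULL
)
WITH (DISTRIBUTION = HASH(CustomerId));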
### Extract metadata and data from a Netezza environment

#### Data Definition Language (DDL) generation
-> [!TIP]
-> Use Netezza external tables for most efficient data extract.
-
-You can edit existing Netezza CREATE TABLE and CREATE VIEW scripts to create the equivalent definitions with modified data types, if necessary, as described in the previous section. Typically, this involves removing or modifying any extra Netezza-specific clauses such as `ORGANIZE ON`.
+The ANSI SQL standard defines the basic syntax for Data Definition Language (DDL) commands. Some DDL commands, such as `CREATE TABLE` and `CREATE VIEW`, are common to both Netezza and Azure Synapse but have been extended to provide implementation-specific features.
-However, all the information that specifies the current definitions of tables and views within the existing Netezza environment is maintained within system catalog tables. These tables are the best source of this information, as it's guaranteed to be up to date and complete. User-maintained documentation may not be in sync with the current table definitions.
+You can edit existing Netezza `CREATE TABLE` and `CREATE VIEW` scripts to achieve equivalent definitions in Azure Synapse. To do so, you might need to use [modified data types](#netezza-data-type-mapping) and remove or modify Netezza-specific clauses such as `ORGANIZE ON`.
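As an illustrative before-and-after sketch, a Netezza definition with `DISTRIBUTE ON` and `ORGANIZE ON` clauses might be rewritten for Azure Synapse as follows. The table and column names are hypothetical:

```sql
-- Netezza source definition (illustrative):
--   CREATE TABLE sales_fact (customer_id INT, sale_date DATE, amount NUMERIC(18,2))
--   DISTRIBUTE ON (customer_id) ORGANIZE ON (sale_date);

-- Equivalent Azure Synapse definition with the Netezza-specific clauses replaced:
CREATE TABLE dbo.sales_fact
(
    customer_id INT,
    sale_date   DATE,
    amount      DECIMAL(18, 2)
)
WITH
(
    DISTRIBUTION = HASH(customer_id),
    CLUSTERED COLUMNSTORE INDEX
);
```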
-Access the information in these tables via utilities such as `nz_ddl_table` and generate the `CREATE TABLE` DDL statements for the equivalent tables in Azure Synapse.
+Within the Netezza environment, system catalog tables specify the current table and view definition. Unlike user-maintained documentation, system catalog information is always complete and in sync with current table definitions. By using utilities such as `nz_ddl_table`, you can access system catalog information to generate `CREATE TABLE` DDL statements that create equivalent tables in Azure Synapse.
-Third-party migration and ETL tools also use the catalog information to achieve the same result.
+You can also use [third-party](../../partner/data-integration.md) migration and ETL tools that process system catalog information to achieve similar results.
#### Data extraction from Netezza
-Migrate the raw data from existing Netezza tables into flat delimited files using standard Netezza utilities, such as nzsql, nzunload, and via external tables. Compress these files using gzip and upload them to Azure Blob Storage via AzCopy or by using Azure data transport facilities such as Azure Data Box.
-
-During a migration exercise, extract the data as efficiently as possible. Use the external tables approach as this is the fastest method. Perform multiple extracts in parallel to maximize the throughput for data extraction.
+You can extract raw table data from Netezza tables to flat delimited files, such as CSV files, using standard Netezza utilities like nzsql and nzunload, or via external tables. Then, you can compress the flat delimited files using gzip, and upload the compressed files to Azure Blob Storage using AzCopy or Azure data transport tools like Azure Data Box.
-This is a simple example of an external table extract:
+Extract table data as efficiently as possible. Use the external tables approach because it's the fastest extract method. Perform multiple extracts in parallel to maximize data extraction throughput. The following SQL statement performs an external table extract:
```sql
CREATE EXTERNAL TABLE '/tmp/export_tab1.csv' USING (DELIM ',') AS
SELECT * FROM <TABLENAME>;
```
-If sufficient network bandwidth is available, extract data directly from an on-premises Netezza system into Azure Synapse tables or Azure Blob Data Storage by using Azure Data Factory processes or third-party data migration or ETL products.
+If sufficient network bandwidth is available, you can extract data from an on-premises Netezza system directly into Azure Synapse tables or Azure Blob Data Storage. To do so, use Data Factory processes or [third-party](../../partner/data-integration.md) data migration or ETL products.
-Recommended data formats for the extracted data include delimited text files (also called Comma Separated Values or CSV), Optimized Row Columnar (ORC), or Parquet files.
+>[!TIP]
+>Use Netezza external tables for the most efficient data extraction.
-For more information about the process of migrating data and ETL from a Netezza environment, see [Data migration, ETL, and load for Netezza migrations](2-etl-load-migration-considerations.md).
+Extracted data files should contain delimited text in CSV, Optimized Row Columnar (ORC), or Parquet format.
+
+For more information about migrating data and ETL from a Netezza environment, see [Data migration, ETL, and load for Netezza migrations](2-etl-load-migration-considerations.md).
## Performance recommendations for Netezza migrations
-This article provides general information and guidelines about use of performance optimization techniques for Azure Synapse and adds specific recommendations for use when migrating from a Netezza environment.
+The goal of performance optimization is to achieve the same or better data warehouse performance after migration to Azure Synapse.
### Similarities in performance tuning approach concepts
-> [!TIP]
-> Many Netezza tuning concepts hold true for Azure Synapse.
+Many performance tuning concepts for Netezza databases hold true for Azure Synapse databases. For example:
-When moving from a Netezza environment, many of the performance tuning concepts for Azure Data Warehouse will be remarkably familiar. For example:
+- Use data distribution to colocate data that will be joined onto the same processing node.
-- Using data distribution to collocate data to be joined onto the same processing node.
+- Use the smallest data type for a given column to save storage space and accelerate query processing.
-- Using the smallest data type for a given column will save storage space and accelerate query processing.
+- Ensure that columns to be joined have the same data type in order to optimize join processing and reduce the need for data transforms.
-- Ensuring data types of columns to be joined are identical will optimize join processing by reducing the need to transform data for matching.
+- To help the optimizer produce the best execution plan, ensure statistics are up to date.
-- Ensuring statistics are up to date will help the optimizer produce the best execution plan.
+- Monitor performance using built-in database capabilities to ensure that resources are being used efficiently.
-### Differences in performance tuning approach
+>[!TIP]
+>Prioritize familiarity with the tuning options in Azure Synapse at the start of a migration.
-> [!TIP]
-> Prioritize early familiarity with Azure Synapse tuning options in a migration exercise.
+### Differences in performance tuning approach
-This section highlights lower-level implementation differences between Netezza and Azure Synapse for performance tuning.
+This section highlights low-level performance tuning implementation differences between Netezza and Azure Synapse.
#### Data distribution options
-`CREATE TABLE` statements in both Netezza and Azure Synapse allow for specification of a distribution definition&mdash;via `DISTRIBUTE ON` in Netezza, and `DISTRIBUTION =` in Azure Synapse.
+For performance, Azure Synapse was designed with a multi-node architecture and uses parallel processing. To optimize table performance, you can define a data distribution option in `CREATE TABLE` statements using `DISTRIBUTION` in Azure Synapse and `DISTRIBUTE ON` in Netezza.
-Compared to Netezza, Azure Synapse provides an additional way to achieve local joins for small table-large table joins (typically dimension table to fact table in a star schema model), which is to replicate the smaller dimension table across all nodes. This ensures that any value of the join key of the larger table will have a matching dimension row locally available. The overhead of replicating the dimension tables is relatively low, provided the tables aren't very large (see [Design guidance for replicated tables](../../sql-data-warehouse/design-guidance-for-replicated-tables.md))&mdash;in which case, the hash distribution approach as described previously is more appropriate. For more information, see [Distributed tables design](../../sql-data-warehouse/sql-data-warehouse-tables-distribute.md).
+Unlike Netezza, Azure Synapse supports local joins between a small table and a large table through small table replication. For instance, consider a small dimension table and a large fact table within a star schema model. Azure Synapse can replicate the smaller dimension table across all nodes to ensure that the value of any join key for the large table has a matching, locally available dimension row. The overhead of dimension table replication is relatively low for a small dimension table. For large dimension tables, a hash distribution approach is more appropriate. For more information about data distribution options, see [Design guidance for using replicated tables](../../sql-data-warehouse/design-guidance-for-replicated-tables.md) and [Guidance for designing distributed tables](../../sql-data-warehouse/sql-data-warehouse-tables-distribute.md).
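The following sketch shows both approaches for a hypothetical star schema: a small dimension table replicated to every node, and a large fact table hash-distributed on its join key. All names are illustrative:

```sql
-- Small dimension table: replicate to all nodes so joins are always local.
CREATE TABLE dbo.DimProduct
(
    ProductKey  INT NOT NULL,
    ProductName NVARCHAR(100)
)
WITH (DISTRIBUTION = REPLICATE);

-- Large fact table: hash-distribute on the join key.
CREATE TABLE dbo.FactSales
(
    ProductKey INT NOT NULL,
    SaleDate   DATE NOT NULL,
    Amount     DECIMAL(18, 2)
)
WITH (DISTRIBUTION = HASH(ProductKey));
```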
#### Data indexing
-Azure Synapse provides several user-definable indexing options, but these are different from the system-managed zone maps in Netezza. For more information about the different indexing options, see [table indexes](/azure/sql-data-warehouse/sql-data-warehouse-tables-index).
+Azure Synapse supports several user-definable indexing options that have a different operation and usage compared to system-managed zone maps in Netezza. For more information about the different indexing options in Azure Synapse, see [Indexes on dedicated SQL pool tables](../../sql-data-warehouse/sql-data-warehouse-tables-index.md).
-The existing system-managed zone maps within the source Netezza environment can indicate how the data is currently used. They can identify candidate columns for indexing within the Azure Synapse environment.
+The existing system-managed zone maps within a source Netezza environment provide a useful indication of data usage and the candidate columns for indexing in the Azure Synapse environment.
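For example, if zone maps show that Netezza queries frequently filter on a date column, a nonclustered index on the equivalent Azure Synapse column can provide a similar access path. The table and column names are hypothetical:

```sql
-- Hypothetical nonclustered index on a frequently filtered date column.
CREATE INDEX ix_FactSales_SaleDate
ON dbo.FactSales (SaleDate);
```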
#### Data partitioning
-In an enterprise data warehouse, fact tables can contain many billions of rows. Partitioning optimizes the maintenance and querying of these tables by splitting them into separate parts to reduce the amount of data processed. The `CREATE TABLE` statement defines the partitioning specification for a table.
+In an enterprise data warehouse, fact tables can contain billions of rows. Partitioning optimizes the maintenance and query performance of these tables by splitting them into separate parts to reduce the amount of data processed. In Azure Synapse, the `CREATE TABLE` statement defines the partitioning specification for a table.
-Only one field per table can be used for partitioning. That field is frequently a date field since many queries are filtered by date or a date range. It's possible to change the partitioning of a table after initial load by recreating the table with the new distribution using the `CREATE TABLE AS` (or CTAS) statement. See [table partitions](/azure/sql-data-warehouse/sql-data-warehouse-tables-partition) for a detailed discussion of partitioning in Azure Synapse.
+You can only use one field per table for partitioning. That field is often a date field because many queries are filtered by date or date range. It's possible to change the partitioning of a table after initial load by using the `CREATE TABLE AS` (CTAS) statement to recreate the table with a new distribution. For a detailed discussion of partitioning in Azure Synapse, see [Partitioning tables in dedicated SQL pool](/azure/sql-data-warehouse/sql-data-warehouse-tables-partition).
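As a sketch, a fact table can be partitioned on a date column at creation and later re-partitioned with CTAS. The names and boundary values below are illustrative:

```sql
-- Create a fact table partitioned by year (boundary values are illustrative).
CREATE TABLE dbo.FactSales
(
    CustomerKey INT NOT NULL,
    SaleDate    DATE NOT NULL,
    Amount      DECIMAL(18, 2)
)
WITH
(
    DISTRIBUTION = HASH(CustomerKey),
    PARTITION (SaleDate RANGE RIGHT FOR VALUES ('2021-01-01', '2022-01-01'))
);

-- Change the partitioning after initial load by recreating the table with CTAS.
CREATE TABLE dbo.FactSales_Repartitioned
WITH
(
    DISTRIBUTION = HASH(CustomerKey),
    PARTITION (SaleDate RANGE RIGHT FOR VALUES ('2021-01-01', '2021-07-01', '2022-01-01'))
)
AS SELECT * FROM dbo.FactSales;
```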
#### Data table statistics
-Ensure that statistics on data tables are up to date by building in a [statistics](../../sql/develop-tables-statistics.md) step to ETL/ELT jobs.
+You should ensure that statistics on data tables are up to date by building in a [statistics](../../sql/develop-tables-statistics.md) step to ETL/ELT jobs.
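For example, an ETL/ELT job might end with a statistics step like the following. The table and column names are hypothetical:

```sql
-- Create statistics once on a frequently joined column...
CREATE STATISTICS stat_FactSales_CustomerKey
ON dbo.FactSales (CustomerKey);

-- ...then refresh statistics at the end of each load.
UPDATE STATISTICS dbo.FactSales;
```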
+
+<a id="polybase-for-data-loading"></a>
+#### PolyBase or COPY INTO for data loading
+
+[PolyBase](/sql/relational-databases/polybase) supports efficient loading of large amounts of data to a data warehouse by using parallel loading streams. For more information, see [PolyBase data loading strategy](../../sql/load-data-overview.md).
+
+[COPY INTO](/sql/t-sql/statements/copy-into-transact-sql) also supports high-throughput data ingestion, and:
+
+- Data retrieval from all files within a folder and subfolders.
+
+- Data retrieval from multiple locations in the same storage account. You can specify multiple locations by using comma separated paths.
-#### PolyBase for data loading
+- [Azure Data Lake Storage](../../../storage/blobs/data-lake-storage-introduction.md) (ADLS) and Azure Blob Storage.
-PolyBase is the most efficient method for loading large amounts of data into the warehouse since it can leverage parallel loading streams. For more information, see [PolyBase data loading strategy](../../sql/load-data-overview.md).
+- CSV, PARQUET, and ORC file formats.
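The following sketch loads gzip-compressed CSV extracts from Azure Blob Storage using `COPY INTO`. The storage account, container path, and table name are hypothetical:

```sql
-- Hypothetical COPY INTO load of gzip-compressed CSV extracts.
COPY INTO dbo.FactSales
FROM 'https://mystorageaccount.blob.core.windows.net/extracts/sales/*.csv.gz'
WITH
(
    FILE_TYPE       = 'CSV',
    FIELDTERMINATOR = ',',
    COMPRESSION     = 'GZIP'
);
```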
#### Use workload management
Use [workload management](../../sql-data-warehouse/sql-data-warehouse-workload-m
## Next steps
-To learn more about ETL and load for Netezza migration, see the next article in this series: [Data migration, ETL, and load for Netezza migrations](2-etl-load-migration-considerations.md).
+To learn about ETL and load for Netezza migration, see the next article in this series: [Data migration, ETL, and load for Netezza migrations](2-etl-load-migration-considerations.md).
synapse-analytics 2 Etl Load Migration Considerations https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/synapse-analytics/migration-guides/netezza/2-etl-load-migration-considerations.md
Previously updated : 05/31/2022 Last updated : 06/01/2022

# Data migration, ETL, and load for Netezza migrations
-This article is part two of a seven part series that provides guidance on how to migrate from Netezza to Azure Synapse Analytics. This article provides best practices for ETL and load migration.
+This article is part two of a seven-part series that provides guidance on how to migrate from Netezza to Azure Synapse Analytics. The focus of this article is best practices for ETL and load migration.
## Data migration considerations
This query uses the helper function `FORMAT_TABLE_ACCESS` and the digit at the e
#### What is the best migration approach to minimize risk and impact on users?
-> [!TIP]
-> Migrate the existing model as-is initially, even if a change to the data model is planned in the future.
-
-This question comes up often since companies often want to lower the impact of changes on the data warehouse data model to improve agility. Companies see an opportunity to do so during a migration to modernize their data model. This approach carries a higher risk because it could impact ETL jobs populating the data warehouse from a data warehouse to feed dependent data marts. Because of that risk, it's usually better to redesign on this scale after the data warehouse migration.
+This question comes up frequently because companies may want to lower the impact of changes on the data warehouse data model to improve agility. Companies often see an opportunity to further modernize or transform their data during an ETL migration. This approach carries a higher risk because it changes multiple factors simultaneously, making it difficult to compare the outcomes of the old system versus the new. Making data model changes here could also affect upstream or downstream ETL jobs to other systems. Because of that risk, it's better to redesign on this scale after the data warehouse migration.
-Even if a data model change is an intended part of the overall migration, it's good practice to migrate the existing model as-is to the new environment (Azure Synapse Analytics in this case), rather than do any re-engineering on the new platform during migration. This approach has the advantage of minimizing the impact on existing production systems, while also leveraging the performance and elastic scalability of the Azure platform for one-off re-engineering tasks.
+Even if a data model is intentionally changed as part of the overall migration, it's good practice to migrate the existing model as-is to Azure Synapse, rather than do any re-engineering on the new platform. This approach minimizes the effect on existing production systems, while benefiting from the performance and elastic scalability of the Azure platform for one-off re-engineering tasks.
When you migrate from Netezza, the existing data model is often already suitable for as-is migration to Azure Synapse.
+>[!TIP]
+>Migrate the existing model as-is initially, even if a change to the data model is planned in the future.
+
#### Migrate data marts: stay physical or go virtual?

> [!TIP]
synapse-analytics 3 Security Access Operations https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/synapse-analytics/migration-guides/netezza/3-security-access-operations.md
Previously updated : 05/31/2022 Last updated : 06/01/2022

# Security, access, and operations for Netezza migrations
-This article is part three of a seven part series that provides guidance on how to migrate from Netezza to Azure Synapse Analytics. This article provides best practices for security access operations.
+This article is part three of a seven-part series that provides guidance on how to migrate from Netezza to Azure Synapse Analytics. The focus of this article is best practices for security access operations.
## Security considerations
synapse-analytics 4 Visualization Reporting https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/synapse-analytics/migration-guides/netezza/4-visualization-reporting.md
Previously updated : 05/31/2022 Last updated : 07/12/2022

# Visualization and reporting for Netezza migrations
-This article is part four of a seven part series that provides guidance on how to migrate from Netezza to Azure Synapse Analytics. This article provides best practices for visualization and reporting.
+This article is part four of a seven-part series that provides guidance on how to migrate from Netezza to Azure Synapse Analytics. The focus of this article is best practices for visualization and reporting.
## Access Azure Synapse Analytics using Microsoft and third-party BI tools
-Almost every organization accesses data warehouses and data marts using a range of BI tools and applications, such as:
+Organizations access data warehouses and data marts using a range of business intelligence (BI) tools and applications. Some examples of BI products are:
-- Microsoft BI tools, like Power BI.
+- Microsoft BI tools, such as Power BI.
-- Office applications, like Microsoft Excel spreadsheets.
+- Office applications, such as Microsoft Excel spreadsheets.
-- Third-party BI tools from various vendors.
+- Third-party BI tools from different vendors.
-- Custom analytic applications that have embedded BI tool functionality inside the application.
+- Custom analytics applications with embedded BI tool functionality.
-- Operational applications that request BI on demand, invoke queries and reports as-a-service on a BI platform, which in turn queries data in the data warehouse or data marts that are being migrated.
+- Operational applications that support on-demand BI by running queries and reports on a BI platform that in turn queries data in a data warehouse or data mart.
- Interactive data science development tools, such as Azure Synapse Spark Notebooks, Azure Machine Learning, RStudio, and Jupyter Notebooks.
-The migration of visualization and reporting as part of a data warehouse migration program means that all the existing queries, reports, and dashboards generated and issued by these tools and applications need to run on Azure Synapse and yield the same results as they did in the original data warehouse prior to migration.
+If you migrate visualization and reporting as part of your data warehouse migration, all existing queries, reports, and dashboards generated by BI products need to run in the new environment. Your BI products must yield the same results on Azure Synapse as they did in your legacy data warehouse environment.
-> [!TIP]
-> Existing users, user groups, roles and assignments of access security privileges need to be migrated first for migration of reports and visualizations to succeed.
+For consistent results after migration, all BI tools and application dependencies must work after you've migrated your data warehouse schema and data to Azure Synapse. The dependencies include less visible aspects, such as access and security. When you address access and security, ensure that you migrate:
-To make that happen, everything that BI tools and applications depend on still needs to work once you migrate your data warehouse schema and data to Azure Synapse. That includes the obvious and the not so obvious&mdash;such as access and security. Access and security are important considerations for data access in the migrated system, and are specifically discussed in [another guide](3-security-access-operations.md) in this series. When you address access and security, ensure that:
+- Authentication so users can sign into the data warehouse and data mart databases on Azure Synapse.
-- Authentication is migrated to let users sign in to the data warehouse and data mart databases on Azure Synapse.
+- All users to Azure Synapse.
-- All users are migrated to Azure Synapse.
+- All user groups to Azure Synapse.
-- All user groups are migrated to Azure Synapse.
+- All roles to Azure Synapse.
-- All roles are migrated to Azure Synapse.
+- All authorization privileges governing access control to Azure Synapse.
-- All authorization privileges governing access control are migrated to Azure Synapse.
-- User, role, and privilege assignments are migrated to mirror what you had on your existing data warehouse before migration. For example:
+- User, role, and privilege assignments to mirror what you had in your existing data warehouse before migration. For example:
  - Database object privileges assigned to roles
  - Roles assigned to user groups
  - Users assigned to user groups and/or roles
-> [!TIP]
-> Communication and business user involvement is critical to success.
+Access and security are important considerations for data access in the migrated system and are discussed in more detail in [Security, access, and operations for Netezza migrations](3-security-access-operations.md).
+
+>[!TIP]
+>Existing users, user groups, roles, and assignments of access security privileges need to be migrated first for migration of reports and visualizations to succeed.
+
+Migrate all required data to ensure that the reports and dashboards that query data in the legacy environment produce the same results in Azure Synapse.
-In addition, all the required data needs to be migrated to ensure the same results appear in the same reports and dashboards that now query data on Azure Synapse. User expectation will undoubtedly be that migration is seamless and there will be no surprises that destroy their confidence in the migrated system on Azure Synapse. So, this is an area where you must take extreme care and communicate as much as possible to allay any fears in your user base. Their expectations are that:
+Business users will expect a seamless migration, with no surprises that destroy their confidence in the migrated system on Azure Synapse. Take care to allay any fears that your users might have through good communication. Your users will expect that:
-- Table structure will be the same if directly referred to in queries.
+- Table structure remains the same when directly referred to in queries.
-- Table and column names remain the same if directly referred to in queries; for instance, so that calculated fields defined on columns in BI tools don't fail when aggregate reports are produced.
+- Table and column names remain the same when directly referred to in queries. For instance, calculated fields defined on columns in BI tools shouldn't fail when aggregate reports are produced.
- Historical analysis remains the same.
-- Data types should, if possible, remain the same.
+- Data types remain the same, if possible.
- Query behavior remains the same.
-- ODBC/JDBC drivers are tested to make sure nothing has changed in terms of query behavior.
+- ODBC/JDBC drivers are tested to ensure that query behavior remains the same.
-> [!TIP]
-> Views and SQL queries using proprietary SQL query extensions are likely to result in incompatibilities that impact BI reports and dashboards.
+>[!TIP]
+>Communication and business user involvement are critical to success.
-If BI tools are querying views in the underlying data warehouse or data mart database, then will these views still work? You might think yes, but if there are proprietary SQL extensions specific to your legacy data warehouse DBMS in these views that have no equivalent in Azure Synapse, you'll need to know about them and find a way to resolve them.
+If BI tools query views in the underlying data warehouse or data mart database, will those views still work after the migration? Some views might not work if there are proprietary SQL extensions specific to your legacy data warehouse DBMS that have no equivalent in Azure Synapse. If so, you need to know about those incompatibilities and find a way to resolve them.
-Other issues like the behavior of nulls or data type variations across DBMS platforms need to be tested, in case they cause slightly different calculation results. Obviously, you want to minimize these issues and take all necessary steps to shield business users from any kind of impact. Depending on your legacy data warehouse system (such as Netezza), there are [tools](../../partner/data-integration.md) that can help hide these differences so that BI tools and applications are kept unaware of them and can run unchanged.
+>[!TIP]
+>Views and SQL queries using proprietary SQL query extensions are likely to result in incompatibilities that impact BI reports and dashboards.
-> [!TIP]
-> Use repeatable tests to ensure reports, dashboards, and other visualizations migrate successfully.
+Other issues, like the behavior of `NULL` values or data type variations across DBMS platforms, need to be tested to ensure that even slight differences don't exist in calculation results. Minimize those issues and take all necessary steps to shield business users from being affected by them. Depending on your legacy data warehouse environment, [third-party tools](../../partner/data-integration.md) can help hide the differences between the legacy and new environments so that BI tools and applications run unchanged.
-Testing is critical to visualization and report migration. You need a test suite and agreed-on test data to run and rerun tests in both environments. A test harness is also useful, and a few are mentioned later in this guide. In addition, it's also important to have significant business involvement in this area of migration to keep confidence high and to keep them engaged and part of the project.
+Testing is critical to visualization and report migration. You need a test suite and agreed-on test data to run and rerun tests in both environments. A test harness is also useful, and a few are mentioned in this guide. Also, it's important to involve business users in the testing aspect of the migration to keep confidence high and to keep them engaged and part of the project.
-Finally, you may also be thinking about switching BI tools. For example, you might want to [migrate to Power BI](/power-bi/guidance/powerbi-migration-overview). The temptation is to do all of this at the same time, while migrating your schema, data, ETL processing, and more. However, to minimize risk, it's better to migrate to Azure Synapse first and get everything working before undertaking further modernization.
+>[!TIP]
+>Use repeatable tests to ensure reports, dashboards, and other visualizations migrate successfully.
-If your existing BI tools run on premises, ensure that they're able to connect to Azure Synapse through your firewall to run comparisons against both environments. Alternatively, if the vendor of your existing BI tools offers their product on Azure, you can try it there. The same applies for applications running on premises that embed BI or that call your BI server on-demand, requesting a "headless report" with data returned in XML or JSON, for example.
+You might be thinking about switching BI tools, for example to [migrate to Power BI](/power-bi/guidance/powerbi-migration-overview). The temptation is to make such changes at the same time you're migrating your schema, data, ETL processing, and more. However, to minimize risk, it's better to migrate to Azure Synapse first and get everything working before undertaking further modernization.
-There's a lot to think about here, so let's look at all this in more detail.
+If your existing BI tools run on-premises, ensure they can connect to Azure Synapse through your firewall so you're able to run comparisons against both environments. Alternatively, if the vendor of your existing BI tools offers their product on Azure, you can try it there. The same applies for applications running on-premises that embed BI or call your BI server on demand, for example by requesting a "headless report" with XML or JSON data.
-> [!TIP]
-> A lift and shift data warehouse migration is likely to minimize any disruption to reports, dashboards, and other visualizations.
+There's a lot to think about here, so let's take a closer look.
-## Minimize the impact of data warehouse migration on BI tools and reports by using data virtualization
+<a id="minimize-the-impact-of-data-warehouse-migration-on-bi-tools-and-reports-by-using-data-virtualization"></a>
+## Use data virtualization to minimize the impact of migration on BI tools and reports
-> [!TIP]
-> Data virtualization allows you to shield business users from structural changes during migration so that they remain unaware of changes.
+During migration, you might be tempted to fulfill long-term requirements like opening business requests, adding missing data, or implementing new features. However, such changes can affect BI tool access to your data warehouse, especially if the change involves structural changes to your data model. If you want to adopt an agile data modeling technique or implement structural changes, do so *after* migration.
-The temptation during data warehouse migration to the cloud is to take the opportunity to make changes during the migration to fulfill long-term requirements, such as opening business requests, missing data, new features, and more. However, these changes can affect the BI tools accessing your data warehouse, especially if it involves structural changes in your data model. If you want to adopt an agile data modeling technique or implement structural changes, do so *after* migration.
-
-One way in which you can minimize the impact of things like schema changes on BI tools is to introduce data virtualization between BI tools and your data warehouse and data marts. The following diagram shows how data virtualization can hide the migration from users.
+One way to minimize the effect of schema changes or other structural changes on your BI tools is to introduce data virtualization between the BI tools and your data warehouse and data marts. The following diagram shows how data virtualization can hide a migration from users.
:::image type="content" source="../media/4-visualization-reporting/migration-data-virtualization.png" border="true" alt-text="Diagram showing how to hide the migration from users through data virtualization.":::
-This breaks the dependency between business users utilizing self-service BI tools and the physical schema of the underlying data warehouse and data marts that are being migrated.
-
-> [!TIP]
-> Schema alterations to tune your data model for Azure Synapse can be hidden from users.
+Data virtualization breaks the dependency between business users utilizing self-service BI tools and the physical schema of the underlying data warehouse and data marts that are being migrated.
-By introducing data virtualization, any schema alterations made during data warehouse and data mart migration to Azure Synapse (to optimize performance, for example) can be hidden from business users because they only access virtual tables in the data virtualization layer. If structural changes are needed, only the mappings between the data warehouse or data marts, and any virtual tables would need to be changed so that users remain unaware of those changes and unaware of the migration. [Microsoft partners](../../partner/data-integration.md) provide useful data virtualization software.
+>[!TIP]
+>Data virtualization allows you to shield business users from structural changes during migration so they remain unaware of those changes. Structural changes include schema alterations that tune your data model for Azure Synapse.
-## Identify high priority reports to migrate first
+With data virtualization, any schema alterations made during a migration to Azure Synapse, for example to optimize performance, can be hidden from business users because they only have access to virtual tables in the data virtualization layer. And, if you make structural changes, you only need to update the mappings between the data warehouse or data marts and any virtual tables. With data virtualization, users remain unaware of structural changes. [Microsoft partners](../../partner/data-integration.md) provide data virtualization software.
-A key question when migrating your existing reports and dashboards to Azure Synapse is which ones to migrate first. Several factors can drive the decision. For example:
+## Identify high-priority reports to migrate first
-- Business value
+A key question when migrating your existing reports and dashboards to Azure Synapse is which ones to migrate first. Several factors might drive that decision, such as:
- Usage
+- Business value
+
- Ease of migration
- Data migration strategy
-These factors are discussed in more detail later in this article.
+The following sections discuss these factors.
-Whatever the decision is, it must involve the business, since they produce the reports and dashboards, and consume the insights these artifacts provide in support of the decisions that are made around your business. That said, if most reports and dashboards can be migrated seamlessly, with minimal effort, and offer up like-for-like results, simply by pointing your BI tool(s) at Azure Synapse, instead of your legacy data warehouse system, then everyone benefits.
+Whatever your decision, it must involve your business users because they produce the reports, dashboards, and other visualizations, and make business decisions based on insights from those items. Everyone benefits when you can:
+
+- Migrate reports and dashboards seamlessly,
+- Migrate reports and dashboards with minimal effort, and
+- Point your BI tool(s) at Azure Synapse instead of your legacy data warehouse system, and get like-for-like reports, dashboards, and other visualizations.
### Migrate reports based on usage
-Usage is interesting, since it's an indicator of business value. Reports and dashboards that are never used clearly aren't contributing to supporting any decisions and don't currently offer any value. So, do you have any mechanism for finding out which reports and dashboards are currently not used? Several BI tools provide statistics on usage, which would be an obvious place to start.
+Usage is often an indicator of business value. Unused reports and dashboards clearly don't contribute to business decisions or offer current value. If you don't have a way to find out which reports and dashboards are unused, you can use one of the several BI tools that provide usage statistics.
-If your legacy data warehouse has been up and running for many years, there's a high chance you could have hundreds, if not thousands, of reports in existence. In these situations, usage is an important indicator of the business value of a specific report or dashboard. In that sense, it's worth compiling an inventory of the reports and dashboards you have and defining their business purpose and usage statistics.
+If your legacy data warehouse has been up and running for years, there's a good chance you have hundreds, if not thousands, of reports in existence. It's worth compiling an inventory of reports and dashboards and identifying their business purpose and usage statistics.
-For those that aren't used at all, it's an appropriate time to seek a business decision, to determine if it's necessary to decommission those reports to optimize your migration efforts. A key question worth asking when deciding to decommission unused reports is: are they unused because people don't know they exist, or is it because they offer no business value, or have they been superseded by others?
+For unused reports, determine whether to decommission them to reduce your migration effort. A key question when deciding whether to decommission an unused report is whether the report is unused because people don't know it exists, because it offers no business value, or because it's been superseded by another report.
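The usage-based triage described above can be sketched in code. The following is a minimal, hypothetical example, assuming usage statistics have already been exported from your BI tool into simple records; the field names and the 90-day threshold are illustrative assumptions, not part of any specific BI product's export format.

```python
# Hypothetical sketch: triage a report inventory using usage statistics
# exported from a BI tool. Field names and thresholds are assumptions.

def triage_reports(inventory, low_usage_threshold=1):
    """Split an inventory of reports into migrate/review buckets.

    Each entry is a dict with 'name' and 'runs_last_90_days'.
    Reports below the usage threshold become decommission candidates
    pending a business decision; the rest are queued for migration.
    """
    migrate, review = [], []
    for report in inventory:
        if report["runs_last_90_days"] >= low_usage_threshold:
            migrate.append(report["name"])
        else:
            review.append(report["name"])  # confirm with business owners first
    return migrate, review

inventory = [
    {"name": "daily_sales", "runs_last_90_days": 120},
    {"name": "legacy_audit", "runs_last_90_days": 0},
]
migrate, review = triage_reports(inventory)
print(migrate)  # ['daily_sales']
print(review)   # ['legacy_audit']
```

Note that the "review" bucket deliberately isn't an automatic decommission list: as discussed above, a report might be unused only because people don't know it exists.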
### Migrate reports based on business value
-Usage on its own isn't a clear indicator of business value. There needs to be a deeper business context to determine the value to the business. In an ideal world, we would like to know the contribution of the insights produced in a report to the bottom line of the business. That's exceedingly difficult to determine, since every decision made, and its dependency on the insights in a specific report, would need to be recorded along with the contribution that each decision makes to the bottom line of the business. You would also need to do this over time.
+Usage alone isn't always a good indicator of business value. You might want to consider the extent to which a report's insights contribute to business value. One way to do that is to evaluate the profitability of every business decision that relies on the report and the extent of the reliance. However, that information is unlikely to be readily available in most organizations.
-This level of detail is unlikely to be available in most organizations. One way in which you can get deeper on business value to drive migration order is to look at alignment with business strategy. A business strategy set by your executive typically lays out strategic business objectives, key performance indicators (KPIs), KPI targets that need to be achieved, and who is accountable for achieving them. In that sense, classifying your reports and dashboards by strategic business objectives&mdash;for example, reduce fraud, improve customer engagement, and optimize business operations&mdash;will help understand business purpose and show what objective(s), specific reports, and dashboards these are contributing to. Reports and dashboards associated with high priority objectives in the business strategy can then be highlighted so that migration is focused on delivering business value in a strategic high priority area.
+Another way to evaluate business value is to look at the alignment of a report with business strategy. The business strategy set by your executive typically lays out strategic business objectives (SBOs), key performance indicators (KPIs), KPI targets that need to be achieved, and who is accountable for achieving them. You can classify a report by which SBOs the report contributes to, such as fraud reduction, improved customer engagement, and optimized business operations. Then, you can prioritize for migration the reports and dashboards that are associated with high-priority objectives. In this way, the initial migration can deliver business value in a strategic area.
-It's also worthwhile to classify reports and dashboards as operational, tactical, or strategic, to understand the level in the business where they're used. Delivering strategic business objectives requires contribution at all these levels. Knowing which reports and dashboards are used, at what level, and what objectives they're associated with helps to focus migration on high priority business value that will drive the company forward. Business contribution of reports and dashboards is needed to understand this, perhaps like what is shown in the following **business strategy objective** table.
+Another way to evaluate business value is to classify reports and dashboards as operational, tactical, or strategic to identify at which business level they're used. SBOs require contributions at all these levels. By knowing which reports and dashboards are used, at what level, and what objectives they're associated with, you're able to focus the initial migration on high-priority business value. You can use the following *business strategy objective* table to evaluate reports and dashboards.
-| **Level** | **Report / dashboard name** | **Business purpose** | **Department used** | **Usage frequency** | **Business priority** |
+| Level | Report / dashboard name | Business purpose | Department used | Usage frequency | Business priority |
|-|-|-|-|-|-|
| **Strategic** | | | | | |
| **Tactical** | | | | | |
| **Operational** | | | | | |
-While this may seem too time consuming, you need a mechanism to understand the contribution of reports and dashboards to business value, whether you're migrating or not. Catalogs like Azure Data Catalog are becoming very important because they give you the ability to catalog reports and dashboards, automatically capture the metadata associated with them, and let business users tag and rate them to help you understand business value.
+Metadata discovery tools like [Azure Data Catalog](../../../data-catalog/overview.md) let business users tag and rate data sources to enrich the metadata for those data sources to assist with their discovery and classification. You can use the metadata for a report or dashboard to help you understand its business value. Without such tools, understanding the contribution of reports and dashboards to business value is likely to be a time-consuming task, whether you're migrating or not.
### Migrate reports based on data migration strategy
-> [!TIP]
-> Data migration strategy could also dictate which reports and visualizations get migrated first.
+If your migration strategy is based on migrating data marts first, then the order of data mart migration will affect which reports and dashboards are migrated first. If your strategy is based on business value, the order in which you migrate data marts to Azure Synapse will reflect business priorities. Metadata discovery tools can help you implement your strategy by showing you which data mart tables supply data for which reports.
-If your migration strategy is based on migrating data marts first, the order of data mart migration will have a bearing on which reports and dashboards can be migrated first to run on Azure Synapse. Again, this is likely to be a business-value-related decision. Prioritizing which data marts are migrated first reflects business priorities. Metadata discovery tools can help you here by showing you which reports rely on data in which data mart tables.
+>[!TIP]
+>Your data migration strategy affects which reports and visualizations get migrated first.
-## Migration incompatibility issues that can impact reports and visualizations
+## Migration incompatibility issues that can affect reports and visualizations
-When it comes to migrating to Azure Synapse, there are several things that can impact the ease of migration for reports, dashboards, and other visualizations. The ease of migration is affected by:
+BI tools produce reports, dashboards, and other visualizations by issuing SQL queries that access physical tables and/or views in your data warehouse or data mart. When you migrate your legacy data warehouse to Azure Synapse, several factors can affect the ease of migration of reports, dashboards, and other visualizations. Those factors include:
-- Incompatibilities that occur during schema migration between your legacy data warehouse and Azure Synapse.
+- Schema incompatibilities between the environments.
-- Incompatibilities in SQL between your legacy data warehouse and Azure Synapse.
+- SQL incompatibilities between the environments.
-### The impact of schema incompatibilities
+### Schema incompatibilities
-> [!TIP]
-> Schema incompatibilities include legacy warehouse DBMS table types and data types that are unsupported on Azure Synapse.
+During a migration, schema incompatibilities in the data warehouse or data mart tables that supply data for reports, dashboards, and other visualizations can be:
-BI tool reports and dashboards, and other visualizations, are produced by issuing SQL queries that access physical tables and/or views in your data warehouse or data mart. When it comes to migrating your data warehouse or data mart schema to Azure Synapse, there may be incompatibilities that can impact reports and dashboards, such as:
+- Non-standard table types in your legacy data warehouse DBMS that don't have an equivalent in Azure Synapse.
-- Non-standard table types supported in your legacy data warehouse DBMS that don't have an equivalent in Azure Synapse.
+- Data types in your legacy data warehouse DBMS that don't have an equivalent in Azure Synapse.
-- Data types supported in your legacy data warehouse DBMS that don't have an equivalent in Azure Synapse.
+In most cases, there's a workaround to the incompatibilities. For example, you can migrate the data in an unsupported table type into a standard table with appropriate data types and indexed or partitioned on a date/time column. Similarly, it might be possible to represent unsupported data types in another type of column and perform calculations in Azure Synapse to achieve the same results.
-In many cases, where there are incompatibilities, there may be ways around them. For example, the data in unsupported table types can be migrated into a standard table with appropriate data types and indexed or partitioned on a date/time column. Similarly, it may be possible to represent unsupported data types in another type of column and perform calculations in Azure Synapse to achieve the same. Either way, it will need refactoring.
+>[!TIP]
+>Schema incompatibilities include legacy warehouse DBMS table types and data types that are unsupported on Azure Synapse.
-> [!TIP]
-> Querying the system catalog of your legacy warehouse DBMS is a quick and straightforward way to identify schema incompatibilities with Azure Synapse.
+To identify the reports affected by schema incompatibilities, run queries against the system catalog of your legacy data warehouse to identify the tables with unsupported data types. Then, you can use metadata from your BI tool to identify the reports that access data in those tables. For more information about how to identify object type incompatibilities, see [Unsupported Netezza database object types](1-design-performance-migration.md#unsupported-netezza-database-object-types).
-To identify reports and visualizations impacted by schema incompatibilities, run queries against the system catalog of your legacy data warehouse to identify tables with unsupported data types. Then use metadata from your BI tool or tools to identify reports that access these structures, to see what could be impacted. Obviously, this will depend on the legacy data warehouse DBMS you're migrating from. Find details of how to identify these incompatibilities in [Design and performance for Netezza migrations](1-design-performance-migration.md).
+>[!TIP]
+>Query the system catalog of your legacy warehouse DBMS to identify schema incompatibilities with Azure Synapse.
-The impact may be less than you think, because many BI tools don't support such data types. As a result, views may already exist in your legacy data warehouse that `CAST` unsupported data types to more generic types.
+The effect of schema incompatibilities on reports, dashboards, and other visualizations might be less than you think because many BI tools don't support the less generic data types. As a result, your legacy data warehouse might already have views that `CAST` unsupported data types to more generic types.
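Once you've extracted `(table, column, type)` rows from the legacy system catalog, a small script can flag which columns map cleanly and which need rework. The type map below is a partial, assumed example for illustration only; verify every type against the data-type guidance in [Design and performance for Netezza migrations](1-design-performance-migration.md) before relying on it.

```python
# Illustrative sketch: flag legacy column data types that need mapping or a
# workaround in Azure Synapse. The mapping below is a partial assumption,
# not an authoritative list -- verify against the official guidance.

ASSUMED_TYPE_MAP = {
    "BOOLEAN": "BIT",                        # assumed mapping
    "TIME WITH TIME ZONE": "DATETIMEOFFSET", # assumed mapping
    "ST_GEOMETRY": "VARBINARY",              # assumed: store serialized values
    "INTERVAL": None,                        # assumed: no direct equivalent
}

def flag_columns(columns):
    """Return (mappable, needs_rework) for (table, column, type) rows,
    such as rows extracted from the legacy system catalog."""
    mappable, needs_rework = [], []
    for table, column, dtype in columns:
        target = ASSUMED_TYPE_MAP.get(dtype.upper(), dtype)
        if target is None:
            needs_rework.append((table, column, dtype))
        else:
            mappable.append((table, column, dtype, target))
    return mappable, needs_rework
```

Cross-referencing the `needs_rework` list with your BI tool's metadata then tells you which reports are at risk.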
-### The impact of SQL incompatibilities and differences
+### SQL incompatibilities
-Additionally, any report, dashboard, or other visualization in an application or tool that makes use of proprietary SQL extensions associated with your legacy data warehouse DBMS is likely to be impacted when migrating to Azure Synapse. This could happen because the BI tool or application:
+During a migration, SQL incompatibilities are likely to affect any report, dashboard, or other visualization in an application or tool that:
- Accesses legacy data warehouse DBMS views that include proprietary SQL functions that have no equivalent in Azure Synapse.
-- Issues SQL queries, which include proprietary SQL functions peculiar to the SQL dialect of your legacy data warehouse DBMS, that have no equivalent in Azure Synapse.
+- Issues SQL queries that include proprietary SQL functions, specific to the SQL dialect of your legacy environment, that have no equivalent in Azure Synapse.
### Gauge the impact of SQL incompatibilities on your reporting portfolio
-You can't rely on documentation associated with reports, dashboards, and other visualizations to gauge how big of an impact SQL incompatibility may have on the portfolio of embedded query services, reports, dashboards, and other visualizations you're intending to migrate to Azure Synapse. There must be a more precise way of doing that.
+Your reporting portfolio might include embedded query services, reports, dashboards, and other visualizations. Don't rely on the documentation associated with those items to gauge the effect of SQL incompatibilities on the migration of your reporting portfolio to Azure Synapse. You need to use a more precise way to assess the effect of SQL incompatibilities.
#### Use EXPLAIN statements to find SQL incompatibilities
-> [!TIP]
-> Gauge the impact of SQL incompatibilities by harvesting your DBMS log files and running `EXPLAIN` statements.
+You can find SQL incompatibilities by querying the `_v_qryhist` system table to view recent SQL activity in your legacy Netezza data warehouse. For more information, see [Query history table](https://www.ibm.com/docs/en/psfa/7.2.1?topic=tables-query-history-table). Use a script to extract a representative set of SQL statements to a file. Then, prefix each SQL statement with an `EXPLAIN` statement, and run those `EXPLAIN` statements in Azure Synapse. Any SQL statements containing proprietary unsupported SQL extensions will be rejected by Azure Synapse when the `EXPLAIN` statements are executed. This approach lets you assess the extent of SQL incompatibilities.
-One way is to view the recent SQL activity of your legacy Netezza data warehouse. Query the `_v_qryhist` system table to view recent history data and determine a representative set of SQL statements into a file. For more information, see [Query history table](https://www.ibm.com/docs/en/psfa/7.2.1?topic=tables-query-history-table). Then, prefix each SQL statement with an `EXPLAIN` statement, and then run all the `EXPLAIN` statements in Azure Synapse. Any SQL statements containing proprietary SQL extensions from your legacy data warehouse that are unsupported will be rejected by Azure Synapse when the `EXPLAIN` statements are executed. This approach would at least give you an idea of how significant or otherwise the use of incompatible SQL is.
+Metadata from your legacy data warehouse DBMS can also help you identify incompatible views. As before, capture a representative set of SQL statements, prefix each SQL statement with an `EXPLAIN` statement, and run those `EXPLAIN` statements in Azure Synapse to identify views with incompatible SQL.
-Metadata from your legacy data warehouse DBMS will also help you when it comes to views. Again, you can capture and view SQL statements, and `EXPLAIN` them as described previously to identify incompatible SQL in views.
+>[!TIP]
+>Gauge the impact of SQL incompatibilities by harvesting your DBMS log files and running `EXPLAIN` statements.
## Test report and dashboard migration to Azure Synapse Analytics
-> [!TIP]
-> Test performance and tune to minimize compute costs.
-
-A key element in data warehouse migration is the testing of reports and dashboards against Azure Synapse to verify that the migration has worked. To do this, you need to define a series of tests and a set of required outcomes for each test that needs to be run to verify success. It's important to ensure that reports and dashboards are tested and compared across your existing and migrated data warehouse systems to:
+A key element of data warehouse migration is testing of reports and dashboards in Azure Synapse to verify the migration has worked. Define a series of tests and a set of required outcomes for each test that you will run to verify success. Test and compare the reports and dashboards across your existing and migrated data warehouse systems to:
-- Identify whether schema changes made during migration, such as data types to be converted, have impacted reports in terms of ability to run results and corresponding visualizations.
+ - Identify whether any schema changes that were made during migration affected the ability of reports to run, report results, or the corresponding report visualizations. An example of a schema change is if you mapped an incompatible data type to an equivalent data type that's supported in Azure Synapse.
-- Verify all users are migrated.
+ - Verify that all users are migrated.
+
+ - Verify that all roles are migrated, and users are assigned to those roles.
+
+ - Verify that all data access security privileges are migrated to ensure access control list (ACL) migration.
+
+ - Ensure consistent results for all known queries, reports, and dashboards.
+
+ - Ensure that data and ETL migration is complete and error-free.
+
+ - Ensure that data privacy is upheld.
+
+ - Test performance and scalability.
+
+ - Test analytical functionality.
-- Verify all roles are migrated and users assigned to those roles.
+>[!TIP]
+>Test and tune performance to minimize compute costs.
-- Verify all data access security privileges are migrated to ensure access control list (ACL) migration.
+For information about how to migrate users, user groups, roles, and privileges, see [Security, access, and operations for Netezza migrations](3-security-access-operations.md).
-- Ensure consistent results of all known queries, reports, and dashboards.
+Automate testing as much as possible to make each test repeatable and to support a consistent approach to evaluating test results. Automation works well for known regular reports, and can be managed via [Azure Synapse pipelines](../../get-started-pipelines.md) or [Azure Data Factory](../../../data-factory/introduction.md) orchestration. If you already have a suite of test queries in place for regression testing, you can use your existing testing tools to automate post-migration testing.
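The repeatable comparison at the heart of such a regression suite can be sketched simply. This is a minimal, illustrative sketch that assumes the result sets of the same test query have already been fetched from both systems into Python lists; the function name is hypothetical.

```python
# Minimal regression check: compare the result sets of one test query run on
# the legacy system and on Azure Synapse. Sorting makes the comparison
# independent of row order, which often differs between platforms even when
# the data itself matches.
def results_match(legacy_rows, synapse_rows):
    """True when both systems return the same rows, ignoring row order."""
    return sorted(legacy_rows) == sorted(synapse_rows)

legacy = [("North", 120), ("South", 95)]
synapse = [("South", 95), ("North", 120)]
assert results_match(legacy, synapse)            # same data, different order
assert not results_match(legacy, [("North", 121)])  # a genuine difference
```

A test harness would run a check like this for every query in the suite and record each pass or fail, which gives the consistent, repeatable evaluation the migration needs.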
-- Ensure that data and ETL migration is complete and error-free.
+>[!TIP]
+>Best practice is to build an automated test suite to make tests repeatable.
-- Ensure data privacy is upheld.
+Ad-hoc analysis and reporting are more challenging and require compilation of a set of tests to verify that the same reports and dashboards from before and after migration are consistent. If you find inconsistencies, then your ability to compare metadata lineage across the original and migrated systems during migration testing becomes crucial. That comparison can highlight differences and pinpoint where inconsistencies originated, when detection by other means is difficult.
-- Test performance and scalability.
-
-- Test analytical functionality.
-
-For information about how to migrate users, user groups, roles, and privileges, see [Security, access, and operations for Netezza migrations](3-security-access-operations.md), which is part of this series.
-
-> [!TIP]
-> Build an automated test suite to make tests repeatable.
-
-It's also best practice to automate testing as much as possible, to make each test repeatable and to allow a consistent approach to evaluating results. This works well for known regular reports, and could be managed via [Azure Synapse Pipelines](../../get-started-pipelines.md?msclkid=8f3e7e96cfed11eca432022bc07c18de) or [Azure Data Factory](../../../data-factory/introduction.md?msclkid=2ccc66eccfde11ecaa58877e9d228779) orchestration. If you already have a suite of test queries in place for regression testing, you could use the testing tools to automate the post migration testing.
-
-> [!TIP]
-> Leverage tools that can compare metadata lineage to verify results.
-
-Ad-hoc analysis and reporting are more challenging and require a set of tests to be compiled to verify that results are consistent across your legacy data warehouse DBMS and Azure Synapse. If reports and dashboards are inconsistent, then having the ability to compare metadata lineage across original and migrated systems is extremely valuable during migration testing, as it can highlight differences and pinpoint where they occurred when these aren't easy to detect. This is discussed in more detail later in this article.
-
-In terms of security, the best way to do this is to create roles, assign access privileges to roles, and then attach users to roles. To access your newly migrated data warehouse, set up an automated process to create new users, and to do role assignment. To detach users from roles, you can follow the same steps.
-
-It's also important to communicate the cutover to all users, so they know what's changing and what to expect.
+>[!TIP]
+>Leverage tools that compare metadata lineage to verify results.
## Analyze lineage to understand dependencies between reports, dashboards, and data
-> [!TIP]
-> Having access to metadata and data lineage from reports all the way back to data source is critical for verifying that migrated reports are working correctly.
+Your understanding of lineage is a critical factor in the successful migration of reports and dashboards. Lineage is metadata that shows the journey of migrated data so you can track its path from a report or dashboard all the way back to the data source. Lineage shows how data has traveled from point to point, its location in the data warehouse and/or data mart, and which reports and dashboards use it. Lineage can help you understand what happens to data as it travels through different data stores, such as files and databases, different ETL pipelines, and into reports. When business users have access to data lineage, it improves trust, instills confidence, and supports informed business decisions.
-A critical success factor in migrating reports and dashboards is understanding lineage. Lineage is metadata that shows the journey that data has taken, so you can see the path from the report/dashboard all the way back to where the data originates. It shows how data has gone from point to point, its location in the data warehouse and/or data mart, and where it's used&mdash;for example, in what reports. It helps you understand what happens to data as it travels through different data stores&mdash;files and database&mdash;different ETL pipelines, and into reports. If business users have access to data lineage, it improves trust, breeds confidence, and enables more informed business decisions.
+>[!TIP]
+>Your ability to access metadata and data lineage from reports all the way back to a data source is critical for verifying that migrated reports work correctly.
-> [!TIP]
-> Tools that automate metadata collection and show end-to-end lineage in a multi-vendor environment are valuable when it comes to migration.
+In multi-vendor data warehouse environments, business analysts in BI teams might map out data lineage. For example, if you use different vendors for ETL, data warehouse, and reporting, and each vendor has its own metadata repository, then figuring out where a specific data element in a report came from can be challenging and time-consuming.
-In multi-vendor data warehouse environments, business analysts in BI teams may map out data lineage. For example, if you have Informatica for your ETL, Oracle for your data warehouse, and Tableau for reporting, each of which have their own metadata repository, figuring out where a specific data element in a report came from can be challenging and time consuming.
+>[!TIP]
+>Tools that automate the collection of metadata and show end-to-end lineage in a multi-vendor environment are valuable during a migration.
-To migrate seamlessly from a legacy data warehouse to Azure Synapse, end-to-end data lineage helps prove like-for-like migration when comparing reports and dashboards against your legacy environment. That means that metadata from several tools needs to be captured and integrated to show the end-to-end journey. Having access to tools that support automated metadata discovery and data lineage will let you see duplicate reports and ETL processes and reports that rely on data sources that are obsolete, questionable, or even non-existent. With this information, you can reduce the number of reports and ETL processes that you migrate.
+To migrate seamlessly from a legacy data warehouse to Azure Synapse, use end-to-end data lineage to prove like-for-like migration when you're comparing the reports and dashboards generated by each environment. To show the end-to-end data journey, you'll need to capture and integrate metadata from several tools. Having access to tools that support automated metadata discovery and data lineage helps you identify duplicate reports or ETL processes and find reports that rely on obsolete, questionable, or non-existent data sources. You can use that information to reduce the number of reports and ETL processes that you migrate.
-You can also compare end-to-end lineage of a report in Azure Synapse against the end-to-end lineage for the same report in your legacy data warehouse environment, to see if there are any differences that have occurred inadvertently during migration. This helps enormously with testing and verifying migration success.
+You can also compare the end-to-end lineage of a report in Azure Synapse to the end-to-end lineage of the same report in your legacy environment to check for differences that might have inadvertently occurred during migration. This type of comparison is exceptionally useful when you need to test and verify migration success.
-Data lineage visualization not only reduces time, effort, and error in the migration process, but also enables faster execution of the migration project.
+Data lineage visualization not only reduces time, effort, and error in the migration process, but also enables faster migration.
-By leveraging automated metadata discovery and data lineage tools that can compare lineage, you can verify if a report is produced using data migrated to Azure Synapse and if it's produced in the same way as in your legacy environment. This kind of capability also helps you determine:
+By using automated metadata discovery and data lineage tools that compare lineage, you can verify that a report produced from migrated data in Azure Synapse is produced in the same way as in your legacy environment. This capability also helps you determine:
-- What data needs to be migrated to ensure successful report and dashboard execution on Azure Synapse.
+- What data needs to be migrated to ensure successful report and dashboard execution in Azure Synapse.
-- What transformations have been and should be performed to ensure successful execution on Azure Synapse.
+- What transformations have been and should be performed to ensure successful execution in Azure Synapse.
- How to reduce report duplication.
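The lineage comparison that supports these checks can be illustrated with a small sketch. This is hypothetical: it models each report's lineage as the ordered path of hops from source to report, which is a simplification of what commercial lineage tools capture.

```python
# Hypothetical lineage comparison: find where the legacy and migrated lineage
# paths for the same report first diverge, to pinpoint where a difference was
# introduced during migration.
def first_divergence(legacy_path, migrated_path):
    """Return the index of the first differing hop, or None if identical."""
    for i, (a, b) in enumerate(zip(legacy_path, migrated_path)):
        if a != b:
            return i
    if len(legacy_path) != len(migrated_path):
        return min(len(legacy_path), len(migrated_path))
    return None

legacy = ["sales_src", "staging.sales", "dw.fact_sales", "SalesReport"]
migrated = ["sales_src", "staging.sales", "dw.fact_sales_v2", "SalesReport"]
print(first_divergence(legacy, migrated))  # hop index where the paths differ
```

A divergence like the renamed fact table above is exactly the kind of inadvertent change that lineage comparison surfaces when it's hard to detect by other means.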
-This substantially simplifies the data migration process, because the business will have a better idea of the data assets it has and what needs to be migrated to enable a solid reporting environment on Azure Synapse.
-
-> [!TIP]
-> Azure Data Factory and several third-party ETL tools support lineage.
+Automated metadata discovery and data lineage tools substantially simplify the migration process because they help businesses become more aware of their data assets and to know what needs to be migrated to Azure Synapse to achieve a solid reporting environment.
-Several ETL tools provide end-to-end lineage capability, and you may be able to make use of this via your existing ETL tool if you're continuing to use it with Azure Synapse. [Azure Synapse Pipelines](../../get-started-pipelines.md?msclkid=8f3e7e96cfed11eca432022bc07c18de) or [Azure Data Factory](../../../data-factory/introduction.md?msclkid=2ccc66eccfde11ecaa58877e9d228779) lets you view lineage in mapping flows. Also, [Microsoft partners](../../partner/data-integration.md) provide automated metadata discovery, data lineage, and lineage comparison tools.
+Several ETL tools provide end-to-end lineage capability, so check whether your existing ETL tool has that capability if you plan to use it with Azure Synapse. Azure Synapse pipelines or Data Factory both support the ability to view lineage in mapping flows. [Microsoft partners](../../partner/data-integration.md) also provide automated metadata discovery, data lineage, and lineage comparison tools.
## Migrate BI tool semantic layers to Azure Synapse Analytics
-> [!TIP]
-> Some BI tools have semantic layers that simplify business user access to physical data structures in your data warehouse or data mart, like SAP Business Objects and IBM Cognos.
+Some BI tools have what is known as a semantic metadata layer. That layer simplifies business user access to the underlying physical data structures in a data warehouse or data mart database. The semantic metadata layer simplifies access by providing high-level objects like dimensions, measures, hierarchies, calculated metrics, and joins. The high-level objects use business terms that are familiar to business analysts, and map to physical data structures in your data warehouse or data mart.
-Some BI tools have what is known as a semantic metadata layer. The role of this metadata layer is to simplify business user access to physical data structures in an underlying data warehouse or data mart database. It does this by providing high-level objects like dimensions, measures, hierarchies, calculated metrics, and joins. These objects use business terms familiar to business analysts and are mapped to the physical data structures in the data warehouse or data mart database.
+>[!TIP]
+>Some BI tools have semantic layers that simplify business user access to physical data structures in your data warehouse or data mart.
-When it comes to data warehouse migration, changes to column names or table names may be forced upon you. For example, in Oracle, table names can have a "#". In Azure Synapse, the "#" is only allowed as a prefix to a table name to indicate a temporary table. Therefore, you may need to change a table name if migrating from Oracle. You may need to do rework to change mappings in such cases.
+In a data warehouse migration, you might be forced to change column or table names. For example, Oracle allows a `#` character in table names, but Azure Synapse only allows `#` as a table name prefix to indicate a temporary table. In such cases, you might also need to change mappings.
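A rename rule for the `#` case described above can be sketched as follows. This is an illustrative helper, not part of any migration tool; the function name and the replacement character are assumptions.

```python
# Hypothetical rename rule for the Oracle-to-Azure-Synapse case: "#" is legal
# inside an Oracle table name, but Azure Synapse only allows "#" as a prefix
# that marks a temporary table, so embedded "#" characters must be replaced.
def sanitize_table_name(name, replacement="_"):
    """Replace '#' characters that Azure Synapse would reject."""
    if name.startswith("#"):  # a leading '#' marks a temp table; keep it
        return "#" + name[1:].replace("#", replacement)
    return name.replace("#", replacement)

# Record the old-to-new mapping so report and semantic-layer mappings
# can be reworked consistently.
renames = {t: sanitize_table_name(t) for t in ["SALES#2022", "#temp_stage"]}
print(renames)
```

Keeping such a mapping in one place makes the follow-on rework of BI tool mappings and views much easier to track.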
-A good way to get everything consistent across multiple BI tools is to create a universal semantic layer, using common data names for high-level objects like dimensions, measures, hierarchies, and joins, in a data virtualization server (as shown in the next diagram) that sits between applications, BI tools, and Azure Synapse. This allows you to set up everything once (instead of in every tool), including calculated fields, joins and mappings, and then point all BI tools at the data virtualization server.
+To achieve consistency across multiple BI tools, create a universal semantic layer by using a data virtualization server that sits between Azure Synapse and your BI tools and applications. In the data virtualization server, use common data names for high-level objects like dimensions, measures, hierarchies, and joins. That way, you configure everything, including calculated fields, joins, and mappings, only once instead of in every tool. Then, point all BI tools at the data virtualization server.
-> [!TIP]
-> Use data virtualization to create a common semantic layer to guarantee consistency across all BI tools in an Azure Synapse environment.
+>[!TIP]
+>Use data virtualization to create a common semantic layer to guarantee consistency across all BI tools in an Azure Synapse environment.
-In this way, you get consistency across all BI tools, while at the same time breaking the dependency between BI tools and applications and the underlying physical data structures in Azure Synapse. Use [Microsoft partners](../../partner/data-integration.md) on Azure to implement this. The following diagram shows how a common vocabulary in the data virtualization server lets multiple BI tools see a common semantic layer.
+With data virtualization, you get consistency across all BI tools and break the dependency between BI tools and applications and the underlying physical data structures in Azure Synapse. [Microsoft partners](../../partner/data-integration.md) can help you achieve consistency in Azure. The following diagram shows how a common vocabulary in the data virtualization server lets multiple BI tools see a common semantic layer.
:::image type="content" source="../media/4-visualization-reporting/data-virtualization-semantics.png" border="true" alt-text="Diagram with common data names and definitions that relate to the data virtualization server.":::

## Conclusions
-> [!TIP]
-> Identify incompatibilities early to gauge the extent of the migration effort. Migrate your users, group roles and privilege assignments. Only migrate the reports and visualizations that are used and are contributing to business value.
+In a lift-and-shift data warehouse migration, most reports, dashboards, and other visualizations should migrate easily.
-In a lift and shift data warehouse migration to Azure Synapse, most reports and dashboards should migrate easily.
+During a migration from a legacy environment, you might find that data in the legacy data warehouse or data mart tables is stored in unsupported data types. Or, you might find legacy data warehouse views that include proprietary SQL with no equivalent in Azure Synapse. If so, you'll need to resolve those issues to ensure a successful migration to Azure Synapse.
-However, if data structures change, then data is stored in unsupported data types, or access to data in the data warehouse or data mart is via a view that includes proprietary SQL that's unsupported in your Azure Synapse environment. You'll need to deal with those issues if they arise.
+Don't rely on user-maintained documentation to identify where issues are located. Instead, use `EXPLAIN` statements because they're a quick, pragmatic way to identify SQL incompatibilities. Rework the incompatible SQL statements to achieve equivalent functionality in Azure Synapse. Also, use automated metadata discovery and lineage tools to understand dependencies, find duplicate reports, and identify invalid reports that rely on obsolete, questionable, or non-existent data sources. Use lineage tools to compare lineage to verify that reports running in your legacy data warehouse environment are produced identically in Azure Synapse.
-You can't rely on documentation to find out where the issues are likely to be. Making use of `EXPLAIN` statements is a pragmatic and quick way to identify incompatibilities in SQL. Rework these to achieve similar results in Azure Synapse. In addition, it's recommended that you make use of automated metadata discovery and lineage tools to help you identify duplicate reports, reports that are no longer valid because they're using data from data sources that you no longer use, and to understand dependencies. Some of these tools help compare lineage to verify that reports running in your legacy data warehouse environment are produced identically in Azure Synapse.
+Don't migrate reports that you no longer use. BI tool usage data can help you determine which reports aren't in use. For the reports, dashboards, and other visualizations that you do want to migrate, migrate all users, user groups, roles, and privileges. If you're using business value to drive your report migration strategy, associate reports with strategic business objectives and priorities to help identify the contribution of report insights to specific objectives. If you're migrating data mart by data mart, use metadata to identify which reports are dependent on which tables and views, so you can make an informed decision about which data marts to migrate first.
-Don't migrate reports that you no longer use. BI tool usage data can help determine which ones aren't in use. For the visualizations and reports that you do want to migrate, migrate all users, user groups, roles, and privileges, and associate these reports with strategic business objectives and priorities to help you identify report insight contribution to specific objectives. This is useful if you're using business value to drive your report migration strategy. If you're migrating by data store, data mart by data mart, then metadata will also help you identify which reports are dependent on which tables and views, so that you can focus on migrating to these first.
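The usage-based scoping described above can be sketched with a small helper. This is illustrative only: the usage counts and threshold are assumptions, and in practice the data would come from your BI tool's audit or usage logs.

```python
# A sketch of using BI tool usage data to scope the migration: keep only
# reports run at least min_runs times in the review window, ordered by usage
# so the most valuable reports are migrated first.
def reports_to_migrate(usage_counts, min_runs=1):
    """Return the report names worth migrating, most-used first."""
    active = {name: runs for name, runs in usage_counts.items()
              if runs >= min_runs}
    return sorted(active, key=active.get, reverse=True)

usage = {"MonthlySales": 42, "LegacyAudit2015": 0, "RegionDashboard": 7}
print(reports_to_migrate(usage))  # unused reports are dropped from scope
```

Associating each surviving report with a business objective, as the text suggests, then turns this raw list into a value-driven migration order.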
+>[!TIP]
+>Identify incompatibilities early to gauge the extent of the migration effort. Migrate your users, group roles, and privilege assignments. Only migrate the reports and visualizations that are used and are contributing to business value.
-Finally, consider data virtualization to shield BI tools and applications from structural changes to the data warehouse and/or the data mart data model that may occur during migration. You can also use a common vocabulary with data virtualization to define a common semantic layer that guarantees consistent common data names, definitions, metrics, hierarchies, joins, and more across all BI tools and applications in a migrated Azure Synapse environment.
+Structural changes to the data model of your data warehouse or data mart can occur during a migration. Consider using data virtualization to shield BI tools and applications from structural changes. With data virtualization, you can use a common vocabulary to define a common semantic layer. The common semantic layer guarantees consistent common data names, definitions, metrics, hierarchies, and joins across all BI tools and applications in the new Azure Synapse environment.
## Next steps
synapse-analytics 5 Minimize Sql Issues https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/synapse-analytics/migration-guides/netezza/5-minimize-sql-issues.md
Previously updated : 05/31/2022
Last updated : 06/01/2022

# Minimize SQL issues for Netezza migrations
-This article is part five of a seven part series that provides guidance on how to migrate from Netezza to Azure Synapse Analytics. This article provides best practices for minimizing SQL issues.
+This article is part five of a seven-part series that provides guidance on how to migrate from Netezza to Azure Synapse Analytics. The focus of this article is best practices for minimizing SQL issues.
## Overview
synapse-analytics 6 Microsoft Third Party Migration Tools https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/synapse-analytics/migration-guides/netezza/6-microsoft-third-party-migration-tools.md
Previously updated : 05/31/2022
Last updated : 07/12/2022

# Tools for Netezza data warehouse migration to Azure Synapse Analytics
-This article is part six of a seven part series that provides guidance on how to migrate from Netezza to Azure Synapse Analytics. This article provides best practices for Microsoft and third-party tools.
+This article is part six of a seven-part series that provides guidance on how to migrate from Netezza to Azure Synapse Analytics. The focus of this article is best practices for Microsoft and third-party tools.
## Data warehouse migration tools
-By migrating your existing data warehouse to Azure Synapse Analytics, you benefit from:
+By migrating your existing data warehouse to Azure Synapse, you benefit from:
- A globally secure, scalable, low-cost, cloud-native, pay-as-you-use analytical database.
-- The rich Microsoft analytical ecosystem that exists on Azure. This ecosystem consists of technologies to help modernize your data warehouse once it's migrated, and extends your analytical capabilities to drive new value.
+- The rich Microsoft analytical ecosystem that exists on Azure. This ecosystem consists of technologies to help modernize your data warehouse once it's migrated and extend your analytical capabilities to drive new value.
-Several tools from Microsoft and third-party partner vendors can help you migrate your existing data warehouse to Azure Synapse. These tools include:
+Several tools from both Microsoft and [third-party partners](../../partner/data-integration.md) can help you migrate your existing data warehouse to Azure Synapse. This article discusses the following types of tools:
- Microsoft data and database migration tools.
Several tools from Microsoft and third-party partner vendors can help you migrat
- Third-party data warehouse migration tools to migrate schema and data to Azure Synapse.
-- Third-party tools to minimize the impact on SQL differences between your existing data warehouse DBMS and Azure Synapse.
-
-The following sections discuss these tools in more detail.
+- Third-party tools to bridge the SQL differences between your existing data warehouse DBMS and Azure Synapse.
## Microsoft data migration tools
-> [!TIP]
-> Data Factory includes tools to help migrate your data and your entire data warehouse to Azure.
- Microsoft offers several tools to help you migrate your existing data warehouse to Azure Synapse, such as:
-- Microsoft Azure Data Factory.
+- [Azure Data Factory](../../../data-factory/introduction.md).
- Microsoft services for physical data transfer.
- Microsoft services for data ingestion.
+The next sections discuss these tools in more detail.
+ ### Microsoft Azure Data Factory
-Microsoft Azure Data Factory is a fully managed, pay-as-you-use, hybrid data integration service for highly scalable ETL and ELT processing. It uses Spark to process and analyze data in parallel and in memory to maximize throughput.
+Data Factory is a fully managed, pay-as-you-use, hybrid data integration service for highly scalable ETL and ELT processing. It uses Apache Spark to process and analyze data in parallel and in memory to maximize throughput.
-> [!TIP]
-> Data Factory allows you to build scalable data integration pipelines code-free.
+>[!TIP]
+>Data Factory allows you to build scalable data integration pipelines code-free.
-[Azure Data Factory connectors](../../../data-factory/connector-overview.md?msclkid=00086e4acff211ec9263dee5c7eb6e69) connect to external data sources and databases and have templates for common data integration tasks. A visual front-end, browser-based UI enables non-programmers to create and run process pipelines to ingest, transform, and load data. More experienced programmers have the option to incorporate custom code, such as Python programs.
+[Data Factory connectors](../../../data-factory/connector-overview.md) support connections to external data sources and databases and include templates for common data integration tasks. A visual front-end, browser-based UI enables non-programmers to create and run [pipelines](../../data-explorer/ingest-dat) to ingest, transform, and load data. More experienced programmers can incorporate custom code, such as Python programs.
-> [!TIP]
-> Data Factory enables collaborative development between business and IT professionals.
+>[!TIP]
+>Data Factory enables collaborative development between business and IT professionals.
-Data Factory is also an orchestration tool. It's the best Microsoft tool to automate the end-to-end migration process to reduce risk and make the migration process easily repeatable. The following diagram shows a Data Factory mapping data flow.
+Data Factory is also an orchestration tool and is the best Microsoft tool to automate the end-to-end migration process. Automation reduces the risk, effort, and time to migrate, and makes the migration process easily repeatable. The following diagram shows a mapping data flow in Data Factory.
-The next screenshot shows a Data Factory wrangling data flow.
+The next screenshot shows a wrangling data flow in Data Factory.
-You can develop simple or comprehensive ETL and ELT processes without coding or maintenance with a few clicks. These processes ingest, move, prepare, transform, and process your data. You can design and manage scheduling and triggers in Azure Data Factory to build an automated data integration and loading environment. In Data Factory, you can define, manage, and schedule PolyBase bulk data load processes.
+In Data Factory, you can develop simple or comprehensive ETL and ELT processes without coding or maintenance with just a few clicks. ETL/ELT processes ingest, move, prepare, transform, and process your data. You can design and manage scheduling and triggers in Data Factory to build an automated data integration and loading environment. In Data Factory, you can define, manage, and schedule PolyBase bulk data load processes.
-> [!TIP]
-> Data Factory includes tools to help migrate your data and your entire data warehouse to Azure.
+>[!TIP]
+>Data Factory includes tools to help migrate both your data and your entire data warehouse to Azure.
-You can use Data Factory to implement and manage a hybrid environment that includes on-premises, cloud, streaming and SaaS data&mdash;for example, from applications like Salesforce&mdash;in a secure and consistent way.
+You can use Data Factory to implement and manage a hybrid environment with on-premises, cloud, streaming, and SaaS data in a secure and consistent way. SaaS data might come from applications such as Salesforce.
-A new capability in Data Factory is wrangling data flows. This opens up Data Factory to business users who want to visually discover, explore, and prepare data at scale without writing code. This capability, similar to Microsoft Excel Power Query or Microsoft Power BI dataflows, offers self-service data preparation. Business users can prepare and integrate data through a spreadsheet-style user interface with drop-down transform options.
+Wrangling data flows is a new capability in Data Factory. This capability opens up Data Factory to business users who want to visually discover, explore, and prepare data at scale without writing code. Wrangling data flows offer self-service data preparation, similar to Microsoft Excel, Power Query, and Microsoft Power BI dataflows. Business users can prepare and integrate data through a spreadsheet-style UI with drop-down transform options.
-Azure Data Factory is the recommended approach for implementing data integration and ETL/ELT processes for an Azure Synapse environment, especially if existing legacy processes need to be refactored.
+Data Factory is the recommended approach for implementing data integration and ETL/ELT processes in the Azure Synapse environment, especially if you want to refactor existing legacy processes.
### Microsoft services for physical data transfer
-> [!TIP]
-> Microsoft offers a range of products and services to assist with data transfer.
+The following sections discuss a range of products and services that Microsoft offers to assist customers with data transfer.
#### Azure ExpressRoute
-Azure ExpressRoute creates private connections between Azure data centers and infrastructure on your premises or in a collocation environment. ExpressRoute connections don't go over the public internet, and they offer more reliability, faster speeds, and lower latencies than typical internet connections. In some cases, by using ExpressRoute connections to transfer data between on-premises systems and Azure, you gain significant cost benefits.
+[Azure ExpressRoute](../../../expressroute/expressroute-introduction.md) creates private connections between Azure data centers and infrastructure on your premises or in a collocation environment. ExpressRoute connections don't go over the public internet, and offer more reliability, faster speeds, and lower latencies than typical internet connections. In some cases, you gain significant cost benefits by using ExpressRoute connections to transfer data between on-premises systems and Azure.
#### AzCopy
-[AzCopy](../../../storage/common/storage-use-azcopy-v10.md) is a command line utility that copies files to Azure Blob Storage via a standard internet connection. In a warehouse migration project, you can use AzCopy to upload extracted, compressed, and delimited text files before loading through PolyBase, or a native Parquet reader if the exported files are Parquet format. AzCopy can upload individual files, file selections, or file directories.
+[AzCopy](../../../storage/common/storage-use-azcopy-v10.md) is a command line utility that copies files to Azure Blob Storage over a standard internet connection. In a warehouse migration project, you can use AzCopy to upload extracted, compressed, delimited text files before loading them into Azure Synapse using [PolyBase](#polybase). AzCopy can upload individual files, file selections, or file folders. If the exported files are in Parquet format, use a native Parquet reader instead.
#### Azure Data Box
-Microsoft offers a service called Azure Data Box. This service writes data to be migrated to a physical storage device. This device is then shipped to an Azure data center and loaded into cloud storage. The service can be cost-effective for large volumes of data&mdash;for example, tens or hundreds of terabytes&mdash;or where network bandwidth isn't readily available. Azure Data Box is typically used for one-off historical data load when migrating a large amount of data to Azure Synapse.
+[Azure Data Box](../../../databox/data-box-overview.md) is a Microsoft service that provides you with a proprietary physical storage device that you can copy migration data onto. You then ship the device to an Azure data center for data upload to cloud storage. This service can be cost-effective for large volumes of data, such as tens or hundreds of terabytes, or where network bandwidth isn't readily available. Azure Data Box is typically used for a large one-off historical data load into Azure Synapse.
-Another service is Data Box Gateway, a virtualized cloud storage gateway device that resides on your premises and sends your images, media, and other data to Azure. Use Data Box Gateway for one-off migration tasks or ongoing incremental data uploads.
+#### Azure Data Box Gateway
+
+[Azure Data Box Gateway](../../../databox-gateway/data-box-gateway-overview.md) is a virtualized cloud storage gateway device that resides on your premises and sends your images, media, and other data to Azure. Use Data Box Gateway for one-off migration tasks or ongoing incremental data uploads.
### Microsoft services for data ingestion
+The following sections discuss the products and services that Microsoft offers to assist customers with data ingestion.
+ #### COPY INTO
-The [COPY](/sql/t-sql/statements/copy-into-transact-sql) statement provides the most flexibility for high-throughput data ingestion into Azure Synapse Analytics. Refer to the list of capabilities that `COPY` offers for data ingestion.
+The [COPY INTO](/sql/t-sql/statements/copy-into-transact-sql#syntax) statement provides the most flexibility for high-throughput data ingestion into Azure Synapse. For more information about `COPY INTO` capabilities, see [COPY (Transact-SQL)](/sql/t-sql/statements/copy-into-transact-sql).
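As a minimal sketch of the syntax (the table name, storage URL, and credential are placeholders, not values from this article), a `COPY INTO` load of gzip-compressed, pipe-delimited export files might look like this:

```sql
-- Hypothetical names: adjust the table, storage account, container, and credential.
COPY INTO dbo.Stage_Sales
FROM 'https://<storageaccount>.blob.core.windows.net/migration/sales/*.txt.gz'
WITH (
    FILE_TYPE = 'CSV',            -- delimited text input
    FIELDTERMINATOR = '|',        -- pipe-delimited columns
    COMPRESSION = 'GZIP',         -- files are gzip-compressed
    CREDENTIAL = (IDENTITY = 'Managed Identity')  -- authenticate to Blob Storage
);
```

Unlike PolyBase, `COPY INTO` doesn't require external table or file format objects, which is why it's often the simplest option for one-off loads.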
#### PolyBase
-> [!TIP]
-> PolyBase can load data in parallel from Azure Blob Storage into Azure Synapse.
+[PolyBase](../../sql/load-data-overview.md) is the fastest, most scalable method for bulk data load into Azure Synapse. PolyBase uses the massively parallel processing (MPP) architecture of Azure Synapse for parallel loading of data to achieve the fastest throughput. PolyBase can read data from flat files in Azure Blob Storage, or directly from external data sources and other relational databases via connectors.
-PolyBase provides the fastest and most scalable method of loading bulk data into Azure Synapse. PolyBase leverages the MPP architecture to use parallel loading, to give the fastest throughput, and can read data from flat files in Azure Blob Storage or directly from external data sources and other relational databases via connectors.
+>[!TIP]
+>PolyBase can load data in parallel from Azure Blob Storage into Azure Synapse.
-PolyBase can also directly read from files compressed with gzip&mdash;this reduces the physical volume of data moved during the load process. PolyBase supports popular data formats such as delimited text, ORC, and Parquet.
+PolyBase can also directly read from files compressed with gzip to reduce the physical volume of data during a load process. PolyBase supports popular data formats such as delimited text, ORC, and Parquet.
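As a hedged sketch of the objects PolyBase needs in order to read gzip-compressed delimited files from Blob Storage (all object names and the storage location are placeholders), the setup in a dedicated SQL pool might look like:

```sql
-- Hypothetical names throughout; substitute your storage account, container, and schema.
CREATE EXTERNAL DATA SOURCE MigrationBlob
WITH (
    TYPE = HADOOP,  -- required for PolyBase access to Blob Storage
    LOCATION = 'wasbs://migration@<storageaccount>.blob.core.windows.net'
);

CREATE EXTERNAL FILE FORMAT PipeDelimitedGzip
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = '|'),
    DATA_COMPRESSION = 'org.apache.hadoop.io.compress.GzipCodec'  -- gzip input
);

CREATE EXTERNAL TABLE ext.Customer (
    CustomerId   INT,
    CustomerName NVARCHAR(100)
)
WITH (
    LOCATION = '/customer/',        -- folder containing the exported files
    DATA_SOURCE = MigrationBlob,
    FILE_FORMAT = PipeDelimitedGzip
);
```

Once the external table exists, queries against `ext.Customer` read the underlying files in parallel.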
-> [!TIP]
-> Invoke PolyBase from Azure Data Factory as part of a migration pipeline.
+>[!TIP]
+>You can invoke PolyBase from Data Factory as part of a migration pipeline.
-PolyBase is tightly integrated with Azure Data Factory to enable data load ETL/ELT processes to be rapidly developed and scheduled through a visual GUI, leading to higher productivity and fewer errors than hand-written code.
+PolyBase is tightly integrated with Data Factory to support rapid development of data load ETL/ELT processes. You can schedule data load processes through a visual UI for higher productivity and fewer errors than hand-written code. Microsoft recommends PolyBase for data ingestion into Azure Synapse, especially for high-volume data ingestion.
-PolyBase is the recommended data load method for Azure Synapse, especially for high-volume data. PolyBase loads data using the `CREATE TABLE AS` or `INSERT...SELECT` statements&mdash;CTAS achieves the highest possible throughput as it minimizes the amount of logging required. Compressed delimited text files are the most efficient input format. For maximum throughput, split very large input files into multiple smaller files and load these in parallel. For fastest loading to a staging table, define the target table as type `HEAP` and use round-robin distribution.
+PolyBase uses `CREATE TABLE AS` or `INSERT...SELECT` statements to load data. `CREATE TABLE AS` minimizes logging to achieve the highest throughput. The most efficient input format for data load is compressed delimited text files. For maximum throughput, split large input files into multiple smaller files and load them in parallel. For fastest loading to a staging table, define the target table as `HEAP` type and use round-robin distribution.
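The staging pattern described above can be sketched as a single CTAS statement; `ext.Customer` stands in for a hypothetical external table defined over the exported files:

```sql
-- Round-robin HEAP staging table for fastest initial load (hypothetical names).
CREATE TABLE dbo.Stage_Customer
WITH (
    DISTRIBUTION = ROUND_ROBIN,  -- spread rows evenly, no hash computation during load
    HEAP                         -- skip clustered index overhead during load
)
AS
SELECT * FROM ext.Customer;
```

After staging, data is typically transformed and inserted into the final hash-distributed tables.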
-However, PolyBase has some limitations. Rows to be loaded must be less than 1 MB in length. Fixed-width format or nested data, such as JSON and XML, aren't directly readable.
+PolyBase has some limitations: it requires rows to be less than 1 MB in length, and it can't directly read fixed-width formats or nested data such as JSON and XML.
-## Microsoft partners can help you migrate your data warehouse to Azure Synapse Analytics
+### Microsoft partners for Netezza migrations
-In addition to tools that can help you with various aspects of data warehouse migration, there are several practiced [Microsoft partners](../../partner/data-integration.md) that can bring their expertise to help you move your legacy on-premises data warehouse platform to Azure Synapse.
+[Microsoft partners](../../partner/data-integration.md) offer tools, services, and expertise to help you migrate your legacy on-premises data warehouse platform to Azure Synapse.
## Next steps
-To learn more about implementing modern data warehouses, see the next article in this series: [Beyond Netezza migration, implementing a modern data warehouse in Microsoft Azure](7-beyond-data-warehouse-migration.md).
+To learn more about implementing modern data warehouses, see the next article in this series: [Beyond Netezza migration, implement a modern data warehouse in Microsoft Azure](7-beyond-data-warehouse-migration.md).
synapse-analytics 7 Beyond Data Warehouse Migration https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/synapse-analytics/migration-guides/netezza/7-beyond-data-warehouse-migration.md
Title: "Beyond Netezza migration, implementing a modern data warehouse in Microsoft Azure"
+ Title: "Beyond Netezza migration, implement a modern data warehouse in Microsoft Azure"
description: Learn how a Netezza migration to Azure Synapse Analytics lets you integrate your data warehouse with the Microsoft Azure analytical ecosystem.
Previously updated : 05/31/2022 Last updated : 07/12/2022
-# Beyond Netezza migration, implementing a modern data warehouse in Microsoft Azure
+# Beyond Netezza migration, implement a modern data warehouse in Microsoft Azure
-This article is part seven of a seven part series that provides guidance on how to migrate from Netezza to Azure Synapse Analytics. This article provides best practices for implementing modern data warehouses.
+This article is part seven of a seven-part series that provides guidance on how to migrate from Netezza to Azure Synapse Analytics. The focus of this article is best practices for implementing modern data warehouses.
## Beyond data warehouse migration to Azure
-One of the key reasons to migrate your existing data warehouse to Azure Synapse Analytics is to utilize a globally secure, scalable, low-cost, cloud-native, pay-as-you-use analytical database. Azure Synapse also lets you integrate your migrated data warehouse with the complete Microsoft Azure analytical ecosystem to take advantage of, and integrate with, other Microsoft technologies that help you modernize your migrated data warehouse. This includes integrating with technologies like:
+A key reason to migrate your existing data warehouse to Azure Synapse Analytics is to utilize a globally secure, scalable, low-cost, cloud-native, pay-as-you-use analytical database. With Azure Synapse, you can integrate your migrated data warehouse with the complete Microsoft Azure analytical ecosystem to take advantage of other Microsoft technologies and modernize your migrated data warehouse. Those technologies include:
-- Azure Data Lake Storage for cost effective data ingestion, staging, cleansing, and transformation, to free up data warehouse capacity occupied by fast growing staging tables.
+- [Azure Data Lake Storage](../../../storage/blobs/data-lake-storage-introduction.md) for cost effective data ingestion, staging, cleansing, and transformation. Data Lake Storage can free up the data warehouse capacity occupied by fast-growing staging tables.
-- Azure Data Factory for collaborative IT and self-service data integration [with connectors](../../../data-factory/connector-overview.md) to cloud and on-premises data sources and streaming data.
+- [Azure Data Factory](../../../data-factory/introduction.md) for collaborative IT and self-service data integration with [connectors](../../../data-factory/connector-overview.md) to cloud and on-premises data sources and streaming data.
-- [The Open Data Model Common Data Initiative](/common-data-model/) to share consistent trusted data across multiple technologies, including:
+- [Common Data Model](/common-data-model/) to share consistent trusted data across multiple technologies, including:
 - Azure Synapse
 - Azure Synapse Spark
 - Azure HDInsight
 - Power BI
- - SAP
 - Adobe Customer Experience Platform
 - Azure IoT
- - Microsoft ISV Partners
+ - Microsoft ISV partners
-- [Microsoft's data science technologies](/azure/architecture/data-science-process/platforms-and-tools), including:
- - Azure Machine Learning Studio
+- Microsoft [data science technologies](/azure/architecture/data-science-process/platforms-and-tools), including:
+ - Azure Machine Learning studio
 - Azure Machine Learning
 - Azure Synapse Spark (Spark as a service)
 - Jupyter Notebooks
 - RStudio
 - ML.NET
- - .NET for Apache Spark to enable data scientists to use Azure Synapse data to train machine learning models at scale.
+ - .NET for Apache Spark, which lets data scientists use Azure Synapse data to train machine learning models at scale.
-- [Azure HDInsight](../../../hdinsight/index.yml) to leverage big data analytical processing and join big data with Azure Synapse data by creating a logical data warehouse using PolyBase.
+- [Azure HDInsight](../../../hdinsight/index.yml) to process large amounts of data, and to join big data with Azure Synapse data by creating a logical data warehouse using PolyBase.
-- [Azure Event Hubs](../../../event-hubs/event-hubs-about.md), [Azure Stream Analytics](../../../stream-analytics/stream-analytics-introduction.md), and [Apache Kafka](/azure/databricks/spark/latest/structured-streaming/kafka) to integrate with live streaming data from Azure Synapse.
+- [Azure Event Hubs](../../../event-hubs/event-hubs-about.md), [Azure Stream Analytics](../../../stream-analytics/stream-analytics-introduction.md), and [Apache Kafka](/azure/databricks/spark/latest/structured-streaming/kafka) to integrate live streaming data from Azure Synapse.
-There's often acute demand to integrate with [machine learning](../../machine-learning/what-is-machine-learning.md) to enable custom-built, trained machine learning models for use in Azure Synapse. This would enable in-database analytics to run at scale in-batch, on an event-driven basis and on-demand. The ability to exploit in-database analytics in Azure Synapse from multiple BI tools and applications also guarantees that all get the same predictions and recommendations.
+The growth of big data has led to an acute demand for [machine learning](../../machine-learning/what-is-machine-learning.md) to enable custom-built, trained machine learning models for use in Azure Synapse. Machine learning models enable in-database analytics to run at scale in batch, on an event-driven basis, and on demand. The ability to take advantage of in-database analytics in Azure Synapse from multiple BI tools and applications also guarantees consistent predictions and recommendations.
-In addition, there's an opportunity to integrate Azure Synapse with Microsoft partner tools on Azure to shorten time to value.
+In addition, you can integrate Azure Synapse with Microsoft partner tools on Azure to shorten time to value.
-Let's look at these in more detail to understand how you can take advantage of the technologies in Microsoft's analytical ecosystem to modernize your data warehouse once you've migrated to Azure Synapse.
+Let's take a closer look at how you can take advantage of technologies in the Microsoft analytical ecosystem to modernize your data warehouse after you've migrated to Azure Synapse.
-## Offload data staging and ETL processing to Azure Data Lake and Azure Data Factory
+## Offload data staging and ETL processing to Data Lake Storage and Data Factory
-Enterprises today have a key problem resulting from digital transformation. So much new data is being generated and captured for analysis, and much of this data is finding its way into data warehouses. A good example is transaction data created by opening OLTP systems to self-service access from mobile devices. These OLTP systems are the main sources of data to a data warehouse, and with customers now driving the transaction rate rather than employees, data in data warehouse staging tables has been growing rapidly in volume.
+Digital transformation has created a key challenge for enterprises by generating a torrent of new data for capture and analysis. A good example is transaction data created by opening online transactional processing (OLTP) systems to self-service access from mobile devices. Much of this data finds its way into data warehouses, and OLTP systems are the main source. With customers now driving the transaction rate rather than employees, the volume of data in data warehouse staging tables has been growing rapidly.
-The rapid influx of data into the enterprise, along with new sources of data like Internet of Things (IoT) streams, means that companies need to find a way to deal with unprecedented data growth and scale data integration ETL processing beyond current levels. One way to do this is to offload ingestion, data cleansing, transformation, and integration to a data lake and process it at scale there, as part of a data warehouse modernization program.
+With the rapid influx of data into the enterprise, along with new sources of data like Internet of Things (IoT), companies must find ways to scale up data integration ETL processing. One method is to offload ingestion, data cleansing, transformation, and integration to a data lake and process data at scale there, as part of a data warehouse modernization program.
-Once you've migrated your data warehouse to Azure Synapse, Microsoft provides the ability to modernize your ETL processing by ingesting data into, and staging data in, Azure Data Lake Storage. You can then clean, transform and integrate your data at scale using Data Factory before loading it into Azure Synapse in parallel using PolyBase.
+Once you've migrated your data warehouse to Azure Synapse, you can modernize your ETL processing by ingesting and staging data in Data Lake Storage. You can then clean, transform, and integrate your data at scale using Data Factory before loading it into Azure Synapse in parallel using PolyBase.
-For ELT strategies, consider offloading ELT processing to Azure Data Lake to easily scale as your data volume or frequency grows.
+For ELT strategies, consider offloading ELT processing to Data Lake Storage to easily scale as your data volume or frequency grows.
### Microsoft Azure Data Factory
-> [!TIP]
-> Data Factory allows you to build scalable data integration pipelines code-free.
+[Azure Data Factory](../../../data-factory/introduction.md) is a pay-as-you-use, hybrid data integration service for highly scalable ETL and ELT processing. Data Factory provides a web-based UI to build data integration pipelines with no code. With Data Factory, you can:
-[Data Factory](https://azure.microsoft.com/services/data-factory/) is a pay-as-you-use, hybrid data integration service for highly scalable ETL and ELT processing. Data Factory provides a simple web-based user interface to build data integration pipelines in a code-free manner that can:
+- Build scalable data integration pipelines code-free.
-- Build scalable data integration pipelines code-free. Easily acquire data at scale. Pay only for what you use, and connect to on-premises, cloud, and SaaS-based data sources.
+- Easily acquire data at scale.
-- Ingest, move, clean, transform, integrate, and analyze cloud and on-premises data at scale. Take automatic action, such as a recommendation or alert.
+- Pay only for what you use.
+
+- Connect to on-premises, cloud, and SaaS-based data sources.
+
+- Ingest, move, clean, transform, integrate, and analyze cloud and on-premises data at scale.
- Seamlessly author, monitor, and manage pipelines that span data stores both on-premises and in the cloud.

- Enable pay-as-you-go scale-out in alignment with customer growth.
-> [!TIP]
-> Data Factory can connect to on-premises, cloud, and SaaS data.
+You can use these features without writing any code, or you can add custom code to Data Factory pipelines. The following screenshot shows an example Data Factory pipeline.
-All of this can be done without writing any code. However, adding custom code to Data Factory pipelines is also supported. The next screenshot shows an example Data Factory pipeline.
+>[!TIP]
+>Data Factory lets you build scalable data integration pipelines without code.
-> [!TIP]
-> Pipelines called data factories control the integration and analysis of data. Data Factory is enterprise-class data integration software aimed at IT professionals with a data wrangling facility for business users.
+You can develop Data Factory pipelines from any of several environments, including:
-Implement Data Factory pipeline development from any of several places including:
+- Microsoft Azure portal.
-- Microsoft Azure portal
+- Microsoft Azure PowerShell.
-- Microsoft Azure PowerShell
+- Programmatically from .NET and Python using a multi-language SDK.
-- Programmatically from .NET and Python using a multi-language SDK
+- Azure Resource Manager (ARM) templates.
-- Azure Resource Manager (ARM) templates
+- REST APIs.
-- REST APIs
+>[!TIP]
+>Data Factory can connect to on-premises, cloud, and SaaS data.
-Developers and data scientists who prefer to write code can easily author Data Factory pipelines in Java, Python, and .NET using the software development kits (SDKs) available for those programming languages. Data Factory pipelines can also be hybrid since they can connect, ingest, clean, transform, and analyze data in on-premises data centers, Microsoft Azure, other clouds, and SaaS offerings.
+Developers and data scientists who prefer to write code can easily author Data Factory pipelines in Java, Python, and .NET using the software development kits (SDKs) available for those programming languages. Data Factory pipelines can be hybrid data pipelines because they can connect, ingest, clean, transform, and analyze data in on-premises data centers, Microsoft Azure, other clouds, and SaaS offerings.
-Once you develop Data Factory pipelines to integrate and analyze data, deploy those pipelines globally and schedule them to run in batch, invoke them on demand as a service, or run them in real-time on an event-driven basis. A Data Factory pipeline can also run on one or more execution engines and monitor pipeline execution to ensure performance and track errors.
+After you develop Data Factory pipelines to integrate and analyze data, you can deploy those pipelines globally and schedule them to run in batch, invoke them on demand as a service, or run them in real-time on an event-driven basis. A Data Factory pipeline can also run on one or more execution engines and monitor execution to ensure performance and to track errors.
-#### Use cases
+>[!TIP]
+>In Azure Data Factory, pipelines control the integration and analysis of data. Data Factory is enterprise-class data integration software aimed at IT professionals and has data wrangling capability for business users.
-> [!TIP]
-> Build data warehouses on Microsoft Azure.
+#### Use cases
-Data Factory can support multiple use cases, including:
+Data Factory supports multiple use cases, such as:
-- Preparing, integrating, and enriching data from cloud and on-premises data sources to populate your migrated data warehouse and data marts on Microsoft Azure Synapse.
+- Prepare, integrate, and enrich data from cloud and on-premises data sources to populate your migrated data warehouse and data marts on Microsoft Azure Synapse.
-- Preparing, integrating, and enriching data from cloud and on-premises data sources to produce training data for use in machine learning model development and in retraining analytical models.
+- Prepare, integrate, and enrich data from cloud and on-premises data sources to produce training data for use in machine learning model development and in retraining analytical models.
-- Orchestrating data preparation and analytics to create predictive and prescriptive analytical pipelines for processing and analyzing data in batch, such as sentiment analytics, and either acting on the results of the analysis or populating your data warehouse with the results.
+- Orchestrate data preparation and analytics to create predictive and prescriptive analytical pipelines for processing and analyzing data in batch, such as sentiment analytics. Either act on the results of the analysis or populate your data warehouse with the results.
-- Preparing, integrating, and enriching data for data-driven business applications running on the Azure cloud on top of operational data stores like Azure Cosmos DB.
+- Prepare, integrate, and enrich data for data-driven business applications running on the Azure cloud on top of operational data stores such as Azure Cosmos DB.
-> [!TIP]
-> Build training data sets in data science to develop machine learning models.
+>[!TIP]
+>Build training data sets in data science to develop machine learning models.
#### Data sources
Data Factory lets you use [connectors](../../../data-factory/connector-overview.
#### Transform data using Azure Data Factory
-> [!TIP]
-> Professional ETL developers can use Azure Data Factory mapping data flows to clean, transform, and integrate data without the need to write code.
+Within a Data Factory pipeline, you can ingest, clean, transform, integrate, and analyze any type of data from these sources. Data can be structured, semi-structured like JSON or Avro, or unstructured.
-Within a Data Factory pipeline, ingest, clean, transform, integrate, and, if necessary, analyze any type of data from these sources. This includes structured, semi-structured such as JSON or Avro, and unstructured data.
+Without writing any code, professional ETL developers can use Data Factory mapping data flows to filter, split, join (several join types), lookup, pivot, unpivot, sort, union, and aggregate data. In addition, Data Factory supports surrogate keys, multiple write processing options like insert, upsert, update, table recreation, and table truncation, and several types of target data stores&mdash;also known as sinks. ETL developers can also create aggregations, including time-series aggregations that require a window to be placed on data columns.
-Professional ETL developers can use Data Factory mapping data flows to filter, split, join (many types), lookup, pivot, unpivot, sort, union, and aggregate data without writing any code. In addition, Data Factory supports surrogate keys, multiple write processing options such as insert, upsert, update, table recreation, and table truncation, and several types of target data stores&mdash;also known as sinks. ETL developers can also create aggregations, including time-series aggregations that require a window to be placed on data columns.
+>[!TIP]
+>Professional ETL developers can use Data Factory mapping data flows to clean, transform, and integrate data without the need to write code.
-> [!TIP]
-> Data Factory supports the ability to automatically detect and manage schema changes in inbound data, such as in streaming data.
+You can run mapping data flows that transform data as activities in a Data Factory pipeline, and if necessary, you can include multiple mapping data flows in a single pipeline. In this way, you can manage complexity by breaking up challenging data transformation and integration tasks into smaller mapping data flows that can be combined, and you can add custom code when needed. In addition to this functionality, Data Factory mapping data flows include the ability to:
-Run mapping data flows that transform data as activities in a Data Factory pipeline. Include multiple mapping data flows in a single pipeline, if necessary. Break up challenging data transformation and integration tasks into smaller mapping dataflows that can be combined to handle the complexity and custom code added if necessary. In addition to this functionality, Data Factory mapping data flows include these abilities:
+- Define expressions to clean and transform data, compute aggregations, and enrich data. For example, these expressions can perform feature engineering on a date field to break it into multiple fields to create training data during machine learning model development. You can construct expressions from a rich set of functions that include mathematical, temporal, split, merge, string concatenation, conditions, pattern match, replace, and many other functions.
-- Define expressions to clean and transform data, compute aggregations, and enrich data. For example, these expressions can perform feature engineering on a date field to break it into multiple fields to create training data during machine learning model development. Construct expressions from a rich set of functions that include mathematical, temporal, split, merge, string concatenation, conditions, pattern match, replace, and many other functions.
-
-- Automatically handle schema drift so that data transformation pipelines can avoid being impacted by schema changes in data sources. This is especially important for streaming IoT data, where schema changes can happen without notice when devices are upgraded or when readings are missed by gateway devices collecting IoT data.
+- Automatically handle schema drift so that data transformation pipelines can avoid being impacted by schema changes in data sources. This ability is especially important for streaming IoT data, where schema changes can happen without notice if devices are upgraded or when readings are missed by gateway devices collecting IoT data.
- Partition data to enable transformations to run in parallel at scale.

-- Inspect data to view the metadata of a stream you're transforming.
+- Inspect streaming data to view the metadata of a stream you're transforming.
+
+>[!TIP]
+>Data Factory supports the ability to automatically detect and manage schema changes in inbound data, such as in streaming data.
-> [!TIP]
-> Data Factory can also partition data to enable ETL processing to run at scale.
+The following screenshot shows an example Data Factory mapping data flow.
-The next screenshot shows an example Data Factory mapping data flow.
+Data engineers can profile data quality and view the results of individual data transforms by enabling debug capability during development.
-Data engineers can profile data quality and view the results of individual data transforms by switching on a debug capability during development.
+>[!TIP]
+>Data Factory can also partition data to enable ETL processing to run at scale.
-> [!TIP]
-> Data Factory pipelines are also extensible since Data Factory allows you to write your own code and run it as part of a pipeline.
+If necessary, you can extend Data Factory transformational and analytical functionality by adding a linked service that contains your code into a pipeline. For example, an Azure Synapse Spark pool notebook might contain Python code that uses a trained model to score the data integrated by a mapping data flow.
-Extend Data Factory transformational and analytical functionality by adding a linked service containing your own code into a pipeline. For example, an Azure Synapse Spark pool notebook containing Python code could use a trained model to score the data integrated by a mapping data flow.
+You can store integrated data and any results from analytics within a Data Factory pipeline in one or more data stores, such as Data Lake Storage, Azure Synapse, or Hive tables in HDInsight. You can also invoke other activities to act on insights produced by a Data Factory analytical pipeline.
-Store integrated data and any results from analytics included in a Data Factory pipeline in one or more data stores such as Azure Data Lake Storage, Azure Synapse, or Azure HDInsight (Hive tables). Invoke other activities to act on insights produced by a Data Factory analytical pipeline.
+>[!TIP]
+>Data Factory pipelines are extensible because Data Factory lets you write your own code and run it as part of a pipeline.
#### Utilize Spark to scale data integration
-Internally, Data Factory utilizes Azure Synapse Spark Pools&mdash;Microsoft's Spark-as-a-service offering&mdash;at run time to clean and integrate data on the Microsoft Azure cloud. This enables it to clean, integrate, and analyze high-volume and very high-velocity data (such as click stream data) at scale. Microsoft intends to execute Data Factory pipelines on other Spark distributions. In addition to executing ETL jobs on Spark, Data Factory can also invoke Pig scripts and Hive queries to access and transform data stored in Azure HDInsight.
+At run time, Data Factory internally uses Azure Synapse Spark pools, which are Microsoft's Spark as a service offering, to clean and integrate data in the Azure cloud. You can clean, integrate, and analyze high-volume, high-velocity data, such as click-stream data, at scale. Microsoft's intention is to also run Data Factory pipelines on other Spark distributions. In addition to running ETL jobs on Spark, Data Factory can invoke Pig scripts and Hive queries to access and transform data stored in HDInsight.
#### Link self-service data prep and Data Factory ETL processing using wrangling data flows
-> [!TIP]
-> Data Factory support for wrangling data flows in addition to mapping data flows means that business and IT can work together on a common platform to integrate data.
+Data wrangling lets business users, also known as citizen data integrators and data engineers, make use of the platform to visually discover, explore, and prepare data at scale without writing code. This Data Factory capability is easy to use and is similar to Microsoft Excel Power Query or Microsoft Power BI dataflows, where self-service business users use a spreadsheet-style UI with drop-down transforms to prepare and integrate data. The following screenshot shows an example Data Factory wrangling data flow.
-Another new capability in Data Factory is wrangling data flows. This lets business users (also known as citizen data integrators and data engineers) make use of the platform to visually discover, explore, and prepare data at scale without writing code. This easy-to-use Data Factory capability is similar to Microsoft Excel Power Query or Microsoft Power BI dataflows, where self-service data preparation business users use a spreadsheet-style UI with drop-down transforms to prepare and integrate data. The following screenshot shows an example Data Factory wrangling data flow.
+Unlike Excel and Power BI, Data Factory [wrangling data flows](../../../data-factory/wrangling-tutorial.md) use Power Query to generate M code and then translate it into a massively parallel in-memory Spark job for cloud-scale execution. The combination of mapping data flows and wrangling data flows in Data Factory lets professional ETL developers and business users collaborate to prepare, integrate, and analyze data for a common business purpose. The preceding Data Factory mapping data flows diagram shows how both Data Factory and Azure Synapse Spark pool notebooks can be combined in the same Data Factory pipeline. The combination of mapping and wrangling data flows in Data Factory helps IT and business users stay aware of what data flows each has created, and supports data flow reuse to minimize reinvention and maximize productivity and consistency.
-This differs from Excel and Power BI, as Data Factory [wrangling data flows](../../../data-factory/wrangling-tutorial.md) use Power Query to generate M code and translate it into a massively parallel in-memory Spark job for cloud-scale execution. The combination of mapping data flows and wrangling data flows in Data Factory lets IT professional ETL developers and business users collaborate to prepare, integrate, and analyze data for a common business purpose. The preceding Data Factory mapping data flow diagram shows how both Data Factory and Azure Synapse Spark pool notebooks can be combined in the same Data Factory pipeline. This allows IT and business to be aware of what each has created. Mapping data flows and wrangling data flows can then be available for reuse to maximize productivity and consistency and minimize reinvention.
+>[!TIP]
+>Data Factory supports both wrangling data flows and mapping data flows, so business users and IT users can integrate data collaboratively on a common platform.
#### Link data and analytics in analytical pipelines
-In addition to cleaning and transforming data, Data Factory can combine data integration and analytics in the same pipeline. Use Data Factory to create both data integration and analytical pipelines&mdash;the latter being an extension of the former. Drop an analytical model into a pipeline so that clean, integrated data can be stored to provide predictions or recommendations. Act on this information immediately or store it in your data warehouse to provide you with new insights and recommendations that can be viewed in BI tools.
+In addition to cleaning and transforming data, Data Factory can combine data integration and analytics in the same pipeline. You can use Data Factory to create both data integration and analytical pipelines, the latter being an extension of the former. You can drop an analytical model into a pipeline to create an analytical pipeline that generates clean, integrated data for predictions or recommendations. Then, you can act on the predictions or recommendations immediately, or store them in your data warehouse to provide new insights and recommendations that can be viewed in BI tools.
-Models developed code-free with Azure Machine Learning Studio, or with the Azure Machine Learning SDK using Azure Synapse Spark pool notebooks or using R in RStudio, can be invoked as a service from within a Data Factory pipeline to batch score your data. Analysis happens at scale by executing Spark machine learning pipelines on Azure Synapse Spark pool notebooks.
+To batch score your data, you can develop an analytical model that you invoke as a service within a Data Factory pipeline. You can develop analytical models code-free with Azure Machine Learning studio, or with the Azure Machine Learning SDK using Azure Synapse Spark pool notebooks or R in RStudio. When you run Spark machine learning pipelines on Azure Synapse Spark pool notebooks, analysis happens at scale.
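The batch-scoring pattern described above can be sketched in plain Python. This is a conceptual illustration only, not the Azure Machine Learning SDK: the fixed-weight logistic `score` function stands in for a trained model that a pipeline would invoke as a service.

```python
# Conceptual sketch of batch scoring: a "trained model" (here, a fixed-weight
# logistic scorer standing in for a real deployed model) is applied to every
# row of integrated data produced by a pipeline.
import math

def score(row, weights, bias):
    """Return a propensity score in (0, 1) for one integrated data row."""
    z = sum(w * x for w, x in zip(weights, row)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def batch_score(rows, weights, bias, threshold=0.5):
    """Score a batch of rows and attach a prediction to each."""
    return [
        {"features": row, "score": s, "predicted": s >= threshold}
        for row in rows
        for s in [score(row, weights, bias)]
    ]

# Example: integrated data from a pipeline, scored in one batch step.
batch = [[0.2, 1.5], [3.0, -0.5], [-1.0, -2.0]]
results = batch_score(batch, weights=[1.2, 0.8], bias=-0.5)
```

In a real pipeline, the scored output would be written back to a data store so that BI tools can surface the predictions.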
-Store integrated data and any results from analytics included in a Data Factory pipeline in one or more data stores, such as Azure Data Lake Storage, Azure Synapse, or Azure HDInsight (Hive tables). Invoke other activities to act on insights produced by a Data Factory analytical pipeline.
+You can store integrated data and any Data Factory analytical pipeline results in one or more data stores, such as Data Lake Storage, Azure Synapse, or Hive tables in HDInsight. You can also invoke other activities to act on insights produced by a Data Factory analytical pipeline.
-## A lake database to share consistent trusted data
+## Use a lake database to share consistent trusted data
-> [!TIP]
-> Microsoft has created a lake database to describe core data entities to be shared across the enterprise.
+A key objective of any data integration setup is the ability to integrate data once and reuse it everywhere, not just in a data warehouse. For example, you might want to use integrated data in data science. Reuse avoids reinvention and ensures consistent, commonly understood data that everyone can trust.
-A key objective in any data integration setup is the ability to integrate data once and reuse it everywhere, not just in a data warehouse&mdash;for example, in data science. Reuse avoids reinvention and ensures consistent, commonly understood data that everyone can trust.
+[Common Data Model](/common-data-model/) describes core data entities that can be shared and reused across the enterprise. To achieve reuse, Common Data Model establishes a set of common data names and definitions that describe logical data entities. Examples of common data names include Customer, Account, Product, Supplier, Orders, Payments, and Returns. IT and business professionals can use data integration software to create and store common data assets to maximize their reuse and drive consistency everywhere.
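The idea of a shared entity definition can be sketched in a few lines of Python. This is illustrative only, assuming a hypothetical minimal schema; Common Data Model entities are far richer than a field-to-type mapping.

```python
# Conceptual sketch of a common data entity: a shared "Customer" definition
# (field names and types) that every pipeline validates records against, so
# integrated data stays consistent everywhere it's reused. Illustrative only.
CUSTOMER_ENTITY = {"customer_id": str, "name": str, "country": str}

def conforms(record, entity):
    """Check that a record has exactly the entity's fields with the right types."""
    return (set(record) == set(entity)
            and all(isinstance(record[f], t) for f, t in entity.items()))

good = {"customer_id": "C001", "name": "Contoso", "country": "US"}
bad = {"customer_id": 42, "name": "Fabrikam"}          # wrong type, missing field
```

The value of a shared definition is that every producer and consumer of Customer data agrees on the same names and shapes, which is what makes reuse safe.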
-> [!TIP]
-> Azure Data Lake Storage is shared storage that underpins Microsoft Azure Synapse, Azure Machine Learning, Azure Synapse Spark, and Azure HDInsight.
+Azure Synapse provides industry-specific database templates to help standardize data in the lake. [Lake database templates](../../database-designer/concepts-database-templates.md) provide schemas for predefined business areas, enabling data to be loaded into a lake database in a structured way. The power comes when you use data integration software to create lake database common data assets, resulting in self-describing trusted data that can be consumed by applications and analytical systems. You can create common data assets in Data Lake Storage by using Data Factory.
-To achieve this goal, establish a set of common data names and definitions describing logical data entities that need to be shared across the enterprise&mdash;such as customer, account, product, supplier, orders, payments, returns, and so forth. Once this is done, IT and business professionals can use data integration software to create these common data assets and store them to maximize their reuse to drive consistency everywhere.
+>[!TIP]
+>Data Lake Storage is shared storage that underpins Microsoft Azure Synapse, Azure Machine Learning, Azure Synapse Spark, and HDInsight.
-> [!TIP]
-> Integrating data to create lake database logical entities in shared storage enables maximum reuse of common data assets.
+Power BI, Azure Synapse Spark, Azure Synapse, and Azure Machine Learning can consume common data assets. The following diagram shows how a lake database can be used in Azure Synapse.
-Microsoft has done this by creating a [lake database](../../database-designer/concepts-lake-database.md). The lake database is a common language for business entities that represents commonly used concepts and activities across a business. Azure Synapse Analytics provides industry specific database templates to help standardize data in the lake. [Lake database templates](../../database-designer/concepts-database-templates.md) provide schemas for predefined business areas, enabling data to be loaded into a lake database in a structured way. The power comes when data integration software is used to create lake database common data assets. This results in self-describing trusted data that can be consumed by applications and analytical systems. Create a lake database in Azure Data Lake Storage by using Azure Data Factory, and consume it with Power BI, Azure Synapse Spark, Azure Synapse, and Azure Machine Learning. The following diagram shows a lake database used in Azure Synapse Analytics.
+>[!TIP]
+>Integrate data to create lake database logical entities in shared storage to maximize the reuse of common data assets.
## Integration with Microsoft data science technologies on Azure
-Another key requirement in modernizing your migrated data warehouse is to integrate it with Microsoft and third-party data science technologies on Azure to produce insights for competitive advantage. Let's look at what Microsoft offers in terms of machine learning and data science technologies and see how these can be used with Azure Synapse in a modern data warehouse environment.
+Another key objective when modernizing a data warehouse is to produce insights for competitive advantage. You can produce insights by integrating your migrated data warehouse with Microsoft and third-party data science technologies in Azure. The following sections describe the machine learning and data science technologies that Microsoft offers and how they can be used with Azure Synapse in a modern data warehouse environment.
### Microsoft technologies for data science on Azure
-> [!TIP]
-> Develop machine learning models using a no/low-code approach or from a range of programming languages like Python, R, and .NET.
-
-Microsoft offers a range of technologies to build predictive analytical models using machine learning, analyze unstructured data using deep learning, and perform other kinds of advanced analytics. This includes:
+Microsoft offers a range of technologies that support advanced analysis. With these technologies, you can build predictive analytical models using machine learning or analyze unstructured data using deep learning. The technologies include:
-- Azure Machine Learning Studio
+- Azure Machine Learning studio
- Azure Machine Learning
Microsoft offers a range of technologies to build predictive analytical models u
- .NET for Apache Spark
-Data scientists can use RStudio (R) and Jupyter Notebooks (Python) to develop analytical models, or they can use other frameworks such as Keras or TensorFlow.
+Data scientists can use RStudio (R) and Jupyter Notebooks (Python) to develop analytical models, or they can use frameworks such as Keras or TensorFlow.
-#### Azure Machine Learning Studio
+>[!TIP]
+>Develop machine learning models using a no/low-code approach or by using programming languages like Python, R, and .NET.
-Azure Machine Learning Studio is a fully managed cloud service that lets you easily build, deploy, and share predictive analytics via a drag-and-drop web-based user interface. The next screenshot shows an Azure Machine Learning Studio user interface.
+#### Azure Machine Learning studio
+Azure Machine Learning studio is a fully managed cloud service that lets you build, deploy, and share predictive analytics using a drag-and-drop, web-based UI. The following screenshot shows the Azure Machine Learning studio UI.
+ #### Azure Machine Learning
-> [!TIP]
-> Azure Machine Learning provides an SDK for developing machine learning models using several open-source frameworks.
+Azure Machine Learning provides an SDK and services for Python that can help you quickly prepare data, and then train and deploy machine learning models. You can use Azure Machine Learning in Azure notebooks using Jupyter Notebook, with open-source frameworks, such as PyTorch, TensorFlow, scikit-learn, or Spark MLlib, the machine learning library for Spark. Azure Machine Learning provides an AutoML capability that automatically tests multiple algorithms to identify the most accurate algorithm and expedite model development.
+
+>[!TIP]
+>Azure Machine Learning provides an SDK for developing machine learning models using several open-source frameworks.
-Azure Machine Learning provides a software development kit (SDK) and services for Python to quickly prepare data, as well as train and deploy machine learning models. Use Azure Machine Learning from Azure notebooks (a Jupyter Notebook service) and utilize open-source frameworks, such as PyTorch, TensorFlow, Spark MLlib (Azure Synapse Spark pool notebooks), or scikit-learn. Azure Machine Learning provides an AutoML capability that automatically identifies the most accurate algorithms to expedite model development. You can also use it to build machine learning pipelines that manage end-to-end workflow, programmatically scale on the cloud, and deploy models both to the cloud and the edge. Azure Machine Learning uses logical containers called workspaces, which can be either created manually from the Azure portal or created programmatically. These workspaces keep compute targets, experiments, data stores, trained machine learning models, Docker images, and deployed services all in one place to enable teams to work together. Use Azure Machine Learning from Visual Studio with a Visual Studio for AI extension.
+You can also use Azure Machine Learning to build machine learning pipelines that manage end-to-end workflow, programmatically scale in the cloud, and deploy models both to the cloud and the edge. Azure Machine Learning contains [workspaces](../../../machine-learning/concept-workspace.md), which are logical spaces that you can programmatically or manually create in the Azure portal. These workspaces keep compute targets, experiments, data stores, trained machine learning models, Docker images, and deployed services all in one place to enable teams to work together. You can use Azure Machine Learning in Visual Studio with the Visual Studio for AI extension.
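The AutoML idea mentioned above, automatically testing multiple algorithms and keeping the most accurate, can be sketched in plain Python. This is not the Azure Machine Learning AutoML API; the trivial threshold rules are hypothetical stand-ins for real learners.

```python
# Conceptual sketch of what an AutoML-style capability does: evaluate several
# candidate models on held-out data and keep the most accurate one.
def accuracy(model, data):
    """Fraction of (features, label) pairs the model predicts correctly."""
    return sum(model(x) == y for x, y in data) / len(data)

def auto_select(candidates, validation_data):
    """Evaluate every candidate and return (best_name, best_model, best_score)."""
    scored = [(name, model, accuracy(model, validation_data))
              for name, model in candidates.items()]
    return max(scored, key=lambda t: t[2])

# Candidate "algorithms": trivial threshold rules standing in for real learners.
candidates = {
    "rule_a": lambda x: x > 0,
    "rule_b": lambda x: x > 5,
    "always_true": lambda x: True,
}
validation = [(-3, False), (2, True), (7, True), (-1, False)]
best_name, best_model, best_score = auto_select(candidates, validation)
```

A real AutoML run also searches hyperparameters and featurization steps, but the selection principle, score every candidate and keep the winner, is the same.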
-> [!TIP]
-> Organize and manage related data stores, experiments, trained models, Docker images, and deployed services in workspaces.
+>[!TIP]
+>Organize and manage related data stores, experiments, trained models, Docker images, and deployed services in workspaces.
#### Azure Synapse Spark pool notebooks
-> [!TIP]
-> Azure Synapse Spark is Microsoft's dynamically scalable Spark-as-a-service, offering scalable execution of data preparation, model development, and deployed model execution.
+An [Azure Synapse Spark pool notebook](../../spark/apache-spark-development-using-notebooks.md) is an Azure-optimized Apache Spark service. With Azure Synapse Spark pool notebooks:
-[Azure Synapse Spark pool notebooks](../../spark/apache-spark-development-using-notebooks.md?msclkid=cbe4b8ebcff511eca068920ea4bf16b9) is an Apache Spark service optimized to run on Azure, which:
+- Data engineers can build and run scalable data preparation jobs using Data Factory.
-- Allows data engineers to build and execute scalable data preparation jobs using Azure Data Factory.
+- Data scientists can build and run machine learning models at scale using notebooks written in languages such as Scala, R, Python, Java, and SQL to visualize results.
-- Allows data scientists to build and execute machine learning models at scale using notebooks written in languages such as Scala, R, Python, Java, and SQL; and to visualize results.
+>[!TIP]
+>Azure Synapse Spark is Microsoft's dynamically scalable Spark as a service offering. It supports scalable execution of data preparation, model development, and deployed model execution.
-> [!TIP]
-> Azure Synapse Spark can access data in a range of Microsoft analytical ecosystem data stores on Azure.
+Jobs running in Azure Synapse Spark pool notebooks can retrieve, process, and analyze data at scale from Azure Blob Storage, Data Lake Storage, Azure Synapse, HDInsight, and streaming data services such as Apache Kafka.
-Jobs running in Azure Synapse Spark pool notebook can retrieve, process, and analyze data at scale from Azure Blob Storage, Azure Data Lake Storage, Azure Synapse, Azure HDInsight, and streaming data services such as Kafka.
+>[!TIP]
+>Azure Synapse Spark can access data in a range of Microsoft analytical ecosystem data stores on Azure.
-Autoscaling and auto-termination are also supported to reduce total cost of ownership (TCO). Data scientists can use the MLflow open-source framework to manage the machine learning lifecycle.
+Azure Synapse Spark pool notebooks support autoscaling and auto-termination to reduce total cost of ownership (TCO). Data scientists can use the MLflow open-source framework to manage the machine learning lifecycle.
#### ML.NET
-> [!TIP]
-> Microsoft has extended its machine learning capability to .NET developers.
+ML.NET is an open-source, cross-platform machine learning framework for Windows, Linux, and macOS. Microsoft created ML.NET so that .NET developers can use existing tools, such as ML.NET Model Builder for Visual Studio, to develop custom machine learning models and integrate them into their .NET applications.
-ML.NET is an open-source and cross-platform machine learning framework (Windows, Linux, macOS), created by Microsoft for .NET developers so that they can use existing tools&mdash;like ML.NET Model Builder for Visual Studio&mdash;to develop custom machine learning models and integrate them into .NET applications.
+>[!TIP]
+>Microsoft has extended its machine learning capability to .NET developers.
#### .NET for Apache Spark
-.NET for Apache Spark aims to make Spark accessible to .NET developers across all Spark APIs. It takes Spark support beyond R, Scala, Python, and Java to .NET. While initially only available on Apache Spark on HDInsight, Microsoft intends to make this available on Azure Synapse Spark pool notebook.
+.NET for Apache Spark extends Spark support beyond R, Scala, Python, and Java to .NET and aims to make Spark accessible to .NET developers across all Spark APIs. While .NET for Apache Spark is currently only available on Apache Spark in HDInsight, Microsoft intends to make .NET for Apache Spark available on Azure Synapse Spark pool notebooks.
### Use Azure Synapse Analytics with your data warehouse
-> [!TIP]
-> Train, test, evaluate, and execute machine learning models at scale on Azure Synapse Spark pool notebook by using data in Azure Synapse.
+To combine machine learning models with Azure Synapse, you can:
-Combine machine learning models with Azure Synapse by:
+- Use machine learning models in batch or in real-time on streaming data to produce new insights, and add those insights to what you already know in Azure Synapse.
-- Using machine learning models in batch mode or in real-time to produce new insights, and add them to what you already know in Azure Synapse.
+- Use the data in Azure Synapse to develop and train new predictive models for deployment elsewhere, such as in other applications.
-- Using the data in Azure Synapse to develop and train new predictive models for deployment elsewhere, such as in other applications.
+- Deploy machine learning models, including models trained elsewhere, in Azure Synapse to analyze data in your data warehouse and drive new business value.
-- Deploying machine learning models, including those trained elsewhere, in Azure Synapse to analyze data in the data warehouse and drive new business value.
+>[!TIP]
+>Train, test, evaluate, and run machine learning models at scale on Azure Synapse Spark pool notebooks by using data in Azure Synapse.
-> [!TIP]
-> Produce new insights using machine learning on Azure in batch or in real-time and add to what you know in your data warehouse.
+Data scientists can use RStudio, Jupyter Notebooks, and Azure Synapse Spark pool notebooks together with Azure Machine Learning to develop machine learning models that run at scale on Azure Synapse Spark pool notebooks using data in Azure Synapse. For example, data scientists could create an unsupervised model to segment customers to drive different marketing campaigns, or use supervised machine learning to train a model that predicts a specific outcome, such as a customer's propensity to churn or the next best offer to recommend to increase their value. The following diagram shows how Azure Synapse can be used with Azure Machine Learning.
-In terms of machine learning model development, data scientists can use RStudio, Jupyter Notebooks, and Azure Synapse Spark pool notebooks together with Azure Machine Learning to develop machine learning models that run at scale on Azure Synapse Spark pool notebooks using data in Azure Synapse. For example, they could create an unsupervised model to segment customers for use in driving different marketing campaigns. Use supervised machine learning to train a model to predict a specific outcome, such as predicting a customer's propensity to churn, or recommending the next best offer for a customer to try to increase their value. The next diagram shows how Azure Synapse Analytics can be leveraged for Azure Machine Learning.
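The unsupervised segmentation example can be sketched with a minimal one-dimensional k-means in plain Python. This is a conceptual illustration, assuming a single hypothetical feature (annual spend); a data scientist would build the real model at scale in an Azure Synapse Spark pool notebook.

```python
# Conceptual sketch of unsupervised customer segmentation: a minimal k-means
# on one feature (annual spend), standing in for a model built at scale.
def kmeans_1d(values, centroids, iterations=10):
    """Cluster 1-D values around the given initial centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            # Assign each value to its nearest centroid.
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Move each centroid to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

spend = [120, 135, 150, 900, 950, 1000]     # annual spend per customer
centroids, segments = kmeans_1d(spend, centroids=[100, 800])
```

The two resulting segments (low spenders and high spenders) could then drive different marketing campaigns, as described above.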
+In another scenario, you can ingest social network or review website data into Data Lake Storage, then prepare and analyze the data at scale on an Azure Synapse Spark pool notebook using natural language processing to score customer sentiment about your products or brand. You can then add those scores to your data warehouse. By using big data analytics to understand the effect of negative sentiment on product sales, you add to what you already know in your data warehouse.
-In addition, you can ingest big data&mdash;such as social network data or review website data&mdash;into Azure Data Lake, then prepare and analyze it at scale on Azure Synapse Spark pool notebook, using natural language processing to score sentiment about your products or your brand. Add these scores to your data warehouse to understand the impact of&mdash;for example&mdash;negative sentiment on product sales, and to leverage big data analytics to add to what you already know in your data warehouse.
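The sentiment-scoring step described above can be sketched with a tiny lexicon-based scorer. This is illustrative only, with hypothetical word lists; production natural language processing would run at scale on an Azure Synapse Spark pool notebook with far richer models.

```python
# Conceptual sketch of sentiment scoring: a tiny lexicon-based scorer standing
# in for NLP that would run at scale. The word lists are illustrative.
POSITIVE = {"great", "love", "excellent", "reliable"}
NEGATIVE = {"broken", "slow", "terrible", "refund"}

def sentiment(review):
    """Score a review from -1 (negative) to +1 (positive)."""
    words = [w.strip(".,!?") for w in review.lower().split()]
    hits = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return max(-1, min(1, hits))            # clamp to [-1, 1]

reviews = ["Great product, love it", "Terrible and slow, want a refund"]
scores = [sentiment(r) for r in reviews]
```

In the scenario above, scores like these would be added to the data warehouse and joined to product sales data to measure the effect of negative sentiment.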
+>[!TIP]
+>Produce new insights using machine learning on Azure in batch or in real-time and add to what you know in your data warehouse.
## Integrate live streaming data into Azure Synapse Analytics
-When analyzing data in a modern data warehouse, you must be able to analyze streaming data in real-time and join it with historical data in your data warehouse. An example of this would be combining IoT data with product or asset data.
+When analyzing data in a modern data warehouse, you must be able to analyze streaming data in real-time and join it with historical data in your data warehouse. An example is combining IoT data with product or asset data.
-> [!TIP]
-> Integrate your data warehouse with streaming data from IoT devices or clickstream.
+>[!TIP]
+>Integrate your data warehouse with streaming data from IoT devices or clickstreams.
-Once you've successfully migrated your data warehouse to Azure Synapse, you can introduce this capability as part of a data warehouse modernization exercise. Do this by taking advantage of additional functionality in Azure Synapse.
+Once you've successfully migrated your data warehouse to Azure Synapse, you can introduce live streaming data integration as part of a data warehouse modernization exercise by taking advantage of the extra functionality in Azure Synapse. To do so, ingest streaming data via Event Hubs, other technologies like Apache Kafka, or potentially your existing ETL tool if it supports the streaming data sources. Store the data in Data Lake Storage. Then, create an external table in Azure Synapse using PolyBase and point it at the data being streamed into Data Lake Storage, so that your data warehouse contains new tables that provide access to the real-time streaming data. You can query the external table as if the data were in the data warehouse by using standard T-SQL from any BI tool that has access to Azure Synapse. You can also join the streaming data to tables that contain historical data to create views, making the combined data easier for business users to access.
-> [!TIP]
-> Ingest streaming data into Azure Data Lake Storage from Azure Event Hubs or Kafka, and access it from Azure Synapse using PolyBase external tables.
+>[!TIP]
+>Ingest streaming data into Data Lake Storage from Event Hubs or Apache Kafka, and access the data from Azure Synapse using PolyBase external tables.
-To do this, ingest streaming data via Azure Event Hubs or other technologies, such as Kafka, using Azure Data Factory (or using an existing ETL tool if it supports the streaming data sources). Store the data in Azure Data Lake Storage (ADLS). Next, create an external table in Azure Synapse using PolyBase and point it at the data being streamed into Azure Data Lake. Your migrated data warehouse will now contain new tables that provide access to real-time streaming data. Query this external table as if the data was in the data warehouse via standard T-SQL from any BI tool that has access to Azure Synapse. You can also join this data to other tables containing historical data and create views that join live streaming data to historical data to make it easier for business users to access. In the following diagram, a real-time data warehouse on Azure Synapse Analytics is integrated with streaming data in ADLS.
+In the following diagram, a real-time data warehouse on Azure Synapse is integrated with streaming data in Data Lake Storage.
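The join between live streaming events and historical warehouse data can be sketched in plain Python. This is a conceptual illustration only, with hypothetical asset data; in Azure Synapse, the same result comes from joining a PolyBase external table to warehouse tables in T-SQL.

```python
# Conceptual sketch of the join described above: live streaming readings
# (as exposed through a PolyBase external table) enriched with historical
# reference data already in the warehouse.
historical_assets = {                     # warehouse dimension table
    "pump-01": {"site": "Plant A", "installed": 2019},
    "pump-02": {"site": "Plant B", "installed": 2021},
}

def join_stream(events, assets):
    """Enrich each streaming event with historical asset attributes."""
    return [
        {**event, **assets[event["asset_id"]]}
        for event in events
        if event["asset_id"] in assets
    ]

stream = [                                # events landing in Data Lake Storage
    {"asset_id": "pump-01", "temp_c": 71.5},
    {"asset_id": "pump-02", "temp_c": 64.2},
]
enriched = join_stream(stream, historical_assets)
```

A view built this way lets business users query live readings alongside asset history without knowing where each side of the join is stored.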
## Create a logical data warehouse using PolyBase
-> [!TIP]
-> PolyBase simplifies access to multiple underlying analytical data stores on Azure to simplify access for business users.
-
-PolyBase offers the capability to create a logical data warehouse to simplify user access to multiple analytical data stores.
+With PolyBase, you can create a logical data warehouse to simplify user access to multiple analytical data stores. Many companies have adopted "workload optimized" analytical data stores over the last several years in addition to their data warehouses. The analytical platforms on Azure include:
-This is attractive because many companies have adopted "workload optimized" analytical data stores over the last several years in addition to their data warehouses. Examples of these platforms on Azure include:
+- Data Lake Storage with Azure Synapse Spark pool notebook (Spark as a service), for big data analytics.
-- ADLS with Azure Synapse Spark pool notebook (Spark-as-a-service), for big data analytics.
-
-- Azure HDInsight (Hadoop as-a-service), also for big data analytics.
+- HDInsight (Hadoop as a service), also for big data analytics.
- NoSQL Graph databases for graph analysis, which could be done in Azure Cosmos DB.

-- Azure Event Hubs and Azure Stream Analytics, for real-time analysis of data in motion.
+- Event Hubs and Stream Analytics, for real-time analysis of data in motion.
+
+You might have non-Microsoft equivalents of these platforms, or a master data management (MDM) system that needs to be accessed for consistent trusted data on customers, suppliers, products, assets, and more.
-You may have non-Microsoft equivalents of some of these. You may also have a master data management (MDM) system that needs to be accessed for consistent trusted data on customers, suppliers, products, assets, and more.
+>[!TIP]
+>PolyBase simplifies access to multiple underlying analytical data stores on Azure for ease of access by business users.
-These additional analytical platforms have emerged because of the explosion of new data sources&mdash;both inside and outside the enterprises&mdash;that business users want to capture and analyze. Examples include:
+Those analytical platforms emerged because of the explosion of new data sources inside and outside the enterprise and the demand by business users to capture and analyze the new data. The new data sources include:
- Machine generated data, such as IoT sensor data and clickstream data.
These additional analytical platforms have emerged because of the explosion of n
- Other external data, such as open government data and weather data.
-This data is over and above the structured transaction data and master data sources that typically feed data warehouses. These new data sources include semi-structured data (like JSON, XML, or Avro) or unstructured data (like text, voice, image, or video), which is more complex to process and analyze. This data could be very high volume, high velocity, or both.
+This new data goes beyond the structured transaction data and main data sources that typically feed data warehouses and often includes:
+
+- Semi-structured data like JSON, XML, or Avro.
+- Unstructured data like text, voice, image, or video, which is more complex to process and analyze.
+- High volume data, high velocity data, or both.
-As a result, the need for new kinds of more complex analysis has emerged, such as natural language processing, graph analysis, deep learning, streaming analytics, or complex analysis of large volumes of structured data. All of this is typically not happening in a data warehouse, so it's not surprising to see different analytical platforms for different types of analytical workloads, as shown in the following diagram.
+As a result, new, more complex kinds of analysis have emerged, such as natural language processing, graph analysis, deep learning, streaming analytics, or complex analysis of large volumes of structured data. These kinds of analysis typically don't happen in a data warehouse, so it's not surprising to see different analytical platforms for different types of analytical workloads, as shown in the following diagram.
-Since these platforms are producing new insights, it's normal to see a requirement to combine these insights with what you already know in Azure Synapse. That's what PolyBase makes possible.
+>[!TIP]
+>The ability to make data in multiple analytical data stores look like it's all in one system and join it to Azure Synapse is known as a logical data warehouse architecture.
-> [!TIP]
-> The ability to make data in multiple analytical data stores look like it's all in one system and join it to Azure Synapse is known as a logical data warehouse architecture.
+Because these platforms produce new insights, it's normal to see a requirement to combine the new insights with what you already know in Azure Synapse, which is what PolyBase makes possible.
-By leveraging PolyBase data virtualization inside Azure Synapse, you can implement a logical data warehouse. Join data in Azure Synapse to data in other Azure and on-premises analytical data stores&mdash;like Azure HDInsight or Azure Cosmos DB&mdash;or to streaming data flowing into ADLS from Azure Stream Analytics and Event Hubs. Users access external tables in Azure Synapse, unaware that the data they're accessing is stored in multiple underlying analytical systems. The next diagram shows the complex data warehouse structure accessed through comparatively simpler but still powerful user interface methods.
+By using PolyBase data virtualization inside Azure Synapse, you can implement a logical data warehouse where data in Azure Synapse is joined to data in other Azure and on-premises analytical data stores like HDInsight, Azure Cosmos DB, or streaming data flowing into Data Lake Storage from Stream Analytics or Event Hubs. This approach lowers the complexity for users, who access external tables in Azure Synapse and don't need to know that the data they're accessing is stored in multiple underlying analytical systems. The following diagram shows a complex data warehouse structure accessed through comparatively simpler yet still powerful UI methods.
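As a sketch of the external-table pattern described above (all names and storage locations here are hypothetical, and a database scoped credential is assumed to already exist), a PolyBase external table over Parquet files in Data Lake Storage might look like:

```sql
-- Hypothetical names; assumes a database scoped credential LakeCredential exists.
CREATE EXTERNAL DATA SOURCE LakeSource
WITH (
    LOCATION = 'abfss://data@mydatalake.dfs.core.windows.net',
    CREDENTIAL = LakeCredential
);

CREATE EXTERNAL FILE FORMAT ParquetFormat
WITH (FORMAT_TYPE = PARQUET);

-- Users query this table like any other table in Azure Synapse,
-- but the data stays in the underlying analytical store.
CREATE EXTERNAL TABLE ext.StreamedEvents (
    EventId   BIGINT,
    EventTime DATETIME2,
    Payload   NVARCHAR(4000)
)
WITH (
    LOCATION = '/events/',
    DATA_SOURCE = LakeSource,
    FILE_FORMAT = ParquetFormat
);
```

Queries can then join `ext.StreamedEvents` to local warehouse tables, which is the essence of the logical data warehouse pattern.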
-The previous diagram shows how other technologies of the Microsoft analytical ecosystem can be combined with the capability of Azure Synapse logical data warehouse architecture. For example, data can be ingested into ADLS and curated using Azure Data Factory to create trusted data products that represent Microsoft [lake database](../../database-designer/concepts-lake-database.md) logical data entities. This trusted, commonly understood data can then be consumed and reused in different analytical environments such as Azure Synapse, Azure Synapse Spark pool notebooks, or Azure Cosmos DB. All insights produced in these environments are accessible via a logical data warehouse data virtualization layer made possible by PolyBase.
+The diagram shows how other technologies in the Microsoft analytical ecosystem can be combined with the capability of the logical data warehouse architecture in Azure Synapse. For example, you can ingest data into Data Lake Storage and curate the data using Data Factory to create trusted data products that represent Microsoft [lake database](../../database-designer/concepts-lake-database.md) logical data entities. This trusted, commonly understood data can then be consumed and reused in different analytical environments such as Azure Synapse, Azure Synapse Spark pool notebooks, or Azure Cosmos DB. All insights produced in these environments are accessible via a logical data warehouse data virtualization layer made possible by PolyBase.
-> [!TIP]
-> A logical data warehouse architecture simplifies business user access to data and adds new value to what you already know in your data warehouse.
+>[!TIP]
+>A logical data warehouse architecture simplifies business user access to data and adds new value to what you already know in your data warehouse.
## Conclusions
-> [!TIP]
-> Migrating your data warehouse to Azure Synapse lets you make use of a rich Microsoft analytical ecosystem running on Azure.
+After you migrate your data warehouse to Azure Synapse, you can take advantage of other technologies in the Microsoft analytical ecosystem. By doing so, you not only modernize your data warehouse, but bring insights produced in other Azure analytical data stores into an integrated analytical architecture.
-Once you migrate your data warehouse to Azure Synapse, you can leverage other technologies in the Microsoft analytical ecosystem. You don't only modernize your data warehouse, but combine insights produced in other Azure analytical data stores into an integrated analytical architecture.
+You can broaden your ETL processing to ingest data of any type into Data Lake Storage, and then prepare and integrate the data at scale using Data Factory to produce trusted, commonly understood data assets. Those assets can be consumed by your data warehouse and accessed by data scientists and other applications. You can build real-time and batch-oriented analytical pipelines and create machine learning models to run in batch, in real-time on streaming data, and on-demand as a service.
-Broaden your ETL processing to ingest data of any type into ADLS. Prepare and integrate it at scale using Azure Data Factory to produce trusted, commonly understood data assets that can be consumed by your data warehouse and accessed by data scientists and other applications. Build real-time and batch-oriented analytical pipelines and create machine learning models to run in batch, in real-time on streaming data, and on-demand as a service.
+You can use PolyBase or `COPY INTO` to go beyond your data warehouse to simplify access to insights from multiple underlying analytical platforms on Azure. To do so, create holistic integrated views in a logical data warehouse that support access to streaming, big data, and traditional data warehouse insights from BI tools and applications.
-Leverage PolyBase and `COPY INTO` to go beyond your data warehouse. Simplify access to insights from multiple underlying analytical platforms on Azure by creating holistic integrated views in a logical data warehouse. Easily access streaming, big data, and traditional data warehouse insights from BI tools and applications to drive new value in your business.
+By migrating your data warehouse to Azure Synapse, you can take advantage of the rich Microsoft analytical ecosystem running on Azure to drive new value in your business.
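The `COPY INTO` statement mentioned above can be sketched as follows (the storage account, path, and table name are hypothetical):

```sql
-- Hypothetical names; loads Parquet files from Data Lake Storage
-- into a warehouse table using the workspace managed identity.
COPY INTO dbo.StreamedEvents
FROM 'https://mydatalake.blob.core.windows.net/data/events/'
WITH (
    FILE_TYPE = 'PARQUET',
    CREDENTIAL = (IDENTITY = 'Managed Identity')
);
```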
## Next steps
-To learn more about migrating to a dedicated SQL pool, see [Migrate a data warehouse to a dedicated SQL pool in Azure Synapse Analytics](../migrate-to-synapse-analytics-guide.md).
+To learn about migrating to a dedicated SQL pool, see [Migrate a data warehouse to a dedicated SQL pool in Azure Synapse Analytics](../migrate-to-synapse-analytics-guide.md).
synapse-analytics 1 Design Performance Migration https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/synapse-analytics/migration-guides/teradata/1-design-performance-migration.md
Previously updated : 05/31/2022 Last updated : 07/12/2022 # Design and performance for Teradata migrations
-This article is part one of a seven part series that provides guidance on how to migrate from Teradata to Azure Synapse Analytics. This article provides best practices for design and performance.
+This article is part one of a seven-part series that provides guidance on how to migrate from Teradata to Azure Synapse Analytics. The focus of this article is best practices for design and performance.
## Overview
-Many existing users of Teradata data warehouse systems want to take advantage of the innovations provided by newer environments such as cloud, IaaS, and PaaS, and to delegate tasks like infrastructure maintenance and platform development to the cloud provider.
+Many existing users of Teradata data warehouse systems want to take advantage of the innovations provided by modern cloud environments. Infrastructure-as-a-service (IaaS) and platform-as-a-service (PaaS) cloud environments let you delegate tasks like infrastructure maintenance and platform development to the cloud provider.
-> [!TIP]
-> More than just a database&mdash;the Azure environment includes a comprehensive set of capabilities and tools.
+>[!TIP]
+>More than just a database&mdash;the Azure environment includes a comprehensive set of capabilities and tools.
-Although Teradata and Azure Synapse Analytics are both SQL databases designed to use massively parallel processing (MPP) techniques to achieve high query performance on exceptionally large data volumes, there are some basic differences in approach:
+Although Teradata and Azure Synapse Analytics are both SQL databases that use massively parallel processing (MPP) techniques to achieve high query performance on exceptionally large data volumes, there are some basic differences in approach:
- Legacy Teradata systems are often installed on-premises and use proprietary hardware, while Azure Synapse is cloud-based and uses Azure Storage and compute resources. -- Since storage and compute resources are separate in the Azure environment, these resources can be scaled upwards or downwards independently, leveraging the elastic scaling capability.
+- Because storage and compute resources are separate in the Azure environment and have elastic scaling capability, those resources can be scaled upwards or downwards independently.
-- Azure Synapse can be paused or resized as required to reduce resource utilization and cost.
+- You can pause or resize Azure Synapse as needed to reduce resource utilization and cost.
-- Upgrading a Teradata configuration is a major task involving additional physical hardware and potentially lengthy database reconfiguration or reload.
+- Upgrading a Teradata configuration is a major task involving extra physical hardware and potentially lengthy database reconfiguration or reload.
Microsoft Azure is a globally available, highly secure, scalable cloud environment that includes Azure Synapse and an ecosystem of supporting tools and capabilities. The next diagram summarizes the Azure Synapse ecosystem. :::image type="content" source="../media/1-design-performance-migration/azure-synapse-ecosystem.png" border="true" alt-text="Chart showing the Azure Synapse ecosystem of supporting tools and capabilities.":::
-> [!TIP]
-> Azure Synapse gives best-of-breed performance and price-performance in independent benchmarks.
-
-Azure Synapse provides best-of-breed relational database performance by using techniques such as massively parallel processing (MPP) and multiple levels of automated caching for frequently used data. See the results of this approach in independent benchmarks such as the one run recently by [GigaOm](https://research.gigaom.com/report/data-warehouse-cloud-benchmark/), which compares Azure Synapse to other popular cloud data warehouse offerings. Customers who have migrated to this environment have seen many benefits including:
+Azure Synapse provides best-of-breed relational database performance by using techniques such as MPP and multiple levels of automated caching for frequently used data. You can see the results of these techniques in independent benchmarks such as the one run recently by [GigaOm](https://research.gigaom.com/report/data-warehouse-cloud-benchmark/), which compares Azure Synapse to other popular cloud data warehouse offerings. Customers who migrate to the Azure Synapse environment see many benefits, including:
- Improved performance and price/performance.
Azure Synapse provides best-of-breed relational database performance by using te
- Lower overall TCO, better cost control, and streamlined operational expenditure (OPEX).
-To maximize these benefits, migrate new or existing data and applications to the Azure Synapse platform. In many organizations, this will include migrating an existing data warehouse from legacy on-premises platforms such as Teradata. At a high level, the basic process includes these steps:
--
-This paper looks at schema migration with a goal of equivalent or better performance of your migrated Teradata data warehouse and data marts on Azure Synapse. This paper applies specifically to migrations from an existing Teradata environment.
+To maximize these benefits, migrate new or existing data and applications to the Azure Synapse platform. In many organizations, migration includes moving an existing data warehouse from a legacy on-premises platform, such as Teradata, to Azure Synapse. At a high level, the migration process includes these steps:
+
+ :::column span="":::
+ &#160;&#160;&#160; **Preparation** &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &#129094;
+
+ - Define scope&mdash;what is to be migrated.
+
+ - Build inventory of data and processes for migration.
+
+ - Define data model changes (if any).
+
+ - Define source data extract mechanism.
+
+ - Identify the appropriate Azure and third-party tools and features to be used.
+
+ - Train staff early on the new platform.
+
+ - Set up the Azure target platform.
+
+ :::column-end:::
+ :::column span="":::
+ &#160;&#160;&#160; **Migration** &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &#129094;
+
+ - Start small and simple.
+
+ - Automate wherever possible.
+
+ - Leverage Azure built-in tools and features to reduce migration effort.
+
+ - Migrate metadata for tables and views.
+
+ - Migrate historical data to be maintained.
+
+ - Migrate or refactor stored procedures and business processes.
+
+ - Migrate or refactor ETL/ELT incremental load processes.
+
+ :::column-end:::
+ :::column span="":::
+ &#160;&#160;&#160; **Post migration**
+
+ - Monitor and document all stages of the process.
+
+ - Use the experience gained to build a template for future migrations.
+
+ - Re-engineer the data model if required (using new platform performance and scalability).
+
+ - Test applications and query tools.
+
+ - Benchmark and optimize query performance.
+
+ :::column-end:::
+
+This article provides general information and guidelines for performance optimization when migrating a data warehouse from an existing Teradata environment to Azure Synapse. The goal of performance optimization is to achieve the same or better data warehouse performance in Azure Synapse after schema migration.
## Design considerations ### Migration scope
-> [!TIP]
-> Create an inventory of objects to be migrated and document the migration process.
-
-#### Preparation for migration
-
-When migrating from a Teradata environment, there are some specific topics to consider in addition to the more general subjects described in this article.
+When you're preparing to migrate from a Teradata environment, consider the following migration choices.
#### Choose the workload for the initial migration
-Legacy Teradata environments have typically evolved over time to encompass multiple subject areas and mixed workloads. When deciding where to start on an initial migration project, choose an area that can:
+Typically, legacy Teradata environments have evolved over time to encompass multiple subject areas and mixed workloads. When you're deciding where to start on a migration project, choose an area where you'll be able to:
- Prove the viability of migrating to Azure Synapse by quickly delivering the benefits of the new environment. -- Allow the in-house technical staff to gain relevant experience of the processes and tools involved, which can be used in migrations to other areas.
+- Allow your in-house technical staff to gain relevant experience with the processes and tools that they'll use when they migrate other areas.
+
+- Create a template for further migrations that's specific to the source Teradata environment and the current tools and processes that are already in place.
-- Create a template for further migrations specific to the source Teradata environment and the current tools and processes that are already in place.
+A good candidate for an initial migration from a Teradata environment supports the preceding items, and:
-A good candidate for an initial migration from the Teradata environment that would enable the preceding items is typically one that implements a BI/Analytics workload, rather than an online transaction processing (OLTP) workload, with a data model that can be migrated with minimal modification, normally a star or snowflake schema.
+- Implements a BI/Analytics workload rather than an online transaction processing (OLTP) workload.
-The migration data volume for the initial exercise should be large enough to demonstrate the capabilities and benefits of the Azure Synapse environment while quickly demonstrating the value&mdash;typically in the 1-10 TB range.
+- Has a data model, such as a star or snowflake schema, that can be migrated with minimal modification.
-To minimize the risk and reduce implementation time for the initial migration project, confine the scope of the migration to just the data marts, such as the OLAP DB part of a Teradata warehouse. However, this won't address the broader topics such as ETL migration and historical data migration. Address these topics in later phases of the project, once the migrated data mart layer is backfilled with the data and processes required to build them.
+>[!TIP]
+>Create an inventory of objects that need to be migrated, and document the migration process.
-#### Lift and shift as-is versus a phased approach incorporating changes
+The volume of migrated data in an initial migration should be large enough to demonstrate the capabilities and benefits of the Azure Synapse environment, but small enough to demonstrate value quickly. A size in the 1-10 terabyte range is typical.
-> [!TIP]
-> "Lift and shift" is a good starting point, even if subsequent phases will implement changes to the data model.
+To minimize the risk, effort, and migration time for your initial migration project, and to quickly see the benefits of the Azure cloud environment, confine the scope of the migration to just the data marts, such as the OLAP DB part of a Teradata warehouse. Both the lift-and-shift and phased migration approaches limit the scope of the initial migration to just the data marts and don't address broader migration aspects, such as ETL migration and historical data migration. However, you can address those aspects in later phases of the project once the migrated data mart layer is backfilled with data and the required build processes.
-Whatever the drive and scope of the intended migration, there are&mdash;broadly speaking&mdash;two types of migration:
+<a id="lift-and-shift-as-is-versus-a-phased-approach-incorporating-changes"></a>
+#### Lift and shift migration vs. phased approach
+
+In general, there are two types of migration regardless of the purpose and scope of the planned migration: lift and shift as-is and a phased approach that incorporates changes.
##### Lift and shift
-In this case, the existing data model&mdash;such as a star schema&mdash;is migrated unchanged to the new Azure Synapse platform. The emphasis is on minimizing risk and the migration time required by reducing the work needed to realize the benefits of moving to the Azure cloud environment.
+In a lift and shift migration, an existing data model, like a star schema, is migrated unchanged to the new Azure Synapse platform. This approach minimizes risk and migration time by reducing the work needed to realize the benefits of moving to the Azure cloud environment. Lift and shift migration is a good fit for these scenarios:
-This is a good fit for existing Teradata environments where a single data mart is being migrated, or where the data is already in a well-designed star or snowflake schema&mdash;or there are other pressures to move to a more modern cloud environment.
+- You have an existing Teradata environment with a single data mart to migrate, or
+- You have an existing Teradata environment with data that's already in a well-designed star or snowflake schema, or
+- You're under time and cost pressures to move to a modern cloud environment.
-##### Phased approach incorporating modifications
+>[!TIP]
+>Lift and shift is a good starting point, even if subsequent phases implement changes to the data model.
-In cases where a legacy warehouse has evolved over a long time, you might need to re-engineer to maintain the required performance levels or to support new data, such as Internet of Things (IoT) streams. Migrate to Azure Synapse to get the benefits of a scalable cloud environment as part of the re-engineering process. Migration could include a change in the underlying data model, such as a move from an Inmon model to a data vault.
+<a id="phased-approach-incorporating-modifications"></a>
+##### Phased approach that incorporates changes
-Microsoft recommends moving the existing data model as-is to Azure (optionally using a VM Teradata instance in Azure) and using the performance and flexibility of the Azure environment to apply the re-engineering changes, leveraging Azure's capabilities to make the changes without impacting the existing source system.
+If a legacy data warehouse has evolved over a long period of time, you might need to re-engineer it to maintain the required performance levels. You might also have to re-engineer to support new data like Internet of Things (IoT) streams. As part of the re-engineering process, migrate to Azure Synapse to get the benefits of a scalable cloud environment. Migration can also include a change in the underlying data model, such as a move from an Inmon model to a data vault.
-#### Use an Azure VM Teradata instance as part of a migration
+Microsoft recommends moving your existing data model as-is to Azure (optionally using a VM Teradata instance in Azure) and using the performance and flexibility of the Azure environment to apply the re-engineering changes. That way, you can use Azure's capabilities to make the changes without impacting the existing source system.
-> [!TIP]
-> Use Azure VMs to create a temporary Teradata instance to speed up migration and minimize impact on the source system.
+#### Use an Azure VM Teradata instance as part of a migration
-When migrating from an on-premises Teradata environment, you can leverage the Azure environment. Azure provides cheap cloud storage and elastic scalability to create a Teradata instance within a VM in Azure, collocating with the target Azure Synapse environment.
+When migrating from an on-premises Teradata environment, you can leverage cloud storage and elastic scalability in Azure to create a Teradata instance within a VM. This approach collocates the Teradata instance with the target Azure Synapse environment. You can use standard Teradata utilities, such as Teradata Parallel Data Transporter, to efficiently move the subset of Teradata tables being migrated onto the VM instance. Then, all further migration tasks can occur within the Azure environment. This approach has several benefits:
-With this approach, standard Teradata utilities such as Teradata Parallel Data Transporter can efficiently move the subset of Teradata tables being migrated onto the VM instance. Then, all migration tasks can take place within the Azure environment. This approach has several benefits:
+- After the initial replication of data, the source system isn't affected by the migration tasks.
-- After the initial replication of data, the source system isn't impacted by the migration tasks.
+- Familiar Teradata interfaces, tools, and utilities are available within the Azure environment.
-- The familiar Teradata interfaces, tools, and utilities are available within the Azure environment.
+- The Azure environment sidesteps any potential issues with network bandwidth availability between the on-premises source system and the cloud target system.
-- Once in the Azure environment, there are no potential issues with network bandwidth availability between the on-premises source system and the cloud target system.
+- Tools like Azure Data Factory can call utilities like Teradata Parallel Transporter to efficiently and rapidly migrate data.
-- Tools like Azure Data Factory can efficiently call utilities like Teradata Parallel Transporter to migrate data quickly and easily.
+- You can orchestrate and control the migration process entirely within the Azure environment.
-- The migration process is orchestrated and controlled entirely within the Azure environment, keeping everything in a single place.
+>[!TIP]
+>Use Azure VMs to create a temporary Teradata instance to speed up migration and minimize impact on the source system.
#### Use Azure Data Factory to implement a metadata-driven migration
-Automate and orchestrate the migration process by using the capabilities of the Azure environment. This approach minimizes the impact on the existing Teradata environment, which may already be running close to full capacity.
+You can automate and orchestrate the migration process by using the capabilities of the Azure environment. This approach minimizes the performance hit on the existing Teradata environment, which may already be running close to capacity.
-Azure Data Factory is a cloud-based data integration service that allows creation of data-driven workflows in the cloud for orchestrating and automating data movement and data transformation. Using Data Factory, you can create and schedule data-driven workflows&mdash;called pipelines&mdash;to ingest data from disparate data stores. Data Factory can process and transform data by using compute services such as Azure HDInsight Hadoop, Spark, Azure Data Lake Analytics, and Azure Machine Learning.
+[Azure Data Factory](../../../data-factory/introduction.md) is a cloud-based data integration service that supports creating data-driven workflows in the cloud that orchestrate and automate data movement and data transformation. You can use Data Factory to create and schedule data-driven workflows (pipelines) that ingest data from disparate data stores. Data Factory can process and transform data by using compute services such as [Azure HDInsight Hadoop](/azure/hdinsight/hadoop/apache-hadoop-introduction), Spark, Azure Data Lake Analytics, and Azure Machine Learning.
-By creating metadata to list the data tables to be migrated and their location, you can use the Data Factory facilities to manage the migration process.
+When you're planning to use Data Factory facilities to manage the migration process, create metadata that lists all the data tables to be migrated and their location.
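A minimal sketch of such a metadata store follows (the table and column names are hypothetical). A Data Factory pipeline could read this control table with a Lookup activity and iterate over the rows with a ForEach activity:

```sql
-- Hypothetical control table driving a metadata-driven migration pipeline.
CREATE TABLE etl.MigrationControl (
    SourceDatabase NVARCHAR(128) NOT NULL,  -- Teradata database name
    SourceTable    NVARCHAR(128) NOT NULL,  -- Teradata table name
    TargetSchema   NVARCHAR(128) NOT NULL,  -- schema in Azure Synapse
    TargetTable    NVARCHAR(128) NOT NULL,
    MigrationState NVARCHAR(20)  NOT NULL   -- e.g. 'Pending', 'InProgress', 'Done'
);
```

Updating `MigrationState` as each table completes gives the pipeline restartability and a simple audit trail.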
### Design differences between Teradata and Azure Synapse
-#### Multiple databases versus a single database and schemas
+As mentioned earlier, there are some basic differences in approach between Teradata and Azure Synapse Analytics databases. Those differences are discussed next.
-> [!TIP]
-> Combine multiple databases into a single database in Azure Synapse and use schemas to logically separate the tables.
+<a id="multiple-databases-versus-a-single-database-and-schemas"></a>
+#### Multiple databases vs. a single database and schemas
-In a Teradata environment, there are often multiple separate databases for individual parts of the overall environment. For example, there may be a separate database for data ingestion and staging tables, a database for the core warehouse tables, and another database for data marts, sometimes called a semantic layer. Processing these as ETL/ELT pipelines may implement cross-database joins and will move data between these separate databases.
+The Teradata environment often contains multiple separate databases. For instance, there could be separate databases for data ingestion and staging tables, core warehouse tables, and data marts (sometimes referred to as the semantic layer). ETL or ELT pipeline processes might implement cross-database joins and move data between the separate databases.
-Querying within the Azure Synapse environment is limited to a single database. Schemas are used to separate the tables into logically separate groups. Therefore, we recommend using a series of schemas within the target Azure Synapse database to mimic any separate databases migrated from the Teradata environment. If the Teradata environment already uses schemas, you may need to use a new naming convention to move the existing Teradata tables and views to the new environment&mdash;for example, concatenate the existing Teradata schema and table names into the new Azure Synapse table name and use schema names in the new environment to maintain the original separate database names. Schema consolidation naming can have dots&mdash;however, Azure Synapse Spark may have issues. You can use SQL views over the underlying tables to maintain the logical structures, but there are some potential downsides to this approach:
+In contrast, the Azure Synapse environment contains a single database and uses schemas to separate tables into logically separate groups. We recommend that you use a series of schemas within the target Azure Synapse database to mimic the separate databases migrated from the Teradata environment. If the Teradata environment already uses schemas, you may need to use a new naming convention when you move the existing Teradata tables and views to the new environment. For example, you could concatenate the existing Teradata schema and table names into the new Azure Synapse table name, and use schema names in the new environment to maintain the original separate database names. If schema consolidation naming has dots, Azure Synapse Spark might have issues. Although you can use SQL views on top of the underlying tables to maintain the logical structures, there are potential downsides to that approach:
- Views in Azure Synapse are read-only, so any updates to the data must take place on the underlying base tables. -- There may already be one or more layers of views in existence, and adding an extra layer of views might impact performance and supportability as nested views are difficult to troubleshoot.
+- There may already be one or more layers of views in existence and adding an extra layer of views could affect performance and supportability because nested views are difficult to troubleshoot.
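The schema-per-original-database convention described above can be sketched as follows (the database, schema, and table names are hypothetical):

```sql
-- Hypothetical example: the separate Teradata databases STG and CORE
-- become schemas within the single Azure Synapse database.
CREATE SCHEMA stg;
GO
CREATE SCHEMA core;
GO

-- The Teradata table CORE.Sales maps to core.Sales in Azure Synapse.
CREATE TABLE core.Sales (
    SaleId   BIGINT NOT NULL,
    SaleDate DATE,
    Amount   DECIMAL(18,2)
)
WITH (DISTRIBUTION = HASH(SaleId), CLUSTERED COLUMNSTORE INDEX);
```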
-#### Table considerations
+>[!TIP]
+>Combine multiple databases into a single database within Azure Synapse and use schema names to logically separate the tables.
-> [!TIP]
-> Use existing indexes to indicate candidates for indexing in the migrated warehouse.
+#### Table considerations
-When migrating tables between different technologies, only the raw data and the metadata that describes it gets physically moved between the two environments. Other database elements from the source system&mdash;such as indexes&mdash;aren't migrated as these may not be needed or may be implemented differently within the new target environment.
+When you migrate tables between different environments, typically only the raw data and the metadata that describes it physically migrate. Other database elements from the source system, such as indexes, usually aren't migrated because they might be unnecessary or implemented differently in the new environment.
+Performance optimizations in the source environment, such as indexes, indicate where you might add performance optimization in the new environment. For example, if a table within the source Teradata environment has a non-unique secondary index (NUSI), that suggests that a non-clustered index should be created within Azure Synapse. Other native performance optimization techniques like table replication may be more applicable than straight like-for-like index creation.
-However, it's important to understand where performance optimizations such as indexes have been used in the source environment, as this can indicate where to add performance optimization in the new target environment. For example, if a non-unique secondary index (NUSI) has been created within the source Teradata environment, it may indicate that a non-clustered index should be created within the migrated Azure Synapse. Other native performance optimization techniques, such as table replication, may be more applicable than a straight "like-for-like" index creation.
+>[!TIP]
+>Existing indexes indicate candidates for indexing in the migrated warehouse.
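For illustration, the mapping from a Teradata NUSI to Azure Synapse optimizations might be sketched as follows (table, column, and index names are hypothetical):

```sql
-- A Teradata NUSI on Customer.Region might become a nonclustered index:
CREATE INDEX IX_Customer_Region ON dbo.Customer (Region);

-- Alternatively, replicating a small dimension table across compute nodes
-- can remove the need for a like-for-like index altogether:
CREATE TABLE dbo.Region (
    RegionId   INT NOT NULL,
    RegionName NVARCHAR(50)
)
WITH (DISTRIBUTION = REPLICATE, CLUSTERED INDEX (RegionId));
```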
#### High availability for the database
-Teradata supports data replication across nodes via the `FALLBACK` option, where table rows that reside physically on a given node are replicated to another node within the system. This approach guarantees that data won't be lost if there's a node failure and provides the basis for failover scenarios.
+Teradata supports data replication across nodes via the `FALLBACK` option, which replicates table rows that reside physically on a given node to another node within the system. This approach guarantees that data won't be lost if there's a node failure and provides the basis for failover scenarios.
-The goal of the high availability architecture in Azure SQL Database is to guarantee that your database is up and running 99.9% of the time, without worrying about the impact of maintenance operations and outages. Azure automatically handles critical servicing tasks such as patching, backups, and Windows and SQL upgrades, as well as unplanned events such as underlying hardware, software, or network failures.
+The goal of the high availability architecture in Azure Synapse Analytics is to guarantee that your database is up and running 99.9% of the time, without worrying about the impact of maintenance operations and outages. For more information on the SLA, see [SLA for Azure Synapse Analytics](https://azure.microsoft.com/support/legal/sla/synapse-analytics/v1_1/). Azure automatically handles critical servicing tasks such as patching, backups, and Windows and SQL upgrades. Azure also automatically handles unplanned events such as failures in the underlying hardware, software, or network.
-Data storage in Azure Synapse is automatically [backed up](../../sql-data-warehouse/backup-and-restore.md) with snapshots. These snapshots are a built-in feature of the service that creates restore points. You don't have to enable this capability. Users can't currently delete automatic restore points where the service uses these restore points to maintain SLAs for recovery.
+Data storage in Azure Synapse is automatically [backed up](../../sql-data-warehouse/backup-and-restore.md) with snapshots. These snapshots are a built-in feature of the service that creates restore points. You don't have to enable this capability. Users can't currently delete automatic restore points that the service uses to maintain service level agreements (SLAs) for recovery.
-Azure Synapse Dedicated SQL pool takes snapshots of the data warehouse throughout the day creating restore points that are available for seven days. This retention period can't be changed. Azure Synapse supports an eight-hour recovery point objective (RPO). You can restore your data warehouse in the primary region from any one of the snapshots taken in the past seven days. If you require more granular backups, other user-defined options are available.
+Azure Synapse Dedicated SQL pool takes snapshots of the data warehouse throughout the day to create restore points that are available for seven days. This retention period can't be changed. Azure Synapse supports an eight-hour recovery point objective (RPO). You can restore your data warehouse in the primary region from any one of the snapshots taken in the past seven days. If you require more granular backups, you can use another user-defined option.
#### Unsupported Teradata table types
-> [!TIP]
-> Standard tables in Azure Synapse can support migrated Teradata time-series and temporal data.
+Teradata supports special table types for time-series and temporal data. The syntax and some of the functions for these table types aren't directly supported in Azure Synapse. However, you can migrate the data into a standard table in Azure Synapse by mapping to appropriate data types and indexing or partitioning the date/time column.
-Teradata supports special table types for time-series and temporal data. The syntax and some of the functions for these table types aren't directly supported in Azure Synapse, but the data can be migrated into a standard table with appropriate data types and indexing or partitioning on the date/time column.
+>[!TIP]
+>Standard tables in Azure Synapse can support migrated Teradata time-series and temporal data.
-Teradata implements the temporal query functionality via query rewriting to add additional filters within a temporal query to limit the applicable date range. If this functionality is currently used in the source Teradata environment and is to be migrated, add this additional filtering into the relevant temporal queries.
+Teradata implements temporal query functionality by using query rewriting to add additional filters within a temporal query to limit the applicable date range. If you plan to migrate this functionality from the source Teradata environment, then add the additional filtering into the relevant temporal queries.
-The Azure environment also includes specific features for complex analytics on time-series data at a scale called [time series insights](https://azure.microsoft.com/services/time-series-insights/). This is aimed at IoT data analysis applications and may be more appropriate for this use case.
+The Azure environment supports [time series insights](https://azure.microsoft.com/services/time-series-insights) for complex analytics on time-series data at scale. This functionality is aimed at IoT data analysis applications.
#### SQL DML syntax differences
-There are a few differences in SQL Data Manipulation Language (DML) syntax between Teradata SQL and Azure Synapse (T-SQL) that you should be aware of during migration:
+SQL Data Manipulation Language (DML) [syntax differences](5-minimize-sql-issues.md#sql-ddl-differences-between-teradata-and-azure-synapse) exist between Teradata SQL and Azure Synapse T-SQL:
- `QUALIFY`: Teradata supports the `QUALIFY` operator. For example:
) WHERE rn = 1;
```

-- Date arithmetic: Azure Synapse has operators such as `DATEADD` and `DATEDIFF` which can be used on `DATE` or `DATETIME` fields. Teradata supports direct subtraction on dates such as `SELECT DATE1 - DATE2 FROM...`
+- Date arithmetic: Azure Synapse has operators such as `DATEADD` and `DATEDIFF`, which can be used on `DATE` or `DATETIME` fields. Teradata supports direct subtraction on dates such as `SELECT DATE1 - DATE2 FROM...`
-- In `GROUP BY` ordinal, explicitly provide the T-SQL column name.
+- `GROUP BY`: for the `GROUP BY` ordinal, explicitly provide a T-SQL column name.
- `LIKE ANY`: Teradata supports `LIKE ANY` syntax such as:
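A minimal sketch of these DML rewrites follows; the `Sales.Orders` table and its columns are hypothetical, and the Teradata originals are shown as comments:

```sql
-- Teradata forms (shown as comments):
--   SELECT OrderDate - ShipDate FROM Sales.Orders;
--   SELECT * FROM Sales.Orders WHERE Region LIKE ANY ('North%', 'South%');

-- Equivalent Azure Synapse T-SQL:
SELECT DATEDIFF(day, ShipDate, OrderDate) AS DaysToShip
FROM Sales.Orders;

SELECT *
FROM Sales.Orders
WHERE Region LIKE 'North%' OR Region LIKE 'South%';  -- expand LIKE ANY into OR'd predicates
```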
#### Functions, stored procedures, triggers, and sequences
-> [!TIP]
-> Assess the number and type of non-data objects to be migrated as part of the preparation phase.
+When migrating a data warehouse from a mature environment like Teradata, you probably need to migrate elements other than simple tables and views. Examples include functions, stored procedures, triggers, and sequences. Check whether tools within the Azure environment can replace the functionality of functions, stored procedures, and sequences because it's usually more efficient to use built-in Azure tools than to recode those elements for Azure Synapse.
-When migrating from a mature legacy data warehouse environment such as Teradata, you must often migrate elements other than simple tables and views to the new target environment. Examples include functions, stored procedures, triggers, and sequences.
+As part of your preparation phase, create an inventory of objects that need to be migrated, define a method for handling them, and allocate appropriate resources in your migration plan.
-As part of the preparation phase, create an inventory of these objects to be migrated, and define the method of handling them. Assign an appropriate allocation of resources in the project plan.
+[Data integration partners](../../partner/data-integration.md) offer tools and services that can automate the migration of functions, stored procedures, and sequences.
-There may be facilities in the Azure environment that replace the functionality implemented as functions or stored procedures in the Teradata environment. In this case, it's more efficient to use the built-in Azure facilities rather than recoding the Teradata functions.
-
-[Data integration partners](../../partner/data-integration.md) offer tools and services that can automate the migration.
+The following sections further discuss the migration of functions, stored procedures, and sequences.
##### Functions
-As with most database products, Teradata supports system functions and user-defined functions within an SQL implementation. When migrating to another database platform such as Azure Synapse, common system functions are available and can be migrated without change. Some system functions may have slightly different syntax, but the required changes can be automated if so.
+As with most database products, Teradata supports system and user-defined functions within a SQL implementation. When you migrate a legacy database platform to Azure Synapse, common system functions can usually be migrated without change. Some system functions might have a slightly different syntax, but any required changes can be automated.
-For system functions where there's no equivalent, or for arbitrary user-defined functions, recode these using the language(s) available in the target environment. Azure Synapse uses the popular Transact-SQL language to implement user-defined functions.
+For Teradata system functions or arbitrary user-defined functions that have no equivalent in Azure Synapse, recode those functions using a target environment language. Azure Synapse uses the Transact-SQL language to implement user-defined functions.
##### Stored procedures
-Most modern database products allow for procedures to be stored within the database. Teradata provides the SPL language for this purpose.
-
-A stored procedure typically contains SQL statements and some procedural logic, and may return data or a status.
+Most modern database products support storing procedures within the database. Teradata provides the SPL language for this purpose. A stored procedure typically contains both SQL statements and procedural logic, and returns data or a status.
-Azure Synapse Analytics also supports stored procedures using T-SQL. If you must migrate stored procedures, recode these procedures for their new environment.
+Azure Synapse supports stored procedures using T-SQL, so you need to recode any migrated stored procedures in that language.
##### Triggers
-Azure Synapse doesn't support trigger creation, but trigger creation can be implemented with Azure Data Factory.
+Azure Synapse doesn't support trigger creation, but trigger creation can be implemented using Azure Data Factory.
##### Sequences
-With Azure Synapse, sequences are handled in a similar way to Teradata. Use [IDENTITY](/sql/t-sql/statements/create-table-transact-sql-identity-property?msclkid=8ab663accfd311ec87a587f5923eaa7b) columns or SQL code to create the next sequence number in a series.
+Azure Synapse handles sequences in a similar way to Teradata, and you can implement sequences using [IDENTITY](/sql/t-sql/statements/create-table-transact-sql-identity-property) columns or SQL code that generates the next sequence number in a series. A sequence provides unique numeric values that you can use as surrogate key values for primary keys.
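As a sketch of the `IDENTITY` approach, a surrogate key column can replace a Teradata-generated sequence; the table and column names here are hypothetical:

```sql
-- Hypothetical example: an IDENTITY column as a surrogate key in Azure Synapse,
-- replacing a Teradata sequence.
CREATE TABLE dbo.DimProduct
(
    ProductKey  INT IDENTITY(1,1) NOT NULL,  -- surrogate key generated by the service
    ProductId   VARCHAR(20) NOT NULL,
    ProductName VARCHAR(100)
)
WITH
(
    DISTRIBUTION = REPLICATE,
    CLUSTERED COLUMNSTORE INDEX
);
```

Note that `IDENTITY` values in a dedicated SQL pool are unique but aren't guaranteed to be contiguous, which is usually acceptable for surrogate keys.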
### Extract metadata and data from a Teradata environment

#### Data Definition Language (DDL) generation
-> [!TIP]
-> Use existing Teradata metadata to automate the generation of CREATE TABLE and CREATE VIEW DDL for Azure Synapse Analytics.
+The ANSI SQL standard defines the basic syntax for Data Definition Language (DDL) commands. Some DDL commands, such as `CREATE TABLE` and `CREATE VIEW`, are common to both Teradata and Azure Synapse but also provide implementation-specific features such as indexing, table distribution, and partitioning options.
-You can edit existing Teradata `CREATE TABLE` and `CREATE VIEW` scripts to create the equivalent definitions with modified data types, if necessary, as described in the previous section. Typically, this involves removing extra Teradata-specific clauses such as `FALLBACK`.
+You can edit existing Teradata `CREATE TABLE` and `CREATE VIEW` scripts to achieve equivalent definitions in Azure Synapse. To do so, you might need to use [modified data types](5-minimize-sql-issues.md#unsupported-teradata-data-types) and remove or modify Teradata-specific clauses such as `FALLBACK`.
However, all the information that specifies the current definitions of tables and views within the existing Teradata environment is maintained within system catalog tables. These tables are the best source of this information, as it's guaranteed to be up to date and complete. User-maintained documentation may not be in sync with the current table definitions.
-Access the information in these tables via views into the catalog such as `DBC.ColumnsV`, and generate the equivalent `CREATE TABLE` DDL statements for the equivalent tables in Azure Synapse.
+Within the Teradata environment, system catalog tables specify the current table and view definition. Unlike user-maintained documentation, system catalog information is always complete and in sync with current table definitions. By using views into the catalog such as `DBC.ColumnsV`, you can access system catalog information to generate `CREATE TABLE` DDL statements that create equivalent tables in Azure Synapse.
+
+>[!TIP]
+>Use existing Teradata metadata to automate the generation of `CREATE TABLE` and `CREATE VIEW` DDL for Azure Synapse.
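For example, a catalog query along these lines returns the column metadata needed to generate equivalent DDL; the source database name is a placeholder:

```sql
-- Sketch: read column metadata from the Teradata data dictionary as input
-- for generating Azure Synapse CREATE TABLE statements.
SELECT DatabaseName,
       TableName,
       ColumnName,
       ColumnType,
       ColumnLength,
       DecimalTotalDigits,
       DecimalFractionalDigits,
       Nullable
FROM DBC.ColumnsV
WHERE DatabaseName = 'SalesDW'   -- assumed source database name
ORDER BY TableName, ColumnId;
```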
-Third-party migration and ETL tools also use the catalog information to achieve the same result.
+You can also use [third-party](../../partner/data-integration.md) migration and ETL tools that process system catalog information to achieve similar results.
#### Data extraction from Teradata
-> [!TIP]
-> Use Teradata Parallel Transporter for most efficient data extract.
+You can extract raw table data from Teradata tables to flat delimited files, such as CSV files, using standard Teradata utilities like Basic Teradata Query (BTEQ), Teradata FastExport, or Teradata Parallel Transporter (TPT). Use TPT to extract table data as efficiently as possible. TPT uses multiple parallel FastExport streams to achieve the highest throughput.
-Migrate the raw data from existing Teradata tables using standard Teradata utilities, such as BTEQ and FASTEXPORT. During a migration exercise, extract the data as efficiently as possible. Use Teradata Parallel Transporter, which uses multiple parallel FASTEXPORT streams to achieve the best throughput.
+>[!TIP]
+>Use Teradata Parallel Transporter for the most efficient data extract.
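As a minimal illustration of a scripted extract, a BTEQ session can export a table to a delimited file; TPT achieves the same result with multiple parallel streams. The connection details and object names here are placeholders:

```
.LOGON tdserver/migration_user,********
.EXPORT REPORT FILE = /extract/orders.csv
SELECT TRIM(OrderId) || ',' || TRIM(CustomerId) || ',' || CAST(OrderDate AS VARCHAR(10))
FROM SalesDW.Orders;
.EXPORT RESET
.LOGOFF
```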
-Call Teradata Parallel Transporter directly from Azure Data Factory. This is the recommended approach for managing the data migration process whether the Teradata instance in on-premises or copied to a VM in the Azure environment, as described in the previous section.
+Call TPT directly from Azure Data Factory. This is the recommended approach for data migration of Teradata on-premises instances and [Teradata instances that run within a VM](#use-an-azure-vm-teradata-instance-as-part-of-a-migration) in the Azure environment.
-Recommended data formats for the extracted data include delimited text files (also called Comma Separated Values or CSV), Optimized Row Columnar (ORC), or Parquet files.
+Extracted data files should contain delimited text in CSV, Optimized Row Columnar (ORC), or Parquet format.
-For more information about the process of migrating data and ETL from a Teradata environment, see [Data migration, ETL, and load for Teradata migrations](2-etl-load-migration-considerations.md).
+For more information about migrating data and ETL from a Teradata environment, see [Data migration, ETL, and load for Teradata migrations](2-etl-load-migration-considerations.md).
## Performance recommendations for Teradata migrations
-This article provides general information and guidelines about use of performance optimization techniques for Azure Synapse and adds specific recommendations for use when migrating from a Teradata environment.
+The goal of performance optimization is to achieve the same or better data warehouse performance after migration to Azure Synapse.
-### Differences in performance tuning approach
-> [!TIP]
-> Prioritize early familiarity with Azure Synapse tuning options in a migration exercise.
+>[!TIP]
+>Prioritize familiarity with the tuning options in Azure Synapse at the start of a migration.
+
+### Differences in performance tuning approach
-This section highlights lower-level implementation differences between Teradata and Azure Synapse for performance tuning.
+This section highlights low-level performance tuning implementation differences between Teradata and Azure Synapse.
#### Data distribution options
-Azure enables the specification of data distribution methods for individual tables. The aim is to reduce the amount of data that must be moved between processing nodes when executing a query.
+For performance, Azure Synapse was designed with a multi-node architecture and uses parallel processing. To optimize individual table performance in Azure Synapse, you can define a data distribution option in `CREATE TABLE` statements using the `DISTRIBUTION` option. For example, you can specify a hash-distributed table, which distributes table rows across compute nodes by using a deterministic hash function. The aim is to reduce the amount of data moved between processing nodes when executing a query.
-For large table-large table joins, hash distribute one or, ideally, both tables on one of the join columns&mdash;which has a wide range of values to help ensure an even distribution. Perform join processing locally, as the data rows to be joined will already be collocated on the same processing node.
+For large table to large table joins, hash distribute one or, ideally, both tables on one of the join columns&mdash;which has a wide range of values to help ensure an even distribution. Perform join processing locally because the data rows that will be joined are collocated on the same processing node.
-Another way to achieve local joins for small table-large table joins&mdash;typically dimension table to fact table in a star schema model&mdash;is to replicate the smaller dimension table across all nodes. This ensures that any value of the join key of the larger table will have a matching dimension row locally available. The overhead of replicating the dimension tables is relatively low, provided the tables aren't very large (see [Design guidance for replicated tables](../../sql-data-warehouse/design-guidance-for-replicated-tables.md))&mdash;in which case, the hash distribution approach as previously described is more appropriate. For more information, see [Distributed tables design](../../sql-data-warehouse/sql-data-warehouse-tables-distribute.md).
+Azure Synapse also supports local joins between a small table and a large table through small table replication. For instance, consider a small dimension table and a large fact table within a star schema model. Azure Synapse can replicate the smaller dimension table across all nodes to ensure that the value of any join key for the large table has a matching, locally available dimension row. The overhead of dimension table replication is relatively low for a small dimension table. For large dimension tables, a hash distribution approach is more appropriate. For more information on data distribution options, see [Design guidance for using replicated tables](../../sql-data-warehouse/design-guidance-for-replicated-tables.md) and [Guidance for designing distributed tables](../../sql-data-warehouse/sql-data-warehouse-tables-distribute.md).
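A sketch of these two distribution choices in DDL, using hypothetical table and column names:

```sql
-- Hypothetical example: a hash-distributed fact table and a replicated
-- dimension table, so joins on CustomerKey are processed locally.
CREATE TABLE dbo.FactSales
(
    SaleKey     BIGINT NOT NULL,
    CustomerKey INT NOT NULL,
    SaleAmount  DECIMAL(18,2)
)
WITH
(
    DISTRIBUTION = HASH(CustomerKey),
    CLUSTERED COLUMNSTORE INDEX
);

CREATE TABLE dbo.DimCustomer
(
    CustomerKey  INT NOT NULL,
    CustomerName VARCHAR(100)
)
WITH
(
    DISTRIBUTION = REPLICATE
);
```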
#### Data indexing
-Azure Synapse provides several indexing options, but these are different from the indexing options implemented in Teradata. For more information about the different indexing options, see [table indexes](/azure/sql-data-warehouse/sql-data-warehouse-tables-index).
+Azure Synapse supports several user-definable indexing options that are different from the indexing options implemented in Teradata. For more information about the different indexing options in Azure Synapse, see [Indexes on dedicated SQL pool tables](../../sql-data-warehouse/sql-data-warehouse-tables-index.md).
-Existing indexes within the source Teradata environment can however provide a useful indication of how the data is currently used. They can identify candidates for indexing within the Azure Synapse environment.
+Existing indexes within the source Teradata environment provide a useful indication of data usage and the candidate columns for indexing in the Azure Synapse environment.
#### Data partitioning
-In an enterprise data warehouse, fact tables can contain many billions of rows. Partitioning optimizes the maintenance and querying of these tables by splitting them into separate parts to reduce the amount of data processed. The `CREATE TABLE` statement defines the partitioning specification for a table. Partitioning should only be done on very large tables where each partition will contain at least 60 million rows.
+In an enterprise data warehouse, fact tables can contain billions of rows. Partitioning optimizes the maintenance and query performance of these tables by splitting them into separate parts to reduce the amount of data processed. In Azure Synapse, the `CREATE TABLE` statement defines the partitioning specification for a table. Only partition very large tables and ensure each partition contains at least 60 million rows.
-Only one field per table can be used for partitioning. That field is frequently a date field since many queries are filtered by date or a date range. It's possible to change the partitioning of a table after initial load by recreating the table with the new distribution using the `CREATE TABLE AS` (or CTAS) statement. See [table partitions](/azure/sql-data-warehouse/sql-data-warehouse-tables-partition) for a detailed discussion of partitioning in Azure Synapse.
+You can only use one field per table for partitioning. That field is often a date field because many queries are filtered by date or date range. It's possible to change the partitioning of a table after initial load by using the `CREATE TABLE AS` (CTAS) statement to recreate the table with a new distribution. For a detailed discussion of partitioning in Azure Synapse, see [Partitioning tables in dedicated SQL pool](/azure/sql-data-warehouse/sql-data-warehouse-tables-partition).
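For example, a large fact table might be partitioned by month on its date column; the table, column, and boundary values below are hypothetical:

```sql
-- Hypothetical example: monthly partitions on the date column of a large fact table.
CREATE TABLE dbo.FactOrders
(
    OrderKey  BIGINT NOT NULL,
    OrderDate DATE NOT NULL,
    Amount    DECIMAL(18,2)
)
WITH
(
    DISTRIBUTION = HASH(OrderKey),
    CLUSTERED COLUMNSTORE INDEX,
    PARTITION (OrderDate RANGE RIGHT FOR VALUES
        ('2022-01-01', '2022-02-01', '2022-03-01'))
);
```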
#### Data table statistics
-Ensure that statistics on data tables are up to date by building in a [statistics](../../sql/develop-tables-statistics.md) step to ETL/ELT jobs.
+You should ensure that statistics on data tables are up to date by building in a [statistics](../../sql/develop-tables-statistics.md) step to ETL/ELT jobs.
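A statistics step at the end of a load job might look like the following; the table and column names are hypothetical:

```sql
-- Create statistics on a commonly joined column after the initial load:
CREATE STATISTICS stat_FactSales_CustomerKey
ON dbo.FactSales (CustomerKey);

-- After subsequent incremental loads, refresh existing statistics:
UPDATE STATISTICS dbo.FactSales;
```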
+
+<a id="polybase-for-data-loading"></a>
+#### PolyBase or COPY INTO for data loading
+
+[PolyBase](/sql/relational-databases/polybase) supports efficient loading of large amounts of data to a data warehouse by using parallel loading streams. For more information, see [PolyBase data loading strategy](../../sql/load-data-overview.md).
+
+[COPY INTO](/sql/t-sql/statements/copy-into-transact-sql) also supports high-throughput data ingestion, and:
+
+- Data retrieval from all files within a folder and subfolders.
+
+- Data retrieval from multiple locations in the same storage account. You can specify multiple locations by using comma separated paths.
-#### PolyBase for data loading
+- [Azure Data Lake Storage](../../../storage/blobs/data-lake-storage-introduction.md) (ADLS) and Azure Blob Storage.
-PolyBase is the most efficient method for loading large amounts of data into the warehouse since it can leverage parallel loading streams. For more information, see [PolyBase data loading strategy](../../sql/load-data-overview.md).
+- CSV, PARQUET, and ORC file formats.
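A minimal `COPY INTO` sketch follows; the storage account, container path, and target table are placeholders:

```sql
-- Hypothetical example: load Parquet files from a storage folder into a staging table
-- using the workspace managed identity for authentication.
COPY INTO dbo.StageSales
FROM 'https://mystorageaccount.blob.core.windows.net/landing/sales/'
WITH
(
    FILE_TYPE = 'PARQUET',
    CREDENTIAL = (IDENTITY = 'Managed Identity')
);
```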
#### Use workload management
-Use [workload management](../../sql-data-warehouse/sql-data-warehouse-workload-management.md?context=%2fazure%2fsynapse-analytics%2fcontext%2fcontext) instead of resource classes. ETL would be in its own workgroup and should be configured to have more resources per query (less concurrency by more resources). For more information, see [What is dedicated SQL pool in Azure Synapse Analytics](../../sql-data-warehouse/sql-data-warehouse-overview-what-is.md).
+Azure Synapse uses resource classes to manage workloads. In general, large resource classes provide better individual query performance, while smaller resource classes provide higher levels of concurrency. You can monitor utilization using Dynamic Management Views (DMVs) to ensure that the applicable resources are being efficiently utilized.
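Workload isolation can also be configured declaratively through workload groups, for example to give ETL jobs more resources per query at the cost of concurrency. The group name and percentages below are illustrative assumptions:

```sql
-- Hypothetical example: reserve resources for ELT jobs in a dedicated SQL pool.
CREATE WORKLOAD GROUP ETLWorkloadGroup
WITH
(
    MIN_PERCENTAGE_RESOURCE = 30,          -- guaranteed resources for this group
    CAP_PERCENTAGE_RESOURCE = 60,          -- upper bound the group can consume
    REQUEST_MIN_RESOURCE_GRANT_PERCENT = 10 -- resources granted to each request
);
```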
## Next steps
-To learn more about ETL and load for Teradata migration, see the next article in this series: [Data migration, ETL, and load for Teradata migrations](2-etl-load-migration-considerations.md).
+To learn about ETL and load for Teradata migration, see the next article in this series: [Data migration, ETL, and load for Teradata migrations](2-etl-load-migration-considerations.md).
synapse-analytics 2 Etl Load Migration Considerations https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/synapse-analytics/migration-guides/teradata/2-etl-load-migration-considerations.md
Previously updated : 05/31/2022
Last updated : 06/01/2022

# Data migration, ETL, and load for Teradata migrations
-This article is part two of a seven part series that provides guidance on how to migrate from Teradata to Azure Synapse Analytics. This article provides best practices for ETL and load migration.
+This article is part two of a seven-part series that provides guidance on how to migrate from Teradata to Azure Synapse Analytics. The focus of this article is best practices for ETL and load migration.
## Data migration considerations
#### What is the best migration approach to minimize risk and impact on users?
-> [!TIP]
-> Migrate the existing model as-is initially, even if a change to the data model is planned in the future.
+This question comes up frequently because companies may want to lower the impact of changes on the data warehouse data model to improve agility. Companies often see an opportunity to further modernize or transform their data during an ETL migration. This approach carries a higher risk because it changes multiple factors simultaneously, making it difficult to compare the outcomes of the old system versus the new. Making data model changes here could also affect upstream or downstream ETL jobs to other systems. Because of that risk, it's better to redesign on this scale after the data warehouse migration.
-This question comes up often since companies often want to lower the impact of changes on the data warehouse data model to improve agility. Companies see an opportunity to do so during a migration to modernize their data model. This approach carries a higher risk because it could impact ETL jobs populating the data warehouse from a data warehouse to feed dependent data marts. Because of that risk, it's usually better to redesign on this scale after the data warehouse migration.
+Even if a data model is intentionally changed as part of the overall migration, it's good practice to migrate the existing model as-is to Azure Synapse, rather than do any re-engineering on the new platform. This approach minimizes the effect on existing production systems, while benefiting from the performance and elastic scalability of the Azure platform for one-off re-engineering tasks.
-Even if a data model change is an intended part of the overall migration, it's good practice to migrate the existing model as-is to the new environment (Azure Synapse Analytics in this case), rather than do any re-engineering on the new platform during migration. This approach has the advantage of minimizing the impact on existing production systems, while also leveraging the performance and elastic scalability of the Azure platform for one-off re-engineering tasks.
+When migrating from Teradata, consider creating a Teradata environment in a VM within Azure as a stepping-stone in the migration process.
-When migrating from Teradata, consider creating a Teradata environment in a VM within Azure as a stepping stone in the migration process.
+>[!TIP]
+>Migrate the existing model as-is initially, even if a change to the data model is planned in the future.
#### Use a VM Teradata instance as part of a migration
synapse-analytics 3 Security Access Operations https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/synapse-analytics/migration-guides/teradata/3-security-access-operations.md
Previously updated : 05/31/2022
Last updated : 06/01/2022

# Security, access, and operations for Teradata migrations
-This article is part three of a seven part series that provides guidance on how to migrate from Teradata to Azure Synapse Analytics. This article provides best practices for security access operations.
+This article is part three of a seven-part series that provides guidance on how to migrate from Teradata to Azure Synapse Analytics. The focus of this article is best practices for security access operations.
## Security considerations
synapse-analytics 4 Visualization Reporting https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/synapse-analytics/migration-guides/teradata/4-visualization-reporting.md
Previously updated : 05/31/2022
Last updated : 07/12/2022

# Visualization and reporting for Teradata migrations
-This article is part four of a seven part series that provides guidance on how to migrate from Teradata to Azure Synapse Analytics. This article provides best practices for visualization and reporting.
+This article is part four of a seven-part series that provides guidance on how to migrate from Teradata to Azure Synapse Analytics. The focus of this article is best practices for visualization and reporting.
## Access Azure Synapse Analytics using Microsoft and third-party BI tools
-Almost every organization accesses data warehouses and data marts using a range of BI tools and applications, such as:
+Organizations access data warehouses and data marts using a range of business intelligence (BI) tools and applications. Some examples of BI products are:
-- Microsoft BI tools, like Power BI.
+- Microsoft BI tools, such as Power BI.
-- Office applications, like Microsoft Excel spreadsheets.
+- Office applications, such as Microsoft Excel spreadsheets.
-- Third-party BI tools from various vendors.
+- Third-party BI tools from different vendors.
-- Custom analytic applications that have embedded BI tool functionality inside the application.
+- Custom analytics applications with embedded BI tool functionality.
-- Operational applications that request BI on demand, invoke queries and reports as-a-service on a BI platform, which in turn queries data in the data warehouse or data marts that are being migrated.
+- Operational applications that support on-demand BI by running queries and reports on a BI platform that in turn queries data in a data warehouse or data mart.
- Interactive data science development tools, such as Azure Synapse Spark Notebooks, Azure Machine Learning, RStudio, and Jupyter Notebooks.
-The migration of visualization and reporting as part of a data warehouse migration program means that all the existing queries, reports, and dashboards generated and issued by these tools and applications need to run on Azure Synapse and yield the same results as they did in the original data warehouse prior to migration.
+If you migrate visualization and reporting as part of your data warehouse migration, all existing queries, reports, and dashboards generated by BI products need to run in the new environment. Your BI products must yield the same results on Azure Synapse as they did in your legacy data warehouse environment.
-> [!TIP]
-> Existing users, user groups, roles and assignments of access security privileges need to be migrated first for migration of reports and visualizations to succeed.
+For consistent results after migration, all BI tools and application dependencies must work after you've migrated your data warehouse schema and data to Azure Synapse. The dependencies include less visible aspects, such as access and security. When you address access and security, ensure that you migrate:
-To make that happen, everything that BI tools and applications depend on still needs to work once you migrate your data warehouse schema and data to Azure Synapse. That includes the obvious and the not so obvious&mdash;such as access and security. Access and security are important considerations for data access in the migrated system, and are specifically discussed in [another guide](3-security-access-operations.md) in this series. When you address access and security, ensure that:
+- Authentication so users can sign into the data warehouse and data mart databases on Azure Synapse.
-- Authentication is migrated to let users sign in to the data warehouse and data mart databases on Azure Synapse.
+- All users to Azure Synapse.
-- All users are migrated to Azure Synapse.
+- All user groups to Azure Synapse.
-- All user groups are migrated to Azure Synapse.
+- All roles to Azure Synapse.
-- All roles are migrated to Azure Synapse.
+- All authorization privileges governing access control to Azure Synapse.
-- All authorization privileges governing access control are migrated to Azure Synapse.
-
-- User, role, and privilege assignments are migrated to mirror what you had on your existing data warehouse before migration. For example:
+- User, role, and privilege assignments to mirror what you had in your existing data warehouse before migration. For example:
  - Database object privileges assigned to roles
  - Roles assigned to user groups
  - Users assigned to user groups and/or roles
-> [!TIP]
-> Communication and business user involvement is critical to success.
+Access and security are important considerations for data access in the migrated system and are discussed in more detail in [Security, access, and operations for Teradata migrations](3-security-access-operations.md).
+
+>[!TIP]
+>Existing users, user groups, roles, and assignments of access security privileges need to be migrated first for migration of reports and visualizations to succeed.
+
+Migrate all required data to ensure that the reports and dashboards that query data in the legacy environment produce the same results in Azure Synapse.
-In addition, all the required data needs to be migrated to ensure the same results appear in the same reports and dashboards that now query data on Azure Synapse. User expectation will undoubtedly be that migration is seamless and there will be no surprises that destroy their confidence in the migrated system on Azure Synapse. So, this is an area where you must take extreme care and communicate as much as possible to allay any fears in your user base. Their expectations are that:
+Business users will expect a seamless migration, with no surprises that destroy their confidence in the migrated system on Azure Synapse. Take care to allay any fears that your users might have through good communication. Your users will expect that:
-- Table structure will be the same if directly referred to in queries.
+- Table structure remains the same when directly referred to in queries.
-- Table and column names remain the same if directly referred to in queries; for instance, so that calculated fields defined on columns in BI tools don't fail when aggregate reports are produced.
+- Table and column names remain the same when directly referred to in queries. For instance, calculated fields defined on columns in BI tools shouldn't fail when aggregate reports are produced.
- Historical analysis remains the same.

-- Data types should, if possible, remain the same.
+- Data types remain the same, if possible.
- Query behavior remains the same.

-- ODBC/JDBC drivers are tested to make sure nothing has changed in terms of query behavior.
+- ODBC/JDBC drivers are tested to ensure that query behavior remains the same.
-> [!TIP]
-> Views and SQL queries using proprietary SQL query extensions are likely to result in incompatibilities that impact BI reports and dashboards.
+>[!TIP]
+>Communication and business user involvement are critical to success.
-If BI tools are querying views in the underlying data warehouse or data mart database, then will these views still work? You might think yes, but if there are proprietary SQL extensions specific to your legacy data warehouse DBMS in these views that have no equivalent in Azure Synapse, you'll need to know about them and find a way to resolve them.
+If BI tools query views in the underlying data warehouse or data mart database, will those views still work after the migration? Some views might not work if there are proprietary SQL extensions specific to your legacy data warehouse DBMS that have no equivalent in Azure Synapse. If so, you need to know about those incompatibilities and find a way to resolve them.
-Other issues like the behavior of nulls or data type variations across DBMS platforms need to be tested, in case they cause slightly different calculation results. Obviously, you want to minimize these issues and take all necessary steps to shield business users from any kind of impact. Depending on your legacy data warehouse system (such as Teradata), there are [tools](../../partner/data-integration.md) that can help hide these differences so that BI tools and applications are kept unaware of them and can run unchanged.
+>[!TIP]
+>Views and SQL queries using proprietary SQL query extensions are likely to result in incompatibilities that impact BI reports and dashboards.
-> [!TIP]
-> Use repeatable tests to ensure reports, dashboards, and other visualizations migrate successfully.
+Other issues, like the behavior of `NULL` values or data type variations across DBMS platforms, need to be tested to ensure that even slight differences don't exist in calculation results. Minimize those issues and take all necessary steps to shield business users from being affected by them. Depending on your legacy data warehouse environment, [third-party](../../partner/data-integration.md) tools can help hide the differences between the legacy and new environments so that BI tools and applications run unchanged.
-Testing is critical to visualization and report migration. You need a test suite and agreed-on test data to run and rerun tests in both environments. A test harness is also useful, and a few are mentioned later in this guide. In addition, it's also important to have significant business involvement in this area of migration to keep confidence high and to keep them engaged and part of the project.
+Testing is critical to visualization and report migration. You need a test suite and agreed-on test data to run and rerun tests in both environments. A test harness is also useful, and a few are mentioned in this guide. Also, it's important to involve business users in the testing aspect of the migration to keep confidence high and to keep them engaged and part of the project.
-Finally, you may also be thinking about switching BI tools. For example, you might want to [migrate to Power BI](/power-bi/guidance/powerbi-migration-overview). The temptation is to do all of this at the same time, while migrating your schema, data, ETL processing, and more. However, to minimize risk, it's better to migrate to Azure Synapse first and get everything working before undertaking further modernization.
+>[!TIP]
+>Use repeatable tests to ensure reports, dashboards, and other visualizations migrate successfully.
-If your existing BI tools run on premises, ensure that they're able to connect to Azure Synapse through your firewall to run comparisons against both environments. Alternatively, if the vendor of your existing BI tools offers their product on Azure, you can try it there. The same applies for applications running on premises that embed BI or that call your BI server on-demand, requesting a "headless report" with data returned in XML or JSON, for example.
+You might be thinking about switching BI tools, for example to [migrate to Power BI](/power-bi/guidance/powerbi-migration-overview). The temptation is to make such changes at the same time you're migrating your schema, data, ETL processing, and more. However, to minimize risk, it's better to migrate to Azure Synapse first and get everything working before undertaking further modernization.
-There's a lot to think about here, so let's look at all this in more detail.
+If your existing BI tools run on-premises, ensure they can connect to Azure Synapse through your firewall so you're able to run comparisons against both environments. Alternatively, if the vendor of your existing BI tools offers their product on Azure, you can try it there. The same applies for applications running on-premises that embed BI or call your BI server on demand, for example by requesting a "headless report" with XML or JSON data.
-> [!TIP]
-> A lift and shift data warehouse migration is likely to minimize any disruption to reports, dashboards, and other visualizations.
+There's a lot to think about here, so let's take a closer look.
-## Minimize the impact of data warehouse migration on BI tools and reports by using data virtualization
+## Use data virtualization to minimize the impact of migration on BI tools and reports
-> [!TIP]
-> Data virtualization allows you to shield business users from structural changes during migration so that they remain unaware of changes.
+During migration, you might be tempted to fulfill long-term requirements like opening business requests, adding missing data, or implementing new features. However, such changes can affect BI tool access to your data warehouse, especially if the change involves structural changes to your data model. If you want to adopt an agile data modeling technique or implement structural changes, do so *after* migration.
-The temptation during data warehouse migration to the cloud is to take the opportunity to make changes during the migration to fulfill long-term requirements, such as opening business requests, missing data, new features, and more. However, these changes can affect the BI tools accessing your data warehouse, especially if it involves structural changes in your data model. If you want to adopt an agile data modeling technique or implement structural changes, do so *after* migration.
-
-One way in which you can minimize the impact of things like schema changes on BI tools is to introduce data virtualization between BI tools and your data warehouse and data marts. The following diagram shows how data virtualization can hide the migration from users.
+One way to minimize the effect of schema changes or other structural changes on your BI tools is to introduce data virtualization between the BI tools and your data warehouse and data marts. The following diagram shows how data virtualization can hide a migration from users.
:::image type="content" source="../media/4-visualization-reporting/migration-data-virtualization.png" border="true" alt-text="Diagram showing how to hide the migration from users through data virtualization.":::
-This breaks the dependency between business users utilizing self-service BI tools and the physical schema of the underlying data warehouse and data marts that are being migrated.
-
-> [!TIP]
-> Schema alterations to tune your data model for Azure Synapse can be hidden from users.
+Data virtualization breaks the dependency between business users utilizing self-service BI tools and the physical schema of the underlying data warehouse and data marts that are being migrated.
-By introducing data virtualization, any schema alterations made during data warehouse and data mart migration to Azure Synapse (to optimize performance, for example) can be hidden from business users because they only access virtual tables in the data virtualization layer. If structural changes are needed, only the mappings between the data warehouse or data marts, and any virtual tables would need to be changed so that users remain unaware of those changes and unaware of the migration. [Microsoft partners](../../partner/data-integration.md) provide useful data virtualization software.
+>[!TIP]
+>Data virtualization allows you to shield business users from structural changes during migration so they remain unaware of those changes. Structural changes include schema alterations that tune your data model for Azure Synapse.
-## Identify high priority reports to migrate first
+With data virtualization, any schema alterations made during a migration to Azure Synapse, for example to optimize performance, can be hidden from business users because they only have access to virtual tables in the data virtualization layer. And, if you make structural changes, you only need to update the mappings between the data warehouse or data marts and any virtual tables. With data virtualization, users remain unaware of structural changes. [Microsoft partners](../../partner/data-integration.md) provide data virtualization software.
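To make the decoupling concrete, here's a minimal sketch of the mapping idea behind a data virtualization layer. The table names and the naive regex-based rewrite are illustrative assumptions only; real virtualization products parse SQL properly and add security, caching, and query pushdown:

```python
import re

# Virtual table names (what BI tools see) mapped to physical tables.
# All names here are hypothetical examples.
mappings = {
    "sales": "legacy_dw.sales_fact",
    "customers": "legacy_dw.customer_dim",
}

def resolve(query: str, mappings: dict) -> str:
    """Rewrite virtual table names in a query to their physical targets."""
    for virtual, physical in mappings.items():
        query = re.sub(rf"\b{virtual}\b", physical, query)
    return query

bi_query = ("SELECT c.region, SUM(s.amount) FROM sales s "
            "JOIN customers c ON s.cust_id = c.cust_id GROUP BY c.region")

# Before migration, the virtual tables point at the legacy warehouse.
before = resolve(bi_query, mappings)

# After migration, only the mappings change; the BI query is untouched.
mappings["sales"] = "synapse_dw.sales_fact"
mappings["customers"] = "synapse_dw.customer_dim"
after = resolve(bi_query, mappings)
```

The BI tool issues the same `bi_query` throughout; only the virtualization layer's mappings are updated when the underlying physical schema moves or changes.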
-A key question when migrating your existing reports and dashboards to Azure Synapse is which ones to migrate first. Several factors can drive the decision. For example:
+## Identify high-priority reports to migrate first
-- Business value
+A key question when migrating your existing reports and dashboards to Azure Synapse is which ones to migrate first. Several factors might drive that decision, such as:
- Usage
+- Business value
+
- Ease of migration

- Data migration strategy
-These factors are discussed in more detail later in this article.
+The following sections discuss these factors.
-Whatever the decision is, it must involve the business, since they produce the reports and dashboards, and consume the insights these artifacts provide in support of the decisions that are made around your business. That said, if most reports and dashboards can be migrated seamlessly, with minimal effort, and offer up like-for-like results, simply by pointing your BI tool(s) at Azure Synapse, instead of your legacy data warehouse system, then everyone benefits.
+Whatever your decision, it must involve your business users because they produce the reports, dashboards, and other visualizations, and make business decisions based on insights from those items. Everyone benefits when you can:
+
+- Migrate reports and dashboards seamlessly,
+- Migrate reports and dashboards with minimal effort, and
+- Point your BI tool(s) at Azure Synapse instead of your legacy data warehouse system, and get like-for-like reports, dashboards, and other visualizations.
### Migrate reports based on usage
-Usage is interesting, since it's an indicator of business value. Reports and dashboards that are never used clearly aren't contributing to supporting any decisions and don't currently offer any value. So, do you have any mechanism for finding out which reports and dashboards are currently not used? Several BI tools provide statistics on usage, which would be an obvious place to start.
+Usage is often an indicator of business value. Unused reports and dashboards clearly don't contribute to business decisions or offer current value. If you don't have a way to find out which reports and dashboards are unused, many BI tools provide usage statistics, which are an obvious place to start.
-If your legacy data warehouse has been up and running for many years, there's a high chance you could have hundreds, if not thousands, of reports in existence. In these situations, usage is an important indicator of the business value of a specific report or dashboard. In that sense, it's worth compiling an inventory of the reports and dashboards you have and defining their business purpose and usage statistics.
+If your legacy data warehouse has been up and running for years, there's a good chance you have hundreds, if not thousands, of reports in existence. It's worth compiling an inventory of reports and dashboards and identifying their business purpose and usage statistics.
-For those that aren't used at all, it's an appropriate time to seek a business decision, to determine if it's necessary to decommission those reports to optimize your migration efforts. A key question worth asking when deciding to decommission unused reports is: are they unused because people don't know they exist, or is it because they offer no business value, or have they been superseded by others?
+For unused reports, determine whether to decommission them to reduce your migration effort. A key question when deciding whether to decommission an unused report is whether the report is unused because people don't know it exists, because it offers no business value, or because it's been superseded by another report.
### Migrate reports based on business value
-Usage on its own isn't a clear indicator of business value. There needs to be a deeper business context to determine the value to the business. In an ideal world, we would like to know the contribution of the insights produced in a report to the bottom line of the business. That's exceedingly difficult to determine, since every decision made, and its dependency on the insights in a specific report, would need to be recorded along with the contribution that each decision makes to the bottom line of the business. You would also need to do this over time.
+Usage alone isn't always a good indicator of business value. You might want to consider the extent to which a report's insights contribute to business value. One way to do that is to evaluate the profitability of every business decision that relies on the report and the extent of the reliance. However, that information is unlikely to be readily available in most organizations.
-This level of detail is unlikely to be available in most organizations. One way in which you can get deeper on business value to drive migration order is to look at alignment with business strategy. A business strategy set by your executive typically lays out strategic business objectives, key performance indicators (KPIs), KPI targets that need to be achieved, and who is accountable for achieving them. In that sense, classifying your reports and dashboards by strategic business objectives&mdash;for example, reduce fraud, improve customer engagement, and optimize business operations&mdash;will help understand business purpose and show what objective(s), specific reports, and dashboards these are contributing to. Reports and dashboards associated with high priority objectives in the business strategy can then be highlighted so that migration is focused on delivering business value in a strategic high priority area.
+Another way to evaluate business value is to look at the alignment of a report with business strategy. The business strategy set by your executive typically lays out strategic business objectives (SBOs), key performance indicators (KPIs), KPI targets that need to be achieved, and who is accountable for achieving them. You can classify a report by which SBOs the report contributes to, such as fraud reduction, improved customer engagement, and optimized business operations. Then, you can prioritize for migration the reports and dashboards that are associated with high-priority objectives. In this way, the initial migration can deliver business value in a strategic area.
-It's also worthwhile to classify reports and dashboards as operational, tactical, or strategic, to understand the level in the business where they're used. Delivering strategic business objectives requires contribution at all these levels. Knowing which reports and dashboards are used, at what level, and what objectives they're associated with helps to focus migration on high priority business value that will drive the company forward. Business contribution of reports and dashboards is needed to understand this, perhaps like what is shown in the following **business strategy objective** table.
+Another way to evaluate business value is to classify reports and dashboards as operational, tactical, or strategic to identify at which business level they're used. SBOs require contributions at all these levels. By knowing which reports and dashboards are used, at what level, and what objectives they're associated with, you're able to focus the initial migration on high-priority business value. You can use the following *business strategy objective* table to evaluate reports and dashboards.
-| **Level** | **Report / dashboard name** | **Business purpose** | **Department used** | **Usage frequency** | **Business priority** |
+| Level | Report / dashboard name | Business purpose | Department used | Usage frequency | Business priority |
|-|-|-|-|-|-|
| **Strategic** | | | | | |
| **Tactical** | | | | | |
| **Operational** | | | | | |
-While this may seem too time consuming, you need a mechanism to understand the contribution of reports and dashboards to business value, whether you're migrating or not. Catalogs like Azure Data Catalog are becoming very important because they give you the ability to catalog reports and dashboards, automatically capture the metadata associated with them, and let business users tag and rate them to help you understand business value.
+Metadata discovery tools like [Azure Data Catalog](../../../data-catalog/overview.md) let business users tag and rate data sources to enrich the metadata for those data sources to assist with their discovery and classification. You can use the metadata for a report or dashboard to help you understand its business value. Without such tools, understanding the contribution of reports and dashboards to business value is likely to be a time-consuming task, whether you're migrating or not.
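As a sketch of how such an inventory can drive migration order, the following snippet ranks reports by business priority and usage, and flags unused reports as decommissioning candidates. The report names, fields, and ranking scheme are hypothetical illustrations, not a prescribed model:

```python
# Hypothetical inventory rows, mirroring the business strategy objective table.
reports = [
    {"name": "Fraud losses by region", "level": "Strategic",
     "uses_per_month": 40, "priority": "High"},
    {"name": "Daily stock picking list", "level": "Operational",
     "uses_per_month": 600, "priority": "Medium"},
    {"name": "Legacy churn summary", "level": "Tactical",
     "uses_per_month": 0, "priority": "Low"},
]

PRIORITY_RANK = {"High": 0, "Medium": 1, "Low": 2}

# Unused reports are candidates for a decommissioning decision, not migration.
decommission_candidates = [r["name"] for r in reports if r["uses_per_month"] == 0]

# Migrate actively used reports in priority order; break ties by usage.
migration_order = sorted(
    (r for r in reports if r["uses_per_month"] > 0),
    key=lambda r: (PRIORITY_RANK[r["priority"]], -r["uses_per_month"]),
)
```

The decommissioning list still needs a business decision, as described above, before any report is dropped from the migration scope.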
### Migrate reports based on data migration strategy
-> [!TIP]
-> Data migration strategy could also dictate which reports and visualizations get migrated first.
+If your migration strategy is based on migrating data marts first, then the order of data mart migration will affect which reports and dashboards are migrated first. If your strategy is based on business value, the order in which you migrate data marts to Azure Synapse will reflect business priorities. Metadata discovery tools can help you implement your strategy by showing you which data mart tables supply data for which reports.
-If your migration strategy is based on migrating data marts first, the order of data mart migration will have a bearing on which reports and dashboards can be migrated first to run on Azure Synapse. Again, this is likely to be a business-value-related decision. Prioritizing which data marts are migrated first reflects business priorities. Metadata discovery tools can help you here by showing you which reports rely on data in which data mart tables.
+>[!TIP]
+>Your data migration strategy affects which reports and visualizations get migrated first.
-## Migration incompatibility issues that can impact reports and visualizations
+## Migration incompatibility issues that can affect reports and visualizations
-When it comes to migrating to Azure Synapse, there are several things that can impact the ease of migration for reports, dashboards, and other visualizations. The ease of migration is affected by:
+BI tools produce reports, dashboards, and other visualizations by issuing SQL queries that access physical tables and/or views in your data warehouse or data mart. When you migrate your legacy data warehouse to Azure Synapse, several factors can affect the ease of migration of reports, dashboards, and other visualizations. Those factors include:
-- Incompatibilities that occur during schema migration between your legacy data warehouse and Azure Synapse.
+- Schema incompatibilities between the environments.
-- Incompatibilities in SQL between your legacy data warehouse and Azure Synapse.
+- SQL incompatibilities between the environments.
-### The impact of schema incompatibilities
+### Schema incompatibilities
-> [!TIP]
-> Schema incompatibilities include legacy warehouse DBMS table types and data types that are unsupported on Azure Synapse.
+During a migration, schema incompatibilities in the data warehouse or data mart tables that supply data for reports, dashboards, and other visualizations can be:
-BI tool reports and dashboards, and other visualizations, are produced by issuing SQL queries that access physical tables and/or views in your data warehouse or data mart. When it comes to migrating your data warehouse or data mart schema to Azure Synapse, there may be incompatibilities that can impact reports and dashboards, such as:
+- Non-standard table types in your legacy data warehouse DBMS that don't have an equivalent in Azure Synapse.
-- Non-standard table types supported in your legacy data warehouse DBMS that don't have an equivalent in Azure Synapse, for example Teradata Time-Series tables.
+- Data types in your legacy data warehouse DBMS that don't have an equivalent in Azure Synapse.
-- Data types supported in your legacy data warehouse DBMS that don't have an equivalent in Azure Synapse, for example Teradata Geospatial or Interval data types.
+In most cases, there's a workaround to the incompatibilities. For example, you can migrate the data in an unsupported table type into a standard table with appropriate data types and indexed or partitioned on a date/time column. Similarly, it might be possible to represent unsupported data types in another type of column and perform calculations in Azure Synapse to achieve the same results.
-In many cases, where there are incompatibilities, there may be ways around them. For example, the data in unsupported table types can be migrated into a standard table with appropriate data types and indexed or partitioned on a date/time column. Similarly, it may be possible to represent unsupported data types in another type of column and perform calculations in Azure Synapse to achieve the same. Either way, it will need refactoring.
+>[!TIP]
+>Schema incompatibilities include legacy warehouse DBMS table types and data types that are unsupported on Azure Synapse.
-> [!TIP]
-> Querying the system catalog of your legacy warehouse DBMS is a quick and straightforward way to identify schema incompatibilities with Azure Synapse.
+To identify the reports affected by schema incompatibilities, run queries against the system catalog of your legacy data warehouse to identify the tables with unsupported data types. Then, you can use metadata from your BI tool to identify the reports that access data in those tables. For more information about how to identify data type incompatibilities, see [Unsupported Teradata data types](5-minimize-sql-issues.md#unsupported-teradata-data-types).
-To identify reports and visualizations impacted by schema incompatibilities, run queries against the system catalog of your legacy data warehouse to identify tables with unsupported data types. Then use metadata from your BI tool or tools to identify reports that access these structures, to see what could be impacted. Obviously, this will depend on the legacy data warehouse DBMS you're migrating from. Find details of how to identify these incompatibilities in [Design and performance for Teradata migrations](1-design-performance-migration.md).
+>[!TIP]
+>Query the system catalog of your legacy warehouse DBMS to identify schema incompatibilities with Azure Synapse.
-The impact may be less than you think, because many BI tools don't support such data types. As a result, views may already exist in your legacy data warehouse that `CAST` unsupported data types to more generic types.
+The effect of schema incompatibilities on reports, dashboards, and other visualizations might be less than you think because many BI tools don't support the less generic data types. As a result, your legacy data warehouse might already have views that `CAST` unsupported data types to more generic types.
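To illustrate the cross-referencing step, the following sketch joins a system-catalog extract against BI tool metadata to list the reports that touch tables with unsupported data types. The type names, table names, and report names are illustrative assumptions; the real unsupported-type list depends on your legacy DBMS:

```python
# Assumed unsupported types, for illustration only; check your legacy DBMS.
UNSUPPORTED_TYPES = {"INTERVAL", "PERIOD", "ST_GEOMETRY"}

# (table, column, data_type) rows, as exported from the legacy system catalog.
catalog = [
    ("sales_fact", "amount", "DECIMAL"),
    ("outage_log", "duration", "INTERVAL"),
    ("store_dim", "location", "ST_GEOMETRY"),
]

# Report -> tables it queries, as exported from BI tool metadata.
report_tables = {
    "Outage trend dashboard": {"outage_log"},
    "Store heat map": {"store_dim", "sales_fact"},
    "Revenue summary": {"sales_fact"},
}

# Tables that contain at least one column with an unsupported data type.
affected_tables = {t for t, _, dtype in catalog if dtype in UNSUPPORTED_TYPES}

# Reports that query any affected table and so need review before migration.
affected_reports = sorted(
    name for name, tables in report_tables.items() if tables & affected_tables
)
```

In this sample data, only the reports that touch `outage_log` or `store_dim` would be flagged for review; reports that read only supported columns pass through unaffected.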
-### The impact of SQL incompatibilities and differences
+### SQL incompatibilities
-Additionally, any report, dashboard, or other visualization in an application or tool that makes use of proprietary SQL extensions associated with your legacy data warehouse DBMS is likely to be impacted when migrating to Azure Synapse. This could happen because the BI tool or application:
+During a migration, SQL incompatibilities are likely to affect any report, dashboard, or other visualization in an application or tool that:
- Accesses legacy data warehouse DBMS views that include proprietary SQL functions that have no equivalent in Azure Synapse.

-- Issues SQL queries, which include proprietary SQL functions peculiar to the SQL dialect of your legacy data warehouse DBMS, that have no equivalent in Azure Synapse.
+- Issues SQL queries that include proprietary SQL functions, specific to the SQL dialect of your legacy environment, that have no equivalent in Azure Synapse.
### Gauge the impact of SQL incompatibilities on your reporting portfolio
-You can't rely on documentation associated with reports, dashboards, and other visualizations to gauge how big of an impact SQL incompatibility may have on the portfolio of embedded query services, reports, dashboards, and other visualizations you're intending to migrate to Azure Synapse. There must be a more precise way of doing that.
+Your reporting portfolio might include embedded query services, reports, dashboards, and other visualizations. Don't rely on the documentation associated with those items to gauge the effect of SQL incompatibilities on the migration of your reporting portfolio to Azure Synapse. You need to use a more precise way to assess the effect of SQL incompatibilities.
#### Use EXPLAIN statements to find SQL incompatibilities
-> [!TIP]
-> Gauge the impact of SQL incompatibilities by harvesting your DBMS log files and running `EXPLAIN` statements.
+You can find SQL incompatibilities by reviewing the logs of recent SQL activity in your legacy Teradata data warehouse. Use a script to extract a representative set of SQL statements to a file. Then, prefix each SQL statement with an `EXPLAIN` statement, and run those `EXPLAIN` statements in Azure Synapse. Any SQL statements that contain unsupported proprietary SQL extensions will be rejected by Azure Synapse when the `EXPLAIN` statements are executed. This approach lets you assess the extent of SQL incompatibilities.
-One way is to get hold of the SQL log files of your legacy data warehouse. Use a script to pull out a representative set of SQL statements into a file, prefix each SQL statement with an `EXPLAIN` statement, and then run all the `EXPLAIN` statements in Azure Synapse. Any SQL statements containing proprietary SQL extensions from your legacy data warehouse that are unsupported will be rejected by Azure Synapse when the `EXPLAIN` statements are executed. This approach would at least give you an idea of how significant or otherwise the use of incompatible SQL is.
+Metadata from your legacy data warehouse DBMS can also help you identify incompatible views. As before, capture a representative set of SQL statements from the applicable logs, prefix each SQL statement with an `EXPLAIN` statement, and run those `EXPLAIN` statements in Azure Synapse to identify views with incompatible SQL.
-Metadata from your legacy data warehouse DBMS will also help you when it comes to views. Again, you can capture and view SQL statements, and `EXPLAIN` them as described previously to identify incompatible SQL in views.
+>[!TIP]
+>Gauge the impact of SQL incompatibilities by harvesting your DBMS log files and running `EXPLAIN` statements.
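A minimal sketch of the `EXPLAIN`-prefixing step might look like this, assuming the harvested statements have already been extracted from the query log into a list; real logs also need de-duplication and parameter scrubbing:

```python
def to_explain_batch(statements):
    """Prefix each harvested SQL statement with EXPLAIN for a test run."""
    batch = []
    for stmt in statements:
        stmt = stmt.strip().rstrip(";")
        if stmt:  # skip blank entries from the log extract
            batch.append(f"EXPLAIN {stmt};")
    return batch

# Hypothetical statements harvested from the legacy query log.
harvested = [
    "SELECT region, SUM(amount) FROM sales GROUP BY region;",
    "SELECT * FROM customers QUALIFY ROW_NUMBER() OVER (ORDER BY cust_id) <= 10;",
    "",
]
explain_batch = to_explain_batch(harvested)
```

Running the resulting batch against Azure Synapse surfaces statements it can't parse, such as the Teradata-style `QUALIFY` example above, without executing the underlying queries.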
## Test report and dashboard migration to Azure Synapse Analytics
A key element of data warehouse migration is testing of reports and dashboards in Azure Synapse to verify that the migration has worked. Define a series of tests and a set of required outcomes for each test that you'll run to verify success. Test and compare the reports and dashboards across your existing and migrated data warehouse systems to:

- Identify whether any schema changes that were made during migration affected the ability of reports to run, report results, or the corresponding report visualizations. An example of a schema change is if you mapped an incompatible data type to an equivalent data type that's supported in Azure Synapse.

- Verify that all users are migrated.

- Verify that all roles are migrated, and users are assigned to those roles.

- Verify that all data access security privileges are migrated to ensure access control list (ACL) migration.

- Ensure consistent results for all known queries, reports, and dashboards.

- Ensure that data and ETL migration is complete and error-free.

- Ensure that data privacy is upheld.

- Test performance and scalability.

- Test analytical functionality.
> [!TIP]
> Test and tune performance to minimize compute costs.

For information about how to migrate users, user groups, roles, and privileges, see [Security, access, and operations for Teradata migrations](3-security-access-operations.md).

Automate testing as much as possible to make each test repeatable and to support a consistent approach to evaluating test results. Automation works well for known regular reports, and can be managed via [Azure Synapse pipelines](../../get-started-pipelines.md) or [Azure Data Factory](../../../data-factory/introduction.md) orchestration. If you already have a suite of test queries in place for regression testing, you can use your existing testing tools to automate post-migration testing.
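At its core, each automated test runs the same query against the legacy system and against Azure Synapse and compares the results. The following sketch shows one way to make that comparison repeatable; in practice the two result sets would come from database queries (for example via `pyodbc` or `teradatasql`), but here they're supplied as plain Python data to keep the example self-contained, and the table and test names are hypothetical.

```python
# Sketch: a repeatable, order-insensitive comparison of query results from the
# legacy and migrated systems, using row counts plus a content hash.

import hashlib

def result_fingerprint(rows):
    """Order-insensitive fingerprint of a result set: row count plus a hash."""
    canonical = sorted(",".join(str(v) for v in row) for row in rows)
    digest = hashlib.sha256("\n".join(canonical).encode()).hexdigest()
    return len(rows), digest

def compare_results(test_name, legacy_rows, synapse_rows):
    """Return a PASS/FAIL line suitable for an automated test report."""
    legacy_fp = result_fingerprint(legacy_rows)
    synapse_fp = result_fingerprint(synapse_rows)
    status = "PASS" if legacy_fp == synapse_fp else "FAIL"
    return f"{test_name}: {status} (legacy={legacy_fp[0]} rows, synapse={synapse_fp[0]} rows)"

# Same data returned in a different order still passes.
legacy = [(1, "North", 1500.0), (2, "South", 980.5)]
migrated = [(2, "South", 980.5), (1, "North", 1500.0)]
print(compare_results("sales_by_region", legacy, migrated))
```

A comparison like this can run as a step in an orchestration pipeline after each data load, so regressions are caught as soon as they appear.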
> [!TIP]
> Best practice is to build an automated test suite to make tests repeatable.

Ad hoc analysis and reporting are more challenging and require compilation of a set of tests to verify that the same reports and dashboards are consistent before and after migration. If you find inconsistencies, the ability to compare metadata lineage across the original and migrated systems during migration testing becomes crucial. That comparison can highlight differences and pinpoint where inconsistencies originated when detection by other means is difficult.
> [!TIP]
> Use tools that compare metadata lineage to verify results.
## Analyze lineage to understand dependencies between reports, dashboards, and data
Your understanding of lineage is a critical factor in the successful migration of reports and dashboards. Lineage is metadata that shows the journey of migrated data, so you can track its path from a report or dashboard all the way back to the data source. Lineage shows how data has traveled from point to point, its location in the data warehouse and/or data mart, and which reports and dashboards use it. Lineage can help you understand what happens to data as it travels through different data stores, such as files and databases, different ETL pipelines, and into reports. When business users have access to data lineage, it improves trust, instills confidence, and supports informed business decisions.

> [!TIP]
> Your ability to access metadata and data lineage from reports all the way back to a data source is critical for verifying that migrated reports work correctly.
In multi-vendor data warehouse environments, business analysts in BI teams might map out data lineage. For example, if you use different vendors for your ETL, data warehouse, and reporting, each with its own metadata repository, figuring out where a specific data element in a report came from can be challenging and time-consuming.

> [!TIP]
> Tools that automate metadata collection and show end-to-end lineage in a multi-vendor environment are valuable during a migration.
To migrate seamlessly from a legacy data warehouse to Azure Synapse, use end-to-end data lineage to prove like-for-like migration when you're comparing the reports and dashboards generated by each environment. To show the end-to-end data journey, you need to capture and integrate metadata from several tools. Access to tools that support automated metadata discovery and data lineage helps you identify duplicate reports or ETL processes, and find reports that rely on obsolete, questionable, or non-existent data sources. You can use that information to reduce the number of reports and ETL processes that you migrate.

You can also compare the end-to-end lineage of a report in Azure Synapse to the end-to-end lineage of the same report in your legacy environment to check for differences that might have inadvertently occurred during migration. This type of comparison is exceptionally useful when you need to test and verify migration success.

Data lineage visualization not only reduces time, effort, and error in the migration process, but also enables faster migration.

By using automated metadata discovery and data lineage tools that compare lineage, you can verify that a report in Azure Synapse that's produced from migrated data is produced in the same way as in your legacy environment. This capability also helps you determine:

- What data needs to be migrated to ensure successful report and dashboard execution in Azure Synapse.

- What transformations have been and should be performed to ensure successful execution in Azure Synapse.

- How to reduce report duplication.

Automated metadata discovery and data lineage tools substantially simplify the migration process because they help businesses become more aware of their data assets and know what needs to be migrated to Azure Synapse to achieve a solid reporting environment.

Several ETL tools provide end-to-end lineage capability, so check whether your existing ETL tool has that capability if you plan to use it with Azure Synapse. [Azure Synapse pipelines](../../get-started-pipelines.md) and [Azure Data Factory](../../../data-factory/introduction.md) both support the ability to view lineage in mapping data flows. [Microsoft partners](../../partner/data-integration.md) also provide automated metadata discovery, data lineage, and lineage comparison tools.
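A lineage comparison can be modeled simply as two sets of directed edges, one per environment, whose difference shows exactly where the migrated path diverges. The following sketch illustrates the idea; the node names are hypothetical, and real lineage edges would come from a metadata discovery tool rather than hand-written lists.

```python
# Sketch: compare the end-to-end lineage of a report in the legacy environment
# and in Azure Synapse, modeled as directed edges (upstream -> downstream).

def lineage_diff(legacy_edges, synapse_edges):
    """Return the edges missing from, or added in, the migrated lineage."""
    legacy, synapse = set(legacy_edges), set(synapse_edges)
    return {
        "missing_in_synapse": sorted(legacy - synapse),
        "extra_in_synapse": sorted(synapse - legacy),
    }

legacy_lineage = [
    ("crm_extract.csv", "staging.customers"),
    ("staging.customers", "dw.dim_customer"),
    ("dw.dim_customer", "sales_report"),
]
synapse_lineage = [
    ("crm_extract.csv", "staging.customers"),
    ("staging.customers", "dw.dim_customer"),
    # The final hop to the report is absent: a migration gap to investigate.
]

print(lineage_diff(legacy_lineage, synapse_lineage))
```

Any edge reported as missing or extra pinpoints the step in the data journey where the migrated report diverges from its legacy counterpart.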
## Migrate BI tool semantic layers to Azure Synapse Analytics
Some BI tools have what is known as a semantic metadata layer. That layer simplifies business user access to the underlying physical data structures in a data warehouse or data mart database by providing high-level objects like dimensions, measures, hierarchies, calculated metrics, and joins. The high-level objects use business terms that are familiar to business analysts, and map to the physical data structures in your data warehouse or data mart.

> [!TIP]
> Some BI tools, such as SAP BusinessObjects and IBM Cognos, have semantic layers that simplify business user access to physical data structures in your data warehouse or data mart.
In a data warehouse migration, changes to column names or table names might be forced upon you. For example, in Oracle, table names can contain a `#`, but in Azure Synapse the `#` is only allowed as a prefix to a table name, to indicate a temporary table. In such cases, you might need to rename tables and rework the corresponding mappings.
To achieve consistency across multiple BI tools, create a universal semantic layer by using a data virtualization server that sits between BI tools and applications and Azure Synapse. In the data virtualization server, use common data names for high-level objects like dimensions, measures, hierarchies, and joins. That way, you configure everything, including calculated fields, joins, and mappings, only once instead of in every tool, and then point all BI tools at the data virtualization server.
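The "define once, resolve everywhere" idea behind a universal semantic layer can be reduced to a single shared mapping from business terms to physical objects. The following sketch illustrates the concept only; the object and column names are hypothetical, and a real data virtualization server manages and serves this mapping for you.

```python
# Sketch: a universal semantic layer reduced to a mapping that's defined once
# and shared by every BI tool. All names here are illustrative placeholders.

semantic_layer = {
    "Customer":   {"table": "dw.dim_customer", "key": "customer_sk"},
    "Revenue":    {"table": "dw.fact_sales f", "expression": "SUM(f.net_amount)"},
    "Order Date": {"table": "dw.dim_date", "column": "calendar_date"},
}

def resolve(business_name):
    """Every BI tool resolves the same business term to the same physical object."""
    return semantic_layer[business_name]

print(resolve("Revenue"))
```

Because every tool calls the same resolution logic, a change to a physical table name is made in one place instead of in each BI tool's own metadata.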
> [!TIP]
> Use data virtualization to create a common semantic layer to guarantee consistency across all BI tools in an Azure Synapse environment.

With data virtualization, you get consistency across all BI tools and break the dependency between BI tools and applications and the underlying physical data structures in Azure Synapse. [Microsoft partners](../../partner/data-integration.md) can help you achieve consistency in Azure. The following diagram shows how a common vocabulary in the data virtualization server lets multiple BI tools see a common semantic layer.
:::image type="content" source="../media/4-visualization-reporting/data-virtualization-semantics.png" border="true" alt-text="Diagram with common data names and definitions that relate to the data virtualization server.":::

## Conclusions
In a lift and shift data warehouse migration, most reports, dashboards, and other visualizations should migrate easily.

During a migration from a legacy environment, you might find that data in the legacy data warehouse or data mart tables is stored in unsupported data types. Or, you might find legacy data warehouse views that include proprietary SQL with no equivalent in Azure Synapse. If so, you'll need to resolve those issues to ensure a successful migration to Azure Synapse.

Don't rely on user-maintained documentation to identify where issues are located. Instead, use `EXPLAIN` statements because they're a quick, pragmatic way to identify SQL incompatibilities. Rework the incompatible SQL statements to achieve equivalent functionality in Azure Synapse. Also, use automated metadata discovery and lineage tools to understand dependencies, find duplicate reports, and identify invalid reports that rely on obsolete, questionable, or non-existent data sources. Use lineage tools to compare lineage and verify that reports running in your legacy data warehouse environment are produced identically in Azure Synapse.

Don't migrate reports that you no longer use. BI tool usage data can help you determine which reports aren't in use. For the reports, dashboards, and other visualizations that you do want to migrate, migrate all users, user groups, roles, and privileges. If you're using business value to drive your report migration strategy, associate reports with strategic business objectives and priorities to help identify the contribution of report insights to specific objectives. If you're migrating data mart by data mart, use metadata to identify which reports depend on which tables and views, so you can make an informed decision about which data marts to migrate first.

> [!TIP]
> Identify incompatibilities early to gauge the extent of the migration effort. Migrate your users, group roles, and privilege assignments. Only migrate the reports and visualizations that are used and contribute to business value.

Structural changes to the data model of your data warehouse or data mart can occur during a migration. Consider using data virtualization to shield BI tools and applications from structural changes. With data virtualization, you can use a common vocabulary to define a common semantic layer. The common semantic layer guarantees consistent common data names, definitions, metrics, hierarchies, and joins across all BI tools and applications in the new Azure Synapse environment.
## Next steps
synapse-analytics 5 Minimize Sql Issues https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/synapse-analytics/migration-guides/teradata/5-minimize-sql-issues.md
Previously updated : 05/31/2022
Last updated : 06/01/2022

# Minimize SQL issues for Teradata migrations

This article is part five of a seven-part series that provides guidance on how to migrate from Teradata to Azure Synapse Analytics. The focus of this article is best practices for minimizing SQL issues.
## Overview
Teradata implements the temporal query functionality via query rewriting to add
The Azure environment also includes [Azure Time Series Insights](https://azure.microsoft.com/services/time-series-insights/), a feature for complex analytics on time-series data at scale. Time Series Insights is aimed at IoT data analysis applications and might be more appropriate for that use case.
<a id="teradata-data-type-mapping"></a>

### Unsupported Teradata data types

> [!TIP]
> Assess the impact of unsupported data types as part of the preparation phase.

Most Teradata data types have a direct equivalent in Azure Synapse. The following table shows the Teradata data types that are unsupported in Azure Synapse, together with the recommended mapping. In the table, the Teradata column type is the type that's stored within the system catalog, for example in `DBC.ColumnsV`.
| Teradata column type | Teradata data type | Azure Synapse data type |
|-|--|-|
synapse-analytics 6 Microsoft Third Party Migration Tools https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/synapse-analytics/migration-guides/teradata/6-microsoft-third-party-migration-tools.md
Previously updated : 05/31/2022
Last updated : 07/12/2022

# Tools for Teradata data warehouse migration to Azure Synapse Analytics

This article is part six of a seven-part series that provides guidance on how to migrate from Teradata to Azure Synapse Analytics. The focus of this article is best practices for Microsoft and third-party tools.
## Data warehouse migration tools
By migrating your existing data warehouse to Azure Synapse, you benefit from:

- A globally secure, scalable, low-cost, cloud-native, pay-as-you-use analytical database.

- The rich Microsoft analytical ecosystem that exists on Azure. This ecosystem consists of technologies to help modernize your data warehouse once it's migrated and extend your analytical capabilities to drive new value.

Several tools from both Microsoft and [third-party partners](../../partner/data-integration.md) can help you migrate your existing data warehouse to Azure Synapse. This article discusses the following types of tools:

- Microsoft data and database migration tools.

- Third-party data warehouse migration tools to migrate schema and data to Azure Synapse.

- Third-party tools to bridge the SQL differences between your existing data warehouse DBMS and Azure Synapse.
## Microsoft data migration tools
Microsoft offers several tools to help you migrate your existing data warehouse to Azure Synapse, such as:

- [Azure Data Factory](../../../data-factory/introduction.md).

- Microsoft services for physical data transfer.

- Microsoft services for data ingestion.

The next sections discuss these tools in more detail.

### Microsoft Azure Data Factory
Data Factory is a fully managed, pay-as-you-use, hybrid data integration service for highly scalable ETL and ELT processing. It uses Apache Spark to process and analyze data in parallel and in-memory to maximize throughput.
> [!TIP]
> Data Factory allows you to build scalable data integration pipelines code-free.
[Data Factory connectors](../../../data-factory/connector-overview.md) support connections to external data sources and databases and include templates for common data integration tasks. A visual front-end, browser-based UI enables non-programmers to create and run [pipelines](../../data-explorer/ingest-dat) to ingest, transform, and load data. More experienced programmers can incorporate custom code, such as Python programs.
> [!TIP]
> Data Factory enables collaborative development between business and IT professionals.
Data Factory is also an orchestration tool, and it's the best Microsoft tool to automate the end-to-end migration process. Automation reduces the risk, effort, and time to migrate, and makes the migration process easily repeatable. The following diagram shows a mapping data flow in Data Factory.

The next screenshot shows a wrangling data flow in Data Factory.
In Data Factory, you can develop simple or comprehensive ETL and ELT processes without coding or maintenance with just a few clicks. ETL/ELT processes ingest, move, prepare, transform, and process your data. You can design and manage scheduling and triggers in Data Factory to build an automated data integration and loading environment. In Data Factory, you can define, manage, and schedule PolyBase bulk data load processes.
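A scheduled bulk-load step ultimately executes a load statement against Azure Synapse. The following sketch builds a `COPY INTO` statement (one option alongside PolyBase external tables) that a pipeline or script could then run. The table name, storage URL, and the `<SAS-token>` placeholder are illustrative assumptions, not values from this article.

```python
# Sketch: build a Synapse COPY INTO statement for bulk loading staged,
# delimited files. All names and the SAS token are placeholders; execute the
# resulting statement via your pipeline or a database driver such as pyodbc.

def build_copy_statement(target_table, storage_url):
    return (
        f"COPY INTO {target_table} "
        f"FROM '{storage_url}' "
        "WITH (FILE_TYPE = 'CSV', FIELDTERMINATOR = '|', "
        "CREDENTIAL = (IDENTITY = 'Shared Access Signature', SECRET = '<SAS-token>'))"
    )

stmt = build_copy_statement(
    "dbo.fact_sales",
    "https://myaccount.blob.core.windows.net/staging/fact_sales/",
)
print(stmt)
```

Generating the statement from parameters keeps the load step repeatable across tables and environments, which suits an automated, scheduled loading pipeline.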
> [!TIP]
> Data Factory includes tools to help migrate both your data and your entire data warehouse to Azure.
You can use Data Factory to implement and manage a hybrid environment with on-premises, cloud, streaming, and SaaS data in a secure and consistent way. SaaS data might come from applications such as Salesforce.
Wrangling data flows is a new capability in Data Factory. This capability opens up Data Factory to business users who want to visually discover, explore, and prepare data at scale without writing code. Wrangling data flows offer self-service data preparation, similar to Microsoft Excel Power Query or Microsoft Power BI dataflows. Business users can prepare and integrate data through a spreadsheet-style UI with drop-down transform options.
Data Factory is the recommended approach for implementing data integration and ETL/ELT processes in the Azure Synapse environment, especially if you want to refactor existing legacy processes.
### Microsoft services for physical data transfer
-> [!TIP]
-> Microsoft offers a range of products and services to assist with data transfer.
+The following sections discuss a range of products and services that Microsoft offers to assist customers with data transfer.
#### Azure ExpressRoute
-Azure ExpressRoute creates private connections between Azure data centers and infrastructure on your premises or in a collocation environment. ExpressRoute connections don't go over the public internet, and they offer more reliability, faster speeds, and lower latencies than typical internet connections. In some cases, by using ExpressRoute connections to transfer data between on-premises systems and Azure, you gain significant cost benefits.
+[Azure ExpressRoute](../../../expressroute/expressroute-introduction.md) creates private connections between Azure data centers and infrastructure on your premises or in a collocation environment. ExpressRoute connections don't go over the public internet, and offer more reliability, faster speeds, and lower latencies than typical internet connections. In some cases, you gain significant cost benefits by using ExpressRoute connections to transfer data between on-premises systems and Azure.
#### AzCopy
-[AzCopy](../../../storage/common/storage-use-azcopy-v10.md) is a command line utility that copies files to Azure Blob Storage via a standard internet connection. In a warehouse migration project, you can use AzCopy to upload extracted, compressed, and delimited text files before loading through PolyBase, or a native Parquet reader if the exported files are Parquet format. AzCopy can upload individual files, file selections, or file directories.
+[AzCopy](../../../storage/common/storage-use-azcopy-v10.md) is a command line utility that copies files to Azure Blob Storage over a standard internet connection. In a warehouse migration project, you can use AzCopy to upload extracted, compressed, delimited text files before loading them into Azure Synapse using [PolyBase](#polybase). AzCopy can upload individual files, file selections, or file folders. If the exported files are in Parquet format, use a native Parquet reader instead.
#### Azure Data Box
-Microsoft offers a service called Azure Data Box. This service writes data to be migrated to a physical storage device. This device is then shipped to an Azure data center and loaded into cloud storage. The service can be cost-effective for large volumes of data&mdash;for example, tens or hundreds of terabytes&mdash;or where network bandwidth isn't readily available. Azure Data Box is typically used for one-off historical data load when migrating a large amount of data to Azure Synapse.
+[Azure Data Box](../../../databox/data-box-overview.md) is a Microsoft service that provides you with a proprietary physical storage device that you can copy migration data onto. You then ship the device to an Azure data center for data upload to cloud storage. This service can be cost-effective for large volumes of data, such as tens or hundreds of terabytes, or where network bandwidth isn't readily available. Azure Data Box is typically used for a large one-off historical data load into Azure Synapse.
-Another service is Data Box Gateway, a virtualized cloud storage gateway device that resides on your premises and sends your images, media, and other data to Azure. Use Data Box Gateway for one-off migration tasks or ongoing incremental data uploads.
+#### Azure Data Box Gateway
+
+[Azure Data Box Gateway](../../../databox-gateway/data-box-gateway-overview.md) is a virtualized cloud storage gateway device that resides on your premises and sends your images, media, and other data to Azure. Use Data Box Gateway for one-off migration tasks or ongoing incremental data uploads.
### Microsoft services for data ingestion
+The following sections discuss the products and services that Microsoft offers to assist customers with data ingestion.
+ #### COPY INTO
-The [COPY](/sql/t-sql/statements/copy-into-transact-sql) statement provides the most flexibility for high-throughput data ingestion into Azure Synapse Analytics. Refer to the list of capabilities that `COPY` offers for data ingestion.
+The [COPY INTO](/sql/t-sql/statements/copy-into-transact-sql#syntax) statement provides the most flexibility for high-throughput data ingestion into Azure Synapse. For more information about `COPY INTO` capabilities, see [COPY (Transact-SQL)](/sql/t-sql/statements/copy-into-transact-sql).
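A minimal sketch of a `COPY INTO` load from Blob Storage might look like the following. The table, storage account, container, and SAS token shown are placeholders for your own environment, not values from this article.

```sql
-- Sketch only: dbo.FactSales, the storage URL, and the SAS secret are assumed names.
COPY INTO dbo.FactSales
FROM 'https://myaccount.blob.core.windows.net/staging/sales/*.csv'
WITH (
    FILE_TYPE = 'CSV',
    CREDENTIAL = (IDENTITY = 'Shared Access Signature', SECRET = '<sas-token>'),
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '0x0A',
    FIRSTROW = 2  -- skip the header row
);
```

Because `COPY INTO` takes the credential inline, it avoids the external object setup that PolyBase requires, which is part of the flexibility the statement offers.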
#### PolyBase
-> [!TIP]
-> PolyBase can load data in parallel from Azure Blob Storage into Azure Synapse.
+[PolyBase](../../sql/load-data-overview.md) is the fastest, most scalable method for bulk data load into Azure Synapse. PolyBase uses the massively parallel processing (MPP) architecture of Azure Synapse for parallel loading of data to achieve the fastest throughput. PolyBase can read data from flat files in Azure Blob Storage, or directly from external data sources and other relational databases via connectors.
-PolyBase provides the fastest and most scalable method of loading bulk data into Azure Synapse. PolyBase leverages the MPP architecture to use parallel loading, to give the fastest throughput, and can read data from flat files in Azure Blob Storage or directly from external data sources and other relational databases via connectors.
+>[!TIP]
+>PolyBase can load data in parallel from Azure Blob Storage into Azure Synapse.
-PolyBase can also directly read from files compressed with gzip&mdash;this reduces the physical volume of data moved during the load process. PolyBase supports popular data formats such as delimited text, ORC, and Parquet.
+PolyBase can also directly read from files compressed with gzip to reduce the physical volume of data during a load process. PolyBase supports popular data formats such as delimited text, ORC, and Parquet.
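Reading gzip-compressed delimited files with PolyBase is done through external objects. The following is an illustrative sketch; the data source location, schema, and object names are assumptions, and depending on your dedicated SQL pool configuration a database-scoped credential may also be required.

```sql
-- Sketch only: StagingBlobStore, GzipDelimitedText, and ext.Sales are assumed names.
CREATE EXTERNAL DATA SOURCE StagingBlobStore
WITH (
    TYPE = HADOOP,
    LOCATION = 'wasbs://staging@myaccount.blob.core.windows.net'
);

CREATE EXTERNAL FILE FORMAT GzipDelimitedText
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = '|'),
    DATA_COMPRESSION = 'org.apache.hadoop.io.compress.GzipCodec'  -- read .gz files directly
);

CREATE EXTERNAL TABLE ext.Sales
(
    SaleId   INT,
    SaleDate DATE,
    Amount   DECIMAL(18, 2)
)
WITH (
    LOCATION = '/sales/',
    DATA_SOURCE = StagingBlobStore,
    FILE_FORMAT = GzipDelimitedText
);
```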
-> [!TIP]
-> Invoke PolyBase from Azure Data Factory as part of a migration pipeline.
+>[!TIP]
+>You can invoke PolyBase from Data Factory as part of a migration pipeline.
-PolyBase is tightly integrated with Azure Data Factory to enable data load ETL/ELT processes to be rapidly developed and scheduled through a visual GUI, leading to higher productivity and fewer errors than hand-written code.
+PolyBase is tightly integrated with Data Factory to support rapid development of data load ETL/ELT processes. You can schedule data load processes through a visual UI for higher productivity and fewer errors than hand-written code. Microsoft recommends PolyBase for data ingestion into Azure Synapse, especially for high-volume data ingestion.
-PolyBase is the recommended data load method for Azure Synapse, especially for high-volume data. PolyBase loads data using the `CREATE TABLE AS` or `INSERT...SELECT` statements&mdash;CTAS achieves the highest possible throughput as it minimizes the amount of logging required. Compressed delimited text files are the most efficient input format. For maximum throughput, split very large input files into multiple smaller files and load these in parallel. For fastest loading to a staging table, define the target table as type `HEAP` and use round-robin distribution.
+PolyBase uses `CREATE TABLE AS` or `INSERT...SELECT` statements to load data. `CREATE TABLE AS` minimizes logging to achieve the highest throughput. The most efficient input format for data load is compressed delimited text files. For maximum throughput, split large input files into multiple smaller files and load them in parallel. For fastest loading to a staging table, define the target table as `HEAP` type and use round-robin distribution.
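A CTAS load into a staging table following those recommendations might be sketched as below. The external table `ext.Sales` and target name are assumptions standing in for your own PolyBase objects.

```sql
-- Sketch only: load an external table into a round-robin distributed heap
-- staging table for maximum throughput with minimal logging.
CREATE TABLE dbo.Stage_Sales
WITH (
    HEAP,
    DISTRIBUTION = ROUND_ROBIN
)
AS
SELECT *
FROM ext.Sales;
```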
-However, PolyBase has some limitations. Rows to be loaded must be less than 1 MB in length. Fixed-width format or nested data, such as JSON and XML, aren't directly readable.
+PolyBase has some limitations: rows to be loaded must be less than 1 MB in length, and fixed-width format and nested data, such as JSON and XML, aren't directly readable.
-## Microsoft partners can help you migrate your data warehouse to Azure Synapse Analytics
+### Microsoft partners for Teradata migrations
-In addition to tools that can help you with various aspects of data warehouse migration, there are several practiced [Microsoft partners](../../partner/data-integration.md) that can bring their expertise to help you move your legacy on-premises data warehouse platform to Azure Synapse.
+[Microsoft partners](../../partner/data-integration.md) offer tools, services, and expertise to help you migrate your legacy on-premises data warehouse platform to Azure Synapse.
## Next steps
-To learn more about implementing modern data warehouses, see the next article in this series: [Beyond Teradata migration, implementing a modern data warehouse in Microsoft Azure](7-beyond-data-warehouse-migration.md).
+To learn more about implementing modern data warehouses, see the next article in this series: [Beyond Teradata migration, implement a modern data warehouse in Microsoft Azure](7-beyond-data-warehouse-migration.md).
synapse-analytics 7 Beyond Data Warehouse Migration https://github.com/MicrosoftDocs/azure-docs/commits/main/articles/synapse-analytics/migration-guides/teradata/7-beyond-data-warehouse-migration.md
Title: "Beyond Teradata migration, implementing a modern data warehouse in Microsoft Azure"
+ Title: "Beyond Teradata migration, implement a modern data warehouse in Microsoft Azure"
description: Learn how a Teradata migration to Azure Synapse Analytics lets you integrate your data warehouse with the Microsoft Azure analytical ecosystem.
Previously updated : 05/31/2022 Last updated : 07/12/2022
-# Beyond Teradata migration, implementing a modern data warehouse in Microsoft Azure
+# Beyond Teradata migration, implement a modern data warehouse in Microsoft Azure
-This article is part seven of a seven part series that provides guidance on how to migrate from Teradata to Azure Synapse Analytics. This article provides best practices for implementing modern data warehouses.
+This article is part seven of a seven-part series that provides guidance on how to migrate from Teradata to Azure Synapse Analytics. The focus of this article is best practices for implementing modern data warehouses.
## Beyond data warehouse migration to Azure
-One of the key reasons to migrate your existing data warehouse to Azure Synapse Analytics is to utilize a globally secure, scalable, low-cost, cloud-native, pay-as-you-use analytical database. Azure Synapse also lets you integrate your migrated data warehouse with the complete Microsoft Azure analytical ecosystem to take advantage of, and integrate with, other Microsoft technologies that help you modernize your migrated data warehouse. This includes integrating with technologies like:
+A key reason to migrate your existing data warehouse to Azure Synapse Analytics is to utilize a globally secure, scalable, low-cost, cloud-native, pay-as-you-use analytical database. With Azure Synapse, you can integrate your migrated data warehouse with the complete Microsoft Azure analytical ecosystem to take advantage of other Microsoft technologies and modernize your migrated data warehouse. Those technologies include:
-- Azure Data Lake Storage for cost effective data ingestion, staging, cleansing, and transformation, to free up data warehouse capacity occupied by fast growing staging tables.
+- [Azure Data Lake Storage](../../../storage/blobs/data-lake-storage-introduction.md) for cost effective data ingestion, staging, cleansing, and transformation. Data Lake Storage can free up the data warehouse capacity occupied by fast-growing staging tables.
-- Azure Data Factory for collaborative IT and self-service data integration [with connectors](../../../data-factory/connector-overview.md) to cloud and on-premises data sources and streaming data.
+- [Azure Data Factory](../../../data-factory/introduction.md) for collaborative IT and self-service data integration with [connectors](../../../data-factory/connector-overview.md) to cloud and on-premises data sources and streaming data.
-- [The Open Data Model Common Data Initiative](/common-data-model/) to share consistent trusted data across multiple technologies, including:
+- [Common Data Model](/common-data-model/) to share consistent trusted data across multiple technologies, including:
  - Azure Synapse
  - Azure Synapse Spark
  - Azure HDInsight
  - Power BI
- - SAP
  - Adobe Customer Experience Platform
  - Azure IoT
- - Microsoft ISV Partners
+ - Microsoft ISV partners
-- [Microsoft's data science technologies](/azure/architecture/data-science-process/platforms-and-tools), including:
- - Azure Machine Learning Studio
+- Microsoft [data science technologies](/azure/architecture/data-science-process/platforms-and-tools), including:
+ - Azure Machine Learning studio
  - Azure Machine Learning
  - Azure Synapse Spark (Spark as a service)
  - Jupyter Notebooks
  - RStudio
  - ML.NET
- - .NET for Apache Spark to enable data scientists to use Azure Synapse data to train machine learning models at scale.
+ - .NET for Apache Spark, which lets data scientists use Azure Synapse data to train machine learning models at scale.
-- [Azure HDInsight](../../../hdinsight/index.yml) to leverage big data analytical processing and join big data with Azure Synapse data by creating a logical data warehouse using PolyBase.
+- [Azure HDInsight](../../../hdinsight/index.yml) to process large amounts of data, and to join big data with Azure Synapse data by creating a logical data warehouse using PolyBase.
-- [Azure Event Hubs](../../../event-hubs/event-hubs-about.md), [Azure Stream Analytics](../../../stream-analytics/stream-analytics-introduction.md), and [Apache Kafka](/azure/databricks/spark/latest/structured-streaming/kafka) to integrate with live streaming data from Azure Synapse.
+- [Azure Event Hubs](../../../event-hubs/event-hubs-about.md), [Azure Stream Analytics](../../../stream-analytics/stream-analytics-introduction.md), and [Apache Kafka](/azure/databricks/spark/latest/structured-streaming/kafka) to integrate live streaming data from Azure Synapse.
-There's often acute demand to integrate with [machine learning](../../machine-learning/what-is-machine-learning.md) to enable custom-built, trained machine learning models for use in Azure Synapse. This would enable in-database analytics to run at scale in-batch, on an event-driven basis and on-demand. The ability to exploit in-database analytics in Azure Synapse from multiple BI tools and applications also guarantees that all get the same predictions and recommendations.
+The growth of big data has led to an acute demand for [machine learning](../../machine-learning/what-is-machine-learning.md) to enable custom-built, trained machine learning models for use in Azure Synapse. Machine learning models enable in-database analytics to run at scale in batch, on an event-driven basis, and on demand. The ability to take advantage of in-database analytics in Azure Synapse from multiple BI tools and applications also guarantees consistent predictions and recommendations.
-In addition, there's an opportunity to integrate Azure Synapse with Microsoft partner tools on Azure to shorten time to value.
+In addition, you can integrate Azure Synapse with Microsoft partner tools on Azure to shorten time to value.
-Let's look at these in more detail to understand how you can take advantage of the technologies in Microsoft's analytical ecosystem to modernize your data warehouse once you've migrated to Azure Synapse.
+Let's take a closer look at how you can take advantage of technologies in the Microsoft analytical ecosystem to modernize your data warehouse after you've migrated to Azure Synapse.
-## Offload data staging and ETL processing to Azure Data Lake and Azure Data Factory
+## Offload data staging and ETL processing to Data Lake Storage and Data Factory
-Enterprises today have a key problem resulting from digital transformation. So much new data is being generated and captured for analysis, and much of this data is finding its way into data warehouses. A good example is transaction data created by opening OLTP systems to self-service access from mobile devices. These OLTP systems are the main sources of data to a data warehouse, and with customers now driving the transaction rate rather than employees, data in data warehouse staging tables has been growing rapidly in volume.
+Digital transformation has created a key challenge for enterprises by generating a torrent of new data for capture and analysis. A good example is transaction data created by opening online transactional processing (OLTP) systems to self-service access from mobile devices. Much of this data finds its way into data warehouses, and OLTP systems are the main source. With customers now driving the transaction rate rather than employees, the volume of data in data warehouse staging tables has been growing rapidly.
-The rapid influx of data into the enterprise, along with new sources of data like Internet of Things (IoT) streams, means that companies need to find a way to deal with unprecedented data growth and scale data integration ETL processing beyond current levels. One way to do this is to offload ingestion, data cleansing, transformation, and integration to a data lake and process it at scale there, as part of a data warehouse modernization program.
+With the rapid influx of data into the enterprise, along with new sources of data like Internet of Things (IoT), companies must find ways to scale up data integration ETL processing. One method is to offload ingestion, data cleansing, transformation, and integration to a data lake and process data at scale there, as part of a data warehouse modernization program.
-Once you've migrated your data warehouse to Azure Synapse, Microsoft provides the ability to modernize your ETL processing by ingesting data into, and staging data in, Azure Data Lake Storage. You can then clean, transform and integrate your data at scale using Data Factory before loading it into Azure Synapse in parallel using PolyBase.
+Once you've migrated your data warehouse to Azure Synapse, you can modernize your ETL processing by ingesting and staging data in Data Lake Storage. You can then clean, transform, and integrate your data at scale using Data Factory before loading it into Azure Synapse in parallel using PolyBase.
-For ELT strategies, consider offloading ELT processing to Azure Data Lake to easily scale as your data volume or frequency grows.
+For ELT strategies, consider offloading ELT processing to Data Lake Storage to easily scale as your data volume or frequency grows.
### Microsoft Azure Data Factory
-> [!TIP]
-> Data Factory allows you to build scalable data integration pipelines code-free.
+[Azure Data Factory](../../../data-factory/introduction.md) is a pay-as-you-use, hybrid data integration service for highly scalable ETL and ELT processing. Data Factory provides a web-based UI to build data integration pipelines with no code. With Data Factory, you can:
-[Data Factory](https://azure.microsoft.com/services/data-factory/) is a pay-as-you-use, hybrid data integration service for highly scalable ETL and ELT processing. Data Factory provides a simple web-based user interface to build data integration pipelines in a code-free manner that can:
+- Build scalable data integration pipelines code-free.
-- Build scalable data integration pipelines code-free. Easily acquire data at scale. Pay only for what you use, and connect to on-premises, cloud, and SaaS-based data sources.
+- Easily acquire data at scale.
-- Ingest, move, clean, transform, integrate, and analyze cloud and on-premises data at scale. Take automatic action, such as a recommendation or alert.
+- Pay only for what you use.
+
+- Connect to on-premises, cloud, and SaaS-based data sources.
+
+- Ingest, move, clean, transform, integrate, and analyze cloud and on-premises data at scale.
- Seamlessly author, monitor, and manage pipelines that span data stores both on-premises and in the cloud.

- Enable pay-as-you-go scale-out in alignment with customer growth.
-> [!TIP]
-> Data Factory can connect to on-premises, cloud, and SaaS data.
+You can use these features without writing any code, or you can add custom code to Data Factory pipelines. The following screenshot shows an example Data Factory pipeline.
-All of this can be done without writing any code. However, adding custom code to Data Factory pipelines is also supported. The next screenshot shows an example Data Factory pipeline.
+>[!TIP]
+>Data Factory lets you build scalable data integration pipelines without code.
-> [!TIP]
-> Pipelines called data factories control the integration and analysis of data. Data Factory is enterprise-class data integration software aimed at IT professionals with a data wrangling facility for business users.
+Implement Data Factory pipeline development from any of several places, including:
-Implement Data Factory pipeline development from any of several places including:
+- Microsoft Azure portal.
-- Microsoft Azure portal
+- Microsoft Azure PowerShell.
-- Microsoft Azure PowerShell
+- Programmatically from .NET and Python using a multi-language SDK.
-- Programmatically from .NET and Python using a multi-language SDK
+- Azure Resource Manager (ARM) templates.
-- Azure Resource Manager (ARM) templates
+- REST APIs.
-- REST APIs
+>[!TIP]
+>Data Factory can connect to on-premises, cloud, and SaaS data.
-Developers and data scientists who prefer to write code can easily author Data Factory pipelines in Java, Python, and .NET using the software development kits (SDKs) available for those programming languages. Data Factory pipelines can also be hybrid since they can connect, ingest, clean, transform, and analyze data in on-premises data centers, Microsoft Azure, other clouds, and SaaS offerings.
+Developers and data scientists who prefer to write code can easily author Data Factory pipelines in Java, Python, and .NET using the software development kits (SDKs) available for those programming languages. Data Factory pipelines can be hybrid data pipelines because they can connect, ingest, clean, transform, and analyze data in on-premises data centers, Microsoft Azure, other clouds, and SaaS offerings.
-Once you develop Data Factory pipelines to integrate and analyze data, deploy those pipelines globally and schedule them to run in batch, invoke them on demand as a service, or run them in real-time on an event-driven basis. A Data Factory pipeline can also run on one or more execution engines and monitor pipeline execution to ensure performance and track errors.
+After you develop Data Factory pipelines to integrate and analyze data, you can deploy those pipelines globally and schedule them to run in batch, invoke them on demand as a service, or run them in real-time on an event-driven basis. A Data Factory pipeline can also run on one or more execution engines and monitor execution to ensure performance and to track errors.
-#### Use cases
+>[!TIP]
+>In Azure Data Factory, pipelines control the integration and analysis of data. Data Factory is enterprise-class data integration software aimed at IT professionals and has data wrangling capability for business users.
-> [!TIP]
-> Build data warehouses on Microsoft Azure.
+#### Use cases
-Data Factory can support multiple use cases, including:
+Data Factory supports multiple use cases, such as:
-- Preparing, integrating, and enriching data from cloud and on-premises data sources to populate your migrated data warehouse and data marts on Microsoft Azure Synapse.
+- Prepare, integrate, and enrich data from cloud and on-premises data sources to populate your migrated data warehouse and data marts on Microsoft Azure Synapse.
-- Preparing, integrating, and enriching data from cloud and on-premises data sources to produce training data for use in machine learning model development and in retraining analytical models.
+- Prepare, integrate, and enrich data from cloud and on-premises data sources to produce training data for use in machine learning model development and in retraining analytical models.
-- Orchestrating data preparation and analytics to create predictive and prescriptive analytical pipelines for processing and analyzing data in batch, such as sentiment analytics, and either acting on the results of the analysis or populating your data warehouse with the results.
+- Orchestrate data preparation and analytics to create predictive and prescriptive analytical pipelines for processing and analyzing data in batch, such as sentiment analytics. Either act on the results of the analysis or populate your data warehouse with the results.
-- Preparing, integrating, and enriching data for data-driven business applications running on the Azure cloud on top of operational data stores like Azure Cosmos DB.
+- Prepare, integrate, and enrich data for data-driven business applications running on the Azure cloud on top of operational data stores such as Azure Cosmos DB.
-> [!TIP]
-> Build training data sets in data science to develop machine learning models.
+>[!TIP]
+>Build training data sets in data science to develop machine learning models.
#### Data sources
Data Factory lets you use [connectors](../../../data-factory/connector-overview.
#### Transform data using Azure Data Factory
-> [!TIP]
-> Professional ETL developers can use Azure Data Factory mapping data flows to clean, transform, and integrate data without the need to write code.
+Within a Data Factory pipeline, you can ingest, clean, transform, integrate, and analyze any type of data from these sources. Data can be structured, semi-structured like JSON or Avro, or unstructured.
-Within a Data Factory pipeline, ingest, clean, transform, integrate, and, if necessary, analyze any type of data from these sources. This includes structured, semi-structured such as JSON or Avro, and unstructured data.
+Without writing any code, professional ETL developers can use Data Factory mapping data flows to filter, split, join (several join types), lookup, pivot, unpivot, sort, union, and aggregate data. In addition, Data Factory supports surrogate keys, multiple write processing options like insert, upsert, update, table recreation, and table truncation, and several types of target data stores, also known as sinks. ETL developers can also create aggregations, including time-series aggregations that require a window to be placed on data columns.
-Professional ETL developers can use Data Factory mapping data flows to filter, split, join (many types), lookup, pivot, unpivot, sort, union, and aggregate data without writing any code. In addition, Data Factory supports surrogate keys, multiple write processing options such as insert, upsert, update, table recreation, and table truncation, and several types of target data stores&mdash;also known as sinks. ETL developers can also create aggregations, including time-series aggregations that require a window to be placed on data columns.
+>[!TIP]
+>Professional ETL developers can use Data Factory mapping data flows to clean, transform, and integrate data without the need to write code.
-> [!TIP]
-> Data Factory supports the ability to automatically detect and manage schema changes in inbound data, such as in streaming data.
+You can run mapping data flows that transform data as activities in a Data Factory pipeline, and if necessary, you can include multiple mapping data flows in a single pipeline. In this way, you can manage complexity by breaking up challenging data transformation and integration tasks into smaller mapping data flows that can be combined. And, you can add custom code when needed. In addition to this functionality, Data Factory mapping data flows include the ability to:
-Run mapping data flows that transform data as activities in a Data Factory pipeline. Include multiple mapping data flows in a single pipeline, if necessary. Break up challenging data transformation and integration tasks into smaller mapping dataflows that can be combined to handle the complexity and custom code added if necessary. In addition to this functionality, Data Factory mapping data flows include these abilities:
+- Define expressions to clean and transform data, compute aggregations, and enrich data. For example, these expressions can perform feature engineering on a date field to break it into multiple fields to create training data during machine learning model development. You can construct expressions from a rich set of functions that include mathematical, temporal, split, merge, string concatenation, conditions, pattern match, replace, and many other functions.
-- Define expressions to clean and transform data, compute aggregations, and enrich data. For example, these expressions can perform feature engineering on a date field to break it into multiple fields to create training data during machine learning model development. Construct expressions from a rich set of functions that include mathematical, temporal, split, merge, string concatenation, conditions, pattern match, replace, and many other functions.--- Automatically handle schema drift so that data transformation pipelines can avoid being impacted by schema changes in data sources. This is especially important for streaming IoT data, where schema changes can happen without notice when devices are upgraded or when readings are missed by gateway devices collecting IoT data.
+- Automatically handle schema drift so that data transformation pipelines can avoid being impacted by schema changes in data sources. This ability is especially important for streaming IoT data, where schema changes can happen without notice if devices are upgraded or when readings are missed by gateway devices collecting IoT data.
- Partition data to enable transformations to run in parallel at scale.
-- Inspect data to view the metadata of a stream you're transforming.
+- Inspect streaming data to view the metadata of a stream you're transforming.
+
+>[!TIP]
+>Data Factory supports the ability to automatically detect and manage schema changes in inbound data, such as in streaming data.
-> [!TIP]
-> Data Factory can also partition data to enable ETL processing to run at scale.
+The following screenshot shows an example Data Factory mapping data flow.
-The next screenshot shows an example Data Factory mapping data flow.
+Data engineers can profile data quality and view the results of individual data transforms by enabling debug capability during development.
-Data engineers can profile data quality and view the results of individual data transforms by switching on a debug capability during development.
+>[!TIP]
+>Data Factory can also partition data to enable ETL processing to run at scale.
-> [!TIP]
-> Data Factory pipelines are also extensible since Data Factory allows you to write your own code and run it as part of a pipeline.
+If necessary, you can extend Data Factory transformational and analytical functionality by adding a linked service that contains your code into a pipeline. For example, an Azure Synapse Spark pool notebook might contain Python code that uses a trained model to score the data integrated by a mapping data flow.
-Extend Data Factory transformational and analytical functionality by adding a linked service containing your own code into a pipeline. For example, an Azure Synapse Spark pool notebook containing Python code could use a trained model to score the data integrated by a mapping data flow.
+You can store integrated data and any results from analytics within a Data Factory pipeline in one or more data stores, such as Data Lake Storage, Azure Synapse, or Hive tables in HDInsight. You can also invoke other activities to act on insights produced by a Data Factory analytical pipeline.
-Store integrated data and any results from analytics included in a Data Factory pipeline in one or more data stores such as Azure Data Lake Storage, Azure Synapse, or Azure HDInsight (Hive tables). Invoke other activities to act on insights produced by a Data Factory analytical pipeline.
+>[!TIP]
+>Data Factory pipelines are extensible because Data Factory lets you write your own code and run it as part of a pipeline.
#### Utilize Spark to scale data integration
-Internally, Data Factory utilizes Azure Synapse Spark Pools&mdash;Microsoft's Spark-as-a-service offering&mdash;at run time to clean and integrate data on the Microsoft Azure cloud. This enables it to clean, integrate, and analyze high-volume and very high-velocity data (such as click stream data) at scale. Microsoft intends to execute Data Factory pipelines on other Spark distributions. In addition to executing ETL jobs on Spark, Data Factory can also invoke Pig scripts and Hive queries to access and transform data stored in Azure HDInsight.
+At run time, Data Factory internally uses Azure Synapse Spark pools, which are Microsoft's Spark as a service offering, to clean and integrate data in the Azure cloud. You can clean, integrate, and analyze high-volume, high-velocity data, such as click-stream data, at scale. Microsoft's intention is to also run Data Factory pipelines on other Spark distributions. In addition to running ETL jobs on Spark, Data Factory can invoke Pig scripts and Hive queries to access and transform data stored in HDInsight.
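Conceptually, the Spark-based scale-out amounts to partitioning the input and transforming the partitions in parallel. A rough standard-library stand-in (the `clean` transform is hypothetical, not a Data Factory or Spark API):

```python
from concurrent.futures import ThreadPoolExecutor

def clean(record):
    # A stand-in transform: trim whitespace and normalize case.
    return {k: v.strip().lower() for k, v in record.items()}

def run_partitioned(records, partitions=4):
    """Split the input into partitions and clean each partition in
    parallel, loosely mirroring how a Spark job distributes ETL work."""
    chunks = [records[i::partitions] for i in range(partitions)]
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        results = list(pool.map(lambda chunk: [clean(r) for r in chunk],
                                chunks))
    return [row for chunk in results for row in chunk]

data = [{"city": "  Seattle "}, {"city": "LONDON"}]
cleaned = run_partitioned(data)
```

In a real Spark job, the partitions live on different executors and the data volumes are far larger, but the shape of the computation is the same.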
#### Link self-service data prep and Data Factory ETL processing using wrangling data flows
-> [!TIP]
-> Data Factory support for wrangling data flows in addition to mapping data flows means that business and IT can work together on a common platform to integrate data.
+Data wrangling lets business users, also known as citizen data integrators and data engineers, make use of the platform to visually discover, explore, and prepare data at scale without writing code. This Data Factory capability is easy to use and is similar to Microsoft Excel Power Query or Microsoft Power BI dataflows, where self-service business users use a spreadsheet-style UI with drop-down transforms to prepare and integrate data. The following screenshot shows an example Data Factory wrangling data flow.
-Another new capability in Data Factory is wrangling data flows. This lets business users (also known as citizen data integrators and data engineers) make use of the platform to visually discover, explore, and prepare data at scale without writing code. This easy-to-use Data Factory capability is similar to Microsoft Excel Power Query or Microsoft Power BI dataflows, where self-service data preparation business users use a spreadsheet-style UI with drop-down transforms to prepare and integrate data. The following screenshot shows an example Data Factory wrangling data flow.
+Unlike Excel and Power BI, Data Factory [wrangling data flows](../../../data-factory/wrangling-tutorial.md) use Power Query to generate M code and then translate it into a massively parallel in-memory Spark job for cloud-scale execution. The combination of mapping data flows and wrangling data flows in Data Factory lets professional ETL developers and business users collaborate to prepare, integrate, and analyze data for a common business purpose. The preceding Data Factory mapping data flows diagram shows how both Data Factory and Azure Synapse Spark pool notebooks can be combined in the same Data Factory pipeline. The combination of mapping and wrangling data flows in Data Factory helps IT and business users stay aware of what data flows each has created, and supports data flow reuse to minimize reinvention and maximize productivity and consistency.
-This differs from Excel and Power BI, as Data Factory [wrangling data flows](../../../data-factory/wrangling-tutorial.md) use Power Query to generate M code and translate it into a massively parallel in-memory Spark job for cloud-scale execution. The combination of mapping data flows and wrangling data flows in Data Factory lets IT professional ETL developers and business users collaborate to prepare, integrate, and analyze data for a common business purpose. The preceding Data Factory mapping data flow diagram shows how both Data Factory and Azure Synapse Spark pool notebooks can be combined in the same Data Factory pipeline. This allows IT and business to be aware of what each has created. Mapping data flows and wrangling data flows can then be available for reuse to maximize productivity and consistency and minimize reinvention.
+>[!TIP]
+>Data Factory supports both wrangling data flows and mapping data flows, so business users and IT users can integrate data collaboratively on a common platform.
#### Link data and analytics in analytical pipelines
-In addition to cleaning and transforming data, Data Factory can combine data integration and analytics in the same pipeline. Use Data Factory to create both data integration and analytical pipelines&mdash;the latter being an extension of the former. Drop an analytical model into a pipeline so that clean, integrated data can be stored to provide predictions or recommendations. Act on this information immediately or store it in your data warehouse to provide you with new insights and recommendations that can be viewed in BI tools.
+In addition to cleaning and transforming data, Data Factory can combine data integration and analytics in the same pipeline. You can use Data Factory to create both data integration and analytical pipelines, the latter being an extension of the former. You can drop an analytical model into a pipeline to create an analytical pipeline that generates clean, integrated data for predictions or recommendations. Then, you can act on the predictions or recommendations immediately, or store them in your data warehouse to provide new insights and recommendations that can be viewed in BI tools.
-Models developed code-free with Azure Machine Learning Studio, or with the Azure Machine Learning SDK using Azure Synapse Spark pool notebooks or using R in RStudio, can be invoked as a service from within a Data Factory pipeline to batch score your data. Analysis happens at scale by executing Spark machine learning pipelines on Azure Synapse Spark pool notebooks.
+To batch score your data, you can develop an analytical model that you invoke as a service within a Data Factory pipeline. You can develop analytical models code-free with Azure Machine Learning studio, or with the Azure Machine Learning SDK using Azure Synapse Spark pool notebooks or R in RStudio. When you run Spark machine learning pipelines on Azure Synapse Spark pool notebooks, analysis happens at scale.
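The batch-scoring step can be sketched as applying a trained model to each record and attaching the prediction. The following is a minimal illustration only; the "model" is a hard-coded logistic function and the feature names are hypothetical, not an Azure Machine Learning API:

```python
import math

def score(features):
    """A stand-in 'trained model': a logistic function over two
    features that returns a churn probability."""
    weights = {"tenure_months": -0.08, "support_tickets": 0.6}
    bias = -0.5
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1 / (1 + math.exp(-z))

def batch_score(customers):
    # Score each record and attach the prediction, as a batch
    # scoring step in a pipeline might.
    return [{**c, "churn_probability": round(score(c["features"]), 3)}
            for c in customers]

batch = [
    {"customer_id": "C1", "features": {"tenure_months": 36, "support_tickets": 0}},
    {"customer_id": "C2", "features": {"tenure_months": 2, "support_tickets": 4}},
]
scored = batch_score(batch)
```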
-Store integrated data and any results from analytics included in a Data Factory pipeline in one or more data stores, such as Azure Data Lake Storage, Azure Synapse, or Azure HDInsight (Hive tables). Invoke other activities to act on insights produced by a Data Factory analytical pipeline.
+You can store integrated data and any Data Factory analytical pipeline results in one or more data stores, such as Data Lake Storage, Azure Synapse, or Hive tables in HDInsight. You can also invoke other activities to act on insights produced by a Data Factory analytical pipeline.
-## A lake database to share consistent trusted data
+## Use a lake database to share consistent trusted data
-> [!TIP]
-> Microsoft has created a lake database to describe core data entities to be shared across the enterprise.
+A key objective of any data integration setup is the ability to integrate data once and reuse it everywhere, not just in a data warehouse. For example, you might want to use integrated data in data science. Reuse avoids reinvention and ensures consistent, commonly understood data that everyone can trust.
-A key objective in any data integration setup is the ability to integrate data once and reuse it everywhere, not just in a data warehouse&mdash;for example, in data science. Reuse avoids reinvention and ensures consistent, commonly understood data that everyone can trust.
+[Common Data Model](/common-data-model/) describes core data entities that can be shared and reused across the enterprise. To achieve reuse, Common Data Model establishes a set of common data names and definitions that describe logical data entities. Examples of common data names include Customer, Account, Product, Supplier, Orders, Payments, and Returns. IT and business professionals can use data integration software to create and store common data assets to maximize their reuse and drive consistency everywhere.
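One way to picture a set of common data names and definitions is a small shared entity schema that every pipeline validates incoming records against. This is illustrative only; the field names are hypothetical and this is not the actual Common Data Model schema:

```python
from dataclasses import dataclass, fields

@dataclass
class Customer:
    """A shared 'Customer' entity definition reused by every pipeline."""
    customer_id: str
    name: str
    country: str

def conforms(record, entity):
    # Check that a raw record supplies every field the shared
    # entity definition requires.
    return all(f.name in record for f in fields(entity))

raw = {"customer_id": "C1", "name": "Contoso", "country": "US"}
ok = conforms(raw, Customer)
```

Because every team validates against the same definition, data landing in the lake stays consistent and commonly understood.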
-> [!TIP]
-> Azure Data Lake Storage is shared storage that underpins Microsoft Azure Synapse, Azure Machine Learning, Azure Synapse Spark, and Azure HDInsight.
+Azure Synapse provides industry-specific database templates to help standardize data in the lake. [Lake database templates](../../database-designer/concepts-database-templates.md) provide schemas for predefined business areas, enabling data to be loaded into a lake database in a structured way. The power comes when you use data integration software to create lake database common data assets, resulting in self-describing trusted data that can be consumed by applications and analytical systems. You can create common data assets in Data Lake Storage by using Data Factory.
-To achieve this goal, establish a set of common data names and definitions describing logical data entities that need to be shared across the enterprise&mdash;such as customer, account, product, supplier, orders, payments, returns, and so forth. Once this is done, IT and business professionals can use data integration software to create these common data assets and store them to maximize their reuse to drive consistency everywhere.
+>[!TIP]
+>Data Lake Storage is shared storage that underpins Microsoft Azure Synapse, Azure Machine Learning, Azure Synapse Spark, and HDInsight.
-> [!TIP]
-> Integrating data to create lake database logical entities in shared storage enables maximum reuse of common data assets.
+Power BI, Azure Synapse Spark, Azure Synapse, and Azure Machine Learning can consume common data assets. The following diagram shows how a lake database can be used in Azure Synapse.
-Microsoft has done this by creating a [lake database](../../database-designer/concepts-lake-database.md). The lake database is a common language for business entities that represents commonly used concepts and activities across a business. Azure Synapse Analytics provides industry specific database templates to help standardize data in the lake. [Lake database templates](../../database-designer/concepts-database-templates.md) provide schemas for predefined business areas, enabling data to be loaded into a lake database in a structured way. The power comes when data integration software is used to create lake database common data assets. This results in self-describing trusted data that can be consumed by applications and analytical systems. Create a lake database in Azure Data Lake Storage by using Azure Data Factory, and consume it with Power BI, Azure Synapse Spark, Azure Synapse, and Azure Machine Learning. The following diagram shows a lake database used in Azure Synapse Analytics.
+>[!TIP]
+>Integrate data to create lake database logical entities in shared storage to maximize the reuse of common data assets.
## Integration with Microsoft data science technologies on Azure
-Another key requirement in modernizing your migrated data warehouse is to integrate it with Microsoft and third-party data science technologies on Azure to produce insights for competitive advantage. Let's look at what Microsoft offers in terms of machine learning and data science technologies and see how these can be used with Azure Synapse in a modern data warehouse environment.
+Another key objective when modernizing a data warehouse is to produce insights for competitive advantage. You can produce insights by integrating your migrated data warehouse with Microsoft and third-party data science technologies in Azure. The following sections describe the machine learning and data science technologies that Microsoft offers and how they can be used with Azure Synapse in a modern data warehouse environment.
### Microsoft technologies for data science on Azure
-> [!TIP]
-> Develop machine learning models using a no/low-code approach or from a range of programming languages like Python, R, and .NET.
-
-Microsoft offers a range of technologies to build predictive analytical models using machine learning, analyze unstructured data using deep learning, and perform other kinds of advanced analytics. This includes:
+Microsoft offers a range of technologies that support advanced analytics. With these technologies, you can build predictive analytical models using machine learning or analyze unstructured data using deep learning. The technologies include:
-- Azure Machine Learning Studio
+- Azure Machine Learning studio
- Azure Machine Learning
Microsoft offers a range of technologies to build predictive analytical models u
- .NET for Apache Spark
-Data scientists can use RStudio (R) and Jupyter Notebooks (Python) to develop analytical models, or they can use other frameworks such as Keras or TensorFlow.
+Data scientists can use RStudio (R) and Jupyter Notebooks (Python) to develop analytical models, or they can use frameworks such as Keras or TensorFlow.
-#### Azure Machine Learning Studio
+>[!TIP]
+>Develop machine learning models using a no/low-code approach or by using programming languages like Python, R, and .NET.
-Azure Machine Learning Studio is a fully managed cloud service that lets you easily build, deploy, and share predictive analytics via a drag-and-drop web-based user interface. The next screenshot shows an Azure Machine Learning Studio user interface.
+#### Azure Machine Learning studio
+Azure Machine Learning studio is a fully managed cloud service that lets you build, deploy, and share predictive analytics using a drag-and-drop, web-based UI. The following screenshot shows the Azure Machine Learning studio UI.
+ #### Azure Machine Learning
-> [!TIP]
-> Azure Machine Learning provides an SDK for developing machine learning models using several open-source frameworks.
+Azure Machine Learning provides an SDK and services for Python that can help you quickly prepare data and train and deploy machine learning models. You can use Azure Machine Learning in Azure notebooks using Jupyter Notebook, with open-source frameworks such as PyTorch, TensorFlow, scikit-learn, or Spark MLlib&mdash;the machine learning library for Spark. Azure Machine Learning provides an AutoML capability that automatically tests multiple algorithms to identify the most accurate ones and expedite model development.
+
+>[!TIP]
+>Azure Machine Learning provides an SDK for developing machine learning models using several open-source frameworks.
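The AutoML idea of testing multiple algorithms and keeping the most accurate one can be sketched without any Azure dependency. The candidate "models" here are trivial threshold rules, purely for illustration:

```python
def make_threshold_model(threshold):
    # Each candidate model predicts True when the input meets a threshold.
    return lambda x: x >= threshold

def accuracy(model, examples):
    return sum(model(x) == label for x, label in examples) / len(examples)

def auto_select(candidates, validation):
    """Evaluate every candidate model and return the most accurate one,
    mirroring what an AutoML sweep does at a much larger scale."""
    return max(candidates, key=lambda m: accuracy(m, validation))

validation = [(0.2, False), (0.4, False), (0.7, True), (0.9, True)]
candidates = [make_threshold_model(t) for t in (0.1, 0.5, 0.8)]
best = auto_select(candidates, validation)
```

A real AutoML sweep varies whole algorithm families and hyperparameters, and evaluates with cross-validation rather than a single holdout set, but the select-the-best loop is the same.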
-Azure Machine Learning provides a software development kit (SDK) and services for Python to quickly prepare data, as well as train and deploy machine learning models. Use Azure Machine Learning from Azure notebooks (a Jupyter Notebook service) and utilize open-source frameworks, such as PyTorch, TensorFlow, Spark MLlib (Azure Synapse Spark pool notebooks), or scikit-learn. Azure Machine Learning provides an AutoML capability that automatically identifies the most accurate algorithms to expedite model development. You can also use it to build machine learning pipelines that manage end-to-end workflow, programmatically scale on the cloud, and deploy models both to the cloud and the edge. Azure Machine Learning uses logical containers called workspaces, which can be either created manually from the Azure portal or created programmatically. These workspaces keep compute targets, experiments, data stores, trained machine learning models, Docker images, and deployed services all in one place to enable teams to work together. Use Azure Machine Learning from Visual Studio with a Visual Studio for AI extension.
+You can also use Azure Machine Learning to build machine learning pipelines that manage end-to-end workflow, programmatically scale in the cloud, and deploy models both to the cloud and the edge. Azure Machine Learning contains [workspaces](../../../machine-learning/concept-workspace.md), which are logical spaces that you can programmatically or manually create in the Azure portal. These workspaces keep compute targets, experiments, data stores, trained machine learning models, Docker images, and deployed services all in one place to enable teams to work together. You can use Azure Machine Learning in Visual Studio with the Visual Studio for AI extension.
-> [!TIP]
-> Organize and manage related data stores, experiments, trained models, Docker images, and deployed services in workspaces.
+>[!TIP]
+>Organize and manage related data stores, experiments, trained models, Docker images, and deployed services in workspaces.
#### Azure Synapse Spark pool notebooks
-> [!TIP]
-> Azure Synapse Spark is Microsoft's dynamically scalable Spark-as-a-service, offering scalable execution of data preparation, model development, and deployed model execution.
+[Azure Synapse Spark pool notebooks](../../spark/apache-spark-development-using-notebooks.md) run on an Apache Spark service that's optimized for Azure. With Azure Synapse Spark pool notebooks:
-[Azure Synapse Spark pool notebooks](../../spark/apache-spark-development-using-notebooks.md?msclkid=cbe4b8ebcff511eca068920ea4bf16b9) is an Apache Spark service optimized to run on Azure, which:
+- Data engineers can build and run scalable data preparation jobs using Data Factory.
-- Allows data engineers to build and execute scalable data preparation jobs using Azure Data Factory.
+- Data scientists can build and run machine learning models at scale using notebooks written in languages such as Scala, R, Python, Java, and SQL to visualize results.
-- Allows data scientists to build and execute machine learning models at scale using notebooks written in languages such as Scala, R, Python, Java, and SQL; and to visualize results.
+>[!TIP]
+>Azure Synapse Spark is a dynamically scalable Spark as a service offering from Microsoft that provides scalable execution of data preparation, model development, and deployed model execution.
-> [!TIP]
-> Azure Synapse Spark can access data in a range of Microsoft analytical ecosystem data stores on Azure.
+Jobs running in Azure Synapse Spark pool notebooks can retrieve, process, and analyze data at scale from Azure Blob Storage, Data Lake Storage, Azure Synapse, HDInsight, and streaming data services such as Apache Kafka.
-Jobs running in Azure Synapse Spark pool notebook can retrieve, process, and analyze data at scale from Azure Blob Storage, Azure Data Lake Storage, Azure Synapse, Azure HDInsight, and streaming data services such as Kafka.
+>[!TIP]
+>Azure Synapse Spark can access data in a range of Microsoft analytical ecosystem data stores on Azure.
-Autoscaling and auto-termination are also supported to reduce total cost of ownership (TCO). Data scientists can use the MLflow open-source framework to manage the machine learning lifecycle.
+Azure Synapse Spark pool notebooks support autoscaling and auto-termination to reduce total cost of ownership (TCO). Data scientists can use the MLflow open-source framework to manage the machine learning lifecycle.
#### ML.NET
-> [!TIP]
-> Microsoft has extended its machine learning capability to .NET developers.
+ML.NET is an open-source, cross-platform machine learning framework for Windows, Linux, and macOS. Microsoft created ML.NET so that .NET developers can use existing tools, such as ML.NET Model Builder for Visual Studio, to develop custom machine learning models and integrate them into their .NET applications.
-ML.NET is an open-source and cross-platform machine learning framework (Windows, Linux, macOS), created by Microsoft for .NET developers so that they can use existing tools&mdash;like ML.NET Model Builder for Visual Studio&mdash;to develop custom machine learning models and integrate them into .NET applications.
+>[!TIP]
+>Microsoft has extended its machine learning capability to .NET developers.
#### .NET for Apache Spark
-.NET for Apache Spark aims to make Spark accessible to .NET developers across all Spark APIs. It takes Spark support beyond R, Scala, Python, and Java to .NET. While initially only available on Apache Spark on HDInsight, Microsoft intends to make this available on Azure Synapse Spark pool notebook.
+.NET for Apache Spark extends Spark support beyond R, Scala, Python, and Java to .NET and aims to make Spark accessible to .NET developers across all Spark APIs. While .NET for Apache Spark is currently only available on Apache Spark in HDInsight, Microsoft intends to make .NET for Apache Spark available on Azure Synapse Spark pool notebooks.
### Use Azure Synapse Analytics with your data warehouse
-> [!TIP]
-> Train, test, evaluate, and execute machine learning models at scale on Azure Synapse Spark pool notebook by using data in Azure Synapse.
+To combine machine learning models with Azure Synapse, you can:
-Combine machine learning models with Azure Synapse by:
+- Use machine learning models in batch mode or in real time on streaming data to produce new insights, and add those insights to what you already know in Azure Synapse.
-- Using machine learning models in batch mode or in real-time to produce new insights, and add them to what you already know in Azure Synapse.
+- Use the data in Azure Synapse to develop and train new predictive models for deployment elsewhere, such as in other applications.
-- Using the data in Azure Synapse to develop and train new predictive models for deployment elsewhere, such as in other applications.
+- Deploy machine learning models, including models trained elsewhere, in Azure Synapse to analyze data in your data warehouse and drive new business value.
-- Deploying machine learning models, including those trained elsewhere, in Azure Synapse to analyze data in the data warehouse and drive new business value.
+>[!TIP]
+>Train, test, evaluate, and run machine learning models at scale on Azure Synapse Spark pool notebooks by using data in Azure Synapse.
-> [!TIP]
-> Produce new insights using machine learning on Azure in batch or in real-time and add to what you know in your data warehouse.
+Data scientists can use RStudio, Jupyter Notebooks, and Azure Synapse Spark pool notebooks together with Azure Machine Learning to develop machine learning models that run at scale on Azure Synapse Spark pool notebooks using data in Azure Synapse. For example, data scientists could create an unsupervised model to segment customers to drive different marketing campaigns, or use supervised machine learning to train a model that predicts a specific outcome, such as a customer's propensity to churn or the next best offer to increase their value. The following diagram shows how Azure Synapse can be used with Azure Machine Learning.
-In terms of machine learning model development, data scientists can use RStudio, Jupyter Notebooks, and Azure Synapse Spark pool notebooks together with Azure Machine Learning to develop machine learning models that run at scale on Azure Synapse Spark pool notebooks using data in Azure Synapse. For example, they could create an unsupervised model to segment customers for use in driving different marketing campaigns. Use supervised machine learning to train a model to predict a specific outcome, such as predicting a customer's propensity to churn, or recommending the next best offer for a customer to try to increase their value. The next diagram shows how Azure Synapse Analytics can be leveraged for Azure Machine Learning.
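As a deliberately simplified illustration of segmentation, the sketch below splits customers into spend tiers by quartile using only the standard library. Real customer segmentation in a Spark pool notebook would use a clustering algorithm such as k-means over many features:

```python
import statistics

def segment_by_spend(customers):
    """Assign each customer to a low/mid/high segment based on where
    their annual spend falls relative to the spend quartiles."""
    spends = [c["annual_spend"] for c in customers]
    q1, _, q3 = statistics.quantiles(spends, n=4)
    for c in customers:
        if c["annual_spend"] <= q1:
            c["segment"] = "low"
        elif c["annual_spend"] >= q3:
            c["segment"] = "high"
        else:
            c["segment"] = "mid"
    return customers

customers = [{"id": i, "annual_spend": s}
             for i, s in enumerate([100, 250, 400, 900, 1500, 5000])]
segmented = segment_by_spend(customers)
```

Each segment could then drive a different marketing campaign, exactly as described above.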
+In another scenario, you can ingest social network or review website data into Data Lake Storage,